intro

This website is built upon a framework I wrote to ease development and maintenance of a large site. The hardest part for me in maintaining a website is dealing with the formatting code necessary to keep pages consistent throughout the site. Furthermore, changing the formatting of the elements on the site was out of the question due to the scope of the task. I had a desire to abstract the HTML code from the content of my site, and chose PHP to implement the framework. Those of you who are versed only in conventional HTML, and not in dynamic, scripted sites, just hold on. I'll make things clear in time.

My first attempt to revise my site was a disappointment. It eased some things slightly, but the overall formatting of the pages was still intertwined with the content of the site. I failed to implement the framework the way I had hoped, mostly because I was unable to find sufficient documentation on how others had done it, and because my knowledge was based on conventional web development, much of which had to be thrown out the window.

After reading an article on PHPbuilder.org, it all came together. I realized how I could finally execute a single PHP script for every URL on the site, and still have all URLs readable as standard URLs (no HTTP GET queries). The method involved just one line in my Apache configuration file, and a specialized script that would interpret the URI given by the client and use it to determine which content to serve. Using this method, the content does not need to be formatted, because all the formatting is done by one script (actually, a single script with auxiliary include files). This may sound complex at first, but it's actually no more difficult to create than a typical site layout, and saves a great deal of time down the road.

background

Traditional HTML

In conventional web design, every page you see has its own HTML code contained in a file on the server. All the page formatting, image positioning, and global elements (navigation bars, etc.) are duplicated in every HTML file in a site. To maintain consistency, care must be taken that elements on every page use the exact same formatting code. If a section is to be added to the navigation bar, or if the colors of an element are to be altered, every page on the site must be manually updated. As sites get larger, this gets more and more cumbersome, and eventually more time is spent on formatting than on content.

For this reason, many web developers have tried to separate the formatting of a site from the content of a site. This is done through use of a scripting language that dynamically translates the content from its native state to the associated HTML code. This allows content to be kept in a native, easy-to-maintain state, while the HTML formatting is handled by a set of scripts.

Dynamic Formatting

This method has a whole slew of advantages. First and foremost, the content can be stored in any format that is convenient to the developer. Second, formatting hardly needs consideration when writing content, because formatting is handled by scripts. Third, because the formatting is generated dynamically, it is easily changed, and multiple formats are possible by simply changing the formatting script.

The data for a modern webpage may be stored in any format, from text files to SQL databases. The content file type is chosen so that it is easily changed and updated. This allows the page to be maintained by those without explicit knowledge of HTML, because the content can be created and maintained in any format that is convenient. Even with knowledge of HTML, the burden of formatting multiple pages consistently can be alleviated. I have chosen to store mine in XML, because it yields clean code, yet translates easily into HTML.

Because all the formatting is done by scripts, the formatting does not need to be considered when writing content. The developer plugs all the data into the appropriate places, and the scripts handle styling of text, positioning of objects, and sometimes even complex tasks such as thumbnailing images or creating directory lists. Data formatting can assume any level of complexity, from simply enclosing text in a set of tags to database manipulation and advanced formatting. But to the one creating content, all that needs to be known is that if the content and parameters are in the right place, the scripts will handle making it all look as desired.

Dynamic formatting allows the formatting of any type of page element to be changed through a single script definition. For example, if I were to change every Polaroid™-style photo on my site to a gaudy flashing pink frame, it would only require changing the formatting in a single script file. I could even have multiple script files to generate content in different styles, for example a text-only page or a frames version. Dynamic formatting does not limit itself to generation of web pages, however. It can be used to generate PDFs, print documents, and WML (for mobile devices), among other things, all from the same content. Therefore, whatever medium you choose to publish to, dynamic formatting can make your content work with it, without maintaining multiple copies.

my-approach

This webpage has all of its content in XML. The XML is converted to an object tree by a scripting library I wrote for this purpose. This object tree's html_render() method is then called to render elements in HTML by the root level HTML script. Every aspect of the system is abstracted and compartmentalized, making it straightforward to adjust any part of the site, local or global.

XML Content

XML code looks a lot like HTML code, only the tags are not rigorously defined. By this, I mean that you make up your own tag names, and the associated meanings for them. I chose XML because it translates easily to just about any other format. Furthermore, PHP has an XML parsing library built in, so a great deal of work can be saved. I have made up various tags that represent the different formatting elements of my site. Each tag generates HTML code that is sent to the browser. Sometimes the translation is a simple one, and other times it is much more complex.

Take a look at the XML code for the front page and you'll see what I mean. If you compare the XML code to the HTML code, you will find that they closely follow each other. However, the HTML code has been dressed up with a great deal of formatting, which makes the HTML code many times longer and more complex. By developing content in XML, I avoid having to deal with the complexity of formatting on a regular basis. Furthermore, when I want to change the formatting of the site, I don't have to change the XML files; I simply change the translation script.

Object Tree

The object-oriented code of my website is both the most exciting to me and the least apparent to the end user. The code builds a tree of objects from an XML source file, and this tree can then be sent messages to perform the various formatting and rendering routines. The code is very well compartmentalized, and XML tag behaviors are easily added and modified. It is a model of simplicity and adaptability.

The majority of the complexity is in the generic base class used for the root object. This class defines a few generic methods, only one of which is normally overridden: the one that renders the content of the element. The other methods handle start and end tags, and add data to the tree. Each XML tag type has its own class, which inherits from the generic base class, and the objects are allocated dynamically based on the tags encountered when parsing the file. Fortunately, except for rendering its contents, every inherited class shares the same functionality. Therefore, when I want to add a new class, I only have to define how the class renders itself.

At startup, the XML parsing function library is sourced, which in turn sources a directory containing the code for the various classes. These files, when sourced, define classes and build an associative array that links each XML tag type to its associated class. This associative array is consulted when a new object is created, so that the object is an instance of the correct class. To complete the object model, I wrote handlers for PHP's built-in XML parsing functions. These include functions that are called when an opening tag is encountered, when a closing tag is encountered, and when the text within an element is encountered. These functions are called as the parser scans the file sequentially from top to bottom, and call the object methods necessary to create an object tree from the XML file. When it's all done, the object tree looks very much like the XML tree, only in built-in PHP data structures that facilitate data retrieval.
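The tag-to-class array and the three parser handlers can be sketched roughly as follows. This is a minimal illustration, not the site's actual library: the class names, the $tagClasses array, the build_tree() function, and the sample XML are all hypothetical, and a stack of open elements is used here to track the current position in the tree. Only the xml_* calls are PHP's own.

```php
<?php
// Hypothetical element classes; the real site keeps one class file per tag.
class Element
{
    public $type;            // tag name
    public $args;            // attributes passed to the tag
    public $text = '';       // character data inside the tag
    public $children = [];   // nested Element objects

    public function __construct(string $type, array $args = [])
    {
        $this->type = $type;
        $this->args = $args;
    }
}
class PageElement extends Element {}
class PhotoElement extends Element {}

// Associative array linking each XML tag name to its class (names assumed).
$tagClasses = ['page' => 'PageElement', 'photo' => 'PhotoElement'];

function build_tree(string $xml, array $tagClasses): Element
{
    $root  = new Element('#root');
    $stack = [$root];   // open elements; the last entry is the current one

    $parser = xml_parser_create();
    // Keep tag names as written instead of upper-casing them.
    xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);

    xml_set_element_handler(
        $parser,
        // Opening tag: create an object of the mapped class and descend into it.
        function ($p, string $name, array $attrs) use (&$stack, $tagClasses) {
            $class  = $tagClasses[$name] ?? 'Element';
            $node   = new $class($name, $attrs);
            $parent = $stack[count($stack) - 1];
            $parent->children[] = $node;
            $stack[] = $node;
        },
        // Closing tag: step back up to the parent.
        function ($p, string $name) use (&$stack) {
            array_pop($stack);
        }
    );
    // Text: append to whichever element is currently open.
    xml_set_character_data_handler($parser, function ($p, string $data) use (&$stack) {
        $current = $stack[count($stack) - 1];
        $current->text .= $data;
    });

    if (xml_parse($parser, $xml, true) !== 1) {
        throw new RuntimeException((string) xml_error_string(xml_get_error_code($parser)));
    }
    return $root;
}

$tree = build_tree('<page title="Home"><photo src="cat.jpg">My cat</photo></page>', $tagClasses);
```

Tags that have no entry in the array fall back to the generic Element class in this sketch, so an unknown tag still ends up in the tree.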

The classes that define the objects in the tree only differ in a few ways. All classes have the same data storage variables: the text within the tag, the type of the tag, the arguments passed to the tag, and the objects contained within the tag. All of the classes have a few common functions: one to create a new tag, one to close an existing tag, and one to add data to the current tag. The classes only differ from each other in one way: the rendering function. This is why it is so easy to add new XML tag behaviors and modify existing ones. Because the majority of the code for each class is inherited from the parent class, only the rendering function needs to be defined for each class. Furthermore, all the rendering code is just an extension of the base class's rendering code, further reducing the effort necessary to create new classes.
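A class layout along these lines might look like the sketch below. Only html_render() is a name taken from this site; the class names, the open_tag() and add_data() methods, and the generated markup are assumptions made for illustration. For brevity the base class renders an element's text before its children, which would not preserve the order of mixed text and tags.

```php
<?php
// Generic base class: storage and tree plumbing shared by every tag.
class Element
{
    public $type;           // the type (name) of the tag
    public $args;           // the arguments (attributes) passed to the tag
    public $text = '';      // the text within the tag
    public $children = [];  // the objects contained within the tag

    public function __construct(string $type, array $args = [])
    {
        $this->type = $type;
        $this->args = $args;
    }

    // Add a nested element and return it so the caller can descend into it.
    public function open_tag(Element $child): Element
    {
        $this->children[] = $child;
        return $child;
    }

    // Append character data to this element.
    public function add_data(string $data): void
    {
        $this->text .= $data;
    }

    // Default rendering: escaped text, then every child rendered in order.
    public function html_render(): string
    {
        $html = htmlspecialchars($this->text);
        foreach ($this->children as $child) {
            $html .= $child->html_render();
        }
        return $html;
    }
}

// A tag class only says how it wraps the base class's output.
class SectionElement extends Element
{
    public function html_render(): string
    {
        $title = htmlspecialchars($this->args['title'] ?? '');
        return "<div class=\"section\"><h2>$title</h2>" . parent::html_render() . '</div>';
    }
}

class EmphasisElement extends Element
{
    public function html_render(): string
    {
        return '<em>' . parent::html_render() . '</em>';
    }
}

$root = new SectionElement('section', ['title' => 'News']);
$em   = $root->open_tag(new EmphasisElement('em'));
$em->add_data('Hello & welcome');
echo $root->html_render(), "\n";
```

Because each subclass calls parent::html_render(), the recursion into child objects lives in one place, and a new tag costs only a few lines.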

The Process

Now, to explain the steps required to bring the webpage to the client. There are three discrete stages: the Apache server request, the PHP parsing of the XML file, and the content formatting for the client.

The process begins with some Apache tricks used to get the server to properly execute the script with the arguments necessary to retrieve the appropriate content. When it gets a URI, Apache will stop walking down directories once it reaches a PHP script file. For example, if you put a script called "index.php" at the root level of your server, then every URI beginning with /index.php (e.g. http://www.yourserver.com/index.php/lots/of/extra/directories/) would call your script, regardless of how many sub-directories were specified after index.php in the URI. Using this fact and an Apache ForceType directive in my configuration file, I am able to force the PHP script file "html" at the root level of my server to be called for every URI beginning with /html. The script then reads the REQUEST_URI server variable to determine what content to serve. This makes every URL look normal to users and search engines, without any HTTP GET queries.
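A configuration along these lines would produce that behavior. It is a sketch, not the actual configuration of this site: the handler type application/x-httpd-php is the one commonly registered by mod_php and is an assumption here, as is wrapping the directive in a Files block. Other PHP installations may need a different type or handler.

```apache
# Treat any extensionless file named "html" as a PHP script,
# so /html/any/path/ is handed to that one script.
<Files html>
    ForceType application/x-httpd-php
</Files>
```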

Once the script is called, it determines the path to the XML file that corresponds to the URI by replacing /html with /media and adding /index.xml to the end. It then loads the XML parsing library I wrote, and hands the XML file off to the main parsing routine. The parsing routine creates one root object, and initializes a counter indicating the current depth of the tree. The file is then parsed sequentially from top to bottom. When an opening tag is encountered, the counter is incremented, a new object of the appropriate class type is created, and it is placed at the last position at the counter's current depth. When data is encountered, it is placed at the last position at the counter's current depth. And when a closing tag is encountered, the counter is decremented. The beauty of this system is that all the tree creation is handled by three recursive routines.
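The mapping from URI to XML file can be sketched as a small function. The function name content_path_for(), the $siteRoot parameter, and the refusal of ".." path segments are assumptions made for this example; only the /html to /media substitution and the /index.xml suffix come from the description above.

```php
<?php
// Map a request URI to the XML content file that backs it.
// Returns null when the URI is outside /html or tries to climb directories.
function content_path_for(string $requestUri, string $siteRoot): ?string
{
    // Drop any query string; only the path part selects the content.
    $path = parse_url($requestUri, PHP_URL_PATH);
    if (!is_string($path)) {
        return null;
    }
    if ($path !== '/html' && strncmp($path, '/html/', 6) !== 0) {
        return null;
    }

    // Everything after the /html prefix, without leading or trailing slashes.
    $rest = trim(substr($path, strlen('/html')), '/');
    if (in_array('..', explode('/', $rest), true)) {
        return null;
    }

    // Replace /html with /media and add /index.xml to the end.
    $middle = ($rest === '') ? '' : '/' . $rest;
    return $siteRoot . '/media' . $middle . '/index.xml';
}

echo content_path_for('/html/photos/vacation/', '/var/www'), "\n";
```

In a live script the first argument would come from $_SERVER['REQUEST_URI'], and the second from wherever the site's files are rooted.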

Once the object tree has been created, the root object's html_render() method is called. This renders all the content of an element, passing the html_render() call on to the sub-objects as necessary. Each sub-object runs the code necessary to render itself in HTML, and passes the html_render() call on to all of its own sub-objects. This continues until every object has been rendered.

That's it. With the exception of the code for the individual tags, the entire script is only a few pages long. The code for each tag has its own file, and each one ranges from 10 lines long to a few pages. I like this system because there is almost no redundant code, and it is all very neatly organized. It also allows me to make global site changes without fear, something I have a habit of doing.

outro

I hope this article was inspiring, yet not too confusing. I have found that this framework makes maintaining my web site much less tedious, and allows me to add much more content than I ever have in the past. Furthermore, when I have my occasional whim to change or add global formatting, it's not such a large task. If you would like more advice on how to do this yourself, or just want to send me questions or comments, email site@jmack.net.

