How ExSite Creates Web Pages
This document describes, in moderate technical detail, how ExSite
constructs web pages for display on client browsers. It is
intended as a primer for technically-oriented graphic designers who
will be tinkering with templating tools at an advanced level, and as an
introduction to the ExSite content management framework for web
developers.
Before reading this document, it would be wise to brush up on ExSite's Content Model.
Overview
ExSite manages any number of websites (or website sections) that are collections of individual web pages.
Every web page produced by ExSite has both a static and dynamic
representation. The static version is prepublished to disk, and
is served as a traditional HTML document. The dynamic version is
composed on the fly by the webserver, and served to the viewer
immediately.
If the web page publishes to a simple HTML file and contains no
recursive web applications, then the static representation of the page
is the only one the public will normally see. A static web page
typically has a URL that ends in filename.html. On the server side, this really is a simple, flat HTML file.
If the web page is configured to always render dynamically, then the
dynamic representation is the one the public will normally see. A
dynamic web page typically has a URL that ends in page.cgi?_id=NNN.
If the web page publishes to a simple HTML file, but contains web
applications that may regenerate the page content, then the web page
can appear both ways. The default view of the page will be
published to a static file (and will appear as filename.html), but the page can be regenerated under different conditions (in which case subsequent page views appear as page.cgi?_id=NNN...).
In most cases, ExSite will automatically choose whether to display the
static or dynamic representation of the page. Generally, it
chooses the static representation (which is usually better for
performance), unless one of the following is true:
- the page has restricted access. In this case, ExSite
chooses dynamically whether to display the page, or issue an
authentication challenge.
- the page has accepted parameters or form inputs to regenerate itself with different content.
- an administrator is previewing a new version of the page that has not been published yet.
- an administrator has explicitly configured the page to always
render dynamically. (This is done in cases where the page is
displaying content that changes too quickly to make republishing the
page practical, such as with some database reports.)
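The selection rule above can be sketched as a small decision function. This is only an illustration of the logic as described, not ExSite's actual code; the class and field names are hypothetical and simply mirror the four conditions in the list.

```python
from dataclasses import dataclass


@dataclass
class PageRequest:
    # Hypothetical flags, one per condition listed above.
    restricted_access: bool = False          # page requires authentication
    has_regeneration_inputs: bool = False    # parameters or form inputs were accepted
    admin_preview_unpublished: bool = False  # administrator previewing an unpublished version
    always_dynamic: bool = False             # explicitly configured to always render dynamically


def choose_representation(req: PageRequest) -> str:
    """Return "dynamic" if any listed condition holds, otherwise "static"."""
    if (req.restricted_access
            or req.has_regeneration_inputs
            or req.admin_preview_unpublished
            or req.always_dynamic):
        return "dynamic"
    return "static"


print(choose_representation(PageRequest()))
print(choose_representation(PageRequest(restricted_access=True)))
```

The static representation is the default, so the function only needs to test for the exceptions.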
Page identifiers
Pages can be referred to by their filename, or by their numeric page ID
(the _id parameter to page.cgi). Either identifier can be used
to locate the page.
Note, however, that every page in the system has a unique page ID, but
filenames are unique only within a particular site/section. If you specify
a page by filename, ExSite will look only in the current site for a matching
page.
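The difference between the two identifiers can be illustrated with a toy lookup. The page table, site names, and function names below are all made up for the example; ExSite stores pages in its database and this is not its API.

```python
from typing import Optional, Tuple

# Hypothetical page table: (page_id, site, filename).
PAGES = [
    (1, "siteA", "index.html"),
    (2, "siteA", "about.html"),
    (3, "siteB", "index.html"),
]


def find_by_id(page_id: int) -> Optional[Tuple[int, str, str]]:
    """Page IDs are unique system-wide, so no site is needed."""
    for page in PAGES:
        if page[0] == page_id:
            return page
    return None


def find_by_filename(filename: str, current_site: str) -> Optional[Tuple[int, str, str]]:
    """Filenames are unique only within a site, so the search is limited to the current site."""
    for page in PAGES:
        if page[1] == current_site and page[2] == filename:
            return page
    return None


print(find_by_id(3))
print(find_by_filename("index.html", "siteA"))
print(find_by_filename("about.html", "siteB"))
```

Here index.html exists in both sites, so the filename alone is ambiguous until the current site is taken into account.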
How ExSite finds content
All page content is located in content objects, which have various
names (text labels) that identify the content. For example, a
particular block of text might be called "body" because it comprises
the main body of text on a page. An image might be called logo, or banner_ad, or MyPhoto, or PZ10114s.jpg. The names are arbitrary - it's nice if they are meaningful to you, but ExSite doesn't really care.
A page is a collection of references to different content objects (e.g.
text, images, stylesheets, etc.) that are assembled to create the final
page. Each time a new content object is referenced, ExSite must
find that content object somewhere in the system. Here is how it
searches:
- First, it looks inside the page that is being built. Each
page can define its own content objects that are unique to that page.
- Next, it looks inside the graphic design template that the page
uses to format itself. Each template can define content objects
that can be used by every page that uses it. If the graphic design
is derived from (or "spun off of") another graphic design, then those
parent design(s) are also checked to see if they define our missing
content object.
- Next it looks inside any content libraries that exist in the
current site or site section. Content libraries are places to put
reusable content objects that might need to be shared by different
pages. ExSite will also look in shared content libraries in our parent
sections and sites, in case the content is being provided to us by
another site.
ExSite takes the first match it finds. If we're looking for a content object called logo,
then a match in the current page is preferred over a match in the
graphic design template, which is preferred to a match in the
libraries. Note that matches might exist in all of these
locations, but the preferred versions override the others. (This
lets you define general-purpose content that will be used as a
fall-back in cases where specific content has not been provided.)
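The search order described above amounts to a first-match lookup over an ordered chain of scopes. The following is a minimal sketch under the assumption that each scope can be modelled as a dictionary of named content; the function name and sample data are hypothetical.

```python
from typing import Dict, List, Optional


def find_content(name: str,
                 page: Dict[str, str],
                 templates: List[Dict[str, str]],
                 libraries: List[Dict[str, str]]) -> Optional[str]:
    """Return the first match for `name`.

    templates: the page's template first, then its parent designs.
    libraries: the current section's libraries first, then shared
               libraries of parent sections and sites.
    """
    for scope in [page, *templates, *libraries]:
        if name in scope:
            return scope[name]
    return None


page = {"body": "<p>Welcome</p>"}
templates = [{"banner_ad": "template-banner.png"}, {"logo": "parent-logo.png"}]
libraries = [{"logo": "library-logo.png", "footer": "<p>Footer</p>"}]

print(find_content("body", page, templates, libraries))
print(find_content("logo", page, templates, libraries))
print(find_content("footer", page, templates, libraries))
```

In this sample the logo defined by the parent design wins over the one in the library, and the library's footer acts as the fall-back because nothing earlier in the chain defines one.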
How ExSite builds a page
To begin constructing a page, ExSite needs an initial starting point. It looks for a content object named page to begin. It searches for this content object using the above algorithm.
The page content object provides the initial HTML framework for the page. That is, when we load the page
content object, and ask it for its actual content data, we will
receive a block of HTML that lays out the overall page structure, and
will refer to other content objects that are needed to complete the
page (such as images, stylesheets, menus, and text). These
references to other content objects are specified using special tags in
the HTML that are understood by ExSite (although they will look like
HTML comments to any other HTML-handling program).
To illustrate, here is a very simple example of the type of HTML content that you might get in a "page" content object:
<html>
<head>
<title><!--$title--></title>
</head>
<body>
<img src="[[logo]]">
<!--content(body)-->
</body>
</html>
From this you can see that the bare essentials of the page are
included: a head section, a body section, and a few other items
inside those. A more realistic example would probably include
all sorts of layout instructions (tables, divs, css, javascript, etc.).
There are three special tags in this example that refer to other content objects that are needed to complete the page:
<!--$title-->
This tells ExSite to insert the meta-data item named title into this
spot. Meta-data is not really content, but it can be inserted
into the page just like content can.
[[logo]]
This tells ExSite to insert a URL to the logo content object into this spot. ExSite will perform a search for logo (as described above), and when it is found, will determine the best URL to use.
<!--content(body)-->
This tells ExSite to insert the HTML for the body content object into this spot. ExSite will perform a search for body (as described above), and when it is found, will determine the best HTML representation of the content to use.
Once these substitutions are complete, ExSite looks at the new,
expanded, version of the page's HTML. New content references may
have appeared as a result of the substitutions we performed, in which
case, we have to repeat the process and search for the missing content,
until no more references can be found. At that point, the page
construction is complete, and it can be delivered to its destination -
either the client browser (if the page is being rendered
dynamically) or the publishing program (if the page is meant to be
viewed statically).
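The repeat-until-stable substitution described above can be sketched with regular expressions. This is a simplified illustration, not ExSite's implementation: the content, URL, and meta-data tables are invented sample data, unresolved names are replaced with an empty string, and the max_passes guard against circular references is an assumption added for safety.

```python
import re

# Hypothetical sample data standing in for the content search.
CONTENT = {
    "page": '<html><head><title><!--$title--></title></head>'
            '<body><img src="[[logo]]"><!--content(body)--></body></html>',
    "body": '<p>Hello</p><!--content(footer)-->',
    "footer": '<p>Footer</p>',
}
URLS = {"logo": "/images/logo.png"}
META = {"title": "Home"}


def expand(html: str, max_passes: int = 10) -> str:
    """Substitute tags repeatedly until a pass leaves the HTML unchanged."""
    for _ in range(max_passes):
        new = re.sub(r"<!--\$(\w+)-->", lambda m: META.get(m.group(1), ""), html)
        new = re.sub(r"\[\[(\w+)\]\]", lambda m: URLS.get(m.group(1), ""), new)
        new = re.sub(r"<!--content\((\w+)\)-->",
                     lambda m: CONTENT.get(m.group(1), ""), new)
        if new == html:  # no references left to resolve
            break
        html = new
    return html


print(expand(CONTENT["page"]))
```

The body content introduces a new reference to footer, so a second pass is needed before the page is complete.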
More information on how to use substitution tags is given in the ExSite Templating Guide.
PHP, ASP, & Server-side Includes
From the point-of-view of ExSite, a static page is one that gets
written out to disk. ExSite does not care what happens to this file after
it is written out. That means that if your webserver is configured to
perform additional operations on published web page files, it can do so.
That makes it possible to include PHP, ASP, and server-side includes in
your published files, and they will function correctly.
ExSite considers anything that is written to the published file as a
form of content. Therefore if you require embedded code or SSI directives
in your published files, simply embed them directly into your HTML.
For example:
<!-- following is an SSI -->
<!--#echo var="DOCUMENT_NAME" -->
<p>This is normal HTML.</p>
<?php
/* this is PHP code */
echo "Hello, world!";
?>
In the case of PHP (or alternatively, ASP), you may want to keep the
PHP code isolated into its own content object. That allows you to manage
the code separately from the regular HTML around it, including access
controls and version handling. Simply make sure the MIME-type of the
content is text/html so that it gets inlined into the page without
modification.
Note that your published pages will have to be named appropriately for
the server to recognize that they require special processing. ExSite
accepts file names with the following suffixes:
- .htm, .html - normal HTML files
- .shtm, .shtml - files that may contain SSI directives
- .php - files that may contain PHP code
- .asp - files that may contain ASP code
Note that servers can be configured differently than these assumptions
imply. For example, SSI may be supported on regular .html files.
But these rules will cover most common cases.
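The suffix conventions above can be expressed as a simple lookup table. This sketch only restates the list; the function name and the category labels are hypothetical, and how a given webserver actually treats each suffix depends on its configuration.

```python
import os
from typing import Optional

# Suffixes from the list above, mapped to illustrative category labels.
SUFFIX_KINDS = {
    ".htm": "html", ".html": "html",
    ".shtm": "ssi", ".shtml": "ssi",
    ".php": "php",
    ".asp": "asp",
}


def published_file_kind(filename: str) -> Optional[str]:
    """Return the category for a published filename, or None if the suffix is not listed."""
    _, ext = os.path.splitext(filename.lower())
    return SUFFIX_KINDS.get(ext)


print(published_file_kind("index.html"))
print(published_file_kind("news.shtml"))
print(published_file_kind("notes.txt"))
```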
Normally you cannot expect SSI, PHP, or ASP to work on ExSite pages
that are dynamically generated through CGI. That means that the raw code
may get pushed through to regular page previews; pages will
need to be published before you can see a full preview. That also means
that ExSite's conventional mechanism for restricting page access will not
be usable, since restricted-access pages are rendered by CGI.
Code Path
This section is intended for developers only. It describes the key
code units that get executed as ExSite goes through the page
construction process. This code execution is initiated by the
page.cgi program.
- ExSite::ContentBase::new() - generic CMS constructor, which
creates the "Page" code object.
- ExSite::Page::setup() - locates the page in the
database, and loads its metadata.
- ExSite::Page::errorpage() - called if no valid page was found.
This locates an error-handling template, if one is defined.
  - ExSite::Page::expand() - inserts the error message into the
  error-handling template, so that it will be displayed with the correct
  graphic design for the site.
  - If there is no error-handling template, the bare HTML error message
  is dispatched.
  - If we get here, EXIT.
- ExSite::ContentBase::set_context() - finds the
page content object, from which all the page's HTML begins.
Also finds the current version of the page object, and the
page's section.
- ExSite::Page::errorpage() - called if the user does not have
permission to view the page. (See above for the flow-control logic within
errorpage().)
- ExSite::ContentBase::expand() - finds and inserts all
page-specific meta-data and content into the page.
  - ExSite::ContentBase::get_start_html() - gets the starting
  HTML to begin building the page from. For normal pages, this will usually
  be taken from a precompiled template.
  - Execute the page-expansion loop, looking for unresolved content
  objects:
    - ExSite::ContentBase::get_content_url() - replaces all CMS
    tags of the form "[[name]]" with URLs to the named content object.
    - ExSite::ContentBase::get_page_url() - replaces all CMS
    tags of the form "{{name.html}}" with URLs to the named page.
    - ExSite::ContentBase::get_content() - replaces all CMS
    tags of the form "<!--content(name)-->" with an HTML representation
    of the named content object. This may cause new CMS tags to appear in the
    page.
    - ExSite::ContentBase::get_dynamic_content() - replaces all
    CMS tags of the form "<!--&Module(options)-->" with HTML that
    is received from the named plug-in dynamic content module. This may
    cause new CMS tags to appear in the page.
    - ExSite::ContentBase::get_dynamic_content_indirect() -
    replaces all CMS tags of the form "<!--&&Module(options)-->"
    and "<!--&&&Module(options)-->"
    with Javascript that will fetch and insert the content separately
    (AJAX-style).
    - Repeat the page-expansion loop until no more CMS tags are found.
- ExSite::Page::show() - dumps the page to stdout, along with
all headers.
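To make the tag forms in the expansion loop concrete, here is a small sketch that scans a fragment of HTML and reports which of the methods named above would be responsible for each tag. The regular expressions are an approximation written for this example, not the patterns ExSite uses, and the module names Menu and Search are invented.

```python
import re

# (responsible method, approximate pattern for its tag form)
TAG_PATTERNS = [
    ("get_content_url", re.compile(r"\[\[(.+?)\]\]")),
    ("get_page_url", re.compile(r"\{\{(.+?)\}\}")),
    ("get_content", re.compile(r"<!--content\((.+?)\)-->")),
    # One ampersand: inlined dynamic content.
    ("get_dynamic_content", re.compile(r"<!--&(\w+)\((.*?)\)-->")),
    # Two or three ampersands: content fetched separately.
    ("get_dynamic_content_indirect", re.compile(r"<!--&&&?(\w+)\((.*?)\)-->")),
]


def find_tags(html: str):
    """Return (method, tag) pairs in the order the tags appear in the HTML."""
    found = []
    for handler, pattern in TAG_PATTERNS:
        for m in pattern.finditer(html):
            found.append((m.start(), handler, m.group(0)))
    return [(handler, tag) for _, handler, tag in sorted(found)]


sample = ('<a href="{{about.html}}"><img src="[[logo]]"></a>'
          '<!--content(body)--><!--&Menu(top)--><!--&&Search()-->')

for handler, tag in find_tags(sample):
    print(handler, tag)
```

Because the single-ampersand pattern requires a word character directly after the first ampersand, it does not also match the two- and three-ampersand forms.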