posted on 5:49 PM, November 20, 2008
ExSite has a few features that allow you to automatically publish or republish pages at set times or intervals. You can also unpublish (remove published files) using the same features. Scheduled publishing is useful in a number of scenarios, for instance:
- You have made some updates, but do not want them to "go live" until a specific time, and do not want to have to log in or remember to do it at that time.
- You have an active campaign or promotion, but you want it to automatically disappear at a certain time.
- Your content reflects user-contributed material (such as comments) that arrives at unpredictable times, but you want certain published/static pages to reflect those contributions automatically without having to log in and publish them manually.
- You have a page whose content is dynamic, but you want the higher performance of a static page. (This is useful if the page receives large volumes of traffic, and you want to reduce its impact on the server while still providing a semi-dynamic view of the changing content.)
- You want the added security of static pages, which involve no software execution or database access.
Publishing with the Task Scheduler
The Task Scheduler is a special plug-in that allows the administrator to create scheduled tasks. It must be set up on your server to run every hour; otherwise this method will not work. See the documentation for the Task Scheduler for more information about this, or, if you cannot set it up, see the section on republishing using an embedded plug-in below for an alternative method. Plug-ins may or may not support scheduled tasks, so what you can and cannot schedule depends on which plug-ins you have installed. In this case, you want to use the Publish plug-in.
Open the Task Scheduler, and click the button to create a new task. Give your task a descriptive name, such as "Republish Home Page". The Module is "Publish" (the name of the plug-in). The action can be "publish" or "unpublish" depending on whether you want to write or delete files. The type is one of "section", "page", or "content", depending on whether you are publishing a whole site, a single page, or just one content item. The ID is the numeric identifier of the thing you want to publish. (For example, "page 33". You can find the ID if you inspect the item in the Website Manager.)
There are two basic methods for scheduling tasks: a fixed interval, or a preset time.
To run on a fixed interval, choose one of the intervals "hourly", "daily", or "weekly". Daily tasks run in the first hour of each day (just after 0:00 or midnight if your clock uses local time). Weekly tasks run on the first day of the week (which is Sunday). Hourly tasks will run at whatever minute of the hour your task manager process is scheduled to execute. If you choose to run on a fixed interval, ignore the Execution time field; this will be updated every time the task is run.
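The fixed-interval rules above can be sketched as a small check that an hourly scheduler pass might apply. This is only an illustration in Python, not ExSite's own code: the function name `is_due` is hypothetical, and running weekly tasks in the first hour of Sunday is an assumption (the text only says they run on Sunday).

```python
from datetime import datetime


def is_due(interval: str, now: datetime) -> bool:
    """Return True if a fixed-interval task should run on the pass at `now`.

    Assumes the scheduler itself runs once per hour.
    """
    if interval == "hourly":
        # Every hourly pass runs hourly tasks.
        return True
    if interval == "daily":
        # Daily tasks run in the first hour of the day.
        return now.hour == 0
    if interval == "weekly":
        # Sunday is the first day of the week; datetime.weekday() gives
        # Monday=0 ... Sunday=6. The first-hour restriction is an assumption.
        return now.weekday() == 6 and now.hour == 0
    raise ValueError(f"unknown interval: {interval!r}")
```

For example, with a scheduler pass at 12 minutes past the hour, `is_due("daily", datetime(2008, 11, 20, 0, 12))` is true, while the same call at 17:12 is false.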
To run at a preset time, set the Method to "preset", and set the Execution Time to the time you want the task to run. It may not run at exactly this time; it will run the first time the task manager process executes AFTER this time. For example, if the task manager runs at 12 minutes past the hour, and you schedule your task for 6:00, it will actually run at 6:12. To re-run a preset task, edit it, set a new time, and change its status from "completed" back to "active".
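To make the timing rule concrete, here is a minimal Python sketch that works out when a preset task would actually run, given the minute of the hour at which the task manager executes. The function name `actual_run_time` is hypothetical, and treating a pass that falls exactly on the preset time as "not yet after" is an assumption.

```python
from datetime import datetime, timedelta


def actual_run_time(preset: datetime, scheduler_minute: int) -> datetime:
    """Return the first hourly scheduler pass that falls after `preset`."""
    # The scheduler pass within the same clock hour as the preset time.
    candidate = preset.replace(minute=scheduler_minute, second=0, microsecond=0)
    # If that pass is not after the preset time, the task waits for the
    # pass in the following hour (assumption: "after" means strictly after).
    if candidate <= preset:
        candidate += timedelta(hours=1)
    return candidate


# A task scheduled for 6:00 with the task manager running at :12
run_at = actual_run_time(datetime(2008, 11, 20, 6, 0), 12)
print(run_at.strftime("%H:%M"))  # prints 06:12
```

A task preset for 6:30 under the same scheduler would wait until 7:12.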
Not just anybody is allowed to publish stuff on your site, so you must also select an appropriate administrator under whose account this job will run "in absentia". If this administrator is not allowed to publish the things you have selected, the job will fail to execute properly.
With either method, set the task status to "active", since non-active tasks are ignored.
Republishing using an embedded plug-in
If your site does not support running the task scheduler, there is another way to automatically republish a particular page on a regular interval. It embeds the Publish plug-in into your page, which allows it to take action whenever the page is viewed.
To set this up, go to the page you want to republish on a regular interval, and edit it using the Website Manager. Somewhere in the page, insert the Publish plug-in. Use half-AJAX notation for the plug-in tag, ie.
(This is because the plug-in has to call back to the server from the static page, and it uses AJAX to do this.)
The "options" string defines how often you want the page to republish. You can choose the usual "weekly", "daily" or "hourly" options, as above, or you can specify an integer number, which is understood to be a number of minutes. For instance set the options to "10" to republish every 10 minutes.
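As an illustration of how such an options string could be interpreted, the Python sketch below turns it into a number of minutes. It is not the Publish plug-in's own code: the names `NAMED_INTERVALS` and `interval_minutes` are hypothetical, and mapping "hourly", "daily", and "weekly" to elapsed minutes (rather than to clock boundaries) is an assumption.

```python
# Assumed minute equivalents for the named intervals.
NAMED_INTERVALS = {
    "hourly": 60,
    "daily": 24 * 60,
    "weekly": 7 * 24 * 60,
}


def interval_minutes(options: str) -> int:
    """Interpret an options string as a republishing interval in minutes."""
    value = options.strip().lower()
    if value in NAMED_INTERVALS:
        return NAMED_INTERVALS[value]
    try:
        minutes = int(value)  # a bare integer is a number of minutes
    except ValueError:
        raise ValueError(f"unrecognized interval: {options!r}") from None
    if minutes < 0:
        raise ValueError("interval cannot be negative")
    return minutes


print(interval_minutes("10"))     # prints 10
print(interval_minutes("daily"))  # prints 1440
```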
This method works as follows: every time the page is viewed (in an AJAX-capable browser, that is), the server is notified and compares the current time with the time the page was last published. If more time has elapsed than the interval allows, the page is republished at that time.
One important consequence of this is that the viewer gets the old version of the page, because the republishing does not happen until after the page has been served to them. The next viewer, however, will get the new page.
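The serve-first, republish-afterwards behaviour can be modelled in a few lines of Python. This is a toy simulation under stated assumptions, not the plug-in's implementation: the class `StaticPage` and its methods are hypothetical, and times are plain minute counts.

```python
from typing import Callable


class StaticPage:
    """Toy model of a published page with an embedded republish check."""

    def __init__(self, interval_minutes: int, render: Callable[[], str], now: int = 0):
        self.interval_minutes = interval_minutes
        self.render = render
        self.published_html = render()  # the initial publish
        self.published_at = now

    def view(self, now: int) -> str:
        # The static file is served first, exactly as it currently exists.
        served = self.published_html
        # Only afterwards does the callback reach the server, which
        # republishes if more time has elapsed than the interval allows.
        if now - self.published_at > self.interval_minutes:
            self.published_html = self.render()
            self.published_at = now
        return served


versions = iter(range(1, 100))
page = StaticPage(10, lambda: f"version {next(versions)}")
print(page.view(5))   # prints "version 1": interval not yet exceeded
print(page.view(11))  # prints "version 1": old copy served, then republished
print(page.view(12))  # prints "version 2": the next viewer sees the new page
```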
Scheduled publishing can be a big performance booster, because static pages are much more efficient than dynamic pages.
Examples of pages that can benefit from being converted from dynamic pages to static pages that are refreshed with frequent republishing:
- forum indexes
- event calendars
- public forms with dynamically-generated captchas
Using the task scheduler gives the biggest performance boost, because the static pages it produces will be the leanest of all the methods. Using the embedded plug-in method is less efficient, because it causes an AJAX call back to the server on each real page view. This is less resource-intensive than a full page view, so it should still be a net gain for a busy site, but it still has a cost.
Robots (automated web crawlers) are a big source of performance headaches for many websites, especially if the robots are poorly-behaved and ignore your robots.txt file. Scheduled publishing can mitigate some of these problems, because the robots are hitting more static pages, and fewer dynamic pages. Either method of republishing is useful for dealing with robots, because robots typically do not support AJAX.
To realize a performance boost, you must be serving page views more often than you are republishing. That is because publishing a page is a more resource-intensive operation than simply viewing it dynamically. This is usually a no-brainer for pages that are being viewed more than once per minute, but may be questionable for pages that get viewed hourly or less. In that case, the issue of updating pages to reflect pseudo-dynamic content updates may be the more important issue to consider.
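A rough way to reason about this trade-off is to compare the total cost of serving dynamically against the cost of republishing plus serving static files. The Python sketch below does that; the function name and all three cost figures are made-up illustrative assumptions (in arbitrary units), not measurements of ExSite.

```python
def republishing_pays_off(
    views_per_hour: float,
    republishes_per_hour: float,
    dynamic_view_cost: float = 1.0,   # assumed cost of one dynamic page view
    publish_cost: float = 1.5,        # assumed: publishing costs more than a dynamic view
    static_view_cost: float = 0.05,   # assumed: static views are comparatively cheap
) -> bool:
    """Return True if scheduled republishing costs less than dynamic serving."""
    dynamic_total = views_per_hour * dynamic_view_cost
    static_total = (
        republishes_per_hour * publish_cost
        + views_per_hour * static_view_cost
    )
    return static_total < dynamic_total


# One view per minute, republished every 10 minutes: a clear gain.
print(republishing_pays_off(60, 6))  # prints True
# One view per hour, republished hourly: publishing costs more than it saves.
print(republishing_pays_off(1, 1))   # prints False
```

Under these assumed costs, the break-even point moves with the ratio of views to republishes, which matches the rule of thumb above.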
The extreme case is the following use of the embedded plug-in method:
This tells the Publish plug-in to republish the current page if more than 0 minutes have elapsed since the page was last published. In other words, as soon as someone has viewed this page, republish it. This may hurt performance rather than help it, but it may have other pseudo-dynamic content benefits, such as rotating through content on each page view, e.g.
- generating new ads for the next viewer
- regenerating captchas after they are used on a static page
- rotating through a set of random images
Because the republish operation happens through AJAX methods, most robots will not execute the AJAX, and you will only go through these content rotations when real viewers hit the page. If robots invoking dynamic page views are a nuisance for you, then you may in fact enjoy a performance benefit from serving a static page to the robots, because the republishing only happens for AJAX-enabled visitors (who are generally real humans).
Static page views are the most secure, since they do not require any code execution or database access. (Unless they contain AJAX calls, that is.)
Private (member-only) pages cannot be published to static files, because their content depends on whether the user authenticates or not. However, because publishing also doubles as "approving content for public view" in ExSite, there is still a purpose to publishing private pages: it approves working revisions of content, allowing it to be shown to the viewers of a website.
That in turn brings up a final point: the essential purpose of publishing is to take working content public. If you are working on a page or site and have working revisions on file that are not ready yet when the scheduled publisher kicks in, those working revisions may go live unexpectedly. If necessary, you may want to disable the scheduled publishing while doing such work.