Because HTTP is a stateless protocol, the normal behaviour of
websites is to forget everything about a visitor after a page has been
served. If we are to remember things about a visitor on
subsequent page views, we must reload all of that data from storage
when the visitor returns. This is straightforward enough in
principle: simply re-execute all of the SQL queries to load that data
every time the visitor requests a page.
This
"brute force" approach has its drawbacks, however. You are
executing the same queries over and over, which can be
computationally expensive. This limits how much traffic your
website can serve, since it may be spending most of its resources
repeating similar operations. What is desired is a
way to remember data once you have gone to the effort of fetching it,
so that it is much less costly to fetch it again.
ExSite has an optional feature, the persistent data store, which is useful for managing data persistence in this way. The persistent data store can be used to:
- improve performance on repetitive queries and computations
- reduce traffic/load to the database server
- reduce start-up time by caching configuration files and settings
- preserve the state of a user's visit
The persistent data store is sometimes called the "store", meaning a place to store things (not an e-commerce store).
Enabling the Persistent Data Store
To use the store,
you first need to initialize the store database. The store.pl
utility script distributed in the bin directory of the ExSite
distribution can be used to do this. From your cgi-bin directory,
run the command:
../bin/store.pl --reset
This will create the STORE database and the STORE.lock lockfile in your cgi-bin directory.
Now you need to configure ExSite to make use of the store. The store configuration is encoded in the routine &store_conf() in the file cgi-bin/Local.pm. This routine is disabled by default by being renamed to store_conf_disabled().
Simply remove the "_disabled" part to enable it. (Add the
_disabled part back at any time to temporarily disable the store.)
The contents of this routine are configuration parameters that
can be modified to "tune" the store.
(The
reason that store configuration is not placed in the system
configuration file, exsite.conf, like all other configurations, is that
we try to cache the system configuration in the store itself for faster
loading, so we cannot read the general system configuration until after
the store is configured.)
Database Cache
SQL queries
are computationally expensive compared to fetching items from the
persistent data store. That means we can cache the results of our
SQL queries in the store and save a lot of time when we repeat those
queries. This can get quite complicated, however, since if we
alter the database, the results of previous queries may change.
We have to know which cached items are safe to continue using,
and which ones should be forcibly expired from the cache because they
may have been superseded by new data in the database.
This
is managed for you by the Cache class, which provides a specialized,
higher-level interface to the Persistent Data Store, specifically for
caching SQL query results. You do not need to make direct use of
%store to use the database cache; the Cache object will handle
all interactions with %store on your behalf.
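The bookkeeping problem described above can be illustrated with a small sketch. This is not the ExSite Cache class or its API; the function names (cache_save, cache_get, cache_clear_table) and the idea of tagging each cached result with the tables it reads are assumptions made for illustration, and ordinary hashes stand in for the store.

```perl
use strict;
use warnings;

my %results;     # cache key => cached rows
my %by_table;    # table name => { cache key => 1 } for results that read that table

# Remember a query result and note which tables it depends on.
sub cache_save {
    my ($key, $rows, @tables) = @_;
    $results{$key} = $rows;
    $by_table{$_}{$key} = 1 for @tables;
}

# Return the cached rows, or undef if nothing is cached under this key.
sub cache_get {
    my ($key) = @_;
    return $results{$key};
}

# Call after any write to a table: drop every cached result that read it.
sub cache_clear_table {
    my ($table) = @_;
    my $keys = delete($by_table{$table}) || {};
    delete $results{$_} for keys %$keys;
}

cache_save("page:1",   [ { id => 1, title => "Home" } ], "page");
cache_save("member:9", [ { id => 9, name  => "Pat"  } ], "member");

cache_clear_table("page");   # e.g. after updating a row in the "page" table

print defined cache_get("page:1")   ? "page:1 cached\n"   : "page:1 expired\n";
print defined cache_get("member:9") ? "member:9 cached\n" : "member:9 expired\n";
```

Only the results that touched the modified table are expired; unrelated results stay cached.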
Note
that the database cache is automatically configured and used whenever
you instantiate a database object, so you do not have to do anything to
get the benefit of this subsystem. However, you may want to make
direct use of the Cache object if you perform any custom queries whose
results you would like to cache.
If
you inspect the contents of the Persistent Data Store directly, you will
see numerous items prefixed with the label "cache:...". These are
used by the Cache object to track the cached queries and which ones
need to be forcibly expired on updates.
The
database cache still works even if the Persistent Data Store is not
enabled. However, it only caches query results for the duration
of that request. On a subsequent page view, it will go back to
the original database again.
Disabling Persistent Caching
There
are some cases where you may not want your queries to be cached.
One case is when your database is being modified by third-party
applications that do not know about ExSite's caching system. Then
the cache can get out of sync with the database, and data integrity
problems can result.
To disable persistent caching, without disabling the persistent store entirely, use the following configuration setting:
cache.persistent = 0
If
you do this, queries will still be cached, but only for the duration of
the current request. Subsequent requests will have to fetch the
data all over again.
Sessions
Session management is used to track the state of a user's visit. You can store arbitrary key/value pairs in the user's %session
hash, and this information will still be available in this hash on
subsequent page views if persistent storage is enabled. (If not
enabled, the values in %session are forgotten after each page view.)
Unlike the general %store, the contents of %session are unique to each user. In other words, a user sees only the contents of their own %session, and changes to the contents of %session are only seen by that user.
If
you inspect the contents of the Persistent Data Store directly, you may
see numerous items prefixed with the label "session:...". These
are individual private sessions, each one corresponding to a particular
user. When you manipulate the contents of %session, you are changing the values inside one of these only.
If persistent storage is not enabled, you can still read and write to %session, but the changes will not persist.
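The per-user isolation described above can be pictured with a short sketch. In ExSite you simply read and write %session; the code below only illustrates how private sessions might be kept apart under "session:" keys. The helper session_for, the session ID values, and the plain hash standing in for the store are all assumptions for this example.

```perl
use strict;
use warnings;

my %store;   # plain hash standing in for the shared store (assumption)

# Return the private session hash for one visitor, creating it on first use.
# The "session:" key prefix mirrors the labels described above; the session
# ID format is made up for this example.
sub session_for {
    my ($session_id) = @_;
    return $store{"session:$session_id"} ||= {};
}

# Two visitors each get their own key/value space.
my $alice = session_for("a1b2");
my $bob   = session_for("c3d4");

$alice->{page_views}++ for 1 .. 3;
$bob->{page_views}++;

print "alice: $alice->{page_views}, bob: $bob->{page_views}\n";
print join(", ", sort keys %store), "\n";
```

Changes made through one visitor's session hash never show up in the other's, even though both live in the same underlying store.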
Configurations
If
ExSite detects that persistent storage is enabled, it will try to save
its configuration files there, to avoid reloading and parsing them on
each page view. These include the system conf files as well as the
dbmap database description files.
An important
consequence of this is that if you change any of these files, ExSite
will ignore those changes, preferring the stored version instead.
To work around this, you should go into the Persistent Data Store
plug-in, and clear any affected items in the store. This will
force them to reload.
Performance
The persistent data
store has a significant effect on performance, especially reducing the
amount of time spent on repetitive database queries. If similar
pages are being hit repeatedly, subsequent page views will often not
hit the database at all, because the necessary queries (e.g. CMS content
lookups, authentication requests) have all been cached by earlier page
views.
The effect is most significant on
smaller systems that run the database on the same server as the
website, since you do not have to divert as many system resources to
the database. On systems with separate database servers, the
effect may be different, depending on how heavily loaded the database
servers tend to be.
Badly-behaved robots can
clobber a site with hundreds or thousands of hits in a very short time
span. In this case, the store caching may not help as much as you
would like because by the time a query result has been cached, the
other page views have already checked the cache, found nothing, and
issued their own queries to the database. The store works best
once the cache has been primed, so the effect of a bad robot will be
reduced on a site that receives regular traffic that keeps the cache
full. Conversely, the effect of a bad robot will be worst when a
robot strikes the site after a period of low activity that expires
everything from the cache. You can tune the lifetime of stored
items in the store configuration to mitigate these effects.
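The cold-cache effect can be made concrete with a toy model. The sketch below is not ExSite code, and the arrival times and the 0.5 second query duration are invented numbers; it simply counts how many requests arrive before the first query result has been cached, ignoring expiry.

```perl
use strict;
use warnings;

# Count how many requests must query the database, given a fixed query
# duration and a list of arrival times (both in seconds). A request is a
# cache hit only if some earlier request has already finished its query
# and stored the result.
sub db_queries {
    my ($query_secs, @arrivals) = @_;
    my $cached_at;    # time at which the first result lands in the cache
    my $queries = 0;
    for my $t (sort { $a <=> $b } @arrivals) {
        next if defined $cached_at && $t >= $cached_at;   # cache hit
        $queries++;                                       # cache miss
        my $done = $t + $query_secs;
        $cached_at = $done if !defined $cached_at || $done < $cached_at;
    }
    return $queries;
}

# 100 hits in one burst, 1 ms apart, against a 0.5 s query.
my @burst  = map { $_ * 0.001 } 0 .. 99;
# 100 hits spread 1 s apart.
my @steady = 0 .. 99;

printf "burst:  %d database queries\n", db_queries(0.5, @burst);
printf "steady: %d database queries\n", db_queries(0.5, @steady);
```

In this model every request in the burst misses the cache, while the evenly spaced traffic only pays for the first query.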
In an extreme denial-of-service situation,
the store may itself become a point of resource contention, as
different processes struggle to get the lock on the database that they
may need to make updates. We recommend using the "busy"
kill switch as a countermeasure against DoS and DDoS attacks
against your website, if this is a concern.
In
a plain CGI setup, the persistent data store can increase throughput by
25% or more. In a Persistent Perl setup, the increase can be more
than 250%. And Persistent Perl together with the persistent data
store can improve throughput over plain CGI by a factor of 10.
(Factors of 5 to 20 improvement are not unusual, depending on the
specifics of the system and the nature of the traffic.)
Maintaining the Persistent Data Store
StoreAdm
Use the StoreAdm plug-in to perform manual maintenance of the store contents. For example:
- you want to change some system configurations, and need to clear the old configurations from the cache
- you want to inspect the contents of an item in the store for debugging or technical support purposes
- you want to terminate the session of a certain user (by deleting their session entry in the store)
- you want to clear and reset the entire persistent data store to a pristine state.
store.pl
ExSite
also ships with a command-line tool called store.pl, which can be used
to inspect and maintain the store from a shell session. This is
included in the bin directory of the distribution, but should be
executed from the cgi-bin directory. Examples of use include:
List all items in the store, and their expiry times:
../bin/store.pl --list
Display a particular item in the store:
../bin/store.pl itemname
Reset the store back to a pristine state:
../bin/store.pl --reset
Reclaim unused disk space:
../bin/store.pl --rebuild
Note
that this last command is useful when your store data file grows large
over time. The store does not free unused disk space when it
expires old items, but holds onto the space for re-use. It is a
good idea to periodically rebuild the store file to free up disk space
that may not be needed.
There are other switches to store.pl as well. Consult the comments in the script for more information.
Task Manager
The
ExSite task manager can be used for automated maintenance of the
persistent data store. Tasks can be set up for the StoreAdm
plug-in, to run at hourly, daily, or weekly intervals, or at preset
times. The following task actions are accepted:
rebuild - compacts the store database file to reclaim unused disk space
reset - resets the store to its initial state (clears all data)
purge - clears all expired data. Note that purging (i.e. garbage
collection) should be done automatically by the store, so it is not
strictly necessary to set up an automated task to do this.
delete - clears a specific item (which should be named in the task ID field)
Programming with the Persistent Data Store
The persistent data store is accessed through
the tied hash %store. To save a piece of data persistently, simply place it into this hash under a unique key. For example:
$store{foo} = $bar;
$bar in this case can be a scalar, or a complex structure.
On a subsequent page view, you can quickly retrieve this data by looking it up under the key you specified:
my $bar = $store{foo};
By
default, data placed into the store will persist for a limited time.
(The default is one hour.) After that, garbage collectors will
dispose of your data to free up space for other items. That means if
you are using the store, you have to be prepared to reload the data
from the original source if you do not find it. (Then you can place it
back into the store again, if you like.) Used in this way, the store
acts as a cache for the original data.
Because the persistent data store is an optional feature, if it is disabled, then %store
behaves like a normal hash variable. Anything you place into it will
persist only for the remainder of the current request. There is no
harm in using this hash variable, and as long as you don't require data
to be present in the store, most code need not know whether the store
is enabled or not to take advantage of its benefits. Code should
always check the store for a "fast" copy of some data, and then fall
back on the original source of the data for a "slow" copy if the former
is not found. For example:
# this code works whether the store is enabled or not
my $bar = $store{foo};
if (! $bar) {
    # not found in store; fetch from database instead
    $bar = $db->get_query("ReallySlowQuery");
    $store{foo} = $bar;
}
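If the same check-then-fall-back pattern appears in many places, it can be wrapped in a small helper. The following is a sketch only: fetch_cached is a hypothetical name, the loader is a stand-in for a slow query, and a plain hash is declared in place of ExSite's %store so that the example runs on its own.

```perl
use strict;
use warnings;

my %store;   # plain hash standing in for ExSite's %store (assumption)

# Look up $key in the store; on a miss, run $loader and remember its result.
sub fetch_cached {
    my ($key, $loader) = @_;
    return $store{$key} if defined $store{$key};
    my $value = $loader->();
    $store{$key} = $value;
    return $value;
}

my $slow_calls = 0;
my $loader = sub { $slow_calls++; return [ "row1", "row2" ] };   # pretend slow query

my $first  = fetch_cached("foo", $loader);   # miss: runs the loader
my $second = fetch_cached("foo", $loader);   # hit: served from %store

print "loader ran $slow_calls time(s)\n";
```

Testing with defined rather than for truth means that legitimately false values, such as 0 or an empty string, are still treated as cache hits.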
If there is a case where you do need to know whether the persistent store is enabled, you can use this:
(tied %store)->is_persistent(); # returns TRUE if store is persistent
Managing expiry times in the store
If you want to control
when your item expires from the store, you have two options at your
disposal. First, when you save the item in the store, you can
explicitly specify the expiry time (as a Unix timestamp, e.g. computed from the
time() function):
(tied %store)->put($key,$value,$expiry);
Alternatively, if you want to change the expiry time of an item already in the store, you can modify the expiry time alone:
(tied %store)->renew($key,$expiry);
In
either of these cases, if you specify an expiry time of zero, that will
be understood to mean that the item should never be expired.
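The expiry rules above can be tried out with a small in-memory model. TinyStore below is a hypothetical stand-in written for this example, not ExSite::Store; only the put and renew argument order and the meaning of a zero expiry are taken from the text, and the get method with an optional "now" argument is an assumption added to make the behaviour testable.

```perl
use strict;
use warnings;

package TinyStore;   # hypothetical stand-in, not ExSite::Store

sub new { return bless { data => {}, expiry => {} }, shift }

# Save a value with an absolute expiry time (Unix timestamp); 0 = never expire.
sub put {
    my ($self, $key, $value, $expiry) = @_;
    $self->{data}{$key}   = $value;
    $self->{expiry}{$key} = $expiry;
}

# Change only the expiry time of an existing item.
sub renew {
    my ($self, $key, $expiry) = @_;
    $self->{expiry}{$key} = $expiry if exists $self->{data}{$key};
}

# Fetch a value, treating items past their expiry time as missing.
sub get {
    my ($self, $key, $now) = @_;
    $now = time() unless defined $now;
    return unless exists $self->{data}{$key};
    my $expiry = $self->{expiry}{$key};
    return if $expiry && $expiry <= $now;
    return $self->{data}{$key};
}

package main;

my $st  = TinyStore->new;
my $now = time();

$st->put("report", "weekly totals", $now + 600);   # expires in 10 minutes
$st->put("motd",   "welcome",       0);            # never expires

print defined $st->get("report", $now + 300) ? "report: hit\n" : "report: miss\n";
print defined $st->get("report", $now + 900) ? "report: hit\n" : "report: miss\n";

$st->renew("report", $now + 86400);                # extend to one day from now
print defined $st->get("report", $now + 900) ? "report: hit\n" : "report: miss\n";
print defined $st->get("motd", $now + 10**9) ? "motd: hit\n"   : "motd: miss\n";
```

Note that the expiry is an absolute time, so a lifetime of ten minutes is written as time() plus 600 rather than as 600.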
Store Implementation
The store is implemented at a low level
using a DBM database, and GDBM_File in particular. Data is
encoded in the store using the Storable package for linearizing Perl
data structures. See ExSite::Store.pm for the ExSite
implementation built on top of these technologies, which manages expiry
times, locking, garbage collection, etc.
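The role of Storable can be seen in a few lines. This sketch does not reproduce ExSite::Store; it uses a plain hash where the real implementation would have a tied DBM file, and the choice of freeze and thaw (rather than other Storable functions) is an assumption.

```perl
use strict;
use warnings;
use Storable qw(freeze thaw);

my %dbm;   # plain hash standing in for a tied DBM file (assumption)

my $record = { name => "Pat", roles => [ "admin", "editor" ] };

# DBM files hold flat strings, so nested data is serialized before saving...
$dbm{member} = freeze($record);

# ...and deserialized on the way out.
my $copy = thaw($dbm{member});

print "$copy->{name}: @{ $copy->{roles} }\n";
```

The thawed structure is an independent copy, which is why changes to data fetched from a store of this kind must be written back explicitly to be saved.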