A class for parsing and composing URIs (web addresses).
Note that a URI is composed of the following components:
scheme://authority/path?query#fragment
The scheme defaults to "http".
The authority typically consists of a hostname, domain, and TLD, separated by ".".
The path may consist of multiple names separated by "/". It typically consists of several path segments: the first is the script_name, and the remainder is the path_info, which may in turn consist of several sub-segments concatenated together. For example, in /cgi-bin/script.cgi/A/B/1/2 the script_name is /cgi-bin/script.cgi and the path_info is /A/B/1/2.
The query typically consists of multiple key=value pairs, separated by a separator character ("&" by default).
my $uri = ExSite::URI->new(%option);
%option can include:
separator => parameter separator character (e.g. ";" or "&")
plaintext => if true, output plaintext URLs; otherwise, output HTML URLs
uri => a URI string to initialize the object with
secure_query => encrypt the query data to make it tamper-resistant
By default, the object will be initialized with the current URI, will use '&' as the parameter separator, and will output HTML URIs.
The only difference between HTML and plaintext URIs is whether or not HTML metacharacters such as '&' are escaped. (In plaintext mode they are left unescaped.)
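To make the distinction concrete, here is a minimal sketch in plain Perl. It is not ExSite's code: `compose_query` and `html_escape` are hypothetical helpers, and the values are assumed to be URL-encoded already. The only thing that changes between the two modes is whether the composed string is HTML-escaped.

```perl
use strict;
use warnings;

# Hypothetical helper: compose a query string from ordered key/value pairs.
# In HTML mode the result is HTML-escaped so it can be placed in an href
# attribute; in plaintext mode it is returned unchanged.
sub compose_query {
    my ($pairs, $sep, $plaintext) = @_;
    my $query = join $sep, map { "$_->[0]=$_->[1]" } @$pairs;
    return $plaintext ? $query : html_escape($query);
}

# Escape the characters that are significant in HTML attribute values.
sub html_escape {
    my ($s) = @_;
    $s =~ s/&/&amp;/g;   # must run first, so later entities are not double-escaped
    $s =~ s/</&lt;/g;
    $s =~ s/>/&gt;/g;
    $s =~ s/"/&quot;/g;
    return $s;
}

my @pairs = ([ page => 12 ], [ section => 'news' ]);
print compose_query(\@pairs, '&', 1), "\n";   # page=12&section=news
print compose_query(\@pairs, '&', 0), "\n";   # page=12&amp;section=news
```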
You can change the separator character at any time:
$uri->separator(';');
The current separator character is used for both parsing URIs and composing new URIs, so if your input and output use different separator characters, you will need to switch separators between parsing and composing.
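The following sketch shows the idea of reading with one separator and writing with another. It uses two hypothetical helpers, `parse_query` and `write_query`, rather than the ExSite API; with the real object you would call `$uri->separator(...)` between parsing and writing instead.

```perl
use strict;
use warnings;

# Hypothetical helper: split a raw query string on the given separator
# into an ordered list of [key, value] pairs (empty fields are skipped).
sub parse_query {
    my ($query, $sep) = @_;
    return map { [ split /=/, $_, 2 ] } grep { length } split /\Q$sep\E/, $query;
}

# Hypothetical helper: join [key, value] pairs back together with a separator.
# A key with no value is written on its own.
sub write_query {
    my ($sep, @pairs) = @_;
    return join $sep, map { defined $_->[1] ? "$_->[0]=$_->[1]" : $_->[0] } @pairs;
}

# Input uses ';', output uses '&': parse with one separator, write with the other.
my @pairs = parse_query('id=7;mode=edit', ';');
print write_query('&', @pairs), "\n";   # id=7&mode=edit
```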
You can change the text mode with the following calls:
$uri->plaintext;   # output plaintext URIs
$uri->html;        # output HTML URIs
At any time, you can extract a structure with all of the parsed URI data using:
%parsed_uri = $uri->info;
You can also fetch individual URI components using:
$data = $uri->get($component);
where component is one of the keys in the hash returned by info(), namely "scheme", "authority", "path", "path_info", "script_name", "query", "query_data", or "fragment". Note that "query" is the raw query string, and "query_data" is a hash of parsed keys/values. Also, "path" is the concatenation of "script_name" and "path_info".
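For a rough picture of how a URI string maps onto most of these keys, here is a sketch using the generic regular expression from RFC 3986, Appendix B. It is an illustration, not ExSite's parser: the function name `parse_uri` is hypothetical, the query is assumed to use "&" as its separator, and it makes no attempt to split the path into script_name and path_info.

```perl
use strict;
use warnings;

# Split a URI into scheme, authority, path, query, and fragment using the
# RFC 3986 Appendix B regular expression, then parse key=value query pairs.
sub parse_uri {
    my ($uri) = @_;
    my ($scheme, $authority, $path, $query, $fragment) =
        $uri =~ m{^(?:([^:/?#]+):)?(?://([^/?#]*))?([^?#]*)(?:\?([^#]*))?(?:#(.*))?$}s;
    my %query_data;
    if (defined $query) {
        for my $pair (split /&/, $query) {
            my ($k, $v) = split /=/, $pair, 2;
            $query_data{$k} = $v if defined $k && length $k;
        }
    }
    return (
        scheme     => $scheme,
        authority  => $authority,
        path       => $path,
        query      => $query,
        query_data => \%query_data,
        fragment   => $fragment,
    );
}

my %info = parse_uri('http://www.example.com/cgi-bin/page.cgi/about?id=3&lang=en#top');
print "$_: ", ($info{$_} // ''), "\n" for qw(scheme authority path query fragment);
print "id = $info{query_data}{id}\n";
```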
This class can manage URIs from any source, in principle. Its defaults are optimized for handling ExSite URIs. ExSite URIs use a conventional format which assumes the following additional rules:
The path component of the URI consists of a script_name and extra path_info concatenated together. For example:
/cgi-bin/script.cgi/extra/path/data
The query string consists of key=value pairs, joined by a separator character ("&" by default).
These are common URI conventions, so this class should be fairly versatile, even with non-ExSite URIs. You might encounter minor issues with non-ExSite URIs that do not follow the same conventions. For example, not all query strings are sequences of key/value pairs, so the class may not be able to extract intelligible parameters from unconventional query strings. Also, it may not be possible for the class to tell which part of a path corresponds to the script_name and which to the path_info, or even whether that is a sensible way to divide the path. In that case, you may get no script_name or path_info parsed out of the URI, and it will all be aggregated into a single path. Attempting to set query parameters or path segments may not give the expected results in these cases.
If you do not pass an explicit URI, the object will initialize itself with the URI of the current request, as read from the Apache environment.
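As a rough illustration of where an implicit URI comes from, the sketch below rebuilds a request URI from the standard CGI environment variables that Apache sets (HTTPS, HTTP_HOST, SCRIPT_NAME, PATH_INFO, QUERY_STRING). The function `current_uri` is hypothetical, and ExSite's own logic may differ (for instance, it also consults the Input manager, as described below).

```perl
use strict;
use warnings;

# Hypothetical sketch: rebuild the current request URI from CGI variables.
# Pass %ENV in a real CGI program; a hash is taken here so it can be tested.
sub current_uri {
    my (%env) = @_;
    my $https  = $env{HTTPS} // '';
    my $scheme = ($https eq 'on' || $https eq '1') ? 'https' : 'http';
    my $uri = "$scheme://" . ($env{HTTP_HOST} // 'localhost')   # fallback host is an assumption
            . ($env{SCRIPT_NAME} // '')
            . ($env{PATH_INFO}   // '');
    $uri .= "?$env{QUERY_STRING}"
        if defined $env{QUERY_STRING} && length $env{QUERY_STRING};
    return $uri;
}

# Fake request for demonstration purposes.
print current_uri(
    HTTP_HOST    => 'www.example.com',
    SCRIPT_NAME  => '/cgi/page.cgi',
    PATH_INFO    => '/about/staff',
    QUERY_STRING => 'lang=en',
), "\n";   # http://www.example.com/cgi/page.cgi/about/staff?lang=en
```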
You can re-initialize the object with a different URI at any time:
$uri->setup($new_uri);
After modifying the URI (see below), it is often the case that you want to reset it back to its initial state. You can do this:
$uri->reset();
If the URI was explicitly passed to the object, this will restore the original state completely. If the URI was implicitly determined from the local environment, however, it may be different, depending on how local definitions have changed in the meantime. If the path or query data have been altered in ExSite's input buffers, then the URI will reflect those changes.
Sometimes you want this behaviour for explicit URIs. For example, the object may be forced to an explicit URI that is meant to reflect a local URI that would normally be implicit. (This happens when publishing, for instance, where we spoof the URI and environment for each page that we generate.) To get the implicit reset behaviour on an explicit URI, do this:
$uri->use_input();
This tells the object to use any updated input data when constructing the implicit URI.
The query is the part of the URL after a question mark. It is typically broken into key=value pairs by a separator character, which is ``&'' by default.
To change a parameter in the URI:
$uri->parameter($key, $value);
To remove a parameter completely:
$uri->parameter($key,undef); # OR $uri->parameter($key);
To change multiple parameters:
$uri->query(%parameters);
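The semantics described above (replace a key, remove it with an undefined value, set several at once) can be modelled on an ordered list of pairs. This is a hypothetical sketch with made-up helper names `set_parameter` and `set_query`; it mirrors the documented behaviour but is not how ExSite stores its query data.

```perl
use strict;
use warnings;

# Set one parameter: replace it in place if present, append it if new,
# and remove it entirely if the value is undefined.
sub set_parameter {
    my ($pairs, $key, $value) = @_;
    if (!defined $value) {
        @$pairs = grep { $_->[0] ne $key } @$pairs;
        return;
    }
    for my $pair (@$pairs) {
        if ($pair->[0] eq $key) { $pair->[1] = $value; return; }
    }
    push @$pairs, [ $key, $value ];
}

# Set several parameters at once; keys are sorted so output order is deterministic.
sub set_query {
    my ($pairs, %params) = @_;
    set_parameter($pairs, $_, $params{$_}) for sort keys %params;
}

my @q = ([ page => 1 ], [ sort => 'date' ]);
set_parameter(\@q, page => 2);          # replace
set_parameter(\@q, 'sort');             # remove (value is undef)
set_query(\@q, lang => 'fr', id => 9);  # add two more
print join('&', map { "$_->[0]=$_->[1]" } @q), "\n";   # page=2&id=9&lang=fr
```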
The query string is written as key1=val1&key2=val2..., although the parameter separator character "&" can be changed as noted above.
If you make the URI object secure:
$uri->secure();
then your query strings will be encrypted, making them tamper-resistant. This is not recommended for normal usage, as it is quite convenient to be able to inspect and alter query strings. However, you may wish to make exceptions in cases where sensitive data may be exposed in the query string, or where editable query strings raise security issues.
To go back to normal query strings, use:
$uri->insecure();
(This is a misnomer, since there is nothing really insecure about a normal query string.)
The URI path includes the slash-separated values after the domain name and before the "?". This is typically broken down into two parts, script_name and path_info:
/path = /script_name/path_info
The script_name typically maps to the disk path of a CGI program, while the path_info is treated as path-like data that is then passed on to this program. For example:
/script_name + /path_info = /cgi/page.cgi + /store/catalog.html/widgets/blue_grommet
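One way to picture the split is a simple heuristic: everything up to and including the first segment that names a CGI program is the script_name, and the rest is the path_info. The sketch below assumes that CGI programs end in `.cgi` or `.pl`; both that rule and the name `split_path` are hypothetical, not ExSite's method. When no such segment exists, it returns empty parts, which corresponds to the case described earlier where the whole path stays aggregated in path.

```perl
use strict;
use warnings;

# Hypothetical heuristic: split a path after the first *.cgi or *.pl segment.
# Returns (script_name, path_info); both are empty if no program segment is found.
sub split_path {
    my ($path) = @_;
    if ($path =~ m{^(.*?/[^/]+\.(?:cgi|pl))(/.*)?$}) {
        return ($1, $2 // '');
    }
    return ('', '');
}

my ($script_name, $path_info) =
    split_path('/cgi/page.cgi/store/catalog.html/widgets/blue_grommet');
print "script_name = $script_name\n";   # /cgi/page.cgi
print "path_info   = $path_info\n";     # /store/catalog.html/widgets/blue_grommet
```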
In principle the path_info can be further broken down into segments that refer to different types of resources, which are concatenated together, e.g.:
/path_info(CMS segment) + /path_info(Catalog segment) = /store/catalog.html + /widgets/blue_grommet
The breakdown of different path_info segments is done by the Input manager (ExSite::Input), if this is an implicitly defined URI. Once the segments are defined, you can redefine specific segments in isolation in the URI object. For example, if the path_info is divided into the CMS and Catalog segments, as in the above example, then we can redefine either segment alone as follows:
$uri->path("CMS","/store/catalog.html");             # scalar method
$uri->path("Catalog","widgets","red_grommet");       # array method
These new path segments will replace the original path segments, without altering the remaining segments of the path.
If you define a new path segment unknown to the Input manager, then the new path segment will be appended to those that are already defined. For example,
$uri->path("extra","foo");
would result in "/foo" being appended to the existing path, resulting in a new path_info of:
/store/catalog.html/widgets/blue_grommet/foo
To delete a path segment, just pass nothing as the segment data:
$uri->path("Catalog",undef);
$uri->path("Catalog");         # equivalent
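The replace, append, and delete rules for named path segments can be modelled with an ordered list of `[name, path]` pairs. This is a hypothetical sketch (`set_segment` and `path_info` are made-up helpers, and ExSite's Input manager keeps its own data structures), but it reproduces the examples in this section.

```perl
use strict;
use warnings;

# Set a named path_info segment. No data (or a single undef) deletes it;
# a known name is replaced in place; an unknown name is appended at the end.
sub set_segment {
    my ($segments, $name, @parts) = @_;
    if (!@parts || (@parts == 1 && !defined $parts[0])) {
        @$segments = grep { $_->[0] ne $name } @$segments;
        return;
    }
    # Scalar form passes one path string; array form passes its parts.
    my $path = @parts == 1 ? $parts[0] : '/' . join('/', @parts);
    $path = "/$path" unless $path =~ m{^/};
    for my $seg (@$segments) {
        if ($seg->[0] eq $name) { $seg->[1] = $path; return; }
    }
    push @$segments, [ $name, $path ];
}

# The full path_info is the segments concatenated in order.
sub path_info { join '', map { $_->[1] } @{ $_[0] } }

my @seg = ([ CMS => '/store/catalog.html' ], [ Catalog => '/widgets/blue_grommet' ]);
set_segment(\@seg, 'Catalog', 'widgets', 'red_grommet');
set_segment(\@seg, 'extra', 'foo');
print path_info(\@seg), "\n";   # /store/catalog.html/widgets/red_grommet/foo
set_segment(\@seg, 'Catalog');
print path_info(\@seg), "\n";   # /store/catalog.html/foo
```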
To completely override the path segments defined by the Input manager, and explicitly define your path, use these:
$uri->script_name($path);
$uri->path_info($path);
A service page is a special page in the ExSite CMS that services requests for a particular plug-in. If a page generates a URL that will be processed by that plug-in, it should automatically adjust the target URL so that it redirects to the service page. This is done in the URI class by the service_page() method.
To change the current URI so that it directs to the service page instead of whatever page it happens to be on, use this:
$uri->service_page($module);
where $module is the plug-in (either a module object, or simply the name of the plug-in).
Not all plug-ins are configured to use service pages, but there is no harm in calling this method in those cases; it will leave the current URI unchanged.
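The sketch below illustrates the "redirect if configured, otherwise leave unchanged" behaviour. Everything in it is hypothetical: the `%service_page` lookup table stands in for the CMS configuration, only plug-in names (not module objects) are accepted, and keeping the original query string is an assumption made for illustration.

```perl
use strict;
use warnings;

# Hypothetical configuration: plug-ins that have a service page, and its path.
my %service_page = (
    Calendar => '/cgi/page.cgi/events.html',
);

# Point a URI at the plug-in's service page, keeping its query string
# (assumed to carry the plug-in's parameters). Unconfigured plug-ins
# leave the URI unchanged.
sub to_service_page {
    my ($uri, $module) = @_;
    my $target = $service_page{$module} or return $uri;
    my ($query) = $uri =~ /(\?.*)\z/s;
    return $target . ($query // '');
}

print to_service_page('/cgi/page.cgi/home.html?cal=2024-05', 'Calendar'), "\n";
# /cgi/page.cgi/events.html?cal=2024-05
print to_service_page('/cgi/page.cgi/home.html?x=1', 'Blog'), "\n";
# /cgi/page.cgi/home.html?x=1
```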
Some URIs direct to pages/screens that require a certain level of user access to view. Simply using the URI is not sufficient to view the contents; you also need to be logged in as a user with sufficient access. If you do not have this level of access, you are likely to get a permission denied error message, or be prompted for a login and password.
There is a feature by which you can include authentication credentials in a URI so that the user will not receive an error or login prompt. This trick uses encrypted ``authtokens'' embedded into the parameter string.
There are two things to consider when using authtokens:
To generate an encrypted authtoken string:
my $authtoken = $uri->authtoken($login_id, $expiry_in_days);
To modify the current URI to include an authtoken granting that URI special access:
$uri->authorize($login_id, $expiry_in_days);
You must then output the URI (see below) to actually use it. You cannot really modify the URI any further at this point, because the authtoken would no longer match the updated URI, and it would fail to validate. It may be necessary to reset the URI or remove the _auth parameter to get back to a working URI. To generate a URL with an embedded authtoken, but leave the URI object in a normal working state so that it can be further modified, use:
my $auth_url = $uri->authorize_url($login_id, $expiry_in_days);
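To see why a token stops validating once the URI changes, consider a token that is bound to the exact URI it was issued for. Note that this sketch swaps in a different technique: ExSite encrypts its authtokens, whereas this example signs them with an HMAC from the core Digest::SHA module. The secret key, the token layout, and the helper names are all hypothetical.

```perl
use strict;
use warnings;
use Digest::SHA qw(hmac_sha256_hex);

# Hypothetical server-side secret; HMAC signing stands in for ExSite's encryption.
my $SECRET = 'change-me';

# Bind login ID, expiry time, and the exact URI into one signed token.
# Assumes login IDs contain no ':' characters.
sub make_authtoken {
    my ($uri, $login_id, $expiry_days, $now) = @_;
    my $expires = $now + $expiry_days * 86400;
    my $mac = hmac_sha256_hex("$login_id:$expires:$uri", $SECRET);
    return "$login_id:$expires:$mac";
}

# Valid only for the same URI and only until expiry.
# (A production check would use a constant-time comparison.)
sub check_authtoken {
    my ($uri, $token, $now) = @_;
    my ($login_id, $expires, $mac) = split /:/, $token, 3;
    return 0 unless defined $mac && $now <= $expires;
    return hmac_sha256_hex("$login_id:$expires:$uri", $SECRET) eq $mac ? 1 : 0;
}

my $now   = time;
my $uri   = '/cgi/page.cgi/reports.html?id=42';
my $token = make_authtoken($uri, 17, 7, $now);
print check_authtoken($uri, $token, $now), "\n";                                   # 1
print check_authtoken('/cgi/page.cgi/reports.html?id=43', $token, $now), "\n";   # 0
```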
After a URI has been modified using the above methods, you can obtain the changed URI using the write methods.
$newuri = $uri->write($type);
$type can be "relative" or "full" (full is the default):
$newuri = $uri->write_relative();
This returns the part of the URI after the authority (the path, query, and fragment). It assumes the same scheme and authority as the page on which the link appears.
$newuri = $uri->write_full();
This returns the full URI including the scheme and authority.
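The difference between the two output forms is just whether the scheme and authority are prepended. Here is a minimal sketch of composing a URI from its components; `write_uri` is a hypothetical helper, not ExSite's write() method, though it follows the documented defaults ("full" output, "http" scheme).

```perl
use strict;
use warnings;

# Compose a URI from parsed components. "relative" output starts at the
# path; "full" output (the default) adds the scheme and authority.
sub write_uri {
    my ($type, %c) = @_;
    my $out = $c{path} // '';
    $out .= "?$c{query}"    if defined $c{query}    && length $c{query};
    $out .= "#$c{fragment}" if defined $c{fragment} && length $c{fragment};
    return $out if ($type // 'full') eq 'relative';
    return ($c{scheme} // 'http') . '://' . ($c{authority} // '') . $out;
}

my %c = (
    scheme    => 'http',
    authority => 'www.example.com',
    path      => '/cgi/page.cgi/news.html',
    query     => 'page=2',
);
print write_uri('relative', %c), "\n";   # /cgi/page.cgi/news.html?page=2
print write_uri('full', %c), "\n";       # http://www.example.com/cgi/page.cgi/news.html?page=2
```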
Modifications to the URI are cumulative, so you can make changes, output the new URI, make more changes, output again, etc. If you want to reset the URI to its original state so that changes are not cumulative, use the reset method:
$uri->reset();
This also syncs with the Input manager to retrieve any new path segments that were defined since the URI object was instantiated.