Standard Perl debugging tools work with ExSite. In particular, we recommend using the built-in Perl debugger and the Perl profiling tools to help you track down difficult bugs. ExSite includes a utility program to simplify the use of these tools and integrate them more tightly with ExSite.
The developer should be familiar with the Perl debugging tools. The following Perl system docs are helpful:
ExSite comes with a utility program called cgi that emulates the ExSite (and Apache) execution environment. This ensures that when you are debugging, your programs are executing under the same conditions that they do on the web.
Usage:
cgi [-d] [-p] [-c COOKIE] [-e ENV_VAR] URL
This "executes" the URL (ie. runs the CGI program that the URL represents). The command should be executed from the CGI_BIN directory. If the cgi program is not in your execution path, you will have to prepend the path to it (eg. "./" if it is also in CGI_BIN).
Switches:
The URL should normally be a complete URL, eg.

'http://www.mydomain.com/cgi-bin/page.cgi?_id=1&other_data'

You can also provide a partial URL of the form

'page.cgi?_id=1&other_data'

but in this case, cgi may not configure the execution environment completely. Note: it is a good idea to enclose the URL in single quotes to protect it from the shell.
When you run cgi, you are first asked for a login and password. Enter a website login and password if the URL is meant to be executed with a certain level of privilege. If the URL is meant to be executed as a member of the public, simply hit ENTER in place of a login ID. Note: entering a login and password sets an identity cookie that is used to authenticate you. This cookie is printed out by cgi so that you can see what is happening.
If the URL processes form input, then you need to pass the input data to the CGI program. For GET method forms, the input data is specified right in the URL (in the QUERY_STRING part, after the question mark), and no special work is needed.
For POST method forms, the form data is read in separately. To get this data into your program, define the environment variable CGI_INPUT to contain the POST data, for example:
export CGI_INPUT='login=admin&password=mypwd'
cgi will echo this data and ask you to copy and paste it into the program when the program pauses to read it.
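Note that the values in CGI_INPUT follow standard URL encoding (this is ordinary CGI behaviour, not anything ExSite-specific), so special characters inside a field value must be escaped. A minimal illustration, with invented field names:

```shell
# Spaces, ampersands, etc. inside a value must be URL-encoded
# (eg. a space becomes %20), or the data will be misparsed.
export CGI_INPUT='login=admin&comment=hello%20world&password=mypwd'
echo "$CGI_INPUT"
```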
If the form is large or complicated, you may want to capture the POST data first, so that you can clone it exactly rather than typing it in by hand.
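One simple way to capture the raw POST data (a sketch of our own; "page.real" and the file paths are placeholders) is to put a small wrapper in front of the real CGI program that saves a copy of everything it reads on standard input:

```shell
# Self-contained demo in a scratch directory; in real use, "page.real"
# would be your actual CGI program, temporarily renamed.
mkdir -p /tmp/cgidemo && cd /tmp/cgidemo
printf '#!/bin/sh\ncat > /dev/null\necho "page handled request"\n' > page.real
chmod +x page.real
# The wrapper tees the raw POST data to a file, then hands it to the program.
cat > page.cgi <<'EOF'
#!/bin/sh
tee /tmp/cgidemo/post-data.txt | ./page.real
EOF
chmod +x page.cgi
# Simulate a POST; in real use, submit the form through the browser instead.
echo 'login=admin&password=mypwd' | ./page.cgi
cat /tmp/cgidemo/post-data.txt    # the captured POST data
```

Once captured, the contents of the file can be pasted into CGI_INPUT exactly as submitted; remember to remove the wrapper and restore the original program when you are done.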
As an alternative to CGI_INPUT, you can write the raw POST data to a file and set the environment variable CGI_INPUT_FILE to point to this file. This method is especially useful for extremely large inputs, where the shell may prevent you from setting CGI_INPUT correctly.
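For example (the file path and field names are arbitrary):

```shell
# Write the raw POST data to a file and point CGI_INPUT_FILE at it.
# (%% in a printf format string produces a literal % character.)
printf 'field1=abc&field2=a%%20long%%20value' > /tmp/postdata.txt
export CGI_INPUT_FILE=/tmp/postdata.txt
cat "$CGI_INPUT_FILE"
```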
Problems with publishing are a slightly special case, because the publish program is not Perl but a compiled C wrapper. In these cases, change the URL in the cgi command to refer to publish.pl instead of publish.cgi.
Fatal errors are those that fail to generate any page at all, and result in an error message from Apache, such as "Internal Server Error". These are typically caused by failures to compile Perl, due to syntax or semantic errors in the source code. It is not necessary to go into debug mode at all to identify the cause of these problems. Just re-execute the URL from the command line, eg.
cgi 'http://www.mydomain.com/cgi-bin/page.cgi?_id=1&other_data'
This should spit out the compiler errors, allowing you to identify the problem directly. If you don't have command-line access, the following addition to the top section of a CGI program will also send the errors to the web browser:
use CGI::Carp qw(fatalsToBrowser);
(For errors caused by plug-in modules, see below.)
Execution errors that produce the wrong result usually require stepping through the code in the debugger to follow the execution path and see why the incorrect decisions are being taken. To run a URL in the Perl debugger, use the following method:
cgi -d 'http://www.mydomain.com/cgi-bin/page.cgi?_id=1&other_data'
Remember, you may need to login or predefine your form inputs in order to get the precise conditions that cause the problem.
You will be deposited in the Perl debugger, which allows you to step through the program, inspect variables, set breakpoints, and so on. Consult the perldebug man page for details.
Errors in plug-in modules typically manifest themselves in one of the following ways:
Note that this is not necessarily a bug - only plug-ins that have a defined control panel will appear on the webtop.
Otherwise, this is typically caused by a fatal error in the plug-in code (ie. the Perl fails to compile). To see the compilation error messages, you can compile the plug-in straight from the command line:
perl Modules/MyPlugin.pm
This is typically caused by a fatal error in the plug-in code (ie. the Perl fails to compile). See above for how to diagnose.
In these cases, the fatal error message should be clearly displayed in the browser.
You need to step through the plug-in code using the debugger, as described above for CGI programs. However, you will not be able to set a breakpoint in the plug-in before the plug-in has been loaded. The workaround is simple: load the plug-in manually from the debugger command line. This example preloads a plug-in, sets a breakpoint at the plug-in entry point, and then resumes execution:
DB<1> require Modules::MyPlugin
DB<2> b Modules::MyPlugin::write
DB<3> c
Same method as previous. If you want to break at the entry point of the plug-in, you will need to inspect the code to see what the control panel subroutine name is, since control panels are optional and there is no standard entry-point name.
DB<1> require Modules::MyPlugin
DB<2> b Modules::MyPlugin::ctrl_panel
DB<3> c
This section describes a few methods for understanding where a program is spending its time.
Basic benchmarking can be done two ways:
This uses the Unix time command to measure the CPU usage of the executed program. In this case, it will include the CPU overhead of running the cgi program, as well as the general overhead of loading Perl, and compiling your code. In addition to the CPU time, it also reports the real time, which will help to assess the cost of external factors such as database queries (which are executed by a different program). This will give you a general measure of the system resources that are needed to satisfy a request to ExSite in a basic CGI setup.
This uses built-in benchmarking code inside ExSite to measure code execution time only. The time does not include system, Perl compilation, or database overhead, so it is a good measure of the efficiency of your code in isolation from other factors. Note that this benchmarking method only applies to running page.cgi, not any other CGI programs.
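The difference between CPU time and real time in the time-command method can be illustrated with a self-contained demo (sleep stands in for an external wait, such as a slow database query; the bash time keyword is assumed):

```shell
# sleep consumes real (wall-clock) time but almost no CPU time,
# much like a request that blocks waiting on the database.
{ time sleep 1 ; } 2> /tmp/time-demo.txt
cat /tmp/time-demo.txt   # real is about 1s; user and sys are near zero
```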
If benchmarking a plug-in, you may want to benchmark a baseline page (with no plug-in) to compare to the benchmarks of the page with the plug-in. That allows you to factor out the general ExSite system overhead. If your plug-in code executes quickly relative to the general execution time overhead, there will be little to be gained by over-optimizing it. But if the plug-in page shows excessive time consumption relative to the baseline page, you will want to profile your code to diagnose which code blocks are responsible for the slowness.
cgi -p 'http://www.mydomain.com/cgi-bin/page.cgi?_id=1&other_data'
Follow the same procedure as if you were debugging, except that the code will execute without pausing in the debugger to accept commands. You may wish to check the output to ensure that the code did in fact produce the result you intend to analyze. The profiling run will leave a data file, tmon.out, in your working directory containing the profiling data.
Most profile reports return the top 15 time consumers. If you wish to see more, add the switch -O N, where N is the number of time consumers you want to see (eg. -O 25).
dprofpp tmon.out
This displays the CPU time spent in the most resource-intensive routines, exclusive of the time spent in calls from those routines. This can highlight particular routines that have costly algorithms.
dprofpp -I tmon.out
This displays the CPU time spent in the most resource-intensive routines, inclusive of the time spent in calls from those routines. This can highlight the most costly code paths, if there is a particular sequence of nested routines that collectively eat a lot of time.
dprofpp -r tmon.out
This displays the real time spent in the most resource-intensive routines. This can highlight cases where the code is pausing for lengthy periods, waiting for data from an external source (eg. disk files or client-server calls, like SQL databases).
dprofpp -t tmon.out
This displays the complete execution path (every routine entered by the running program). This can give you a fast overview of the decisions your code is taking, and how code blocks are executed repeatedly, without having to step through the code manually.
Here are some typical culprits for time consumption. If these are the only items showing up in your profiles, your system may be working normally:
If profiling indicates that SQL queries are responsible for your performance problems, the first step to take is to get a snapshot of the SQL queries that are being dispatched.
log.db = 3
This enables full diagnostic logging on your database objects. Ensure that your db log file (usually db.log) is world-writeable, ie. mode 666. You may want to remove old log entries from this file to make it easier to inspect.
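To create the log file if it is missing and open up its permissions (db.log is the usual name; check your configuration for the actual path):

```shell
# Make the database log world-writeable so the web server can append to it.
touch db.log
chmod 666 db.log
ls -l db.log    # should show -rw-rw-rw-
```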
The complete list of queries may give you clues as to where inefficiencies lie, for example:
Note that if you shortcut around the ExSite database objects to execute your queries (for instance by using the DBI objects directly), then your queries will not be logged using this method.
You can execute the SQL queries directly in MySQL to measure their individual performance. MySQL will print the execution time of the query, after showing the query results. Note that due to caching of results, subsequent runs of the same query will tend to be much faster than the initial query. Pay particular attention to the first result to get a sense of the uncached performance.
Using Persistent Perl (aka SpeedyCGI) is one way to get a big performance boost out of your Perl CGI programs. One problem is that the ExSite debugging utility, cgi, does not work with Persistent Perl. That is because the ExSite code is technically not being interpreted directly by Perl, so attempts to invoke the debugger are ignored.
The simple workaround is to switch the CGI program back to basic Perl before starting your debugging work. This means changing the shebang line back to "#!/usr/bin/perl" or equivalent. Once your debugging is complete, you can revert the shebang line to Persistent Perl.
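A sketch of the switch, using a throwaway demo file (the speedy and perl paths are typical defaults; your install may differ):

```shell
# Make a demo CGI file whose shebang points at SpeedyCGI...
printf '#!/usr/bin/speedy\nprint "Content-type: text/html\\n\\n";\n' > /tmp/page.cgi
# ...then flip the first line back to plain perl for the debug session.
sed -i '1s|^#!.*|#!/usr/bin/perl|' /tmp/page.cgi
head -1 /tmp/page.cgi    # -> #!/usr/bin/perl
```

Reversing the same sed substitution (or re-editing the file by hand) restores Persistent Perl afterwards.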