The Webalizer is a fast web server log file analysis program. It produces highly detailed, easily configurable usage reports in HTML format, for viewing with a standard web browser.
Reports
By default, The Webalizer produces two kinds of reports: a yearly
summary report and a detailed monthly report, one for each analyzed
month.
The yearly summary report provides information such as the number of
hits, file and page requests, hosts and visits, as well as daily
averages of these counters for each month. The report is accompanied by
a yearly summary graph.
Each monthly report is generated as a single HTML page containing:
- a monthly summary report (the overall number of hits, file and page
requests, visits, hosts, etc.)
- a daily report (these counters grouped by day of the month)
- an aggregated hourly report (counters for the same hour of each day
grouped together)
- a URL report (collected information grouped by URL)
- a host report (grouped by IP address)
- website entry and exit URL reports (the most common first and last
URLs of visits)
- a referrer report (the third-party URLs that referred visitors to the
analyzed website)
- a search string report (the search terms visitors used in search
engines such as Google)
- a user agent report (grouped by browser type)
- a country report (grouped by the host's country of origin)
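The daily report and the aggregated hourly report differ only in their grouping key: the daily report keeps one bucket per day of the month, while the hourly report merges the same hour from every day. A minimal Python sketch of that distinction, assuming the hit timestamps for one month have already been parsed into `datetime` objects (`daily_and_hourly` is a hypothetical helper, not The Webalizer's code):

```python
from collections import Counter
from datetime import datetime
from typing import List, Tuple


def daily_and_hourly(timestamps: List[datetime]) -> Tuple[Counter, Counter]:
    """Group one month of hit timestamps by day of month and by hour of day."""
    # Daily report: one bucket per calendar day of the month.
    daily = Counter(ts.day for ts in timestamps)
    # Aggregated hourly report: the same hour of every day lands in one bucket.
    hourly = Counter(ts.hour for ts in timestamps)
    return daily, hourly


hits = [
    datetime(2024, 5, 1, 9, 15),
    datetime(2024, 5, 1, 9, 40),
    datetime(2024, 5, 2, 9, 5),
    datetime(2024, 5, 2, 14, 0),
]
daily, hourly = daily_and_hourly(hits)
print(sorted(daily.items()))   # [(1, 2), (2, 2)]
print(sorted(hourly.items()))  # [(9, 3), (14, 1)]
```

The same grouping applies to any counter (files, pages, visits), not just hits.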
Each of the standard HTML reports described above lists only the top
entries for each item (e.g. the top 20 URLs); the number of lines in
each report is configurable. The Webalizer may also be configured to
produce a separate report for each item type that lists every single
entry, such as all website visitors or all requested URLs.
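The difference between a top-N report and a full listing can be sketched in Python with `collections.Counter`. The function `top_entries` and its default of 20 are illustrative assumptions, not The Webalizer's implementation or configuration syntax:

```python
from collections import Counter
from typing import List, Optional, Tuple


def top_entries(items: List[str], top_n: Optional[int] = 20) -> List[Tuple[str, int]]:
    """Count items and return the top_n most frequent, or all of them if top_n is None."""
    counts = Counter(items)
    # most_common(None) returns every entry, ordered by descending count.
    return counts.most_common(top_n)


requested = ["/", "/about.html", "/", "/news/", "/", "/about.html"]
print(top_entries(requested, 2))     # [('/', 3), ('/about.html', 2)]
print(top_entries(requested, None))  # [('/', 3), ('/about.html', 2), ('/news/', 1)]
```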
Overview
Website traffic analysis works by grouping and aggregating data items
that the web server records in its log files while visitors browse the
website. Some of the most commonly used website traffic analysis terms
are listed below:
- URL
A Uniform Resource Locator (URL) uniquely identifies the resource requested by the user's browser.
- Hit
Each HTTP
request submitted by the browser is counted as one hit. Note that HTTP
requests may be submitted for non-existent content, in which case they
still will be counted. For example, if one of the five image files
referred by the example page mentioned above is missing, the web server
will still count six HTTP requests, but in this case, five will be
marked as successful (one HTML file and four images) and one as a failed request (the missing image) - Page
A page is a successful HTTP request for a resource that constitutes the
website's primary content. Pages are usually identified by a file
extension (e.g. .html, .php, .asp) or by a missing extension, in which
case the subject of the HTTP request is considered a directory and the
default page for that directory is served.
- File
Each successful HTTP request is counted as a file.
- Visitor
A visitor is the actual person browsing the website. A typical
website serves content to anonymous visitors and cannot associate them
with the actual person browsing the website. Visitor identification may
be based on the visitor's IP address or on an HTTP cookie. The former
approach is simple to implement, but results in all visitors browsing
the website from behind the same firewall being counted as a single
visitor. The latter approach requires special configuration of the web
server (i.e. to log HTTP cookies) and is more expensive to implement.
Note that neither approach identifies the actual person browsing the
website, and neither is 100% accurate in determining whether the same
visitor has returned to the website.
- Visit
A visit is a series of HTTP requests submitted by a visitor in which
the time between consecutive requests does not exceed a timeout
configured by the webmaster, typically 30 minutes. For example, if a
visitor requested page A, then page B 10 minutes later and then page C
40 minutes after that, this visitor has generated two visits: one in
which pages A and B were requested and another in which page C was
requested.
- Host
In general, a host is the visitor's machine running the browser.
Hosts are often identified by IP addresses or domain names. Web
traffic analysis tools that use IP addresses to identify visitors use
the terms host, domain name and IP address interchangeably.
- User Agent
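In web traffic analysis, the term user agent usually refers to the
visitor's web browser, as identified by the User-Agent HTTP request
header, although automated clients such as search engine crawlers are
user agents too.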
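The visit rule above can be sketched in Python. This sketch assumes visitors are identified by IP address (the simpler of the two approaches described under Visitor), uses a 30-minute timeout, and assumes requests have already been parsed into `(ip, timestamp)` pairs; `count_visits` is a hypothetical helper, not The Webalizer's implementation:

```python
from datetime import datetime, timedelta
from typing import Dict, Iterable, Tuple

# Typical value; in practice the timeout is configured by the webmaster.
VISIT_TIMEOUT = timedelta(minutes=30)


def count_visits(requests: Iterable[Tuple[str, datetime]],
                 timeout: timedelta = VISIT_TIMEOUT) -> int:
    """Count visits, treating each IP address as one visitor."""
    last_seen: Dict[str, datetime] = {}
    visits = 0
    # Process requests in time order so gaps are measured between consecutive requests.
    for ip, ts in sorted(requests, key=lambda r: r[1]):
        prev = last_seen.get(ip)
        # A new visit starts on the visitor's first request or after a gap longer than the timeout.
        if prev is None or ts - prev > timeout:
            visits += 1
        last_seen[ip] = ts
    return visits


# The page A / B / C example: B 10 minutes after A, C 40 minutes after B.
requests = [
    ("192.0.2.1", datetime(2024, 5, 1, 10, 0)),   # page A
    ("192.0.2.1", datetime(2024, 5, 1, 10, 10)),  # page B
    ("192.0.2.1", datetime(2024, 5, 1, 10, 50)),  # page C
]
print(count_visits(requests))  # 2
```

Because the key is the IP address, two people behind the same firewall would share one `last_seen` entry, which is exactly the limitation of IP-based identification noted under Visitor.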
To illustrate the difference between hits, pages and files, consider
a user requesting an HTML file that references five images, one of
which is missing. In this case the web server will log six hits (one
successful request for the HTML file itself, four for the successfully
retrieved images and one for the missing image), five files (the five
successful HTTP requests) and one page (the HTML file).
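The counting rules in this example can be sketched in Python. Assumptions: requests are already parsed into `(url, status)` pairs, any 2xx status counts as success, and `PAGE_EXTENSIONS` is a hypothetical stand-in for what would normally be configurable. The function names are illustrative, not The Webalizer's code:

```python
import posixpath
from typing import Iterable, Tuple
from urllib.parse import urlsplit

# Hypothetical page extensions; a real analyzer would make this list configurable.
PAGE_EXTENSIONS = {".html", ".htm", ".php", ".asp"}


def is_page(url: str) -> bool:
    ext = posixpath.splitext(urlsplit(url).path)[1].lower()
    # No extension: treat the request as a directory whose default page was served.
    return ext == "" or ext in PAGE_EXTENSIONS


def count_hits_files_pages(requests: Iterable[Tuple[str, int]]) -> Tuple[int, int, int]:
    hits = files = pages = 0
    for url, status in requests:
        hits += 1  # every request is a hit, including failed ones
        if 200 <= status < 300:  # assumption: any 2xx status means success
            files += 1  # every successful request is a file
            if is_page(url):
                pages += 1  # successful request for primary content
    return hits, files, pages


# One HTML page referencing five images, one of which is missing (404).
requests = [("/index.html", 200)] + [(f"/img/{n}.png", 200) for n in "abcd"] + [("/img/e.png", 404)]
print(count_hits_files_pages(requests))  # (6, 5, 1)
```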