Analog for Multiple Sites

Analog is a fast and flexible web log analysis tool.  Its configuration can consist of several files nested using include statements.  This allows common configuration items to be grouped in separate files.  The minimal site-specific configuration items can be kept in small include files.  Similarly, time-period-specific include files make it easy to configure reports for each time period.  Each report then requires a configuration file, which includes a few other files.

I have reviewed and updated my previous documentation for analog.  This site is hosted on a new server, and I needed to set up analog for the new server.  I also made changes to the list of virtual sites being hosted.  I generate report sets for each site as well as an overview report for all sites.  Each report set includes reports covering the latest week, month, and year of data.

Setting up Apache2

To be able to report on multiple sites, it is important to record the site information in the access log files.  The vhost log format is designed to do this, and it allows a single log file include file to serve all sites.  Alternatively, each site can have its own access log file.  Analog can be configured to allow you to mix both types of log files, should you wish to change formats without modifying existing files.

The DEFAULTLOGFORMAT you use must match your log files.  You can specify multiple formats.  The access log format I use is a variation on the vhost_combined format. It differs from the Apache vhost_combined format as follows:

  • the remote host is recorded by address (%a) rather than by name (%h);
  • the remote logname (%l) is replaced with the time taken to serve the request (%T, in seconds).
LogFormat "%v:%p %a %T %u %t \"%r\" %>s %O \"%{Referer}i\" \"%{User-Agent}i\"" local
LogFormat "%v:%p %h %l %u %t \"%r\" %>s %O \"%{Referer}i\" \"%{User-Agent}i\"" vhost_combined
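Before you configure analog, it can help to confirm that your log lines carry the fields the DEFAULTLOGFORMAT lines will expect. The function below is a minimal sketch that assumes the local LogFormat above. The name check_local_format is hypothetical and is not part of Apache or analog. The function prints any line whose first field is not vhost:port, or whose third field (%T) is not a whole number.

```shell
# check_local_format FILE...
# Hypothetical helper: prints (with file name and line number) every log line
# that does not start "vhost:port remote-address seconds", as written by the
# "local" LogFormat. Returns 1 if any such line is found, 0 otherwise.
check_local_format() {
    awk '
        BEGIN { bad = 0 }
        # Field 1 must be vhost:port; field 3 must be a whole number (%T)
        $1 !~ /^[^:]+:[0-9]+$/ || $3 !~ /^[0-9]+$/ {
            print FILENAME ":" FNR ": " $0
            bad = 1
        }
        END { exit bad }
    ' "$@"
}
```

For example, check_local_format /var/log/apache2/access.log prints nothing and returns 0 when every line matches.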

Common Configuration

The documentation includes a number of sample configuration files.  On Ubuntu or Debian systems these are located in /usr/share/doc/analog/examples.  You may find bigbyrep.cfg is a good starting point for your reports.  The documentation also contains pointers to sites that provide configurations for robots, search engines, type aliases, and referrer spam sites.  The configuration below assumes you have downloaded these files, and it includes them.

Create a directory for your configuration files. This example uses ${HOME}/etc. Place all the files you have selected in it. This will simplify your configuration.

These examples assume all reports are published on a common reporting site.  You will need to create directories for the reports and their images on each web site hosting reports.  These directories need to be writable by the user running the reports, and readable by the web server.  These examples do not cover securing access to the reports on the web site.  Do not run these reports as root or as the Apache server's user ID.
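You could create the directories with something like the sketch below. The name make_report_dirs is a hypothetical helper, and /var/www/webstats is simply the path used in these examples. Run it as the reporting user so that this user owns the directories. Mode 755 lets the web server read them without being able to write to them.

```shell
# make_report_dirs WEBSTATS_DIR
# Hypothetical helper: creates the report directory and its images
# subdirectory, owned by the user running it, with mode 755
# (owner can write; the web server can only read and traverse).
make_report_dirs() {
    if [ -z "$1" ]; then
        echo "usage: make_report_dirs WEBSTATS_DIR" >&2
        return 2
    fi
    dir=$1
    mkdir -p "${dir}/images" || return 1
    chmod 755 "${dir}" "${dir}/images"
}
```

Creating a directory under /var/www usually requires root. In that case, create /var/www/webstats once with sudo and chown it to the reporting user before running the helper.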

Tune the bigbyrep.cfg configuration for one site using one or two log files.  This will become the basis for all your reports.  Create a copy of your bigbyrep.cfg file as bigbyrep.inc.  Ensure all the lines in its Logfile Input section are commented out.  Also ensure the IMAGEDIR, CHARTDIR, and LOCALCHARTDIR lines are commented out.  These lines will be supplied on a per-report basis.  Now you can create your common.inc file, containing something like the following.

# common.inc
#### Basic local configuration

# Header information - Modify as appropriate or move to vhost file if it varies by site
HOSTURL /webstats/index.html
LOGO    /graphics/icon.gif

# Cache DNS Look-ups - use default (/var/cache/analog/dnscache)
# User must be granted privileges on this file,
# or a different file may be specified here
DNS WRITE
DNSGOODHOURS 1440

# Enhanced vhost log format - Adjust for your format - you can supply multiple formats
# Only includes port number in vhost identification if it is not the default http port.
DEFAULTLOGFORMAT (%v:80 %s %t %u [%d/%M/%Y:%h:%n:%j] "%j %r %j" %c %b "%f" "%B")
DEFAULTLOGFORMAT (%v %s %t %u [%d/%M/%Y:%h:%n:%j] "%j %r %j" %c %b "%f" "%B")
DEFAULTLOGFORMAT (%s %j %j [%d/%M/%Y:%h:%n:%j] "%j %r %j" %c %b "%f" "%B")

# Exclude internal hosts
HOSTEXCLUDE 127.0.0.1
HOSTEXCLUDE 192.0.2.*
HOSTEXCLUDE *.example.com

# Exclude monitoring requests
BROWEXCLUDE check_http*

# Reporting period - ends at midnight today (i.e. through yesterday) - defaults to a yearly report
FROM -01-00-00:0000
TO   -00-00-00:0000

#### External configuration files
CONFIGFILE etc/bigbyrep.inc
CONFIGFILE etc/SearchEngines.txt
CONFIGFILE etc/RobotInclude.txt
CONFIGFILE etc/RefSpam.txt

#### Overrides
DAILYREP OFF
DAILYSUM ON
REQINCLUDE *.php
PAGEINCLUDE *.php
WARNINGS ON
WARNINGS -MR
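You can also script the creation of bigbyrep.inc from your tuned bigbyrep.cfg, as described above. This is a minimal sketch using sed. The name make_bigbyrep_inc is hypothetical, and the sketch assumes the commands are written in upper case at the start of a line, as in the sample files. Lines that are already commented out are left alone.

```shell
# make_bigbyrep_inc SRC DEST
# Hypothetical helper: copies SRC (the tuned bigbyrep.cfg) to DEST
# (bigbyrep.inc), prefixing "# " to LOGFILE, IMAGEDIR, CHARTDIR and
# LOCALCHARTDIR lines so they can be supplied per report instead.
# Assumes upper-case command names, as in the analog sample files.
make_bigbyrep_inc() {
    sed -e 's/^[[:space:]]*LOGFILE[[:space:]]/# &/' \
        -e 's/^[[:space:]]*IMAGEDIR[[:space:]]/# &/' \
        -e 's/^[[:space:]]*CHARTDIR[[:space:]]/# &/' \
        -e 's/^[[:space:]]*LOCALCHARTDIR[[:space:]]/# &/' \
        "$1" > "$2"
}
```

For example: make_bigbyrep_inc ~/etc/bigbyrep.cfg ~/etc/bigbyrep.inc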

Setting up the first report

Create a configuration file for your first report.  Once you have created these files, test the report from your home directory (the directory above etc) using the command analog +getc/example_month.conf.  You could also use the run_analog.sh script provided below to run the report.
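Each report is assembled from nested include files, so a missing or misnamed include is an easy mistake to make. The sketch below is a hypothetical check_includes helper, not an analog feature. It follows CONFIGFILE lines recursively and reports any file that cannot be read. It assumes relative paths are resolved from the current directory, so run it from the directory above etc, as you would the analog command.

```shell
# check_includes FILE [DEPTH]
# Hypothetical helper: follows uncommented CONFIGFILE lines from FILE
# recursively and reports any referenced file that cannot be read.
# Relative paths are checked against the current directory.
# Returns 1 if anything is missing. The body is a subshell, so its
# variables stay local to each (recursive) call.
check_includes() (
    file=$1
    depth=${2:-0}
    if [ ! -r "$file" ]; then
        echo "missing: $file" >&2
        exit 1
    fi
    # Guard against include loops
    if [ "$depth" -gt 10 ]; then
        echo "too deep (include loop?): $file" >&2
        exit 1
    fi
    status=0
    # The second field of each CONFIGFILE line is the included path
    for inc in $(awk '$1 == "CONFIGFILE" { print $2 }' "$file"); do
        check_includes "$inc" $((depth + 1)) || status=1
    done
    exit $status
)
```

For example, run check_includes etc/example_month.conf from your home directory before running analog.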

You should have a naming standard for your configuration files.  These examples use names consisting of the components site, purpose, and time period.  Usually only two components are required to identify a file.

You will also need a naming standard for the OUTFILE and CHARTDIR parameters.  The examples use the same approach as for the configuration files.  LOCALCHARTDIR is CHARTDIR prefixed with the directory path from OUTFILE.  You will want a different standard if you are placing the reports on each vhost's site, as the directory path will then change for each site.
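You can capture the naming standard in a small function, so that every report gets consistent output settings. This is a sketch. The name report_paths is a hypothetical helper, and its default directory, /var/www/webstats, is the one used in these examples. CHARTDIR uses the first letter of the period (m, w, or y).

```shell
# report_paths SITE PERIOD [WEBSTATS_DIR]
# Hypothetical helper: prints the OUTFILE, CHARTDIR and LOCALCHARTDIR lines
# for a report using the site_period naming from these examples.
# LOCALCHARTDIR is CHARTDIR prefixed with the OUTFILE directory.
report_paths() {
    site=$1
    period=$2
    dir=${3:-/var/www/webstats}
    # First letter of the period: month -> m, week -> w, year -> y
    p=$(printf '%s' "$period" | cut -c1)
    echo "OUTFILE ${dir}/${site}_${period}.html"
    echo "CHARTDIR images/${site}_${p}_"
    echo "LOCALCHARTDIR ${dir}/images/${site}_${p}_"
}
```

For example, report_paths example month prints the OUTFILE, CHARTDIR, and LOCALCHARTDIR lines of example_month.conf.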

The example_month.conf file contains seven configuration lines.  The first four lines specify the output.  The remaining three lines include the time period, site, and log file specifications.

# example_month.conf
HOSTNAME "Example.com - Month"
OUTFILE /var/www/webstats/example_month.html
CHARTDIR images/example_m_
LOCALCHARTDIR /var/www/webstats/images/example_m_
CONFIGFILE etc/common_month.inc
CONFIGFILE etc/example_vhost.inc
CONFIGFILE etc/log_month.inc

The nested files also contain minimal information.  The contents of common_month.inc are site-independent, as are most of the include files used here.

# common_month.inc
CONFIGFILE etc/common.inc
FROM -00-01-00:0000
WEEKLY ON

The example_vhost.inc file specifies how to select records for example.com.  This file could include site-specific header information instead of supplying it in the common.inc file.  Any overrides for the site should also be included in this file.

# example_vhost.inc
VHOSTINCLUDE www.example.com
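When choosing VHOSTINCLUDE values, it helps to see which virtual host names actually appear in your logs. The sketch below is a hypothetical list_vhosts helper. It assumes the first field of each line is vhost:port, as in the local and vhost_combined formats. It strips only :80, mirroring the DEFAULTLOGFORMAT lines and their comment in common.inc, so other ports remain part of the name.

```shell
# list_vhosts FILE...
# Hypothetical helper: counts requests per virtual host in access logs whose
# first field is "vhost:port". Only ":80" is stripped, matching the
# DEFAULTLOGFORMAT lines in common.inc. Gzipped rotations are decompressed.
list_vhosts() {
    for f in "$@"; do
        case $f in
            *.gz) gzip -dc "$f" ;;
            *) cat "$f" ;;
        esac
    done | awk '{ sub(/:80$/, "", $1); count[$1]++ }
                END { for (v in count) print count[v], v }' | sort -rn
}
```

For example, list_vhosts /var/log/apache2/access.log* prints a request count and name for each virtual host, busiest first.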

The log_month.inc file specifies which log files to use.  Listing only the files that cover the reporting period limits the number of unneeded records analog has to read.  If you have access logs separated by vhost, you will need a log file include file per site or report.

# log_month.inc
LOGFILE /var/log/apache2/access.log
LOGFILE /var/log/apache2/access.log.1
LOGFILE /var/log/apache2/access.log.2.gz
LOGFILE /var/log/apache2/access.log.3.gz
LOGFILE /var/log/apache2/access.log.4.gz
LOGFILE /var/log/apache2/access.log.5.gz
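If you prefer to generate log file include files, a loop like the one below produces the LOGFILE lines. The name log_include is a hypothetical helper. It assumes the rotation naming shown above: access.log.1 is uncompressed, and later copies are gzipped. How many rotated files a period needs depends on how often your logs are rotated.

```shell
# log_include LOGDIR COUNT
# Hypothetical helper: prints LOGFILE lines for the current access log plus
# COUNT rotated copies (access.log.1 uncompressed, .2.gz onwards gzipped).
log_include() {
    dir=$1
    count=$2
    echo "LOGFILE ${dir}/access.log"
    n=1
    while [ "$n" -le "$count" ]; do
        if [ "$n" -eq 1 ]; then
            echo "LOGFILE ${dir}/access.log.1"
        else
            echo "LOGFILE ${dir}/access.log.${n}.gz"
        fi
        n=$((n + 1))
    done
}
```

For example, log_include /var/log/apache2 5 > etc/log_month.inc writes the same LOGFILE lines as above.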

Adding Weekly and Yearly reports

Copy example_month.conf to example_week.conf.  Change the HOSTNAME, OUTFILE, CHARTDIR, and LOCALCHARTDIR parameters to unique values.  Replace common_month.inc with common_week.inc, and log_month.inc with log_week.inc.  Test the configurations as above.

# example_week.conf
HOSTNAME "Example.com - Week"
OUTFILE /var/www/webstats/example_week.html
CHARTDIR images/example_w_
LOCALCHARTDIR /var/www/webstats/images/example_w_
CONFIGFILE etc/common_week.inc
CONFIGFILE etc/example_vhost.inc
CONFIGFILE etc/log_week.inc
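You can also do the copy-and-edit step above with sed. The name month_to_period is a hypothetical helper. It assumes the month file follows the naming used here, so that ' - Month"', _month. and _m_ appear only in the title, the file names, and the chart prefixes.

```shell
# month_to_period SRC DEST PERIOD
# Hypothetical helper: copies a *_month.conf file to DEST for PERIOD
# (week or year), rewriting the " - Month" title, the _month. file names
# (OUTFILE, common_month.inc, log_month.inc) and the _m_ chart prefixes.
month_to_period() {
    src=$1
    dest=$2
    period=$3
    # Capitalised period for the HOSTNAME title (week -> Week)
    first=$(printf '%s' "$period" | cut -c1)
    title=$(printf '%s' "$first" | tr '[:lower:]' '[:upper:]')$(printf '%s' "$period" | cut -c2-)
    sed -e "s/ - Month\"/ - ${title}\"/" \
        -e "s/_month\\./_${period}./g" \
        -e "s/_m_/_${first}_/g" \
        "$src" > "$dest"
}
```

For example, month_to_period etc/example_month.conf etc/example_year.conf year creates the yearly configuration described below.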

Create common_week.inc.  This provides the common configuration and selects the appropriate time period.  As the report covers only one week, the weekly report is turned off.

# common_week.inc
CONFIGFILE etc/common.inc
FROM -00-00-07:0000
WEEKLY OFF

Create log_week.inc.  This specifies the log files included for the weekly report.

# log_week.inc
LOGFILE /var/log/apache2/access.log
LOGFILE /var/log/apache2/access.log.1

Copy example_month.conf to example_year.conf.  Change the HOSTNAME, OUTFILE, CHARTDIR, and LOCALCHARTDIR parameters to unique values.  Replace common_month.inc with common_year.inc, and log_month.inc with log_year.inc.

# example_year.conf
HOSTNAME "Example.com - Year"
OUTFILE /var/www/webstats/example_year.html
CHARTDIR images/example_y_
LOCALCHARTDIR /var/www/webstats/images/example_y_
CONFIGFILE etc/common_year.inc
CONFIGFILE etc/example_vhost.inc
CONFIGFILE etc/log_year.inc

Create the common_year.inc file.  This supplies the common configuration and enables the monthly report.  The yearly time period is the default specified in common.inc; the other time period include files override it.

# common_year.inc
CONFIGFILE etc/common.inc
MONTHLY ON

Create the log_year.inc file.  The example below includes all log files.  Modify it appropriately if you retain much more than a year's worth of access log files in the log directory.

# log_year.inc
LOGFILE /var/log/apache2/access.log*

Adding new sites

Adding a new site consists of creating a few small files.  You will need a new vhost specification, and a new .conf file for each report.

If the access logs are separated by site, you will need a log file include file per site or report.  Alternatively, include the LOGFILE specifications in the .conf file.

This example is for the site mail.example.com.

# mail_example_vhost.inc
VHOSTINCLUDE mail.example.com

Copy example_month.conf to mail_example_month.conf.  Edit it as above, giving the output parameters unique values and changing the vhost include file.  Create the weekly and yearly files as was done for the original site.  Test these new reports.

# mail_example_month.conf
HOSTNAME "Mail.Example.com - Month"
OUTFILE /var/www/webstats/mail_example_month.html
CHARTDIR images/mail_example_m_
LOCALCHARTDIR /var/www/webstats/images/mail_example_m_
CONFIGFILE etc/common_month.inc
CONFIGFILE etc/mail_example_vhost.inc
CONFIGFILE etc/log_month.inc
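When several sites follow the same pattern, you can write the vhost include and all three .conf files in one step. The name new_site is a hypothetical helper. The report directory /var/www/webstats and the file names follow the conventions used in these examples. The helper overwrites any existing files with the same names.

```shell
# new_site ETC_DIR SITE VHOST TITLE
# Hypothetical helper: writes SITE_vhost.inc and SITE_{week,month,year}.conf
# in ETC_DIR, following the naming used in these examples.
new_site() {
    etc=$1 site=$2 vhost=$3 title=$4
    www=/var/www/webstats    # assumed report directory, as in the examples
    printf '# %s_vhost.inc\nVHOSTINCLUDE %s\n' "$site" "$vhost" > "${etc}/${site}_vhost.inc"
    for period in week month year; do
        case $period in
            week)  label=Week  p=w ;;
            month) label=Month p=m ;;
            year)  label=Year  p=y ;;
        esac
        cat > "${etc}/${site}_${period}.conf" <<EOF
# ${site}_${period}.conf
HOSTNAME "${title} - ${label}"
OUTFILE ${www}/${site}_${period}.html
CHARTDIR images/${site}_${p}_
LOCALCHARTDIR ${www}/images/${site}_${p}_
CONFIGFILE etc/common_${period}.inc
CONFIGFILE etc/${site}_vhost.inc
CONFIGFILE etc/log_${period}.inc
EOF
    done
}
```

For example, new_site "$HOME/etc" mail_example mail.example.com Mail.Example.com creates mail_example_vhost.inc, mail_example_week.conf, mail_example_month.conf, and mail_example_year.conf.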

Reporting all sites

Adding an all-sites report consists of creating new .conf files and an all_vhost.inc include file.  If your log files are split by site, you will need new log include files as well.

This example is for the all-sites monthly report.  Copy example_month.conf to all_month.conf.  Edit it as above, replacing the vhost include file with all_vhost.inc, which drops the VHOSTINCLUDE specification and turns on the VHOST report.  Create the weekly and yearly files as was done for the original site.  Test these new reports.

# all_vhost.inc
VHOST ON

# all_month.conf
HOSTNAME "All Sites - Month"
OUTFILE /var/www/webstats/all_month.html
CHARTDIR images/all_m_
LOCALCHARTDIR /var/www/webstats/images/all_m_
CONFIGFILE etc/common_month.inc
CONFIGFILE etc/all_vhost.inc
CONFIGFILE etc/log_month.inc

Scheduling Report Generation

You will need a script to run the reports. The following script will run one or more reports. It defaults to running all the reports.

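#!/bin/sh -x
# run_analog.sh - Run the analog jobs
# Usage: run_analog.sh [report.conf ...]   (default: every *.conf in CONF_DIR)

CONF_DIR=${HOME}/etc

# Run from the directory above the configuration directory, so the
# relative "CONFIGFILE etc/..." paths in the .conf and .inc files resolve
cd "${CONF_DIR}/.." || exit 1

# Get the list of config files; default to all of them
CONFIGS="$*"
[ -z "${CONFIGS}" ] && CONFIGS="${CONF_DIR}/*.conf"

# Run all the conf files (1 per report); the unquoted ${CONFIGS}
# expands the default *.conf pattern
for CONF in ${CONFIGS}; do
    CONF=$(basename "${CONF}")
    nice analog +g"${CONF_DIR}/${CONF}"
done

# EOF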

Schedule this script to run at appropriate times.  You can run all the reports at once, or schedule each report set separately.  You may want to run the weekly reports on Monday morning, and the monthly and yearly reports on the first of the month.  Avoid running two sets of reports at the same time.
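One way to avoid overlapping runs is to put a lock around each scheduled invocation. The sketch below is a hypothetical with_lock helper built on mkdir. Because mkdir either creates the directory or fails, only one run can hold the lock at a time. If a run is killed, the lock directory is left behind and must be removed by hand.

```shell
# with_lock LOCKDIR COMMAND [ARG...]
# Hypothetical helper: runs COMMAND only if LOCKDIR can be created, so a
# second report run started while the first is active gives up instead.
# Returns COMMAND's exit status, or 1 if the lock is already held.
with_lock() {
    lockdir=$1
    shift
    if ! mkdir "$lockdir" 2>/dev/null; then
        echo "another report run is in progress ($lockdir); skipping" >&2
        return 1
    fi
    "$@"
    status=$?
    rmdir "$lockdir"
    return $status
}
```

For example, with_lock /tmp/run_analog.lock sh run_analog.sh runs all the reports unless another run already holds the lock.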

Final Cleanup

If you have more than one report in a directory, create an index.html file for the directory.  Making this file the target of the HOSTURL parameter makes it easier to navigate between the reports.
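You can regenerate a plain index page after each run. The name make_index is a hypothetical helper that links every other .html file in the report directory. It writes to a temporary file first, so the web server never serves a half-written page. In these examples, HOSTURL in common.inc already points at /webstats/index.html.

```shell
# make_index WEBSTATS_DIR
# Hypothetical helper: writes WEBSTATS_DIR/index.html with a link to every
# other .html report in that directory, replacing the old index atomically.
make_index() {
    dir=$1
    {
        echo '<!DOCTYPE html>'
        echo '<html><head><title>Web Statistics</title></head><body>'
        echo '<h1>Web Statistics</h1>'
        echo '<ul>'
        for f in "$dir"/*.html; do
            name=$(basename "$f")
            # Skip the index itself, and the literal pattern if nothing matched
            [ "$name" = index.html ] && continue
            [ -e "$f" ] || continue
            echo "<li><a href=\"${name}\">${name%.html}</a></li>"
        done
        echo '</ul>'
        echo '</body></html>'
    } > "${dir}/index.html.tmp" && mv "${dir}/index.html.tmp" "${dir}/index.html"
}
```

For example, make_index /var/www/webstats could be run at the end of run_analog.sh.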

Restrict access to the reports and images using .htaccess or changes to the Apache configuration.