Wednesday, March 27, 2013

Scheduling awstats report generation

Scheduling awstats report generation


We've looked at running awstats reports, but only manually. Let's automate report generation so all you need to worry about is looking at those sweet, sweet numbers.




Automating awstats


In the previous article in this series we set up awstats for your site and ran an update of the reports manually. That's all well and good, but the command to update is big and ugly, and it would be kind of a pain to have to run it every time you want to view updated stats.

Fortunately Linux is chock full of ways to automate stuff. One way is to create a cron script to do all the updating, but since we're shooting for simple we should use a tool that's already processing your web logs on a regular basis. We can piggyback our updates onto logrotate's regular rotation tasks.

Scheduling reports with logrotate


With this approach we'll take advantage of the fact that logrotate is already performing regular log rotation for your domain. I mean, you do have your logs rotating automatically, right?

If not, visit this article series on logrotate and follow the directions there to set up log rotation for your virtual host. Log rotation keeps those logs from becoming giant disk-space-eating behemoths, so it's a very good idea. Really, do it now. I'll wait.

Now that we're certain you have logrotate managing your web logs, let's add a step to what logrotate does when it performs the rotation.

Editing the logrotate entry


Let's look at the logrotate.d file for a virtual host that's just a modification of the default entry for apache on Ubuntu:
/home/demo/public_html/example.com/logs/*.log {
weekly
missingok
rotate 52
compress
delaycompress
notifempty
create 640 root adm
sharedscripts
postrotate
if [ -f "`. /etc/apache2/envvars ; echo ${APACHE_PID_FILE:-/var/run/apache2.pid}`" ]; then
/etc/init.d/apache2 reload > /dev/null
fi
endscript
}

Don't worry too much about most of those entries if you haven't seen them before. There are only a couple important bits to note here.

First, take a look at the "weekly" directive in that logrotate file. That's okay for simple log rotation, but you probably want your web traffic stats updating daily. In that case you'd want to change that to "daily" so the rotate script runs more frequently, and possibly modify the "rotate 52" entry to keep more archived log files.

Next we look at what we want to add to the logrotate process. See that "postrotate" block up there? Don't worry about what's inside, just note that it's there. What that does is run some stuff after the log rotation is done (in this case, tell apache to reload if it's running). The reason we're looking at it is because there's also a "prerotate" directive that we can use to have awstats run through a log file before it's rotated.

The "prerotate" directive should run the stats update and generate the reports. And it should run just like the command we ran to get our reports. We would create a prerotate block like:
prerotate
/usr/local/awstats/tools/awstats_buildstaticpages.pl -update -config=www.example.com -dir=/home/demo/public_html/example.com/webstats -awstatsprog=/usr/local/awstats/wwwroot/cgi-bin/awstats.pl > /dev/null
endscript

The "> /dev/null" bit redirects the normal output of the command so it doesn't get sent to the console or emailed to root.

Inserted into the existing logrotate file for our virtual host, the whole thing would look like:
/home/demo/public_html/example.com/logs/*.log {
daily
missingok
rotate 52
compress
delaycompress
notifempty
create 640 root adm
sharedscripts
prerotate
/usr/local/awstats/tools/awstats_buildstaticpages.pl -update -config=www.example.com -dir=/home/demo/public_html/example.com/public/webstats -awstatsprog=/usr/local/awstats/wwwroot/cgi-bin/awstats.pl > /dev/null
endscript
postrotate
if [ -f "`. /etc/apache2/envvars ; echo ${APACHE_PID_FILE:-/var/run/apache2.pid}`" ]; then
/etc/init.d/apache2 reload > /dev/null
fi
endscript
}

And with that, every time the logs get rotated, the web stats get updated too.

Why use logrotate?


Instead of using logrotate to run the web stats update we could just schedule the stats to update through cron. So why insert the commands into logrotate?

The main reason is accuracy. By sticking a web stats update into the log rotation process we make sure that awstats is looking at log entries up until the last possible moment, when the log is rotated. If you have the web server reloading instead of restarting there might be a couple log entries missed (since the old log file would still be used as the old connections finish). The information lost in that case is negligible and not worth the downtime required to fully restart the web server.

If you want the web traffic reports to update more often than the web logs are rotated, you can use cron to run the update and report-building scripts on a more frequent schedule (like hourly). Awstats will recognize log entries that have already been processed and skip them when analyzing the data.

Monthly reports


Something you'll notice if you leave awstats running the way we've set it up is that the reports being generated only go into detail for the current month. Dividing the information up by month is a decent interval, but you may want to look at previous months in detail instead of only the current one.

So it's entirely optional, but if you'd like to keep monthly reports around it's really just a matter of tailoring a shell script to fit your site.

A cron script


We'll name the script after the awstats config file for the site in question and put it in the cron monthly directory. Using "www.example.com", we would create the file:
/etc/cron.monthly/awstats.www.example.com

Inside the file put the following script:
#!/bin/sh
#
# Run the awstats build script to generate a report for last month.
#

# Modify these 3 variables for your environment:
#
# The location of the awstats installation
AWSTATSDIR=/usr/local/awstats

# Your main domain, as you reported it to awstats
DOMAIN=www.example.com

# The directory where you're storing the reports for this domain
REPORTDIR=/home/demo/public_html/example.com/public/webstats

LASTMONTH=`date -d "last month" +%B`
LASTMONTHNUM=`date -d "last month" +%m`
LASTMONTHDIR=$REPORTDIR/$LASTMONTH

mkdir -p $LASTMONTHDIR
cp -Rf $AWSTATSDIR/wwwroot/icon $LASTMONTHDIR/awstatsicons

$AWSTATSDIR/tools/awstats_buildstaticpages.pl -month=$LASTMONTHNUM -update -config=$DOMAIN -dir=$LASTMONTHDIR -awstatsprog=$AWSTATSDIR/wwwroot/cgi-bin/awstats.pl > /dev/null

Change the values of "AWSTATSDIR", "DOMAIN", and "REPORTDIR" to match your environment and site.

After saving the file make your new script executable:
sudo chmod a+x /etc/cron.monthly/awstats.www.example.com

You can even run that script now to test it (and to get a detailed report for last month's data, if you have some). If you do, be sure to run it using sudo.

What it does


When the script runs it creates (if it doesn't already exist) a directory named after last month. If the current month is September, the directory will be named "August" (or the equivalent for your machine's locale setting). The awstats icons directory will be copied there, then a report will be generated for last month and put in the month's directory.

To look at a monthly report you'll use a similar URL to viewing the current report, but you'll insert the name of the month you want to view in front of the page name. The first letter of the monthly directory names is capitalized, so it will have to be capitalized in the URL used to visit a monthly report as well.

To take our example URL from earlier and use it to look at the August statistics we would change it to:
http://www.example.com/webstats/August/awstats.www.example.com.html

Keeping more monthly reports


Note that the way the script is written the monthly reports get replaced every year (since they're only made available by month name, not by year).

If you want monthly reports to never get overwritten you can modify the script to add the year to the directory names. Change the line that defines "LASTMONTH" above to something like:
LASTMONTH=`date -d "last month" +%B-%Y`

If the month were September, the above line would cause the script to use the directory name "August-2010".

Further reading


You have a solid but basic installation of awstats now. If you want to get more out of awstats there are a few advanced features you can look into.

For starters, you can browse the awstats documentation online.

If you want to generate reports on the fly you can do so by setting up the main awstats.pl script to run as a CGI script from a web browser. It would be a good idea to still run the stats update through a schedule. You'll want to be familiar with using CGI with a web server, and be aware of the risks that can be involved (for both performance and security). Then check theawstats install docs and take a look at the script they provide for automating aspects of that configuration.

If you want to extend awstats there are plugins available on the project's web site. A couple of the more interesting ones let awstats determine the country of origin of visitor IP addresses without launching a lot of cumbersome DNS lookups.

With a little tweaking and help from the documentation, awstats can also be used to build reports for mail and FTP servers.

Digging through all the options in the config file will give you an idea of what sorts of changes you can make to reports and their formatting.

No comments:

Post a Comment