Wednesday, March 27, 2013

Generating and viewing awstats reports



Now that awstats is installed, we'll take a look at actually running the analysis and viewing the reports.




Awstats in action


If you followed along with the first part of this series, you should have awstats installed and configured for your site. In this article we'll look at a simple approach to report generation from the command line.

This approach will create static HTML pages to display your web traffic.

Build a report


Time to tell awstats to generate your reports. Fortunately for our "start with something simple" approach, there's a script that rolls the generation of several reports into one step.

awstats_buildstaticpages.pl


We're going to use a script that's included with awstats, "awstats_buildstaticpages.pl". This script updates the stats and generates a bunch of standard reports, using the main "awstats.pl" script behind the scenes. For a closer look at what reports this script will build, check the awstats online documentation.

For our example the command would look like:
sudo /usr/local/awstats/tools/awstats_buildstaticpages.pl -update -config=www.example.com -dir=/home/demo/public_html/example.com/public/webstats -awstatsprog=/usr/local/awstats/wwwroot/cgi-bin/awstats.pl

Okay, yeah. I admit that's kind of long. But it's not as scary as it seems, honest. Especially since you won't have to memorize it.

Let's break that down so you know what to put where.

The script itself


/usr/local/awstats/tools/awstats_buildstaticpages.pl

This part is the script we're running, "awstats_buildstaticpages.pl". If you installed to a location other than "/usr/local/awstats" you'll want to change this part to point to the actual location of the script on your machine.

The -update option


-update

Including "-update" at the beginning of the options tells the script to update the stats analysis before generating the reports.

The -config option


-config=www.example.com

The "config" value should be the main domain name for the site. This domain has to match the name of the config file you created in the first part of this series: the file name should be "awstats.", then the main domain name, then ".conf" (for example, "awstats.www.example.com.conf"), since that's the file name the script looks for.
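If you want to double-check that name before running anything, here's a tiny shell helper that prints the file name awstats will expect for a given "-config" value. It's only a sketch: the function name is made up, and the default directory "/etc/awstats" is an assumption, so pass whatever directory you actually used in the first part of this series.

```shell
# Print the config file path awstats expects for a given -config value.
# ASSUMPTION: /etc/awstats is just a default guess; pass the directory you
# really saved your config file in as the second argument.
awstats_conf_path() {
  local site="$1" confdir="${2:-/etc/awstats}"
  printf '%s/awstats.%s.conf\n' "${confdir%/}" "$site"
}

# Example: warn if the file isn't where you think it is.
#   conf=$(awstats_conf_path www.example.com)
#   [ -f "$conf" ] || echo "No config file at $conf"
```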

In short, replace "www.example.com" with your main domain name.

The -dir option


-dir=/home/demo/public_html/example.com/public/webstats

The "-dir" option refers to the directory where you want awstats to create its reports. That directory should also hold an "awstatsicons" subdirectory with awstats' standard image files.
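If the icons aren't in place yet, a sketch like this one copies them over. It assumes the icons ship in "wwwroot/icon" under your awstats install directory (the layout of the standard awstats download), and the function name is hypothetical, so check that path on your own machine first.

```shell
# Make sure the reports directory has an "awstatsicons" subdirectory.
# ASSUMPTION: the stock icons live in wwwroot/icon under the awstats
# install directory; adjust if your install is laid out differently.
ensure_awstats_icons() {
  local outdir="$1" home="${2:-/usr/local/awstats}"
  if [ -d "$outdir/awstatsicons" ]; then
    return 0                      # already there, nothing to do
  fi
  mkdir -p "$outdir" || return 1
  # cp -R into a not-yet-existing target creates it as a copy of the source
  cp -R "$home/wwwroot/icon" "$outdir/awstatsicons"
}

# Example (as a user who can write to the reports directory):
#   ensure_awstats_icons /home/demo/public_html/example.com/public/webstats
```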

The -awstatsprog option


-awstatsprog=/usr/local/awstats/wwwroot/cgi-bin/awstats.pl

For "-awstatsprog" you'll want the value to be the location of the "awstats.pl" script, which is the main awstats script. If you installed awstats someplace other than "/usr/local/awstats", adjust accordingly.
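Since you probably don't want to type that whole line every time, you could wrap it in a shell function. This is just a sketch: "run_awstats_static" and the "webstats.sh" script name are made up. It runs exactly the command above, with the site name, reports directory, and install location pulled out as arguments.

```shell
# Hypothetical helper: the same awstats_buildstaticpages.pl command as above.
#   $1 = config name (e.g. www.example.com)
#   $2 = reports directory
#   $3 = awstats install directory (defaults to /usr/local/awstats)
run_awstats_static() {
  local site="$1" outdir="$2" home="${3:-/usr/local/awstats}"
  "$home/tools/awstats_buildstaticpages.pl" \
    -update \
    "-config=$site" \
    "-dir=$outdir" \
    "-awstatsprog=$home/wwwroot/cgi-bin/awstats.pl"
}

# sudo can't call a shell function directly, so one option is to save the
# function in a script (say, webstats.sh) that ends with:
#   run_awstats_static "$@"
# and then run:
#   sudo ./webstats.sh www.example.com /home/demo/public_html/example.com/public/webstats
```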

The script's results


Once you run that big command (all on one line), you should see the script launch the awstats update process and then tell you about every one of the 20 reports it's generating.

If the script encountered an error it should give you some troubleshooting advice (like making sure you used the right "config" identifier).

Note that the last line, the "Main HTML page" line, gives the file name of the report's main page.

If you take a look in your reports directory, you should now see a bunch of HTML files there:
$ ls /home/demo/public_html/example.com/public/webstats
awstats.www.example.com.alldomains.html
awstats.www.example.com.allhosts.html
...

View the report


Now we get to see the results of our hard work. Point your browser to the "Main HTML page" that the script reported when it generated the reports:
http://www.example.com/webstats/awstats.www.example.com.html

The important part here is working out the address you'll use to view the reports. If you discover at this point that you created the reports in a directory you can't see from a browser, you may want to make a new reports directory. Edit your awstats config file accordingly, then run the report generation again to make sure it works with the new directory.
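If you generate reports for more than one site, it can help to build these addresses rather than type them. This hypothetical helper assumes the naming pattern seen in this article (awstats.CONFIG.html for the main page, awstats.CONFIG.REPORT.html for the others) and a reports directory reachable at a path like "/webstats" on your site.

```shell
# Hypothetical helper: build the address of an awstats static page.
#   $1 host    - the site's hostname, e.g. www.example.com
#   $2 webpath - URL path of the reports directory, e.g. /webstats
#   $3 config  - the -config value used when generating the reports
#   $4 report  - optional report name (e.g. refererpages); omit for main page
awstats_report_url() {
  local host="$1" webpath="$2" config="$3" report="$4"
  # ${report:+.$report} adds ".REPORT" only when a report name was given
  local page="awstats.$config${report:+.$report}.html"
  printf 'http://%s%s/%s\n' "$host" "${webpath%/}" "$page"
}

# Example: check that the main page is actually being served.
#   curl -sI "$(awstats_report_url www.example.com /webstats www.example.com)" | head -n 1
```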

If all goes well you'll see something like:

Awstats example

(Without the smudges, of course.)

You might see less than a day's worth of traffic in this initial report, or perhaps a week, depending on how often your web logs are rotated. So not a lot that's interesting just yet, but enough to make sure the reports were generated properly.

Visits, hits, pages and bandwidth


There are a bunch of reports available, linked at the top of your main report's page. The main statistics you'll see at the beginning of the report bear some quick explanation, just so you know what you're looking at.

Unique visitors


The "unique visitors" stat tracks the number of different visitors your site received. For awstats this mostly means the number of unique IP addresses it saw in your web logs. This number isn't perfectly accurate: visitors behind proxy servers and home routers appear in your web logs under the IP address of the proxy or router, so several people can end up counted as one visitor.
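To get a feel for what that number means, here's a rough approximation you can run against an access log yourself. It isn't awstats' actual code (and the function name is made up); it just counts distinct values in the first field of each combined-format log line, which is the client IP address.

```shell
# Rough approximation of "unique visitors": count distinct client IPs
# (field 1 of a combined-format log line). Reads the given files, or stdin.
count_unique_visitors() {
  awk '!seen[$1]++ { n++ } END { print n + 0 }' "$@"
}

# Example:
#   count_unique_visitors /var/log/apache2/access.log
```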

Number of visits


This stat tracks how many separate visits your site received, including return trips by the same visitor. A "visit" for these purposes covers all the page hits from a visitor that come within an hour or so of each other. If the same IP address shows up in the web logs again the next day, that counts as a second visit.
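Here's a rough sketch of that visit-counting idea. Two assumptions to flag: the input has already been boiled down to one "IP unix-timestamp" pair per line (converting the log's date format is left out), and the one-hour cutoff is a stand-in for the "hour or so" above rather than a value taken from awstats itself. The function name is hypothetical.

```shell
# Rough sketch of visit counting.
# ASSUMPTIONS: input lines are "IP unix_timestamp" (already converted from
# the log), and a visit ends after $1 seconds of silence (default 3600).
count_visits() {
  local gap="${1:-3600}"
  # Group lines by IP, oldest first, then start a new visit whenever the IP
  # changes or the time since that IP's previous hit exceeds the gap.
  sort -k1,1 -k2,2n | awk -v gap="$gap" '
    $1 != ip || $2 - last > gap { visits++ }
    { ip = $1; last = $2 }
    END { print visits + 0 }'
}

# Example:
#   printf '%s\n' '192.0.2.1 1364378400' '192.0.2.1 1364385000' | count_visits
```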

Pages


A "page", in web traffic terms, is the main page of a visited URL. This would be the HTML or PHP file that was requested by the visitor. If a page includes the contents of other HTML files, only the main page is counted as a "page" in the traffic stats.

Hits


Pretty much everything a web browser asks for from a site is a "hit". The main page, headers and footers, images, videos — everything the browser has to ask for is a hit. A complex site will produce a fair number of hits per page visit.
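The difference between pages and hits is easier to see with a small example. This sketch counts every request in a combined-format log as a hit, and also counts it as a page unless the path ends in a common asset extension. That extension list and the function name are assumptions for illustration; awstats decides what counts as a page with its own rules.

```shell
# Rough sketch of hits vs pages for a combined-format access log.
# ASSUMPTION: a request is a "page" unless its path ends in one of the
# asset extensions below; awstats' own rule may differ.
count_hits_and_pages() {
  awk '
    { hits++
      path = $7                      # request path, e.g. /index.html?x=1
      sub(/\?.*/, "", path)          # ignore any query string
      if (path !~ /\.(css|js|gif|jpe?g|png|ico|bmp|swf)$/) pages++ }
    END { printf "hits=%d pages=%d\n", hits, pages }' "$@"
}

# Example:
#   count_hits_and_pages /var/log/apache2/access.log
```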

Bandwidth


In the combined log format the web server records the size of each response it sends back to the browser. The total of all those response sizes is the "bandwidth" statistic in awstats. This is not necessarily the total bandwidth used by the site; it's just the total that got recorded in your web server's access logs.
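As a rough illustration, this sketch adds up the response-size field of a combined-format access log, which is field 10 when the line is split on spaces. Responses logged with "-" instead of a size are skipped. It assumes each request string has the usual three parts (method, path, protocol), since a mangled request line would shift the fields; the function name is hypothetical.

```shell
# Total the response sizes (field 10 in the combined log format).
# Entries logged with "-" instead of a byte count are skipped.
# printf "%.0f" avoids scientific notation for large totals in some awks.
sum_bandwidth() {
  awk '$10 ~ /^[0-9]+$/ { bytes += $10 } END { printf "%.0f\n", bytes }' "$@"
}

# Example:
#   sum_bandwidth /var/log/apache2/access.log
```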

A note about referer spam


You may notice that the "referer" information in your reports contains links to referring web sites. This is useful for checking out sites that are linking to you, but there's a potential drawback to putting this information on a web page: "referer spam".

There's a school of thought among less-reputable web admins that encourages doing whatever you can to increase your search engine ratings. One of those tactics involves finding a site with a publicly accessible web stats page and then running a script that visits that site a bunch of times with the spammer's own site as the referrer. The theory is that search engines will count the stats page as another site linking to the spammer's site.

In practice it doesn't work that well (most major search engines are wise to the practice and account for it), but that doesn't mean we should encourage the inconsiderate jerks to keep trying it.

The preferred way to keep the stats pages from being used for spamming is to protect the stats directory from unauthorized access. You can do that by password-protecting that part of the site, or by restricting access to the stats directory to localhost and using ssh tunneling to view your stats.

If you want to keep your stats public, you should at least modify your site's "robots.txt" file to tell the major search engines not to index your stats pages. If you don't have a robots.txt file in the document root of your site, this is a good time to create one.

Inside the robots.txt file you just need to add a "Disallow" rule for the web stats directory. If you don't have a robots.txt file already, you can use something like the following:
User-agent: *
Disallow: /webstats/

That would tell any robot that complies with the robots.txt file not to index the "webstats" part of the site. That way your stats site won't show up on major search engines at all, defeating the purpose of any efforts to manipulate your referers report.
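If you'd rather script that change, here's a sketch that creates robots.txt if needed and appends the rule only when it isn't already there. The function name is hypothetical, and it assumes your robots.txt has at most one group (or that the catch-all "User-agent: *" group is the last one), since the new line is simply appended to the end of the file.

```shell
# Add a "Disallow:" line to robots.txt if it isn't there yet.
# ASSUMPTION: the file has no User-agent groups after the catch-all one,
# because the rule is appended to the end of the file.
add_robots_disallow() {
  local robots="$1" path="$2"
  # No robots.txt yet: start one with a catch-all group.
  [ -f "$robots" ] || printf 'User-agent: *\n' > "$robots" || return 1
  # If the file doesn't end with a newline, add one before appending.
  if [ -s "$robots" ] && [ -n "$(tail -c 1 "$robots")" ]; then
    printf '\n' >> "$robots"
  fi
  # Append the rule only if that exact line isn't already present.
  grep -qxF "Disallow: $path" "$robots" ||
    printf 'Disallow: %s\n' "$path" >> "$robots"
}

# Example:
#   add_robots_disallow /home/demo/public_html/example.com/public/robots.txt /webstats/
```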

If you want your web stats to show up on search engines for some reason, then at least tell robots not to index the referer page report:
User-agent: *
Disallow: /webstats/awstats.www.example.com.refererpages.html
