Wednesday, March 27, 2013

Customizing apache web logs

Customizing apache web logs


You can create your own custom formats for apache web logs, to record more information or to make them easier to read. Here's how.




Changing the log format


If you know how to read web logs then you may have an idea of how you would want to write them differently — maybe add a little here, trim a little out there, switch the order around a bit. Luckily, you can do that with the access logs through a couple built-in commands and a handful of log variables.

LogFormat


Apache's "LogFormat" directive is what lets you define your own access log setup. Let's look at how that directive would be used to define the combined log format (CLF):
LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" combined

That first argument, in quotes, is the string that describes the log format. The last argument, "combined", gives a nickname to the format that can be used by CustomLog later on.

That format string contains a bunch of placeholders that describe the data to be included in the log. That first one, for example, is "%h" and represents the IP address of the visitor (the identifier for their host). A bit further on, "%t" represents the time of the request.

Components of the CLF


Let's look at that CLF format string side-by-side with an access log entry in the format:
%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"
123.65.150.10 - - [23/Aug/2010:03:50:59 +0000] "POST /wordpress3/wp-admin/admin-ajax.php HTTP/1.1" 200 2 "http://www.example.com/wordpress3/wp-admin/post-new.php" "Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_6_4; en-US) AppleWebKit/534.3 (KHTML, like Gecko) Chrome/6.0.472.25 Safari/534.3"

Okay, they don't look too pretty together, but there is a correlation between each element in the format string and the components of the log entry below it. Breaking down what the stuff in the format string means:
%h      The remote host
%l The remote logname (usually just "-")
%u The authenticated user (if any)
%t The time of the access
\"%r\" The first line of the request
%>s The final status of the request
%b The size of the server's response, in bytes
\"%{Referer}i\" The referrer URL, taken from the request's headers
\"%{User-Agent}i\" The user agent, taken from the request's headers

So reading along, we see that in place of "%h" is "123.65.150.10" - the remote host. And after that, "%l" becomes "-" for the remote log host, "%u" turns into "-" for the remote user (since this connection didn't require authentication), "%t" is replaced with "[23/Aug/2010:03:50:59 +0000]" because it's the time the request was sent, and so on.

Note that the places in the log format where a quote character (") was used, it was escaped in the format string with a backslash (\"). The escape is there because if it were a quote symbol by itself, LogFormat would think the format string was complete at that point. The backslash tells it to keep reading.

The last two parts of the format, the referrer and the user agent, use a format component that requires an argument — in this case which header should be extracted from the request by %i. The referrer and user agent headers are, appropriately, named "Referer" and "User-Agent", respectively.

Well, mostly appropriately. "Referer" is misspelled. That's the spelling of the header name in the HTTP standards, however, so it is "Referer" for all time when talking about web link referrers. A bit of lexicographical trivia for you there. Enjoy.

Other format components


Apart from what we saw in our breakdown of the combined log format, there are other components you can include in a LogFormat entry. Some commonly-used components are:
%{cookie}C

The contents of the cookie named "cookie" for the request.
%{header}i

The contents of the HTTP header named "header" for the request.
%{VAR}e

The contents of the environment variable "VAR" for the request.
%k

The number of keepalive requests handled by the connection that spawned the logged request. The first time a request is sent the keepalive value will be zero, but each subsequent request that uses the same keepalive connection will increase that number by one. This can be handy for seeing how many requests a keepalive connection handles before it's terminated.

If keepalives aren't enabled this value will always be zero.

If you only see very low numbers for the keepalives value in a log but have a long keepalive timeout set, then it may be worth trying a much shorter timeout for keepalives. That way apache won't be maintaining connections in memory for longer than it needs to.
%T

How long the server took to serve the request, in seconds.
%v

The ServerName of the virtual host the request was sent to. This format code can be handy if you're writing more than one virtual hosts' accesses to the same log file.

For a full list of format components see the apache documentation for LogFormat.

Make your own log format


While the LogFormat entry is useful for interpreting what appears in the logs, it can also be used to create your own formats.

If you want your log to add the length of time it takes to serve requests to its access entries, you might make a LogFormat directive that looks like:
LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\" %T" timed_combined

All we have to do is add a "%T" to the end of the format string, then give it a new nickname — for our example, "timed_combined".

Using the new log format


Now, if you want to tell your virtual host to make an access log using the new format, you can include in the virtual host definition:
CustomLog /var/log/apache2/timed.log timed_combined

To recap: A LogFormat directive takes a format you give it and assigns it a nickname you choose. Then you use CustomLog to tell apache to write the access log using the new format by telling it where to write the log and the nickname of your log format.

Adding more custom logs


You can have more than one CustomLog directive for a virtual host. If you already have a CustomLog using the "combined" format, you don't have to remove it when adding your "timed_combined" log. This can be useful if you want to maintain one log in CLF that a web log analyzer program can read and another log file with just the information you care about when you're skimming the entries.

So if you wanted another log with just the stuff you wanted in it, you might take that "timed_combined" format and remove the things you feel are distractions. If you decided to remove the remote log entry, the user entry, and the user agent entry, you could create that format with:
LogFormat "%h %t \"%r\" %>s %b \"%{Referer}i\" %T" slim

And then add a new CustomLog directive to use the "slim" format:
CustomLog /var/log/apache2/slim.log slim

Precedence


Note that any logs defined in a virtual host will override log directives in the main apache config file. So if the main config file has the CustomLog entry:
CustomLog /var/log/apache2/access.log combined

And the virtual host has another CustomLog entry:
CustomLog /var/log/apache2/example.com.log combined

Then the virtual host will log its accesses to the "example.com.log" file, but not to the "access.log" file. If you wanted accesses to be logged to both files, you would need to include a line for the main access.log file in the virtual host definition, as in:
CustomLog /var/log/apache2/access.log combined
CustomLog /var/log/apache2/example.com.log combined

Rotating new logs


When you create any new logs, you should remember to configure logrotate to rotate them regularly. Otherwise they may grow and grow until they eat all your disk space right up. Any logs in the default apache log directory should get rotated under apache's default rules, but if you put a new log in another directory you may need to add a rule to logrotate.

No comments:

Post a Comment