When we were young, we played with hit counters. Now we are grown, we need something more. Log-based web analytics programs are hit counters for adults.
Hit counters aren’t the only programs that count hits. In fact, many programs record file access requests, and perhaps the most widely used of them are the web servers themselves. Since it’s the job of the web server to serve up those files, it only makes sense that web servers like Apache and Microsoft’s Internet Information Services (IIS) would keep track of that activity. These logs routinely include details of browser visits. An example is reproduced below, with identifying information altered to protect the source:
pool-71-0-107-3.locfl.dsl-w.isp.net - - [01/Sep/2006:07:17:15 -0400] "GET /cgi-bin/page.cgi?g=New%2F;d=1 HTTP/1.1" 200 7409 "-" "Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.8.0.6) Gecko/20060728 Firefox/1.5.0.6"
pool-71-0-107-3.locfl.dsl-w.isp.net - - [01/Sep/2006:07:17:20 -0400] "GET /cgi-bin/user.cgi?d=1 HTTP/1.1" 200 8181 "http://www.sample.com/cgi-bin/page.cgi?g=New%2F;d=1" "Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.8.0.6) Gecko/20060728 Firefox/1.5.0.6"
pool-71-0-107-3.locfl.dsl-w.isp.net - - [01/Sep/2006:07:17:23 -0400] "POST /cgi-bin/user.cgi HTTP/1.1" 200 7091 "http://www.sample.com/cgi-bin/user.cgi?d=1" "Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.8.0.6) Gecko/20060728 Firefox/1.5.0.6"
These three lines were taken from the logs of an Apache server using the Combined Log Format, recorded on September 1, 2006, at 7:17 AM EDT. They show the hostname or IP address of the connecting party, the date and time, the HTTP request processed (GET or POST) along with the resource requested, the result of the request (server response code 200 means the request succeeded), the number of bytes sent in response, the referrer URL (the page the request came from), and the user-agent string, which identifies the browser making the request and the PC environment running it. The two hyphens after the hostname are placeholders for the remote identity and the authenticated user name, neither of which was recorded here.
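To make the field layout concrete, here is a minimal Python sketch that splits one of the sample lines into named parts. The regular expression, the `parse_line` function and the field names are my own illustrative choices, not anything Apache provides, and the pattern assumes well-formed Combined Log Format lines with no escaped quotes inside the quoted fields.

```python
import re
from datetime import datetime

# Illustrative pattern for one Combined Log Format line: host, identity, user,
# timestamp, request line, status, bytes sent, referrer, user agent.
COMBINED = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    m = COMBINED.match(line)
    if m is None:
        return None
    entry = m.groupdict()
    # %b expects English month abbreviations, which is the default locale behavior.
    entry["time"] = datetime.strptime(entry["time"], "%d/%b/%Y:%H:%M:%S %z")
    entry["status"] = int(entry["status"])
    # Apache writes "-" when no bytes were sent; treat that as zero.
    entry["size"] = 0 if entry["size"] == "-" else int(entry["size"])
    return entry

sample = (
    'pool-71-0-107-3.locfl.dsl-w.isp.net - - [01/Sep/2006:07:17:15 -0400] '
    '"GET /cgi-bin/page.cgi?g=New%2F;d=1 HTTP/1.1" 200 7409 "-" '
    '"Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.8.0.6) '
    'Gecko/20060728 Firefox/1.5.0.6"'
)

entry = parse_line(sample)
print(entry["host"], entry["method"], entry["path"], entry["status"], entry["size"])
```

Returning `None` for lines that do not match lets a caller skip malformed entries instead of stopping partway through a large log.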
Reading the log entries gives you a good idea of the kind of activity your site receives. From them you can learn how many visits your site has had and which pages are the most popular. By making note of the connecting host and the timestamp, you could follow a visitor through your site and see what interests him or her. All it takes is a bit of digging through the logs. Most people, however, would probably find it time-consuming and dull.
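The digging described here can be sketched in a few lines of Python. This is only an illustration run against the three sample entries: the pattern and the names `page_hits` and `trails` are hypothetical, the query string is dropped so each script counts as one page, and a visitor is assumed to be identified by hostname alone, which is a rough approximation.

```python
import re
from collections import Counter, defaultdict
from datetime import datetime

# Illustrative pattern: host, timestamp, method, URL, status, bytes sent.
LINE = re.compile(r'(\S+) \S+ \S+ \[([^\]]+)\] "(\S+) (\S+) [^"]*" (\d{3}) (\S+)')

HOST = "pool-71-0-107-3.locfl.dsl-w.isp.net"
AGENT = ('"Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.8.0.6) '
         'Gecko/20060728 Firefox/1.5.0.6"')
log_lines = [
    f'{HOST} - - [01/Sep/2006:07:17:15 -0400] '
    f'"GET /cgi-bin/page.cgi?g=New%2F;d=1 HTTP/1.1" 200 7409 "-" {AGENT}',
    f'{HOST} - - [01/Sep/2006:07:17:20 -0400] '
    f'"GET /cgi-bin/user.cgi?d=1 HTTP/1.1" 200 8181 '
    f'"http://www.sample.com/cgi-bin/page.cgi?g=New%2F;d=1" {AGENT}',
    f'{HOST} - - [01/Sep/2006:07:17:23 -0400] '
    f'"POST /cgi-bin/user.cgi HTTP/1.1" 200 7091 '
    f'"http://www.sample.com/cgi-bin/user.cgi?d=1" {AGENT}',
]

page_hits = Counter()        # requests per page
trails = defaultdict(list)   # requests per connecting host

for line in log_lines:
    m = LINE.match(line)
    if m is None:
        continue  # skip anything that is not a well-formed entry
    host, stamp, method, url, status, _size = m.groups()
    page = url.split("?", 1)[0]  # drop the query string
    page_hits[page] += 1
    when = datetime.strptime(stamp, "%d/%b/%Y:%H:%M:%S %z")
    trails[host].append((when, method, page))

# Put each visitor's requests in time order to reconstruct the path taken.
for visits in trails.values():
    visits.sort()

print(page_hits.most_common(1))  # -> [('/cgi-bin/user.cgi', 2)]
for when, method, page in trails[HOST]:
    print(when.time(), method, page)
```

On a real log you would read the lines from the server's access log file rather than a list, but the counting and grouping steps stay the same.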
Sure beats log reading
It is this awkward presentation that led developers to produce web analytics software in the first place. Log-based web analytics software imports or reads the web server log file and produces a series of graphs and tables. These user-friendlier graphs and tables take much of the drudgery out of “counting hits.” Popular examples of log-based web analytics programs include Webalizer and AWStats, best known because they are free products often used by web hosts as a benefit for clients.
It might be fair to classify Webalizer and AWStats as log readers, rather than analytics programs. Making web server logs readable is what they specialize in, after all. Webalizer starts with bar graphs permitting a comparison of site visits over the past year, along with a chart offering a summary of the major statistics over that same period. Clicking on a month gives you details on that month’s visits, all collected from the server’s logs. When you have nothing else, it’s handy, which is why (along with the price) so many web hosts provide it as a free benefit of hosting with them.
But there is more to analytics than simply restating the logs in a friendly manner. We’ll look at what the commercial log-based web analytics programs can do in a future post.
Michael Pedone is the President / CEO of eTrafficJams.com, a search engine optimization and website marketing company (http://www.etrafficjams.com) located in Clearwater, Florida, that specializes in getting targeted, eager-to-buy traffic to your site. You can catch him blogging at http://www.etrafficjams.com/blog/.