Abstract
Practice Perl scripting by analyzing a web server log file.

Details
Your company's web server is starting to see some action. However, when skimming the logs of your web server, you start to see activity that doesn't appear to be generated by a human. Many of the requests come from clients with search-engine names, so you wonder how much of your web server traffic results just from those robots, or 'bots (aka spiders, crawlers, etc.).

Your job is to write a single Perl script that analyzes the January 2005 web requests found in the Apache web log file /home/brian/cse271/january-access.log. In particular, your script must output the report described below.
Unlike project #6, your script is NOT expected to call other UNIX utilities, since Perl incorporates many equivalent functions of its own. Your script must be self-contained -- it may be as long as necessary, but it may not run any other custom script in another file. It should take one or more log files as arguments (no hardcoded filenames!), as in the skeleton below.
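For what it's worth, Perl's diamond operator handles the multiple-arguments requirement almost for free. Here is one minimal skeleton (the per-line analysis is elided):

    #!/usr/bin/perl
    use strict;
    use warnings;

    # Complain if no log files were named on the command line.
    die "usage: $0 logfile [logfile ...]\n" unless @ARGV;

    # The diamond operator <> reads every file named in @ARGV in turn,
    # so no filename is ever hardcoded.
    while (my $line = <>) {
        chomp $line;
        # ... per-line analysis goes here ...
    }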
Your report must contain:
- the absolute counts and relative percentages of regular browser requests vs. robot requests
- the identities of the top 10 most active robots (and their request counts)
- the IP addresses of the top 10 most active clients (robot or otherwise, and their counts)
- the URLs of the top 10 most common referring web pages (and their counts)
- the URLs of the top 10 most requested web pages (and their counts), regardless of who requested them
Your script should not need to read through the data file multiple times; a single pass is enough (see the sketch below).
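To make the single-pass requirement concrete, here is one possible sketch: tally everything into hashes during a single read, then sort each hash by count when printing. It assumes january-access.log is in Apache's "combined" format, classifies robots with a crude User-Agent pattern (an illustration, not a required rule), and treats the full User-Agent string as a robot's identity; check all of these assumptions against the real log before reusing anything.

    #!/usr/bin/perl
    use strict;
    use warnings;

    # One-pass tallies: counts keyed by client IP, robot identity,
    # referring page, and requested URL.
    my (%by_ip, %by_robot, %by_referer, %by_url);
    my ($browser_count, $robot_count) = (0, 0);

    # Crude robot heuristic -- an illustration only, not the official rule.
    my $robot_pattern = qr/bot|crawler|spider|slurp/i;

    while (my $line = <>) {
        # Assumes Apache's "combined" format:
        # ip ident user [date] "request" status bytes "referer" "user-agent"
        my ($ip, $request, $referer, $agent) = $line =~
            m/^(\S+) \S+ \S+ \[[^\]]+\] "([^"]*)" \S+ \S+ "([^"]*)" "([^"]*)"/
            or next;                       # skip lines that don't parse

        my $url = (split ' ', $request)[1] // '';

        $by_ip{$ip}++;
        $by_url{$url}++;
        $by_referer{$referer}++ unless $referer eq '-';

        if ($agent =~ $robot_pattern) {
            $robot_count++;
            $by_robot{$agent}++;           # User-Agent string as identity
        } else {
            $browser_count++;
        }
    }

    my $total = $browser_count + $robot_count;
    printf "Browser requests: %7d (%5.1f%%)\n",
           $browser_count, $total ? 100 * $browser_count / $total : 0;
    printf "Robot requests:   %7d (%5.1f%%)\n",
           $robot_count,   $total ? 100 * $robot_count   / $total : 0;

    # Print the ten largest entries of one tally hash.
    sub report_top10 {
        my ($title, $tally) = @_;
        my @top = sort { $tally->{$b} <=> $tally->{$a} } keys %$tally;
        splice @top, 10 if @top > 10;      # keep at most ten
        print "\n$title\n";
        printf "  %6d  %s\n", $tally->{$_}, $_ for @top;
    }

    report_top10('Top 10 robots:',          \%by_robot);
    report_top10('Top 10 client IPs:',      \%by_ip);
    report_top10('Top 10 referring pages:', \%by_referer);
    report_top10('Top 10 requested pages:', \%by_url);

Because each tally is a hash keyed by the quantity being counted, memory stays proportional to the number of distinct values, and the file is read exactly once.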
Submission Requirements
- As usual, the script must reside in the cse271.131/hw7 subdirectory. Name your script hw7.pl.
- Your name must be in the script's comment section (along with an appropriate description, etc.).
- Do a touch DONE when the program is ready to be collected.