CSE 271: Programming Assignment 6

Assigned: 4 March
Due: 18 March

Abstract

Practice shell scripting by analyzing a web server log file.

Details

Your company's web server is starting to see some action. However, when skimming your web server's logs, you start to see activity that doesn't appear to be generated by humans. Many of the requests come from clients with search engine names, so you wonder how much of your web server traffic comes just from those robots, or 'bots (also known as spiders, crawlers, etc.).
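
For a first rough feel for how much of this traffic there is, you could grep the log for a few well-known crawler names in the User-Agent field. The names below are only examples of crawlers that were common around 2005, not an exhaustive list; skim the log yourself to see which ones actually appear:

    # Count request lines whose User-Agent mentions a well-known crawler.
    # The bot names here are illustrative, not a complete list.
    grep -i -E 'googlebot|msnbot|slurp' /home/brian/cse271/january-access.log | wc -l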

Your job is to write a single shell script that will analyze the January 2005 web requests found in the Apache web log file /home/brian/cse271/january-access.log. In particular, your script needs to output a report containing

Your script is expected to call other UNIX utilities, including, but not limited to, sort, grep, sed, wc, cut, and uniq. Your script must be self-contained -- it may be as long as necessary, but it may not run any other custom script in another file. It should take one or more log files as arguments (no hardcoded filenames!). A minimal skeleton is sketched after this paragraph.
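
For example, a minimal skeleton for such a script (the wording of the report line and the variable names are only placeholders) might start like this:

    #!/bin/sh
    # p6.sh -- summarize bot vs. human traffic in Apache access logs.
    # Usage: p6.sh logfile [logfile ...]

    # Insist on at least one log file on the command line.
    if [ $# -lt 1 ]; then
        echo "usage: $0 logfile [logfile ...]" >&2
        exit 1
    fi

    # Total number of requests across all of the log files given.
    total=`cat "$@" | wc -l`
    echo "Total requests: $total"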

Your script will likely need to read through the datafile multiple times. Faster (and often more elegant) scripts will minimize the number of times the datafile must be read.
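
One common trick, sketched below, is to extract just the fields you need in a single pass, save them to a temporary file, and compute all of your counts from that smaller extract. The sketch assumes the client host is the first space-separated field, as in the standard Apache log formats; verify that against the real file before relying on it.

    # Read the log file(s) once, keeping only the client field,
    # then derive several counts from the extract.
    tmp=/tmp/p6.$$                 # temporary working file
    trap 'rm -f "$tmp"' 0          # remove it when the script exits

    cut -d' ' -f1 "$@" > "$tmp"    # single pass over the original log(s)

    total=`wc -l < "$tmp"`                 # total requests
    clients=`sort "$tmp" | uniq | wc -l`   # distinct client hosts

    echo "Total requests:   $total"
    echo "Distinct clients: $clients"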

Some background information and hints:

Submission Requirements

  1. As usual, the script must reside in the cse271.131/p6 subdirectory. Name your script p6.sh.
  2. Your name must be in the comment section (along with appropriate description, etc.).
  3. Do a touch DONE when the program is ready to be collected.

Last revised: 3 March 2013.