CSE271 Lab 6: Debugging and Profiling Tools

Announcements

Reminder: Homework #3 (coding style) is due today
Programming Assignment #5 (debugging) is assigned tomorrow.

Today: More Tools for Debugging and Optimization

We saw many debugging tools over the past week

Trusty printf()
Debuggers such as gdb + ddd
Dynamic memory allocation checkers such as ccmalloc, Electric Fence, Purify and Valgrind

There are a few more to add to your arsenal

Static code checkers
System call tracers

We will also look at a code profiler to help with optimization.

Static Code Checkers

lint -- a static code checker

The lint utility attempts to detect features of the named C program files that are likely to be bugs, to be non-portable, or to be wasteful. It also performs stricter type checking than the C compiler. (Lint goes further than -Wall, but also complains about non-errors.)

Lint finds unreachable statements, loops not entered at the top, variables declared and not used, logical expressions with constant values, calls to functions that return values in some places and not in others, functions called with varying numbers of arguments, function calls that pass arguments of a type other than the type the function expects to receive, functions whose values are not used, and calls to functions not returning values that use the non-existent return value of the function.

Lint is old (first appeared in 1979). There are commercial versions, but the modern open-source replacement is Splint. Lint is installed on the Suns. Similar packages are available for other languages (e.g., jlint for Java).

Splint -- Secure Programming Lint

Splint is a tool for statically checking C programs for security vulnerabilities and coding mistakes. With minimal effort, Splint can be used as a better lint. If additional effort is invested adding annotations to programs, Splint can perform stronger checking than can be done by any standard lint. Versions of Splint are available for Solaris, Linux, and Windows.
I've installed splint for the old suns in ~brian/cse271/splint-3.1.1-sparc/
and for the new suns in ~brian/cse271/splint-3.1.1-x86/

Using splint
Let's use splint on an example. Recall the troublesome leak.c program that we used to demonstrate dynamic memory debugging. Copy it to your directory. Run splint over it (for new suns):
% export LARCH_PATH=.:~brian/cse271/splint-3.1.2/lib
% export LCLIMPORTDIR=~brian/cse271/splint-3.1.2/imports
% ~brian/cse271/splint-3.1.2/bin/splint leak.c
or for old suns:
% export LARCH_PATH=.:~brian/cse271/splint-3.1.1/lib
% export LCLIMPORTDIR=~brian/cse271/splint-3.1.1/imports
% ~brian/cse271/splint-3.1.1/bin/splint leak.c
You will see a number of warnings about real problems in the code (memory leaks, unreachable code, problems with free(), dereferencing of possibly bad pointers, etc.). You will also see some warnings about things that might be changed to improve security (adding the static keyword). In this example, splint is able to find most (if not all) of the errors; in more complex programs the dynamic memory checking packages are needed.
System call tracers

Sometimes it is useful to know what system calls a program is making.

System call tracers like strace, truss and dtrace can do so, without re-compiling code!

Strace

Strace is a system call tracer, i.e., a debugging tool that prints out a trace of all the system calls made by another process or program. The program to be traced need not be recompiled for this, so you can use it on binaries for which you don't have source.
System calls and signals are events that happen at the user/kernel interface. A close examination of this boundary is useful for bug isolation, sanity checking and attempting to capture race conditions.
Strace is available for a wide variety of platforms.
See this tutorial for a nice intro to using strace for debugging and this article for using strace to determine where a missing library belonged.

truss
The truss(1) utility in Solaris is roughly equivalent to strace --- it executes the specified command and produces a trace of the system calls it performs, the signals it receives, and the machine faults it incurs. Each line of the trace output reports either the fault or signal name or the system call name with its arguments and return value(s). System call arguments are displayed symbolically when possible using defines from relevant system headers; for any path name pointer argument, the pointed-to string is displayed.
dtrace

dtrace(1M) is a dynamic tracing package developed by Sun/Oracle for Solaris and Linux. It provides a powerful infrastructure to permit administrators, developers, and service personnel to concisely answer arbitrary questions about the behavior of the operating system and user programs.
You might be interested in a case study using dtrace and truss.
Use of dtrace requires root access in a default Solaris installation.

Using a system call tracer
Let's see truss in action. At a system prompt, type:
% truss pwd
This will generate a list of all the system calls made by the pwd (print working directory) command. Each one of these should be documented in the system man pages, so you can get a very good feel for what is happening, but it is not important right now to understand the calls. pwd is a very simple command, and only generates about 30 calls. For a somewhat more complex one, try running truss on df, or better yet, on emacs:
% truss -f -o truss.out emacs
Rather than printing the calls to the screen, the -o option tells truss to write its output to a file so that emacs can have the screen to itself, and the -f option makes truss follow any child processes as well. If you just start and then stop emacs within a shell, you'll get more than 1200 system calls. Looking through traces like this, you'll find which libraries a program needs and where it looks for them, as well as where emacs stores temporary files.
Code Optimization

Comments from a discussion on Slashdot.

"The first thing to optimize is the algorithm. Use a O(n^2) algorithm that does the same job as an O(e^n) algorithm if you can. Algorithmical optimization makes the most difference."
"Write code that is easy to understand and modify, then optimize it, but only after you have profiled it to find out where optimization will actually matter."
"An important lesson that I wish I had learned when I was younger ;) It is crazy to start optimizing before you know where your bottlenecks are. Don't guess - run a profiler. It's not hard, and you'll likely get some big surprises."

What do the experts say?

"Premature optimization is the root of all evil." -- C.A.R. Hoare
"We should forget about small efficiencies, say about 97% of the time: Premature optimization is the root of all evil." -- Donald Knuth (quoting Hoare)
Rules of Optimization:
Rule 1: Don't do it.
Rule 2 (for experts only): Don't do it yet.
-- M.A. Jackson
"More computing sins are committed in the name of efficiency (without necessarily achieving it) than for any other single reason - including blind stupidity." -- W.A. Wulf

Optimizing Code

Yes, I agree with the experts.

However, there are times when you do need to optimize your code.

Code profiling can also sometimes expose bugs when you realize where your CPU time is being spent.
Profiling can also demonstrate why a poor algorithm is indeed poor.

Code Profilers

A code profiler will tell you

how often a function is called.
which functions used what percentage of your time.

It is then up to you to figure out how to improve upon that behavior.

But now, you know where to focus your efforts.

Examples include gprof, prof, valgrind, etc.

gprof
gprof is the GNU profiler.
To use:
First, we need to compile for profiling, e.g.:
gcc -g -pg -o myprog myprog.c
Then, run your program to completion (generates a gmon.out file).
Finally, run gprof to analyze your code and the log file to produce a report.
gprof myprog
Using a profiler
Now let's try it out ourselves.
% gcc -pg -g -o leak leak.c
% leak
% gprof leak
This very simple program has very little to report via the profiler. It is able to show how often each routine is called, but since the program executed so quickly, it could not show any significant cpu usage for any part of the program. In longer running, more complex programs, you will see which routines used what fraction of your cpu time (and how often they were executed), allowing you to focus your efforts on improving those routines needing the most help.
Rules for writing fast code (aka optimization)
More from the Slashdot discussion...

Avoid doing what you don't have to do. Sounds obvious but I rarely see code that does the absolute minimum it needs to. Most of the code I've seen to date seems to precalculate too much stuff, read too much data from external storage, redraw too much stuff on screen etc...
Do it later. There are thousands of situations where you can postpone the actual computations.

Most string class implementations already make good use of this rule by copying their buffers only when the "copied" buffer changes. Operating systems also use the "copy-on-write" rule when copying memory pages that can otherwise be shared (e.g., when fork()ing a process).

Apply minimum algorithmic complexity. If you can use a hashmap instead of a treemap, use the hash version since it's O(1) vs O(log n). Use quicksort for just about any kind of sorting you need to do.
Cache your data. There are some enormous performance gains that can be realized with smart caching strategies.
That's it! If you are applying rules one to four you can have fast AND readable code.

C Game Programming
Battleship is a popular two-player game for kids, and has a history going back to the 1930s.
Typically, each player places five "ships" of known size and distribution on a 10x10 grid. Those ships occupy grid positions that can be blindly bombed by your opponent. The goal is to sink all of your opponent's ships to win the game.
You can play a modern version written in flash, or a more complex javascript version online.
Your goal for this task is to write a C program to print the screen containing the positions of a fleet of five ships of varying length (one ship of each type):

Aircraft Carrier -- length 5 units
Battleship -- length 4 units
Frigate -- length 3 units
Submarine -- length 3 units
Minesweeper -- length 2 units
So, for example, a fleet might occupy grid positions as follows:
A (A,1) (B,1) (C,1) (D,1) (E,1)
B (B,2) (C,2) (D,2) (E,2) 
F (C,3) (D,3) (E,3)
S (D,4) (E,4) (F,4)
M (E,6) (E,7)
Note that in all cases, coordinates are row (letter) followed by column (number), and a ship can only occupy positions in a single row or column. You can store the representation of ship positions in any form, but your printing code should handle other ship configurations easily. So, when complete, your program should print out a grid corresponding to this configuration as:
  1 2 3 4 5 6 7 8 9 0
A A . . . . . . . . .
B A B . . . . . . . .
C A B F . . . . . . .
D A B F S . . . . . .
E A B F S . M M . . .
F . . . S . . . . . .
G . . . . . . . . . .
H . . . . . . . . . .
I . . . . . . . . . .
J . . . . . . . . . .
You will use your completed code in a future project.

Last revised: 20 February 2013, Prof. Davison.