CSE 271 Lab 5: Debugging C Programs

1. Overview
There are two typical mechanisms for general purpose debugging. The first is the use of debugging statements within the code of your program. This is a technique that can be used in any language, in any environment. The other is through the use of a debugging package, which may be specific to the language or compiler.
In this lab we will explore the use of both mechanisms. In a later class we will consider the more difficult problem of memory leaks.
2. Debugging Statements
Consider the following program:
#include <stdlib.h>
#include <stdio.h>

int sum(int x, int y, int z) {
  char c = 4;
  int *a;

  return (c + x + y + z + *a) / 3;
}

int main(int argc, char *argv[]) {
  int i, j, k;
  int result;

  if (argc == 1) {
     printf("Please specify three numbers as parameters.\n");
     exit(1);
  }

  i = atoi(argv[1]);
  j = atoi(argv[2]);
  k = atoi(argv[3]);

  result = sum(i,j,44) + sum(j,k,55) + sum(i,k,66);

  printf("%d\n", result);

  return 0;
}
This program has some problems, and it would not take you long to figure them out. However, we're going to use debugging statements to attempt to help determine where the problems are.
So, copy and put this into a file called test1.c. Also create a Makefile and put something like the following into it:
CC = gcc
CPARAMS = -Wall

test1: test1.c
        $(CC) $(CPARAMS) -o test1 test1.c
If you make and run this program on an old Sun (e.g., SPARC-based machines: europa, metis, phoebe, pluto), you'll likely get the following result:
tritan 2% ./test1 1 2 3
Bus error
tritan 2%
Exercise 1. Modify test1.c to include debugging statements in main() such as
fprintf(stderr, "Number of parameters = %d\n", argc);
and in sum() such as
fprintf(stderr, "x=%d\n", x);
fprintf(stderr, "y=%d\n", y);
fprintf(stderr, "z=%d\n", z);
fprintf(stderr, "*a=%d\n", *a);
Now re-make and run the revised program. Assuming you put the debugging statements in the right places, you'll most likely see output from the statement from main() and some, but not all of the ones from sum().
Now add an additional statement immediately after the last one that successfully printed:
fprintf(stderr, "a=%ld\n", (long)a);
This should help to explain why the last fprintf() failed to print. Now try running the program with only one parameter, and see how it fails. Add some more debugging statements to your main() to find exactly which line is causing the problem.
TIP: When using printf/fprintf for debugging, always end the format string with a \n. When stdout is connected to a terminal it is normally line-buffered, so the newline causes your debugging message to be printed to the screen right away, rather than sitting in a buffer waiting for additional characters to be sent.
Think about it: why might it be useful to use fprintf() as we did here, rather than the normal printf()?
OK, so that should help make sure that you can use printf() to generate debugging output to identify the problem line when your program crashes. But what about when it is not crashing? Lots of debugging output can be annoying. In a recent lecture, we looked at preprocessor directives that allow you to turn on and off debugging code. Review that material if needed.
Exercise 2. Revise test1.c to use preprocessor directives to easily turn on or off the debugging output on the basis of whether the DEBUG symbol is defined.
Add an additional entry to the bottom of your Makefile to generate a new executable, called test1d that uses the same source file, but includes the -D DEBUG compiler option. Note that if you just type make at this point, the new program will not get built. You can either type make test1d, or add a new entry at the beginning of your Makefile such as:
all: test1 test1d
This creates a new target, called all, which depends on test1 and test1d, so building all requires that both be up-to-date. Note that when run with no arguments, make builds just the first target in the Makefile, which is why make did not build test1d earlier.
You should now have two binaries---one that was compiled with debugging enabled, and one that was compiled with it disabled.
While we will not cover it here, it is possible to use the compiler option -D to also set a value for a symbol, which is useful for different levels of debugging. It might also be prudent to note that you could also control debugging code interactively with a regular variable (which might be controlled by an option to your program), rather than using preprocessor directives during compilation.
3. Using a debugger
In the work above, we never actually fixed the problems with test1.c. That was intentional, so that we could continue to use it in this lab.
Debugging statements are simple, and work in all environments. But debugging that way is often quite tedious, as you must edit and re-compile many times until the right statements are in place to capture the effect that you need to understand. A debugger provides an interactive way to watch the execution of your program---you can stop the program at any point, examine the contents of variables, look at the calling stack, check function parameters, and more.
There are a number of common debuggers (adb, dbx, gdb, for example), but generally you want to use the one that matches the compiler you used. Since we used gcc, we want to use the GNU Debugger (gdb). In general, to use any debugger, we need to generate an executable program that contains debugging symbols (essentially information needed to map from the code that is running to the source code with which you started, so that the debugger can tell you variable names, function names, and line numbers from the original source). This requires re-compilation with the parameter -g in both compilation and linking. Once you've modified your Makefile and re-compiled test1 with debugging symbols, we can now run test1 within the debugger.
Exercise 3. Run your program within the GNU debugger with the command
tritan 2% gdb test1
GNU gdb 6.6
Copyright (C) 2006 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "sparc-sun-solaris2.10".
(gdb)
At this point you haven't actually started your program yet, but can set variables to watch, breakpoints in your code at which to stop running, etc.
To actually start your program, type run along with any needed parameters, as in:
(gdb) run 1 2 3
Starting program: /home/brian/test1 1 2 3
Number of parameters = 4
x=1
y=2
z=44

Program received signal SIGSEGV, Segmentation fault.
0x106ec in sum (x=1, y=2, z=44) at test1.c:10
10        fprintf(stderr, "*a=%d\n", *a);
(gdb)
Note that it shows us the usual output, plus the error and, interestingly, the function in which the error occurred (sum(), with its particular parameter values) and the specific line at which it failed. (Note that the specific information for your program may vary slightly.)
This might be enough information for you to know where the problem is, fix it, and run it again. In that case, type quit to exit the debugger (and yes, the debugger will need to kill your running program to do so).
If not, you can ask the debugger for more information at this point. One useful thing to ask for is the contents of a particular variable (here, a). To do that, type
(gdb) print a
$1 = (int *) 0x0
(gdb)
This tells us that in this case the int * named a contains the value 0 (i.e., it is a null pointer). Note that because a is never initialized, its value will depend on the OS, the compiler, the architecture, etc., which is why this program does not necessarily fail on the new (x86) Suns. We could just as easily ask about other variables that are accessible in this context (where the error occurred), such as x, y, or z.
We might also want to know how we got here---that is, what function called this one, and so on. The command for that is backtrace. Try that command now.
Notice that in this simple example, there is only one caller to see on the stack: main(), which called sum(). In a real program there may be many such contexts, and backtrace, or more simply bt, will show the sequence of calls (and the values of their parameters).
Given the limited amount of time available in this class, we haven't really even scratched the surface of what is possible to do with gdb. It will be up to you to learn more about gdb on your own. I highly recommend Norman Matloff's Guide to Faster, Less Frustrating Debugging as it has an intro to using gdb. Learn how to set breakpoints, change the values of variables of a running program, etc.
As mentioned earlier, gdb is just one debugger that may be available. In practice, it is a good one, and works even when you only have an ssh or telnet connection to the system. However, the interface is a little cryptic. [It helps if you know and remember that you can type help at the gdb prompt and get information on how to use it.] Fortunately, others have extended gdb by incorporating a visual interface. One well-regarded interface is called ddd (data display debugger), which can be the front-end for a variety of back-end debuggers (for perl, bash, make, python, java, etc). ddd is becoming fairly common, and is often found on Linux installations, and is also installed here on the Suns. (Note that if it doesn't automatically find gdb, look at the ddd man page to find how to specify where the debugger is as an argument to ddd.) Again, I highly recommend Norm Matloff's Debugging Tutorial---in particular, the PDF slide show on using ddd.
Regardless of what debugger you use, there is one more aspect that you need to know.
4. Debugging with core dumps
When a program fails as a result of a segmentation fault, or a bus error, or from calling abort(), the OS will attempt to store a copy of the running process image (e.g., exactly what was in memory at the time) to disk. This is called dumping core. Core files will be however large the program was in memory (and are thus, sometimes, quite large). As a result, the generation of core files is often disabled by default (as is the case with your CSE accounts).
To see how large a core file you are permitted to generate, type ulimit -a. A limit of 0 will prevent the generation of core files entirely. I suggest that you set it (typically in a file called .profile) to unlimited (e.g., by commenting out the limit that exists there). If your core file size is unlimited, then you can lower it from the command line if you want:
ceres:~% ulimit -c 500000
ceres:~% ulimit -a
core file size        (blocks, -c) 500000
data seg size         (kbytes, -d) unlimited
file size             (blocks, -f) unlimited
open files                    (-n) 256
pipe size          (512 bytes, -p) 10
stack size            (kbytes, -s) 10240
cpu time             (seconds, -t) unlimited
max user processes            (-u) 256
virtual memory        (kbytes, -v) unlimited
ceres:~%
Since your .profile likely sets a limit, you will be unable to set a larger core file size for your current processes (you can always shrink your limit, but only root can raise it). However, after you change the limit in .profile, any new shell (e.g., ssh to a new machine or log in again on your current one) will reflect the change. Once you allow non-zero core files, any programs that you run from this shell that exit abnormally will likely generate a core file (in the directory in which they were run).
Why is this important? Well, gdb is capable of reading in a core file (that was generated by a program with debugging symbols) and telling you where the problem is! In exactly the same way that we saw the line and the calling stack, etc., gdb can do the same for a core file. Typically, you'll want to run it as:
europa 2% ./test1 1
Number of parameters = 2
Segmentation fault (core dumped)
europa 2% ls -l core
-rw-------   1 brian    faculty    80364 Feb 23 00:39 core
europa 2% gdb -c core test1
GNU gdb 6.6
Copyright (C) 2006 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "sparc-sun-solaris2.10"...
Reading symbols from /lib/libc.so.1...done.
Loaded symbols for /lib/libc.so.1
Reading symbols from /lib/ld.so.1...done.
Loaded symbols for /lib/ld.so.1
Core was generated by `./test1 1'.
Program terminated with signal 11, Segmentation fault.
#0  0xff241d24 in atoi () from /lib/libc.so.1
(gdb) bt
#0  0xff241d24 in atoi () from /lib/libc.so.1
#1  0x000107c0 in main (argc=2, argv=0xffbff7cc) at test1.c:21
(gdb)
This is quite helpful, as it means you can capture the state of the program in which the error occurs, even when you are not running under a debugger (which is most of the time). This is the mode in which I use a debugger most: a program I have been using crashes and dumps core, so I start a debugger and examine the core to find out where, and hopefully why, it crashed.
I hereby repeat my strong recommendation to learn to debug faster by working through the tutorials mentioned above for gdb and ddd.

Last revised: 12 February 2013, Prof. Davison.