'C' programming on Linux

Getting started

Some Linux distributions include 'C' compilers, linkers and related tools as part of the base system. On Ubuntu, install the meta-package build-essential. (To do this on a non-networked virtual machine, e.g. Gutsy Host 1, first complete the mounting and aptitude package management tutorial: you will need to loopback mount the Ubuntu packages DVD image and add this software source to /etc/apt/sources.list before synaptic, aptitude or apt-get can install build-essential for you.)

Create a directory for your 'C' programs and, in it, create the file hello.c containing the customary "hello world" program:

#include <stdio.h>
int main(void) {
  printf("hello Linux 'C' world!\n");
  return 0;
}

To compile it as hello, use the shell command:

gcc hello.c -o hello

To run it, use the command:

./hello

gcc stands for GNU Compiler Collection. It can compile 'C', 'C++', Fortran and Ada programs, among others (releases before GCC 7 could also compile Java). It has been ported to very many combinations of CPU hardware and operating system, and can compile programs for each of them.

Some common gcc options

gcc by default handles the following stages of compilation: preprocessing, compilation into assembly language, assembly into object code, and linking.

gcc has too many options for all of them to be single letters, so options cannot be grouped: gcc -dr is not the same as gcc -d -r, as it would be with many other Unix tools. Not all compilation stages have to be done at once: the -S option stops after producing assembly language files, and the -c option stops after producing unlinked object files. Files for intermediate stages are deleted by default. If the -o option naming the output file isn't given, an executable is written to the file a.out, an object file takes the name of its source file with the suffix replaced by .o, and an assembler file likewise with the suffix .s.

The -O option controls optimisation. A number or letter after the 'O' selects the kind of optimisation, e.g. -Os optimises for size, while -O0 (no optimisation) to -O3 trade longer compilation for faster execution.

The -llibrary option (library is the name of the library) links against a particular library, in addition to the system and compiler libraries.

The -Idirectory option adds directory to the list of directories searched for header files.

Using the gdb debugger (GNU Debugger)

Before you can use gdb, your program must compile without errors. gdb is used to help investigate runtime errors. To enable this, compile the program to be debugged with the gcc -g option, which includes debugging information in the executable.

The things you are most likely to want from a debugger are to run the program until it reaches specified breakpoints, to examine the state of data within the program at those points, and to step into or over the functions executed after a breakpoint.

gdb program

loads program (which should have been compiled with -g) into the gdb environment. The program starts executing when you give the run command.

Here are some of the most frequently needed commands inside gdb, taken from the gdb(1) man page:

       break [file:]function
               Set a breakpoint at function (in file).

       run [arglist]
              Start your program (with arglist, if specified).

       bt     Backtrace: display the program stack.

       print expr
               Display the value of an expression.

       c      Continue running your program (after stopping,
              e.g. at a  break-point).

       next   Execute  next program line (after stopping);
              step over any function calls in the line.

       step   Execute next program line (after stopping); step
              into any  function calls in the line.

       help [name]
              Show  information about GDB command name, or
              general information about using GDB.

       quit   Exit from GDB.

Example gdb session

#include <stdio.h>
#include <string.h>

int main(void){
  char word[]="hello";
  int i;
  i=strlen(word);
  printf("length of string: hello is %d\n",i);
}

The above program (strlen.c) was compiled using the command:

gcc -g strlen.c -o strlen

The following debug session was recorded:

[rich@copsewood c]$ gdb strlen
(version and license details cut)
(gdb) break main
Breakpoint 1 at 0x804837c: file strlen.c, line 5.
(gdb) run
Starting program: /home/rich/c/strlen
Breakpoint 1, main () at strlen.c:5
5         char word[]="hello";
(gdb) step
7         i=strlen(word);
(gdb) print i
$1 = 134513257
(gdb) step
8         printf("length of string: hello is %d\n",i);
(gdb) print i
$2 = 5
(gdb) step
length of string: hello is 5
9       }
(gdb) step
0x4003a7f7 in __libc_start_main () from /lib/i686/libc.so.6
(gdb) step
Single stepping until exit from function __libc_start_main,
which has no line number information.

Program exited with code 035.
(gdb) quit

Two details of this session are worth noting. The first print i shows an arbitrary value because i has not yet been assigned. The exit code 035 (octal for 29) arises because main has no return statement: with this compiler the exit status is whatever the last function call left behind, here most likely the 29 characters reported by printf. Adding return 0; at the end of main gives a clean exit status.

To reduce typing, b, r, s, p and q are aliases of break, run, step, print and quit.

Building and installing programs from source with make

make is an important project management tool. You are likely to need it to build non-trivial programs supplied as source code, and to modify projects which already use make programs, called makefiles. You are also likely to benefit from using it for your own projects, since automating and modularising the build process reduces the time and effort spent repetitively compiling, linking and installing components.

make is a command which you run in a source directory containing a makefile, named Makefile or makefile. For a large project whose source tree spans more than one directory, the Makefile in the top-level directory is likely to run the makefiles in the source subdirectories recursively. For some very large projects, creating a Makefile suitable for your system is itself automated, by convention through a shell script called configure, which is also likely to be in the top-level source directory.

make commands either specify a target or use the default target, which is the first target in the makefile. A make target is either a file to be built (e.g. a binary executable) or a directive telling make to do something. For example, make clean (clean being the target) by convention removes generated files, such as object files, which are in the way or no longer needed. Another conventional target is install: make install copies the compiled program, associated runtime library modules and documentation into the directories from which these files will be used.

A very simple makefile

(From "Programming with GNU Software", O'Reilly)

#a very simple makefile
#give name of target to compile
simulate:
# one or more tab-started shell commands to create simulate
	gcc -o simulate -O simulate.c inputs.c outputs.c

To run this makefile, your shell command would be:

make simulate

This is adequate for a very small program, but it compiles all 3 source files every time the command runs. Worse, because the target lists no dependencies, make will not rerun the command at all once the file simulate exists, even if a source file has changed. Makefiles can record file dependencies so that make does the minimum required work: it recompiles only the source files which have been updated and leaves alone the object files which are up to date, i.e. where the dependent file (e.g. an object file) has a later modification date/time than the file it depends upon (e.g. the source file). If you only want to compile source which has changed, you could expand this makefile as follows:

# name of target can be followed by dependencies
simulate: simulate.o inputs.o outputs.o
# tab-started command to create target
	gcc -o simulate simulate.o inputs.o outputs.o
simulate.o: simulate.c
	gcc -c -O simulate.c
inputs.o: inputs.c
	gcc -c -O inputs.c
outputs.o: outputs.c
	gcc -c -O outputs.c

Running shell commands inside makefiles

Makefiles can use the shell to execute commands. Each command line must start with a tab, and each line is executed by a separate shell:

clean:
	rm -f Makefile.bak a.out core errs lint.errs thisprog thatprog *.o

If you want a set of commands to run inside the same shell, e.g. to set and use environment variables or to change working directory before running other commands, place semicolons between the commands. They can be on the same line, or you can continue a line by ending it with a backslash, with no whitespace (tabs or spaces) after the backslash, e.g.:

	cd ../module1; \
gcc module1.c

Use of make macros

To avoid repetitive and error-prone typing (even the very simple example above had this), makefiles often include macros, which are short names expanded into substitution text, e.g. (taken from the Makefile for "Internetworking with TCP/IP, Volume III", David L Stevens, Internetworking Research Group at Purdue):

CFLAGS = -W -pedantic -ansi
SERVS = TCPdaytimed TCPechod TCPmechod UDPtimed daytimed sesamed
SXOBJ = passiveTCP.o passiveUDP.o passivesock.o errexit.o

Derived macros and make targets are then specified more easily using these macros, e.g.:

${SERVS}: ${SXOBJ}
	${CC} -o $@ ${CFLAGS} $@.o ${SXOBJ}
servers: ${SERVS}
TCPechod: TCPechod.o
TCPmechod: TCPmechod.o
UDPtimed: UDPtimed.o
daytimed: daytimed.o
sesamed: sesamed.o

This states that each server program depends on the listed object files. The macro $@ expands to the name of the target currently being built, i.e. in turn each of TCPdaytimed, TCPechod, TCPmechod, UDPtimed, daytimed and sesamed. The macro ${CC} isn't defined within this Makefile because make has a built-in value for it, normally cc, the default Unix 'C' compile command. On Linux, cc is usually a link to the gcc command.

Using make to build downloaded source code

make is typically used after downloading a program as a source code archive in tar or compressed tar format (compressed tar archives have the suffixes .tgz or .tar.gz). Such an archive can be unpacked by downloading or copying it into the directory where you wish to build the program and using the command:

tar -xzvf program.tgz

Where x means extract, z means uncompress with gzip, v means display each path as it is extracted and f introduces the name of the archive file (program.tgz) to be extracted. On some older systems you may need to uncompress the archive separately before extracting it, e.g. using gunzip program.tgz and then tar -xvf program.tar. Having done this, it is a good idea to see if there are any readable text files in the source archive describing how to proceed with compilation and installation. Typical file names to look out for are README, README.1ST, INSTALL, CONFIGURATION, etc.

Accessing environment variables

Every process has an environment. This data takes the form of a set of variables, each having a name and a value. Both names and values are stored as strings. These variables allow external control over how programs behave. For example, the PATH environment variable gives a list of directories which the shell will search in order to find an executable program for an external command. A process inherits its environment from its parent process, but changes made by a child, whether to environment variables or to other inherited state such as the current working directory, do not affect the parent. If it makes sense for an environment variable to specify more than one value, e.g. directories to be searched, these will be separated using a suitable delimiter (e.g. Unix directory paths are delimited using colons ':' while Windows directory paths are delimited using semicolons ';').

In the Bash shell a variable can be assigned and exported to the environment as follows:

name_of_variable=value
export name_of_variable

It can then be accessed using $name_of_variable, e.g.:

echo $PATH

In 'C' programs, environment variables can be read and written using the getenv(3) and setenv(3) library functions.

Example use of getenv

/* getenv.c : wrapper program around getenv()
 * Richard Kay 11 Jan 02 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h> /* for strcspn() */

int main(void){
    char name[BUFSIZ],*vp;
    printf("enter name of environment variable:\n");
    if(fgets(name,BUFSIZ,stdin) == NULL)
       exit(1); /* no input to read */
    name[strcspn(name,"\n")]='\0'; /* get rid of newline, if any */
    vp=getenv(name);
    if(vp == NULL){
       fprintf(stderr,"I don't know that environment variable\n");
       exit(1);
    }
    printf("Value of %s is %s\n",name,vp);
    return 0;
}

Running the program

./getenv
enter name of environment variable:
PATH
Value of PATH is /home/rich/bin:/usr/local/bin:/usr/local/jkd1.4/bin:
/home/rich/bin:/usr/bin:/bin:/usr/X11R6/bin:/usr/games

Use of setenv

setenv is a library function rather than a system call. It changes the environment of the calling process, and the change is inherited by any child processes created afterwards, e.g. using fork(2). So if you write a 'C' program using setenv, the changed environment is visible only in the program itself and in the child processes it spawns. Once your program has exited, its parent (e.g. the shell which started it) and any sibling processes are unaffected.

Here is the setenv(3) man page (more recent versions declare unsetenv as returning int):

SETENV(3)   Linux Programmer's Manual  SETENV(3)

NAME
       setenv - change or add an environment variable

SYNOPSIS
       #include <stdlib.h>

       int setenv(const char *name, const char *value, int overwrite);

       void unsetenv(const char *name);

DESCRIPTION
       The  setenv()  function  adds the variable name to the environment with
       the value value, if name does not already exist.  If name does exist in
       the  environment,  then  its  value is changed to value if overwrite is
       non-zero; if overwrite is zero, then the value of name is not  changed.

       The unsetenv() function deletes the variable name from the environment.

CONFORMING TO
       BSD 4.3

SEE ALSO
       clearenv(3), getenv(3), putenv(3), environ(5)

BSD                 1993-04-04          SETENV(3)

Accessing command line arguments in 'C' programs

This is done by declaring argc and argv parameters to the main function. These are the number of arguments and an array of string pointers respectively. The array is indexed from 0, and argv[0] is the pathname by which the program was executed.

The following program performs a similar function to echo(1) except that echo doesn't output the pathname by which it is called before the other command-line arguments:

#include <stdio.h>
int main(int argc, char *argv[]){
  int i;
  for(i=0;i<argc;i++)
    printf("%s ",argv[i]);
  printf("\n");
  return 0;
}