HANDLING TEXT FILES USING 'C'




1. Introduction
 

So far we have used pre-allocated files: the standard input (stdin), which usually comes from the keyboard and which we might read using scanf(), and the standard output (stdout), which usually goes to the screen and which we might write to using printf(). The files with which we are now concerned exist in directories or folders and have filenames and pathnames. For example, numbers.txt is a filename in the current working directory, while c:\rich\myfiles\numbers.txt is a full pathname which identifies the file whichever device (e.g. a: might mean the floppy disk or d: the CD-ROM) or directory or folder (e.g. \programs\csource) you happen to be working in.

Such files are generally organised in one of two ways: text or binary. Binary files store data in much the same way that it is stored in your program's memory, and tend to use storage space more compactly. However, you often can't transfer binary files directly from one computer for use on another unless the programs on both computers write and read these files in exactly the same way, because details such as the size of an int and the order of its bytes can differ between machines and compilers.

Text files store data much as you would view it on a computer screen or type it at the keyboard, so they can often be used more easily to transfer non-standardised data from one computer to another.

Before a file of either kind can be used in a 'C' program it needs to be opened for reading, writing or both using fopen(). The file can then be read or written as appropriate using various functions: e.g. fprintf() writes to an open file in much the same way that printf() writes to standard output, while fscanf() reads from a file opened for reading much as scanf() reads from standard input. It is a good idea to close a file using fclose() when your program has finished reading or writing it: this writes out any output still held in memory buffers and frees the operating system resources associated with the open file. This matters especially if your program needs to access more than a few files, since the number of files that can be open at once is limited. If you don't close a file, it will be closed automatically when the program terminates normally, i.e. by returning from main() or calling exit().

Files in 'C' are often referred to as "streams" because the C library presents them to programs as if they were streams of data bytes. When reading from or writing to a file, the file is considered to have a position: the point within the file at which the next read or write operation will take place. A file is most simply read sequentially from the start, either until the end of the file is reached or until the required data is found. However, for faster access to data records in large files, indexed access of suitably organised data is possible. Simple writes to a file will either start at the beginning of a new or overwritten file or at the end of data which is already present. It is also possible to update an existing file in place, in which case the file position has to be moved to the correct point before the next read or write operation can take place.
 

2. Using fopen()

The prototype is: FILE *fopen(const char *filename,const char *mode);

This means that fopen()

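a. returns a pointer (i.e. an address) of type FILE * . This special pointer is often called a filehandle; variables which hold this value may be declared e.g. as: FILE *stream; . If the file cannot be opened (e.g. it does not exist or you lack permission), fopen() returns NULL, so the returned value should be checked before it is used.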

b. requires a file or path name, typically specified as a double quoted constant string, e.g. "numbers.txt" . A full DOS or Windows path name includes backslash characters. Because a backslash inside a 'C' string starts an escape sequence (as in "\n"), you need to write two to represent one, e.g. "c:\\windows\\win.ini" will access the c:\windows\win.ini file.

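c. needs an access mode, also typically a double quoted string. This is made up of letters starting with one of:

   r    open an existing file for reading
   w    open a file for writing; it is created, or overwritten if it already exists
   a    open a file for appending; it is created if necessary and all writes go at the end
   r+   open an existing file for both reading and writing (update)
   w+   create or overwrite a file for both reading and writing
   a+   open or create a file for reading and appending

These one or two letters may then be followed by:

   t    text mode (accepted by DOS and Windows compilers; standard C opens a file in text mode whenever "b" is not given)
   b    binary mode
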
Example 1:
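FILE *in,*out; /* declare file handles for input and output */
in=fopen("input.txt","rt"); /* opens file input.txt in read and text modes */
out=fopen("out.txt","wt"); /* opens out.txt for writing; it is created or overwritten */
if(in == NULL || out == NULL) /* fopen() returns NULL if a file cannot be opened */
  printf("could not open a file\n");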
Example 2:
#include <stdio.h>
#define IN "c:\\autoexec.bat" /* need a double backslash to represent a single */
int main(void){
  FILE *in;
  char buffer[1024];
  in=fopen(IN,"rt"); /* open in read and text modes */
  if(in == NULL){ /* fopen() returns NULL if the file cannot be opened */
    perror(IN);
    return 1;
  }
  if(fgets(buffer,sizeof buffer,in) != NULL) /* read up to 1023 chars or up to and including the first \n */
    printf("The first line of the AUTOEXEC.BAT file is:\n%s\n",buffer);
  fclose(in); /* good practice to close files */
  return 0; /* 0 tells the operating system the program succeeded */
}


3. Using fprintf()


This function is similar to printf(), except that it writes to a file, through a filehandle previously assigned using fopen(), instead of writing to the standard output. It takes an extra first parameter, the filehandle; all the other parameters are the same as for printf() but are each shifted along by one position.

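FILE *out; int number; char name[30];
out=fopen("numbers.txt","at"); /* will append output at end of existing file */
if(out != NULL){ /* fopen() returns NULL on failure */
  printf("enter name\n");
  scanf("%29s",name); /* one word of at most 29 chars; gets() is unsafe and was removed in C11 */
  printf("enter number\n");
  scanf("%d",&number);
  fprintf(out,"%s %d\n",name,number); /* note extra first parameter; the space keeps the fields apart */
  fclose(out);
}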
 

 

4. Using fscanf()
 

This function is similar to scanf(), except that it reads from a file, through a filehandle previously assigned using fopen(), instead of reading from the standard input. It takes an extra first parameter, the filehandle; all the other parameters are the same as for scanf() but are each shifted along by one position.

Example:

char name[30];
int extension;
FILE *in;
in=fopen("numbers.txt","rt");
if(in != NULL && fscanf(in,"%29s%d",name,&extension) == 2) /* %29s cannot overflow name[] */
  printf("name: %s telephone extension: %d\n",name,extension);
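fscanf() returns the number of data items successfully read and assigned, or the constant EOF if the end of file (or a read error) is reached before any item is read. Testing for EOF alone is risky: if a record is badly formatted, fscanf() returns 0 without consuming the offending input, so a loop that only checks for EOF may never end. It is safer to test that the return value equals the number of items expected, e.g.
while(fscanf(in,"%29s%d",name,&extension) == 2){
/* read records until end of file or a badly formatted record */
 ... /* process data */
}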
For applications requiring robust handling of possibly invalid input data, use of fgets() followed by data validation processing followed by use of sscanf() is preferred to fscanf().

 
 

5. Using fgets() and fputs()

fgets() reads a line of data from a file, up to and including the newline character '\n', into a string, and then appends the string terminator character '\0' after the newline. fgets() returns the value NULL (not EOF!) when attempting to read beyond the end of file. fgets() takes 3 parameters: the name of the string (or any other pointer giving the address at which it starts), the size of that string in characters, and the filehandle. It reads at most one character fewer than the size given, to leave room for the '\0' end of string marker, so a line too long to fit is returned in pieces over successive calls and only the last piece ends with '\n'. fputs() writes a string to a file such that fputs(string,out) is the equivalent of fprintf(out,"%s",string); unlike puts(), it does not add a newline.

Example 1:

#include <stdio.h>
#define IN "master.txt"
#define OUT "backup.txt"

int main(void){
  FILE *in,*out; /* in and out are pointers of type FILE */
  char buffer[2048];
  int nlines=0;
  in=fopen(IN,"rt"); /* initialise in FILE pointer */
  out=fopen(OUT,"wt"); /* initialise out FILE pointer */
  if(in == NULL || out == NULL){ /* fopen() returns NULL on failure */
    perror("fopen");
    return 1;
  }
  while(fgets(buffer,sizeof buffer,in) != NULL){ /* sizeof buffer: can't overflow */
    nlines++; /* count input lines (a line over 2047 chars is counted more than once) */
    fputs(buffer,out);
  }
  printf("%d lines were copied\n",nlines);
  fclose(in); fclose(out);
  return 0;
}
The fgets() function is particularly useful for robustly reading text files organised into records separated by newlines, because its size parameter gives built-in protection against buffer overflow. Data can be read using fgets() into a character string, validated to ensure the correct number and types of data items are present, and then read from the string into program variables of the appropriate types using sscanf().

 

6. Using getc() and putc()

These functions are the file-enabled equivalents of getchar() and putchar(). They are used to read and write single characters from and to files respectively. getc() returns EOF if an attempt is made to read beyond the end of file. c=getc(in); is roughly equivalent to fscanf(in,"%c",&c); and putc(c,out); is the equivalent of fprintf(out,"%c",c); . Note that getc() returns an int, not a char: the variable receiving its value must be declared int so that EOF can be told apart from every valid character.

/* EXAMPLE: copy file one char at a time using getc and putc */
#include <stdio.h>
int main(void){
  FILE *in,*out; /* in and out are pointers of type FILE */
  int c; /* int, not char, so that EOF can be told apart from real characters */
  int i=0; /* i counts the number of characters copied */
  in=fopen("master.txt","rt"); /* initialise in FILE pointer */
  out=fopen("backup.txt","wt"); /* initialise out FILE pointer */
  if(in == NULL || out == NULL){ /* fopen() returns NULL on failure */
    perror("fopen");
    return 1;
  }
  while((c=getc(in)) != EOF){ /* copy all chars in in to out until EOF */
    putc(c,out); /* writes char c to FILE *out */
    i++; /* count how many chars are copied */
  }
  printf("%d characters were copied\n",i);
  fclose(in); fclose(out);
  return 0;
}
7. fgets() and sscanf() example
/* fgetsval.c : Demonstrates validation of data read from file.
  Extensions between 4000 and 7999 are considered valid. */
#include <stdio.h>
#include <string.h>
#define TRUE 1
#define FALSE 0
int main(void){
  int i,valid,recnum=0,ext;
  char name[50],strext[50],rec[200];
  FILE *in;
  in=fopen("numbers.txt","rt");
  if(in == NULL){ /* fopen() returns NULL on failure */
    perror("numbers.txt");
    return 1;
  }
  while(fgets(rec,sizeof rec,in) != NULL){ /* input records until end of file */
    recnum++; /* counts record number */
    valid=TRUE; /* assume innocent until proven guilty */
    if(sscanf(rec,"%49s %49s",name,strext) != 2) /* both fields must be present */
      valid=FALSE;
    else if(strlen(strext) != 4) /* phone extensions must have 4 digits */
      valid=FALSE;
    else if(strext[0] < '4' || strext[0] > '7') /* 1st digit from 4-7 are OK */
      valid=FALSE;
    else
      for(i=1;i<4;i++)
        if(strext[i] < '0' || strext[i] > '9') /* other digits from 0-9 are OK */
          valid=FALSE;
    if(valid){
      sscanf(strext,"%d",&ext); /* the validation above makes this safe */
      printf("extension: %d name: %s\n",ext,name);
    } else
      printf("record number: %d has invalid extension\n",recnum);
  } /* end while */
  fclose(in);
  return 0;
} /* end of main */