Data Storage Choices

Richard Kay, Oct 2005.

File or Database?

Files are simple and straightforward for standalone, single-user, non-concurrent applications. Concurrent applications with multiple simultaneous users can use lock files to ensure that only one process accesses a data file at a time. Databases are more reliable for production use in a networked, multi-user context, because the database system manages concurrent access and locking itself.
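A minimal sketch of a lock file, assuming a POSIX system (open() with O_CREAT | O_EXCL, and unlink(), from <fcntl.h> and <unistd.h>). The function names and the lock file path are hypothetical. Opening with O_EXCL fails if the file already exists, so checking for the lock and creating it happen as one step, and two processes cannot both acquire it.

```c
#include <fcntl.h>
#include <unistd.h>

/* Try to acquire the lock by creating the lock file.
   O_CREAT | O_EXCL makes open() fail if the file already exists,
   so only one process can succeed.
   Returns 0 on success, -1 if the lock is held (or creation failed). */
int acquire_lock(const char *lockpath)
{
    int fd = open(lockpath, O_CREAT | O_EXCL | O_WRONLY, 0644);
    if (fd == -1)
        return -1;
    close(fd);
    return 0;
}

/* Release the lock by deleting the lock file.
   Returns 0 on success, -1 on failure. */
int release_lock(const char *lockpath)
{
    return unlink(lockpath);
}
```

A process would call acquire_lock() before opening the data file, wait and retry (or give up) while it returns -1, and call release_lock() when finished. If a process crashes while holding the lock, the stale lock file must be removed by hand or detected some other way, for example by storing the owner's process ID in it.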

Binary or Text file?

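With binary files, data can be stored in the program's internal (in-memory) format. Advantage: data doesn't have to be converted on input or output. Disadvantage: binary files are less portable between machines on a network, and even between programs built with different compiler settings. For example, are structure members aligned on word boundaries (with padding bytes in between) or packed on byte boundaries? Byte order and the sizes of types such as int can also differ between machines.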
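A minimal sketch of storing a record in internal format with fwrite() and reading it back with fread(). The struct and function names are hypothetical. Because the compiler may insert padding between members, sizeof(struct record) can be larger than the sum of the member sizes, and the exact layout written to disk depends on the compiler, its settings and the machine.

```c
#include <stdio.h>

/* Hypothetical record type. A char followed by a double is commonly
   padded, so the bytes written depend on the compiler's alignment rules. */
struct record {
    char grade;
    double mark;
    int id;
};

/* Write one record exactly as it is laid out in memory.
   fp should be opened in binary mode ("wb", "r+b", ...).
   Returns 1 on success, 0 on failure. */
int write_record(FILE *fp, const struct record *r)
{
    return fwrite(r, sizeof *r, 1, fp) == 1;
}

/* Read one record back; only safe if the file was written by a program
   with the same struct layout, type sizes and byte order.
   Returns 1 on success, 0 on failure or end of file. */
int read_record(FILE *fp, struct record *r)
{
    return fread(r, sizeof *r, 1, fp) == 1;
}
```

No conversion code is needed, which is the advantage; the cost is that the file is only meaningful to programs that agree on the same in-memory layout.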

Variable or fixed record length?

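Technically this choice is orthogonal (i.e. unrelated) to whether a file is in text or binary format, but in practice fixed-length records are more likely in binary files. The advantages of fixed-length records include: the position of any record can be calculated from its record number (record number multiplied by record size), so a record can be read directly with fseek() without reading the records before it, and a record can be updated in place without rewriting the rest of the file.

Advantages of variable-length records include: no space is wasted padding short fields out to a fixed width, and fields are not truncated to a predetermined maximum length.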
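To illustrate random access with fixed-length records, here is a sketch using a hypothetical struct student whose name field is a fixed-size char array, so every record occupies exactly sizeof(struct student) bytes and record n starts at byte offset n * sizeof(struct student). The file is assumed to be opened for update in binary mode (e.g. "r+b"), and record numbers start at 0.

```c
#include <stdio.h>
#include <string.h>

#define NAME_LEN 32  /* hypothetical fixed width of the name field */

struct student {
    int number;
    char name[NAME_LEN];
    double mark;
};

/* Read record n directly, without reading records 0..n-1.
   Returns 1 on success, 0 on failure. */
int read_nth(FILE *fp, long n, struct student *s)
{
    if (fseek(fp, n * (long)sizeof *s, SEEK_SET) != 0)
        return 0;
    return fread(s, sizeof *s, 1, fp) == 1;
}

/* Overwrite record n in place; the other records are untouched.
   The fseek() before each operation also satisfies the C rule that an
   update stream must be repositioned when switching between reading
   and writing. Returns 1 on success, 0 on failure. */
int write_nth(FILE *fp, long n, const struct student *s)
{
    if (fseek(fp, n * (long)sizeof *s, SEEK_SET) != 0)
        return 0;
    return fwrite(s, sizeof *s, 1, fp) == 1;
}
```

With variable-length records, finding record n would instead mean reading through the file from the start, or keeping a separate index of record offsets.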

Choice of text file record delimiter

Normally, characters which don't appear in the data are used as record (end of line) and field delimiters. Newline is the most common record delimiter. Problem: what happens if a newline is required within a data field? One possible solution is to escape it, e.g. by writing it as the two characters \n. The backslash itself then also has to be escaped (e.g. as \\), and data has to be converted on input and output. Other record delimiters are possible, but the fgets() function assumes newline.
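A sketch of the backslash escaping described above, with hypothetical function names. Each backslash is written as two backslashes and each newline as a backslash followed by n, so an escaped field never contains a real newline and a whole record still fits on one line read by fgets().

```c
#include <string.h>

/* Worst case every character doubles, plus the terminating NUL. */
size_t escaped_size(const char *src)
{
    return 2 * strlen(src) + 1;
}

/* Copy src to dst, writing each backslash as two backslashes and each
   newline as a backslash followed by 'n'.
   dst needs escaped_size(src) bytes. */
void escape_field(const char *src, char *dst)
{
    for (; *src != '\0'; src++) {
        if (*src == '\\') {
            *dst++ = '\\';
            *dst++ = '\\';
        } else if (*src == '\n') {
            *dst++ = '\\';
            *dst++ = 'n';
        } else {
            *dst++ = *src;
        }
    }
    *dst = '\0';
}

/* Reverse escape_field(). dst needs strlen(src) + 1 bytes. */
void unescape_field(const char *src, char *dst)
{
    while (*src != '\0') {
        if (src[0] == '\\' && src[1] == 'n') {
            *dst++ = '\n';
            src += 2;
        } else if (src[0] == '\\' && src[1] == '\\') {
            *dst++ = '\\';
            src += 2;
        } else {
            *dst++ = *src++;
        }
    }
    *dst = '\0';
}
```

Escaping the backslash is what makes the scheme reversible: without it, a field that genuinely contained a backslash followed by n would be misread as a newline.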

Choice of text file field delimiters

Field delimiters are more likely than record delimiters to conflict with the data. The field delimiter is normally different from the record delimiter, unless fields are counted or labelled. Popular field delimiters include space, tab, comma (,) and colon (:). Many applications can export and import comma-delimited (CSV) data, typically with strings enclosed in double quotes, e.g.:

Record No.,Name,Mark
123,"Asif Mohammed",76.2
145,"Joe Brown",72.1
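A sketch of splitting one such line into fields, using a hypothetical function split_csv(). It assumes the RFC 4180 CSV convention: quoted fields may contain commas, and a double quote inside a quoted field is written as two double quotes. CSV dialects vary, so this is illustrative rather than a complete parser; because it works on one line at a time (e.g. as read by fgets()), it does not handle quoted fields that span lines.

```c
#include <string.h>

/* Split one line of comma-separated values into fields.
   Field text is copied into buf, each field NUL-terminated, and
   fields[i] points at field i. Output never exceeds strlen(line) + 1
   bytes, so that is the size required for buf.
   Returns the number of fields, or -1 if buf or fields is too small. */
int split_csv(const char *line, char *buf, size_t bufsize,
              char **fields, int maxfields)
{
    const char *p = line;
    char *out = buf;
    int n = 0;

    if (bufsize < strlen(line) + 1)
        return -1;

    for (;;) {
        if (n == maxfields)
            return -1;
        fields[n++] = out;

        if (*p == '"') {                     /* quoted field */
            p++;
            while (*p != '\0') {
                if (*p == '"' && p[1] == '"') {
                    *out++ = '"';            /* "" stands for one quote */
                    p += 2;
                } else if (*p == '"') {
                    p++;                     /* closing quote */
                    break;
                } else {
                    *out++ = *p++;           /* commas allowed here */
                }
            }
        }
        /* Unquoted text (or anything after a closing quote) runs to the
           next comma or end of line; a trailing newline is dropped. */
        while (*p != '\0' && *p != ',' && *p != '\n')
            *out++ = *p++;
        *out++ = '\0';

        if (*p != ',')
            return n;                        /* end of line */
        p++;                                 /* skip the comma */
    }
}
```

Numeric fields come back as strings (e.g. "76.2") and still need converting, for example with strtod().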

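A double quote appearing within a string complicates this approach further, because it must itself be escaped or otherwise distinguished from the closing quote. The fscanf() function assumes space-, tab- or newline-delimited data. It is unsafe when it meets data of the wrong type or length: a %s conversion without a field width can overflow the destination buffer, and a type mismatch stops the conversion and leaves the offending input unread. It also splits strings containing embedded spaces into separate items. Some data can be simplified by converting embedded spaces into underscores, e.g. for file and variable names.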
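One way to avoid these fscanf() problems is to read each line with fgets(), parse it with sscanf(), give every %s conversion an explicit maximum width, and check how many items were converted. The sketch below assumes a hypothetical space-delimited format "number name mark" in which embedded spaces in names have been stored as underscores, as suggested above; the function names and the 31-character name limit are illustrative.

```c
#include <stdio.h>

#define NAME_MAX_CHARS 31  /* hypothetical limit; must match %31s below */

/* Read one line such as "123 Asif_Mohammed 76.2".
   fgets() consumes the whole line, so a malformed line cannot leave
   stray input behind for the next call (a line longer than the buffer
   is the exception: its remainder is read by the next call).
   The %31s width stops a long name overflowing name, which must hold
   NAME_MAX_CHARS + 1 chars.
   Returns 1 on success, 0 at end of file, -1 for a malformed line. */
int read_student(FILE *fp, int *number, char *name, double *mark)
{
    char line[256];

    if (fgets(line, sizeof line, fp) == NULL)
        return 0;
    if (sscanf(line, "%d %31s %lf", number, name, mark) != 3)
        return -1;
    return 1;
}

/* Restore embedded spaces that were stored as underscores. */
void underscores_to_spaces(char *s)
{
    for (; *s != '\0'; s++)
        if (*s == '_')
            *s = ' ';
}
```

Returning -1 for a bad line lets the caller report it and carry on with the next line, rather than having fscanf() stall on input it cannot convert.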

XML anyone?

XML (eXtensible Markup Language) encloses data within opening and closing tags, such as <name> and </name>. It is probably not useful for simple standalone applications, but can be useful for exchanging data with a common, defined purpose between different platforms. It is not a good match for most C applications, and is better suited to business and web-enabled applications written in Java, Perl or Python, for which XML libraries are readily available.
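For a simple flat record, writing XML from C by hand is still straightforward, as this sketch shows; the element names and function names are hypothetical. As with delimiters, characters that have special meaning in the format (here <, >, & and ") must be escaped inside the data. Reading XML back in reliably is the much harder part, which is where a library helps.

```c
#include <stdio.h>

/* Write s with the characters that are special in XML escaped. */
static void xml_escape(FILE *fp, const char *s)
{
    for (; *s != '\0'; s++) {
        switch (*s) {
        case '<': fputs("&lt;", fp);   break;
        case '>': fputs("&gt;", fp);   break;
        case '&': fputs("&amp;", fp);  break;
        case '"': fputs("&quot;", fp); break;
        default:  fputc(*s, fp);       break;
        }
    }
}

/* Write one student record as an XML element, one child per field. */
void write_student_xml(FILE *fp, int number, const char *name, double mark)
{
    fputs("<student>\n", fp);
    fprintf(fp, "  <number>%d</number>\n", number);
    fputs("  <name>", fp);
    xml_escape(fp, name);
    fputs("</name>\n", fp);
    fprintf(fp, "  <mark>%.1f</mark>\n", mark);
    fputs("</student>\n", fp);
}
```

Each field is labelled by its tag, so no field or record delimiter has to be chosen, at the cost of a much larger file than the comma-delimited version.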