Programs Using Random Numbers

Introduction to Entropy

Randomness is the property attributed to behaviour, activity or a sequence of numbers lacking any appearance of order. A system is deterministic if its subsequent behaviour can be derived from knowledge of its starting state and the physical laws governing changes of state. Programmed computers are inherently deterministic, because a program is a sequence of instructions with an intended (i.e. pre-specified) result from given input. The usual development methodology, in which a specification is written before implementation and a test plan is developed to evaluate whether the implementation generates expected outputs from chosen inputs, generally assumes a deterministic program.

Quotations: "God doesn't play dice with the universe" Albert Einstein.
"Random numbers should not be generated with a method chosen at random." Donald Knuth.
"The total entropy of any isolated thermodynamic system tends to increase over time, approaching a maximum value." The second law of thermodynamics, http://en.wikipedia.org .

The second law of thermodynamics assumes the existence of a physical quantity called entropy, which increases, for example, when hot water is mixed with cold water. This mixing is an irreversible event. You could use a refrigerator to turn 2 buckets of warm water back into one hot and one cold bucket, but generating the electricity required for this reversal would increase thermodynamic entropy elsewhere in the universe. Thermodynamic entropy (or simply entropy) is a measure of the amount of energy in a physical system that cannot be used to do work ( http://en.wikipedia.org ). Having one bucket of hot water and one of cold enables more work to be done than having a larger mixed bucket of warm water. The warm bucket has the same amount of energy as the hot and cold buckets together, but when the hot and cold were mixed entropy increased.

The concept of entropy in information systems is closely related to entropy in thermodynamic systems, but the two are not exactly the same quantity. Information entropy is sometimes called Shannon entropy after Claude Shannon, whose theoretical work concerned the capacity of transmission lines to carry information, relating background noise, bandwidth and the attenuation of signal strength caused by the transmission medium. Information entropy concerns the degree of uncertainty about a signal. The entropy rate of an information source is the average number of bits needed to encode a symbol from that source. If the information is very predictable, it is also very compressible, so fewer bits are needed. One way to measure information entropy is to see how far it is possible to compress a sequence of symbols into a smaller file.

[rich@saturn compress]$ ls -l
total 36
-rwxr-xr-x  1 rich rich  9220 Apr  5 17:17 avltree
-rw-r--r--  1 rich rich  6905 Apr  5 17:15 avltree.c
[rich@saturn compress]$ gzip *
[rich@saturn compress]$ ls -l
total 16
-rw-r--r--  1 rich rich 2123 Apr  5 17:15 avltree.c.gz
-rwxr-xr-x  1 rich rich 4001 Apr  5 17:17 avltree.gz
[rich@saturn compress]$ 

In the above example:

  1. A machine code executable was reduced from 9220 to 4001 bytes, i.e. to 43.39% of its original size.
  2. The 'C' source file used to compile the executable was reduced from 6905 to 2123 bytes, i.e. to 30.74% of its original size.

The gzip (GNU Zip) program is not a perfect compression system, in the sense that other programs now exist which can obtain higher compression ratios for certain files. However, information which contains maximum entropy or uncertainty per byte is impossible to predict or compress. In the above examples, the machine code executable had a greater amount of entropy per byte compared to the 'C' source code file from which it was compiled.
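
A more direct estimate of the entropy per byte of a file can be made by counting how often each of the 256 possible byte values occurs and applying Shannon's formula H = -sum( p * log2(p) ). The following sketch illustrates this; the program and file names are illustrative rather than taken from the examples above. A stream with maximum entropy approaches 8 bits per byte.

/* entropy.c
 * Illustrative sketch: estimates the Shannon entropy of a file
 * in bits per byte by counting byte value frequencies and
 * applying H = -sum( p * log2(p) ). Compile with -lm. */

#include <stdio.h>
#include <math.h>

int main(int argc, char *argv[]){
        unsigned long count[256] = {0}, total = 0;
        double h = 0.0, p;
        int c, i;
        FILE *in;

        if(argc != 2 || (in = fopen(argv[1], "r")) == NULL){
                printf("usage: entropy <filename>\n");
                return 1;
        }
        while((c = fgetc(in)) != EOF){ /* tally each byte value */
                count[c]++;
                total++;
        }
        fclose(in);
        for(i = 0; i < 256; i++){
                if(count[i] == 0)
                        continue;
                p = (double)count[i] / total;
                h -= p * log2(p); /* entropy contribution of this byte value */
        }
        printf("%lu bytes, estimated entropy %.4f bits per byte\n", total, h);
        return 0;
}

Running such a program over the avltree executable and the avltree.c source listed above would be expected to report more bits per byte for the executable, consistent with its lower compressibility.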

One of the most interesting uses of entropy within computing systems is for security purposes, because it is important that passwords and encryption keys should be unpredictable.

Some systems used to generate random numbers are themselves inherently deterministic. Consider a die thrown onto a surface. Supposing enough were known about:

  1. the position, orientation and velocity of the die as it leaves the thrower's hand,
  2. the spin imparted to it by the thrower,
  3. the resistance of the air it travels through,
  4. the elasticity and friction of the surface on which it lands,

then it would be theoretically possible to compute which number would be on top when the die comes to rest. Gambling casino managers need not worry about this, because the thrower's inherently uncertain knowledge of, and limited ability to manipulate, these starting conditions and other system dynamics make it impossible in practice to alter the equal probability of the six possible outcomes.

At one time IBM consultants recommended that master system keys and passwords be generated by the system manager using dice. The reason is that using such a simple system meant the manager could be as certain as possible about the means by which these keys were created and the conditions surrounding this event. The more complex the system, the less probable that everything affecting it can be easily known and controlled. However, the increased performance of computers now requires more entropy within cryptographic keys than can easily be generated using dice.

Obtaining random numbers on a computer

Using the ANSI 'C' library header <stdlib.h>, 2 functions and a constant are defined for this purpose.

Function void srand(unsigned int seed); is used to seed the pseudo-random number generator.

Function int rand(void); returns the next number in a pseudo-random sequence, in the range 0 to RAND_MAX (the constant mentioned above).

Example:

/* mkrands.c
 * Richard Kay, Feb 2006
 * Generates random floats in range 0.0 - 1.0 and writes them to the rands.txt file
 * */

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define NRANDS 5000

int main(void){
        int i;
        FILE *out;
        if((out=fopen("rands.txt","w"))==NULL){
                printf("error opening output file\n");
                return 1;
        }
        /* initialise pseudo random generator */
        srand(time(NULL)); /* too predictable for cryptography */
        for(i=0;i<NRANDS;i++) /* write NRANDS random floats */
                fprintf(out,"%f\n",rand()/(float)RAND_MAX);
        fclose(out);
        return 0;
}

These functions seed and produce pseudo-random numbers, which are too predictable for strong cryptographic purposes. They are also too predictable for generating passwords if an attacker can find a way to obtain or reverse engineer the algorithm used and can get hold of 2 or 3 passwords from a longer generated sequence, e.g. for multiple student accounts.

The POSIX 1003.1-2003 standard gives the following example of an implementation of rand() and srand(). This appears to be tailored to 16 bit int implementations.

static unsigned long next = 1;

/* RAND_MAX assumed to be 32767 */
int myrand(void) {
               next = next * 1103515245 + 12345;
               return((unsigned)(next/65536) % 32768);
}

void mysrand(unsigned seed) {
               next = seed;
}

From this simple implementation it is clear that the sequence generated will repeat whenever a value for next is repeated. As a 32 bit integer is assumed for next, the maximum possible sequence length will be 2^32.
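
The determinism is easy to demonstrate: re-seeding with the same value reproduces exactly the same sequence. The following minimal sketch repeats the example generator above so it is self-contained, seeds it twice with the same value and prints the two runs, which will be identical.

/* seedtest.c
 * Sketch demonstrating that the example generator is deterministic:
 * the same seed always produces the same sequence. */

#include <stdio.h>

static unsigned long next = 1;

int myrand(void) {
        next = next * 1103515245 + 12345;
        return((unsigned)(next/65536) % 32768);
}

void mysrand(unsigned seed) {
        next = seed;
}

int main(void){
        int run, i;
        for(run = 0; run < 2; run++){
                mysrand(42);          /* same seed for both runs */
                printf("run %d:", run + 1);
                for(i = 0; i < 5; i++)
                        printf(" %d", myrand());
                printf("\n");         /* both lines printed will be identical */
        }
        return 0;
}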

Linux random devices

To improve upon the obvious limitations of predictable pseudo-random number generators, the Linux operating system kernel provides 2 device files for generating entropy. One of these is fast and less cautious; the other is slow and cautious. The faster device uses the slower approach to reseed a pseudo-random sequence. The following description is from the Linux documentation.

RANDOM(4)                  Linux Programmer's Manual                 RANDOM(4)

NAME
       random, urandom - kernel random number source devices

DESCRIPTION
       The character special files /dev/random and /dev/urandom (present since
       Linux 1.3.30) provide an interface to the kernel's random number gener-
       ator.  File /dev/random has major device number 1 and minor device num-
       ber 8.  File /dev/urandom has major device number 1  and  minor  device
       number 9.

       The  random  number  generator  gathers environmental noise from device
       drivers and other sources into an entropy  pool.   The  generator  also
       keeps  an  estimate of the number of bits of noise in the entropy pool.
       From this entropy pool random numbers are created.

       When read, the /dev/random device will only return random bytes  within
       the estimated number of bits of noise in the entropy pool.  /dev/random
       should be suitable for uses that need very high quality randomness such
       as  one-time  pad  or  key generation.  When the entropy pool is empty,
       reads from /dev/random will block until additional environmental  noise
       is gathered.

       When read, the /dev/urandom device will return as  many  bytes  as  are
       requested.  As a result, if there is  not  sufficient  entropy  in  the
       entropy  pool,  the  returned  values are theoretically vulnerable to a
       cryptographic attack on the algorithms used by the  driver.   Knowledge
       of how to do this is not available in the current non-classified liter-
       ature, but it is theoretically possible that such an attack may  exist.
       If this is a concern in your application, use /dev/random instead.
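
Programs can read either device like an ordinary file. The following sketch reads 16 bytes from /dev/urandom and prints them in hexadecimal; reading /dev/random instead works the same way, but may block until enough environmental noise has been gathered.

/* readurandom.c
 * Sketch: reads a few bytes from the kernel's /dev/urandom device
 * and prints them in hexadecimal. */

#include <stdio.h>

#define NBYTES 16

int main(void){
        unsigned char buf[NBYTES];
        size_t got, i;
        FILE *rdev;

        if((rdev = fopen("/dev/urandom", "r")) == NULL){
                printf("error opening /dev/urandom\n");
                return 1;
        }
        got = fread(buf, 1, NBYTES, rdev); /* read up to NBYTES random bytes */
        fclose(rdev);
        for(i = 0; i < got; i++)
                printf("%02x", buf[i]);
        printf("\n");
        return 0;
}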

To obtain an idea of how much data these 2 devices can generate, 30 second samples were taken from both and the file sizes listed.

[rich@saturn random]$ cat /dev/random > random
(CTRL-C was pressed after counting to 30 seconds)
[rich@saturn random]$ cat /dev/urandom > urandom
(CTRL-C was pressed after counting to 30 seconds)
[rich@saturn pwgen]$ ls -l
total 593736
-rw-r--r--  1 rich rich       520 Apr  5 18:19 random
-rw-r--r--  1 rich rich  94748672 Apr  5 18:19 urandom

520 bytes were generated by the cautious entropy device in about the same time as it took the fast device to generate 94 Mbytes.

To obtain a fixed amount of data, the dd system data copy command is more precisely controllable than the cat command, because it allows the block size and the number of blocks to copy to be specified. The following command was used to obtain 512 Mbytes of entropy data in the file random.bin .

dd if=/dev/urandom of=random.bin count=1000000 bs=512

Using Random Numbers to Compute PI

This algorithm involves scattering X,Y coordinates, generated from pairs of random numbers such that 0 < X < 1 and 0 < Y < 1, within a square of side 1. If the distance of a coordinate from the origin, Z = sqrt( X^2 + Y^2 ), is less than or equal to 1, the coordinate fits within a quarter circle centred at X=0, Y=0. For a large number of coordinates, if these are genuinely random, the proportion fitting inside the quarter circle will approximate the ratio of the area of the quarter circle to the area of the square. The area of the quarter circle is PI/4 and the area of the square is 1. The following program computes an estimate of PI by comparing these 2 areas, based on the assumption that the coordinates are randomly scattered within the square.

Depending on how close this estimate is to the actual value of PI in practice, this gives us one measure of the randomness of the data used to create the coordinates.

Source Code

/* randpi.c
 * Richard Kay, April 2006
 * Computes PI based on proportion of scattered random X,Y points 
 * being within the area of a quarter circle of radius 1 within
 * a square of side 1. The area of the quarter circle is PI/4
 * and the area of the square is 1. If there is a random distribution
 * of points within the square, then PI/4 of all points should be
 * within the quarter circle. If the proportion within the quarter
 * circle diverges from PI/4 this indicates that the random data is
 * not evenly scattered.  */

#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <math.h> 

#define RFILE "random.bin" /* binary file containing random data */
#define PI 3.1415926 /* actual value to 7DP */


int getxycoord(FILE *rfh,float *x,float *y);
		/* reads next 2 random 32 bit ints from random file,
		 * converts to an x,y coordinate within a square of 
		 * side 1.0 and returns true if successful or false if EOF */

int main(void){
	FILE *rfh;
	unsigned long int inqcirc=0, counted=0, max;
	float x,y,z,myPI;
	
	printf("enter max sample size\n");
	scanf("%lu",&max);	
	if((rfh=fopen(RFILE,"r"))==NULL){
               printf("error opening random data file\n");
               return 1;
        }
	while(getxycoord(rfh,&x,&y) && counted < max){
		z=sqrt(x*x+y*y); /* Pythagoras' theorem */
		counted++;
		if(z<=1.0)
			inqcirc++;
	}
	myPI= 4.0*((float)inqcirc/counted);
	printf("Sample: %lu Estimate of PI: %f Actual PI: %f\n",
		counted,myPI,PI);
	printf("Estimated PI/Actual PI: %f\n",myPI/PI);
	return 0;
}

int getxycoord(FILE *rfh,float *x,float *y){

        /* reads next 2 random 32 bit ints from random file.
	 * Converts pair to a floating point X,Y coordinate
	 * with X and Y values in the range 0.0 to 1.0 .
	 *
	 * Returns 1 if successful, 0 if file exhausted. */
	static double twoxx32=0.0;
	unsigned int random;
	if(twoxx32==0.0)	/* only compute 2**32 on first call */
		twoxx32=pow(2,32);
	/* read and convert x coordinate. fread returning fewer
	 * than 1 item indicates the file is exhausted. */
	if(fread(&random,sizeof(unsigned int),(size_t)1,rfh)!=1)
		return 0; /* end of file */
	*x=(float)(random/twoxx32);
	/* read and convert y coordinate */
	if(fread(&random,sizeof(unsigned int),(size_t)1,rfh)!=1)
		return 0; /* end of file */
	*y=(float)(random/twoxx32);
	return 1;
}

Program output

Randomly seeded pseudo-random test data was generated by copying 512MBytes of random data from the Linux /dev/urandom device.

[rich@saturn pwgen]$ dd if=/dev/urandom of=random.bin count=1000000 bs=512
1000000+0 records in
1000000+0 records out
[rich@saturn pwgen]$ ./randpi
enter max sample size
64000000
Sample: 64000000 Estimate of PI: 3.141639 Actual PI: 3.141593
Estimated PI/Actual PI: 1.000015

Using random data to generate passwords

Passwords chosen by humans often have too little entropy. When an individual is required to choose a password, one is often selected which an attacker would find very easy to guess, e.g. the name of the user's football team, pet, friend or partner or one of a list of the 50 most popular passwords. The advantage of getting the user to choose a password is that there is a better chance that the individual won't have to write it down. If the risk mitigated by use of a password is from a remote system attacker rather than a local one, it is better for a strong randomly generated password to be written down than for a weak password to be used.

/* program to generate a set of 8 char passwords capable
 * of being printed, read and typed in from a keyboard, using
 * a restricted character set */


#include <stdio.h>
#include <string.h>
#include <stdlib.h>

#define RFILE "random.bin" /* binary file containing random data */

unsigned int getrand(FILE*); /* reads next random 32 bit int from random file */

void mkpasswd(char *pw,FILE *rfh);
 /* inserts next 8 char password into pw string */

/* set of 54 (23+23+8) easily typed and distinguished characters
 * from which an 8 character password is made up. Characters easily
 * confused when printed, i.e. the letters I, L, O (upper and lower
 * case) and the numbers 0 and 1, are not in this set */
char charset[]="ABCDEFGHJKMNPQRSTUVWXYZabcdefghjkmnpqrstuvwxyz23456789";

int main(void){
        char password[9];
        int npw, i;
        FILE *rfh;
        if((rfh=fopen(RFILE,"r"))==NULL){
                printf("error opening random data file\n");
                return 1;
        }
        printf("How many passwords ?\n");
        scanf("%d",&npw);
        for(i=0;i<npw;i++){
                mkpasswd(password,rfh);
                printf("%s\n",password);
        }
        return 0;
}

unsigned int getrand(FILE *rfh){
        /* reads next random 32 bit int from random file.
         * fread returning fewer than 1 item means the file is exhausted. */
        unsigned int r;
        if(fread(&r,sizeof(unsigned int),(size_t)1,rfh)!=1){
                printf("not enough data in file\n");
                exit(1);
        }
        return r;
}


void mkpasswd(char *pw, FILE *rfh){
        /* inserts next 8 char password into pw string */
        int i=0,j=0,k=0,rem,nchars;
        unsigned int rand;
        nchars=strlen(charset);
        for(i=0;i<2;i++){
                rand=getrand(rfh); /* each rand good for >4 chars */
                for(j=0;j<4;j++){
                        rem=rand%nchars;
                        pw[k++]=charset[rem];
                        rand/=nchars;
                }
        }
        pw[k]='\0';
        return;
}

Program output

How many passwords ?
8
Gdh6sWH3
cpUXETpc
zVpj6an8
fjQhfM9S
VagGz3rF
tJhAgJGm
6XVqReQ2
WxQA5mxx
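
As a rough indication of the strength of these passwords: each of the 8 characters is drawn from a set of 54, so a password carries about 8 * log2(54), roughly 46 bits of entropy, assuming the underlying random data is good and ignoring the slight bias introduced by the modulus operation in mkpasswd(). A one-line sketch of that arithmetic:

/* pwentropy.c
 * Sketch: rough entropy estimate for the generated passwords,
 * 8 characters each drawn from a 54 character set. Compile with -lm. */

#include <stdio.h>
#include <math.h>

int main(void){
        double bits = 8.0 * log2(54.0); /* bits of entropy per password */
        printf("approx %.1f bits of entropy per 8 character password\n", bits);
        return 0;
}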