These two processes are very much interrelated, since most algorithms for searching for a particular item in an array require that the data is in alphabetic/numeric order.
External sorting refers to the sorting of data stored in auxiliary memory, such as on magnetic tape or disc; this is normally referred to as a file. In such cases, the data may be only accessible sequentially; i.e. record 1 must be read before record 2 etc.
The algorithms used to sort/search data will obviously depend on whether it is sorted in main or auxiliary memory. We shall initially consider internal sorting/searching algorithms.
If such an array is in order, then the field used to determine the order is known as the key. In the example just given, it is likely that the name field will be the key; i.e. the array will be sorted such that the names are in alphabetical order. When searching the array, we would examine the name field of each record. When the required record is found, the information required will be in the mark field of the same record.
Each entry in a computerised personnel file might consist of a record with four fields:
employee_number, employee_name, date_of_birth, address
The array could be in alphabetical order of the employees' names. It two or more employees had identical names, then their records could be sorted according to their date of birth. If two or more employees had identical names and dates of birth, then their records could be sorted according to their addresses. In this case, the key is said to be 'according to address, within date of birth within name'.
In a practical data processing application, because employees might share names, dates of birth and addresses an additional field: employee_number is likely to be allocated to ensure that each employee record has a unique key.
While sorting or searching an array, it will only be the record field
containing the key that will be examined; the nature of the remainder of
the record will be of less concern. Thus for simplicity the following sections
assume each element of the array contains only a single integer numeric
value. In a more practical application each element of of the array would
contain a structure record with member fields.
In general, a function to search an array for a particular value
will require a function as shown below:
int search( int value );
It is assumed in this example that the array to be searched is a global
variable: a that contains 100 elements. The integer parameter: value
is the value to be searched for; obviously this could be a character or
other type of variable as required depending upon the key being used. The
integer value returned indicates the position in the array at which the
required value was found if successful or -1 to indicate an unsuccessful
search.
The simplest way of determining if a particular value is present
in an array is to examine each entry in turn until the required value has
been found. If the required value does not appear in the array, then the
action taken depends on whether the array is sorted or not.
When searching an unsorted array, it is not until the last element
used has been examined that it is proven that a particular value does not
appear in the array. Thus searching must continue either until the required
value is found or until all elements of the array in use have been examined.
The following program segment searches through array 'a' until it finds
the required value. This program must terminate if 'value' is not found
in array 'a'. This is achieved by checking the value of 'posn'.
int posn=0;
for (posn=0; posn<100; posn++){
if (a[posn]==value )
break;
}
In this case, however the only indication of whether the search has terminated
because 'value' has been found or because the whole array has been searched
and value has not been found is posn having an invalid value of 100. This
may be better signalled by returning -1 if the search was unsuccessful.
A complete sequential function based on the above is:
int search( int value ){
int posn;
for (posn=0; posn<100; posn++)
if (a[posn]==value )
return posn;
return -1;
}
If the elements of 'a' are in ascending order, then it is possible
to determine that the required value is not in the list as soon as an element
of the array is examined and found to be larger than 'value'. The change
to the for statement is shown below.
for (posn=0; posn<100 && a[posn]<=value; posn++)This will obviously speed up the search when the required value is not in the array.
We also don't want to search beyond the last position in the array
used. In practical applications the number of records present in an array
is likely to be less than the size of an array and the program will need
to keep track of the maximum index used. This can be achieved by giving
our search function an additional integer parameter: high to indicate
the maximum array index used (this is one less than the number of elements
used given we start counting them at 0 not 1). This is achieved below:
int search( int value, int high ){
int posn;
for (posn=0; posn<=high && a[posn] <= value ; posn++)
if (a[posn]==value )
return posn;
return -1;
}
A binary search is analogous to the way you look up a word in a dictionary or a name in a telephone directory. You might grasp the next paragraph more easily if you first think of a word and then look it up in a dictionary, or look up the telephone number for a friend or business in the phone book, writing down as you carry out your search exactly how you decide where you will make each guess about where in the dictionary or phone book you will next look for the entry.
The middle element of the list to be searched is compared with the required value from which it can be determined which half of the list the required value is in. The process in then repeated on the relevant half of the list. Each iteration of the process reduces the number of items still to be searched by a factor of 2, until eventually only one item remains.
int search( int value, int high){ /* implements binary split search algorithm */
int low = 0, mid;
do
{
mid = ( low + high ) / 2;
if ( value < a[mid] )
high = mid - 1;
if ( value > a[mid] )
low = mid + 1;
} while ( value != a[mid] && low <= high );
if ( value == a[mid] )
return mid;
else
return -1;
}
| Number of Comparisons | Unsorted | Sorted |
| Sequential found | n/2 | n/2 |
| Sequential not found | n | n/2 |
| Binary found | - | 1+log2n |
| Binary not found | - | 1+log2n |
Thus the sequential search takes on average n/2 comparisons to find
a key; ie 500,000 comparisons to search 1,000,000 items. The binary search
takes about 20 comparisons to search the same number of items. Frequently
we will simply quote the order of the algorithm; ie the sequential search
is of order n whilst the binary search is of order log2n.
Swapping two records will require three moves and involve a third temporary record. Assuming the following declarations:
struct student_info {
char name[20];
int mark;
}
typedef struct student_info STUDENT_INFO;
STUDENT_INFO student[100],temp;
Then to swap records i and j we must use:
temp = student[i]; /* copy all data on student[i] into temp */ student[i] = student[j]; student[j] = temp;
| pass | comparisons | exchanges | moves |
| 1st | (n-1) | (n-1)/2 exchanges (average) | 3(n-1)/2 |
| 2nd | (n-2) | (n-2)/2 exchanges (average) | 3(n-2)/2 |
| all passes | = (n-1)+(n-2) .... +1 = n(n-1)/2 =~ n2/2 | = n(n-1)/4=~ n2/4 (average) | ~3n2/4 moves |
| pass | comparisons | exchanges | moves |
| 1st | n-1 | 1 exchange | 3(n-1)/2 |
| 2nd | n-2 | 1 exchanges | 3(n-2)/2 |
| all passes | = (n-1)+(n-2) .... +1 = n(n-1)/2 =~ n2/2 | = n-1 (maximum) | ~3n moves |
Perform a second pass on the first (n-1) items, after which the last
2 items will be in place.
(n-1) passes will be required to sort an array containing n items.
At the end of the ith pass the last i items will be ordered.
If the pass is made in which no exchanges are required, then the list
is in order,
even if less than (n-1) passes have been made.
| pass | comparisons | exchanges | moves |
| 1st | (n-1) | (n-1)/2 exchanges (average) | 3(n-1)/2 |
| 2nd | (n-2) | (n-2)/2 exchanges (average) | 3(n-2)/2 |
| all passes | = (n-1)+(n-2) .... +1 = n(n-1)/2 =~ n2/2 | = n(n-1)/4=~ n2/4 (average) | ~3n2/4 moves |
| pass | comparisons | moves |
| i'th | i/2 (average) | (i/2) +1 |
| all passes | = (1.5+2.5+3.5 ... +n-0.5) = n(n-1)/4 =~ n2/4 | =~ n2/4 |
| pass | comparisons | moves |
| i'th | log2i (average) | (i/2) +1 |
| all passes | = (log21 + log22 +log23 ... + log2(n-1) ) < nlog2n | =~ n2/4 |
Merging is accomplished by comparing the 2 lowest (uncopied) keys and
copying the lowest into a temporary array.
Repeat comparing uncopied keys in both halves until all keys in at
least one half have been copied.
Finally copy any remaining (already sorted) keys in the other half.
This algorithm is recursive, in the sense that it calls upon itself. Each time the sort is called it calls itself twice, once for each array (portion) half. This continues with each half calling upon the sort twice. until halves contain only 1 record.
Recursion is a topic about which there will be more later in this module.
| passes | comparisons | moves |
| log2n | (nlog2n)/2 | 2nlog2n |
| Type of sort | comparisons | moves |
| Exchange | n2/2 | 3n2/4 |
| Selection | n2/2 | 3n |
| Bubble | n2/2 | 3n2/2 |
| Linear Insertion | n2/2 | n2/4 |
| Binary Insertion | < nlog2n | n2/4 |
| Merge | (nlog2n)/2 | 2nlog2n |
/* Exchange sort */
void sort(void)
{
int temp,passno,i;
for ( passno = 0 ; passno < n - 1 ; passno++)
for ( i = passno+1 ; i < n ; i++ )
if ( a[i] < a[passno] )
{
temp = a[i];
a[i] = a[passno];
a[passno] = temp;
}
}
/*----------------------------------------------------------------------------------------*/
45 84 12 9 34 1 62 87 3 51
1 84 45 12 34 9 62 87 3 51
1 3 84 45 34 12 62 87 9 51
1 3 9 84 45 34 62 87 12 51
1 3 9 12 84 45 62 87 34 51
1 3 9 12 34 84 62 87 45 51
1 3 9 12 34 45 84 87 62 51
1 3 9 12 34 45 51 87 84 62
1 3 9 12 34 45 51 62 87 84
1 3 9 12 34 45 51 62 84 87
/*--------------------------------------------------------------------------------------*/
/*Selection sort*/
/* Search positions first..n of array */
/* for the position of the smallest value */
void sort(void)
{
int temp,passno,i,smallest;
for ( passno = 0 ; passno < n - 1 ; passno++)
{
smallest = search(passno);
temp = a[smallest];
a[smallest] = a[passno];
a[passno] = temp;
}
}
int search(int first)
{
int tinyest, posn;
tinyest = first;
for ( posn = first+1; posn < n; posn++)
if ( a[posn] < a[tinyest] )
tinyest = posn;
return tinyest;
}
/*----------------------------------------------------------------------------------------*/
45 84 12 9 34 1 62 87 3 51
1 84 12 9 34 45 62 87 3 51
1 3 12 9 34 45 62 87 84 51
1 3 9 12 34 45 62 87 84 51
1 3 9 12 34 45 62 87 84 51
1 3 9 12 34 45 62 87 84 51
1 3 9 12 34 45 62 87 84 51
1 3 9 12 34 45 51 87 84 62
1 3 9 12 34 45 51 62 84 87
1 3 9 12 34 45 51 62 84 87
/*----------------------------------------------------------------------------------------*/
/*Standard bubble sort */
void sort(void)
{
int temp,passno,i;
for ( passno = 0 ; passno < n - 1 ; passno++)
for ( i = 0 ; i < n-passno ; i++ )
if ( a[i] > a[i+1] )
{
temp = a[i];
a[i] = a[i+1];
a[i+1] = temp;
}
}
/*----------------------------------------------------------------------------------------*/
45 84 12 9 34 1 62 87 3 51
45 12 9 34 1 62 84 3 51 87
12 9 34 1 45 62 3 51 84 87
9 12 1 34 45 3 51 62 84 87
9 1 12 34 3 45 51 62 84 87
1 9 12 3 34 45 51 62 84 87
1 9 3 12 34 45 51 62 84 87
1 3 9 12 34 45 51 62 84 87
1 3 9 12 34 45 51 62 84 87
1 3 9 12 34 45 51 62 84 87
/*-----------------------------------------------------------------------------------------*/
/* Insertion Sort */
/* Linear search of the sorted portion of the array */
/* to find the correct position to insert 'key'. */
/* Position returned to 'insert'. */
void sort(void)
{
int key,passno,i,insert;
for ( passno = 0 ; passno < n - 1 ; passno++)
{
key = a[passno+1];
insert = search(passno,key);
for(i = passno; i >= insert; i--)
a[i+1] = a[i];
a[insert] = key;
}
}
int search(int high, int key)
{
int insert=0;
while ( key >= a[insert] && insert <= high )
insert++;
return insert;
}
/*-----------------------------------------------------------------------------------------*/
45 84 12 9 34 1 62 87 3 51
45 84 12 9 34 1 62 87 3 51
12 45 84 9 34 1 62 87 3 51
9 12 45 84 34 1 62 87 3 51
9 12 34 45 84 1 62 87 3 51
1 9 12 34 45 84 62 87 3 51
1 9 12 34 45 62 84 87 3 51
1 3 9 12 34 45 62 84 87 51
1 3 9 12 34 45 51 62 84 87
/*-----------------------------------------------------------------------------------------*/