In this chapter we begin by outlining the basic processes you need to go through in order to compile your C (or C++) programs. We then proceed to formally describe the C compilation model and also how C supports additional libraries.
Creating, Compiling and Running Your Program
The stages of developing your C program are as follows. (See Appendix and exercises for more info.)
Creating the program
Create a file containing the complete program, such as the above example. You can use any ordinary editor with which you are familiar to create the file. One such editor is textedit available on most UNIX systems.
By convention the filename must end in ".c" (full stop, lower-case c), e.g. myprog.c or progtest.c. The contents must obey C syntax. For example, they might be as in the above example, starting with the line /* Sample .... (or a blank line preceding it) and ending with the line } /* end of program */ (or a blank line following it).
Compilation
There are many C compilers around. cc is the default Sun compiler. The GNU C compiler gcc is popular and available for many platforms. PC users may also be familiar with the Borland bcc compiler.
There are also equivalent C++ compilers, which are usually denoted by CC (note the upper case). For example, Sun provides CC and GNU provides GCC; the GNU C++ compiler is also invoked as g++.
Other (less common) C/C++ compilers exist. All the above compilers operate in essentially the same manner and share many common command-line options. Below and in the Appendix we list many of the common compiler options and give example uses. However, the best source of information on each compiler is your system's online manual pages: e.g. man cc.
For the sake of compactness in the basic discussions of compiler operation we will simply refer to the cc compiler -- other compilers can simply be substituted in place of cc unless otherwise stated.
To compile your program, simply invoke the command cc followed by the name of the (C) program you wish to compile. A number of compiler options can also be specified. We will not concern ourselves with most of these options yet; some useful and often essential ones are introduced below. See the Appendix or the online manual for further details.
Thus, the basic compilation command is:
cc program.c
where program.c is the name of the file.
If there are obvious errors in your program (such as mistypings, misspelling one of the key words or omitting a semi-colon), the compiler will detect and report them.
There may, of course, still be logical errors that the compiler cannot detect. You may be telling the computer to do the wrong operations.
When the compiler has successfully digested your program, the compiled version, or executable, is left in a file called a.out or, if the compiler option -o is used, in the file named after the -o.
It is more convenient to use -o and a filename in the compilation, as in
cc -o program program.c
which puts the compiled program into the file program (or any file you name following the "-o" argument) instead of putting it in the file a.out.
Running the program
The next stage is to actually run your executable program. To run an executable in UNIX, you simply type the name of the file containing it, in this case program (or a.out). If the current directory is not in your search path, prefix the name with ./ as in ./program.
This executes your program, printing any results to the screen. At this stage there may be run-time errors, such as division by zero, or it may become evident that the program has produced incorrect output.
If so, you must edit your program source, recompile it and run it again.
The C Compilation Model
We will briefly highlight key features of the C Compilation model (Fig. 2.1) here.
Fig. 2.1 The C Compilation Model (Source Code, Assembly Code, Object Code and Executable Code, with Libraries combined in at the final stage)
The Preprocessor
The Preprocessor accepts source code as input and is responsible for
removing comments
interpreting special preprocessor directives denoted by #.
For example
#include -- includes the contents of a named file, usually called a header file. e.g.
#include
#include
#define -- defines a symbolic name or constant. Macro substitution.
#define MAX_ARRAY_SIZE 100
C Compiler
The C compiler translates source to assembly code. The source code is received from the preprocessor.
Assembler
The assembler creates object code. On a UNIX system you may see files with a .o suffix (.OBJ on MSDOS) to indicate object code files.
C Basics
Before we embark on a brief tour of C's basic syntax and structure we offer a brief history of C and consider the characteristics of the C language.
In the remainder of the Chapter we will look at the basic aspects of C programs such as C program structure, the declaration of variables, data types and operators. We will assume knowledge of a high level language, such as PASCAL.
Our intention is to provide a quick guide through the C principles that are similar in most high-level languages. Here the syntax may be slightly different but the concepts are exactly the same.
C does have a few surprises:
Many High level languages, like PASCAL, are highly disciplined and structured.
However, beware -- C is much more flexible and free-wheeling. This freedom gives C much more power that experienced users can employ. The example mystery.c illustrates how bad things can really get.
History of C
The milestones in C's development as a language are listed below:
UNIX, a user-friendly OS providing powerful development tools, was developed c. 1969 in DEC PDP-7 assembly language, which was tedious, long and error-prone to write.
A new language, ``B'', developed from BCPL, was a second attempt, c. 1970.
A totally new language, ``C'', the successor to ``B'', followed c. 1971.
By 1973 the UNIX OS was almost totally written in ``C''.
Characteristics of C
We briefly list some of C's characteristics that define the language and have also led to its popularity as a programming language. Naturally we will be studying many of these aspects throughout the course.
Small size
Extensive use of function calls
Loose typing -- unlike PASCAL
Structured language
Low level (BitWise) programming readily available
Pointer implementation - extensive use of pointers for memory, array, structures and functions.
C has now become a widely used professional language for various reasons.
It has high-level constructs.
It can handle low-level activities.
It produces efficient programs.
It can be compiled on a variety of computers.
Its main drawback is that it has poor error detection, which can make it off-putting to the beginner. However, diligence in this matter can pay off handsomely: having learned the rules of C, we can break them. Not many languages allow this. Done properly and carefully, this leads to the power of C programming.
C Program Structure
A C program basically has the following form:
Preprocessor Commands
Type definitions
Function prototypes -- declare function types and variables passed to function.
Variables
Functions
We must have a main() function.
A function has the form:
type function_name (parameters)
{
local variables
C Statements
}
If the return type is omitted, pre-C99 compilers assume that the function returns an int. NOTE: This can be a source of problems in a program, and the C99 and later standards no longer allow the return type to be omitted.
So returning to our first C program:
/* Sample program */
#include <stdio.h> /* declares printf */
#include <stdlib.h> /* declares exit */
int main(void)
{
printf("I Like C \n");
exit(0);
}
NOTE:
C requires a semicolon at the end of every statement.
printf is a standard C function -- called from main.
\n signifies newline. Formatted output -- more later.
exit() is also a standard function that causes the program to terminate. Strictly speaking it is not needed here as it is the last line of main() and the program will terminate anyway.
Let us look at another printing statement:
printf(".\n.1\n..2\n...3\n");
The output of this would be:
.
.1
..2
...3
Variables
C has the following simple data types:
The Pascal Equivalents are:
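On the 32-bit UNIX systems of the time, int and long int were the same size (32 bits). This is not guaranteed: on most current 64-bit UNIX systems int is 32 bits while long is 64 bits, so write short or long explicitly when the size matters.
NOTE: There is NO Boolean type in C89 -- you should use char, int or (better) unsigned char. C99 added the _Bool type, with the names bool, true and false available from <stdbool.h>.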
Unsigned can be used with all char and int types.
To declare a variable in C, do:
var_type list_of_variables;
e.g. int i,j,k;
float x,y,z;
char ch;
Defining Global Variables
Global variables are defined above main() in the following way:-
short number,sum;
int bignumber,bigsum;
char letter;
main()
{
}
It is also possible to pre-initialise global variables using the = operator for assignment.
NOTE: The = operator is the same as := in Pascal.
For example:-
float sum=0.0;
int bigsum=0;
char letter='A';
main()
{
}
This is the same as:-
float sum;
int bigsum;
char letter;
main()
{
sum=0.0;
bigsum=0;
letter='A';
}
...but is more efficient.
C also allows multiple assignment statements using =, for example:
a=b=c=d=3;
...which has the same effect as:
a=3;
b=3;
c=3;
d=3;
The variables need not all have the same type, but at each step the value is converted to the type of the variable receiving it: if a is an int and b a double, a=b=3.5; leaves b equal to 3.5 and a equal to 3.
You can define your own types using typedef. This will have greater relevance later in the course when we learn how to create more complex data structures.
As an example of a simple use let us consider how we may define two new types real and letter. These new types can then be used in the same way as the pre-defined C types:
typedef float real;
typedef char letter;
Variables declared:
real sum=0.0;
letter nextletter;
Printing Out and Inputting Variables
C uses formatted output. The printf function has a special formatting character (%) -- a character following this defines a certain format for a variable:
%c -- characters
%d -- integers
%f -- floats
e.g. printf("%c %d %f",ch,i,x);
NOTE: The format string is enclosed in "..." and the variables follow it. Make sure the order and types of the format specifiers match those of the variables.
scanf() is the function for reading input values into variables. Its format is similar to printf:
i.e. scanf("%c %d %f",&ch,&i,&x);
NOTE: & before variables. Please accept this for now and remember to include it. It is to do with pointers which we will meet later (Section 17.4.1).
Constants
ANSI C allows you to declare constants. When you declare a constant it is a bit like a variable declaration except the value cannot be changed.
The const keyword is to declare a constant, as shown below:
int const a = 1;
const int b = 2; /* equivalent placement of const: b is also a constant int */
Note:
You can put const before or after the type. Choose one and stick to it.
It is usual to initialise a const with a value as it cannot get a value any other way.
The preprocessor #define is another more flexible (see Preprocessor Chapters) method to define constants in a program.
You frequently see const declaration in function parameters. This says simply that the function is not going to change the value of the parameter.
The following function declaration uses concepts we have not met yet (see the chapters on functions, strings, pointers and standard libraries), but for completeness of this section it is included here:
char *strcpy(char *buffer, char const *string);
The second argument, string, is a C string that will not be altered by the standard library string-copying function.
Arithmetic Operations
As well as the standard arithmetic operators (+ - * /) found in most languages, C provides some more operators. There are some notable differences with other languages, such as Pascal.
Assignment is = i.e. i = 4; ch = 'y';
Increment ++ and decrement --, which are more concise than their long-hand equivalents: used as a statement on its own, x++ has the same effect as x=x+1 (modern optimising compilers generally produce the same code for both).
The ++ and -- operators can be either prefixed or postfixed. With the prefix form the variable is changed first and the new value is used in the expression, whereas with the postfix form the old value is used in the expression and the variable is changed afterwards.
In the example below, ++z is prefixed and w-- is postfixed:
int x,y,z,w;
main()
{
x=((++z)-(w--)) % 100;
}
This would be equivalent to:
int x,y,z,w;
main()
{
z++;
x=(z-w) % 100;
w--;
}
The % (modulus) operator only works with integers.
Division / is for both integer and float division. So be careful.
The answer to: x = 3 / 2 is 1 even if x is declared a float!!
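RULE: If both operands of / are integers, integer division is done and any fractional part is discarded.
So make sure at least one operand is a floating-point value when you want a real result. The correct answer (1.5) to the above is obtained with x = 3.0 / 2 or x = 3 / 2.0 or (better) x = 3.0 / 2.0.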
There is also a convenient shorthand way to express computations in C. It is very common to have expressions like: i = i + 3 or x = x*(y + 2)
This can be written in C (generally) in a shorthand form like this:
expr1 op= expr2
which is equivalent to the following, except that expr1 is evaluated only once:
expr1 = expr1 op expr2
So we can rewrite i = i + 3 as i += 3
and x = x*(y + 2) as x *= y + 2.
NOTE: that x *= y + 2 means x = x*(y + 2) and
NOT x = x*y + 2.
Comparison Operators
To test for equality, use ==
A warning: beware of using "=" instead of "==", such as accidentally writing
if ( i = j ) .....
This is a perfectly LEGAL C statement (syntactically speaking) which copies the value in "j" into "i" and delivers this value, which will then be interpreted as TRUE if j is non-zero. This works because in C an assignment is itself an expression whose value is the value assigned -- a key feature of C.
Not equals is: !=
Other operators < (less than), > (greater than), <= (less than or equals), >= (greater than or equals) are as usual.
Logical Operators
Logical operators are usually used with conditional statements which we shall meet in the next Chapter.
The two basic logical operators are:
&& for logical AND, || for logical OR.
Beware: & and | have a different meaning; they are bitwise AND and OR (more on this later in Chapter 12).
Order of Precedence
It is necessary to be careful of the meaning of such expressions as a + b * c
We may want the effect as either
(a + b) * c
or
a + (b * c)
All operators have a priority, and high priority operators are evaluated before lower priority ones. Operators of the same priority are grouped from left to right, except for the few marked right->left in the table below, so that
a - b - c
is evaluated as
( a - b ) - c
as you would expect.
From high priority to low priority the order for all C operators (we have not met all of them yet) is:
( ) [ ] -> .
! ~ - * & sizeof cast ++ -- (these are right->left)
* / %
+ -
<< >>
< <= >= >
== !=
&
^
|
&&
||
?: (right->left)
= += -= etc. (right->left)
, (comma)
Thus
a < 10 && 2 * b < c
is interpreted as
( a < 10 ) && ( ( 2 * b ) < c )
and
a = b = spokes / spokes_per_wheel + spares;
as
a = ( b = ( spokes / spokes_per_wheel ) + spares);
Conditionals
This Chapter deals with the various methods by which C can control the flow of logic in a program. Apart from slight syntactic variations they are similar to those in other languages.
As we have seen, the following logical operations exist in C:
==, !=, ||, &&.
One other operator is the unary not, !, which takes only one argument.
These operators are used in conjunction with the following statements.
The if statement
The if statement has the same function as other languages. It has three basic forms:
if (expression)
statement
...or:
if (expression)
statement1
else
statement2
...or:
if (expression)
statement1
else if (expression)
statement2
else
statement3
For example:-
int x,y,z,w;
main()
{
if (x>0)
{
z=w;
........
}
else
{
z=y;
........
}
}
The ? operator
The ? (ternary condition) operator is a more compact way of expressing simple if statements. It has the following form:
expression1 ? expression2: expression3
It simply states:
if expression1 then expression2 else expression3
For example to assign the maximum of a and b to z:
z = (a>b) ? a : b;
which is the same as:
if (a>b)
z = a;
else
z=b;
The switch statement
The C switch is similar to Pascal's case statement. It allows a choice between multiple items at one level of a conditional, and is a far neater way of writing multiple if statements:
switch (expression) {
case item1:
statement1;
break;
case item2:
statement2;
break;
.
.
.
case itemn:
statementn;
break;
default:
statement;
break;
}
In each case the value of itemi must be a constant (an integer constant expression); variables are not allowed.
The break is needed if you want to terminate the switch after execution of one choice. Otherwise execution falls through into the statements of the next case. Note: this is unlike most other languages.
We can also have null statements by just including a ; or let the switch statement fall through by omitting any statements (see the example below).
The default case is optional and catches any other cases.
For example:-
switch (letter)
{
case 'A':
case 'E':
case 'I':
case 'O':
case 'U':
numberofvowels++;
break;
case ' ':
numberofspaces++;
break;
default:
numberofconstants++;
break;
}
In the above example if the value of letter is 'A', 'E', 'I', 'O' or 'U' then numberofvowels is incremented.
If the value of letter is ' ' then numberofspaces is incremented.
If none of these is true then the default condition is executed, that is numberofconstants is incremented.
Looping and Iteration
This chapter will look at C's mechanisms for controlling looping and iteration. Some of these mechanisms may look familiar and will operate in the standard fashion most of the time. NOTE: some non-standard features are also available.
The for statement
The C for statement has the following form:
for (expression1; expression2; expression3)
statement;
or {block of statements}
expression1 initialises; expression2 is the terminate test; expression3 is the modifier (which may be more than just simple increment);
NOTE: C basically treats for statements as while type loops
For example:
int x;
main()
{
for (x=3;x>0;x--)
{
printf("x=%d\n",x);
}
}
...outputs:
x=3
x=2
x=1
...to the screen
All the following are legal for statements in C. The practical application of such statements is not important here; we are just trying to illustrate peculiar features of C's for statement that may be useful:-
for (x=0;((x>3) && (x<9)); x++)
for (x=0,y=4;((x>3) && (y<9)); x++,y+=2)
for (x=0,y=4,z=4000;z; z/=10)
The second example shows that multiple expressions can be separated by a comma (,).
In the third example the loop will continue to iterate until z becomes 0.
The while statement
The while statement is similar to those used in other languages although more can be done with the expression statement -- a standard feature of C.
The while has the form:
while (expression)
statement
For example:
int x=3;
main(){
while (x>0)
{
printf("x=%d\n",x);
x--;
}
}
...outputs:
x=3
x=2
x=1
...to the screen.
Because the while loop can accept expressions, not just conditions, the following are all legal:-
while (x--);
while (x=x+1);
while (x+=5);
Using this type of expression, the while condition fails and the loop is exited only when the result of x--, x=x+1 or x+=5 evaluates to 0. (For x--, the value tested is the value of x before it is decremented.)
We can go further still and perform complete operations within the while expression:
while (i++ < 10);
while ( (ch = getchar()) != 'q')
putchar(ch);
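The first example counts i up to 10: the loop stops when the test i < 10 fails, but because the postfix ++ still happens on that final test, i ends up as 11 (if it started at or below 10).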
The second example uses C standard library functions (See Chapter 18) getchar() - reads a character from the keyboard - and putchar() - writes a given char to screen. The while loop will proceed to read from the keyboard and echo characters to the screen until a 'q' character is read. NOTE: This type of operation is used a lot in C and not just with character reading!! (See Exercises).
C's do-while statement has the form:
do
statement;
while (expression);
It is similar to PASCAL's repeat ... until, except that the loop continues while the expression is true (repeat ... until stops when its condition becomes true).
For example:
int x=3;
main()
{
do {
printf("x=%d\n",x--);
}
while (x>0);
}
..outputs:-
x=3
x=2
x=1
NOTE: The postfix x-- operator uses the current value of x while printing and then decrements x.
break and continue
C provides two commands to control how we loop:
break -- exit from loop or switch.
continue -- skip the rest of the current iteration of the loop.
Consider the following example where we read in integer values and process them according to the following conditions. If the value we have read is negative, we wish to print an error message and abandon the loop. If the value read is greater than 100, we wish to ignore it and continue to the next value in the data. If the value is zero, we wish to terminate the loop.
while (scanf( "%d", &value ) == 1 && value != 0)
{
if (value < 0)
{
printf("Illegal value\n");
break; /* Abandon the loop */
}
if (value > 100)
{
printf("Invalid value\n");
continue; /* Skip to start loop again */
}
/* Process the value read */
/* guaranteed between 1 and 100 */
....;
....;
} /* end while value != 0 */
Arrays and Strings
In principle arrays in C are similar to those found in other languages. As we shall shortly see, arrays are defined slightly differently and there are many subtle differences due to the close link between arrays and pointers. We will look more closely at the link between pointers and arrays later in Chapter 9.
Single and Multi-dimensional Arrays
Let us first look at how we define arrays in C:
int listofnumbers[50];
BEWARE: In C Array subscripts start at 0 and end one less than the array size. For example, in the above case valid subscripts range from 0 to 49. This is a BIG difference between C and other languages and does require a bit of practice to get in the right frame of mind.
Elements can be accessed in the following ways:-
thirdnumber=listofnumbers[2];
listofnumbers[5]=100;
Multi-dimensional arrays can be defined as follows:
int tableofnumbers[50][50];
for two dimensions.
For further dimensions simply add more [ ]:
int bigD[50][50][40][30]......[50];
Elements can be accessed in the following ways:
anumber=tableofnumbers[2][3];
tableofnumbers[25][16]=100;
Strings
In C Strings are defined as arrays of characters. For example, the following defines a string of 50 characters: char name[50];
C has no string handling facilities built in and so the following are all illegal:
char firstname[50],lastname[50],fullname[100];
firstname= "Arnold"; /* Illegal */
lastname= "Schwarznegger"; /* Illegal */
fullname= "Mr"+firstname +lastname; /* Illegal */
However, there is a special library of string handling routines which we will come across later.
To print a string we use printf with a special %s control character: printf("%s",name);
NOTE: We just need to give the name of the string.
In order to allow variable-length strings, the \0 character is used to indicate the end of a string.
So if we have a string, char NAME[50];, and we store "DAVE" in it, its contents will look like:
NAME:   D   A   V   E   \0  ...
index:  0   1   2   3   4   ...  49
Further Data Types
This Chapter discusses how more advanced data types and structures can be created and used in a C program.
Structures
Structures in C are similar to records in Pascal. For example:
struct gun
{
char name[50];
int magazinesize;
float calibre;
};
struct gun arnies;
defines a new structure gun and makes arnies an instance of it.
NOTE: that gun is a tag for the structure that serves as shorthand for future declarations. We now only need to say struct gun and the body of the structure is implied as we do to make the arnies variable. The tag is optional.
Variables can also be declared between the } and ; of a struct declaration, i.e.:
struct gun
{
char name[50];
int magazinesize;
float calibre;
} arnies;
Structures can be initialised at declaration:
struct gun arnies={"Uzi",30,7};
which gives arnie a 7mm. Uzi with 30 rounds of ammunition.
To access a member (or field) of a struct, C provides the . operator. For example, to give arnie more rounds of ammunition:
arnies.magazinesize=100;
Defining New Data Types
typedef can also be used with structures. The following creates a new type agun which is of type struct gun and can be initialised as usual:
typedef struct gun
{
char name[50];
int magazinesize;
float calibre;
} agun;
agun arnies={"Uzi",30,7};
Here gun still acts as a tag to the struct and is optional. Indeed, since we have defined a new data type, the tag is not really of much use: agun is the new data type. arnies is a variable of type agun, which is a structure.
C also allows arrays of structures:
typedef struct gun
{
char name[50];
int magazinesize;
float calibre;
} agun;
agun arniesguns[1000];
This gives arniesguns 1000 guns. This may be used in the following way:
arniesguns[50].calibre=100;
gives Arnie's gun number 50 a calibre of 100mm, and:
itscalibre=arniesguns[0].calibre;
assigns the calibre of Arnie's first gun to itscalibre.
Unions
A union is a variable which may hold (at different times) objects of different sizes and types. C uses the union statement to create unions, for example:
union number
{
short shortnumber;
long longnumber;
double floatnumber;
} anumber;
defines a union called number and an instance of it called anumber. number is a union tag and acts in the same way as a tag for a structure.
Members can be accessed in the following way:
printf("%ld\n",anumber.longnumber);
This clearly displays the value of longnumber.
When the C compiler is allocating memory for unions it will always reserve enough room for the largest member (in the above example this is the double, typically 8 bytes).
In order that the program can keep track of the type of union variable being used at a given time it is common to have a structure (with union embedded in it) and a variable which flags the union type:
An example is:
typedef struct
{ int maxpassengers;
} jet;
typedef struct
{ int liftcapacity;
} helicopter;
typedef struct
{ int maxpayload;
} cargoplane;
typedef union
{ jet jetu;
helicopter helicopteru;
cargoplane cargoplaneu;
} aircraft;
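typedef enum { JET, HELICOPTER, CARGOPLANE } aircrafttype; /* assumed definition: not given in the original */
typedef struct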
{ aircrafttype kind;
int speed;
aircraft description;
} an_aircraft;
This example defines a base union aircraft which may either be jet, helicopter, or cargoplane.
In the an_aircraft structure there is a kind member which indicates which structure is being held at the time.
Coercion or Type-Casting
C allows coercion, that is, forcing a value of one type to be converted to another type. C does this using the cast operator (). So:
int integernumber;
float floatnumber=9.87;
integernumber=(int)floatnumber;
assigns 9 (the fractional part is thrown away) to integernumber.
And:
int integernumber=10;
float floatnumber;
floatnumber=(float)integernumber;
assigns 10.0 to floatnumber.
Coercion can be used with any of the simple data types including char, so:
int integernumber;
char letter='A';
integernumber=(int)letter;
assigns 65 (the ASCII code for 'A') to integernumber.
Some typecasting is done automatically -- this is mainly with integer compatibility.
A good rule to follow is: If in doubt cast.
Another use is to make sure division behaves as requested: if we have two integers internumber and anotherint and we want the answer to be a float then:
e.g.
floatnumber =
(float) internumber / (float) anotherint;
ensures floating point division.
Enumerated Types
Enumerated types contain a list of constants that can be addressed in integer values.
We can declare types and variables as follows.
enum days {mon, tues, ..., sun} week;
enum days week1, week2;
NOTE: As with arrays first enumerated name has index value 0. So mon has value 0, tues 1, and so on.
week1 and week2 are variables.
We can define other values:
enum escapes { bell = '\a', backspace = '\b', tab = '\t',
newline = '\n', vtab = '\v', creturn = '\r'}; /* not return: that is a keyword */
We can also override the 0 start value:
enum months {jan = 1, feb, mar, ......, dec};
Here it is implied that feb = 2 etc.
Static Variables
A static variable is local to a particular function. However, it is initialised only once, not on every call to the function.
Also, the value of the variable on leaving the function remains intact: on the next call to the function, the static variable has the same value as on leaving.
To define a static variable simply prefix the variable declaration with the static keyword. For example:
#include <stdio.h>
void stat(void); /* prototype fn */
main()
{ int i;
for (i=0;i<5;++i)
stat();
}
void stat(void)
{ int auto_var = 0;
static int static_var = 0;
printf( "auto_var = %d, static_var = %d\n",
auto_var, static_var);
++auto_var;
++static_var;
}
Output is:
auto_var = 0, static_var = 0
auto_var = 0, static_var = 1
auto_var = 0, static_var = 2
auto_var = 0, static_var = 3
auto_var = 0, static_var = 4
Clearly the auto_var variable is created each time. The static_var is created once and remembers its value.
Pointers
Pointers are a fundamental part of C. If you cannot use pointers properly then you have basically lost all the power and flexibility that C allows. The secret to C is in its use of pointers.
C uses pointers a lot. Why?:
It is the only way to express some computations.
It produces compact and efficient code.
It provides a very powerful tool.
C uses pointers explicitly with:
Arrays,
Structures,
Functions.
NOTE: Pointers are perhaps the most difficult part of C to understand. C's implementation is slightly DIFFERENT from other languages.
What is a Pointer?
A pointer is a variable which contains the address in memory of another variable. We can have a pointer to any variable type.
The unary or monadic operator & gives the ``address of a variable''.
The indirection or dereference operator * gives the ``contents of an object pointed to by a pointer''.
To declare a pointer to a variable do:
int *pointer;
NOTE: We must associate a pointer with a particular type: you can't assign the address of a short int to a pointer to long int, for instance.
Consider the effect of the following code:
int x = 1, y = 2;
int *ip;
ip = &x;
y = *ip;
x = (int) ip;
*ip = 3;
It is worth considering what is going on at the machine level in memory to fully understand how pointer work. Consider Fig. 9.1. Assume for the sake of this discussion that variable x resides at memory location 100, y at 200 and ip at 1000.
NOTE: A pointer is a variable and thus its value needs to be stored somewhere. What is new is the nature of the pointer's value: it is an address.
Fig. 9.1 Pointer, Variables and Memory (x at address 100, y at 200, ip at 1000):
Statement(s)                       x     y     ip
int x=1, y=2; int *ip; ip = &x;    1     2     100
y = *ip;                           1     1     100
x = (int) ip;                      100   1     100
*ip = 3;                           3     1     100
Now the assignments x = 1 and y = 2 obviously load these values into the variables. ip is declared to be a pointer to an integer and is assigned the address of x (&x). So ip gets loaded with the value 100.
Next y gets assigned the contents of the location ip points to (*ip). In this example ip currently points to memory location 100 -- the location of x. So y gets assigned the value of x -- which is 1.
We have already seen that C is not too fussy about converting values between types. With an explicit cast it is legal (although not all that common) to assign the current value of ip to x; the result is implementation-defined, and without the cast a modern compiler will issue a diagnostic. The value of ip at this instant is 100.
Finally we can assign a value to the contents of a pointer (*ip). This stores 3 in location 100, so x becomes 3.
IMPORTANT: When a pointer is declared it does not point anywhere. You must set it to point somewhere before you use it.
So ...
int *ip;
*ip = 100;
is undefined behaviour: ip holds no valid address, so the program will typically crash (or, worse, silently corrupt memory).
The correct use is:
int *ip;
int x;
ip = &x;
*ip = 100;
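We can also do arithmetic on the value a pointer points to, and copy one pointer to another:
float f = 1.0, *flp = &f, *flq; /* flp must point somewhere first */
*flp = *flp + 10; /* f is now 11.0 */
++*flp; /* f is now 12.0 */
(*flp)++; /* f is now 13.0 */
flq = flp; /* flq now points to f as well */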
NOTE: A pointer to any variable type is an address in memory -- which is an integer address. A pointer is definitely NOT an integer.
The reason we associate a pointer with a data type is so that it knows how many bytes the data is stored in. When we increment a pointer we increase it by one block of memory, the size of the type it points to.
So for a character pointer, ++ch_ptr adds 1 byte to the address.
For an integer or float pointer, ++ip or ++flp adds sizeof(int) or sizeof(float) bytes to the address -- 4 bytes on most systems.
Consider a float variable (fl) and a pointer to a float (flp) as shown in Fig. 9.2.
Fig. 9.2 Pointer Arithmetic (flp points to the float fl, 4 bytes; ++flp points 4 bytes on; flp+2 points 8 bytes on)
Assume that flp points to fl; then if we increment the pointer (++flp) it moves to the position shown, 4 bytes on. If, on the other hand, we added 2 to the pointer, it would move 2 float positions, i.e. 8 bytes, as shown in the figure.
Pointer and Functions
Let us now examine the close relationship between pointers and C's other major parts. We will start with functions.
When C passes arguments to functions it passes them by value.
There are many cases when we may want to alter a passed argument in the function and receive the new value back once the function has finished. Other languages do this (e.g. var parameters in PASCAL). C uses pointers explicitly to do this. Other languages mask the fact that pointers also underpin the implementation of this.
The best way to study this is to look at an example where we must be able to receive changed parameters.
Let us try to write a function to swap two variables around.
The usual function call:
swap(a, b) WON'T WORK, because swap only receives copies of the values of a and b, so swapping the copies leaves a and b unchanged.
Pointers provide the solution: pass the addresses of the variables to the function and access the variables through those addresses inside the function.
Thus our function call in our program would look like this:
swap(&a, &b)
The code for swap is fairly straightforward:
void swap(int *px, int *py)
{ int temp;
temp = *px; /* contents of pointer */
*px = *py;
*py = temp;
}
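The sketch below repeats swap and adds a hypothetical helper, sort3, that puts three ints into ascending order. Inside sort3 the parameters are already pointers, so they are handed on to swap without another &.

```c
/* swap exchanges the two ints that px and py point to. */
void swap(int *px, int *py)
{
    int temp = *px;   /* contents of pointer */
    *px = *py;
    *py = temp;
}

/* Put three ints into ascending order. Each argument is already a
   pointer, so it is passed straight on to swap without another &. */
void sort3(int *a, int *b, int *c)
{
    if (*a > *b) swap(a, b);
    if (*b > *c) swap(b, c);
    if (*a > *b) swap(a, b);
}
```

A call such as sort3(&x, &y, &z) leaves x, y and z in order in the caller.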
We can return pointers from functions. A common example is when passing back structures, e.g.:
typedef struct {float x,y,z;} COORD;
main()
{ COORD p1, *coord_fn(); /* declare fn to
return ptr of COORD type */
....
p1 = *coord_fn(...); /* assign contents of
address returned */
....
}
COORD *coord_fn(...)
{ COORD p;
.....
p = ....; /* assign structure values */
return &p; /* WRONG: p is local to coord_fn */
}
Here we return a pointer whose contents are immediately unwrapped into a variable. This is NOT safe: the variable we pointed to was local to a function that has now finished, so its storage has been released and may be overwritten at any time. Using the returned address is undefined behaviour, even if it often appears to work when the contents are copied straight away. Safer alternatives are to return the structure itself (by value), to declare p static, or to allocate it with malloc() (see later).
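Two safer ways to hand a structure back are sketched below, assuming the same COORD type; the function names coord_by_value and coord_static are made up for illustration. Returning the structure by value copies it to the caller, while a static local survives after the function returns (but is shared by every call).

```c
typedef struct {float x, y, z;} COORD;

/* Option 1: return the structure by value; the caller receives a copy. */
COORD coord_by_value(float x, float y, float z)
{
    COORD p;
    p.x = x; p.y = y; p.z = z;
    return p;
}

/* Option 2: a static local lives for the whole program run, so its
   address stays valid after return (but every call reuses the same p). */
COORD *coord_static(float x, float y, float z)
{
    static COORD p;
    p.x = x; p.y = y; p.z = z;
    return &p;
}
```

With option 2, a second call overwrites the values seen through any pointer returned earlier, so copy the contents out if you need to keep them.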
Pointers and Arrays
Pointers and arrays are very closely linked in C.
Hint: think of array elements arranged in consecutive memory locations.
Consider the following:
int a[10], x;
int *pa;
pa = &a[0]; /* pa pointer to address of a[0] */
x = *pa; /* x = contents of pa (a[0] in this
case) */
[Fig. 9.3 Arrays and Pointers: array a with elements 0, 1, ..., 9; pa points to a[0], ++pa to a[1] and pa+i to a[i]]
To get somewhere in the array (Fig. 9.3) using a pointer we could do:
pa + i ≡ &a[i], so *(pa + i) ≡ a[i]
WARNING: There is no bounds checking of arrays and pointers so you can easily go beyond array memory and overwrite other things.
C however is much more subtle in its link between arrays and pointers.
For example we can just type
pa = a;
instead of pa = &a[0]
and
a[i] can be written as *(a + i),
i.e. &a[i] ≡ a + i.
We also express pointer addressing like this:
pa[i] ≡ *(pa + i).
However pointers and arrays are different:
A pointer is a variable. We can do
pa = a and pa++.
An array name is not a variable. a = pa and a++ ARE ILLEGAL.
This stuff is very important. Make sure you understand it. We will see a lot more of this.
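The following sketch (hypothetical function names) adds up an int array three ways: by subscript, by pointer offset and by stepping a pointer. All three give the same answer; only the last one changes the pointer, which is allowed because the parameter pa is a variable.

```c
/* Add up n ints using array subscripts. */
int sum_index(int a[], int n)
{
    int i, s = 0;
    for (i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* The same, using an offset from the start address. */
int sum_offset(int *a, int n)
{
    int i, s = 0;
    for (i = 0; i < n; i++)
        s += *(a + i);      /* same element as a[i] */
    return s;
}

/* The same again, moving the pointer itself through the array. */
int sum_walk(int *pa, int n)
{
    int s = 0;
    while (n-- > 0)
        s += *pa++;         /* pa is a variable, so it may be incremented */
    return s;
}
```

Because pa is passed by value, stepping it inside sum_walk does not move the caller's pointer.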
We can now understand how arrays are passed to functions.
When an array is passed to a function what is actually passed is the location in memory of its initial element.
So: strlen(s) ≡ strlen(&s[0])
This is why we declare the function:
int strlen(char s[]);
An equivalent declaration is: int strlen(char *s);
since, as a function parameter, char s[] ≡ char *s.
strlen() is a standard library function (Chapter 18) that returns the length of a string. Let's look at how we may write a function:
int strlen(char *s)
{ char *p = s;
while (*p != '\0') /* no ; here, or the loop never ends */
p++;
return p-s;
}
Now let's write a function to copy a string to another string. strcpy() is a standard library function that does this (note the argument order: the string t is copied into s).
void strcpy(char *s, char *t)
{ while ( (*s++ = *t++) != '\0');}
This uses pointers and assignment by value.
Very Neat!! NOTE: the use of a null statement as the body of the while loop.
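A compilable sketch of both functions follows. They are renamed my_strlen and my_strcpy (hypothetical names) so that they do not clash with the real library versions in string.h, and const is added to show which argument is only read.

```c
#include <stddef.h>

/* Count the characters before the terminating '\0'. */
size_t my_strlen(const char *s)
{
    const char *p = s;
    while (*p != '\0')
        p++;
    return (size_t)(p - s);   /* pointer difference = number of chars */
}

/* Copy t into s, including the '\0'; s must be large enough. */
void my_strcpy(char *s, const char *t)
{
    while ((*s++ = *t++) != '\0')
        ;   /* null statement: all the work happens in the condition */
}
```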
Arrays of Pointers
We can have arrays of pointers since pointers are variables.
Example use:
Sort lines of text of different length.
NOTE: Text can't be moved or compared in a single operation.
Arrays of Pointers are a data representation that will cope efficiently and conveniently with variable length text lines.
How can we do this?:
Store lines end-to-end in one big char array (Fig. 9.4). \n will delimit lines.
Store pointers in a different array where each pointer points to 1st char of each new line.
Compare two lines using strcmp() standard library function.
If 2 lines are out of order -- swap pointer in pointer array (not text).
[Fig. 9.4 Arrays of Pointers (String Sorting Example): the text "ABC...\n DEF...\n CAT...\n ..." stored end-to-end, with pointers p[0], p[1], p[2] pointing to the first character of each line]
This eliminates:
complicated storage management.
high overheads of moving lines.
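A minimal sketch of the pointer-swapping idea follows, using a simple bubble sort for clarity (any sorting method would do). It assumes the lines have already been split into separate '\0'-terminated strings, as strcmp() needs; the name sort_lines is hypothetical.

```c
#include <string.h>

/* Sort an array of n string pointers into ascending order with a simple
   bubble sort. Only the pointers are swapped; the text never moves. */
void sort_lines(char *lines[], int n)
{
    int i, j;
    for (i = 0; i < n - 1; i++)
        for (j = 0; j < n - 1 - i; j++)
            if (strcmp(lines[j], lines[j + 1]) > 0) {
                char *tmp = lines[j];      /* swap pointers, not text */
                lines[j] = lines[j + 1];
                lines[j + 1] = tmp;
            }
}
```

For text "ABC", "DEF", "CAT" stored end-to-end, the pointer array ends up ordered ABC, CAT, DEF while the characters stay where they were.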
Multidimensional arrays and pointers
We should think of multidimensional arrays in a different way in C:
A 2D array is really a 1D array, each of whose elements is itself an array
Hence
a[n][m] notation.
Array elements are stored row by row.
When we pass a 2D array to a function we must specify the number of columns -- the number of rows is irrelevant.
The reason for this is pointers again. C needs to know how many columns there are so that it can jump from row to row in memory.
Consider int a[5][35] to be passed to a function:
We can do:
f(int a[][35]) {.....}
or even:
f(int (*a)[35]) {.....}
We need parentheses (*a) since [] has a higher precedence than *
So:
int (*a)[35]; declares a pointer to an array of 35 ints.
int *a[35]; declares an array of 35 pointers to ints.
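As a sketch, a hypothetical function that adds up every element of such an array might look like this; the row length 35 is part of the parameter type, while the number of rows is passed separately.

```c
/* Sum every element of an array of rows, each row 35 ints long.
   int (*a)[35] is equivalent here to int a[][35]. */
long sum_5x35(int (*a)[35], int rows)
{
    long total = 0;
    int r, c;
    for (r = 0; r < rows; r++)
        for (c = 0; c < 35; c++)
            total += a[r][c];   /* C uses the 35 to find the start of row r */
    return total;
}
```

Given int m[5][35], the call is simply sum_5x35(m, 5), since m converts to a pointer to its first row.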
Now let's look at the (subtle) difference between pointers and arrays. Strings are a common application of this.
Consider:
char *name[10];
char Aname[10][20];
We can legally do name[3][4] and Aname[3][4] in C.
However
Aname is a true 200 element 2D char array: element Aname[row][col] is accessed at
base_address + 20*row + col
in memory.
name has 10 pointer elements.
NOTE: If each pointer in name is set to point to a 20 element array then and only then will 200 chars be set aside (plus the 10 pointers).
The advantage of the latter is that each pointer can point to an array of a different length.
Consider:
char *name[] = { "no month", "jan",
"feb", ... };
char Aname[][15] = { "no month", "jan",
"feb", ... };
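[Fig. 2D Arrays and Arrays of Pointers: Aname holds 13 rows of 15 elements each; name holds 13 pointers, with name[0] pointing to "no month\0", name[1] to "jan\0", name[2] to "feb\0", ...]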
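The storage difference can be checked with sizeof. This sketch fills in all twelve month names (an assumption; the text above elides them) and reports the size of each version through two hypothetical functions: Aname always takes 13 * 15 bytes, while name holds only 13 pointers, the strings themselves being stored elsewhere.

```c
#include <stddef.h>

/* Array of 13 pointers: each points to a string literal of its own length. */
char *name[] = { "no month", "jan", "feb", "mar", "apr", "may", "jun",
                 "jul", "aug", "sep", "oct", "nov", "dec" };

/* True 2D array: 13 rows of 15 chars each, whatever the string length. */
char Aname[][15] = { "no month", "jan", "feb", "mar", "apr", "may", "jun",
                     "jul", "aug", "sep", "oct", "nov", "dec" };

size_t name_bytes(void)  { return sizeof name; }   /* 13 * sizeof(char *) */
size_t aname_bytes(void) { return sizeof Aname; }  /* 13 * 15 = 195 */
```

Both are indexed the same way: name[3][1] and Aname[3][1] are both the 'a' of "mar".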
Static Initialisation of Pointer Arrays
Initialisation of arrays of pointers is an ideal application for an internal static array.
void some_fn(void)
{ static char *months[] = { "no month",
"jan", "feb",...};
}
static reserves a private permanent bit of memory.
Pointers and Structures
These are fairly straightforward and are easily defined. Consider the following:
struct COORD {float x,y,z;} pt;
struct COORD *pt_ptr;
pt_ptr = &pt; /* assigns pointer to pt */
The -> operator lets us access a member of the structure pointed to by a pointer (pt_ptr->x is shorthand for (*pt_ptr).x), i.e.:
pt_ptr -> x = 1.0;
pt_ptr -> y = pt_ptr -> y - 3.0;
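A short sketch of passing a structure to a function by pointer, using the same struct COORD; the function translate is a hypothetical example. Because the function receives the address of pt, its changes are seen by the caller.

```c
struct COORD {float x, y, z;};

/* Shift a point by (dx, dy, dz) through a pointer to it. */
void translate(struct COORD *p, float dx, float dy, float dz)
{
    p->x += dx;
    p->y += dy;
    (*p).z += dz;   /* same as p->z += dz */
}
```

Usage: translate(&pt, 1.0, -3.0, 0.5); moves pt in place.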
Example: Linked Lists
typedef struct element { int value;
struct element *next; /* ELEMENT is not defined yet here, so use the tag */
} ELEMENT;
ELEMENT n1, n2;
n1.next = &n2;
[Fig. Linking Two Nodes: n1.next points to n2]
NOTE: We can only declare next as a pointer to the structure. We cannot have an element of the structure's own type inside it, as this would set up a recursive definition which is NOT ALLOWED. We are allowed a pointer member, since the size of a pointer is known (typically 4 or 8 bytes) whatever it points to.
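Following links until a NULL pointer marks the end is the basic list operation. The sketch below (hypothetical function list_sum) uses a tagged structure, struct element, so that next can be declared inside the structure before the typedef name ELEMENT exists.

```c
#include <stddef.h>

typedef struct element {
    int value;
    struct element *next;   /* next node, or NULL at the end of the list */
} ELEMENT;

/* Walk a list from head, adding up the values until next is NULL. */
int list_sum(const ELEMENT *head)
{
    int total = 0;
    while (head != NULL) {
        total += head->value;
        head = head->next;  /* follow the link */
    }
    return total;
}
```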
Common Pointer Pitfalls
Here we will highlight two common mistakes made with pointers.
Not assigning a pointer to memory address before using it
int *x;
*x = 100;
We need a real location to point to first, say:
int y;
x = &y;
*x = 100;
This may be hard to spot: there is NO COMPILER ERROR. Also x could hold some random address at initialisation.
Illegal indirection
Suppose we have a function malloc() which tries to allocate memory dynamically (at run time) and returns a pointer to the block of memory requested if successful, or a NULL pointer otherwise.
void *malloc(size_t size) -- a standard library function (see later); pre-ANSI libraries declared it as char *malloc().
Let us have a pointer: char *p;
Consider:
*p = (char *) malloc(100); /* request 100 bytes of memory */
*p = 'y';
There is mistake above. What is it?
The * in
*p = (char *) malloc(100);
should not be there. malloc returns a pointer, which must be stored in p itself, not in *p; and since p does not yet point to any address, *p is illegal anyway.
The correct code should be:
p = (char *) malloc(100);
Even with this corrected, if no memory is available then p is NULL, and then we can't do:
*p = 'y';
A good C program would check for this (printf needs stdio.h and exit needs stdlib.h):
p = (char *) malloc(100);
if ( p == NULL)
{ printf("Error: Out of Memory\n");
exit(1);
}
*p = 'y';
Dynamic Memory Allocation and Dynamic Structures
Dynamic allocation is a very powerful feature of C. It enables us to create data types and structures of any size and length to suit our program's needs while the program is running.
We will look at two common applications of this:
dynamic arrays
dynamic data structure e.g. linked lists
Malloc, Sizeof, and Free
The function malloc is most commonly used to attempt to "grab" a contiguous portion of memory. It is defined by:
void *malloc(size_t number_of_bytes)
That is to say it returns a pointer of type void * that is the start in memory of the reserved portion of size number_of_bytes. If memory cannot be allocated a NULL pointer is returned.
Since a void * is returned, the C standard states that this pointer can be converted to any (object) pointer type.
The size_t argument type is defined in stdlib.h and is an unsigned type.
So:
char *cp;
cp = malloc(100);
attempts to get 100 bytes and assigns the start address to cp.
Also it is usual to use the sizeof operator to specify the number of bytes:
int *ip;
ip = (int *) malloc(100*sizeof(int));
In ANSI C the cast is not strictly required, since a void * is converted automatically; C++ compilers and some older C compilers do require it. The (int *) means coercion to an integer pointer. It is the declared type of ip that makes pointer arithmetic work in units of int, and the cast documents that type. I personally use it as a means of ensuring that I am totally correct in my coding and use casts all the time.
It is good practice to use sizeof() even if you know the actual size you want -- it makes for device independent (portable) code.
sizeof can be used to find the size of any data type, variable or structure. Simply supply one of these as its operand (a type name must be enclosed in parentheses).
SO:
int i;
struct COORD {float x,y,z;};
typedef struct COORD PT;
sizeof(int), sizeof(i),
sizeof(struct COORD) and
sizeof(PT) are all ACCEPTABLE
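In the above we can use the link between pointers and arrays to treat the reserved memory like an array, i.e. we can do things like:
ip[0] = 100;
or
for(i=0;i<100;++i) scanf("%d",ip++);
(The second form moves ip itself through the block, so keep a copy of the original pointer if you later want to free() the memory.)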
When you have finished using a portion of memory you should always free() it. This allows the memory freed to be available again, possibly for further malloc() calls.
The function free() takes a pointer as an argument and frees the memory to which the pointer refers.
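Putting malloc, sizeof and free together, here is a minimal sketch (the function fill_and_sum is hypothetical) that allocates an int array at run time, uses it through the pointer as if it were an ordinary array, and then frees it. It returns -1 if malloc fails rather than exiting.

```c
#include <stdlib.h>

/* Allocate n ints, store 0, 1, ..., n-1 in them, add them up,
   free the block and return the total (or -1 if malloc fails). */
long fill_and_sum(int n)
{
    int *ip = malloc(n * sizeof(int));   /* sizeof keeps this portable */
    long total = 0;
    int i;
    if (ip == NULL)
        return -1;
    for (i = 0; i < n; ++i)
        ip[i] = i;                       /* treat the block like an array */
    for (i = 0; i < n; ++i)
        total += *(ip + i);
    free(ip);                            /* hand the memory back */
    return total;
}
```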
Calloc and Realloc
There are two additional memory allocation functions, calloc() and realloc(). Their prototypes are given below:
void *calloc(size_t num_elements, size_t element_size);
void *realloc( void *ptr, size_t new_size);
malloc does not initialise memory (to zero) in any way. If you wish to initialise memory to zero then use calloc. calloc is therefore slightly more computationally expensive but, occasionally, more convenient than malloc. Also note the different syntax between calloc and malloc in that calloc takes the number of desired elements, num_elements, and the size of each element, element_size, as two individual arguments.
Thus to assign 100 integer elements that are all initially zero you would do:
int *ip;
ip = (int *) calloc(100, sizeof(int));
realloc is a function which attempts to change the size of a previously allocated block of memory. The new size can be larger or smaller. If the block is made larger then the old contents remain unchanged and memory is added to the end of the block (this new memory is not initialised). If the size is made smaller then the remaining contents are unchanged.
If the original block cannot be resized in place then realloc will attempt to allocate a new block of memory and will copy the old block contents into it. Note that a new pointer (of different value) will consequently be returned; you must use this new value. If new memory cannot be allocated then realloc returns NULL and the original block is left unchanged.
Thus to change the size of memory allocated to the ip pointer above to an array block of 50 integers instead of 100, simply do: ip = (int *) realloc(ip, 50*sizeof(int));
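One common pattern, sketched here with a hypothetical helper resize_ints, is to keep realloc's result in a temporary pointer: if realloc fails it returns NULL but the old block is still allocated, so assigning straight back to ip would lose the only pointer to it.

```c
#include <stdlib.h>

/* Resize the int block *ip to new_n elements.
   Returns 0 on success, -1 on failure (leaving *ip untouched). */
int resize_ints(int **ip, size_t new_n)
{
    int *tmp = realloc(*ip, new_n * sizeof(int));
    if (tmp == NULL)
        return -1;      /* old block is still valid and still needs free() */
    *ip = tmp;          /* possibly a different address from before */
    return 0;
}
```

Usage: after int *ip = calloc(100, sizeof(int)); the call resize_ints(&ip, 50) shrinks the block, keeping the first 50 values.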
Linked Lists
Let us now return to our linked list example:
typedef struct element { int value;
struct element *next; /* ELEMENT is not defined yet here, so use the tag */
} ELEMENT;
We can now try to grow the list dynamically:
link = (ELEMENT *) malloc(sizeof(ELEMENT));
This will allocate memory for a new link.
If we want to release the memory allocated to a pointer we use the free()
function:
free(link);
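Growing and releasing a list dynamically can be sketched as follows. The helper names push_front and free_list are hypothetical; new nodes go on the front of the list because that needs no searching, and free_list saves next before calling free() because the node cannot be used afterwards.

```c
#include <stdlib.h>

typedef struct element {
    int value;
    struct element *next;
} ELEMENT;

/* Put a new node holding value at the front of the list. Returns the new
   head, or NULL if malloc fails (the old list is then unchanged). */
ELEMENT *push_front(ELEMENT *head, int value)
{
    ELEMENT *link = (ELEMENT *) malloc(sizeof(ELEMENT));
    if (link == NULL)
        return NULL;
    link->value = value;
    link->next = head;
    return link;
}

/* Release every node in the list. */
void free_list(ELEMENT *head)
{
    while (head != NULL) {
        ELEMENT *next = head->next;   /* save before the node is freed */
        free(head);
        head = next;
    }
}
```

Check push_front's result before assigning it to your head pointer, so that a failed allocation does not lose the existing list.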
Advanced Pointer Topics
We have introduced many applications and techniques that use pointers. We have introduced some advanced pointer issues already. This chapter brings together some topics we have briefly mentioned and others to complete our study of C pointers.
In this chapter we will:
Examine pointers to pointers in more detail.
See how pointers are used in command line input in C.
Study pointers to functions
Pointers to Pointers
We introduced the concept of a pointer to a pointer previously. You can have a pointer to a pointer of any type.
Consider the following:
char ch; /* a character */
char *pch; /* a pointer to a character */
char **ppch; /* a pointer to a pointer to a character */
We can visualise this in Figure 11.1. Here ppch holds the address of pch, which in turn holds the address of the variable ch; so *ppch is pch and **ppch is ch itself. But what does this mean in practice?
[Fig. 11.1 Pointers to pointers: ppch points to pch, which points to ch]
Recall that a char * is commonly used to refer to a NUL ('\0') terminated string. So one common and convenient notion is to declare a pointer to a pointer to a string (Figure 11.2).
[Fig. 11.2 Pointer to String: ppch points to pch, which points to a string]
Taking this one stage further we can have several strings being pointed to by the pointer (Figure 11.3).
[Fig. 11.3 Pointer to Several Strings]
We can refer to individual strings by ppch[0], ppch[1], ..... As a function parameter this is identical to declaring char *ppch[].
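A small sketch of these relationships; count_strings is a hypothetical function that walks a NULL-terminated array of string pointers through a char **. The C standard guarantees that argv, described next, ends with exactly such a NULL pointer (argv[argc] == NULL).

```c
#include <stddef.h>

/* Count the strings in an array of string pointers ending in NULL. */
int count_strings(char **ppch)
{
    int n = 0;
    while (*ppch != NULL) {   /* *ppch is the current char * */
        n++;
        ppch++;               /* step to the next pointer */
    }
    return n;
}
```

With char ch = 'A'; char *pch = &ch; char **ppch = &pch; the expression **ppch is 'A'.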
One common occurrence of this type is in C command line argument input which we now consider.
Command line input
C lets us read arguments from the command line which can then be used in our programs.
We can type arguments after the program name when we run the program.
We have seen this with the compiler for example
c89 -o prog prog.c
c89 is the program, -o prog prog.c the arguments.
In order to be able to use such arguments in our code we must define them as follows:
int main(int argc, char **argv)
So our main function now has its own arguments. These are the only arguments main accepts.
argc is the number of arguments typed -- including the program name.
argv is an array of strings holding each command line argument -- including the program name in the first array element.
A simple program example:
#include <stdio.h>
int main (int argc, char **argv)
{ /* program to print arguments from command line */
int i;
printf("argc = %d\n\n",argc);
for (i=0;i<argc;++i)
printf("argv[%d] = %s\n",i,argv[i]);
return 0;
}
Assume it is compiled into a program called args.
So if we type:
args f1 "f2" f3 4 stop!
The output would be:
argc = 6
argv[0] = args
argv[1] = f1
argv[2] = f2
argv[3] = f3
argv[4] = 4
argv[5] = stop!
NOTE: argv[0] is program name.
argc counts program name
Embedded quotes " " are removed (by the shell) before the program sees the argument.
Blank spaces delimit the end of arguments.
Enclose an argument in " " if it must contain blanks.
Pointers to a Function
Pointers to functions are perhaps one of the more confusing uses of pointers in C. Pointers to functions are not as common as other pointer uses. However, one common use is in passing pointers to functions as parameters in a function call. (Yes this is getting confusing, hold on to your hats for a moment).
This is especially useful when alternative functions may be used to perform similar tasks on data. You can pass the data and the function to be used to some control function for instance. As we will see shortly, the C standard library provides some basic sorting (qsort) and searching (bsearch) functions for free. You can easily embed your own functions.
To declare a pointer to a function do:
int (*pf) ();
This simply declares a pointer pf to a function that returns an int. No actual function is pointed to yet.
If we have a function int f() then we may simply (!!) write:
pf = &f;
For compiler prototyping to fully work it is better to have full function prototypes for the function and the pointer to a function:
int f(int);
int (*pf) (int) = &f;
Now f() returns an int and takes one int as a parameter.
You can do things like:
ans = f(5);
ans = pf(5);
which are equivalent.
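To show a function pointer being passed as a parameter, here is a sketch with three hypothetical functions: apply calls whatever function it is given on every element of an array, so the same loop can double or square the data.

```c
/* Two functions with the same signature, int f(int). */
int twice(int n)  { return 2 * n; }
int square(int n) { return n * n; }

/* Replace each of a[0..n-1] by f applied to it. */
void apply(int (*f)(int), int a[], int n)
{
    int i;
    for (i = 0; i < n; i++)
        a[i] = f(a[i]);   /* same as (*f)(a[i]) */
}
```

Usage: apply(twice, a, 3) doubles the first three elements of a; a function name on its own converts to a pointer to that function, so &twice is not needed.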
The qsort standard library function is a very useful function that is designed to sort an array by a key value of any type into ascending order, as long as the elements of the array are all of the same fixed size.
qsort is prototyped in (stdlib.h):
void qsort(void *base, size_t num_elements, size_t element_size,
int (*compare)(void const *, void const *));
The argument base points to the array to be sorted, num_elements indicates how long the array is, element_size is the size in bytes of each array element and the final argument compare is a pointer to a function.
qsort calls the compare function which is user defined to compare the data when sorting. Note that qsort maintains its data type independence by giving the comparison responsibility to the user. The compare function must return certain (integer) values according to the comparison result:
less than zero
: if first value is less than the second value
zero
: if first value is equal to the second value
greater than zero
: if first value is greater than the second value
Some quite complicated data structures can be sorted in this manner. For example, to sort the following structure by integer key:
typedef struct { int key;
struct other_data *other; /* struct other_data assumed declared elsewhere */
} Record;
We can write a compare function, record_compare:
int record_compare(void const *a, void const *b)
{ return ( ((const Record *)a)->key - ((const Record *)b)->key );
}
Assuming that we have an array, array, of array_length Records suitably filled with data, we can call qsort like this:
qsort( array, array_length, sizeof(Record), record_compare);
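A complete, compilable sketch follows. It assumes a Record with only the key field (the other data is left out), and uses (x > y) - (x < y) in the comparison instead of subtraction, since x - y can overflow for keys of large magnitude and opposite sign. The wrapper sort_records is a hypothetical name.

```c
#include <stdlib.h>

typedef struct {
    int key;
    /* ... other data would go here ... */
} Record;

/* Negative, zero or positive as a's key is less than, equal to or
   greater than b's key. */
int record_compare(void const *a, void const *b)
{
    int ka = ((const Record *)a)->key;
    int kb = ((const Record *)b)->key;
    return (ka > kb) - (ka < kb);   /* no overflow, unlike ka - kb */
}

/* Sort n Records into ascending key order. */
void sort_records(Record *array, size_t n)
{
    qsort(array, n, sizeof(Record), record_compare);
}
```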
The C Preprocessor
Recall that preprocessing is the first step in the C program compilation stage -- a feature characteristic of C (and C++) compilers.
The preprocessor more or less provides its own language which can be a very powerful tool to the programmer. Recall that all preprocessor directives or commands begin with a #.
Use of the preprocessor is advantageous since it makes:
programs easier to develop,
easier to read,
easier to modify
C code more transportable between different machine architectures.
The preprocessor also lets us customise the language. For example to replace { ... } block statements delimiters by PASCAL like begin ... end we can do:
#define begin {
#define end }
During compilation all occurrences of begin and end get replaced by the corresponding { or } and so the subsequent C compilation stage cannot tell the difference!!!
Let's look at #define in more detail
#define
Use this to define constants or any macro substitution. Use as follows:
#define <macro> <replacement name>
For Example
#define FALSE 0
#define TRUE !FALSE
We can also define small "functions" using #define. For example max. of two variables:
#define max(A,B) ( (A) > (B) ? (A):(B))
? is the ternary operator in C.
Note: this does not define a proper function max.
All it means is that wherever we place max(C,D), where C and D stand for any expressions, the text gets replaced by the appropriate definition.
So if in our C code we typed something like:
x = max(q+r,s+t);
after preprocessing, if we were able to look at the code it would appear like this:
x = ( (q+r) > (s+t) ? (q+r) : (s+t));
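Because max is pure text substitution, an argument with a side effect can be evaluated twice. This sketch (hypothetical function demo_max_side_effect) starts with i = 5 and j = 3: the test i++ > j leaves i as 6, then the chosen branch i++ yields 6 and leaves i as 7.

```c
#define max(A,B) ( (A) > (B) ? (A):(B))

/* Shows double evaluation: max(i++, j) expands to
   ( (i++) > (j) ? (i++):(j)), so i can be incremented twice. */
int demo_max_side_effect(int *i_after)
{
    int i = 5, j = 3, x;
    x = max(i++, j);
    *i_after = i;   /* report the final value of i */
    return x;
}
```

A true function max would evaluate each argument exactly once, giving x == 5 and i == 6.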
Other examples of #define could be:
#define Deg_to_Rad(X) ((X)*M_PI/180.0)
/* converts degrees to radians; M_PI is the value of pi,
defined in math.h on most UNIX systems (it is not part of ISO C) */
#define LEFT_SHIFT_8 <<8
NOTE: The last macro LEFT_SHIFT_8 is only
valid so long as replacement context is valid i.e.
x = y LEFT_SHIFT_8.
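The reason for wrapping each macro argument in its own parentheses (compare the X in Deg_to_Rad) is shown by two hypothetical macros below: with the argument 1+2, the unparenthesised version expands to (1+2*2), which is 5 rather than 6.

```c
/* The same doubling written without and with parentheses round X. */
#define DOUBLE_BAD(X)  (X*2)
#define DOUBLE_GOOD(X) ((X)*2)

int double_bad_demo(void)  { return DOUBLE_BAD(1+2); }   /* (1+2*2)   */
int double_good_demo(void) { return DOUBLE_GOOD(1+2); }  /* ((1+2)*2) */
```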
#undef
This command undefines a macro. A macro must be undefined before being redefined to a different value.
#include
This directive includes a file into code.
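It has two possible forms:
#include <file>
or
#include "file"
<file> tells the compiler to look where system include files are held (usually /usr/include on UNIX systems).
"file" looks for the file first in the current directory (typically the directory containing the source file being compiled) and then where system include files are held.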
Included files usually contain C prototypes and declarations from header files and not (algorithmic) C code (SEE next Chapter for reasons)
#if -- Conditional inclusion
#if evaluates a constant integer expression. You always need a #endif to delimit end of statement.
We can have else etc. as well by using #else and #elif -- else if.
Another common use of #if is with:
#ifdef
-- if defined and
#ifndef
-- if not defined
These are useful for checking if macros are set -- perhaps from different program modules and header files.
For example, to set integer size for a portable C program between TurboC (on MSDOS) and Unix (or other) Operating systems. Recall that TurboC uses 16 bits/integer and UNIX 32 bits/integer.
Assume that if TurboC is running a macro TURBOC will be defined. So we just need to check for this:
#ifdef TURBOC
#define INT_SIZE 16
#else
#define INT_SIZE 32
#endif
As another example, if running the program on an MSDOS machine we want to include the file msdos.h, otherwise a default.h file. A macro SYSTEM is set (by the OS) to the type of system, so we check for this:
#if SYSTEM == MSDOS
#include <msdos.h>
#else
#include "default.h"
#endif
Preprocessor Compiler Control
You can use the cc compiler to control what values are set or defined from the command line. This gives some flexibility in setting customised values and has some other useful functions. The -D compiler option is used. For example:
cc -DLINELENGTH=80 prog.c -o prog
has the same effect as:
#define LINELENGTH 80
Note that any #define or #undef within the program (prog.c above) override command line settings.
You can also set a symbol without a value, for example:
cc -DDEBUG prog.c -o prog
Here the value is assumed to be 1.
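A related idiom, sketched below, is to give such a symbol a default inside the program with #ifndef, so that the program compiles whether or not -DLINELENGTH=... is given on the command line. The function line_length is hypothetical.

```c
/* Use the command-line value if one was given, otherwise 80. */
#ifndef LINELENGTH
#define LINELENGTH 80
#endif

/* Report the line length this build was compiled with. */
int line_length(void)
{
    return LINELENGTH;
}
```

Compiled plainly, line_length() returns 80; compiled with cc -DLINELENGTH=120, it returns 120.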
The setting of such flags is useful, especially for debugging. You can put commands like:
#ifdef DEBUG
printf("Debugging: Program Version 1\n");
#else
printf("Program Version 1 (Production)\n");
#endif
Also, since preprocessor commands can be written anywhere in a C program, you can filter out variables etc. for printing when debugging:
x = y *3;
#ifdef DEBUG
printf("Debugging: Variables (x,y) = %d, %d\n", x, y);
#endif
The -E option is worth mentioning just for academic reasons. It is not that practical an option. The -E option will force the compiler to stop after the preprocessing stage and output the current state of your program. Apart from being a debugging aid for preprocessor commands and a useful initial learning tool (try this option out with some of the examples above) it is not that commonly used.
Other Preprocessor Commands
There are few other preprocessor directives available:
#error
text of error message -- generates an appropriate compiler error message. e.g
#ifdef OS_MSDOS
#include <msdos.h>
#elif defined(OS_UNIX)
#include "default.h"
#else
#error Wrong OS!!
#endif
#line
number "string" -- informs the preprocessor that the next line of input has line number number. "string" is optional and names the file the next line of input comes from. This is most often used with programs that translate other languages to C. For example, error messages produced by the C compiler can reference the file name and line numbers of the original source files instead of the intermediate C (translated) source files.
Input and Output (I/O): stdio.h
This chapter will look at many forms of I/O. We have briefly mentioned some forms before and will look at these in much more detail here.
Your programs will need to include the standard I/O header file so do:
#include <stdio.h>
Reporting Errors
Many times it is useful to report errors in a C program. The standard library perror() is an easy to use and convenient function. It is used in conjunction with errno and frequently on encountering an error you may wish to terminate your program early.
Whilst not strictly part of the stdio.h library we introduce the concept of errno and the function exit() here. We will meet these concepts in other parts of the Standard Library also.
perror()
The function perror() is prototyped by:
void perror(const char *message);
perror() produces a message (on standard error output -- see Section 17.2.1) describing the last error encountered, as recorded in errno (see below), during a call to a system or library function. The argument string message is printed first, then a colon and a blank, then the error message and a newline. If message is a NULL pointer or points to a null string, the colon is not printed.
errno
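errno is a special system variable that is set if a system call cannot perform its set task. It is declared in <errno.h>.
To use errno in a C program, include <errno.h>; do not declare it yourself. (Older UNIX programs declared it with
extern int errno;
but ANSI C allows errno to be a macro, for example one giving each thread its own copy, so such a declaration may not work.)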
It can be manually reset within a C program (although this is uncommon practice) otherwise it simply retains its last value returned by a system call or library function.
exit()
The function exit() is prototyped in <stdlib.h>:
void exit(int status)
exit simply terminates the execution of a program and returns the exit status value to the operating system. The status value is used to indicate if the program has terminated properly:
it exits with an EXIT_SUCCESS value on successful termination
it exits with an EXIT_FAILURE value on unsuccessful termination.
On encountering an error you may frequently call an exit(EXIT_FAILURE) to terminate an errant program.
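A minimal sketch (the function try_open is hypothetical) that reports a failed fopen() with perror() and lets the caller decide whether to exit. Note that ISO C does not require fopen() to set errno, although POSIX systems such as UNIX do, so the description perror() prints is only reliable there.

```c
#include <stdio.h>

/* Open filename for reading and close it again. On failure, perror()
   prints filename, a colon and a description of the error in errno,
   and -1 is returned; 0 is returned on success. */
int try_open(const char *filename)
{
    FILE *fp = fopen(filename, "r");
    if (fp == NULL) {
        perror(filename);
        return -1;
    }
    fclose(fp);
    return 0;
}
```

A caller would typically write if (try_open(name) == -1) exit(EXIT_FAILURE); with stdlib.h included.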
Streams
Streams are a portable way of reading and writing data. They provide a flexible and efficient means of I/O.
A Stream is a file or a physical device (e.g. printer or monitor) which is manipulated with a pointer to the stream.
There exists an internal C data structure, FILE, which represents all streams and is defined in stdio.h. We simply need to refer to the FILE structure in C programs when performing I/O with streams.
We just need to declare a variable or pointer of this type in our programs.
We do not need to know any more specifics about this definition.
We must open a stream before doing any I/O,
then access it
and then close it.
Stream I/O is BUFFERED: that is to say, a fixed "chunk" is read from or written to a file via some temporary storage area (the buffer). This is illustrated in Fig. 17.1. NOTE: the FILE structure that the file pointer refers to keeps track of this buffer.
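[Fig. 17.1 Stream I/O Model: data passes between the C program side and the operating system side through a buffer (the file pointer initially at the base of the buffer), and from there to the storage device, e.g. a file on disk]
This leads to efficient I/O, but beware: data written to a buffer does not appear in a file (or device) until the buffer is flushed or written out. (For a line-buffered stream, such as stdout connected to a terminal, writing \n does this.) Any abnormal exit of code can therefore cause problems, as buffered data may be lost.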
Predefined Streams
UNIX defines 3 predefined streams (in stdio.h):
stdin, stdout, stderr
They all use text as the method of I/O.
stdin and stdout can be used with files, programs, I/O devices such as keyboard, console, etc. stderr normally goes to the console or screen, even when stdout is redirected.
The console is the default for stdout and stderr. The keyboard is the default for stdin.
Predefined streams are automatically opened.
Redirection
This is how we override the UNIX defaults for the predefined streams.
This is not part of C but operating system dependent. We will do redirection from the command line.
> -- redirect stdout to a file.
So if we have a program, out, that usually prints to the screen then
out > file1
will send the output to a file, file1.
< -- redirect stdin from a file to a program.
So if we are expecting input from the keyboard for a program, in we can read similar input from a file
in < file2.
| -- pipe: puts stdout from one program to stdin of another
prog1 | prog2
e.g. Send the output (usually to the console) of a program direct to the printer:
out | lpr
Basic I/O
There are a couple of functions that provide basic I/O facilities.
Probably the most common are getchar() and putchar(). They are defined and used as follows:
int getchar(void) -- reads a char from stdin (returns EOF at end of input or on error)
int putchar(int ch) -- writes a char to stdout, returns the character written.
int ch;
ch = getchar();
(void) putchar((char) ch);
Related Functions:
int getc(FILE *stream),
int putc(int ch, FILE *stream)
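The classic use of these functions is a copying loop. This sketch (hypothetical name copy_stream) works with any pair of streams, so copy_stream(stdin, stdout) behaves like a simple cat. Note that ch is an int, not a char, so that the value EOF can be told apart from every genuine character.

```c
#include <stdio.h>

/* Copy characters from in to out until EOF; return how many were copied. */
long copy_stream(FILE *in, FILE *out)
{
    int ch;          /* int, so EOF is distinguishable from real data */
    long n = 0;
    while ((ch = getc(in)) != EOF) {
        putc(ch, out);
        n++;
    }
    return n;
}
```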
Formatted I/O
We have seen examples of how C uses formatted I/O already. Let's look at this in more detail.
Printf
The function is defined as follows:
int printf(const char *format, arg list ...) --
prints to stdout the list of arguments according to the specified format string. Returns the number of characters printed.
The format string has 2 types of object:
ordinary characters -- these are copied to output.
conversion specifications -- denoted by % and listed in Table 17.1.
Table 17.1: Printf/scanf format characters
Format spec (%)   Type            Result
c                 char            single character
i, d              int             decimal number
o                 int             octal number
x, X              int             hexadecimal number, lower/uppercase notation
u                 int             unsigned int
s                 char *          print string terminated by \0
f                 double/float    format -m.ddd...
e, E              double/float    scientific format, e.g. -1.23e+02
g, G              double/float    e or f, whichever is most compact
%                 -               print % character
Between % and format char we can put:
- (minus sign)
-- left justify.
integer number
-- field width.
m.d
-- m = field width, d = precision of number of digits after decimal point or number of chars from a string.
So:
printf("%-2.3f\n",17.23478);
The output on the screen is:
17.235
and:
printf("VAT=17.5%%\n");
...outputs:
VAT=17.5%
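scanf
This function is defined as follows:
int scanf(char *format, args....) -- reads from stdin and puts input in the addresses of variables specified in the args list. Returns the number of input items successfully read and assigned (or EOF if input ends before the first conversion).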
Format control string similar to printf
Note: The ADDRESS of variable or a pointer to one is required by scanf.
scanf("%d",&i);
We can just give the name of an array or string to scanf since this corresponds to the start address of the array/string.
char string[80];
scanf("%79s",string); /* at most 79 chars plus the '\0' */
Files
Files are the most common form of a stream.
The first thing we must do is open a file. The function fopen() does this:
FILE *fopen(char *name, char *mode)
fopen returns a pointer to a FILE. The name string is the name of the file on disc that we wish to access. The mode string controls our type of access. If a file cannot be accessed for any reason a NULL pointer is returned.
Modes include: "r" -- read,
"w" -- write and
"a" -- append.
To open a file we must have a stream (file pointer) that points to a FILE structure.
So to open a file, called myfile.dat for reading we would do:
FILE *stream;
/* declare a stream; fopen itself is declared in stdio.h */
stream = fopen("myfile.dat","r");
It is good practice to check that the file is opened
correctly:
if ( (stream = fopen( "myfile.dat", "r")) == NULL)
{ printf("Can't open %s\n", "myfile.dat");
exit(1);
}
......
Reading and writing files
The functions fprintf and fscanf are commonly used to access files.
int fprintf(FILE *stream, char *format, args..)
int fscanf(FILE *stream, char *format, args..)
These are similar to printf and scanf except that data is read from or written to the stream, which must have been opened with fopen().
The file position is automatically advanced by ALL file read/write functions. We do not have to worry about doing this.
char string[80];
FILE *stream;
if ( (stream = fopen(...)) != NULL)
fscanf(stream,"%79s", string);
Other functions for files:
int getc(FILE *stream), int fgetc(FILE *stream)
int putc(int ch, FILE *s), int fputc(int ch, FILE *s)
These are like getchar, putchar.
getc may be defined as a preprocessor MACRO in stdio.h, while fgetc is always a true C library function. Both achieve the same result!!
fflush(FILE *stream) -- flushes a stream.
fclose(FILE *stream) -- closes a stream.
We can access predefined streams with fprintf etc.
fprintf(stderr,"Cannot Compute!!\n");
fscanf(stdin,"%s",string);
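As a sketch of the whole open, write, close, reopen, read cycle (the function name write_then_sum and the idea of storing one int per line are assumptions for illustration): the reading loop stops when fscanf() no longer returns 1, i.e. when no further number could be read.

```c
#include <stdio.h>

/* Write 1..n to filename, one per line, then read them back and
   return their sum. Returns -1 if the file cannot be opened. */
long write_then_sum(const char *filename, int n)
{
    FILE *stream;
    int i, value;
    long total = 0;

    if ((stream = fopen(filename, "w")) == NULL)
        return -1;
    for (i = 1; i <= n; i++)
        fprintf(stream, "%d\n", i);
    fclose(stream);                   /* flushes the buffer to the file */

    if ((stream = fopen(filename, "r")) == NULL)
        return -1;
    while (fscanf(stream, "%d", &value) == 1)
        total += value;
    fclose(stream);
    return total;
}
```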
sprintf and sscanf
These are like fprintf and fscanf except they read/write to a string.
int sprintf(char *string, char *format, args..)
int sscanf(char *string, char *format, args..)
For Example:
float full_tank = 47.0; /* litres */
float miles = 300;
char miles_per_litre[80];
sprintf( miles_per_litre, "Miles per litre = %2.3f",
miles/full_tank);
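A minimal sketch pairing the two: the hypothetical function roundtrip formats the fuel figure with sprintf() exactly as above and then reads the number back out of the string with sscanf(), which returns the number of items it assigned. With 300 miles and 47 litres the string holds "Miles per litre = 6.383".

```c
#include <stdio.h>

/* Format miles/litres into out (which must hold at least 80 chars),
   then parse the number back into *parsed. Returns sscanf's count. */
int roundtrip(float miles, float litres, char *out, float *parsed)
{
    sprintf(out, "Miles per litre = %2.3f", miles / litres);
    return sscanf(out, "Miles per litre = %f", parsed);
}
```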
Stream Status Enquiries
There are a few useful stream enquiry functions, prototyped as follows:
int feof(FILE *stream);
int ferror(FILE *stream);
void clearerr(FILE *stream);
int fileno(FILE *stream);
Their use is relatively simple:
feof()
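-- returns true (non-zero) if the end-of-file indicator for the stream is set, i.e. a previous read has already run into the end of the file. Because the indicator is only set after a read fails, a loop such as while ( !feof(fp) ) goes round once too often; it is safer to loop on the result of the read itself. So to read a stream, fp, word by word you could do:
while ( fscanf(fp,"%s",line) == 1 )
{ /* process line here */ }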
ferror()
-- reports on the error state of the stream and returns true if an error has occurred.
clearerr()
-- resets the error indication for a given stream.
fileno()
-- returns the integer file descriptor associated with the named stream (fileno() is a POSIX/UNIX function rather than part of ISO C).
Low Level I/O
This form of I/O is UNBUFFERED -- each read/write request results in accessing disk (or device) directly to fetch/put a specific number of bytes.
There are no formatting facilities -- we are dealing with bytes of information.
Since the data are just bytes, this form of I/O is normally used for binary (rather than text) files.
Instead of file pointers we use low level file handles, or file descriptors, which are small unique integers that identify each open file.
To open a file use:
int open(char *filename, int flag, int perms) -- this returns a file descriptor, or -1 on failure.
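The flag controls file access; the following values are predefined in fcntl.h and are combined with | (one of O_RDONLY, O_WRONLY and O_RDWR must be given):
O_APPEND, O_CREAT, O_EXCL, O_RDONLY, O_RDWR, O_WRONLY + others
-- see the online man pages or reference manuals.
perms -- only used when O_CREAT creates a new file, to set its access permissions (e.g. 0644, further restricted by the process umask); otherwise it can be set to 0 or left out.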
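To show how flag and perms work together, here is a minimal sketch assuming a POSIX system. The helper name create_new and the permissions 0644 are illustrative assumptions. O_EXCL combined with O_CREAT makes open() fail if the file already exists:

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>    /* errno, EEXIST */
#include <fcntl.h>    /* open(), O_* flags */
#include <unistd.h>   /* close() and unlink(), used when cleaning up */

/* Hypothetical helper: create path for writing, but only if it does not
   exist yet. Returns a file descriptor, or -1 with errno set
   (EEXIST if the file was already there). */
int create_new(const char *path)
{
    /* perms is consulted only because O_CREAT is given */
    return open(path, O_WRONLY | O_CREAT | O_EXCL, 0644);
}
```

Calling create_new() twice on the same path should succeed the first time and fail with errno equal to EEXIST the second time, which is a simple way to avoid overwriting an existing file.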
The function:
int creat(char *filename, int perms)
can also be used to create a file; it is equivalent to open() with the flag O_WRONLY | O_CREAT | O_TRUNC.
int close(int handle) -- close a file
int read(int handle, char *buffer, unsigned length)
int write(int handle, char *buffer, unsigned length)
are used to read/write a specific number of bytes (length) from/to a file (handle): read() stores the bytes it fetches in the memory pointed to by buffer, and write() takes the bytes it puts from there.
The sizeof operator is commonly used to specify the length.
read and write return the number of bytes read/written (read returns 0 at end of file), or -1 if they fail.
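/* program to read a list of floats from a binary file */
/* the first sizeof(int) bytes of the file are an int saying how */
/* many floats are in the file; the floats follow it. File name */
/* got from command line */
#include <stdio.h>   /* perror(), fprintf() */
#include <stdlib.h>  /* exit() */
#include <fcntl.h>   /* open(), O_RDONLY */
#include <unistd.h>  /* read(), close() */
float bigbuff[1000];
int main(int argc, char **argv)
{ int fd;
  int bytes_read;
  int file_length = 0;
  if ( argc < 2 )
  { fprintf(stderr,"usage: %s datafile\n",argv[0]);
    exit(1);
  }
  if ( (fd = open(argv[1],O_RDONLY)) == -1)
  { /* error file not open */
    perror("Datafile");
    exit(1);
  }
  if ( (bytes_read = read(fd,&file_length,sizeof(int))) == -1)
  { /* error reading file */
    perror("Datafile");
    exit(1);
  }
  if ( file_length < 0 || file_length > 1000 )
  { /* file too big (or count corrupt) */
    fprintf(stderr,"Datafile: bad count %d\n",file_length);
    exit(1);
  }
  if ( (bytes_read = read(fd,bigbuff,file_length*sizeof(float))) == -1)
  { /* error reading file */
    perror("Datafile");
    exit(1);
  }
  close(fd);
  return 0;
}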
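A data file for the program above can be produced with write(). The following is a minimal sketch, assuming a POSIX system; the helper names write_float_file and read_float_file are hypothetical. Because the ints and floats are stored as raw bytes, such files depend on the machine's byte order and sizeof(int), so they are not portable between machines.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>    /* open(), O_* flags */
#include <unistd.h>   /* read(), write(), close(), unlink() */

/* Hypothetical helper: write count, then count floats, to name
   (created or truncated, permissions 0644). Returns 0 or -1. */
int write_float_file(const char *name, const float *vals, int count)
{
    size_t nbytes;
    int fd, ok;

    if (count < 0)
        return -1;
    nbytes = (size_t)count * sizeof(float);
    fd = open(name, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd == -1)
        return -1;
    ok = write(fd, &count, sizeof(int)) == (ssize_t)sizeof(int)
         && write(fd, vals, nbytes) == (ssize_t)nbytes;
    if (close(fd) == -1)
        ok = 0;
    return ok ? 0 : -1;
}

/* Hypothetical helper: read such a file into buf (room for max floats).
   Returns the number of floats read, or -1 on failure. */
int read_float_file(const char *name, float *buf, int max)
{
    int fd, count = 0;
    size_t nbytes;
    ssize_t got;

    fd = open(name, O_RDONLY);
    if (fd == -1)
        return -1;
    if (read(fd, &count, sizeof(int)) != (ssize_t)sizeof(int)
        || count < 0 || count > max) {
        close(fd);
        return -1;
    }
    nbytes = (size_t)count * sizeof(float);
    got = read(fd, buf, nbytes);
    close(fd);
    return got == (ssize_t)nbytes ? count : -1;
}
```

Unlike the program above, the reader compares the byte counts returned by read() with the counts requested, so a truncated file is reported as an error rather than silently accepted.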
String Handling:
Recall from our discussion of arrays (Chapter 6) that strings are defined as an array of characters or a pointer to a portion of memory containing ASCII characters. A string in C is a sequence of zero or more characters followed by a null character ('\0'):
(Figure: the string "NAME" stored as the characters 'N', 'A', 'M', 'E' followed by '\0'.)
It is important to preserve the null terminating character as it is how C defines and manages variable length strings. All the C standard library string functions require it for successful operation.
In general, apart from some length-restricted functions (strncat(), strncmp() and strncpy()), you should not encounter any such problems unless you create strings by hand. You should use the many useful string handling functions and not really need to get your hands dirty dismantling and assembling strings.
Basic String Handling Functions
All the string handling functions are prototyped in:
#include <string.h>
The common functions are described below:
char *stpcpy(char *dest, const char *src) -- Copy one string into another, returning a pointer to the terminating null of dest (POSIX, not ISO C).
int strcmp(const char *string1, const char *string2) -- Compare string1 and string2 to determine alphabetic order.
char *strcpy(char *string1, const char *string2) -- Copy string2 to string1.
char *strerror(int errnum) -- Get error message corresponding to specified error number.
size_t strlen(const char *string) -- Determine the length of a string (not counting the terminating null).
char *strncat(char *string1, const char *string2, size_t n) -- Append at most n characters from string2 to string1, followed by a terminating null.
int strncmp(const char *string1, const char *string2, size_t n) -- Compare first n characters of two strings.
char *strncpy(char *string1, const char *string2, size_t n) -- Copy first n characters of string2 to string1.
int strcasecmp(const char *s1, const char *s2) -- case insensitive version of strcmp() (POSIX, declared in <strings.h>).
int strncasecmp(const char *s1, const char *s2, size_t n) -- case insensitive version of strncmp().
The use of most of the functions is straightforward, for example:
char *str1 = "HELLO";
char str2[80]; /* destination must be an array with room for the copy */
int length;
length = strlen("HELLO"); /* length = 5 */
(void) strcpy(str2,str1);
Note that strcpy() and strcat() both return their first argument, a pointer to the destination array (not a copy of it). Note also that the order of the arguments is destination array followed by source array, which is easy to get the wrong way round when programming.
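Because strcpy() and strcat() return their destination, calls can be nested. A minimal sketch (the helper name hello_world is hypothetical):

```c
#include <string.h>

/* Hypothetical helper: build "HELLO WORLD" in dest, which must hold at
   least 12 chars (11 letters and spaces plus '\0'). Each call returns
   dest, which becomes the first argument of the next call. */
char *hello_world(char *dest)
{
    return strcat(strcat(strcpy(dest, "HELLO"), " "), "WORLD");
}
```

After char buf[12]; hello_world(buf); the call strlen(buf) should give 11. Nothing checks the size of dest, so the caller must make sure it is big enough.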
The strcmp() function lexically compares the two input strings and returns:
Less than zero
-- if string1 is lexically less than string2
Zero
-- if string1 and string2 are lexically equal
Greater than zero
-- if string1 is lexically greater than string2
This can confuse beginners, and experienced programmers forget it too.
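The sign convention is exactly what the standard qsort() function expects from its comparison function, so strcmp() can be used to sort words. A minimal sketch (the names cmp_str and sort_words are hypothetical):

```c
#include <stdlib.h>
#include <string.h>

/* qsort() passes pointers to the array elements, which here are char *,
   so each argument is really a pointer to a char pointer. */
static int cmp_str(const void *p, const void *q)
{
    return strcmp(*(char *const *)p, *(char *const *)q);
}

/* Hypothetical helper: sort n words into strcmp() order. */
void sort_words(char **words, size_t n)
{
    qsort(words, n, sizeof(char *), cmp_str);
}
```

Note that "lexically" means by character code: in ASCII every uppercase letter comes before every lowercase one, so sorting {"pear", "apple", "Zebra"} gives Zebra, apple, pear.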
The strncat(), strncmp() and strncpy() functions are length-restricted versions of their more general counterparts. They perform a similar task but only up to the first n characters. Note that the null termination requirement may get violated when using strncpy(), for example:
char *str1 = "HELLO";
char str2[80];
int length = 2;
(void) strncpy(str2,str1, length); /* copies only 'H' and 'E' */
str2 is NOT NULL TERMINATED!! -- BEWARE
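One common way round this is to copy at most one character fewer than the buffer size and then add the terminator by hand. A minimal sketch (the helper name copy_bounded is hypothetical):

```c
#include <string.h>

/* Hypothetical helper: copy at most size-1 chars of src into dest and
   always null terminate it. size is the full size of dest and must be
   at least 1. Returns dest. */
char *copy_bounded(char *dest, const char *src, size_t size)
{
    strncpy(dest, src, size - 1);   /* may stop without a '\0' ... */
    dest[size - 1] = '\0';          /* ... so always add one */
    return dest;
}
```

With char buf[3]; the call copy_bounded(buf, "HELLO", sizeof buf) leaves the properly terminated string "HE" in buf.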
String Searching
The library also provides several string searching functions:
char *strchr(const char *string, int c) -- Find first occurrence of character c in string.
char *strrchr(const char *string, int c) -- Find last occurrence of character c in string.
char *strstr(const char *s1, const char *s2) -- locates the first occurrence of the string s2 in string s1.
char *strpbrk(const char *s1, const char *s2) -- returns a pointer to the first occurrence in string s1 of any character from string s2, or a null pointer if no character from s2 exists in s1
size_t strspn(const char *s1, const char *s2) -- returns the length of the initial segment of s1 consisting entirely of characters from s2.
size_t strcspn(const char *s1, const char *s2) -- returns the length of the initial segment of s1 containing no characters from s2.
char *strtok(char *s1, const char *s2) -- break the string pointed to by s1 into a sequence of tokens, each of which is delimited by one or more characters from the string pointed to by s2.
char *strtok_r(char *s1, const char *s2, char **lasts) -- has the same functionality as strtok() except that a pointer to a string placeholder lasts must be supplied by the caller.
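strspn() and strcspn() are handy for simple parsing. A minimal sketch (the helper names skip_blanks and field_len are hypothetical):

```c
#include <string.h>

/* Hypothetical helper: skip leading blanks. strspn() counts the initial
   chars of s that ARE in the set " \t". */
const char *skip_blanks(const char *s)
{
    return s + strspn(s, " \t");
}

/* Hypothetical helper: length of the field before the first ',' or ';'.
   strcspn() counts the initial chars that are NOT in the set, which is
   the whole length of s if no such char occurs. */
size_t field_len(const char *s)
{
    return strcspn(s, ",;");
}
```

For example, field_len("Big,Boy") is 3 and field_len("Hello") is 5.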
strchr() and strrchr() are the simplest to use, for example:
char *str1 = "Hello";
char *ans;
ans = strchr(str1,'l');
After this execution, ans points to the location str1 + 2
strpbrk() is a more general function that searches for the first occurrence of any of a group of characters, for example:
char *str1 = "Hello";
char *ans;
ans = strpbrk(str1,"aeiou");
Here, ans points to the location str1 + 1, the location of the first e.
strstr() returns a pointer to the specified search string or a null pointer if the string is not found. If s2 points to a string with zero length (that is, the string ""), the function returns s1. For example,
char *str1 = "Hello";
char *ans;
ans = strstr(str1,"lo");
will yield ans = str1 + 3.
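Since every search function returns either a pointer into the searched string or a null pointer, subtracting the start of the string gives the position found. A minimal sketch (the helper name offset_in is hypothetical):

```c
#include <stddef.h>   /* ptrdiff_t, NULL */
#include <string.h>   /* the search functions used with it */

/* Hypothetical helper: offset of a search result within s, or -1 when
   the search function returned a null pointer (nothing found). */
ptrdiff_t offset_in(const char *s, const char *found)
{
    return found == NULL ? -1 : found - s;
}
```

For s = "Hello", offset_in(s, strchr(s, 'l')) is 2, offset_in(s, strrchr(s, 'l')) is 3, offset_in(s, strpbrk(s, "aeiou")) is 1 and offset_in(s, strstr(s, "lo")) is 3, matching the examples above.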
strtok() is a little more complicated in operation. If the first argument is not NULL then the function skips any leading second-argument (delimiter) characters, finds the end of the first token, overwrites the delimiter there with a null character and returns a pointer to the token. The position after it is remembered, and subsequent calls to strtok() with a NULL first argument carry on from there, returning NULL when no tokens remain. Because strtok() writes into the string, the first argument must be modifiable (an array, not a string literal). For example, if we wish to break up the string str1 at each space and print each token on a new line we could do:
char str1[] = "Hello Big Boy";
char *t1;
for ( t1 = strtok(str1," ");
t1 != NULL;
t1 = strtok(NULL, " ") )
printf("%s\n",t1);
Here we use the for loop in a non-standard counting fashion:
The initialisation calls strtok() to load the function with the string str1
We terminate when t1 is NULL
We keep assigning tokens of str1 to t1 until termination by calling strtok() with a NULL first argument.
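Because strtok() keeps only one hidden position, two tokenizing loops cannot be nested with it. strtok_r() keeps the position in a caller-supplied variable instead. A minimal sketch, assuming a POSIX system (the helper name words_per_sentence is hypothetical):

```c
#define _POSIX_C_SOURCE 200809L
#include <string.h>

/* Hypothetical helper: count the space-separated words in each
   ';'-separated sentence of text, storing up to max counts in counts[].
   text is modified (delimiters are overwritten with '\0').
   Returns the number of counts stored. */
int words_per_sentence(char *text, int counts[], int max)
{
    char *outer_save, *inner_save;   /* one saved position per loop */
    char *sentence, *word;
    int n = 0;

    for (sentence = strtok_r(text, ";", &outer_save);
         sentence != NULL && n < max;
         sentence = strtok_r(NULL, ";", &outer_save)) {
        counts[n] = 0;
        for (word = strtok_r(sentence, " ", &inner_save);
             word != NULL;
             word = strtok_r(NULL, " ", &inner_save))
            counts[n]++;
        n++;
    }
    return n;
}
```

For the array "Hello Big Boy;to be or not" this should store the counts 3 and 4 and return 2.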
Character conversions and testing: ctype.h
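We conclude this chapter with a related library, #include <ctype.h>, which provides functions (often implemented as macros) for testing and converting single characters: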
Character testing:
int isalnum(int c) -- True if c is alphanumeric.
int isalpha(int c) -- True if c is a letter.
int isascii(int c) -- True if c is ASCII (not part of ISO C).
int iscntrl(int c) -- True if c is a control character.
int isdigit(int c) -- True if c is a decimal digit.
int isgraph(int c) -- True if c is a printing character other than space.
int islower(int c) -- True if c is a lowercase letter.
int isprint(int c) -- True if c is a printing character (including space).
int ispunct(int c) -- True if c is a punctuation character.
int isspace(int c) -- True if c is a white-space character (space, '\t', '\n', '\v', '\f' or '\r' in the C locale).
int isupper(int c) -- True if c is an uppercase letter.
int isxdigit(int c) -- True if c is a hexadecimal digit.
Character Conversion:
int toascii(int c) -- Convert c to ASCII (not part of ISO C).
int tolower(int c) -- Convert c to lowercase.
int toupper(int c) -- Convert c to uppercase.
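The use of these functions is straightforward. The main pitfall is that the argument must be representable as an unsigned char (or be EOF), so plain char values should be cast to unsigned char first.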
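A minimal sketch of typical ctype.h use, assuming the default "C" locale (the helper name upcase_count_digits is hypothetical):

```c
#include <ctype.h>

/* Hypothetical helper: convert s to uppercase in place and return how
   many decimal digits it contains. Each char is cast to unsigned char
   before the ctype.h calls, since a negative value other than EOF
   gives undefined behaviour. */
int upcase_count_digits(char *s)
{
    int digits = 0;

    for (; *s != '\0'; s++) {
        unsigned char c = (unsigned char)*s;
        if (isdigit(c))
            digits++;
        *s = (char)toupper(c);   /* non-letters come back unchanged */
    }
    return digits;
}
```

For the array "abc12 x9!" this should return 3 and leave "ABC12 X9!" behind.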
Memory Operations:
Finally we briefly overview some basic memory operations. Although not strictly string functions, they are prototyped in #include <string.h>
void *memchr(const void *s, int c, size_t n) -- Search for a character in a buffer.
int memcmp(const void *s1, const void *s2, size_t n) -- Compare two buffers.
void *memcpy(void *dest, const void *src, size_t n) -- Copy one buffer into another.
void *memmove(void *dest, const void *src, size_t n) -- Move a number of bytes from one buffer to another.
void *memset(void *s, int c, size_t n) -- Set the first n bytes of a buffer to a given character.
Their use is fairly straightforward and not dissimilar to comparable string operations (except the exact length (n) of the operations must be specified as there is no natural termination here).
Note that in all cases n is a number of bytes, whatever the type of the data. The sizeof operator comes in handy again here, for example:
char src[SIZE],dest[SIZE];
int isrc[SIZE],idest[SIZE];
memcpy(dest,src, SIZE); /* Copy chars (bytes) ok */
memcpy(idest,isrc, SIZE*sizeof(int)); /* Copy arrays of ints */
memmove() behaves in exactly the same way as memcpy() except that the source and destination locations may overlap (memcpy() on overlapping areas has undefined behaviour).
memcmp() is similar to strcmp() except that it compares exactly n bytes, as unsigned chars, without stopping at a null character. It returns less than zero if s1 is less than s2, etc.
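A minimal sketch of the overlapping case, using sizeof to turn element counts into byte counts (the helper name shift_right is hypothetical):

```c
#include <string.h>

/* Hypothetical helper: shift the first n-1 ints of a[] one place to the
   right, then clear a[0]. The source and destination overlap, so
   memmove() must be used rather than memcpy(). n must be at least 1. */
void shift_right(int *a, size_t n)
{
    memmove(a + 1, a, (n - 1) * sizeof(int));
    memset(a, 0, sizeof(int));   /* all-zero bytes make the int 0 */
}
```

Starting from {1, 2, 3, 4}, shift_right(a, 4) should leave {0, 1, 2, 3}, which memcmp() can confirm by comparing sizeof a bytes against the expected array.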