Preprocessor#

The previous chapter established that the preprocessor receives a stream of tokens from phase 3. Its output is also a stream of tokens, passed directly into phase 5.

The preprocessor is therefore purely a token manipulation tool. A source file with no preprocessor directives has a no-op phase 4: the token stream passes through unchanged.

What it can do falls into two categories:

  • Generate new tokens: insert code the programmer didn’t write explicitly

  • Modify existing tokens: replace, remove, or transform them

What it cannot do is evaluate the expressions found in the code it processes: 1 + 3, sizeof(int), strlen("Hello")[1] are all resolved later, during phase 7. Apart from the conditions of #if directives (covered below), the preprocessor only sees tokens, never their values.

You interact with the preprocessor by starting a line with the # character, followed by a preprocessing directive. (Any number of spaces can be present before and after the # character.)

Directives#

File inclusion#

#include < filename >

Look for a file called filename in the folders provided to the preprocessor[2] with the -I flag, then in the standard folders configured at compiler installation. Once the file is found, its content is pasted verbatim in place of the #include line.

#include " filename "

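Same as above, but first look in the directory of the file that contains the #include line (the usual behaviour; the exact search order is implementation-defined)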

Source: cppreference

Note

No assumption is made about the content of the included file: it technically doesn’t have to be valid C, or even code at all…

Which directories does my compiler look in?

To list the folders where your compiler’s preprocessor looks for files, you can execute the following command:

$(cc -print-prog-name=cpp) -v < /dev/null

Possible output#
$ `cc -print-prog-name=cpp` -v < /dev/null
 ...
 #include "..." search starts here:
 #include <...> search starts here:
  /usr/lib/gcc/x86_64-linux-gnu/11/include
  /usr/local/include
  /usr/include/x86_64-linux-gnu
  /usr/include
 ...
$ `cc -print-prog-name=cpp` -I ~/mylib/include -iquote ./include -v < /dev/null
 ...
 #include "..." search starts here:
  ./include
 #include <...> search starts here:
  /home/user/mylib/include
  /usr/lib/gcc/x86_64-linux-gnu/11/include
  /usr/local/include
  /usr/include/x86_64-linux-gnu
  /usr/include
 ...

Source: Stack Overflow

Macros#

Object-like#

#define identifier replacement

After this directive, each occurrence of the token identifier in the source code is replaced by replacement. Text inside string literals is left untouched, since it is not a separate token.

#define identifier

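Defines identifier with an empty replacement: each occurrence is replaced by nothing. This is not the same as #define identifier 1 (it is the -D identifier command-line option that defines the macro as 1).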
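As a minimal sketch of object-like macros, the following shows plain replacement, why a replacement containing an operator is usually parenthesized, and a macro defined with an empty replacement. All names are hypothetical and chosen for illustration only.

```c
#include <stddef.h>

/* Hypothetical names, for illustration only. */
#define BUFFER_SIZE 64        /* every later BUFFER_SIZE token becomes 64 */
#define SUM 1 + 2             /* pasted as three tokens: 1, +, 2 */
#define SAFE_SUM (1 + 2)      /* parentheses keep the tokens grouped */
#define NOTHING               /* empty replacement: the token vanishes */

size_t buffer_bytes(void)
{
    char buffer[BUFFER_SIZE]; /* seen by the compiler as: char buffer[64]; */
    return sizeof buffer;
}

int sum_times_two(void)      { return SUM * 2; }           /* 1 + 2 * 2 */
int safe_sum_times_two(void) { return SAFE_SUM * 2; }      /* (1 + 2) * 2 */
int seven(void)              { return NOTHING 7 NOTHING; } /* return 7; */
```

Because the replacement is inserted as tokens rather than as a value, sum_times_two() follows the usual operator precedence and does not compute (1 + 2) * 2.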

Function-like#

#define identifier(parameters) replacement

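After this directive, each occurrence of identifier(arguments) in the source code is replaced by replacement, with each parameter name substituted by the corresponding argument at the invocation site. In the directive itself, no whitespace is allowed between identifier and the opening parenthesis; otherwise an object-like macro is defined.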
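The substitution described above is purely textual, which a short sketch can make visible. The macros below are hypothetical examples, not part of any library; they contrast an unparenthesized replacement with a parenthesized one.

```c
/* Hypothetical macros, for illustration only. */
#define SQUARE_NAIVE(x) x * x                    /* arguments are pasted as tokens */
#define SQUARE(x)       ((x) * (x))              /* parenthesize parameters and result */
#define MAX(a, b)       ((a) > (b) ? (a) : (b))

int naive_square_of_sum(void) { return SQUARE_NAIVE(1 + 2); } /* 1 + 2 * 1 + 2 */
int square_of_sum(void)       { return SQUARE(1 + 2); }       /* ((1 + 2) * (1 + 2)) */
int larger(int a, int b)      { return MAX(a, b); }
```

An argument is also evaluated once per appearance in the replacement, so an invocation such as MAX(i++, j) may increment i twice.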

#define identifier(parameters, ...) replacement

Similar to the previous definition, but zero or more extra arguments can be supplied at the invocation site. The identifier __VA_ARGS__ is replaced by those extra arguments. Additionally, __VA_OPT__(x) (added in C23) is replaced by nothing if no extra argument was supplied, or by x if at least one was.

Source: cppreference
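A variadic macro could look like the sketch below. FORMAT is a hypothetical wrapper around snprintf, written in two variants on the assumption that not every compiler is in C23 mode: the first uses __VA_OPT__, the second is a common pre-C23 workaround that makes the format string part of the variadic arguments.

```c
#include <stdio.h>

/* Hypothetical macro, for illustration only. */
#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 202311L
/* C23: __VA_OPT__(,) emits the comma only when extra arguments are given. */
#define FORMAT(buffer, format, ...) \
    snprintf(buffer, sizeof(buffer), format __VA_OPT__(,) __VA_ARGS__)
#else
/* Before C23: the format string is the first variadic argument, so "..."
   always receives at least one argument. */
#define FORMAT(buffer, ...) \
    snprintf(buffer, sizeof(buffer), __VA_ARGS__)
#endif

static char message[32];

const char *format_plain(void)
{
    FORMAT(message, "ready");                /* no extra argument */
    return message;
}

const char *format_with_arguments(void)
{
    FORMAT(message, "%s=%d", "answer", 42);  /* two extra arguments */
    return message;
}
```

Both variants accept the same invocations. Note that sizeof(buffer) only gives the right size when the first argument is an array, not a pointer.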

Conditional inclusion#

#if condition A #else B #endif

Evaluates condition (at preprocessor-time), then replaces the whole #if … #endif block with A or B depending on the result.

#ifdef MACRO

Equivalent to #if defined(MACRO)

#ifndef MACRO

Equivalent to #if !defined(MACRO)

#elif condition2 B #endif

An alternative form for chaining multiple conditions without nesting, equivalent to:

#else
#  if condition2
B
#  endif
#endif

#elifdef MACRO

Added in C23 for consistency, equivalent to #elif defined(MACRO)

#elifndef MACRO

Added in C23 for consistency, equivalent to #elif !defined(MACRO)

Source: cppreference
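The directives above could be combined as in the following sketch. LOG_LEVEL and ENABLE_TRACING are hypothetical configuration macros; only LOG_LEVEL is defined here, so the #ifdef branch is discarded before the compiler ever sees it.

```c
/* Hypothetical configuration macros, for illustration only. */
#define LOG_LEVEL 2

#if LOG_LEVEL >= 3
#  define LOG_LEVEL_NAME "debug"
#elif LOG_LEVEL == 2
#  define LOG_LEVEL_NAME "info"
#else
#  define LOG_LEVEL_NAME "quiet"
#endif

int tracing_enabled(void)
{
#ifdef ENABLE_TRACING   /* same as: #if defined(ENABLE_TRACING) */
    return 1;
#else
    return 0;           /* kept: ENABLE_TRACING is not defined */
#endif
}

const char *log_level_name(void) { return LOG_LEVEL_NAME; }
```

The C23 forms #elifdef and #elifndef are left out of the sketch so that it also builds with older compilers.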

How is the condition evaluated?

The #if block needs to be resolved at preprocessor-time, so its condition is evaluated with limited capabilities:

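  • only integer constants, character constants, the defined operator, and macros that expand to these can be used in the condition, combined with the usual arithmetic, comparison, and logical operators

  • after macro expansion, every identifier still unknown to the preprocessor is replaced with 0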

In other words: anything that is not a macro is replaced by 0, even if it has a value known at compile time (e.g. comparing to an enumerator is actually comparing to 0).

Danger

This means that a misspelled macro name is silently replaced by 0.
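Both pitfalls can be reproduced with a small sketch. The names are hypothetical; USE_CAHCE is a deliberate misspelling of USE_CACHE, and MODE_FAST is an enumerator that the compiler knows to be 1 but the preprocessor does not.

```c
/* Hypothetical names, for illustration only. */
enum mode { MODE_FAST = 1, MODE_SAFE = 2 };
#define USE_CACHE 1

#if USE_CAHCE                 /* typo: not a macro, so it is read as 0 */
#  define CACHE_ENABLED 1
#else
#  define CACHE_ENABLED 0
#endif

#if MODE_FAST == 1            /* enumerator, not a macro: compares 0 == 1 */
#  define FAST_MODE_SELECTED 1
#else
#  define FAST_MODE_SELECTED 0
#endif

int cache_enabled(void)      { return CACHE_ENABLED; }
int fast_mode_selected(void) { return FAST_MODE_SELECTED; }
int mode_fast_value(void)    { return MODE_FAST; }  /* the compiler sees 1 */
```

GCC and Clang provide a -Wundef warning that reports identifiers evaluated as 0 in an #if condition, which can help catch this kind of mistake.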

The operators#

Both operators act directly on tokens: it’s the only unit the preprocessor works with.

#

Stringification: turns the tokens of its operand into a string literal

#name → "name"

##

Token pasting: concatenates two tokens into a single token

some ## thing → something

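The # operator can only be applied to a parameter of a function-like macro. The ## operator can appear in the replacement of any macro, but is mostly useful on the parameters of function-like macros.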

Source: cppreference
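The two operators could be exercised as in the sketch below. All macro names are hypothetical. It also shows a detail worth knowing: the operand of # is not macro-expanded first, so a second macro level is commonly used when the expanded value is wanted.

```c
/* Hypothetical macros, for illustration only. */
#define STRINGIFY(x)        #x
#define EXPAND_STRINGIFY(x) STRINGIFY(x)  /* x is expanded before STRINGIFY sees it */
#define CONCAT(a, b)        a##b
#define VERSION 3

int CONCAT(some, thing) = 42;             /* declares: int something = 42; */

const char *parameter_name(void) { return STRINGIFY(name); }           /* "name" */
const char *unexpanded(void)     { return STRINGIFY(VERSION); }        /* "VERSION" */
const char *expanded(void)       { return EXPAND_STRINGIFY(VERSION); } /* "3" */
int pasted_variable(void)        { return something; }
```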

Perspective#

The directives and operators above are a small set of low-level primitives: file inclusion, name substitution, token stringification, and token concatenation.

Yet because they operate before the language is parsed (on raw tokens, not on types, values, or scopes), they are unconstrained by what C itself allows. The preprocessor cannot change the language, but it can generate whatever C code is needed, which hides the language’s restrictions at the source level.

With the full set of preprocessor tools catalogued, chapter 2 illustrates their use through existing macros; chapter 3 then applies them to construct a logging utility.