Preliminary C++0x support

To prepare a proposal to the C++ Standards committee describing certain new and enhanced preprocessor features, the Wave preprocessor library implements experimental support for the following features:

Variadic macros
Placemarker tokens
Well defined token-pasting
Macro scoping mechanism
New alternative tokens

The described features are enabled by the --c++0x command line option of the Wave driver. Alternatively, you can enable them by calling the wave::context<>::set_language() function with the wave::support_cpp0x value.

Variadic macros

Both variadic macros and placemarker tokens have already been added to C99 [2], but not yet to C++. This is an unnecessary incompatibility between C and C++. Adding these facilities to the C++ preprocessor would break no code that is currently well-defined and would close the gap between C and C++ in this area.

Variadic macros were added to the C preprocessor as of C99 [2]. They are, effectively, a way to pass a variable number of arguments to a macro. The specific syntax is as follows:

    #define A(...) __VA_ARGS__
    #define B(a, ...) __VA_ARGS__
 
    A(1, 2, 3)     // expands to: 1, 2, 3
    B(1, 2, 3)     // expands to: 2, 3

The ellipsis is used to denote that the macro can accept any number of trailing arguments. It must always occur as the last formal parameter of the macro. The variadic arguments passed to the macro are identified by the special symbol __VA_ARGS__ in the replacement list of a variadic macro. The use of this symbol is prohibited in any other context.
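
The expansions above can also be checked with a compiler that supports variadic macros (for instance in C++11 mode). The following is a minimal sketch; the macro names ALL_ARGS and TRAILING_ARGS and the helper array_size are hypothetical names chosen for this illustration. Each expansion is used as an array initializer, so the number of elements shows how many arguments survived.

```cpp
#include <cassert>
#include <cstddef>

#define ALL_ARGS(...) __VA_ARGS__
#define TRAILING_ARGS(first, ...) __VA_ARGS__

// Number of elements in a built-in array (hypothetical helper).
template <typename T, std::size_t N>
constexpr std::size_t array_size(const T (&)[N]) { return N; }

const int all_args[] = { ALL_ARGS(1, 2, 3) };       // initializer: 1, 2, 3
const int trailing[] = { TRAILING_ARGS(1, 2, 3) };  // initializer: 2, 3

// Verifies that the named parameter is dropped and the rest is kept.
void check_variadic_expansion() {
    assert(array_size(all_args) == 3);
    assert(array_size(trailing) == 2);
    assert(trailing[0] == 2 && trailing[1] == 3);
}
```

Calling check_variadic_expansion() from main() should pass silently if the compiler handles __VA_ARGS__ as described.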

Placemarker tokens

Placemarker tokens (technically, preprocessing tokens) are simply a well-defined way of passing "nothing" as a macro argument. This facility was also added to the C preprocessor as of C99 [2].

    #define X(p) f(p)
    X("abc")      // expands to: f("abc")
    X()           // expands to: f()
 
    #define Y(a, b) int[a][b]
    Y(2, 2)       // expands to: int[2][2]
    Y(, 2)        // expands to: int[][2]

Placemarker tokens are a natural counterpart to variadic macros. They formalize the optional nature of a variadic argument (or arguments), so that variadic macros appear similar to variadic functions but are generalized to include named parameters as well.
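
As a compilable illustration of empty macro arguments, the sketch below mirrors the two samples above under renamed macros. CALL_F, INT_ARRAY and the two overloads of f are hypothetical names introduced only for this example, and a compiler in C++11 mode or later is assumed.

```cpp
#include <cassert>
#include <cstring>
#include <type_traits>

int f() { return 0; }
int f(const char* s) { return static_cast<int>(std::strlen(s)); }

#define CALL_F(p) f(p)
#define INT_ARRAY(a, b) int[a][b]

int with_argument()    { return CALL_F("abc"); }  // f("abc")
int with_placemarker() { return CALL_F(); }       // f()

// INT_ARRAY(, 2) names the type int[][2]: the first bound is left out.
static_assert(std::extent<INT_ARRAY(2, 2), 0>::value == 2, "first bound is 2");
static_assert(std::extent<INT_ARRAY(, 2), 0>::value == 0, "first bound is unknown");
static_assert(std::extent<INT_ARRAY(, 2), 1>::value == 2, "second bound is 2");

// The empty argument selects the overload of f that takes no parameters.
void check_placemarkers() {
    assert(with_argument() == 3);
    assert(with_placemarker() == 0);
}
```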

Well defined token-pasting

Currently, as of both C++98 and C99, if token-pasting results in multiple preprocessing tokens, the behavior is undefined. For example,

    #define PASTE(a, b) a ## b
    PASTE(1, 2)   // okay
    PASTE(+, -)   // undefined behavior

Token-pasting of unrelated tokens (i.e. token-pasting resulting in multiple preprocessing tokens) is currently undefined for no substantial reason. It is not dependent on architecture nor is it difficult for an implementation to diagnose. Furthermore, retokenization is what most, if not all, preprocessors already do and what most programmers already expect the preprocessor to do. Well-defined behavior is simply standardizing existing practice and removing an arbitrary and unnecessary undefined behavior from the standard.

To achieve well-defined behavior in this context, Wave retokenizes the result of the token-pasting and inserts the newly created token sequence as the macro replacement text.

    PASTE(+, ==)  // expands to: += =
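
The retokenization step can be pictured as concatenating the two spellings and splitting the result again by the longest-match rule. The following C++ sketch is a simplified model, not Wave's implementation: the functions is_word_char, retokenize and paste are hypothetical, only a small hand-picked subset of punctuators is known, and whitespace, string literals and comments are not handled.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

bool is_word_char(char c) {
    return std::isalnum(static_cast<unsigned char>(c)) || c == '_';
}

// Splits text into tokens, always taking the longest token that matches.
std::vector<std::string> retokenize(const std::string& text) {
    // Multi-character punctuators known to this sketch (a subset only).
    static const std::vector<std::string> multi_char = {
        "<<=", ">>=", "+=", "-=", "==", "!=", "<=", ">=",
        "++", "--", "<<", ">>", "->"
    };
    std::vector<std::string> tokens;
    std::size_t pos = 0;
    while (pos < text.size()) {
        std::size_t length = 1;  // fallback: a single character
        if (is_word_char(text[pos])) {
            // identifier or number: take the whole run
            while (pos + length < text.size() && is_word_char(text[pos + length]))
                ++length;
        } else {
            for (const std::string& p : multi_char)
                if (p.size() > length && text.compare(pos, p.size(), p) == 0)
                    length = p.size();
        }
        tokens.push_back(text.substr(pos, length));
        pos += length;
    }
    return tokens;
}

// Models PASTE(a, b): concatenate the spellings, then retokenize.
std::vector<std::string> paste(const std::string& a, const std::string& b) {
    return retokenize(a + b);
}

void check_paste() {
    std::vector<std::string> r = paste("+", "==");  // "+==" splits into += and =
    assert(r.size() == 2 && r[0] == "+=" && r[1] == "=");
    r = paste("1", "2");                            // a single token 12
    assert(r.size() == 1 && r[0] == "12");
}
```

In this model, pasting unrelated tokens such as + and - simply yields the two original tokens again.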

Macro scoping mechanism

One of the major problems of the preprocessor is that macro definitions do not respect any of the scoping mechanisms of the core language. As history has shown, this is a major inconvenience and drastically increases the likelihood of name clashes within a translation unit. The solution is to add both a named and an unnamed scoping mechanism to the C++ preprocessor. This limits the scope of macro definitions without limiting their accessibility. Here are the details.

The scoping mechanism is implemented with the help of three new preprocessor directives: #region, #endregion and #import. Additionally, it changes minor details of some of the existing preprocessor directives: #ifdef, #ifndef and the operator defined().

The #region directive

The #region directive starts a new named or unnamed macro scope.

Syntax

#region <qualified-identifier>

Where <qualified-identifier> is an optionally qualified name defining the name of the region to open.
This name is optional. If the name is omitted a nameless region is opened.

If the qualified identifier starts with '::', the name is looked up relative to the global scope (the <qualified-identifier> is called absolute). If it starts with an identifier, the region is looked up relative to the currently open region (the <qualified-identifier> is called relative). If the specified region is not defined, it is created.

The #region directive is opaque for all macro definitions made outside this region, i.e. no macros defined inside other regions or at the global scope are directly accessible from inside the opened region. To access such macros, they must be imported (see the #import directive) or referred to through their qualified names.

Regions may be nested.

A region may be re-opened (i.e. a #region directive with the same name is found at the same scope), and macros defined inside the previous occurrences of this region will be visible.

Region names and macro names of the same scope are stored in the same symbol table. This implies that a region and a macro with the same name shall not be defined at the same scope.

Macros defined inside a nameless region may not be accessed from outside this region. Further, from inside a nameless region it is not allowed to open an enclosed region through an absolute name.

The argument of the #region directive is not subject to macro expansion before it is evaluated.

The following is a small code sample, which shows possible usages of preprocessor regions.

    #define A() 1
    
    /////////////////////////////////////
    #region region_A
    # define B() 2
 
    /////////////////////////////////////
    # region region_B
    #  define C() 3
    A() // expands to: A()
    B() // expands to: B()
    C() // expands to: 3
    # endregion // region_B
    /////////////////////////////////////
 
    A() // expands to: A()
    B() // expands to: 2
    C() // expands to: C()
    region_B::C()             // expands to: 3
    ::region_A::region_B::C() // expands to: 3
    #endregion // region_A
    /////////////////////////////////////

    A() // expands to: 1
    B() // expands to: B()
    region_A::B()   // expands to: 2
    ::region_A::B() // expands to: 2
    region_A::region_B::C()   // expands to: 3
    ::region_A::region_B::C() // expands to: 3
 
    #define region_A ... // error, name clash with region_A
    #region A            // error, name clash with macro A
    #endregion
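
The lookup rules used in this sample can be modelled with a small symbol table. The following C++ sketch is not Wave's implementation: the names Region and ScopeModel are hypothetical, only unqualified region names can be opened, and nameless regions and #import are left out. It only illustrates opaque scopes, relative and absolute qualified names, and the shared name space of macros and regions.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>

// One macro scope; macros and nested regions share a single name space.
struct Region {
    Region* parent = nullptr;
    std::map<std::string, std::string> macros;               // name -> replacement
    std::map<std::string, std::unique_ptr<Region>> regions;  // name -> nested scope
};

class ScopeModel {
public:
    ScopeModel() : current_(&global_) {}

    void define(const std::string& name, const std::string& replacement) {
        if (current_->regions.count(name))
            throw std::runtime_error("name clash with region " + name);
        current_->macros[name] = replacement;
    }
    void open_region(const std::string& name) {  // creates or re-opens
        if (current_->macros.count(name))
            throw std::runtime_error("name clash with macro " + name);
        std::unique_ptr<Region>& child = current_->regions[name];
        if (!child) {
            child.reset(new Region);
            child->parent = current_;
        }
        current_ = child.get();
    }
    void close_region() {
        if (!current_->parent) throw std::runtime_error("unbalanced #endregion");
        current_ = current_->parent;
    }
    // Returns the replacement text, or nullptr if the name is not visible.
    const std::string* lookup(const std::string& qualified) const {
        const Region* scope = current_;
        std::string rest = qualified;
        if (rest.compare(0, 2, "::") == 0) {  // absolute name
            scope = &global_;
            rest.erase(0, 2);
        }
        std::size_t sep;
        while ((sep = rest.find("::")) != std::string::npos) {
            auto region = scope->regions.find(rest.substr(0, sep));
            if (region == scope->regions.end()) return nullptr;
            scope = region->second.get();
            rest.erase(0, sep + 2);
        }
        auto macro = scope->macros.find(rest);
        return macro == scope->macros.end() ? nullptr : &macro->second;
    }

private:
    Region global_;
    Region* current_;
};

// Replays part of the sample above against the model.
void check_region_sample() {
    ScopeModel m;
    m.define("A", "1");
    m.open_region("region_A");
    m.define("B", "2");
    m.open_region("region_B");
    m.define("C", "3");
    assert(m.lookup("A") == nullptr);  // outer macros are not visible
    assert(*m.lookup("C") == "3");
    m.close_region();
    assert(*m.lookup("region_B::C") == "3");              // relative name
    assert(*m.lookup("::region_A::region_B::C") == "3");  // absolute name
    m.close_region();
    assert(*m.lookup("A") == "1");
    assert(m.lookup("B") == nullptr);
    assert(*m.lookup("region_A::B") == "2");
}
```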

The #endregion directive

The #endregion directive closes the last macro scope opened with a #region directive.

Syntax

#endregion

The #endregion directive is opaque for all macros defined inside the closed region. Macros defined inside this region may be accessed from outside of it only if they are imported (see the #import directive) or, provided the region is not unnamed, if they are referenced through qualified names specifying the region and the macro name.

The #region and #endregion directives shall be balanced over the whole translation unit. Otherwise an error is raised.

The #import directive

The #import directive imports macros or whole macro scopes into the current macro scope.

Syntax

#import <qualified-identifier> [, <qualified-identifier> ...]

Where <qualified-identifier> is an optionally qualified name defining the name of the macro or region to import. The #import directive may specify one or more comma separated qualified names.

If the qualified identifier starts with '::', the name is looked up relative to the global scope (the <qualified-identifier> is called absolute). If it starts with an identifier, the region is looked up relative to the currently open region (the <qualified-identifier> is called relative).

If <qualified-identifier> refers to a macro, then the referenced macro definition is made available in the current region, just as if it were defined here. Both macro definitions (the original macro definition and the imported one) refer to the same macro. This is significant for the disabling of a macro during the rescanning of a replacement list: if one of the instances of the macro definition is marked as disabled for expansion, the others are marked as disabled for expansion too.

If <qualified-identifier> refers to a region, then

Imported macros may be undefined with the #undef directive as usual. This removes the referenced macro from the current region, but leaves it unchanged in the original region, where it was defined initially.

The argument of the #import directive is not subject to macro expansion before it is evaluated.
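
The sharing of one macro definition between the original and the importing region, and the local effect of #undef, can be pictured with reference-counted definition objects. The following C++ sketch is only a model under that assumption, not Wave's implementation; MacroDef, Scope, import_macro and undef_macro are hypothetical names.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

struct MacroDef {
    std::string replacement;
    bool disabled = false;  // set while the macro is being rescanned
};

// A macro scope maps names to shared definition objects.
using Scope = std::map<std::string, std::shared_ptr<MacroDef>>;

// Models #import for a single macro: the importing scope receives a second
// reference to the same definition object, not a copy.
void import_macro(const Scope& from, Scope& into, const std::string& name) {
    auto it = from.find(name);
    if (it != from.end()) into[name] = it->second;
}

// Models #undef: only the entry in the given scope is removed.
void undef_macro(Scope& scope, const std::string& name) {
    scope.erase(name);
}

void check_import_model() {
    Scope original, current;
    original["M"] = std::make_shared<MacroDef>();
    original["M"]->replacement = "42";

    import_macro(original, current, "M");
    current["M"]->disabled = true;    // disabling one instance ...
    assert(original["M"]->disabled);  // ... disables the other as well

    undef_macro(current, "M");        // removes the imported name only
    assert(current.count("M") == 0);
    assert(original.count("M") == 1);
}
```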

Changes to the #ifdef, #ifndef directives and the operator defined()

To fully support macro regions, the #ifdef and #ifndef directives and the operator defined() may be used with qualified identifiers as their arguments too. Therefore, the following sample is completely well-formed (courtesy of Paul Mensonides):

    # ifndef ::CHAOS_PREPROCESSOR::chaos::WSTRINGIZE_HPP
    # region ::CHAOS_PREPROCESSOR::chaos
    #
    # define WSTRINGIZE_HPP
    #
    # include <chaos/experimental/cat.hpp>
    #
    # // wstringize
    #
    # define wstringize(...) \
        chaos::primitive_wstringize(__VA_ARGS__) \
        /**/
    #
    # // primitive_wstringize
    #
    # define primitive_wstringize(...) \
        chaos::primitive_cat(L, #__VA_ARGS__) \
        /**/
    #
    # endregion
    # endif

    # import ::CHAOS_PREPROCESSOR
 
    chaos::wstringize(a,b,c) // L"a,b,c"

In the context of the #ifdef and #ifndef directives and the operator defined() a qualified macro name is considered to be defined if:

New alternative tokens

Vesa Karvonen recently suggested on the Boost mailing list the following addition to the preprocessor, which is implemented by Wave in C++0x mode.

Consider the following example:

    #define ID(x) x
    ID( (         )
    ID(   a , b   )
    ID(         ) )

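Parentheses inside a macro argument must be balanced, and a comma that is not nested in parentheses separates two arguments. The macro expansion of the above preprocessor code therefore does not produce the intended result: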

    ( a , b )

The basic idea is that the keywords __lparen__, __rparen__ and __comma__ could be used in place of '(', ')' and ',', respectively. The above example would now become:

    #define ID(x) x
    ID( __lparen__                          )
    ID(            a __comma__ b            )
    ID(                          __rparen__ )

and it would expand into:

    __lparen__ a __comma__ b __rparen__

which would be recognized in translation phases after macro replacement as equivalent to the token sequence:

    ( a , b )

This trivial extension makes it an order of magnitude easier to generate C++ code using the C++ preprocessor.
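
The final substitution step can be sketched as a pass over the token sequence that remains after macro replacement. The following C++ fragment is a simplified model, not Wave's implementation; the function replace_alternative_tokens is hypothetical and tokens are represented as plain strings.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Replaces the alternative tokens by the punctuators they stand for.
// Intended to run after macro replacement, so the preprocessor never sees
// the real '(', ')' and ',' while it collects macro arguments.
std::vector<std::string> replace_alternative_tokens(std::vector<std::string> tokens) {
    for (std::string& token : tokens) {
        if (token == "__lparen__") token = "(";
        else if (token == "__rparen__") token = ")";
        else if (token == "__comma__") token = ",";
    }
    return tokens;
}

void check_alternative_tokens() {
    std::vector<std::string> expanded;
    expanded.push_back("__lparen__");
    expanded.push_back("a");
    expanded.push_back("__comma__");
    expanded.push_back("b");
    expanded.push_back("__rparen__");

    const std::vector<std::string> result = replace_alternative_tokens(expanded);
    assert(result.size() == 5);
    assert(result[0] == "(");
    assert(result[2] == ",");
    assert(result[4] == ")");
}
```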


Last updated: Monday, January 5, 2004 14:57