Preliminary C++0x support
To prepare a proposal for the C++ Standards committee describing certain new and enhanced preprocessor features, the Wave preprocessor library implements experimental support for the following features:
Variadic macros
Placemarker tokens
Well defined token-pasting
Macro scoping mechanism
New alternative tokens
The described features are enabled by the --c++0x command line option of the Wave driver. Alternatively you can enable these features by calling the wave::context<>::set_language() function with the wave::support_cpp0x value.
Both variadic macros and placemarker tokens have already been added to C99. Their absence from C++ represents an unnecessary incompatibility between C and C++. Adding these facilities to the C++ preprocessor would break no code that is currently well-defined and would close the gap between C and C++ in this field.
Variadic macros were added to the C preprocessor as of C99. They are, effectively, a way to pass a variable number of arguments to a macro. The syntax is as follows:
#define A(...) __VA_ARGS__
#define B(a, ...) __VA_ARGS__

A(1, 2, 3) // expands to: 1, 2, 3
B(1, 2, 3) // expands to: 2, 3
The ellipsis is used to denote that the macro can accept any number of trailing arguments. It must always occur as the last formal parameter of the macro. The variadic arguments passed to the macro are identified by the special symbol __VA_ARGS__ in the replacement list of a variadic macro. The use of this symbol is prohibited in any other context.
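The rules above can be tried out with any compiler that already accepts variadic macros (they later became standard in C++11). The following is only a small sketch: `sum_all`, `SUM` and `SUM_TAIL` are hypothetical names introduced for illustration and are not part of Wave.

```cpp
#include <cassert>
#include <initializer_list>

// Hypothetical helper: adds up every value it is given.
inline int sum_all(std::initializer_list<int> values) {
    int total = 0;
    for (int v : values) total += v;
    return total;
}

// The ellipsis is the last formal parameter; __VA_ARGS__ stands for all
// arguments matched by it, including the commas that separate them.
#define SUM(...) sum_all({__VA_ARGS__})

// Named parameters may precede the ellipsis, as in B(a, ...) above.
// Here the first argument is simply dropped.
#define SUM_TAIL(first, ...) sum_all({__VA_ARGS__})

inline void variadic_demo() {
    assert(SUM(1, 2, 3) == 6);       // sum_all({1, 2, 3})
    assert(SUM_TAIL(1, 2, 3) == 5);  // sum_all({2, 3})
}
```

Forwarding `__VA_ARGS__` into a braced initializer list is just one convenient way to make the expansion observable at run time; the macro itself only performs textual substitution.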
Placemarker tokens (technically, preprocessing tokens) are simply a well-defined way of passing "nothing" as a macro argument. This facility was also added to the C preprocessor as of C99.
#define X(p) f(p)
X("abc") // expands to: f("abc")
X() // expands to: f()

#define Y(a, b) int[a][b]
Y(2, 2) // expands to: int[2][2]
Y(, 2) // expands to: int[][2]
Placemarker tokens are a natural counterpart to variadic macros. They formalize the optional nature of a variadic argument (or arguments) so that variadic macros appear similar to variadic functions, but have been generalized to include named parameters as well.
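As a hedged illustration of empty macro arguments, the sketch below uses two hypothetical macros, `DECLARE` and `SUFFIXED`, which are not part of Wave. It assumes a compiler that accepts empty macro arguments (standard as of C99 and C++11).

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical macro: the first argument (a qualifier) may be left empty.
#define DECLARE(qualifier, type, name, value) qualifier type name = value

DECLARE(const, int, answer, 42);  // const int answer = 42;
DECLARE(, int, counter, 0);       // int counter = 0;  (empty first argument)

// Hypothetical macro: pasting with an empty argument leaves the other
// operand unchanged, because the empty side acts as a placemarker.
#define SUFFIXED(name, suffix) name ## suffix

int SUFFIXED(slot, _a) = 1;       // int slot_a = 1;
int SUFFIXED(slot, ) = 2;         // int slot = 2;

static_assert(std::is_const<decltype(answer)>::value, "answer is const");
static_assert(!std::is_const<decltype(counter)>::value, "counter is mutable");

inline void placemarker_demo() {
    assert(answer == 42);
    assert(slot_a == 1 && slot == 2);
}
```

The second macro shows why placemarkers matter for `##`: without a defined meaning for an empty operand, `SUFFIXED(slot, )` would have no well-defined result.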
Currently, as of both C++98 and C99, if token-pasting results in multiple preprocessing tokens, the behavior is undefined. For example,
#define PASTE(a, b) a ## b

PASTE(1, 2) // okay
PASTE(+, -) // undefined behavior
Token-pasting of unrelated tokens (i.e. token-pasting resulting in multiple preprocessing tokens) is currently undefined for no substantial reason. It is not dependent on architecture nor is it difficult for an implementation to diagnose. Furthermore, retokenization is what most, if not all, preprocessors already do and what most programmers already expect the preprocessor to do. Well-defined behavior is simply standardizing existing practice and removing an arbitrary and unnecessary undefined behavior from the standard.
To achieve well-defined behavior in this context Wave retokenizes the result of the token-pasting and inserts the newly created token sequence as the macro replacement text.
PASTE(+, ==) // expands to: += =
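The retokenization step can be pictured with the following sketch. It is not Wave's implementation: `paste_and_retokenize` and `kPunctuators` are hypothetical names, the punctuator list is deliberately partial, and identifiers and numbers are treated as plain alphanumeric runs. It only illustrates the idea of concatenating two spellings and splitting the result by longest match.

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <vector>

// Punctuators known to this sketch, longest spellings first, so that the
// scan below always prefers the longest match ("maximal munch").
static const char* const kPunctuators[] = {
    "<<=", ">>=", "...", "->*",
    "+=", "-=", "*=", "/=", "==", "!=", "<=", ">=", "&&", "||",
    "++", "--", "<<", ">>", "->", "::", "##",
    "+", "-", "*", "/", "=", "<", ">", "!", "&", "|", "#", "(", ")", ",",
};

// Hypothetical helper: concatenates two token spellings and splits the
// result back into a sequence of token spellings.
inline std::vector<std::string> paste_and_retokenize(const std::string& lhs,
                                                     const std::string& rhs) {
    const std::string text = lhs + rhs;
    std::vector<std::string> tokens;
    std::size_t pos = 0;
    while (pos < text.size()) {
        std::size_t len = 0;
        const unsigned char c = static_cast<unsigned char>(text[pos]);
        if (std::isalnum(c) || c == '_') {
            // Simplification: identifiers and numbers are one alphanumeric run.
            while (pos + len < text.size() &&
                   (std::isalnum(static_cast<unsigned char>(text[pos + len])) ||
                    text[pos + len] == '_'))
                ++len;
        } else {
            // Take the first (and therefore longest) punctuator that matches.
            for (const char* p : kPunctuators) {
                const std::string candidate(p);
                if (text.compare(pos, candidate.size(), candidate) == 0) {
                    len = candidate.size();
                    break;
                }
            }
            if (len == 0) len = 1;  // unknown character: single-character token
        }
        tokens.push_back(text.substr(pos, len));
        pos += len;
    }
    return tokens;
}

inline void paste_demo() {
    // PASTE(+, ==) yields the two tokens += and =
    const std::vector<std::string> expected = {"+=", "="};
    assert(paste_and_retokenize("+", "==") == expected);
}
```

With this model, `PASTE(1, 2)` still produces a single token, while `PASTE(+, -)` produces the two tokens `+` and `-` instead of being undefined.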
One of the major problems of the preprocessor is that macro definitions do not respect any of the scoping mechanisms of the core language. As history has shown, this is a major inconvenience and drastically increases the likelihood of name clashes within a translation unit. The solution is to add both a named and an unnamed scoping mechanism to the C++ preprocessor. This limits the scope of macro definitions without limiting their accessibility. Here are the details.
The scoping mechanism is implemented with the help of three new preprocessor directives: #region, #endregion and #import. Additionally it changes minor details of some of the existing preprocessor directives: #ifdef, #ifndef and the operator defined().
#region <qualified-identifier>
Where <qualified-identifier> is an optionally qualified name defining the name of the region to open.
This name is optional. If the name is omitted a nameless region is opened.
If the qualified identifier starts with '::', the name is looked up relative to the global scope (the <qualified-identifier> is called absolute). If it starts with an identifier, the region is looked up relative to the currently open region (the <qualified-identifier> is called relative). If the specified region is not defined, it is created.
The #region directive is opaque for all macro definitions made outside this region, i.e. no macros defined inside of other regions or at the global scope are directly accessible from inside the opened region. To access such macros, they must be imported (see the #import directive) or referred to through their qualified names.
Regions may be nested.
A region may be re-opened (i.e. a #region directive with the same name is found at the same scope), and macros from the previous occurrences of this region will be visible.
Region names and macro names of the same scope are stored in the same symbol table. This implies that a region and a macro with the same name shall not be defined at the same scope.
Macros defined inside a nameless region may not be accessed from outside this region. Further, from inside a nameless region it is not allowed to open an enclosed region through an absolute name.
The argument of the #region directive is not subject to macro expansion before it is evaluated.
The following is a small code sample, which shows possible usages of preprocessor regions.
#define A() 1

/////////////////////////////////////
#region region_A
# define B() 2

/////////////////////////////////////
# region region_B
# define C() 3
A() // expands to: A()
B() // expands to: B()
C() // expands to: 3
# endregion // region_B

/////////////////////////////////////
A() // expands to: A()
B() // expands to: 2
C() // expands to: C()
region_B::C() // expands to: 3
::region_A::region_B::C() // expands to: 3
#endregion // region_A

/////////////////////////////////////
A() // expands to: 1
B() // expands to: B()
region_A::B() // expands to: 2
::region_A::B() // expands to: 2
region_A::region_B::C() // expands to: 3
::region_A::region_B::C() // expands to: 3

#define region_A ... // error, name clash with region_A
#region A // error, name clash with macro A
#endregion
The #endregion directive closes the last opened region. It is opaque for all macros defined inside the closed region. Macros defined inside this region may be accessed from outside of it only if they are imported (see the #import directive) or referred to through qualified names specifying the region and the macro name, provided the region is not unnamed.
The #region and #endregion directives shall be balanced over the whole translation unit. Otherwise an error is raised.
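The visibility rules for named regions can be modelled with a small symbol table. The sketch below is an assumption-laden illustration, not Wave's data structure: `Region` and `RegionTable` are hypothetical names, `open` accepts only a single unqualified identifier, and nameless regions and `#import` are not modelled.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>

// One region. Macros and nested regions conceptually share one symbol
// table, so a name may denote a macro or a region at a scope, never both.
struct Region {
    std::map<std::string, std::string> macros;               // name -> body
    std::map<std::string, std::unique_ptr<Region>> regions;  // name -> child
    Region* parent = nullptr;
};

class RegionTable {
public:
    RegionTable() : current_(&global_) {}

    // #region name: opens (and creates if necessary) a child region.
    void open(const std::string& name) {
        if (current_->macros.count(name))
            throw std::runtime_error("name clash with macro " + name);
        std::unique_ptr<Region>& child = current_->regions[name];
        if (!child) {
            child.reset(new Region);
            child->parent = current_;
        }
        current_ = child.get();
    }

    // #endregion: closes the innermost open region.
    void close() {
        if (current_ == &global_)
            throw std::runtime_error("unbalanced #endregion");
        current_ = current_->parent;
    }

    // #define name body, in the current region.
    void define(const std::string& name, const std::string& body) {
        if (current_->regions.count(name))
            throw std::runtime_error("name clash with region " + name);
        current_->macros[name] = body;
    }

    // Resolves "a::b::M" relative to the current region, or "::a::b::M"
    // relative to the global scope. Returns nullptr if M is not visible.
    const std::string* lookup(const std::string& qualified) const {
        const Region* scope = current_;
        std::string rest = qualified;
        if (rest.compare(0, 2, "::") == 0) {
            scope = &global_;
            rest.erase(0, 2);
        }
        for (std::size_t sep; (sep = rest.find("::")) != std::string::npos;) {
            auto child = scope->regions.find(rest.substr(0, sep));
            if (child == scope->regions.end()) return nullptr;
            scope = child->second.get();
            rest.erase(0, sep + 2);
        }
        auto macro = scope->macros.find(rest);
        return macro == scope->macros.end() ? nullptr : &macro->second;
    }

private:
    Region global_;
    Region* current_;
};

inline void region_demo() {
    RegionTable table;
    table.define("A", "1");
    table.open("region_A");
    table.define("B", "2");
    assert(table.lookup("A") == nullptr);          // outer macros are hidden
    assert(*table.lookup("B") == "2");
    table.close();
    assert(table.lookup("B") == nullptr);          // inner macros are hidden
    assert(*table.lookup("region_A::B") == "2");   // relative qualified name
    assert(*table.lookup("::region_A::B") == "2"); // absolute qualified name
}
```

Because an unqualified lookup consults only the current region, the model reproduces the opacity of `#region` and `#endregion` in both directions; the two `throw` sites correspond to the name-clash errors at the end of the sample above.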
#import <qualified-identifier> [, <qualified-identifier> ...]
Where <qualified-identifier> is an optionally qualified name of the macro or region to import. The #import directive may specify one or more comma-separated qualified names.
If the qualified identifier starts with '::', the name is looked up relative to the global scope (the <qualified-identifier> is called absolute). If it starts with an identifier, the name is looked up relative to the currently open region (the <qualified-identifier> is called relative).
If <qualified-identifier> refers to a macro, then the referenced macro definition is copied into the current region, just as if it were defined here.
If <qualified-identifier> refers to a region, then
Imported macros may be undefined with the #undef directive as usual. This removes the referenced macro from the current region, but leaves it unchanged in the original region, where it was defined initially.
The argument of the #import directive is not subject to macro expansion before it is evaluated.
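The copy semantics of importing a single macro, and the effect of a later #undef, can be sketched as follows. This is a simplified model under stated assumptions: regions are keyed by their absolute names in a flat map, and `MacroTable`, `Regions`, `import_macro` and `undef_macro` are hypothetical names. Importing a whole region is not modelled here.

```cpp
#include <cassert>
#include <map>
#include <string>

// Simplified model: each region, keyed by its absolute name, owns a table
// of macro definitions (macro name -> replacement text).
typedef std::map<std::string, std::string> MacroTable;
typedef std::map<std::string, MacroTable> Regions;

// #import of one macro: copies the definition of `macro` from region `from`
// into region `into`. Returns false if the macro does not exist.
inline bool import_macro(Regions& regions, const std::string& from,
                         const std::string& macro, const std::string& into) {
    Regions::const_iterator region = regions.find(from);
    if (region == regions.end()) return false;
    MacroTable::const_iterator def = region->second.find(macro);
    if (def == region->second.end()) return false;
    regions[into][macro] = def->second;  // a copy, not a reference
    return true;
}

// #undef inside `region`: removes only that region's definition.
inline void undef_macro(Regions& regions, const std::string& region,
                        const std::string& macro) {
    regions[region].erase(macro);
}

inline void import_demo() {
    Regions regions;
    regions["::lib"]["VERSION"] = "3";
    assert(import_macro(regions, "::lib", "VERSION", "::app"));
    undef_macro(regions, "::app", "VERSION");
    assert(regions["::app"].count("VERSION") == 0);  // the copy is gone
    assert(regions["::lib"].count("VERSION") == 1);  // the original remains
}
```

Storing the imported definition by value is what makes the #undef behaviour fall out naturally: erasing the copy cannot affect the region in which the macro was originally defined.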
To fully support macro regions, the #ifdef and #ifndef directives and the operator defined() may be used with qualified identifiers as their arguments too. Therefore the following sample is completely well-formed (courtesy of Paul Mensonides):
# ifndef ::CHAOS_PREPROCESSOR::chaos::WSTRINGIZE_HPP
# region ::CHAOS_PREPROCESSOR::chaos
#
# define WSTRINGIZE_HPP
#
# include <chaos/experimental/cat.hpp>
#
# // wstringize
#
# define wstringize(...) \
    chaos::primitive_wstringize(__VA_ARGS__) \
    /**/
#
# // primitive_wstringize
#
# define primitive_wstringize(...) \
    chaos::primitive_cat(L, #__VA_ARGS__) \
    /**/
#
# endregion
# endif

# import ::CHAOS_PREPROCESSOR

chaos::wstringize(a,b,c) // L"a,b,c"
Vesa Karvonen recently suggested on the Boost mailing list the following addition to the preprocessor, which is implemented by Wave in C++0x mode.
Consider the following example:
#define ID(x) x
ID( ( )
ID( a , b )
ID( ) )
The macro expansion of the above preprocessor code does not produce the intended result:
( a , b )
The basic idea is that the keywords __lparen__, __rparen__ and __comma__ could be used in place of '(', ')' and ',', respectively. The above example would now become:
#define ID(x) x
ID( __lparen__ )
ID( a __comma__ b )
ID( __rparen__ )
and it would expand into:
__lparen__ a __comma__ b __rparen__
which would be recognized in translation phases after macro replacement as equivalent to the token sequence:
( a , b )
This trivial extension makes it an order of magnitude easier to generate C++ code using the C preprocessor.
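The mapping applied after macro replacement can be sketched as a simple pass over token spellings. This is only an illustration of the idea, not Wave's implementation; `normalize_alternative_token` and `normalize_tokens` are hypothetical names.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Maps one of the proposed alternative tokens to its ordinary punctuator.
// Every other spelling is returned unchanged.
inline std::string normalize_alternative_token(const std::string& spelling) {
    if (spelling == "__lparen__") return "(";
    if (spelling == "__rparen__") return ")";
    if (spelling == "__comma__") return ",";
    return spelling;
}

// Applies the mapping to a whole token sequence, as a later translation
// phase would once macro replacement has finished.
inline std::vector<std::string> normalize_tokens(std::vector<std::string> tokens) {
    for (std::string& token : tokens) token = normalize_alternative_token(token);
    return tokens;
}

inline void alternative_token_demo() {
    const std::vector<std::string> expanded = {
        "__lparen__", "a", "__comma__", "b", "__rparen__"};
    const std::vector<std::string> expected = {"(", "a", ",", "b", ")"};
    assert(normalize_tokens(expanded) == expected);
}
```

The point of deferring the mapping is that, during macro replacement, the alternative spellings are ordinary identifier-like tokens, so they neither open or close a macro argument list nor separate arguments.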
Copyright © 2003 Hartmut Kaiser
Copyright © 2003 Paul Mensonides
Copyright © 2003 Vesa Karvonen
Permission to copy, use, modify, sell and distribute this document is granted provided this copyright notice appears in all copies. This document is provided "as is" without express or implied warranty, and with no claim as to its suitability for any purpose.