PpDefine

This module handles the definition, undefinition, redefinition, replacement and rescanning of macro declarations.

It implements ISO/IEC 9899:1999(E) section 6 (aka ‘C99’) and/or ISO/IEC 14882:1998(E) section 16 (aka ‘C++98’).

exception cpip.core.PpDefine.ExceptionCpipDefine

Exception raised when handling a PpDefine object.

exception cpip.core.PpDefine.ExceptionCpipDefineBadArguments

Exception raised when scanning an argument list for a function style macro fails. NOTE: This is only raised during replacement, not during initialisation.

exception cpip.core.PpDefine.ExceptionCpipDefineBadWs

Exception raised when there is bad whitespace in a define statement. See: ISO/IEC 9899:1999(E) Section 6.10-f and ISO/IEC 14882:1998(E) 16-2

exception cpip.core.PpDefine.ExceptionCpipDefineDupeId

Exception raised when a function-like macro has duplicates in its identifier-list.

exception cpip.core.PpDefine.ExceptionCpipDefineInit

Exception raised when creating a PpDefine object fails.

exception cpip.core.PpDefine.ExceptionCpipDefineInitBadLine

Exception for a bad line number given as argument.

exception cpip.core.PpDefine.ExceptionCpipDefineInvalidCmp

Exception for a redefinition where the identifiers are different.

exception cpip.core.PpDefine.ExceptionCpipDefineMissingWs

Exception raised when whitespace is missing between the identifier and the replacement tokens.

See: ISO/IEC 9899:1999(E) Section 6.10.3-3 and ISO/IEC 14882:1998(E) Section ???

Note

For #define PLUS+ the executable, cpp, says:

src.h:1:13: warning: ISO C requires whitespace after the macro name

exception cpip.core.PpDefine.ExceptionCpipDefineReplace

Exception when replacing a macro definition fails.

class cpip.core.PpDefine.PpDefine(theTokGen, theFileId, theLine)

Represents a single #define directive and performs ISO/IEC 9899:1999 (E) 6.10.3 Macro replacement.

theTokGen
A PpToken generator that yields the pp-tokens that follow the start of the #define directive. The macro is read from the first non-whitespace token onwards, i.e. __init__ will itself consume any leading whitespace.
theFileId
A string that represents the file ID.
theLine
A positive integer that represents the line in the file at which the #define statement occurred.

Definition example, object-like macros:

[identifier, [replacement-list (opt)], new-line, ...]

Or function-like macros:

[
    identifier,
    lparen,
    [identifier-list(opt),],
    ')',
    replacement-list,
    new-line,
    ...
]

Note

No whitespace is allowed between the identifier and the lparen of function-like macros.

The identifier-list of parameters is stored as a list of names. The replacement-list is stored as a list of preprocessor tokens. Leading and trailing whitespace in the replacement list is removed to facilitate redefinition comparison.
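As a rough illustration of the two shapes described above, the following sketch splits the text after #define into an identifier, an optional parameter list and a whitespace-trimmed replacement list. The helper split_define and its regular expression are hypothetical; they work on plain strings rather than PpToken objects and are not the cpip implementation.

```python
import re

# Hypothetical tokeniser: whitespace runs, identifiers, pp-numbers,
# '...', '##', then any other single character.
_TOKEN_RE = re.compile(r"\s+|[A-Za-z_]\w*|\.?\d[\w.]*|\.\.\.|##|.")


def split_define(text):
    """Return (identifier, parameters or None, replacement token list)."""
    tokens = _TOKEN_RE.findall(text.strip())
    identifier = tokens.pop(0)
    parameters = None
    # A '(' immediately after the identifier (no whitespace) makes the
    # macro function-like.
    if tokens and tokens[0] == "(":
        tokens.pop(0)
        parameters = []
        while tokens:
            tok = tokens.pop(0)
            if tok == ")":
                break
            if tok != "," and not tok.isspace():
                parameters.append(tok)
    # Collapse whitespace runs and trim both ends of the replacement list.
    replacement = [" " if t.isspace() else t for t in tokens]
    while replacement and replacement[0] == " ":
        replacement.pop(0)
    while replacement and replacement[-1] == " ":
        replacement.pop()
    return identifier, parameters, replacement
```

Under these assumptions "FOO (a)" is treated as an object-like macro whose replacement list starts with '(', because whitespace separates the identifier from the parenthesis.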

CPP_CONCAT_OP = '##'

C standard definition of concatenation operator

CPP_STRINGIZE_OP = '#'

C standard definition of the stringizing operator

IDENTIFIER_SEPERATOR = ','

C standard definition of identifier separator in function-like macros

INITIAL_REF_COUNT = 0

This is what the reference count is set to on construction

LPAREN = '('

C standard definition of left parenthesis

PLACEMARKER = None

Our representation of a placemarker token

RPAREN = ')'

C standard definition of right parenthesis

STRINGIZE_WHITESPACE_CHAR = ' '

Whitespace runs are replaced by a single space, see ISO/IEC 9899:1999 (E) 6.10.3.2-2

VARIABLE_ARGUMENT_IDENTIFIER = '...'

Variable argument (variadic) macro definitions

VARIABLE_ARGUMENT_SUBSTITUTE = '__VA_ARGS__'

Variable argument (variadic) macro substitution

_PpDefine__addTokenAndTypeToReplacementList(theTtt)

Adds a token and a token type to the replacement list. Runs of whitespace tokens are concatenated.
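A minimal sketch of the whitespace merging, assuming tokens are plain (text, type) pairs rather than PpToken objects. The function add_token is hypothetical and only illustrates the idea.

```python
def add_token(replacement_list, text, token_type):
    """Append a token, merging it into a preceding whitespace token."""
    if (token_type == "whitespace" and replacement_list
            and replacement_list[-1][1] == "whitespace"):
        prev_text, _ = replacement_list[-1]
        replacement_list[-1] = (prev_text + text, "whitespace")
    else:
        replacement_list.append((text, token_type))


tokens = []
for text, token_type in [
    ("a", "identifier"),
    (" ", "whitespace"),
    ("\t", "whitespace"),
    ("+", "preprocessing-op-or-punc"),
]:
    add_token(tokens, text, token_type)
```

After the loop the two adjacent whitespace tokens have become a single whitespace token.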

Parameters: theTtt (cpip.core.PpToken.PpToken) – Token.
Returns: NoneType

_PpDefine__logWarningHashHashHash()

Emit a warning to the log that # and ## are dangerous together.

__init__(theTokGen, theFileId, theLine)

Takes a preprocessing token generator and creates a macro. The generator (e.g. an instance of PpTokeniser.next()) yields the pp-tokens that follow the start of the #define directive. The macro is read from the first non-whitespace token onwards, i.e. this __init__ will itself consume any leading whitespace.

Definition example, object-like macros: [identifier, [replacement-list (opt)], new-line, ...]

Or function-like macros:

[
    identifier,
    lparen,
    [identifier-list(opt),],
    ')',
    replacement-list,
    new-line,
    ...
]

NOTE: No whitespace is allowed between the identifier and the lparen of function-like macros.

The replacement-list is stored as a list of preprocessor tokens. The identifier-list is stored as a list of names. Leading and trailing whitespace in the replacement list is removed to facilitate redefinition comparison.

Parameters:
  • theTokGen (generator) – Token generator.
  • theFileId (str) – File ID such as the path.
  • theLine (int) – A positive integer (>= 1) that represents the line in the file at which the #define statement occurred.
Returns:

NoneType

__weakref__

list of weak references to the object (if defined)

_appendArgIdentifier(theTok, theGenTok)

Appends the token text to the argument identifier list.

_appendToReplacementList(theGenTok)

Takes a token sequence up to a newline and assigns it to the replacement-list. Leading and trailing whitespace is ignored.

TODO: Set setPrevWs flag where necessary.

Parameters: theGenTok (generator) – Token generator.
Returns: NoneType

_consumeAndRaise(theGen, theException)

Consumes all tokens up to and including the next newline then raises an exception. This is commonly used to get rid of bad token streams but allow the caller to catch the exception, report the error and continue.
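The recovery pattern can be sketched as follows, assuming tokens are plain strings. DefineError, consume_newline and consume_and_raise are hypothetical stand-ins, not the cpip code.

```python
class DefineError(Exception):
    """Hypothetical stand-in for ExceptionCpipDefine."""


def consume_newline(gen):
    """Discard tokens up to and including the first one holding a newline."""
    for text in gen:
        if "\n" in text:
            break


def consume_and_raise(gen, exception):
    """Discard the rest of the line, then raise the given exception."""
    consume_newline(gen)
    raise exception


gen = iter(["BAD", " ", "tokens", "\n", "next", "\n"])
try:
    consume_and_raise(gen, DefineError("bad directive"))
except DefineError:
    # The caller can report the error and carry on with the next line.
    remaining = list(gen)
```

Because the bad line has already been drained, the caller resumes at the first token of the following line.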

_consumeNewline(theGen)

Consumes all tokens up to and including the next newline.

_cppStringize(theArgTokens)

Applies the ‘#’ operator to function style macros. See ISO/IEC 14882:1998(E) 16.3.2 The # operator [cpp.stringize].
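A hedged sketch of the stringizing rules: whitespace runs between tokens become one space, leading and trailing whitespace is dropped, and backslashes and double quotes inside literals are escaped. The function stringize and the type names 'string-literal' and 'character-constant' are assumptions made for illustration.

```python
def stringize(arg_tokens):
    """Sketch of '#': arg_tokens is a list of (text, type) pairs."""
    parts = []
    for text, token_type in arg_tokens:
        if token_type == "whitespace":
            # Leading whitespace is skipped; a run collapses to one space.
            if parts and parts[-1] != " ":
                parts.append(" ")
        elif token_type in ("string-literal", "character-constant"):
            # Escape backslashes first, then double quotes.
            parts.append(text.replace("\\", "\\\\").replace('"', '\\"'))
        else:
            parts.append(text)
    # Drop trailing whitespace.
    if parts and parts[-1] == " ":
        parts.pop()
    return '"' + "".join(parts) + '"'
```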

_ctorFunctionMacro(theGenTok)

Construct function type macros. [[identifier-list,] ,')', replacement-list, new-line, ...]

The identifier-list is not specified in the specification but there seems to be some disparity between the standards and cpp.exe. The relevant parts of the standards [C: ISO/IEC 9899:1999(E) 6.10.3-10 and -11, and C++: ISO/IEC 14882:1998(E) 16.3-9] appear, to me, to suggest that left and right parentheses are allowed in the identifier-list and that (,) is ignored. But cpp.exe will not accept that.

Playing with cpp -E it seems that it is a comma separated list where whitespace is ignored; nothing else is allowed. See unit tests testInitFunction_70(), 71 and 72. cpp.exe is also not so strict when it comes to the above sections. For example in this:

#define FOO(a,b,c) a+b+c
FOO (1,(2),3)

The whitespace between FOO and LPAREN is ignored and the replacement occurs.

_functionLikeReplacement(theArgMap)

Returns the replacement list where if a token is encountered that is a key in the map then the value in the map is inserted into the replacement list.

theArgMap is of the form returned by _retReplacementMap(). This also handles the '#' token i.e. [cpp.stringize] and '##' token i.e. [cpp.concat].

Returns a list of pairs i.e. [(token, token_type), ...]

TODO: Accidental token pasting. For example, given #define f(x) =x= the invocation f(=) should produce ‘= = =’ not ‘===’.
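The core substitution step might look like the sketch below, which assumes tokens are plain strings and that None stands for a placemarker. The function substitute is hypothetical and deliberately leaves out the '#' and '##' handling.

```python
def substitute(replacement, arg_map):
    """Replace each parameter token with the tokens of its argument."""
    out = []
    for tok in replacement:
        if tok in arg_map:
            value = arg_map[tok]
            if value is not None:  # None stands for a placemarker
                out.extend(value)
        else:
            out.append(tok)
    return out


# #define f(x) =x=  invoked as f(=): the three tokens stay separate.
pasted = " ".join(substitute(["=", "x", "="], {"x": ["="]}))
```

Keeping the result as a token list, and only joining with spaces for display, is one way to avoid the accidental pasting into a single token.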

_isPlacemarker(theTok)

Returns True if the token represents a PLACEMARKER token. This ensures that the correct comparison operator is used if self.PLACEMARKER is defined as None.

_nextNonWsOrNewline(theGen)

Returns the next non-whitespace token or whitespace that contains a newline.
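A small sketch of this filter, assuming (text, type) pairs. The function next_non_ws_or_newline is hypothetical, and returning None on an exhausted generator is a choice made for the sketch only.

```python
def next_non_ws_or_newline(gen):
    """Skip whitespace tokens unless they contain a newline."""
    for text, token_type in gen:
        if token_type != "whitespace" or "\n" in text:
            return text, token_type
    # Assumption for this sketch: signal exhaustion with None.
    return None
```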

Parameters: theGen (generator) – Token generator.
Returns: cpip.core.PpToken.PpToken – The next non-whitespace token or whitespace that contains a newline.

_objectLikeReplacement()

Returns the replacement list for an object like macro. This handles the ## token i.e. [cpp.concat].

Returns a list of pairs i.e. [(token, token_type), ...]

_retReplacementMap(theArgs)

Given a list of lists of (token, type) this returns a map of: {identifier : [replacement_token and token types, ...], ...}

For example for:

#define FOO(c,b,a) a+b+c
FOO(1+7,2,3)

i.e. theArgs is (types are shown as text for clarity, in practice they would be enumerated):

[
    [
        PpToken.PpToken('1', 'pp-number'),
        PpToken.PpToken('+', 'preprocessing-op-or-punc'),
        PpToken.PpToken('7', 'pp-number')
    ],
    [
        PpToken.PpToken('2', 'pp-number'),
    ],
    [
        PpToken.PpToken('3', 'pp-number'),
    ],
]

Map would be:

{
    'a' : [
            PpToken.PpToken('3', 'pp-number'),
        ],
    'b' : [
            PpToken.PpToken('2', 'pp-number'),
        ],
    'c' : [
            PpToken.PpToken('1', 'pp-number'),
            PpToken.PpToken('+', 'preprocessing-op-or-punc'),
            PpToken.PpToken('7', 'pp-number')
        ],
}

Note that values that are placemarker tokens are PpDefine.PLACEMARKER.

For example:

#define FOO(a,b,c) a+b+c
FOO(,2,)

Generates:

{
    'a' : PpDefine.PLACEMARKER,
    'b' : [
            ('2', 'pp-number'),
        ],
    'c' : PpDefine.PLACEMARKER,
}
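The mapping itself can be sketched by pairing parameters with arguments in order. This assumes (text, type) pairs and a hypothetical replacement_map function; the ValueError is an assumption standing in for whatever the real class raises on a mismatch.

```python
PLACEMARKER = None  # mirrors PpDefine.PLACEMARKER


def replacement_map(parameters, args):
    """Map each parameter name to its argument token list."""
    if len(parameters) != len(args):
        raise ValueError("argument count does not match parameter count")
    return dict(zip(parameters, args))


# #define FOO(c,b,a) a+b+c  invoked as FOO(1+7,2,3)
full = replacement_map(
    ["c", "b", "a"],
    [
        [("1", "pp-number"), ("+", "preprocessing-op-or-punc"), ("7", "pp-number")],
        [("2", "pp-number")],
        [("3", "pp-number")],
    ],
)
# #define FOO(a,b,c) a+b+c  invoked as FOO(,2,)
sparse = replacement_map(
    ["a", "b", "c"], [PLACEMARKER, [("2", "pp-number")], PLACEMARKER]
)
```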

PERF: See TODO below.

TODO: Return a map of identifiers to indexes in the supplied argument as this will save making a copy of the argument tokens?

So:

#define FOO(c,b,a) a+b+c
FOO(1+7,2,3)

Would return a map of:

{
    'a' : 2,
    'b' : 1,
    'c' : 0,
}

And use index -1 for a placemarker token???:

#define FOO(a,b,c) a+b+c
FOO(,2,)

Generates:

{
    'a' : -1,
    'b' : 1,
    'c' : -1,
}

_retToken(theGen)

Returns the next token object and increments the IR.

Parameters: theGen (generator) – Token generator.
Returns: cpip.core.PpToken.PpToken – The next token.

assertReplListIntegrity()

Tests that any identifier tokens in the replacement list are actually replaceable. This will raise an assertion failure if not. It is really an integrity test to see if an external entity has grabbed a reference to the replacement list and set a token to be not replaceable.

Returns: NoneType
Raises: AssertionError

consumeFunctionPreamble(theGen)

This consumes the tokens that form the preamble of a function style macro invocation. This really means consuming whitespace and the opening LPAREN.

This will return either:

  • None - Tokens including the leading LPAREN have been consumed.
  • List of (token, token_type) if the LPAREN is not found.

For example given this:

#define t(a) a+2
t   (21) - t  ;

For the first t this would consume '   (' and return None leaving the next token to be ('21', 'pp-number').

For the second t this would consume '  ;' and return:

[
    ('  ', 'whitespace'),
    (';',   'preprocessing-op-or-punc'),
]

This allows the MacroReplacementEnv to generate the correct result:

21 +2 - t ;
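The behaviour in the example can be sketched as below, assuming (text, type) pairs. The function consume_function_preamble is a hypothetical illustration, not the cpip method.

```python
def consume_function_preamble(gen):
    """Return None if '(' was found, else the list of consumed tokens."""
    consumed = []
    for tok in gen:
        text, token_type = tok
        if text == "(":
            return None
        consumed.append(tok)
        if token_type != "whitespace":
            # The first non-whitespace token is not '(': not an invocation.
            return consumed
    return consumed


first = iter([("   ", "whitespace"), ("(", "preprocessing-op-or-punc"),
              ("21", "pp-number")])
second = iter([("  ", "whitespace"), (";", "preprocessing-op-or-punc")])
first_result = consume_function_preamble(first)
second_result = consume_function_preamble(second)
```

Returning the consumed tokens in the second case lets the caller put them back into the output unchanged.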
expandArguments

The flag that says whether arguments should be expanded. For object like macros this will be False. For function like macros this will be False if there is a stringize (#) or a token pasting operator (##). True otherwise.
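One way to compute such a flag, assuming the replacement list is a list of token strings and that parameters is None for an object-like macro. The function expand_arguments is hypothetical.

```python
def expand_arguments(parameters, replacement_tokens):
    """True only for function-like macros without '#' or '##'."""
    if parameters is None:
        # Object-like macros have no arguments to expand.
        return False
    return not any(tok in ("#", "##") for tok in replacement_tokens)
```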

fileId

The file ID given as an argument in the constructor.

Returns: str – File ID, path for example.

identifier

The macro identifier i.e. the name as a string.

Returns: str – Macro name.

incRefCount(theFileLineCol=None)

Increment the reference count. Typically callers do this when replacement is certain or in the event of definition testing.

For example:

#ifdef SPAM or defined(SPAM) // etc.

Or if the macro is expanded e.g. #define SPAM_N_EGGS spam and eggs

The menu is SPAM_N_EGGS.
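The bookkeeping could be sketched as follows. RefCounter and its attribute names are hypothetical; locations are shown as plain (file, line, column) tuples rather than FileLineCol objects.

```python
class RefCounter:
    """Hypothetical sketch of the reference counting described here."""

    INITIAL_REF_COUNT = 0

    def __init__(self):
        self._count = self.INITIAL_REF_COUNT
        self._locations = []

    def inc_ref_count(self, file_line_col=None):
        """Count one reference and remember where it happened, if known."""
        self._count += 1
        if file_line_col is not None:
            self._locations.append(file_line_col)

    @property
    def ref_count(self):
        """Reference count less its initial value."""
        return self._count - self.INITIAL_REF_COUNT

    @property
    def is_referenced(self):
        return self._count > self.INITIAL_REF_COUNT


counter = RefCounter()
counter.inc_ref_count(("menu.c", 3, 13))  # e.g. The menu is SPAM_N_EGGS.
counter.inc_ref_count()                   # e.g. #ifdef SPAM_N_EGGS
```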

Parameters: theFileLineCol (cpip.core.FileLocation.FileLineCol([str, int, int])) – File location.
Returns: NoneType

isCurrentlyDefined

Returns True if the current instance is a valid definition i.e. it has not been #undef’d.

Returns: bool – Has valid definition.

isObjectTypeMacro

Returns: bool – True if this is an object type macro and False if it is a function type macro.

isReferenced

Returns True if the reference count has been incremented since construction.

isSame(other)

Tests ‘sameness’. Returns:

  • -1 if the identifiers are different.
  • 1 if the identifiers are the same but redefinition is NOT allowed.
  • 0 if the identifiers are the same and redefinition is allowed i.e. the macros are equivalent.

isValidRefefinition(other)

Returns True if this is a valid redefinition of other, False otherwise.

Will raise an ExceptionCpipDefineInvalidCmp if the identifiers are different.

Will raise an ExceptionCpipDefine if either is not currently defined.

From: ISO/IEC 9899:1999 (E) 6.10.3:

  1. Two replacement lists are identical if and only if the preprocessing tokens in both have the same number, ordering, spelling, and white-space separation, where all white-space separations are considered identical.
  2. An identifier currently defined as a macro without use of lparen (an object-like macro) may be redefined by another #define preprocessing directive provided that the second definition is an object-like macro definition and the two replacement lists are identical, otherwise the program is ill-formed.
  3. An identifier currently defined as a macro using lparen (a function-like macro) may be redefined by another #define preprocessing directive provided that the second definition is a function-like macro definition that has the same number and spelling of parameters, and the two replacement lists are identical, otherwise the program is ill-formed.

See also: ISO/IEC 14882:1998(E) 16.3 Macro replacement [cpp.replace]
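The three rules above can be sketched as a comparison of (identifier, parameters, replacement) triples, where parameters is None for an object-like macro and tokens are plain strings. InvalidCmp, normalise and is_valid_redefinition are hypothetical names used only for this illustration.

```python
class InvalidCmp(Exception):
    """Hypothetical stand-in for ExceptionCpipDefineInvalidCmp."""


def normalise(replacement):
    """Treat every white-space separation as a single space."""
    return [" " if tok.isspace() else tok for tok in replacement]


def is_valid_redefinition(old, new):
    """old and new are (identifier, parameters, replacement) triples."""
    if old[0] != new[0]:
        raise InvalidCmp("identifiers differ: %s, %s" % (old[0], new[0]))
    # Same kind of macro, same parameter spelling, identical token lists.
    return old[1] == new[1] and normalise(old[2]) == normalise(new[2])
```

Comparing parameters with == keeps None (object-like) distinct from an empty list (function-like with no parameters), so a change of macro kind is rejected.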

line

The line number given as an argument in the constructor.

Returns: int – Line number.

parameters

The list of parameter names as strings for a function-like macro, or None if this is an object type macro.

refCount

Returns the current reference count as an integer less its initial value on construction.

Returns: int – Reference count.

refFileLineColS

Returns the list of FileLineCol objects where this macro was referenced.

Returns: list([]), list([cpip.core.FileLocation.FileLineCol([str, int, int])]) – Places the macro was referenced.

replaceArgumentList(theArgList)

Given a list of arguments this does argument substitution and returns the replacement token list. The argument list is of the form given by retArgumentListTokens(). The caller must have replaced any macro invocations in theArgList before calling this method.

Note

For function style macros only.

replaceObjectStyleMacro()

Returns a list of [(token, token_type), ...] from the replacement of an object style macro.

replacementTokens

The list of zero or more replacement tokens as a list of PpToken.PpToken

Returns: list([]), list([cpip.core.PpToken.PpToken]) – Tokens.

replacements

The list of zero or more replacement tokens as strings.

retArgumentListTokens(theGen)

For a function macro this reads the tokens following a LPAREN and returns a list of arguments where each argument is a list of PpToken objects.

Thus this function returns a list of lists of PpToken.PpToken objects, for example given this:

#define f(x,y) ...
f(a,b)

This function, when passed a,b) returns:

[
    [
        PpToken.PpToken('a', 'identifier'),
    ],
    [
        PpToken.PpToken('b', 'identifier'),
    ],
]

And an invocation of: f(1(,)2,3) i.e. this gets passed via the generator "1(,)2,3)" and returns two arguments:

[
    [
        PpToken('1', 'pp-number'),
        PpToken('(', 'preprocessing-op-or-punc'),
        PpToken(',', 'preprocessing-op-or-punc'),
        PpToken(')', 'preprocessing-op-or-punc'),
        PpToken('2', 'pp-number'),
    ],
    [
        PpToken('3', 'pp-number'),
    ],
]

So this function supports two cases:

  1. Parsing function style macro declarations.
  2. Interpreting function style macro invocations where the argument list is subject to replacement before invoking the macro.

In the case that an argument is missing a PpDefine.PLACEMARKER token is inserted. For example:

#define FUNCTION_STYLE(a,b,c) ...
FUNCTION_STYLE(,2,3)

Gives:

[
    PpDefine.PLACEMARKER,
    [
        PpToken.PpToken('2',       'pp-number'),
    ],
    [
        PpToken.PpToken('3',       'pp-number'),
    ],
]

Placemarker tokens are not used if the macro is defined with no arguments. This might raise an ExceptionCpipDefineBadArguments if the list does not match the prototype or a StopIteration if the token list is too short. This ignores leading and trailing whitespace for each argument.
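A minimal sketch of the argument splitting, assuming tokens are plain strings. The function argument_list is hypothetical; it does not check the result against a macro prototype, and it raises ValueError on a short token stream purely as a choice made for the sketch.

```python
PLACEMARKER = None  # mirrors PpDefine.PLACEMARKER


def _finish(arg):
    """Trim whitespace at both ends; an empty argument is a placemarker."""
    while arg and arg[0].isspace():
        arg.pop(0)
    while arg and arg[-1].isspace():
        arg.pop()
    return arg if arg else PLACEMARKER


def argument_list(gen):
    """Collect arguments from the tokens that follow the opening '('."""
    args, current, depth = [], [], 0
    for tok in gen:
        if tok == ")" and depth == 0:
            # The matching close parenthesis ends the argument list.
            args.append(_finish(current))
            return args
        if tok == "," and depth == 0:
            # Only top-level commas separate arguments.
            args.append(_finish(current))
            current = []
            continue
        if tok == "(":
            depth += 1
        elif tok == ")":
            depth -= 1
        current.append(tok)
    raise ValueError("token stream ended before the closing ')'")
```

Tracking the parenthesis depth is what keeps the comma inside 1(,)2 from splitting the first argument.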

TODO: Raise an ExceptionCpipDefineBadArguments if there is a #define statement. e.g.:

#define f(x) x x
f (1
#undef f
#define f 2
f)

strIdentPlusParam()

Returns the identifier name, and the parameters if this is a function-like macro, as a string.

Returns: str – Macro declaration.

strReplacements()

Returns the replacement tokens with minimised whitespace as a string.

Returns: str – The replacement tokens with minimised whitespace as a string.

tokenCounter

The PpTokenCount object that counts tokens that have been consumed from the input.

Returns: cpip.core.PpTokenCount.PpTokenCount – Token count.

tokensConsumed

The total number of tokens consumed by the class.

undef(theFileId, theLineNum)

Records that this instance of a macro was #undef’d at a particular file and line number. May raise an ExceptionCpipDefine if already undefined or the line number is bad.
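The state change might be sketched like this. MacroState and DefineError are hypothetical, and treating a line number below 1 as bad is an assumption based on the constructor's requirement for a positive line number.

```python
class DefineError(Exception):
    """Hypothetical stand-in for ExceptionCpipDefine."""


class MacroState:
    """Hypothetical holder for the #undef bookkeeping of one macro."""

    def __init__(self):
        self.undef_file_id = None
        self.undef_line = None

    @property
    def is_currently_defined(self):
        return self.undef_line is None

    def undef(self, file_id, line_num):
        if not self.is_currently_defined:
            raise DefineError("macro is already undefined")
        if line_num < 1:
            # Assumption: a bad line number means one below 1.
            raise DefineError("bad line number: %r" % line_num)
        self.undef_file_id = file_id
        self.undef_line = line_num


macro = MacroState()
macro.undef("src.h", 12)
```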

undefFileId

The file ID where this macro was undef’d or None.

undefLine

The line number where this macro was undef’d or None.