PpLexer

Generates tokens from a C or C++ translation unit.

TODO: Fix accidental token pasting, see TestFromCppInternalsTokenspacing. A connected TODO: set the setPrevWs flag on the token where necessary.

TODO: Preprocessor statements in arguments of function like macros. Sect. 3.9 of cpp.pdf and existing MacroEnv tests.

exception cpip.core.PpLexer.ExceptionConditionalExpression

Exception when evaluating conditional expressions with eval().

exception cpip.core.PpLexer.ExceptionPpLexer

Exception when handling PpLexer object.

exception cpip.core.PpLexer.ExceptionPpLexerAlreadyGenerating

Exception when a second generator is created while one already exists, since the internal state would become inconsistent.

exception cpip.core.PpLexer.ExceptionPpLexerCallStack

Exception when finding issues with the call stack or nested includes.

exception cpip.core.PpLexer.ExceptionPpLexerCallStackTooSmall

Exception when sys.getrecursionlimit() is too small.

exception cpip.core.PpLexer.ExceptionPpLexerCondLevelOutOfRange

Exception when handling a conditional token generation level.

exception cpip.core.PpLexer.ExceptionPpLexerDefine

Exception when loading predefined macro definitions.

exception cpip.core.PpLexer.ExceptionPpLexerNestedInclueLimit

Exception when nested #include limit exceeded.

exception cpip.core.PpLexer.ExceptionPpLexerNoFile

Exception when a file cannot be found.

exception cpip.core.PpLexer.ExceptionPpLexerPreInclude

Exception when loading pre-include files.

exception cpip.core.PpLexer.ExceptionPpLexerPreIncludeIncNoCp

Exception when loading a pre-include file that has no current place (e.g. a StringIO object) and the pre-include then has an #include statement.

exception cpip.core.PpLexer.ExceptionPpLexerPredefine

Exception when loading predefined macro definitions.

cpip.core.PpLexer.PREPROCESSING_DIRECTIVES = ['if', 'ifdef', 'ifndef', 'elif', 'else', 'endif', 'include', 'define', 'undef', 'line', 'error', 'pragma']

Allowable preprocessing directives

class cpip.core.PpLexer.PpLexer(tuFileId, includeHandler, preIncFiles=None, diagnostic=None, pragmaHandler=None, stdPredefMacros=None, autoDefineDateTime=True, gccExtensions=False, annotateLineFile=False)

Create a translation unit tokeniser that applies ISO/IEC 9899:1999(E) Section 6 and/or ISO/IEC 14882:1998(E) section 16.

TODO: Set flags here rather than supplying them to a generator? This would make the API simply the ctor and ppTokens/next(). Flags would be:

incWs - Include whitespace tokens.

condLevel - (0, 1, 2) thus:

0: No conditionally compiled tokens. The fileIncludeGraphRoot will
    not have any information about conditionally included files.

1: Conditionally compiled tokens are generated but not from
    conditionally included files. The fileIncludeGraphRoot will have
    a reference to a conditionally included file but not that
    included file's includes.

2: Conditionally compiled tokens including tokens from conditionally
    included files. The fileIncludeGraphRoot will have all the
    information about conditionally included files recursively.
CALL_STACK_DEPTH_ASSUMED_PPTOKENS = 10

The call stack depth is D = A + B + C*L, where L is the number of levels of nested includes. This constant is A.

CALL_STACK_DEPTH_FIRST_INCLUDE = 3

B in D = A + B + C*L.

CALL_STACK_DEPTH_PER_INCLUDE = 3

C in D = A + B + C*L.
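
The formula D = A + B + C*L can be checked with a short calculation. This is a minimal sketch: the constant values are copied from the attributes documented for this class, while the helper name required_call_stack_depth and the comparison against sys.getrecursionlimit() are assumptions for illustration, not the lexer's actual check.

```python
import sys

# Values copied from the class attributes documented for PpLexer.
CALL_STACK_DEPTH_ASSUMED_PPTOKENS = 10  # A
CALL_STACK_DEPTH_FIRST_INCLUDE = 3      # B
CALL_STACK_DEPTH_PER_INCLUDE = 3        # C
MAX_INCLUDE_DEPTH = 200


def required_call_stack_depth(include_levels):
    """Hypothetical helper: D = A + B + C*L for L levels of nested includes."""
    return (CALL_STACK_DEPTH_ASSUMED_PPTOKENS
            + CALL_STACK_DEPTH_FIRST_INCLUDE
            + CALL_STACK_DEPTH_PER_INCLUDE * include_levels)


worst_case = required_call_stack_depth(MAX_INCLUDE_DEPTH)
print(worst_case)  # 613
# Whether the interpreter's current limit covers the worst case.
print(worst_case <= sys.getrecursionlimit())
```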

COND_LEVEL_DEFAULT = 0

Conditionality settings for token generation

COND_LEVEL_OPTIONS = range(0, 3)

Conditionality level (0, 1, 2)

MAX_INCLUDE_DEPTH = 200

The maximum depth of nested #includes

__init__(tuFileId, includeHandler, preIncFiles=None, diagnostic=None, pragmaHandler=None, stdPredefMacros=None, autoDefineDateTime=True, gccExtensions=False, annotateLineFile=False)

Constructor.

Parameters:
  • tuFileId (str) – A file ID that will be given to the include handler to find the translation unit. Typically this will be the file path (as a string) to the file that is the Initial Translation Unit (ITU) i.e. the file being preprocessed.
  • includeHandler (cpip.core.IncludeHandler.CppIncludeStdOs) – A handler to find #include'd files, typically an IncludeHandler.IncludeHandlerStd. This might have user and system include path information and a means of resolving file references.
  • preIncFiles (list([_io.StringIO])) – An ordered list of file like objects that are pre-include files. These are processed in order before the ITU is processed. Macro redefinition rules apply.
  • diagnostic (NoneType) – A diagnostic object, defaults to a CppDiagnostic.PreprocessDiagnosticStd.
  • pragmaHandler (NoneType) –

    A handler for #pragma statements.

    This must implement the attribute replaceTokens; if True then the token stream will be macro replaced before being passed to the pragma handler.

    This must have a function pragma() defined that takes a non-zero length list of PpToken.PpToken, the last of which will be a newline token. The tokens returned will be yielded.

  • stdPredefMacros (dict({})) –

    A dictionary of Standard pre-defined macros. See for example: ISO/IEC 9899:1999 (E) 6.10.8 Predefined macro names; ISO/IEC 14882:1998 (E) 16.8 Predefined macro names; N2800=08-0310 16.8 Predefined macro names.

    The macros __DATE__ and __TIME__ will be automatically updated to current locale date/time (see autoDefineDateTime).

  • autoDefineDateTime (bool) – If True then the macros __DATE__ and __TIME__ will be automatically updated to current locale date/time. Mostly this is used for testing.
  • gccExtensions (bool) – Support GCC extensions. Currently just #include_next is supported.
  • annotateLineFile (bool) –

    If True then line number and file annotations will be output as cpp does. For example:

    # 22 "/usr/include/stdio.h" 3 4
    # 59 "/usr/include/stdio.h" 3 4
    # 1 "/usr/include/sys/cdefs.h" 1 3 4
    
Returns:

NoneType
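
The pragmaHandler parameter described above is duck-typed: it needs a replaceTokens attribute and a pragma() method. The following is a minimal sketch of such an object. The class name RecordingPragmaHandler and its seen list are hypothetical, and plain strings stand in for PpToken.PpToken objects so that the example runs without cpip installed.

```python
class RecordingPragmaHandler:
    """Hypothetical handler; only replaceTokens and pragma() come from the text."""

    # False: ask for the raw tokens, without macro replacement.
    replaceTokens = False

    def __init__(self):
        self.seen = []

    def pragma(self, theTokS):
        """Receives a non-empty list of tokens ending in a newline token."""
        self.seen.append(list(theTokS))
        # Whatever is returned is yielded by the lexer; this sketch adds nothing.
        return []


handler = RecordingPragmaHandler()
# Plain strings stand in for PpToken.PpToken objects here.
result = handler.pragma(['once', '\n'])
print(result)        # []
print(handler.seen)  # [['once', '\n']]
```

An instance like this would be passed as pragmaHandler=handler to the constructor.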

__weakref__

list of weak references to the object (if defined)

_appendTokenMergingWhitespace(theList, theToken)

Adds a token to the list merging whitespace if possible.

Parameters:
  • theList (list([]), list([cpip.core.PpToken.PpToken])) – List of tokens.
  • theToken (cpip.core.PpToken.PpToken) – The token to append, if whitespace and the last token on the list is also whitespace then this token will be merged into the last token on the list.
Returns:

NoneType
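
The merging rule can be illustrated in isolation. This is a sketch under stated assumptions: Tok is a hypothetical stand-in for a preprocessing token, and append_merging_whitespace only mirrors the behaviour described above, not the library's implementation.

```python
class Tok:
    """Hypothetical stand-in for a preprocessing token."""

    def __init__(self, text):
        self.text = text

    def is_ws(self):
        # True for non-empty text made up only of whitespace characters.
        return self.text.isspace()


def append_merging_whitespace(tokens, tok):
    """Append tok, folding it into a trailing whitespace token if both are whitespace."""
    if tokens and tok.is_ws() and tokens[-1].is_ws():
        tokens[-1].text += tok.text
    else:
        tokens.append(tok)


toks = []
for text in ['int', ' ', '\t', 'x', ';']:
    append_merging_whitespace(toks, Tok(text))
print([t.text for t in toks])  # ['int', ' \t', 'x', ';']
```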

_countNonWsTokens(theTokS)

Returns the integer count of non-whitespace tokens in the given list.

Parameters:theTokS (list([cpip.core.PpToken.PpToken])) – List of tokens.
Returns:int – Count of non-whitespace tokens.
_cppDefine(theGen, theFlc)

Handles a define directive.

Parameters:
  • theGen (generator) – Token generator.
  • theFlc (cpip.core.FileLocation.FileLineCol([str, int, int])) – File line and column numbers.
Returns:

cpip.core.PpToken.PpToken – Yields resulting tokens.

_cppElif(theGen, theFlc)

Handles an elif directive.

Parameters:
  • theGen (generator) – Token generator.
  • theFlc (cpip.core.FileLocation.FileLineCol([str, int, int])) – File, line, column.
Returns:

NoneType,cpip.core.PpToken.PpToken – The replacement token, a single whitespace.

_cppElse(theGen, theFlc)

Handles an else directive.

Parameters:
  • theGen (generator) – Token generator.
  • theFlc (cpip.core.FileLocation.FileLineCol([str, int, int])) – File, line, column.
Returns:

NoneType,cpip.core.PpToken.PpToken – The replacement token, a single whitespace.

_cppEndif(theGen, theFlc)

Handles an endif directive.

Parameters:
  • theGen (generator) – Token generator.
  • theFlc (cpip.core.FileLocation.FileLineCol([str, int, int])) – File, line, column.
Returns:

NoneType,cpip.core.PpToken.PpToken – The replacement token, a single whitespace.

_cppError(theGen, theFlc)

Handles an error directive.

_cppIf(theGen, theFlc)

Handles an if directive.

Parameters:
  • theGen (generator) – Token generator.
  • theFlc (cpip.core.FileLocation.FileLineCol([str, int, int])) – File, line, column.
Returns:

NoneType,cpip.core.PpToken.PpToken – The replacement token, a single whitespace.

_cppIfdef(theGen, theFlc)

Handles an ifdef directive.

Parameters:
  • theGen (generator) – Token generator.
  • theFlc (cpip.core.FileLocation.FileLineCol([str, int, int])) – File, line, column.
Returns:

NoneType,cpip.core.PpToken.PpToken – The replacement token, a single whitespace.

_cppIfndef(theGen, theFlc)

Handles an ifndef directive.

Parameters:
  • theGen (generator) – Token generator.
  • theFlc (cpip.core.FileLocation.FileLineCol([str, int, int])) – File, line, column.
Returns:

NoneType,cpip.core.PpToken.PpToken – The replacement token, a single whitespace.

_cppInclude(theGen, theFlc)

Handles an #include directive. This handles:

# include <h-char-sequence> new-line
# include "q-char-sequence" new-line

This gathers a list of PpTokens up to, and including, a newline with macro replacement. Then it reinterprets the list using cpip.core.PpTokeniser.PpTokeniser.reduceToksToHeaderName() to cast the tokens to a possible #include <header-name> token.

Finally we try and resolve that to a ‘file’ that can be included.

FWIW cpp.exe does not explore #include statements when they are conditional so will not error on unreachable files if they are conditionally included.

Parameters:
  • theGen (generator) – Token generator.
  • theFlc (cpip.core.FileLocation.FileLineCol([str, int, int])) – File, line, column.
Returns:

generator – Yields the resulting tokens.

_cppIncludeGeneric(theGen, theFlc, theFileIncludeFunction)

Handles the target of an #include or #include_next directive. theFileIncludeFunction is the function to call to resolve the target to an actual file.

Parameters:
  • theGen (generator) – Token generator.
  • theFlc (cpip.core.FileLocation.FileLineCol([str, int, int])) – File, line, column.
  • theFileIncludeFunction (method) – Function to process include directive.
Returns:

NoneType,cpip.core.PpToken.PpToken – Yields tokens.

Raises:

StopIteration

_cppIncludeNext(theGen, theFlc)

Handles an #include_next GCC extension. This behaves in a very similar fashion to self._cppInclude but calls includeNextHeaderName() on the include handler.

_cppIncludeReportError(theMsg=None)

Reports a consistent error message when #include is not processed and consumes all tokens up to and including the next newline.

_cppLine(theGen, theFlc)

Handles a line directive. This also handles ISO/IEC 9899:1999 (E) 6.10.4 Line control, in particular 6.10.4-4 where the form is:

# line digit-sequence "s-char-sequenceopt" new-line

digit-sequence is a token of type pp-number.

The s-char-sequenceopt is a token of type 'string-literal'. This will have the double quote delimiters and may have an 'L' prefix, for example L"abc".

_cppPragma(theGen, theFlc)

Handles a pragma directive. ISO/IEC 9899:1999 (E) 6.10.6 Pragma directive

Semantics:

1 A preprocessing directive of the form:

# pragma pp-tokensopt new-line

where the preprocessing token STDC does not immediately follow pragma in the directive (prior to any macro replacement) (footnote 146) causes the implementation to behave in an implementation-defined manner. The behavior might cause translation to fail or cause the translator or the resulting program to behave in a non-conforming manner. Any such pragma that is not recognized by the implementation is ignored.

Footnote 146: An implementation is not required to perform macro replacement in pragmas, but it is permitted except for in standard pragmas (where STDC immediately follows pragma). If the result of macro replacement in a non-standard pragma has the same form as a standard pragma, the behavior is still implementation-defined; an implementation is permitted to behave as if it were the standard pragma, but is not required to.

_cppUndef(theGen, theFlc)

Handles an undef directive.

_cppWarning(theGen, theFlc)

Handles a warning directive. Not in the standard but we support it.

_diagnosticDebugMessage(theM)

Sends a message to the diagnostic object.

Parameters:theM (str) – The message
Returns:NoneType
_genPpTokensRecursive(theGen)

Given a token generator this applies the lexical rules and generates tokens. This means handling preprocessor directives and macro replacement.

With #include'd files this becomes recursive.

Parameters:theGen (generator) – Token generator.
Returns:cpip.core.PpToken.PpToken – Yields tokens.
Raises:StopIteration
_genPreIncludeTokens()

Reads all the pre-include files and loads the macro environment.

Returns:NoneType
Raises:AttributeError, StopIteration
_lineFileAnnotation(flags)

Returns a list of PpTokens that represent the line number and file name. For example:

# 22 "/usr/include/stdio.h" 3 4
# 59 "/usr/include/stdio.h" 3 4
# 1 "/usr/include/sys/cdefs.h" 1 3 4

Trailing numbers are described here: https://gcc.gnu.org/onlinedocs/cpp/Preprocessor-Output.html

'1' - This indicates the start of a new file.

'2' - This indicates returning to a file (after having included another file).

'3' - This indicates that the following text comes from a system header file, so certain warnings should be suppressed.

'4' - This indicates that the following text should be treated as being wrapped in an implicit extern "C" block.

We don’t support ‘4’
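
The layout of these annotation lines can be sketched as follows. This is an illustration only: the real method returns a list of PpTokens, whereas the hypothetical helper line_file_annotation below builds a plain string, and its parameter names are assumptions.

```python
def line_file_annotation(line_num, file_name, flags):
    """Hypothetical helper: format a cpp-style linemarker as a string."""
    parts = ['#', str(line_num), '"{}"'.format(file_name)]
    # Trailing flags, e.g. 1 (new file), 2 (returning), 3 (system header).
    parts.extend(str(flag) for flag in flags)
    return ' '.join(parts)


print(line_file_annotation(1, '/usr/include/sys/cdefs.h', [1, 3]))
# # 1 "/usr/include/sys/cdefs.h" 1 3
```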

_nextNonWsOrNewline(theGen, theDiscardList=None)

Returns the next non-whitespace token or whitespace that contains a newline.

Parameters:
  • theGen (generator) – Token generator.
  • theDiscardList (list([cpip.core.PpToken.PpToken])) – If theDiscardList is non-None intermediate tokens will be appended to it.
Returns:

cpip.core.PpToken.PpToken – Next non-whitespace token or whitespace that contains a newline.
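
The scanning behaviour described here can be sketched with plain strings standing in for tokens. The function name next_non_ws_or_newline is hypothetical, and returning None when the generator is exhausted is an assumption; the source does not say what happens in that case.

```python
def next_non_ws_or_newline(gen, discard=None):
    """Return the next item that is not whitespace, or is whitespace with a newline."""
    for text in gen:
        if not text.isspace() or '\n' in text:
            return text
        # Intermediate whitespace is kept only if the caller asked for it.
        if discard is not None:
            discard.append(text)
    return None  # Assumption: the generator ran out.


gen = iter([' ', '\t', 'foo', 'bar'])
skipped = []
found = next_non_ws_or_newline(gen, skipped)
print(found, skipped)  # foo [' ', '\t']
```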

_pptPop()

End a #included file.

Returns:NoneType
_pptPostPop()

Called immediately after _pptPop(); this optionally returns a list of PpTokens that can be yielded.

Returns:list([]) – Tokens to yield.
_pptPostPush()

Called immediately after _pptPush(); this optionally returns a list of PpTokens that can be yielded.

Returns:list([]) – Tokens to yield.
_pptPush(theFpo)

This takes a cpip.core.IncludeHandler.FilePathOrigin object and pushes it onto the FileIncludeStack, which creates a PpTokeniser object on the stack.

This returns that PpTokeniser generator function.

Parameters:theFpo (cpip.core.IncludeHandler.FilePathOrigin([_io.StringIO, str, NoneType, str]), cpip.core.IncludeHandler.FilePathOrigin([_io.TextIOWrapper, str, str, str])) – FilePathOrigin.
Returns:generator – The cpip.core.PpTokeniser.PpTokeniser object.
_processCppDirective(theTtt, theGen)

Processes a token as a CPP directive. See ISO/IEC 14882:1998(E) 16 Preprocessing directives [cpp]. This consumes tokens and generates others.

Parameters:
  • theTtt (cpip.core.PpToken.PpToken) – Current token.
  • theGen (generator) – Token generator.
Returns:

cpip.core.PpToken.PpToken – Yields resulting tokens.

Raises:

StopIteration

_reportSpuriousTokens(theCmd)

Reports the presence of spurious tokens in things like:

#else spurious 1 ) tokens ...

Used by #else and #endif, which expect no semantically significant tokens to follow them. Typical cpp.exe behaviour:

cpp.exe: <stdin>:3:7: warning: extra tokens at end of #else directive

_retDefineAndTokens(theGen)

Returns 1 or 0 according to whether a macro is defined, and the literal tokens as a string.

Parameters:theGen (generator) – Token generator.
Returns:tuple([int, str]) – (bool, literal tokens)
_retDefinedSubstitution(theGen)

Returns a list of tokens from the supplied argument with defined... and !defined... handled appropriately and other tokens expanded where appropriate.

This is used by #if, #elif.

Reporting conditional state, for example:

#define F(a) a % 2
#define X 5

What to say?     This?        Or?           Or?              Or?
#if F(X) == 1    F(X) == 1    F(5) == 1    (5 % 2) == 1      1 == 1
...
#else            !F(X) == 1   !F(5) == 1   !(5 % 2) == 1     !(1 == 1)
...
#endif

The current implementation takes the first as most useful: "F(X) == 1". This means capturing the original token stream as well as the (possibly replaced) evaluated token stream.

TODO: There is an issue here with poorly specified #if/#elif statements. For example:

#if deeeefined SPAM
cpp.exe: <stdin>:1:7: missing binary operator before token "SPAM"

#if 1 SPAM
cpp.exe: <stdin>:1:7: missing binary operator before token "SPAM"
Parameters:theGen (generator) – Token generator.
Returns:tuple([list([cpip.core.PpToken.PpToken]), list([cpip.core.PpToken.PpToken])]) – (Replacement tokens, raw tokens).
_retHeaderName(theGen)

This returns the first PpToken of type header-name it finds up to a newline token or None if none found. It handles:

# include <h-char-sequence> new-line
# include "q-char-sequence" new-line

This gathers a list of PpTokens up to, and including, a newline with macro replacement. Then it reinterprets the list using cpip.core.PpTokeniser.PpTokeniser.reduceToksToHeaderName() to cast the tokens to a possible #include header-name token.

Parameters:theGen (generator) – Token generator.
Returns:cpip.core.PpToken.PpToken – First token of header name.
_retIfEvalAndTokens(theGen)

Returns (bool | None, tokenStr) from processing a #if or #elif conditional statement. This also handles defined... and !defined...

bool - True/False based on the evaluation of the constant expression. This will be None on evaluation failure.

tokenStr - A string of raw (original) PpTokens that made up the constant expression.
Parameters:theGen (generator) – Token generator.
Returns:tuple([int, str]) – (bool, literal tokens)
_retListReplacedTokens(theTokS)

Takes a list of PpToken objects and returns a list of PpToken objects where macros are replaced in the current environment where possible. TODO: get pragma to use this.

_tokensToEol(theGen, macroReplace)

Returns a list of PpToken objects from a generator up to and including the first token that has a newline.

Parameters:
  • theGen (generator) – Token generator.
  • macroReplace (bool) – If macroReplace is True then macros are replaced with the current environment.
Returns:

list([cpip.core.PpToken.PpToken]) – List of consumed tokens.
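
Consuming a generator up to and including the first newline-bearing token can be sketched like this. Plain strings stand in for PpToken objects, the name tokens_to_eol is hypothetical, and the macroReplace step is deliberately left out because it depends on the macro environment.

```python
def tokens_to_eol(gen):
    """Collect items from gen up to and including the first one containing a newline."""
    consumed = []
    for text in gen:
        consumed.append(text)
        if '\n' in text:
            break
    return consumed


gen = iter(['define', ' ', 'X', '\n', 'int'])
line = tokens_to_eol(gen)
print(line)       # ['define', ' ', 'X', '\n']
print(next(gen))  # int
```

Because the loop stops at the newline, the generator is left positioned at the first token of the following line.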

colNum

Returns the current column number as an integer during processing.

condCompGraph

The conditional compilation graph as a cpip.core.CppCond.CppCondGraph object.

Returns:cpip.core.CppCond.CppCondGraph – The conditional compilation graph.
condState

The conditional state as (boolean, string).

currentFile

Returns the file ID on the top of the file stack.

definedMacros

Returns a string representing the currently defined macros.

fileIncludeGraphRoot

Returns the cpip.core.FileIncludeGraph.FileIncludeGraphRoot object.

Returns:cpip.core.FileIncludeGraph.FileIncludeGraphRoot – The file include graph root.
fileLineCol

Returns a FileLineCol object or None.

Returns:cpip.core.FileLocation.FileLineCol – File location as (str, int, int).
fileName

Returns the current file name during processing.

Returns:str – File name.
fileStack

Returns the file stack.

Returns:list([str]) – The file stack.
finalise()

Finalisation, may raise any Exception.

Returns:NoneType
Raises:Exception - Any exception.
includeDepth

Returns the integer depth of the include stack.

lineNum

Returns the current line number as an integer during processing or None.

Returns:NoneType,int – Line number.
macroEnvironment

The current Macro environment as a cpip.core.MacroEnv.MacroEnv object.

Caution

Write to this at your own risk. Your write might be ignored or cause undefined behaviour.

Returns:cpip.core.MacroEnv.MacroEnv – The macro environment.
ppTokens(incWs=True, minWs=False, condLevel=0)

A generator for providing a sequence of PpToken.PpToken in accordance with section 16 of ISO/IEC 14882:1998(E).

Parameters:
  • incWs (bool) – If True then also include all whitespace tokens.
  • minWs (bool) – If True then whitespace runs will be minimised to a single space or, if newline is in the whitespace run, a single newline.
  • condLevel (int) –

    If != 0 then conditionally compiled tokens will be yielded and they will have tok.isCond == True. The fileIncludeGraphRoot will be marked up with the appropriate conditionality. Levels are:

    0: No conditionally compiled tokens. The fileIncludeGraphRoot will
    not have any information about conditionally included files.
    
    1: Conditionally compiled tokens are generated but not from
    conditionally included files. The fileIncludeGraphRoot will have
    a reference to a conditionally included file but not that
    included file's includes.
    
    2: Conditionally compiled tokens including tokens from conditionally
    included files. The fileIncludeGraphRoot will have all the
    information about conditionally included files recursively.
    

    (see _cppInclude where we check if self._condStack.isTrue():).

Returns:

cpip.core.PpToken.PpToken – Yields tokens.

Raises:

StopIteration
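
The effect of minWs described above (each whitespace run reduced to a single space, or to a single newline if the run contains one) can be sketched on plain strings. The function minimise_whitespace is a hypothetical illustration of that rule, not the code used by ppTokens().

```python
import itertools


def minimise_whitespace(texts):
    """Collapse each run of whitespace items to ' ' or, if it holds a newline, '\\n'."""
    for is_ws, run in itertools.groupby(texts, key=str.isspace):
        if is_ws:
            joined = ''.join(run)
            yield '\n' if '\n' in joined else ' '
        else:
            # Non-whitespace items pass through unchanged.
            yield from run


stream = ['int', ' ', '\t', 'x', ';', ' ', '\n', ' ', 'y']
print(list(minimise_whitespace(stream)))
# ['int', ' ', 'x', ';', '\n', 'y']
```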

tuFileId

Returns the user supplied ID of the translation unit.

Returns:str – Translation unit ID.
cpip.core.PpLexer.UNNAMED_FILE_NAME = 'Unnamed Pre-include'

Used when file objects have no name