PpLexer
Generates tokens from a C or C++ translation unit.
TODO: Fix accidental token pasting. See TestFromCppInternalsTokenspacing. Related to this: TODO: Set the setPrevWs flag on the token where necessary.
TODO: Handle preprocessor statements in the arguments of function-like macros. See Sect. 3.9 of cpp.pdf and the existing MacroEnv tests.
-
exception cpip.core.PpLexer.ExceptionConditionalExpression
Exception raised when evaluating conditional expressions with eval().
-
exception cpip.core.PpLexer.ExceptionPpLexer
Exception raised when handling a PpLexer object.
-
exception cpip.core.PpLexer.ExceptionPpLexerAlreadyGenerating
Exception raised when a second generator is created, because two generators would leave the internal state inconsistent.
-
exception cpip.core.PpLexer.ExceptionPpLexerCallStack
Exception raised when problems are found with the call stack or nested includes.
-
exception cpip.core.PpLexer.ExceptionPpLexerCallStackTooSmall
Exception raised when sys.getrecursionlimit() is too small.
-
exception cpip.core.PpLexer.ExceptionPpLexerCondLevelOutOfRange
Exception raised when handling a conditional token generation level that is out of range.
-
exception cpip.core.PpLexer.ExceptionPpLexerDefine
Exception raised when loading predefined macro definitions.
-
exception cpip.core.PpLexer.ExceptionPpLexerNestedInclueLimit
Exception raised when the nested #include limit is exceeded.
-
exception cpip.core.PpLexer.ExceptionPpLexerNoFile
Exception raised when a file cannot be found.
-
exception cpip.core.PpLexer.ExceptionPpLexerPreInclude
Exception raised when loading pre-include files.
-
exception cpip.core.PpLexer.ExceptionPpLexerPreIncludeIncNoCp
Exception raised when a pre-include file that has no current place (e.g. a StringIO object) contains an #include statement.
-
exception cpip.core.PpLexer.ExceptionPpLexerPredefine
Exception raised when loading predefined macro definitions.
-
cpip.core.PpLexer.PREPROCESSING_DIRECTIVES = ['if', 'ifdef', 'ifndef', 'elif', 'else', 'endif', 'include', 'define', 'undef', 'line', 'error', 'pragma']
Allowable preprocessing directives.
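The following is for illustration only. It uses the list above to decide whether a source line starts an allowable directive. The helper `directive_name()` is hypothetical and is not part of cpip. It works on raw text lines and assumes the directive name follows `#` on the same line, with no comments in between.

```python
import re

# Copy of the documented list, repeated so the sketch is self-contained.
PREPROCESSING_DIRECTIVES = ['if', 'ifdef', 'ifndef', 'elif', 'else', 'endif',
                            'include', 'define', 'undef', 'line', 'error', 'pragma']

# '#' may be preceded and followed by horizontal whitespace, e.g. "  #  include".
_DIRECTIVE_RE = re.compile(r'^[ \t]*#[ \t]*([A-Za-z_]\w*)')


def directive_name(line):
    """Hypothetical helper: return the directive name if it is allowable, else None."""
    match = _DIRECTIVE_RE.match(line)
    if match and match.group(1) in PREPROCESSING_DIRECTIVES:
        return match.group(1)
    return None
```

`warning` and `include_next` are handled by _cppWarning and _cppIncludeNext, but they do not appear in PREPROCESSING_DIRECTIVES. This sketch therefore rejects them.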
-
class cpip.core.PpLexer.PpLexer(tuFileId, includeHandler, preIncFiles=None, diagnostic=None, pragmaHandler=None, stdPredefMacros=None, autoDefineDateTime=True, gccExtensions=False, annotateLineFile=False)
Create a translation unit tokeniser that applies ISO/IEC 9899:1999(E) Section 6 and/or ISO/IEC 14882:1998(E) Section 16.
TODO: Set flags here rather than supplying them to a generator? This would reduce the API to the constructor and ppTokens()/next(). The flags would be:
incWs - Include whitespace tokens.
condLevel - (0, 1, 2) thus:
0: No conditionally compiled tokens. The fileIncludeGraphRoot will not have any information about conditionally included files.
1: Conditionally compiled tokens are generated but not from conditionally included files. The fileIncludeGraphRoot will have a reference to a conditionally included file but not that included file's includes.
2: Conditionally compiled tokens including tokens from conditionally included files. The fileIncludeGraphRoot will have all the information about conditionally included files recursively.
-
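CALL_STACK_DEPTH_ASSUMED_PPTOKENS = 10
The call stack depth is D = A + B + C*L, where L is the number of levels of nested includes. This constant is A, the call stack depth assumed for ppTokens().
-
CALL_STACK_DEPTH_FIRST_INCLUDE = 3
B in the formula above, the call stack depth for the first include.
-
CALL_STACK_DEPTH_PER_INCLUDE = 3
C in the formula above, the call stack depth per level of nested include.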
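The formula D = A + B + C*L can be checked against the interpreter's recursion limit. This sketch assumes the check is a plain comparison of D with sys.getrecursionlimit(). The function names are hypothetical, and this is not cpip's own test behind ExceptionPpLexerCallStackTooSmall.

```python
import sys

# Values copied from the class attributes documented above.
CALL_STACK_DEPTH_ASSUMED_PPTOKENS = 10  # A
CALL_STACK_DEPTH_FIRST_INCLUDE = 3      # B
CALL_STACK_DEPTH_PER_INCLUDE = 3        # C
MAX_INCLUDE_DEPTH = 200


def required_call_stack_depth(include_levels):
    """D = A + B + C*L for L levels of nested includes."""
    return (CALL_STACK_DEPTH_ASSUMED_PPTOKENS
            + CALL_STACK_DEPTH_FIRST_INCLUDE
            + CALL_STACK_DEPTH_PER_INCLUDE * include_levels)


def recursion_limit_is_sufficient(include_levels=MAX_INCLUDE_DEPTH):
    """Hypothetical check: True if D fits within the interpreter's recursion limit."""
    return required_call_stack_depth(include_levels) <= sys.getrecursionlimit()
```

With the documented values and L = MAX_INCLUDE_DEPTH, D = 10 + 3 + 3*200 = 613. That is below CPython's default recursion limit of 1000.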
-
COND_LEVEL_DEFAULT = 0
Conditionality setting for token generation.
-
COND_LEVEL_OPTIONS = range(0, 3)
Conditionality level (0, 1, 2).
-
MAX_INCLUDE_DEPTH = 200
The maximum depth of nested #includes.
-
__init__(tuFileId, includeHandler, preIncFiles=None, diagnostic=None, pragmaHandler=None, stdPredefMacros=None, autoDefineDateTime=True, gccExtensions=False, annotateLineFile=False)
Constructor.
Parameters:
- tuFileId (str) – A file ID that will be given to the include handler to find the translation unit. Typically this is the file path (as a string) to the Initial Translation Unit (ITU), i.e. the file being preprocessed.
- includeHandler (cpip.core.IncludeHandler.CppIncludeStdOs) – A handler for #include'd files, typically an IncludeHandler.IncludeHandlerStd. This might hold user and system include path information and a means of resolving file references.
- preIncFiles (list([_io.StringIO])) – An ordered list of file-like objects that are pre-include files. These are processed in order before the ITU is processed. Macro redefinition rules apply.
- diagnostic (NoneType) – A diagnostic object; defaults to a CppDiagnostic.PreprocessDiagnosticStd.
- pragmaHandler (NoneType) – A handler for #pragma statements. This must have the attribute replaceTokens; if it is True then the token stream will be macro replaced before being passed to the pragma handler. This must also define a function pragma() that takes a non-zero length list of PpToken.PpToken, the last of which will be a newline token. The tokens returned will be yielded.
- stdPredefMacros (dict({})) – A dictionary of Standard pre-defined macros. See for example:
  ISO/IEC 9899:1999 (E) 6.10.8 Predefined macro names
  ISO/IEC 14882:1998 (E) 16.8 Predefined macro names
  N2800=08-0310 16.8 Predefined macro names
  The macros __DATE__ and __TIME__ will be automatically updated to the current locale date/time (see autoDefineDateTime).
- autoDefineDateTime (bool) – If True then the macros __DATE__ and __TIME__ will be automatically updated to the current locale date/time. This is mostly used for testing.
- gccExtensions (bool) – Support GCC extensions. Currently only #include_next is supported.
- annotateLineFile (bool) – If True then PpToken will output the line number and file as cpp does. For example:
  # 22 "/usr/include/stdio.h" 3 4
  # 59 "/usr/include/stdio.h" 3 4
  # 1 "/usr/include/sys/cdefs.h" 1 3 4
Returns: NoneType
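A small class can satisfy the pragmaHandler contract described above. The contract is a replaceTokens attribute plus a pragma() method that takes a non-empty token list ending in a newline token and whose return value is yielded. This is a sketch: `EchoPragmaHandler` is hypothetical, and plain strings stand in for PpToken.PpToken objects so the example runs without cpip.

```python
class EchoPragmaHandler:
    """Hypothetical pragma handler following the documented contract."""

    # If True, the lexer macro-replaces the #pragma tokens before calling pragma().
    replaceTokens = False

    def __init__(self):
        self.seen = []  # Every token list received, kept for inspection.

    def pragma(self, theToks):
        # theToks is documented as non-empty, with a newline token last.
        if not theToks:
            raise ValueError('pragma() expects a non-empty token list')
        self.seen.append(list(theToks))
        # The lexer yields whatever is returned; here only the trailing newline.
        return [theToks[-1]]
```

An instance of such a class would be passed to the constructor as pragmaHandler=.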
-
__weakref__
List of weak references to the object (if defined).
-
_appendTokenMergingWhitespace(theList, theToken)
Adds a token to the list, merging whitespace if possible.
Parameters:
- theList (list([]), list([cpip.core.PpToken.PpToken])) – List of tokens.
- theToken (cpip.core.PpToken.PpToken) – The token to append. If it is whitespace and the last token on the list is also whitespace, this token will be merged into the last token on the list.
Returns: NoneType
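Here is a minimal sketch of the merging rule. It uses a hypothetical `Tok` record with `text` and `is_ws` fields in place of PpToken.PpToken, whose actual attributes are not described here.

```python
from dataclasses import dataclass


@dataclass
class Tok:
    """Hypothetical stand-in for PpToken.PpToken."""
    text: str
    is_ws: bool


def append_token_merging_whitespace(the_list, the_token):
    """Append the_token, merging it into the last token if both are whitespace."""
    if the_list and the_token.is_ws and the_list[-1].is_ws:
        # Concatenate the whitespace text instead of adding a new token.
        the_list[-1] = Tok(the_list[-1].text + the_token.text, True)
    else:
        the_list.append(the_token)
```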
-
_countNonWsTokens(theTokS)
Returns the integer count of non-whitespace tokens in the given list.
Parameters: theTokS (list([cpip.core.PpToken.PpToken])) – List of tokens.
Returns: int – Count of non-whitespace tokens.
-
_cppDefine(theGen, theFlc)
Handles a #define directive.
Parameters:
- theGen (generator) – Token generator.
- theFlc (cpip.core.FileLocation.FileLineCol([str, int, int])) – File, line and column numbers.
Returns: cpip.core.PpToken.PpToken – Yields resulting tokens.
-
_cppElif(theGen, theFlc)
Handles an #elif directive.
Parameters:
- theGen (generator) – Token generator.
- theFlc (cpip.core.FileLocation.FileLineCol([str, int, int])) – File, line, column.
Returns: NoneType, cpip.core.PpToken.PpToken – The replacement token, a single whitespace.
-
_cppElse(theGen, theFlc)
Handles an #else directive.
Parameters:
- theGen (generator) – Token generator.
- theFlc (cpip.core.FileLocation.FileLineCol([str, int, int])) – File, line, column.
Returns: NoneType, cpip.core.PpToken.PpToken – The replacement token, a single whitespace.
-
_cppEndif(theGen, theFlc)
Handles an #endif directive.
Parameters:
- theGen (generator) – Token generator.
- theFlc (cpip.core.FileLocation.FileLineCol([str, int, int])) – File, line, column.
Returns: NoneType, cpip.core.PpToken.PpToken – The replacement token, a single whitespace.
-
_cppError(theGen, theFlc)
Handles an #error directive.
-
_cppIf(theGen, theFlc)
Handles an #if directive.
Parameters:
- theGen (generator) – Token generator.
- theFlc (cpip.core.FileLocation.FileLineCol([str, int, int])) – File, line, column.
Returns: NoneType, cpip.core.PpToken.PpToken – The replacement token, a single whitespace.
-
_cppIfdef(theGen, theFlc)
Handles an #ifdef directive.
Parameters:
- theGen (generator) – Token generator.
- theFlc (cpip.core.FileLocation.FileLineCol([str, int, int])) – File, line, column.
Returns: NoneType, cpip.core.PpToken.PpToken – The replacement token, a single whitespace.
-
_cppIfndef(theGen, theFlc)
Handles an #ifndef directive.
Parameters:
- theGen (generator) – Token generator.
- theFlc (cpip.core.FileLocation.FileLineCol([str, int, int])) – File, line, column.
Returns: NoneType, cpip.core.PpToken.PpToken – The replacement token, a single whitespace.
-
_cppInclude(theGen, theFlc)
Handles an #include directive. This handles:
# include <h-char-sequence> new-line
# include "q-char-sequence" new-line
This gathers a list of PpTokens up to, and including, a newline, with macro replacement. It then reinterprets the list using cpip.core.PpTokeniser.PpTokeniser.reduceToksToHeaderName() to cast the tokens to a possible #include <header-name> token. Finally we try to resolve that to a 'file' that can be included.
FWIW cpp.exe does not explore #include statements when they are conditional, so it will not report errors for unreachable files that are conditionally included.
Parameters:
- theGen (generator) – Token generator.
- theFlc (cpip.core.FileLocation.FileLineCol([str, int, int])) – File, line, column.
Returns: generator
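The two header-name forms can be shown on plain strings. This is a hypothetical sketch and not reduceToksToHeaderName(). It assumes the rest of the directive has already been macro replaced and joined into one string, and it only checks the delimiters.

```python
def reduce_to_header_name(text):
    """Hypothetical: return ('<>', name) or ('""', name) for a header name, else None."""
    s = text.strip()
    # Both sequences must be non-empty and must not contain a newline.
    if len(s) < 3 or '\n' in s:
        return None
    # h-char-sequence: any characters except newline and '>'.
    if s[0] == '<' and s[-1] == '>' and '>' not in s[1:-1]:
        return ('<>', s[1:-1])
    # q-char-sequence: any characters except newline and '"'.
    if s[0] == '"' and s[-1] == '"' and '"' not in s[1:-1]:
        return ('""', s[1:-1])
    return None
```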
-
_cppIncludeGeneric(theGen, theFlc, theFileIncludeFunction)
Handles the target of an #include or #include_next directive. theFileIncludeFunction is the function to call to resolve the target to an actual file.
Parameters:
- theGen (generator) – Token generator.
- theFlc (cpip.core.FileLocation.FileLineCol([str, int, int])) – File, line, column.
- theFileIncludeFunction (method) – Function to process the include directive.
Returns: NoneType, cpip.core.PpToken.PpToken – Yields tokens.
Raises: StopIteration
-
_cppIncludeNext(theGen, theFlc)
Handles an #include_next GCC extension. This behaves much like self._cppInclude, but calls includeNextHeaderName() on the include handler.
-
_cppIncludeReportError(theMsg=None)
Reports a consistent error message when #include is not processed, and consumes all tokens up to and including the next newline.
-
_cppLine(theGen, theFlc)
Handles a #line directive. This also handles ISO/IEC 9899:1999 (E) 6.10.4 Line control, in particular 6.10.4-4 where the form is:
# line digit-sequence "s-char-sequence(opt)" new-line
digit-sequence is a token of type pp-number.
The s-char-sequence(opt) is a token of type 'string-literal'. It will have the double quote delimiters and may have an 'L' prefix, for example L"abc".
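Here is a sketch of recognising the 6.10.4 forms on a single text line. `parse_line_directive()` is hypothetical. It accepts the optional L prefix described above and does not interpret escape sequences. It also omits the range checks the standard places on the digit-sequence.

```python
import re

# '# line digit-sequence' optionally followed by '"s-char-sequence"' (maybe L-prefixed).
_LINE_RE = re.compile(
    r'^[ \t]*#[ \t]*line[ \t]+(\d+)'
    r'(?:[ \t]+L?"((?:[^"\\\n]|\\.)*)")?[ \t]*$')


def parse_line_directive(line):
    """Hypothetical helper: return (line_number, file_name_or_None), or None."""
    m = _LINE_RE.match(line.rstrip('\n'))
    if m is None:
        return None
    return int(m.group(1)), m.group(2)
```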
-
_cppPragma(theGen, theFlc)
Handles a #pragma directive. ISO/IEC 9899:1999 (E) 6.10.6 Pragma directive
Semantics:
1 A preprocessing directive of the form:
# pragma pp-tokens(opt) new-line
where the preprocessing token STDC does not immediately follow pragma in the directive (prior to any macro replacement)[146] causes the implementation to behave in an implementation-defined manner. The behavior might cause translation to fail or cause the translator or the resulting program to behave in a non-conforming manner. Any such pragma that is not recognized by the implementation is ignored.
Footnote 146: An implementation is not required to perform macro replacement in pragmas, but it is permitted except for in standard pragmas (where STDC immediately follows pragma). If the result of macro replacement in a non-standard pragma has the same form as a standard pragma, the behavior is still implementation-defined; an implementation is permitted to behave as if it were the standard pragma, but is not required to.
-
_cppUndef(theGen, theFlc)
Handles an #undef directive.
-
_cppWarning(theGen, theFlc)
Handles a #warning directive. This is not in the standard but we support it.
-
_diagnosticDebugMessage(theM)
Sends a message to the diagnostic object.
Parameters: theM (str) – The message.
Returns: NoneType
-
_genPpTokensRecursive(theGen)
Given a token generator, this applies the lexical rules and generates tokens. This means handling preprocessor directives and macro replacement. With #include'd files this becomes recursive.
Parameters: theGen (generator) – Token generator.
Returns: cpip.core.PpToken.PpToken – Yields tokens.
Raises: StopIteration
-
_genPreIncludeTokens()
Reads all the pre-include files and loads the macro environment.
Returns: NoneType
Raises: AttributeError, StopIteration
-
_lineFileAnnotation(flags)
Returns a list of PpTokens that represent the line number and file name. For example:
# 22 "/usr/include/stdio.h" 3 4
# 59 "/usr/include/stdio.h" 3 4
# 1 "/usr/include/sys/cdefs.h" 1 3 4
Trailing numbers are described here: https://gcc.gnu.org/onlinedocs/cpp/Preprocessor-Output.html
'1' - This indicates the start of a new file.
'2' - This indicates returning to a file (after having included another file).
'3' - This indicates that the following text comes from a system header file, so certain warnings should be suppressed.
'4' - This indicates that the following text should be treated as being wrapped in an implicit extern "C" block.
We don't support '4'.
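As a sketch, a linemarker in the format above can be built from a line number, a file name and flags. `line_file_annotation()` is hypothetical. It returns a string rather than a list of PpTokens, and it rejects flag 4 to match the note above. It does not escape quotes or backslashes in the file name.

```python
def line_file_annotation(line_num, file_name, flags=()):
    """Hypothetical formatter for a linemarker: # linenum "filename" flags..."""
    # 1 = new file, 2 = return to file, 3 = system header; 4 is not supported here.
    for flag in flags:
        if flag not in (1, 2, 3):
            raise ValueError('unsupported linemarker flag: %r' % (flag,))
    parts = ['#', str(line_num), '"%s"' % file_name] + [str(f) for f in flags]
    return ' '.join(parts) + '\n'
```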
-
_nextNonWsOrNewline(theGen, theDiscardList=None)
Returns the next non-whitespace token, or a whitespace token that contains a newline.
Parameters:
- theGen (generator) – Token generator.
- theDiscardList (list([cpip.core.PpToken.PpToken])) – If theDiscardList is not None, intermediate tokens will be appended to it.
Returns: cpip.core.PpToken.PpToken – Next non-whitespace token, or whitespace that contains a newline.
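Here is a sketch of the same scanning rule on plain strings, with `str.isspace()` standing in for PpToken whitespace detection. `next_non_ws_or_newline()` is hypothetical. Returning None when the generator is exhausted is an assumption of the sketch.

```python
def next_non_ws_or_newline(gen, discard=None):
    """Hypothetical: next token that is not whitespace, or is whitespace with a newline."""
    for tok in gen:
        if not tok.isspace() or '\n' in tok:
            return tok
        if discard is not None:
            discard.append(tok)  # Skipped whitespace is recorded if a list is given.
    return None  # Generator exhausted.
```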
-
_pptPop()
Ends an #included file.
Returns: NoneType
-
_pptPostPop()
Called immediately after _pptPop(). It optionally returns a list of PpTokens that can be yielded.
Returns: list([]) – Tokens to yield.
-
_pptPostPush()
Called immediately after _pptPush(). It optionally returns a list of PpTokens that can be yielded.
Returns: list([]) – Tokens to yield.
-
_pptPush(theFpo)
This takes a cpip.core.IncludeHandler.FilePathOrigin object and pushes it onto the FileIncludeStack, which creates a PpTokeniser object on the stack. This returns that PpTokeniser generator function.
Parameters: theFpo (cpip.core.IncludeHandler.FilePathOrigin([_io.StringIO, str, NoneType, str]), cpip.core.IncludeHandler.FilePathOrigin([_io.TextIOWrapper, str, str, str])) – FilePathOrigin.
Returns: generator – The cpip.core.PpTokeniser.PpTokeniser object.
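The push/pop discipline, together with MAX_INCLUDE_DEPTH, can be sketched as a bounded stack. Everything here is hypothetical. `FileStack` and `IncludeDepthError` only mirror the roles of _pptPush(), _pptPop(), currentFile, includeDepth and ExceptionPpLexerNestedInclueLimit. No PpTokeniser is created, and the exact boundary test cpip uses is not documented here.

```python
class IncludeDepthError(Exception):
    """Hypothetical analogue of ExceptionPpLexerNestedInclueLimit."""


class FileStack:
    """Hypothetical include stack: push on #include, pop at end of file."""

    def __init__(self, max_depth=200):  # 200 mirrors MAX_INCLUDE_DEPTH
        self.max_depth = max_depth
        self._stack = []

    def push(self, file_id):
        if len(self._stack) >= self.max_depth:
            raise IncludeDepthError(
                'nested #include limit of %d exceeded' % self.max_depth)
        self._stack.append(file_id)

    def pop(self):
        return self._stack.pop()

    @property
    def current(self):
        """File ID on top of the stack, or None when empty."""
        return self._stack[-1] if self._stack else None

    @property
    def depth(self):
        return len(self._stack)
```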
-
_processCppDirective(theTtt, theGen)
Processes a token as a CPP directive. See ISO/IEC 14882:1998(E) 16 Preprocessing directives [cpp]. This consumes tokens and generates others.
Parameters:
- theTtt (cpip.core.PpToken.PpToken) – Current token.
- theGen (generator) – Token generator.
Returns: cpip.core.PpToken.PpToken – Yields resulting tokens.
Raises: StopIteration
-
_reportSpuriousTokens(theCmd)
Reports the presence of spurious tokens in things like:
#else spurious 1 ) tokens
...
Used by #else and #endif, which expect no semantically significant tokens to follow them. Typical cpp.exe behaviour:
cpp.exe: <stdin>:3:7: warning: extra tokens at end of #else directive
-
_retDefineAndTokens(theGen)
Returns 1 if a macro is defined, otherwise 0, together with the literal tokens as a string.
Parameters: theGen (generator) – Token generator.
Returns: tuple([int, str]) – (bool, literal tokens)
-
_retDefinedSubstitution(theGen)
Returns a list of tokens from the supplied argument, with defined... and !defined... handled appropriately and other tokens expanded where appropriate. This is used by #if and #elif.
Reporting conditional state, for example:
#define F(a) a % 2
#define X 5
What to say?    This?         Or?           Or?              Or?
#if F(X) == 1   F(X) == 1     F(5) == 1     (5 % 2) == 1     1 == 1
...
#else           !F(X) == 1    !F(5) == 1    !(5 % 2) == 1    !(1 == 1)
...
#endif
The current implementation takes the first as the most useful: "F(X) == 1". This means capturing the original token stream as well as the (possibly replaced) evaluated token stream.
TODO: There is an issue here with poorly specified #if/#elif statements. For example:
#if deeeefined SPAM
cpp.exe: <stdin>:1:7: missing binary operator before token "SPAM"
#if 1 SPAM
cpp.exe: <stdin>:1:7: missing binary operator before token "SPAM"
Parameters: theGen (generator) – Token generator.
Returns: tuple([list([cpip.core.PpToken.PpToken]), list([cpip.core.PpToken.PpToken])]) – (Replacement tokens, raw tokens).
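Here is a string-level sketch of the defined substitution, assuming a set of currently defined macro names. `substitute_defined()` is hypothetical. It handles both `defined NAME` and `defined(NAME)`, but unlike the method above it does not expand other macros or keep the raw token stream.

```python
import re

# 'defined NAME' or 'defined ( NAME )'; \b stops matches inside longer identifiers.
_DEFINED_RE = re.compile(
    r'\bdefined\b\s*(?:\(\s*([A-Za-z_]\w*)\s*\)|([A-Za-z_]\w*))')


def substitute_defined(expr, macros):
    """Hypothetical: replace each defined-expression in expr with '1' or '0'."""
    def repl(m):
        name = m.group(1) or m.group(2)
        return '1' if name in macros else '0'
    return _DEFINED_RE.sub(repl, expr)
```

Malformed input such as `deeeefined SPAM` passes through unchanged, which is the case the TODO above describes.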
-
_retHeaderName(theGen)
This returns the first PpToken of type header-name it finds up to a newline token, or None if none is found. It handles:
# include <h-char-sequence> new-line
# include "q-char-sequence" new-line
This gathers a list of PpTokens up to, and including, a newline, with macro replacement. It then reinterprets the list using cpip.core.PpTokeniser.PpTokeniser.reduceToksToHeaderName() to cast the tokens to a possible #include header-name token.
Parameters: theGen (generator) – Token generator.
Returns: cpip.core.PpToken.PpToken – First token of header name.
-
_retIfEvalAndTokens(theGen)
Returns (bool | None, tokenStr) from processing a #if or #elif conditional statement. This also handles defined... and !defined...
- bool - True/False based on the evaluation of the constant expression. This will be None on evaluation failure.
- tokenStr - A string of raw (original) PpTokens that made up the constant expression.
Parameters: theGen (generator) – Token generator.
Returns: tuple([int, str]) – (bool, literal tokens)
-
_retListReplacedTokens(theTokS)
Takes a list of PpToken objects and returns a list of PpToken objects where macros are replaced in the current environment where possible. TODO: get pragma to use this.
-
_tokensToEol(theGen, macroReplace)
Returns a list of PpToken objects from a generator up to and including the first token that has a newline.
Parameters:
- theGen (generator) – Token generator.
- macroReplace (bool) – If macroReplace is True then macros are replaced with the current environment.
Returns: list([cpip.core.PpToken.PpToken]) – List of consumed tokens.
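Here is a sketch of the consumption rule on plain string tokens. `tokens_to_eol()` is hypothetical. Its optional `replace` callable is an assumed stand-in for macro replacement in the current environment.

```python
def tokens_to_eol(gen, replace=None):
    """Hypothetical: consume gen up to and including the first token containing a newline."""
    result = []
    for tok in gen:
        result.append(tok)
        if '\n' in tok:
            break  # Stop here so later tokens stay in the generator.
    if replace is not None:
        result = replace(result)  # Stand-in for macro replacement.
    return result
```

The loop stops immediately after the newline token, so any later tokens remain in the generator for the caller.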
-
colNum
Returns the current column number as an integer during processing.
-
condCompGraph
The conditional compilation graph as a cpip.core.CppCond.CppCondGraph object.
Returns: cpip.core.CppCond.CppCondGraph – The conditional compilation graph.
-
condState
The conditional state as (boolean, string).
-
currentFile
Returns the file ID on the top of the file stack.
-
definedMacros
Returns a string representing the currently defined macros.
-
fileIncludeGraphRoot
Returns the cpip.core.FileIncludeGraph.FileIncludeGraphRoot object.
Returns: cpip.core.FileIncludeGraph.FileIncludeGraphRoot – The file include graph root.
-
fileLineCol
Returns a FileLineCol object or None.
Returns: cpip.core.FileLocation.FileLineCol – File location as (str, int, int).
-
fileName
Returns the current file name during processing.
Returns: str – File name.
-
fileStack
Returns the file stack.
Returns: list([str]) – The file stack.
-
finalise()
Finalisation; may raise any Exception.
Returns: NoneType
Raises: Exception – Any exception.
-
includeDepth
Returns the integer depth of the include stack.
-
lineNum
Returns the current line number as an integer during processing, or None.
Returns: NoneType, int – Line number.
-
macroEnvironment
The current macro environment as a cpip.core.MacroEnv.MacroEnv object.
Caution: Write to this at your own risk. Your write might be ignored or cause undefined behaviour.
Returns: cpip.core.MacroEnv.MacroEnv – The macro environment.
-
ppTokens(incWs=True, minWs=False, condLevel=0)
A generator providing a sequence of PpToken.PpToken in accordance with section 16 of ISO/IEC 14882:1998(E).
Parameters:
- incWs (bool) – If True then also include all whitespace tokens.
- minWs (bool) – If True then whitespace runs will be minimised to a single space or, if the whitespace run contains a newline, a single newline.
- condLevel (int) – If != 0 then conditionally compiled tokens will be yielded, and they will have tok.isCond == True. The fileIncludeGraphRoot will be marked up with the appropriate conditionality. Levels are:
  0: No conditionally compiled tokens. The fileIncludeGraphRoot will not have any information about conditionally included files.
  1: Conditionally compiled tokens are generated but not from conditionally included files. The fileIncludeGraphRoot will have a reference to a conditionally included file but not that included file's includes.
  2: Conditionally compiled tokens including tokens from conditionally included files. The fileIncludeGraphRoot will have all the information about conditionally included files recursively.
  (See _cppInclude, where we check if self._condStack.isTrue().)
Returns: cpip.core.PpToken.PpToken – Yields tokens.
Raises: StopIteration
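The minWs rule can be sketched on plain string tokens. `minimise_whitespace()` is hypothetical and assumes the whitespace tokens are exactly those for which str.isspace() is True.

```python
def minimise_whitespace(tokens):
    """Hypothetical sketch of minWs=True: collapse each whitespace run to a space or newline."""
    out = []
    run = []
    for tok in list(tokens) + [None]:  # None is a sentinel that flushes the final run.
        if tok is not None and tok.isspace():
            run.append(tok)
            continue
        if run:
            # A run containing any newline becomes a single newline, otherwise one space.
            out.append('\n' if any('\n' in ws for ws in run) else ' ')
            run = []
        if tok is not None:
            out.append(tok)
    return out
```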
-
tuFileId
Returns the user-supplied ID of the translation unit.
Returns: str – Translation unit ID.
-
-
cpip.core.PpLexer.UNNAMED_FILE_NAME = 'Unnamed Pre-include'
Used when file objects have no name.