Welcome to CPIP’s documentation!

CPIP

CPIP is a C/C++ Preprocessor implemented in Python. It faithfully records all aspects of preprocessing and can produce visualisations that make debugging preprocessing far easier.

Features

  • Conformant C/C++ preprocessor.
  • Gives programmatic access to every preprocessing token and the state of the preprocessor at any point during preprocessing.
  • Top level tools such as CPIPMain.py can generate preprocessor visualisations from the command line.
  • Requires only Python 2.7 or 3.3+
  • Fully documented: https://cpip.readthedocs.io.
  • Free software: GNU General Public License v2

Installation

You can either clone the public repository:

$ git clone https://github.com/paulross/cpip.git

Or download the tarball:

$ curl -OL https://github.com/paulross/cpip/tarball/master

Once you have a copy of the source, you can install it with:

$ python setup.py install

To run the tests:

$ python setup.py test

Visualising Preprocessing

The top level script CPIPMain.py acts like a preprocessor that generates HTML and SVG output for a source code file or directory. This output makes it easy to understand what the preprocessor is doing to your source.

Here is some of that output from preprocessing a single Linux kernel file, cpu.c (complete output). The index.html page shows how CPIPMain.py was invoked [1] and links to the preprocessing pages for that file:

CPIPMain.py's index.html landing page.

This page has a single link that takes you to the landing page for the file cpu.c. The top of that page links to other pages that visualise source code, #include dependencies, conditional compilation and macros:

CPIP landing page after preprocessing cpu.c from the Linux kernel.

Lower down this page is a table of files that were involved in preprocessing:

CPIP landing page after preprocessing cpu.c from the Linux kernel.

Visualising the Source Code

From the cpu.c landing page the link “Original Source” takes you to a syntax highlighted page of the original source of cpu.c.

Annotated source code of cpu.c

The cpu.c landing page link “Translation Unit” takes you to a page that shows the complete translation unit of cpu.c (i.e. incorporating all the #include files). This page is annotated so that you can understand what part of the translation unit comes from which file.

Annotated translation unit produced by cpu.c

Visualising the #include Dependencies

The cpu.c landing page link “Normal [SVG]” takes you to a page that shows the dependencies created by #include directives. This is a very rich page that represents a tree with the root at center left. The #include directives are in order from top to bottom. Each block represents a file and its size is proportional to the number of preprocessing tokens.

Example of the file stack pop-up in the SVG include graph.

Zooming in with the controls at the top gives more detail. If the box is coloured cyan it is because the file does not add any content to the translation unit, usually because of conditional compilation:

Example of the file stack pop-up in the SVG include graph.

The page is dynamic and hovering over various areas provides more information:

How and Why the File was Included

Hovering just to the left of the file box produces a popup that explains how the file inclusion process worked for this file. It has the following fields:

  • Inc: The filename and line number of the #include directive.
  • As: The conditional compilation state at the point of the #include directive.
  • How: The text of the #include directive followed by the directory that this file was found in. This directory is prefixed by sys= for a system include and usr= for a user include.

How the file got included

Hovering over the filename above the file box shows the file stack (children are below parents).

Example of the file stack pop-up in the SVG include graph.

This plot can also tell you what types of preprocessor tokens were processed for each file. The coloured bars on the left of the file box indicate the proportion of preprocessing token types: the left bar is the file on its own, the right bar is the file and its child files. To understand the legend hover over those bars:

Legend for preprocessing token types.

To see the actual count of preprocessing tokens hover over the file box:

Count of preprocessing token types.

Visualising Conditional Compilation

The preprocessor is also responsible for handling conditional compilation, which becomes very complicated for large projects. CPIPMain.py produces a succinct representation showing only the conditional directives. The links in each comment take you to the syntax highlighted page for that file.

Conditional compilation in the translation unit.

Understanding Macros

CPIP tracks every macro definition and usage and CPIPMain.py produces a page that describes all the macros encountered:

The top of the macro page with down page links to details of each macro.

Each link on the page takes you to a description of the macro containing:

  • The macro name, how many times it was referenced and whether it is still defined at the end of preprocessing.
  • The verbatim macro definition (rewritten over several lines for long macros).
  • File name and line number of definition, linked.
  • Places that the macro was used, directly or indirectly. This is a table of file paths with links to the use point.
  • Dependencies, two way:
    • Macros that this macro invokes.
    • Macros that invoke this macro.

Macro BITMAP_LAST_WORD_MASK details: definition, where defined, where used and two way dependencies.

Licence

CPIP is a C/C++ Preprocessor implemented in Python. Copyright (C) 2008-2017 Paul Ross

This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.

Credits

This package was created with Cookiecutter and the audreyr/cookiecutter-pypackage project template.

Also many thanks to SourceForge that hosted this project for many years.

Footnotes

[1] This was invoked by:
$ python3 CPIPMain.py -kp -l20 -o ../../output/linux/cpu -S __STDC__=1 -D __KERNEL__ -D __EXPORTED_HEADERS__ -D BITS_PER_LONG=64 -D CONFIG_HZ=100 -D __x86_64__ -D __GNUC__=4 -D __has_feature(x)=0 -D __has_extension=__has_feature -D __has_attribute=__has_feature -D __has_include=__has_feature -P ~/dev/linux/linux-3.13/include/linux/kconfig.h -J /usr/include/ -J /usr/include/c++/4.2.1/ -J /usr/include/c++/4.2.1/tr1/ -J /Users/paulross/dev/linux/linux-3.13/include/ -J /Users/paulross/dev/linux/linux-3.13/include/uapi/ -J ~/dev/linux/linux-3.13/arch/x86/include/uapi/ -J ~/dev/linux/linux-3.13/arch/x86/include/ -J ~/dev/linux/linux-3.13/arch/x86/include/generated/ ~/dev/linux/linux-3.13/kernel/cpu.c

Installation

CPIP has been tested with Python 2.7 and 3.3 to 3.6. CPIP used to run just fine on Windows but I haven’t had a recent opportunity (or reason) to test CPIP on a Windows box.

First make a virtual environment in your <PYTHONVENVS>, say ~/pyvenvs:

$ python3 -m venv <PYTHONVENVS>/CPIP
$ . <PYTHONVENVS>/CPIP/bin/activate
(CPIP) $

Stable release

To install cpip, run this command in your terminal:

(CPIP) $ pip install cpip

This is the preferred method to install cpip, as it will always install the most recent stable release.

If you don’t have pip installed, this Python installation guide can guide you through the process.

From sources

The sources for cpip can be downloaded from the GitHub repo.

You can either clone the public repository:

(CPIP) $ git clone https://github.com/paulross/cpip.git

Or download the tarball:

(CPIP) $ curl -OL https://github.com/paulross/cpip/tarball/master

Once you have a copy of the source, you can install it with:

(CPIP) $ python setup.py install

Install the test dependencies and run CPIP’s tests:

(CPIP) $ pip install pytest
(CPIP) $ pip install pytest-runner
(CPIP) $ python setup.py test

Developing with CPIP

If you are developing with CPIP you need test coverage and documentation tools.

Test Coverage

Install pytest-cov:

(CPIP) $ pip install pytest-cov

The most meaningful invocation, which excludes the top level tools, is:

(CPIP) $ pytest --cov=cpip.core --cov=cpip.plot --cov=cpip.util --cov-report html tests/

Documentation

If you want to build the documentation you need to:

(CPIP) $ pip install Sphinx
(CPIP) $ cd docs
(CPIP) $ make html

The landing page is docs/_build/html/index.html.

Testing the Demo Code

See the PpLexer Tutorial for an example of running a CPIP PpLexer on the demonstration code. This gives the core CPIP software a good workout.

CPIP Introduction

CPIP is a C/C++ pre-processor implemented in Python. Most pre-processors regard pre-processing as a dirty job that just has to be done as soon as possible. This can make it very hard to track down subtle defects at the pre-processing stage as pre-processors throw away a lot of useful information in favor of getting the result as cheaply as possible.

Few developers really understand pre-processing; to many it is an obscure bit of black magic. CPIP aims to improve that by recording every detail of preprocessing so that it can produce some wonderfully visual information about file dependencies, macro usage and so on.

CPIP is not designed to be a replacement for cpp (or any other established pre-processor). Instead CPIP regards clarity and understanding as more important than speed of processing.

CPIP takes its standard as C99 or, more formally, ISO/IEC 9899:1999 (E) [1].

Pre-processing C and C++

The basic task of any preprocessor is to produce a Translation Unit for a compiler to work with. To do this the pre-processor has to do three inter-related tasks [2]:

  • File inclusion i.e. responding to #include commands.
  • Conditional Compilation #if, #ifdef etc.
  • Macro definition and replacement.

File Inclusion

CPIP supports file inclusion just like any other pre-processor. In fact it goes further as CPIP recognises that whilst the C99 standard (and any other standard) specifies the syntax of the #include statement it leaves it as implementation defined how the file is located.

CPIP provides a reference implementation that behaves as CPP/RVCT/LLVM behave. CPIP also allows users to construct their own include handlers that obtain files from, for example, a URL or database.
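As a rough illustration of what an include search involves, here is a minimal sketch. It is not CPIP's include handler: the function name find_include, its parameters and the rule that quoted includes try user directories before system directories are assumptions made for this example. The returned description string imitates the usr=/sys= prefixes described earlier.

```python
import os
import tempfile


def find_include(name, usr_dirs, sys_dirs, quoted):
    """Return (path, how) for an #include; path is None when nothing is found.

    Hypothetical helper, not part of CPIP. Quoted includes ("name") try the
    user directories and fall back to the system directories; angle-bracket
    includes (<name>) search the system directories only.
    """
    searches = []
    if quoted:
        searches.append(("usr", usr_dirs))
    searches.append(("sys", sys_dirs))
    how = []
    for label, dirs in searches:
        for directory in dirs:
            candidate = os.path.join(directory, name)
            if os.path.isfile(candidate):
                how.append("%s=%s" % (label, directory))
                return candidate, " ".join(how)
        # Record the failed search so that the fallback is visible.
        how.append("%s=None" % label)
    return None, " ".join(how)


# Demonstration with throwaway directories.
with tempfile.TemporaryDirectory() as root:
    usr_dir = os.path.join(root, "usr")
    sys_dir = os.path.join(root, "sys")
    os.mkdir(usr_dir)
    os.mkdir(sys_dir)
    with open(os.path.join(sys_dir, "eggs.h"), "w") as f:
        f.write("/* empty header */\n")
    found_path, found_how = find_include("eggs.h", [usr_dir], [sys_dir], quoted=True)
    missing_path, missing_how = find_include("spam.h", [usr_dir], [sys_dir], quoted=False)
    print(found_how)    # usr=None sys=<temporary directory>/sys
    print(missing_how)  # sys=None
```

A handler that fetched files from a URL or a database would replace the os.path.isfile() test with its own lookup while keeping the same search order.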

Conditional Compilation

CPIP supports all conditional compilation statements. What is more, CPIP can generate a conditionally compiled view of the source code, which makes it much easier to see what part of the code is active.
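To show the idea of a conditionally compiled view, here is a small sketch that marks each line of a source fragment as active or inactive. It is an illustration only: the function conditional_view is hypothetical, it understands just #ifdef, #ifndef, #else and #endif, and it does not use or reproduce CPIP's own code.

```python
def conditional_view(lines, defined):
    """Yield (active, line) pairs for a tiny subset of directives.

    Hypothetical illustration of a condition stack, not CPIP's implementation.
    """
    stack = []  # One boolean per open block: is the current branch taken?
    for line in lines:
        words = line.split()
        directive = words[0] if words else ""
        if directive in ("#ifdef", "#ifndef"):
            is_defined = words[1] in defined
            stack.append(is_defined if directive == "#ifdef" else not is_defined)
            # The directive itself is active if the enclosing blocks are.
            yield all(stack[:-1]), line
        elif directive == "#else":
            stack[-1] = not stack[-1]
            yield all(stack[:-1]), line
        elif directive == "#endif":
            stack.pop()
            yield all(stack), line
        else:
            yield all(stack), line


source = """#ifdef DEBUG
int trace = 1;
#else
int trace = 0;
#endif
int main(void);""".splitlines()

view = list(conditional_view(source, defined={"DEBUG"}))
for active, line in view:
    print("+" if active else "-", line)
```

With DEBUG defined, only the line int trace = 0; is marked inactive.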

Macro Replacement

CPIP supports macro replacement according to C99. CPIP keeps track of where macros were defined (and undefined) and where they were either tested (by an #if statement for example) or used in a substitution. All this information is available using public APIs after CPIP has finished processing a Translation Unit.

Macro replacement is one of the most complicated parts of preprocessing. It seems simple, doesn’t it? But consider this source code:

#define f(a) a*g
#define g(a) f(a)
f(2)(9)

What is the result of the last statement?

It is either:

2*f(9)

Or:

2*9*g

Which is it? Puzzled? Well the C standards body responded thus:

“The C89 Committee intentionally left this behavior ambiguous as it saw no useful purpose in specifying all the quirks of preprocessing for such questionably useful constructs.” [3]

So any pre-processor implementation could produce either result at any time and it would still be a compliant implementation.

CPIP and Pre-processing

CPIP is capable of doing all these aspects of preprocessing and it produces a Translation Unit just like any other pre-processor.

What makes CPIP unique is that it retains all pre-processing information discovered along the way and can present it in many ways. CPIP provides a number of interfaces to that information, not least:

  • A command line tool that acts as a C/C++ pre-processor but produces all sorts of wonderful information about pre-processing: CPIPMain.py Examples.
  • A Python interface to pre-processing via the PpLexer, see the PpLexer Tutorial. If you want to construct your own pre-processor or understand a specific aspect of preprocessing then this is for you.

CPIP Core Architecture

CPIP provides a set of Command Line Tools built round a core of Python code. The architecture of this core code is illustrated below; at the heart of it is the PpLexer. The user interacts with this in two ways:

  • Constructing a PpLexer with the following:
    • A file-like object that represents the Initial Translation Unit (ITU) i.e. the file to be pre-processed.
    • Any pre-include files.
    • An include handler that manages #include statements.
    • Optionally: a CppDiagnostic to handle error conditions.
    • Optionally: a PragmaHandler to handle #pragma statements.
  • Processing the file (and its #include’s) token by token.

For the PpLexer its construction is fairly straightforward; it just takes a reference to the user supplied objects.

Processing the ITU is a more serious matter. The PpLexer uses a PpTokeniser to generate pre-processing tokens (shown in yellow below) according to translation phases one to three. The PpTokeniser also keeps track of logical to physical file location.

Depending on the parser state the PpLexer may or may not pass the token to various internal objects (shown in purple below) that keep track of:

  • File inclusion.
  • Conditional compilation.
  • Macro Environment.

The resulting token (if any) after that processing is yielded to the user.

An extremely useful feature of CPIP is that the PpLexer maintains all these data structures and provides an interface to them for the user. Some examples of what can be done with this information are here: CPIPMain.py Examples.

CPIP Architecture.

Footnotes

[1] Other standards are of interest: “C++98” [ISO/IEC 14882:1998(E)] describes more limited pre-processing (no variadic macros for example). “C++11” [ISO/IEC JTC 1/SC 22 N 4411 in draft] and C++14 do not substantially change this. In any case CPIP attempts to emulate common custom and practice (yes, including variadic macros).
[2] Of course the preprocessor has to do many other minor tasks such as replacing trigraphs and removing comments.
[3] Rationale for International Standard - Programming Languages - C Revision 5.10 April-2003 Sect. 6.10.3.4

CPIPMain.py Examples

Screenshots

This section shows some screenshots of CPIPMain.py’s output. Some Real Examples are shown below.

CPIPMain.py produces a set of HTML and SVG pages for each source file preprocessed. As well as the Translation Unit CPIPMain.py generates information about the three important tasks for a preprocessor: file inclusion, conditional compilation and macro replacement.

Home Page

The index.html shows the list of files preprocessed in this pass (linked to file specific pages).

It also shows the command line used and an explanation from the CPIPMain.py help system as to what each option means.

For example:

The overall SVG diagram.

Each file in the list of files preprocessed in this pass is linked to its file specific page.

Preprocessed File Specific Page

This page describes the results of preprocessing a single file. It contains links to:

  1. The source and the translation unit as HTML.
  2. The results of file inclusion.
  3. The results of conditional compilation.
  4. Macro processing results.
  5. The total token count.
  6. What files were included and how many times.

The top of the page includes links to these sections (described in detail below):

The overall SVG diagram.

Further down the page is a table showing what files were included, from where and how many times:

The overall SVG diagram.

Here is an explanation for the table:

Files used.

Original File and the Translation Unit

Original File

All processed source code (original file and included files) is presented as syntax highlighted HTML.

The syntax is the C pre-processor language. Macro names are linked to their definition in the Macro Definitions page.

An individual file in HTML.

Translation Unit

The preprocessed file and all its #include’s become a Translation Unit which CPIPMain.py represents as an HTML page.

Each #include statement is represented in a nested fashion and any source code in the translation unit is presented syntax highlighted. The syntax is, of course, the C pre-processor language, thus both typedef and char are pre-processor identifiers even if later on typedef is seen as a C keyword.

The numbered links thus [       19] are to an HTML representation of the original source code file/line.

The other navigational element appears where the file path is that of the file being pre-processed: a forward link leads to the next part of this file, thus skipping over intermediate #include’d code.

The complete translation unit represented in HTML.

Further down you can see the actual code from cpu.c, notice the macro expansion on line 76.

The complete translation unit represented in HTML.

The SVG Include Graph

The file specific page offers a link to an SVG visualisation of the file include graph.

The Overall Picture

The diagram represents a tree with the root (the file being preprocessed) at center left. Each node represents a file and each edge represents an #include directive. Increasing include depth is left-to-right and source code order (i.e. order of the #include directives) is top to bottom.

At the top are various zoom factors that you can use to view the graph. Initially the page opens at the smallest scale factor to give you an impression of what is going on:

The overall SVG diagram for cpu.c.

A Detailed Look

Zooming in to 100% on one part of the graph gives a wealth of information. In this picture the processor.h file is represented on the left and the files that it #include’s to its right:

The overall SVG diagram for cpu.c.

Each file is represented by a fixed width block; the height is proportional to the number of preprocessing tokens produced by the file (and its #include’s) [1]. Cyan coloured blocks represent files that are included but contain no effective content, usually because the file has already been included and its header guards use conditional compilation to prevent it being preprocessed more than once (types.h for example).
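The header guard effect can be modelled in a few lines. This is a sketch under stated assumptions rather than anything taken from CPIP: the include() function, the files mapping and the guard name _TYPES_H are all hypothetical.

```python
def include(header, files, macros):
    """Return the lines a header contributes, honouring a classic guard.

    `files` maps a header name to (guard_macro, body_lines). Both the mapping
    and this function are hypothetical stand-ins for real file handling.
    """
    guard, body = files[header]
    if guard in macros:
        # '#ifndef GUARD' is false on a repeat inclusion: nothing is emitted.
        return []
    macros.add(guard)  # Models '#define GUARD' on the first inclusion.
    return list(body)


files = {"types.h": ("_TYPES_H", ["typedef unsigned long size_type;"])}
macros = set()
first = include("types.h", files, macros)
second = include("types.h", files, macros)
print(len(first), len(second))
```

The second inclusion contributes nothing, which is the situation a cyan block represents.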

The ‘V’ symbol in the block represents the relative size of the file and its descendants. If the ‘V’ touches top and bottom then all the tokens come from this file (personality.h for example). Where the ‘V’ is closed, or almost so, it means the bulk of the tokens are coming from the descendant includes (msr.h for example).

The coloured bars on the left represent the count of different token types, the left bar being the current file, the right bar being the total of the descendants. See below for which Token Types correspond to each colour.

Many parts of this diagram can display additional information when moving the mouse over various bits of the file block.

File Path

For example mousing over the file name above the box shows the absolute path of the file stack as a pop-up yellow block. At the top of this list is the file we are preprocessing, then the stack of included files downwards to processor.h:

The result of mousing over the file name.

How it was Included?

Moving the mouse over to the left of the block reveals a host of information about the file inclusion process:

The result of mousing over the left hand ‘?’.

This pop-up yellow block contains the following:

  • Where: Where this was included from. This file is included from line 22 of the arch/x86/include/asm/thread_info.h file.
  • Why: Why it was included. This is the current state of the conditional compilation stack.
  • How: How this file was included. This string starts with the text that follows the #include statement, in this case #include <asm/processor.h>. This is followed by the search results, in this case this file was found by searching the system includes (sys=) and was found in arch/x86/include. There may be more than one search made as fallback mechanisms are used and a failure will be shown with None. For example usr=None sys=spam/eggs means that the user include directories were searched first and nothing came up, then the system include directories were searched and the file was found in spam/eggs. As a special case, CP: means ‘the current place’.

Token Types

If you are interested in what types of preprocessor tokens were encountered then there is a host of information available to you. On the left hand side of each file block is a colour coded histogram of token types. If the file includes others then there will be two: the left hand one is for the file, the right hand one is for all the files it includes. Hovering over either histogram pops up the legend thus:

The legend for the histogram bars.

The actual count of tokens is seen when moving the mouse over the centre of the box. There are three sets of two columns, the left column of the set is total tokens, the right column is for significant tokens, that is those that are not conditionally excluded by #if etc. statements.

The first set is for the specific file, the second set is for the descendants and the third set is the total.
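The three sets of two columns can be thought of as a simple roll-up over the include tree. The sketch below assumes a hypothetical FileNode structure and invented numbers; it does not use CPIP's data structures.

```python
from dataclasses import dataclass, field


@dataclass
class FileNode:
    """Hypothetical include-tree node; not a CPIP class."""
    name: str
    total: int        # All tokens in this file alone.
    significant: int  # Tokens that are not conditionally excluded.
    children: list = field(default_factory=list)


def descendant_counts(node):
    """Sum (total, significant) over every file below this node."""
    total = significant = 0
    for child in node.children:
        c_total, c_significant = descendant_counts(child)
        total += child.total + c_total
        significant += child.significant + c_significant
    return total, significant


def count_sets(node):
    """Return the three sets: this file, its descendants and the sum."""
    d_total, d_significant = descendant_counts(node)
    return {
        "file": (node.total, node.significant),
        "descendants": (d_total, d_significant),
        "total": (node.total + d_total, node.significant + d_significant),
    }


# Invented numbers, purely for illustration.
leaf = FileNode("b.h", total=40, significant=10)
middle = FileNode("a.h", total=100, significant=80, children=[leaf])
root = FileNode("main.c", total=20, significant=20, children=[middle])
counts = count_sets(root)
print(counts)
```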

Token counts.

Conditional Compilation

One tricky area for comprehending source code is understanding what code is conditionally compiled. Looking at a source file it is not immediately obvious which #if etc. statements are actually being processed.

As an aid CPIP produces an HTML page that is the translation unit with only the conditional compilation statements. What is more, they are nested according to their logical execution condition and colour coded according to the resolved state: green means the code will be part of the translation unit and red means those statements will be ignored. The links in the (artificial) comment following each statement lead to the HTML representation of the file where the statement occurs.

Here is an example:

Conditional compilation.

Macro Definitions

CPIP retains all information about macros during preprocessing and the file specific page containing macro information starts like this:

Macro information.

The contents start with a list of links to macro information further down the page. The first set of links is an alphabetical list of all macros that are declared, even if they are not used. The second set is of the macros that are actually used in pre-processing this file.

These are all linked to the macro details, which look like this, for example BITMAP_LAST_WORD_MASK:

Macro information.

Each macro description has the following:

  • The macro name followed by the reference count for the macro i.e. how many times the pre-processor was required to invoke the definition. This line ends with whether it is still defined at the end of preprocessing (True in this case).
  • The macro definition (this is artificially wrapped for clarity).
  • Following defined @ is where the macro was defined and a link to the source file where the macro is defined.
  • Then follows a table of locations that the macro was used. In this case it was referenced by include/linux/bitmap.h on line 176, column 20, then line 228, column 20 and so on. Each of these references is a link to the source file representation where the macro is used. NOTE: Where macros are defined in terms of other macros then this location will not necessarily have the literal macro name, it is implicit because of macro dependencies. For example if you look at the last entry kernel/cpu.c line 653, column 47 then you do not see BITMAP_LAST_WORD_MASK. Instead you see CPU_BITS_ALL; however, CPU_BITS_ALL is defined in terms of BITMAP_LAST_WORD_MASK.
  • After “I depend on these macros” is a table (a tree actually) of other macros (with links) that BITMAP_LAST_WORD_MASK depends on, in this case only one, BITS_PER_LONG, as you can see in the definition.
  • After “These macros depend on me” is another table (a tree) of other macros (with links) that depend on BITMAP_LAST_WORD_MASK.

A Most Powerful Feature

CPIP’s knowledge about macros and its ability to generate linked documents provides an especially powerful feature for understanding macros.
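The two way dependency tables can be approximated with a short sketch. The macro bodies below are simplified stand-ins, not the kernel's real definitions, and the function dependency_tables is hypothetical rather than part of CPIP. It only finds direct dependencies, whereas the pages described above show a tree.

```python
import re


def dependency_tables(definitions):
    """Return (depends_on, depended_on_by) for {macro name: replacement text}.

    Hypothetical helper: direct dependencies only, found by scanning the
    replacement text for identifiers that are themselves macro names.
    """
    identifier = re.compile(r"[A-Za-z_]\w*")
    depends_on = {
        name: sorted({tok for tok in identifier.findall(body)
                      if tok in definitions and tok != name})
        for name, body in definitions.items()
    }
    depended_on_by = {name: [] for name in definitions}
    for name, used in depends_on.items():
        for other in used:
            depended_on_by[other].append(name)
    return depends_on, depended_on_by


# Simplified stand-in bodies, NOT the real Linux kernel definitions.
definitions = {
    "BITS_PER_LONG": "64",
    "BITMAP_LAST_WORD_MASK": "(~0UL >> (BITS_PER_LONG - 1))",
    "CPU_BITS_ALL": "{ BITMAP_LAST_WORD_MASK }",
}
i_depend_on, depend_on_me = dependency_tables(definitions)
print(i_depend_on["BITMAP_LAST_WORD_MASK"])
print(depend_on_me["BITMAP_LAST_WORD_MASK"])
```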

Some Real Examples

CPIPMain is a command line tool that you can invoke very much like your favorite pre-processor. CPIPMain produces a number of HTML pages and SVG files that make it easier to understand what is happening during preprocessing. This section shows some examples of the kind of thing that CPIP can do.

From the Linux Kernel

Here is CPIPMain.py pre-processing the cpu.c file from the Linux Kernel.

From the CPython Interpreter

Here is CPIPMain.py pre-processing the dictobject.c file which implements the Python 3.6.2 dictionary.

Footnotes

[1] A special case is that there may be a file "Unnamed Pre-Include" at the top left, joined to the preprocessed file with a thick light grey line. This ‘virtual’ file contains the macro declarations made on the CPIPMain.py command line.

Command Line Tools

CPIP has a number of tools run from the command line that can analyse source code. The main one is CPIPMain.py. On installation the command line tool cpipmain is created which just calls main() in CPIPMain.py.

CPIPMain

CPIPMain.py acts very much like a normal pre-processor but, instead of writing out a Translation Unit as text, it emits a host of HTML and SVG pages about each file to be pre-processed. Here are Some Real Examples.

Usage

usage: CPIPMain.py [-h] [-c] [-d DUMP] [-g GLOB] [--heap] [-j JOBS] [-k]
                   [-l LOGLEVEL] [-o OUTPUT] [-p] [-r] [-t] [-G]
                   [-S PREDEFINES] [-C] [-D DEFINES] [-P PREINC] [-I INCUSR]
                   [-J INCSYS]
                   path

CPIPMain.py - Preprocess the file or the files in a directory.
  Created by Paul Ross on 2011-07-10.
  Copyright 2008-2017. All rights reserved.
  Licensed under GPL 2.0
USAGE

positional arguments:
  path                  Path to source file or directory.

optional arguments:
  -h, --help            show this help message and exit
  -c                    Add conditionally included files to the plots.
                        [default: False]
  -d DUMP, --dump DUMP  Dump output, additive. Can be: C - Conditional
                        compilation graph. F - File names encountered and
                        their count. I - Include graph. M - Macro environment.
                        T - Token count. R - Macro dependencies as an input to
                        DOT. [default: []]
  -g GLOB, --glob GLOB  Pattern match to use when processing directories.
                        [default: *.*]
  --heap                Profile memory usage. [default: False]
  -j JOBS, --jobs JOBS  Max simultaneous processes when pre-processing
                        directories. Zero uses number of native CPUs [4]. 1
                        means no multiprocessing. [default: 0]
  -k, --keep-going      Keep going. [default: False]
  -l LOGLEVEL, --loglevel LOGLEVEL
                        Log Level (debug=10, info=20, warning=30, error=40,
                        critical=50) [default: 30]
  -o OUTPUT, --output OUTPUT
                        Output directory. [default: out]
  -p                    Ignore pragma statements. [default: False]
  -r, --recursive       Recursively process directories. [default: False]
  -t, --dot             Write an DOT include dependency table and execute DOT
                        on it to create a SVG file. [default: False]
  -G                    Support GCC extensions. Currently only #include_next.
                        [default: False]
  -S PREDEFINES, --predefine PREDEFINES
                        Add standard predefined macro definitions of the form
                        name<=definition>. They are introduced into the
                        environment before anything else. They can not be
                        redefined. __DATE__ and __TIME__ will be automatically
                        allocated in here. __FILE__ and __LINE__ are defined
                        dynamically. See ISO/IEC 9899:1999 (E) 6.10.8
                        Predefined macro names. [default: []]
  -C, --CPP             Sys call 'cpp -dM' to extract and use platform
                        specific macros. These are inserted after -S option
                        and before the -D option. [default: False]
  -D DEFINES, --define DEFINES
                        Add macro definitions of the form name<=definition>.
                        These are introduced into the environment before any
                        pre-include. [default: []]
  -P PREINC, --pre PREINC
                        Add pre-include file path, this file precedes the
                        initial translation unit. [default: []]
  -I INCUSR, --usr INCUSR
                        Add user include search path. [default: []]
  -J INCSYS, --sys INCSYS
                        Add system include search path. [default: []]

Note

Multiprocessing: The pre-processor, and information derived from it, can only be run as a single process but writing individual source files can take advantage of multiple processes. As the latter constitutes the bulk of the time CPIPMain.py takes, using the -j option on multi-processor machines can save a lot of time.
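The -j convention (zero means one process per CPU, one means no multiprocessing) can be sketched with the standard multiprocessing module. This is not CPIPMain.py's code: effective_jobs, write_pages and write_all are hypothetical names, and write_pages merely stands in for the real work of writing output files.

```python
import multiprocessing


def effective_jobs(jobs):
    """Map a -j style value to a process count: 0 means one per CPU."""
    if jobs < 0:
        raise ValueError("jobs must be >= 0")
    return multiprocessing.cpu_count() if jobs == 0 else jobs


def write_pages(path):
    """Hypothetical stand-in for writing the output pages of one file."""
    return path + ".html"


def write_all(paths, jobs=0):
    """Write pages for every path, in parallel when more than one job is allowed."""
    n = effective_jobs(jobs)
    if n == 1:
        # No multiprocessing: do the work in this process.
        return [write_pages(p) for p in paths]
    with multiprocessing.Pool(processes=n) as pool:
        return pool.map(write_pages, paths)


if __name__ == "__main__":
    print(write_all(["a.c", "b.c"], jobs=1))
```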

Options
Option Description
--version Show program’s version number and exit
-h, --help Show this help message and exit.
-c Even if a file is conditionally included then add it to the plot. This is experimental so use it at your own risk! [default False]
-d DUMP, --dump=DUMP Dump various outputs to stdout (see below). This option can be repeated [default: []]
-g GLOB, --glob=GLOB Pattern to use when searching directories (ignored for #includes). [default: *.*]
--heap Profile memory usage (requires guppy to be installed). [default: False]
-j JOBS, --jobs=JOBS Max processes when multiprocessing. Zero uses number of native CPUs [4]. Value of 1 disables multiprocessing. [default: 0]
-k Keep going as far as sensible, for some definition of “sensible”. [default: False]
-l LOGLEVEL, --loglevel=LOGLEVEL Log Level (debug=10, info=20, warning=30, error=40, critical=50) [default: 30]
-o OUTPUT, --output=OUTPUT Output directory [default: “out”]
-p Ignore pragma statements. [default: False]
-r Recursively provesses directories. [default: False]
-t, --dot Write an DOT include dependency file and execute DOT on it to create a SVG file. Requires GraphViz. [default: False]
-C , --CPP Sys call cpp -dM to extract and use platform specific macros. These are inserted after -S option and before the -D option. [default: False]
-G Support GCC extensions. Currently only #include_next. [default: False]
-I INCUSR, --usr=INCUSR Add user include search path (additive). This option can be repeated [default: []]
-J INCSYS, --sys=INCSYS Add system include search path (additive). This option can be repeated [default: []]
-S PREDEFINES, --predefine=PREDEFINES Add standard predefined macro defintions of the form name<=defintion>. These are introduced into the environment before anything else. These macros can not be redefined. __DATE__ and __TIME__ will be automatically defined. This option can be repeated [default: []]
-D DEFINES, --define=DEFINES Add macro definitions of the form name<=definition>. These are introduced into the environment before any pre-include. This option can be repeated [default: []]
-P PREINC, --pre=PREINC Add a pre-include file, this will be included before any header. This option can be repeated [default: []]
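
The ordering described for -S, -C, -D and -P can be pictured with a small sketch. This is not CPIPMain.py's own code: the function name build_preinclude_order, its arguments and the sample values are hypothetical, and only the ordering is taken from the option descriptions above.

```python
def build_preinclude_order(predefines, platform_macros, defines, preinc_files):
    """Return (flag, item) pairs in the order the option text describes.

    All names here are illustrative. Only the ordering comes from the
    option descriptions: -S first, then -C, then -D, then -P.
    """
    order = []
    order.extend(('-S', item) for item in predefines)
    order.extend(('-C', item) for item in platform_macros)
    order.extend(('-D', item) for item in defines)
    order.extend(('-P', item) for item in preinc_files)
    return order


# Sample values are made up for illustration.
sources = build_preinclude_order(
    predefines=['__STDC__=1'],
    platform_macros=['__linux__=1'],
    defines=['DEBUG'],
    preinc_files=['config.h'],
)
for flag, item in sources:
    print(flag, item)
```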

The -d option can be repeated to generate multiple text outputs on stdout:

Output Description
-d C Conditional compilation graph.
-d F File names encountered and their count.
-d I Include graph.
-d M Macro environment.
-d T Token count.
-d R Macro dependencies as an input to DOT.
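
As an illustration of how a repeatable option like -d can be collected, here is a minimal sketch using the standard argparse module. It is an assumption that this resembles CPIPMain.py's parser at all; only the option name and the letters in the table above come from this documentation.

```python
import argparse

# The letters and descriptions are copied from the table above.
DUMP_CHOICES = {
    'C': 'Conditional compilation graph.',
    'F': 'File names encountered and their count.',
    'I': 'Include graph.',
    'M': 'Macro environment.',
    'T': 'Token count.',
    'R': 'Macro dependencies as an input to DOT.',
}


def parse_dump_options(argv):
    """Collect every -d/--dump value into a list (hypothetical parser)."""
    parser = argparse.ArgumentParser()
    parser.add_argument('-d', '--dump', action='append', default=[],
                        choices=sorted(DUMP_CHOICES))
    return parser.parse_args(argv).dump


for letter in parse_dump_options(['-dC', '-dF']):
    print(letter, DUMP_CHOICES[letter])
```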

Examples of these are shown below in Using -d Option.

Arguments

One or more paths of file(s) to be preprocessed.

Examples

Here is a simple example of processing the demo code that is in the PpLexer tutorial here: Files to Pre-Process.

Here we set:

  • -l 20 sets logging to INFO
  • -o sets the output to ../../demo/output_00/
  • -C is used to get the platform specific macros.
  • -J is used to set a single system include as ../../demo/sys/
  • -I is used to set a single user include as ../../demo/usr/

We are processing ../../demo/src/main.cpp and stdout is something like this:

$ python3 CPIPMain.py -l 20 -C -o ../../demo/output_00/ -J ../../demo/sys/ -I ../../demo/usr/ ../../demo/src/main.cpp
2012-03-20 07:41:38,655 INFO     TU in HTML:
2012-03-20 07:41:38,655 INFO       ../../demo/output_00/main.cpp.html
2012-03-20 07:41:38,664 INFO     Processing TU done.
2012-03-20 07:41:38,665 INFO     Macro history to:
2012-03-20 07:41:38,665 INFO       ../../demo/output_00/main.cpp_macros.html
2012-03-20 07:41:38,668 INFO     Include graph (SVG) to:
2012-03-20 07:41:38,668 INFO       ../../demo/output_00/main.cpp.include.svg
2012-03-20 07:41:38,679 INFO     Writing include graph (TEXT) to:
2012-03-20 07:41:38,679 INFO       ../../demo/output_00/main.cpp.include.svg
2012-03-20 07:41:38,679 INFO     Writing include graph (DOT) to:
2012-03-20 07:41:38,679 INFO       ../../demo/output_00/main.cpp.include.svg
2012-03-20 07:41:38,679 INFO     Creating include Graph for DOT...
2012-03-20 07:41:38,692 INFO     dot returned 0
2012-03-20 07:41:38,693 INFO     Creating include Graph for DOT done.
2012-03-20 07:41:38,693 INFO     Conditional compilation graph in HTML:
2012-03-20 07:41:38,693 INFO       ../../demo/output_00/main.cpp.ccg.html
2012-03-20 07:41:38,698 INFO     Done: ../../demo/src/main.cpp
2012-03-20 07:41:38,698 INFO     ITU in HTML: ...\main.cpp
2012-03-20 07:41:38,708 INFO     ITU in HTML: ...\system.h
2012-03-20 07:41:38,711 INFO     ITU in HTML: ...\user.h
2012-03-20 07:41:38,716 INFO     All done.
CPU time =    0.051 (S)
Bye, bye!

In the output directory will be the HTML and SVG results.

Using -d Option

All these use the following command, where ? is replaced with a letter:

$ python3 CPIPMain.py -d? -o ../../demo/output_00/ -J ../../demo/sys/ -I ../../demo/usr/ ../../demo/src/main.cpp

Multiple outputs are obtained with, for example, -dC -dF

-d C

Conditional compilation graph:

---------------------- Conditional Compilation Graph ----------------------
#ifndef __USER_H__ /* True "../../demo/usr/user.h" 1 0 */
    #ifndef __SYSTEM_H__ /* True "../../demo/sys/system.h" 1 4 */
    #endif /* True "../../demo/sys/system.h" 6 13 */
#endif /* True "../../demo/usr/user.h" 7 20 */
#if defined(LANG_SUPPORT) && defined(FRENCH) /* True "../../demo/src/main.cpp" 5 69 */
#elif defined(LANG_SUPPORT) && defined(AUSTRALIAN) /* False "../../demo/src/main.cpp" 7 110 */
#else /* False "../../demo/src/main.cpp" 9 117 */
#endif /* False "../../demo/src/main.cpp" 11 124 */
-------------------- END Conditional Compilation Graph --------------------

-d F

Files encountered and how many times processed:

------------------------ Count of files encountered -----------------------
   1  ../../demo/src/main.cpp
   1  ../../demo/sys/system.h
   1  ../../demo/usr/user.h
---------------------- END Count of files encountered ---------------------

-d I

The include graph:

------------------------------ Include Graph ------------------------------
../../demo/src/main.cpp [43, 21]:  True "" ""
000002: #include ../../demo/usr/user.h
        ../../demo/usr/user.h [10, 6]:  True "" "['"user.h"', 'CP=None', 'usr=../../demo/usr/']"
        000004: #include ../../demo/sys/system.h
                ../../demo/sys/system.h [10, 6]:  True "!def __USER_H__" "['<system.h>', 'sys=../../demo/sys/']"
---------------------------- END Include Graph ----------------------------

-d M

The macro environment and history:

---------------------- Macro Environment and History ----------------------
Macro Environment:
#define FRENCH /* ../../demo/usr/user.h#5 Ref: 1 True */
#define LANG_SUPPORT /* ../../demo/sys/system.h#4 Ref: 2 True */
#define __SYSTEM_H__ /* ../../demo/sys/system.h#2 Ref: 0 True */
#define __USER_H__ /* ../../demo/usr/user.h#2 Ref: 0 True */

Macro History (referenced macros only):
In scope:
#define FRENCH /* ../../demo/usr/user.h#5 Ref: 1 True */
    ../../demo/src/main.cpp 5 38
#define LANG_SUPPORT /* ../../demo/sys/system.h#4 Ref: 2 True */
    ../../demo/src/main.cpp 5 13
    ../../demo/src/main.cpp 7 15
-------------------- END Macro Environment and History --------------------

-d T

The token count:

------------------------------- Token count -------------------------------
       0  header-name
       8  identifier
       1  pp-number
       0  character-literal
       1  string-literal
      11  preprocessing-op-or-punc
       0  non-whitespace
      11  whitespace
       0  concat
      32  TOTAL
----------------------------- END Token count -----------------------------
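
A tally like the one above can be sketched with collections.Counter. The (text, type) pairs below are hand-written stand-ins for the t and tt attributes of a token; the function name count_token_types is hypothetical and this is not how CPIP itself produces the report.

```python
from collections import Counter

# Hand-written (text, type) pairs standing in for preprocessing tokens.
tokens = [
    ('int', 'identifier'),
    (' ', 'whitespace'),
    ('main', 'identifier'),
    ('(', 'preprocessing-op-or-punc'),
    (')', 'preprocessing-op-or-punc'),
    ('\n', 'whitespace'),
]


def count_token_types(token_pairs):
    """Count how many tokens of each type appear."""
    return Counter(tok_type for _text, tok_type in token_pairs)


counts = count_token_types(tokens)
for tok_type, number in sorted(counts.items()):
    print('%8d  %s' % (number, tok_type))
print('%8d  %s' % (sum(counts.values()), 'TOTAL'))
```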

Performance

As CPIPMain.py/cpipmain is written in Python it is pretty slow, far slower than gcc or clang. Internally in cpip there are some fairly aggressive integrity checks such as _assertDefineMapIntegrity() in cpip.core.MacroEnv.MacroEnv. These integrity checks are invoked as asserts, for example:

assert(self._assertDefineMapIntegrity())

Because they are asserts, these checks can be turned off by running Python at optimisation level 1.

For CPIPMain.py:

$ python3 -O CPIPMain.py ...

And cpipmain:

$ PYTHONOPTIMIZE=1 cpipmain ...

This optimisation can reduce the execution time by around 30%.
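
The mechanism can be seen in isolation with a small sketch. The class MacroTable and its methods are hypothetical stand-ins, not CPIP code; the point is only that an integrity check wrapped in assert is not executed at all when Python runs with -O.

```python
class MacroTable:
    """Hypothetical stand-in for a structure with an integrity check."""

    def __init__(self):
        self._defines = {}
        self.checks_run = 0

    def _assert_integrity(self):
        # Returns True when the table is consistent; counts each call.
        self.checks_run += 1
        return all(isinstance(name, str) for name in self._defines)

    def define(self, name, value=''):
        self._defines[name] = value
        # Skipped entirely under "python -O" or PYTHONOPTIMIZE=1.
        assert self._assert_integrity()


table = MacroTable()
table.define('FRENCH')
print('Integrity checks run:', table.checks_run)
```

Run normally this counts one check; run with python3 -O the count stays at zero because the assert statement, and the call inside it, is compiled away.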

CPIP Tutorials

Various Tutorials on how to use CPIP.

Contents:

PpLexer Tutorial

The PpLexer module represents the user side view of pre-processing. This tutorial shows you how to get going.

Setting Up

Files to Pre-Process

First let’s get some demonstration code to pre-process. You can find this at cpip/demo/ and the directory structure looks like this:

\---demo/
    |   cpip.py
    |
    \---proj/
        +---src/
        |       main.cpp
        |
        +---sys/
        |       system.h
        |
        \---usr/
                user.h

In proj/ is some source code that includes files from usr/ and sys/. This tutorial will take you through writing cpip.py to use PpLexer to pre-process them.

First let's have a look at the source code that we are preprocessing. It is a pretty trivial variation of a common theme, but beware, pre-processing directives abound!

The file demo/proj/src/main.cpp looks like this:

#include "user.h"

int main(char **argv, int argc)
{
#if defined(LANG_SUPPORT) && defined(FRENCH)
    printf("Bonjour tout le monde\n");
#elif defined(LANG_SUPPORT) && defined(AUSTRALIAN)
    printf("Wotcha\n");
#else
    printf("Hello world\n");
#endif
    return 1;
}

That includes a file user.h that can be found at demo/proj/usr/user.h:

#ifndef __USER_H__
#define __USER_H__

#include <system.h>
#define FRENCH

#endif // __USER_H__

In turn that includes a file system.h that can be found at demo/proj/sys/system.h:

#ifndef __SYSTEM_H__
#define __SYSTEM_H__

#define LANG_SUPPORT

#endif // __SYSTEM_H__

Clearly since the system is mandating language support and the user is specifying French as their language of choice then you would not expect this to write out “Hello World”, or would you?

Well you are in the hands of the pre-processor and that is what CPIP knows all about. First we need to create a PpLexer.

Creating a PpLexer

This is the template that we will use for the tutorial, it just takes a single argument from the command line sys.argv[1]:

import sys

def main():
    print('Processing:', sys.argv[1])
    # Your code here

if __name__ == "__main__":
    main()

Of course this doesn’t do much yet, invoking it just gives:

python cpip.py proj/src/main.cpp
Processing: proj/src/main.cpp

We now need to import and create a PpLexer.PpLexer object, and this takes at least two arguments: firstly the file to pre-process, and secondly an include handler. The latter is needed because the C/C++ standards do not specify how an #include directive is to be processed, as that is an implementation issue. So we need to provide a defined implementation of something that can find #include'd files.

CPIP provides several such implementations in the module IncludeHandler and the one that does what, I guess, most developers expect from a pre-processor is IncludeHandler.CppIncludeStdOs. This class takes at least two arguments; a list of search paths to the user include directories and a list of search paths to the system include directories. With this we can construct a PpLexer object so our code now looks like this:

import sys
from cpip.core import PpLexer, IncludeHandler

def main():
    print('Processing:', sys.argv[1])
    myH = IncludeHandler.CppIncludeStdOs(
        theUsrDirs=['proj/usr',],
        theSysDirs=['proj/sys',],
        )
    myLex = PpLexer.PpLexer(sys.argv[1], myH)

if __name__ == "__main__":
    main()

This still doesn’t do much yet, invoking it just gives:

python cpip.py proj/src/main.cpp
Processing: proj/src/main.cpp

But the absence of an error shows that we can construct a PpLexer.

Put the PpLexer to Work

To get PpLexer to do something, we need to make the call to PpLexer.ppTokens(). This function is a generator of preprocessing tokens.

Let's just print them out with this code:

import sys
from cpip.core import PpLexer, IncludeHandler

def main():
    print('Processing:', sys.argv[1])
    myH = IncludeHandler.CppIncludeStdOs(
        theUsrDirs=['proj/usr',],
        theSysDirs=['proj/sys',],
        )
    myLex = PpLexer.PpLexer(sys.argv[1], myH)
    for tok in myLex.ppTokens():
        print(tok)

if __name__ == "__main__":
    main()

Invoking it now gives:

$ python cpip.py proj/src/main.cpp
Processing: proj/src/main.cpp
PpToken(t="\n", tt=whitespace, line=False, prev=False, ?=False)
...
PpToken(t="int", tt=identifier, line=True, prev=False, ?=False)
PpToken(t=" ", tt=whitespace, line=False, prev=False, ?=False)
PpToken(t="main", tt=identifier, line=True, prev=False, ?=False)
PpToken(t="(", tt=preprocessing-op-or-punc, line=False, prev=False, ?=False)
PpToken(t="char", tt=identifier, line=True, prev=False, ?=False)
PpToken(t=" ", tt=whitespace, line=False, prev=False, ?=False)
PpToken(t="*", tt=preprocessing-op-or-punc, line=False, prev=False, ?=False)
PpToken(t="*", tt=preprocessing-op-or-punc, line=False, prev=False, ?=False)
PpToken(t="argv", tt=identifier, line=True, prev=False, ?=False)
PpToken(t=",", tt=preprocessing-op-or-punc, line=False, prev=False, ?=False)
PpToken(t=" ", tt=whitespace, line=False, prev=False, ?=False)
PpToken(t="int", tt=identifier, line=True, prev=False, ?=False)
PpToken(t=" ", tt=whitespace, line=False, prev=False, ?=False)
PpToken(t="argc", tt=identifier, line=True, prev=False, ?=False)
PpToken(t=")", tt=preprocessing-op-or-punc, line=False, prev=False, ?=False)
PpToken(t="\n", tt=whitespace, line=False, prev=False, ?=False)
PpToken(t="{", tt=preprocessing-op-or-punc, line=False, prev=False, ?=False)
PpToken(t="\n", tt=whitespace, line=False, prev=False, ?=False)
PpToken(t="\n", tt=whitespace, line=False, prev=False, ?=False)
PpToken(t="printf", tt=identifier, line=True, prev=False, ?=False)
PpToken(t="(", tt=preprocessing-op-or-punc, line=False, prev=False, ?=False)
PpToken(t=""Bonjour tout le monde\n"", tt=string-literal, line=False, prev=False, ?=False)
PpToken(t=")", tt=preprocessing-op-or-punc, line=False, prev=False, ?=False)
PpToken(t=";", tt=preprocessing-op-or-punc, line=False, prev=False, ?=False)
PpToken(t="\n", tt=whitespace, line=False, prev=False, ?=False)
PpToken(t="\n", tt=whitespace, line=False, prev=False, ?=False)
PpToken(t="return", tt=identifier, line=True, prev=False, ?=False)
PpToken(t=" ", tt=whitespace, line=False, prev=False, ?=False)
PpToken(t="1", tt=pp-number, line=False, prev=False, ?=False)
PpToken(t=";", tt=preprocessing-op-or-punc, line=False, prev=False, ?=False)
PpToken(t="\n", tt=whitespace, line=False, prev=False, ?=False)
PpToken(t="}", tt=preprocessing-op-or-punc, line=False, prev=False, ?=False)
PpToken(t="\n", tt=whitespace, line=False, prev=False, ?=False)

The PpLexer is yielding PpToken objects that are interesting in themselves because they not only have content but also the type of content (whitespace, punctuation, literals etc.). A simplification is to print out just the token value, separated by spaces, by changing a line in the code from:

print(tok)

To:

print(tok.t, end=' ')

To give:

Processing: proj/src/main.cpp










int   main ( char   * * argv ,   int   argc ) 
{ 

printf ( "Bonjour tout le monde\n" ) ; 

return   1 ; 
} 

It is definitely pre-processed and, although the output is correct, it is rather verbose because of all the whitespace generated by the pre-processing (newlines are always the consequence of pre-processing directives).

We can clean this whitespace up very simply by invoking PpLexer.ppTokens() with a suitable argument to reduce spurious whitespace thus: myLex.ppTokens(minWs=True). This minimises the whitespace runs to a single space or newline. Our code now looks like this:

import sys
from cpip.core import PpLexer, IncludeHandler

def main():
    print('Processing:', sys.argv[1])
    myH = IncludeHandler.CppIncludeStdOs(
        theUsrDirs=['proj/usr',],
        theSysDirs=['proj/sys',],
        )
    myLex = PpLexer.PpLexer(sys.argv[1], myH)
    for tok in myLex.ppTokens(minWs=True):
        print(tok.t, end=' ')

if __name__ == "__main__":
    main()

Invoking it now gives:

Processing: proj/src/main.cpp

int   main ( char   * * argv ,   int   argc ) 
{ 
printf ( "Bonjour tout le monde\n" ) ; 
return   1 ; 
} 

This is exactly the result that one would expect from pre-processing the original source code.
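
The effect of minimising whitespace runs to a single space or newline can be sketched without CPIP. This is only an approximation of what minWs=True is described as doing, not its implementation; the function name and the (text, type) pairs are hypothetical.

```python
def minimise_whitespace(tokens):
    """Collapse each run of whitespace tokens into a single token.

    tokens is an iterable of (text, type) pairs. A run that contains a
    newline collapses to a newline, any other run to a single space.
    """
    run = []
    for text, tok_type in tokens:
        if tok_type == 'whitespace':
            run.append(text)
            continue
        if run:
            yield ('\n' if any('\n' in w for w in run) else ' ', 'whitespace')
            run = []
        yield (text, tok_type)
    if run:
        # Flush a trailing run of whitespace.
        yield ('\n' if any('\n' in w for w in run) else ' ', 'whitespace')


raw = [
    ('\n', 'whitespace'), ('\n', 'whitespace'),
    ('int', 'identifier'),
    (' ', 'whitespace'), (' ', 'whitespace'),
    ('main', 'identifier'),
]
result = list(minimise_whitespace(raw))
print(repr(''.join(text for text, _type in result)))
```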

And now for something Completely Different

So far, so boring, because any pre-processor can do the same. PpLexer can do far more than this: it keeps track of a large amount of significant pre-processing information and that is available to you through the PpLexer APIs.

For a moment let's remove the minWs=True from myLex.ppTokens() so that we can inspect the state of the PpLexer at every token (rather than skipping whitespace tokens that might represent pre-processing directives).

File Include Stack

Changing the code to this shows the include file hierarchy every step of the way:

for tok in myLex.ppTokens():
    print(myLex.fileStack)

Gives the following output:

$ python cpip.py proj/src/main.cpp
Processing: proj/src/main.cpp
['proj/src/main.cpp', 'proj/usr/user.h']
['proj/src/main.cpp', 'proj/usr/user.h']
['proj/src/main.cpp', 'proj/usr/user.h', 'proj/sys/system.h']
['proj/src/main.cpp', 'proj/usr/user.h', 'proj/sys/system.h']
['proj/src/main.cpp', 'proj/usr/user.h', 'proj/sys/system.h']
['proj/src/main.cpp', 'proj/usr/user.h', 'proj/sys/system.h']
['proj/src/main.cpp', 'proj/usr/user.h']
['proj/src/main.cpp', 'proj/usr/user.h']
['proj/src/main.cpp', 'proj/usr/user.h']
['proj/src/main.cpp']
...
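
Stack snapshots like these are enough to recover which file included which. The sketch below works on plain lists copied from the output above; the function name include_edges is hypothetical, and in real use the snapshots would be copies of the file stack taken at each token.

```python
def include_edges(stack_snapshots):
    """Return (includer, included) pairs in first-seen order."""
    edges = []
    for stack in stack_snapshots:
        if len(stack) >= 2:
            # The last two entries are the including and the included file.
            edge = (stack[-2], stack[-1])
            if edge not in edges:
                edges.append(edge)
    return edges


snapshots = [
    ['proj/src/main.cpp', 'proj/usr/user.h'],
    ['proj/src/main.cpp', 'proj/usr/user.h', 'proj/sys/system.h'],
    ['proj/src/main.cpp', 'proj/usr/user.h'],
    ['proj/src/main.cpp'],
]
edges = include_edges(snapshots)
for includer, included in edges:
    print(includer, '->', included)
```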

Conditional State

Changing the code to this:

for tok in myLex.ppTokens(condLevel=1):
    print(myLex.condState)

Produces this output:

Processing: proj/src/main.cpp
(True, '')
...
(True, '')
(True, 'defined(LANG_SUPPORT) && defined(FRENCH)')
(True, 'defined(LANG_SUPPORT) && defined(FRENCH)')
(True, 'defined(LANG_SUPPORT) && defined(FRENCH)')
(True, 'defined(LANG_SUPPORT) && defined(FRENCH)')
(True, 'defined(LANG_SUPPORT) && defined(FRENCH)')
(True, 'defined(LANG_SUPPORT) && defined(FRENCH)')
(False, '(!(defined(LANG_SUPPORT) && defined(FRENCH)) && defined(LANG_SUPPORT) && defined(AUSTRALIAN))')
(False, '(!(defined(LANG_SUPPORT) && defined(FRENCH)) && defined(LANG_SUPPORT) && defined(AUSTRALIAN))')
(False, '(!(defined(LANG_SUPPORT) && defined(FRENCH)) && defined(LANG_SUPPORT) && defined(AUSTRALIAN))')
(False, '(!(defined(LANG_SUPPORT) && defined(FRENCH)) && defined(LANG_SUPPORT) && defined(AUSTRALIAN))')
(False, '(!(defined(LANG_SUPPORT) && defined(FRENCH)) && defined(LANG_SUPPORT) && defined(AUSTRALIAN))')
(False, '(!(defined(LANG_SUPPORT) && defined(FRENCH)) && defined(LANG_SUPPORT) && defined(AUSTRALIAN))')
(False, '(!(defined(LANG_SUPPORT) && defined(FRENCH)) && !(defined(LANG_SUPPORT) && defined(AUSTRALIAN)))')
(False, '(!(defined(LANG_SUPPORT) && defined(FRENCH)) && !(defined(LANG_SUPPORT) && defined(AUSTRALIAN)))')
(False, '(!(defined(LANG_SUPPORT) && defined(FRENCH)) && !(defined(LANG_SUPPORT) && defined(AUSTRALIAN)))')
(False, '(!(defined(LANG_SUPPORT) && defined(FRENCH)) && !(defined(LANG_SUPPORT) && defined(AUSTRALIAN)))')
(False, '(!(defined(LANG_SUPPORT) && defined(FRENCH)) && !(defined(LANG_SUPPORT) && defined(AUSTRALIAN)))')
(False, '(!(defined(LANG_SUPPORT) && defined(FRENCH)) && !(defined(LANG_SUPPORT) && defined(AUSTRALIAN)))')
(True, '')
...
(True, '')
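
The condition strings above follow a simple pattern: each later branch negates every earlier condition. A minimal sketch that reproduces the strings is shown here; it is a reconstruction from the output only, the function name branch_conditions is hypothetical, and CPIP may well build the strings differently.

```python
def branch_conditions(conditions):
    """Given the #if and #elif conditions in order, return the effective
    condition string for each branch followed by the one for #else."""
    result = []
    negated = []
    for cond in conditions:
        if negated:
            result.append('(%s && %s)' % (' && '.join(negated), cond))
        else:
            result.append(cond)
        negated.append('!(%s)' % cond)
    # The #else branch is taken only when every earlier condition failed.
    result.append('(%s)' % ' && '.join(negated))
    return result


branches = branch_conditions([
    'defined(LANG_SUPPORT) && defined(FRENCH)',
    'defined(LANG_SUPPORT) && defined(AUSTRALIAN)',
])
for branch in branches:
    print(branch)
```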

State of the PpLexer After Pre-processing

A more common use case is to query the PpLexer after processing the file. The following code example will:

  • Capture all tokens as a Translation Unit and write it out with minimal whitespace (the tu variable and the first group of print() calls).
  • Print out a text representation of the file include graph (myLex.fileIncludeGraphRoot).
  • Print out a text representation of the conditional compilation graph (myLex.condCompGraph).
  • Print out a text representation of the macro environment as it exists at the end of processing the Translation Unit (myLex.macroEnvironment).
  • Print out a text representation of the macro history for all macros, whether referenced or not, as it exists at the end of processing the Translation Unit (myLex.macroEnvironment.macroHistory()).

Here is the code, named cpip_07.py:

import sys
from cpip.core import PpLexer, IncludeHandler

def main():
    print('Processing:', sys.argv[1])
    myH = IncludeHandler.CppIncludeStdOs(
        theUsrDirs=['proj/usr',],
        theSysDirs=['proj/sys',],
        )
    myLex = PpLexer.PpLexer(sys.argv[1], myH)
    tu = ''.join(tok.t for tok in myLex.ppTokens(minWs=True))
    
    print()
    print(' Translation Unit '.center(75, '='))
    print(tu)
    print(' Translation Unit END '.center(75, '='))
    
    print()
    print(' File Include Graph '.center(75, '='))
    print(myLex.fileIncludeGraphRoot)
    print(' File Include Graph END '.center(75, '='))
    
    print()
    print(' Conditional Compilation Graph '.center(75, '='))
    print(myLex.condCompGraph)
    print(' Conditional Compilation Graph END '.center(75, '='))
    
    print()
    print(' Macro Environment '.center(75, '='))
    print(myLex.macroEnvironment)
    print(' Macro Environment END '.center(75, '='))
    
    print()
    print(' Macro History '.center(75, '='))
    print(myLex.macroEnvironment.macroHistory(incEnv=False, onlyRef=False))
    print(' Macro History END '.center(75, '='))

if __name__ == "__main__":
    main()

Invoking this code thus:

$ python3 cpip_07.py ../src/main.cpp

Gives this output:

Processing: ../src/main.cpp
============================= Translation Unit ============================

int main(char **argv, int argc)
{
printf("Bonjour tout le monde\n");
return 1;
}

=========================== Translation Unit END ==========================

============================ File Include Graph ===========================
../src/main.cpp [43, 21]:  True "" ""
000002: #include ../usr/user.h
        ../usr/user.h [10, 6]:  True "" "['"user.h"', 'CP=None', 'usr=../usr']"
        000004: #include ../sys/system.h
                ../sys/system.h [10, 6]:  True "!def __USER_H__" "['<system.h>', 'sys=../sys']"
========================== File Include Graph END =========================

====================== Conditional Compilation Graph ======================
#ifndef __USER_H__ /* True "../usr/user.h" 1 0 */
    #ifndef __SYSTEM_H__ /* True "../sys/system.h" 1 4 */
    #endif /* True "../sys/system.h" 6 13 */
#endif /* True "../usr/user.h" 7 20 */
#if defined(LANG_SUPPORT) && defined(FRENCH) /* True "../src/main.cpp" 5 69 */
#elif defined(LANG_SUPPORT) && defined(AUSTRALIAN) /* False "../src/main.cpp" 7 110 */
#else /* False "../src/main.cpp" 9 117 */
#endif /* False "../src/main.cpp" 11 124 */
==================== Conditional Compilation Graph END ====================

============================ Macro Environment ============================
#define FRENCH /* ../usr/user.h#5 Ref: 1 True */
#define LANG_SUPPORT /* ../sys/system.h#4 Ref: 2 True */
#define __SYSTEM_H__ /* ../sys/system.h#2 Ref: 0 True */
#define __USER_H__ /* ../usr/user.h#2 Ref: 0 True */
========================== Macro Environment END ==========================

============================== Macro History ==============================
Macro History (all macros):
In scope:
#define FRENCH /* ../usr/user.h#5 Ref: 1 True */
    ../src/main.cpp 5 38
#define LANG_SUPPORT /* ../sys/system.h#4 Ref: 2 True */
    ../src/main.cpp 5 13
    ../src/main.cpp 7 15
#define __SYSTEM_H__ /* ../sys/system.h#2 Ref: 0 True */
#define __USER_H__ /* ../usr/user.h#2 Ref: 0 True */
============================ Macro History END ============================

This is simple to the point of crude as the PpLexer supplies a far richer data seam than just text.

The File Include Graph interface is described here: FileIncludeGraph Tutorial

Summary

There are several ways that you can inspect pre-processing with PpLexer:

  • Supplying arguments to PpLexer.ppTokens() such as minWs or incCond.
  • Accessing the state of each token as it is generated such as tok.tt or tok.isCond.
  • Accessing the state of PpLexer as each token is generated, or once all tokens have been generated, such as PpLexer.condState.
  • Creating PpLexer with a user specified behaviour. This is the subject of the next section.

Advanced PpLexer Construction

The PpLexer constructor allows you to change the behaviour of pre-processing in a number of ways. Effectively these are hooks into pre-processing for:

  • Varying how #include'd files are inserted into the Translation Unit.
  • Pre-including header files.
  • Changing the behaviour of PpLexer in unusual circumstances (errors etc.).
  • Handling #pragma statements, so that various compilers can be imitated.

Include Handler

When an #include directive is encountered a compliant implementation is required to search for and insert into the Translation Unit the content referenced by the payload of the #include directive.

The standard does not specify how this should be accomplished. In CPIP the how is achieved by an implementation of a cpip.core.IncludeHandler.

An Aside

It is entirely acceptable within the standard to have an #include system that does not rely on a file system at all. Perhaps it might rely on a database like this:

#include "SQL:spam.eggs#1284"

An include handler could take that payload and recover the content from some database rather than the local file system.

Or, more prosaically, an include mechanism such as this:

#include "http://some.url.org/spam/eggs#1284"

That leads to a fairly obvious way of managing that #include payload.
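
To make the aside concrete, here is a minimal sketch of resolving an #include payload without a file system. It is independent of CPIP's classes: the function resolve_include and the two in-memory stores are hypothetical stand-ins for a database and for ordinary files.

```python
# Hypothetical in-memory stores standing in for a database and for files.
SQL_STORE = {'spam.eggs#1284': '#define SPAM 1\n'}
FILE_STORE = {'user.h': '#define FRENCH\n'}


def resolve_include(payload):
    """Return the content for an #include payload, or None if not found."""
    scheme, sep, rest = payload.partition(':')
    if sep and scheme == 'SQL':
        # Payloads such as "SQL:spam.eggs#1284" are looked up by key.
        return SQL_STORE.get(rest)
    # Anything else is treated as a plain file name.
    return FILE_STORE.get(payload)


print(repr(resolve_include('SQL:spam.eggs#1284')))
print(repr(resolve_include('user.h')))
print(repr(resolve_include('missing.h')))
```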

Implementation

If you want to create a new include mechanism then you should sub-class the base class cpip.core.IncludeHandler.CppIncludeStd [reference documentation: IncludeHandler].

Sub-classing this requires implementing the following methods:

  • def initialTu(self, theTuIdentifier):

    Given a Translation Unit Identifier this should return a class FilePathOrigin or None for the initial translation unit. As a precaution this should include code to check that the stack of current places is empty. For example:

    if len(self._cpStack) != 0:
        raise ExceptionCppInclude('setTu() with CP stack: %s' % self._cpStack)
    
  • def _searchFile(self, theCharSeq, theSearchPath):

    Given an HcharSeq/Qcharseq and a searchpath this should return a class FilePathOrigin or None.

As examples there are a couple of reference implementations in cpip.core.IncludeHandler, such as the CppIncludeStdOs class used earlier in this tutorial.

Pre-includes

The PpLexer can be supplied with an ordered list of file-like objects that are pre-include files. These are processed in order before the ITU is processed. Macro redefinition rules apply.

For example CPIPMain.py can take a list of user defined macros on the command line. It then creates a list with a single pre-include file thus:

import io
from cpip.core import PpLexer

# defines is a list thus:
# ['spam(x)=x+4', 'eggs',]

myStr = '\n'.join(['#define '+' '.join(d.split('=')) for d in defines])+'\n'
myPreIncFiles = [io.StringIO(myStr), ]
# Create other constructor information here...
myLexer = PpLexer.PpLexer(
            anItu, # File to pre-process
            myIncH, # Include handler
            preIncFiles=myPreIncFiles,
        )
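
To see what such a pre-include file contains, here is a self-contained variation of the string building above. The function name defines_to_preinclude is hypothetical. It uses str.partition() rather than split('=') so that a definition which itself contains '=' is kept intact; whether CPIPMain.py needs that is not something this documentation states.

```python
import io


def defines_to_preinclude(defines):
    """Turn ['name=definition', ...] into a file-like object of #define lines."""
    lines = []
    for item in defines:
        # Split on the first '=' only, so 'LIMIT=a==b' keeps 'a==b'.
        name, _sep, definition = item.partition('=')
        lines.append(('#define %s %s' % (name, definition)).rstrip())
    return io.StringIO('\n'.join(lines) + '\n')


pre_inc = defines_to_preinclude(['spam(x)=x+4', 'eggs', 'LIMIT=a==b'])
print(pre_inc.read())
```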

Diagnostic

You can pass in to PpLexer a diagnostic object; this controls how the lexer responds to various conditions such as warnings, errors etc. The default is for the lexer to create a CppDiagnostic.PreprocessDiagnosticStd.

If you want to create your own then sub-class the PreprocessDiagnosticStd class in the module cpip.core.CppDiagnostic.

Sub-classing PreprocessDiagnosticStd allows you to override any of the following that might be called by the PpLexer:

  • def undefined(self, msg, theLoc=None): Reports when an ‘undefined’ event happens.
  • def partialTokenStream(self, msg, theLoc=None): Reports when a partial token stream exists (e.g. an unclosed comment).
  • def implementationDefined(self, msg, theLoc=None): Reports when an ‘implementation defined’ event happens.
  • def error(self, msg, theLoc=None): Reports when an error event happens.
  • def warning(self, msg, theLoc=None): Reports when a warning event happens.
  • def handleUnclosedComment(self, msg, theLoc=None): Reports when an unclosed comment is seen at EOF.
  • def unspecified(self, msg, theLoc=None): Reports when unspecified behaviour is happening. For example order of evaluation of ‘#’ and ‘##’.
  • def debug(self, msg, theLoc=None): Reports a debug message.
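
As a sketch of the idea, here is a class that records events rather than acting on them. It is kept standalone so that it runs without CPIP installed; in real use it would sub-class PreprocessDiagnosticStd. Only the method names and signatures come from the list above, while the class name, the events attribute and the shape of theLoc are assumptions.

```python
class RecordingDiagnostic:
    """Stand-in that stores (kind, msg, theLoc) tuples for later inspection."""

    def __init__(self):
        self.events = []

    def _record(self, kind, msg, theLoc):
        self.events.append((kind, msg, theLoc))

    def undefined(self, msg, theLoc=None):
        self._record('undefined', msg, theLoc)

    def error(self, msg, theLoc=None):
        self._record('error', msg, theLoc)

    def warning(self, msg, theLoc=None):
        self._record('warning', msg, theLoc)


diag = RecordingDiagnostic()
# The location tuple is a made-up shape used only for this illustration.
diag.warning('macro redefined', theLoc=('main.cpp', 5, 1))
diag.error('unterminated #if')
for kind, msg, loc in diag.events:
    print(kind, msg, loc)
```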

There are a couple of implementations in the CppDiagnostic module that may be of interest.

Pragma

You can pass in a specialised handler for #pragma statements [default: None]. This shall sub-class cpip.core.PragmaHandler.PragmaHandlerABC and can implement:

  • The boolean attribute replaceTokens is to be implemented. If True then the tokens following the #pragma statement will be macro replaced by the PpLexer using the current macro environment before being passed to this pragma handler.
  • A method def pragma(self, theTokS): that takes a non-zero length list of PpTokens the last of which will be a newline token. Any token this method returns will be yielded as part of the Translation Unit (and thus subject to macro replacement for example).

Have a look at the core module cpip.core.PragmaHandler for some example implementations.

FileIncludeGraph Tutorial

The PpLexer module collects the file include graph. This tutorial shows you how to use it for your own ends.

Creating a FileIncludeGraph

A FileIncludeGraph object is one of the artifacts produced by a PpLexer [see the tutorial here: PpLexer Tutorial].

Once the PpLexer has processed the Translation Unit it has an attribute fileIncludeGraphRoot which is an instance of the class FileIncludeGraph.FileIncludeGraphRoot.

Here is the code to create a file include graph:

import sys
from cpip.core import PpLexer
from cpip.core import IncludeHandler

def main():
    print('Processing:', sys.argv[1])
    myH = IncludeHandler.CppIncludeStdOs(
        theUsrDirs=['../usr',],
        theSysDirs=['../sys',],
        )
    myLex = PpLexer.PpLexer(sys.argv[1], myH)
    tu = ''.join(tok.t for tok in myLex.ppTokens(minWs=True))
    print(repr(myLex.fileIncludeGraphRoot))

if __name__ == "__main__":
    main()

Invoking this code thus (in the manner of the PpLexer Tutorial):

$ python3 cpip_08.py ../src/main.cpp

Gives this output:

Processing: ../src/main.cpp
<cpip.core.FileIncludeGraph.FileIncludeGraphRoot object at 0x100753790>

FileIncludeGraph Structure

The structure is a tree with each node being an included file, the root being the Initial Translation Unit i.e. the file being pre-processed. Source code order is ‘left-to-right’ and depth is the degree of #include statements.

The class FileIncludeGraph.FileIncludeGraphRoot has a fairly rich interface, reference documentation for the module is here: FileIncludeGraph

A File Graph Visitor

The FileIncludeGraph.FileIncludeGraphRoot has a method def acceptVisitor(self, visitor): that accepts a visitor object (which can inherit from FigVisitorBase) for traversing the graph. This takes the visitor object and calls visitor.visitGraph(theFigNode, theDepth, theLine) on that object, where theDepth is the current depth in the graph as an integer and theLine is the line, which serves as a non-monotonic sibling node ordinal.

There are a number of visitor examples in the FileIncludeGraph test code. CPIPMain has a number of visitor implementations.

visitGraph(self, theFigNode, theDepth, theLine)

theFigNode is a cpip.core.FileIncludeGraph.FileIncludeGraph object. See FileIncludeGraph
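
The visitor pattern described here can be sketched on a toy tree. The Node class below is a hypothetical stand-in for a graph node, not the CPIP class, and starting the depth at zero is an assumption; only the visitGraph(theFigNode, theDepth, theLine) signature is taken from the text above.

```python
class Node:
    """Hypothetical stand-in for a node in the include graph."""

    def __init__(self, fileName, line=-1):
        self.fileName = fileName
        self.line = line
        self.children = []

    def acceptVisitor(self, visitor, depth=0):
        # Visit this node, then each child in source code order.
        visitor.visitGraph(self, depth, self.line)
        for child in self.children:
            child.acceptVisitor(visitor, depth + 1)


class CollectingVisitor:
    """Records (depth, line, file name) for every node visited."""

    def __init__(self):
        self.seen = []

    def visitGraph(self, theFigNode, theDepth, theLine):
        self.seen.append((theDepth, theLine, theFigNode.fileName))


root = Node('main.cpp')
user = Node('user.h', line=2)
user.children.append(Node('system.h', line=4))
root.children.append(user)

visitor = CollectingVisitor()
root.acceptVisitor(visitor)
for depth, line, name in visitor.seen:
    print('  ' * depth + '%s (line %d)' % (name, line))
```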

Example Visitor

Here we create a simple visitor, the class Visitor. After processing the Translation Unit in main() we create a visitor and traverse the include graph with acceptVisitor(). At each node in the graph the visitor's visitGraph() method merely prints out the file (node) name and the findLogic string, i.e. how this file was found for inclusion.

import sys
from cpip.core import PpLexer
from cpip.core import IncludeHandler
from cpip.core import FileIncludeGraph

class Visitor(FileIncludeGraph.FigVisitorBase):
    
    def visitGraph(self, theFigNode, theDepth, theLine):
        print(theFigNode.fileName, theFigNode.findLogic)

def main():
    print('Processing:', sys.argv[1])
    myH = IncludeHandler.CppIncludeStdOs(
        theUsrDirs=['../usr',],
        theSysDirs=['../sys',],
        )
    myLex = PpLexer.PpLexer(sys.argv[1], myH)
    tu = ''.join(tok.t for tok in myLex.ppTokens(minWs=True))
    myVis = Visitor()
    myLex.fileIncludeGraphRoot.acceptVisitor(myVis)

if __name__ == "__main__":
    main()

Invoking this code thus (in the manner of the PpLexer Tutorial):

$ python3 cpip_09.py ../src/main.cpp

Gives this output:

Processing: ../src/main.cpp
../src/main.cpp
../usr/user.h ['"user.h"', 'CP=None', 'usr=../usr']
../sys/system.h ['<system.h>', 'sys=../sys']

For example, line 3 of the output means that the file ../usr/user.h was included with a #include "user.h" statement. First the “Current Place” (CP) was searched (unsuccessfully, so the result is None), then the user include directories were searched and the file was found in the ../usr directory.
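
The search order that these findLogic strings suggest can be sketched as follows. This is a reconstruction from the output above, not CPIP's implementation: the function find_include is hypothetical, a set of paths stands in for the file system, and any fallback that CPIP may apply (for example searching system directories for a quoted include) is left out.

```python
import posixpath


def find_include(payload, current_place, usr_dirs, sys_dirs, existing):
    """Return (path, find_logic) for an #include payload.

    existing is a set of paths standing in for the file system.
    path is None when nothing was found.
    """
    find_logic = [payload]
    name = payload[1:-1]
    if payload.startswith('"'):
        # Quoted form: try the current place first, then user directories.
        if current_place is not None:
            candidate = posixpath.join(current_place, name)
            if candidate in existing:
                find_logic.append('CP=%s' % current_place)
                return candidate, find_logic
        find_logic.append('CP=None')
        search = [('usr', d) for d in usr_dirs]
    else:
        # Angle bracket form: system directories only.
        search = [('sys', d) for d in sys_dirs]
    for label, directory in search:
        candidate = posixpath.join(directory, name)
        if candidate in existing:
            find_logic.append('%s=%s' % (label, directory))
            return candidate, find_logic
    return None, find_logic


files = {'../usr/user.h', '../sys/system.h'}
print(find_include('"user.h"', '../src', ['../usr'], ['../sys'], files))
print(find_include('<system.h>', '../usr', ['../usr'], ['../sys'], files))
```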

Creating a Bespoke Tree From a FileIncludeGraph

The use case here is, given a FileIncludeGraph, can I simply create a tree of objects of my own definition from the graph? An example would be creating a structure that makes it easy to plot an SVG graph. The class should sub-class cpip.core.FileIncludeGraph.FigVisitorTreeNodeBase.

The solution is to create a cpip.core.FileIncludeGraph.FigVisitorTree object with a class definition for the node objects. This class definition must take in its constructor a file node (None for the root) and a line number.

Here is an example that is used to create a tree of file names and token counts. A class MyVisitorTreeNode is defined that on construction extracts file name and token count data from the file include graph node. The other requirement is to implement finalise(), called at the end of tree construction, which updates the token count with those of the node's children. Finally it supplies a string representation of itself.

The special code is on lines 40-43 where the FileIncludeGraph.FigVisitorTree visitor is created with a cls specification of MyVisitorTreeNode. The file include graph is then presented with the visitor (line 41). Finally a tree of MyVisitorTreeNode objects is retrieved with a call to tree().

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
import sys
from cpip.core import PpLexer
from cpip.core import IncludeHandler
from cpip.core import FileIncludeGraph

class MyVisitorTreeNode(FileIncludeGraph.FigVisitorTreeNodeBase):
    PAD = '  '
    def __init__(self, theFig, theLineNum):
        super(MyVisitorTreeNode, self).__init__(theLineNum)
        if theFig is None:
            self._name = None
            self._t = 0
        else:
            self._name = theFig.fileName
            self._t = theFig.numTokens
            
    def finalise(self):
        # Tot up tokens
        for aChild in self._children:
            aChild.finalise()
            self._t += aChild._t
            
    def __str__(self):
        return self.retStr(0)
        
    def retStr(self, d):
        r = '%s%04d %s %d\n' % (self.PAD*d, self._lineNum, self._name, self._t)
        for aC in self._children:
            r += aC.retStr(d+1)
        return r

def main():
    print('Processing:', sys.argv[1])
    myH = IncludeHandler.CppIncludeStdOs(
        theUsrDirs=['../usr',],
        theSysDirs=['../sys',],
        )
    myLex = PpLexer.PpLexer(sys.argv[1], myH)
    tu = ''.join(tok.t for tok in myLex.ppTokens(minWs=True))
    myVis = FileIncludeGraph.FigVisitorTree(MyVisitorTreeNode)
    myLex.fileIncludeGraphRoot.acceptVisitor(myVis)
    myTree = myVis.tree()
    print(myTree)

if __name__ == "__main__":
    main()

Invoking this so:

$ python3 cpip_10.py ../src/main.cpp

Gives this output:

Processing: ../src/main.cpp
-001 None 63
  -001 ../src/main.cpp 63
    0002 ../usr/user.h 20
      0004 ../sys/system.h 10

Further examples can be found in the code in IncGraphSVGBase.py and IncGraphXML.py.
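The roll-up that finalise() performs in the example above can be seen in isolation with a stand-in node class that needs no CPIP imports. This is only a sketch: CountNode is a hypothetical name, and the per-file counts (43, 10, 10) are inferred from the totals in the output above rather than measured.

```python
class CountNode(object):
    """Stand-in tree node: a name, its own token count and its children."""

    def __init__(self, name, tokens):
        self.name = name
        self.tokens = tokens
        self.children = []

    def finalise(self):
        # Depth first: finalise each child, then add its total to ours.
        for child in self.children:
            child.finalise()
            self.tokens += child.tokens


# Per-file counts inferred from the totals printed above (63, 20, 10).
system_h = CountNode('../sys/system.h', 10)
user_h = CountNode('../usr/user.h', 10)
main_cpp = CountNode('../src/main.cpp', 43)
user_h.children.append(system_h)
main_cpp.children.append(user_h)

main_cpp.finalise()
print(main_cpp.tokens, user_h.tokens, system_h.tokens)
```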

CPIP Reference

CPIPMain

CPIPMain.py – Preprocess the file or the files in a directory.

class cpip.CPIPMain.FigVisitorDot(lenPrefix=0)

Simple visitor that collects parent/child links for plotting the graph with dot.

visitGraph(theFigNode, theDepth, theLine)

.

class cpip.CPIPMain.FigVisitorLargestCommanPrefix

Simple visitor that walks the tree and finds the largest common file name prefix.

visitGraph(theFigNode, theDepth, theLine)

Capture the file name.

class cpip.CPIPMain.MainJobSpec(incHandler, preDefMacros, preIncFiles, diagnostic, pragmaHandler, keepGoing, conditionalLevel, dumpList, helpMap, includeDOT, cmdLine, gccExtensions)
cmdLine

Alias for field number 10

conditionalLevel

Alias for field number 6

diagnostic

Alias for field number 3

dumpList

Alias for field number 7

gccExtensions

Alias for field number 11

helpMap

Alias for field number 8

incHandler

Alias for field number 0

includeDOT

Alias for field number 9

keepGoing

Alias for field number 5

pragmaHandler

Alias for field number 4

preDefMacros

Alias for field number 1

preIncFiles

Alias for field number 2

class cpip.CPIPMain.PpProcessResult(ituPath, indexPath, tuIndexFileName, total_files, total_lines, total_bytes)
indexPath

Alias for field number 1

ituPath

Alias for field number 0

total_bytes

Alias for field number 5

total_files

Alias for field number 3

total_lines

Alias for field number 4

tuIndexFileName

Alias for field number 2
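The "Alias for field number N" entries above are the docstrings that collections.namedtuple generates for its fields, so each field can be read either by name or by position. The sketch below builds a tuple with the same field names; how CPIP itself declares the type is an assumption, and the values are invented for illustration.

```python
import collections

# Same field names as the PpProcessResult documented above.
PpProcessResult = collections.namedtuple(
    'PpProcessResult',
    'ituPath, indexPath, tuIndexFileName, total_files, total_lines, total_bytes',
)

# Hypothetical values, for illustration only.
result = PpProcessResult('src/main.cpp', 'out/index.html', 'main.cpp.html',
                         3, 120, 4096)

# Access by name and by field number is equivalent.
print(result.total_lines, result[4])
```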

cpip.CPIPMain.main()

Processes command line to preprocess a file or a directory.

cpip.CPIPMain.preProcessFilesMP(dIn, dOut, jobSpec, glob, recursive, jobs)

Multiprocessing code to preprocess directories. Returns a count of ITUs processed.

cpip.CPIPMain.preprocessDirToOutput(inDir, outDir, jobSpec, globMatch, recursive, numJobs)

Pre-process all the files in a directory. Returns a count of the TUs. This uses multiprocessing where possible. Any Exception (such as a KeyboardInterrupt) will terminate this function but write out an index of what has been achieved so far.

cpip.CPIPMain.preprocessFileToOutput(ituPath, outDir, jobSpec)

Preprocess a single file. May raise ExceptionCpip (or worse!). Returns a: PpProcessResult(ituPath, indexPath, tuIndexFileName(ituPath), total_files, total_lines, total_bytes)

cpip.CPIPMain.preprocessFileToOutputNoExcept(ituPath, *args, **kwargs)

Preprocess a single file and catch all ExceptionCpip exceptions and log them.

cpip.CPIPMain.retFileCountMap(theLexer)

Visits the Lexer's file include graph and returns a dict of: {file_name : (inclusion_count, line_count, bytes_count), ...}.

The line_count, bytes_count are obtained by reading the file.

cpip.CPIPMain.retOptionMap(theOptParser, theOpts)

Returns map of {opt_name : (value, help), ...} from the current options.

cpip.CPIPMain.writeIndexHtml(theItuS, theOutDir, theJobSpec, time_start, total_files, total_lines, total_bytes)

Writes the top level index.html page for a pre-processed file.

theOutDir - The output directory.

theTuS - The list of translation units processed.

theCmdLine - The command line as a string.

theOptMap is a map of {opt_name : (value, help), ...} from the command line options. TODO: This is fine but has too many levels of indent.

cpip.CPIPMain.writeTuIndexHtml(theOutDir, theTuPath, theLexer, theFileCountMap, theTokenCntr, hasIncDot, macroHistoryIndexName)

Write the index.html for a single TU.

theOutDir
The output directory to write to.
theTuPath
The path to the original ITU.
theLexer
The pre-processing Lexer that has pre-processed the ITU/TU.
theFileCountMap
dict of {file_path : data, ...} where data is things like inclusion count, lines, bytes and so on.
theTokenCntr
cpip.core.PpTokenCount.PpTokenCount containing the token counts.
hasIncDot
bool to emit graphviz .dot files.
macroHistoryIndexName
String of the filename of the macro history.

Returns: (total_files, total_lines, total_bytes) as integers.

CppCondGraphToHtml

Writes out the Cpp Conditional processing graph as HTML.

class cpip.CppCondGraphToHtml.CcgVisitorToHtml(theHtmlStream)

Writing CppCondGraph visitor object.

visitPost(theCcgNode, theDepth)

Post-traversal call with a CppCondGraphNode and the integer depth in the tree.

visitPre(theCcgNode, theDepth)

Pre-traversal call with a CppCondGraphNode and the integer depth in the tree.

cpip.CppCondGraphToHtml.processCppCondGrphToHtml(theLex, theHtmlPath, theTitle, theIdxPath)

Given the PpLexer write out the Cpp Cond Graph to the HTML file. theLex is a PpLexer. theHtmlPath is the file path of the output. theTitle is the page title. theIdxPath is the file name of the index page. theTuIndexer is a TuIndexer.TuIndexer object.

FileStatus

Provides a command line tool for finding out information on files:

$ python3 src/cpip/FileStatus.py -r src/cpip/
Cmd: src/cpip/FileStatus.py -r src/cpip/
File                                     SLOC      Size                               MD5  Last modified
src/cpip/CPIPMain.py                     1072     44829  4dee8712b7d51f978689ef257cf1fd34  Wed Sep 27 08:57:00 2017
src/cpip/CppCondGraphToHtml.py            124      4862  4f0d5731ef6f3d47ec638f00e7646a9f  Fri Sep  8 15:30:41 2017
src/cpip/DupeRelink.py                    269     11795  914ed2149dce6584e6f3f55ec0e2b923  Wed Sep 27 11:35:32 2017
src/cpip/FileStatus.py                    218      8015  6db0658622e82d32a9a9b4c8eb9e82e5  Thu Sep 28 11:13:40 2017
src/cpip/IncGraphSVG.py                  1026     45049  7b82651dadd44eb4ed65d390f6c052df  Fri Sep  8 15:30:41 2017
...
src/cpip/util/Tree.py                     166      5719  cdb81d1eaaf6a1743e5182355f2e75bb  Fri Sep  8 15:30:41 2017
src/cpip/util/XmlWrite.py                 425     15114  48563685ace3ec0f6d734695cac17ede  Tue Sep 12 15:38:55 2017
src/cpip/util/__init__.py                  31      1161  208abac9edd9682f438945906a451473  Fri Sep  8 15:30:41 2017
Total [54]                              19475    789349
CPU time =    0.041 (S)
Bye, bye!

class cpip.FileStatus.FileInfo(thePath)

Holds information on a text file.

count

Files processed.

size

Size in bytes.

sloc

Lines in file.

write(theS=sys.stdout, incHash=True)

Writes the number of lines and bytes (optionally MD5) to stream.

writeHeader(theS=sys.stdout)

Writes header to stream.

class cpip.FileStatus.FileInfoSet(thePath, glob=None, isRecursive=False)

Contains information on a set of files.

processDir(theDir, glob, isRecursive)

Read a directory and return a map of {path : class FileInfo, ...}

processPath(theP, glob=None, isRecursive=False)

Process a file or directory.

write(theS=sys.stdout)

Write summary to stream.

cpip.FileStatus.main()

Prints out the status of files in a directory:

$ python ../src/cpip/FileStatus.py --help
Cmd: ../src/cpip/FileStatus.py --help
Usage: FileStatus.py [options] dir
Counts files and sizes.

Options:
  --version             show program's version number and exit
  -h, --help            show this help message and exit
  -g GLOB, --glob=GLOB  Space separated list of file match patterns. [default:
                        *.py]
  -l LOGLEVEL, --loglevel=LOGLEVEL
                        Log Level (debug=10, info=20, warning=30, error=40,
                        critical=50) [default: 30]
  -r                    Recursive. [default: False]
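The SLOC, Size and MD5 columns in the FileStatus output above can be approximated with the standard library alone. The following is a rough sketch under the assumption that SLOC is simply the number of lines in the file; file_info is a hypothetical helper, not the FileInfo class itself.

```python
import hashlib
import os
import tempfile


def file_info(path):
    """Return line count, size in bytes and MD5 hex digest for one file."""
    with open(path, 'rb') as f:
        data = f.read()
    return {
        'sloc': len(data.splitlines()),
        'size': len(data),
        'md5': hashlib.md5(data).hexdigest(),
    }


# Demonstrate on a small temporary file.
with tempfile.NamedTemporaryFile(suffix='.py', delete=False) as tmp:
    tmp.write(b'import sys\nprint(sys.argv)\n')
info = file_info(tmp.name)
os.remove(tmp.name)
print('%-20s %6d %9d  %s' % ('example.py', info['sloc'], info['size'], info['md5']))
```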

IncGraphSVG

class cpip.IncGraphSVG.SVGTreeNodeMain(theFig, theLineNum)

This does most of the heavy lifting of plotting the include graph SVG. The challenges are plotting things in the ‘right’ order and with the ‘right’ JavaScript so that the DHTML does not look too hideous.

Basic principle here is that plotInitialise() writes static data. In our case just the pretty histogram pop-up (Ed. is this right???).

Then SVGTreeNodeBase.plotToSVGStream() is called - this is implemented in the base class.

Finally plotFinalise() is called - this overlays the DHTML text. This is a little tricky as our way of DHTML is to switch opacity on underlying objects the switching boundary being the overlying object (e.g. ‘?’). So _all_ the underlying objects need to be written first so that the overlying objects are always ‘visible’ to trigger onmouseover/onmouseout on the underlying object.

condComp

A string of conditional tests.

finalise()

Finalisation: this sets up all the bounding boxes of me and my children.

findLogic

The find logic as a string.

plotFinalise(theSvg, theDatumL, theTpt)

Finish the plot. In this case we write the text overlays.

plotInitialise(theSvg, theDatumL, theTpt)

Plot the histogram legend once only.

tokenCounter

This is the PpTokenCount.PpTokenCount() for me only.

tokenCounterChildren

This is the computed PpTokenCount.PpTokenCount() for all my descendents.

tokenCounterTotal

This is the computed PpTokenCount.PpTokenCount() me plus my descendents.

writeAltTextAndMouseOverRect(theSvg, theId, theAltPt, theAltS, theTrigPt, theTrigRect)

Composes and writes the (pop-up) alternate text. Also writes a trigger rectangle.

writePreamble(theS)

Write any preamble such as CSS or JavaScript. To be implemented by child classes.

IncGraphSVGBase

Provides basic functionality to take the #include graph of a preprocessed file and plots it as a diagram in SVG.

Event handlers for onmouseover/onmouseout

We would like to have more detailed information available to the user when they mouseover an object on the SVG image. After a lot of experimentation, the most cross-browser way of doing this is to provide an event handler that switches the opacity of an element between 0 and 1. See IncGraphSVG.writeAltTextAndMouseOverRect().

cpip.IncGraphSVGBase.processIncGraphToSvg(theLex, theFilePath, theClass, tptPos, tptSweep)

Convert an include graph from a PpLexer to SVG in theFilePath.

IncGraphXML

Generates an XML file from an include graph.

This is implemented as a hierarchical visitor pattern. This could have been implemented as a non-hierarchical visitor pattern using less memory at the expense of more code.

cpip.IncGraphXML.processIncGraphToXml(theLex, theFilePath)

Convert an include graph from a PpLexer to XML in theFilePath.

IncList

ItuToHtml

Converts an ITU to HTML.

class cpip.ItuToHtml.ItuToHtml(theItu, theHtmlDir, keepGoing=False, macroRefMap=None, cppCondMap=None, ituToTuLineSet=None)

Converts an ITU to HTML and write it to the output directory.

MacroHistoryHtml

Writes out a macro history in HTML.

Macros can be:

  • Active - In scope at the end of processing a translation unit (one per identifier).
  • Inactive - Not in scope at the end of processing a translation unit (>=0 per identifier).

And:

  • Referenced - Have had some influence over the processing of the translation unit.
  • Not Referenced - No influence over the processing of the translation unit.

Example test:

Macros with reference counts of zero are not that interesting so they are relegated to a page (<file>_macros_noref.html) that just describes their definition and where they were defined.

Macros _with_ reference counts are presented on a page (<file>_macros_ref.html) with one section per macro. The section has: definition, where defined, [This macro depends on the following macros:], [Macros that depend on this macro:].

These two HTML pages are joined by a <file>_macros.html page that lists (and links to) the identifiers in this order:

  • Active, ref count >0
  • Inactive, ref count >0
  • Active, ref count =0
  • Inactive, ref count =0

Macro HTML IDs

This is identifier + ‘_’ + n. For any active macro the value of n is the number of previously defined macros. Current code is like this:

myUndefIdxS, isDefined = myMacroMap[aMacroName]
# Write the undefined ones
for anIndex in myUndefIdxS:
    myMacro = theEnv.getUndefMacro(anIndex)
    startLetter = _writeTrMacro(theS, theHtmlPath, myMacro,
                               anIndex, startLetter, retVal) 
# Now the defined one
if isDefined:
    myMacro = theEnv.macro(aMacroName)
    startLetter = _writeTrMacro(theS, theHtmlPath, myMacro,
                               len(myUndefIdxS), startLetter, retVal)

cpip.MacroHistoryHtml.processMacroHistoryToHtml(theLex, theHtmlPath, theItu, theIndexPath)

Write out the macro history from the PpLexer as HTML. Returns a map of: {identifier : [(fileId, lineNum, href_name), ...], ...} which can be used by src->html generator for providing links to macro pages.

cpip.MacroHistoryHtml.splitLine(theStr, splitLen=60, splitLenHard=80)

Splits a long string into a string that is a set of lines with continuation characters.

cpip.MacroHistoryHtml.splitLineToList(sIn, splitLen=60, splitLenHard=80)

Splits a long string into a list of lines. This tries to do it nicely at whitespaces but will force a split if necessary.
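One plausible reading of "nicely at whitespaces but will force a split if necessary" is sketched below: break at the first whitespace between the soft and hard limits, otherwise cut at the hard limit. This is an interpretation for illustration, not CPIP's implementation; split_line_to_list is a hypothetical name and the real function may choose its split points differently.

```python
def split_line_to_list(s, split_len=60, split_len_hard=80):
    """Split s into lines no longer than split_len_hard characters."""
    lines = []
    while len(s) > split_len_hard:
        # Prefer the first whitespace at or after the soft limit.
        cut = -1
        for i in range(split_len, split_len_hard):
            if s[i].isspace():
                cut = i
                break
        if cut == -1:
            # No whitespace available: force a split at the hard limit.
            lines.append(s[:split_len_hard])
            s = s[split_len_hard:]
        else:
            # Split at the whitespace and drop that one character.
            lines.append(s[:cut])
            s = s[cut + 1:]
    lines.append(s)
    return lines


print(split_line_to_list('aaaa bbbb cccc', split_len=3, split_len_hard=6))
# ['aaaa', 'bbbb', 'cccc']
```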

TokenCss

CSS Support for ITU+TU files in HTML.

cpip.TokenCss.writeCssForFile(theFile)

Writes the CSS file into the directory that the file is in.

cpip.TokenCss.writeCssToDir(theDir)

Writes the CSS file into the directory.

Tu2Html

Converts an initial translation unit to HTML.

TODO: For making anchors in the TU HTML that the conditional include graph can link to. If we put an <a name="..."> on every line most browsers can not handle that many. What we could do here is to keep a copy of the conditional include stack and for each token see if it has changed (like the file stack). If so then write a marker that the conditional graph can later link to.

cpip.Tu2Html.processTuToHtml(theLex, theHtmlPath, theTitle, theCondLevel, theIdxPath, incItuAnchors=True)

Processes the PpLexer and writes the tokens to the HTML file.

theHtmlPath
The path to the HTML file to write.
theTitle
A string to go into the <title> element.
theCondLevel
The Conditional level to pass to theLex.ppTokens()
theIdxPath
Path to link back to the index page.
incItuAnchors
boolean, if True will write anchors for lines in the ITU that are in this TU. If True then the returned set of ITU line numbers is likely to be non-empty.

Returns a pair of (PpTokenCount.PpTokenCount(), set(int)). The latter is a set of integer line numbers in the ITU that are in the TU; these line numbers will have anchors in this HTML file of the form: <a name="%d" />.

TuIndexer

Provides a means of linking into a translation unit rendered as HTML.

exception cpip.TuIndexer.ExceptionTuIndexer

Exception when handling PpLexer object.

class cpip.TuIndexer.TuIndexer(tuFileName)

Provides a means of indexing into a TU html file.

add(theTuIndex)

Adds an integer index to the list of markers, returns the href name.

href(theTuIndex, isLB)

Returns an href string for the TuIndex. If isLB is true returns the nearest lower bound, otherwise the nearest upper bound.
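The lower bound and upper bound lookup described for href() can be sketched with the standard bisect module. The class below is an illustration only: MarkerIndex and its methods are hypothetical names, and it returns the chosen marker integer rather than an href string because the href format is not given here.

```python
import bisect


class MarkerIndex(object):
    """Keeps a sorted list of integer markers and finds the nearest one."""

    def __init__(self):
        self._markers = []

    def add(self, tu_index):
        # Insert while keeping the list sorted.
        bisect.insort(self._markers, tu_index)

    def nearest(self, tu_index, is_lower_bound):
        if is_lower_bound:
            # Largest marker <= tu_index.
            pos = bisect.bisect_right(self._markers, tu_index) - 1
        else:
            # Smallest marker >= tu_index.
            pos = bisect.bisect_left(self._markers, tu_index)
        if pos < 0 or pos >= len(self._markers):
            raise ValueError('No marker on that side of %d' % tu_index)
        return self._markers[pos]


index = MarkerIndex()
for marker in (30, 10, 20):
    index.add(marker)
print(index.nearest(25, True), index.nearest(25, False))
```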

cpip.core

CPIP Core contains the core code for pre-processing. The architecture is described here: CPIP Core Architecture.

ConstantExpression

Handles the Python interpretation of a constant-expression. See ISO/IEC 14882:1998(E)

class cpip.core.ConstantExpression.ConstantExpression(theTokTypeS)

Class that interprets a stream of pre-processing tokens (cpip.core.PpToken.PpToken objects) and evaluates it as a constant expression.

evaluate()

Evaluates the constant expression and returns 0 or 1.

translateTokensToString()

Returns a string to be evaluated as a constant-expression.

ISO/IEC 14882:1998(E) 16.1 Conditional inclusion, sub-section 4 (i.e. 16.1-4):

All remaining identifiers and keywords, except for true and false, are replaced with the pp-number 0.
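The replacement rule quoted above can be illustrated on a plain string with a regular expression. This is a simplified sketch: it works on text rather than on PpToken objects, it assumes any defined operators have already been evaluated, and replace_identifiers is a hypothetical name.

```python
import re

# A C identifier that starts at a word boundary, so suffixes such as
# the 'L' in '1L' are not mistaken for identifiers.
IDENTIFIER = re.compile(r'\b[A-Za-z_][A-Za-z0-9_]*\b')


def replace_identifiers(expr):
    """Replace every identifier except true and false with 0."""
    def repl(match):
        word = match.group(0)
        return word if word in ('true', 'false') else '0'
    return IDENTIFIER.sub(repl, expr)


print(replace_identifiers('FOO && (BAR > 2 || true)'))
# 0 && (0 > 2 || true)
```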

exception cpip.core.ConstantExpression.ExceptionConditionalExpression

Exception when conditional expression e.g. ... ? ... : ... fails to evaluate.

exception cpip.core.ConstantExpression.ExceptionConditionalExpressionInit

Exception when initialising a ConstantExpression class.

exception cpip.core.ConstantExpression.ExceptionConstantExpression

Simple specialisation of an exception class for the ConstantExpression classes.

exception cpip.core.ConstantExpression.ExceptionEvaluateExpression

Exception when conditional expression e.g. 1 < 2 fails to evaluate.

CppCond

Provides a state stack of booleans to facilitate conditional compilation as: ISO/IEC 9899:1999(E) section 6.10.1 (‘C’) and ISO/IEC 14882:1998(E) section 16.1 (‘C++’) [cpp.cond]

This does not interpret any semantics of either standard but instead provides a state class that callers that do interpret the language semantics can use.

In particular this provides state change operations that might be triggered by the following six pre-processing directives:

#if constant-expression new-line group opt
#ifdef identifier new-line group opt
#ifndef identifier new-line group opt
#elif constant-expression new-line group opt
#else new-line group opt
#endif new-line

In this module a single CppCond object has a stack of ConditionalState objects. The latter has both a boolean state and an ‘explanation’ of that state at any point in the translation. The latter is represented by a list of string representations of either constant-expression or identifier tokens.

The stack i.e. CppCond can also be queried for its net boolean state and its net ‘explanation’.

Basic boolean stack operations:

Directive   Argument                Stack, s, boolean operation
---------   --------                -----------------------
#if         constant-expression     s.push(bool)
#ifdef      identifier              s.push(bool)
#ifndef     identifier              s.push(!bool)
#elif       constant-expression     s.pop(), s.push(bool)
#else       N/A                     Either s.push(!s.pop()) or s.flip()
#endif      N/A                     s.pop()
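A literal transcription of the table above into Python might look like the sketch below. BoolStack and its method names are hypothetical, and the sketch deliberately omits the hasBeenTrue bookkeeping that ConditionalState keeps, so it mirrors only the basic operations and does no error checking.

```python
class BoolStack(object):
    """Minimal boolean stack mirroring the directive table above."""

    def __init__(self):
        self._stack = []

    def o_if(self, value):
        self._stack.append(bool(value))

    def o_ifdef(self, is_defined):
        self._stack.append(bool(is_defined))

    def o_ifndef(self, is_defined):
        self._stack.append(not is_defined)

    def o_elif(self, value):
        self._stack.pop()
        self._stack.append(bool(value))

    def o_else(self):
        self._stack.append(not self._stack.pop())

    def o_endif(self):
        self._stack.pop()

    def is_true(self):
        # Net state: every level must be True (an empty stack is True).
        return all(self._stack)


s = BoolStack()
s.o_if(True)        # #if 1
s.o_ifndef(True)    # #ifndef X, where X is defined
print(s.is_true())  # False
s.o_else()          # #else
print(s.is_true())  # True
```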

Basic boolean ‘explanation’ string operations:

The '!' prefix is parameterised as TOKEN_NEGATION so that any subsequent processing can recognise '!!' as '' and '!!!' as '!':

Directive   Argument                Matrix, m, strings
---------   --------                ------------------
#if         constant-expression     m.push(['%s' % tokens,])
#ifdef      identifier              m.push(['(defined %s)' % identifier,])
#ifndef     identifier              m.push(['!(defined %s)' % identifier,])
#elif       constant-expression     m[-1].push('!%s' % m[-1].pop()),
                                    m[-1].push(['%s' % tokens,])
                                    Note: Here we flip the existing state via
                                    a push(!pop()) then push the additional
                                    condition so that we have multiple
                                    conditions that are and'd together.
#else       N/A                     m[-1].push('!%s' % m[-1].pop())
                                    Note: This is the negation of the sum of
                                    the previous #if, #elif statements.
#endif      N/A                     m.pop()

Note

The above does not include error checking such as pop() from an empty stack.

Stringifying the matrix m:

flatList = []
for aList in m:
    assert(len(aList) > 0)
    if len(aList) > 1:
        # Add parenthesis so that when flatList is flattened then booleans are
        # correctly protected.
        flatList.append('(%s)' % ' && '.join(aList))
    else:
        flatList.append(aList[0])
return ' && '.join(flatList)

For example, if m is: [['a < 0',], ['!b', 'c > 45'], ['d < 27',],]

Then this gives: "a < 0 && (!b && c > 45) && d < 27"
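Wrapped in a function, the stringifying snippet above can be run directly against that example. The function name stringify is only a label chosen for this sketch.

```python
def stringify(m):
    """Flatten a list of lists of condition strings into one expression."""
    flat_list = []
    for a_list in m:
        assert len(a_list) > 0
        if len(a_list) > 1:
            # Parenthesise so the grouped conditions stay together
            # when everything is joined with ' && '.
            flat_list.append('(%s)' % ' && '.join(a_list))
        else:
            flat_list.append(a_list[0])
    return ' && '.join(flat_list)


print(stringify([['a < 0'], ['!b', 'c > 45'], ['d < 27']]))
# a < 0 && (!b && c > 45) && d < 27
```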

class cpip.core.CppCond.ConditionalState(theState, theIdOrCondExpr)

Holds a single conditional state.

constExprStr(invert=False)

Returns self as a string which is the concatenation of constant-expressions.

flip()

Inverts the boolean such as for #else directive.

flipAndAdd(theBool, theConstExpr)

This handles an #elif command on this item in the stack. This flips the state (if theBool is True) and negates the last expression on the condition list then appends theConstExpr onto the condition list.

hasBeenTrue

Return True if the state has been True at any time in the lifetime of this object.

negateLastState()

Inverts the state of the last item on the stack.

state

Returns boolean state of self.

class cpip.core.CppCond.CppCond

Provides a state stack to handle conditional compilation. This could be used by an implementation of conditional inclusion e.g. ISO/IEC 14882:1998(E) section 16.1 Conditional inclusion [cpp.cond]

Essentially this class provides a state machine that can be created, altered and queried. The APIs available to the caller correspond to the if-section part of the applicable standard (i.e. #if, #elif etc.). Most APIs take two arguments:

theBool
Is a boolean that is the result of the caller's evaluation of a constant-expression.
theIce
A string that represents the identifier or constant-expression in a way that the caller sees fit (i.e. this is not evaluated locally in any way). Combinations of such strings _are_ merged by use of boolean logic ('!') and LPAREN and RPAREN.
close()

Finalisation, may raise ExceptionCppCond if the stack is non-empty.

hasBeenTrueAtCurrentDepth()

Return True if the ConditionalState at the current depth has ever been True. This is used to decide whether to evaluate #elif expressions. They don’t need to be if the ConditionalState has already been True, and in fact, the C Rationale (6.10) says that bogus #elif expressions should not be evaluated in this case - i.e. ignore syntax errors.

isTrue()

Returns True if all of the states in the stack are True, False otherwise.

oElif(theBool, theConstExpr)

Deal with the result of a #elif.

theBool
Is a boolean that is the result of the caller's evaluation of a constant-expression.
theConstExpr
A string that represents the identifier or constant-expression in a way that the caller sees fit (i.e. this is not evaluated locally in any way). Combinations of such strings _are_ merged by use of boolean logic (‘!’) and LPAREN and RPAREN.
oElse()

Deal with the result of a #else.

oEndif()

Deal with the result of a #endif.

oIf(theBool, theConstExpr)

Deal with the result of a #if.

theBool
Is a boolean that is the result of the caller's evaluation of a constant-expression.
theConstExpr
A string that represents the identifier or constant-expression in a way that the caller sees fit (i.e. this is not evaluated locally in any way). Combinations of such strings _are_ merged by use of boolean logic (‘!’) and LPAREN and RPAREN.
oIfdef(theBool, theConstExpr)

Deal with the result of a #ifdef.

theBool
Is a boolean that is the result of the caller's evaluation of a constant-expression.
theConstExpr
A string that represents the identifier or constant-expression in a way that the caller sees fit (i.e. this is not evaluated locally in any way). Combinations of such strings _are_ merged by use of boolean logic (‘!’) and LPAREN and RPAREN.
oIfndef(theBool, theConstExpr)

Deal with the result of a #ifndef.

theBool
Is a boolean that is the result of the caller's evaluation of a constant-expression.
theConstExpr
A string that represents the identifier or constant-expression in a way that the caller sees fit (i.e. this is not evaluated locally in any way). Combinations of such strings _are_ merged by use of boolean logic (‘!’) and LPAREN and RPAREN.
stackDepth

Returns the depth of the conditional stack as an integer.

class cpip.core.CppCond.CppCondGraph

Represents a graph of conditional preprocessing directives.

isComplete

True if the last if-section, if present, is completed with an #endif.

oElif(theFlc, theTuIdx, theBool, theCe)

Deal with the result of a #elif.

theFlc
A cpip.core.FileLocation.FileLineColumn object that identifies the position in the file.
theTuIdx
An integer that represents the position in the translation unit.
theBool
The current state of the conditional stack.
theCe
The constant expression as a string (not evaluated).
oElse(theFlc, theTuIdx, theBool)

Deal with the result of a #else.

theFlc
A cpip.core.FileLocation.FileLineColumn object that identifies the position in the file.
theTuIdx
An integer that represents the position in the translation unit.
theBool
The current state of the conditional stack.
oEndif(theFlc, theTuIdx, theBool)

Deal with the result of a #endif.

theFlc
A cpip.core.FileLocation.FileLineColumn object that identifies the position in the file.
theTuIdx
An integer that represents the position in the translation unit.
theBool
The current state of the conditional stack.
oIf(theFlc, theTuIdx, theBool, theCe)

Deal with the result of a #if.

theFlc
A cpip.core.FileLocation.FileLineColumn object that identifies the position in the file.
theTuIdx
An integer that represents the position in the translation unit.
theBool
The current state of the conditional stack.
theCe
The constant expression as a string (not evaluated).
oIfdef(theFlc, theTuIdx, theBool, theCe)

Deal with the result of a #ifdef.

theFlc
A cpip.core.FileLocation.FileLineColumn object that identifies the position in the file.
theTuIdx
An integer that represents the position in the translation unit.
theBool
The current state of the conditional stack.
theCe
The constant expression as a string (not evaluated).
oIfndef(theFlc, theTuIdx, theBool, theCe)

Deal with the result of a #ifndef.

theFlc
A cpip.core.FileLocation.FileLineColumn object that identifies the position in the file.
theTuIdx
An integer that represents the position in the translation unit.
theBool
The current state of the conditional stack.
theCe
The constant expression as a string (not evaluated).
visit(theVisitor)

Take a visitor object and pass it around giving it each CppCondGraphNode object.

class cpip.core.CppCond.CppCondGraphIfSection(theIfCppD, theFlc, theTuIdx, theBool, theCe)

Class that represents a conditionally compiled section starting with #if... and ending with #endif.

theIfCppD
A string, one of ‘#if’, ‘#ifdef’, ‘#ifndef’.
theFlc
A cpip.core.FileLocation.FileLineColumn object that identifies the position in the file.
theTuIdx
An integer that represents the position in the translation unit.
theBool
The current state of the conditional stack.
theCe
The constant expression as a string (not evaluated).
oElif(theFlc, theTuIdx, theBool, theCe)

Deal with the result of a #elif.

oElse(theFlc, theTuIdx, theBool)

Deal with the result of a #else.

oEndif(theFlc, theTuIdx, theBool)

Deal with the result of a #endif.

oIf(theFlc, theTuIdx, theBool, theCe)

Deal with the result of a #if.

oIfdef(theFlc, theTuIdx, theBool, theCe)

Deal with the result of a #ifdef.

oIfndef(theFlc, theTuIdx, theBool, theCe)

Deal with the result of a #ifndef.

visit(theVisitor, theDepth)

Take a visitor object and make the pre/post calls.

class cpip.core.CppCond.CppCondGraphNode(theCppDirective, theFileLineCol, theTuIdx, theBool, theConstExpr=None)

Base class for all nodes in the CppCondGraph.

canAccept(theCppD)

True if I can accept a Preprocessing Directive; theCppD.

oElif(theFlc, theTuIdx, theBool, theCe)

Deal with the result of a #elif.

oElse(theFlc, theTuIdx, theBool)

Deal with the result of a #else.

oEndif(theFlc, theTuIdx, theBool)

Deal with the result of a #endif.

oIf(theFlc, theTuIdx, theBool, theCe)

Deal with the result of a #if.

oIfdef(theFlc, theTuIdx, theBool, theCe)

Deal with the result of a #ifdef.

oIfndef(theFlc, theTuIdx, theBool, theCe)

Deal with the result of a #ifndef.

retStrList(theDepth)

Returns a list of string representations.

visit(theVisitor, theDepth)

Take a visitor object and make the pre/post calls.

class cpip.core.CppCond.CppCondGraphVisitorBase

Base class for a CppCondGraph visitor object.

visitPost(theCcgNode, theDepth)

Post-traversal call with a CppCondGraphNode and the integer depth in the tree.

visitPre(theCcgNode, theDepth)

Pre-traversal call with a CppCondGraphNode and the integer depth in the tree.

class cpip.core.CppCond.CppCondGraphVisitorConditionalLines

Allows you to find out if any particular line in a file is compiled or not. This is useful to be handed to the ITU to HTML generator that can colourize the HTML depending on whether any line is compiled or not.

This is a visitor class that walks the graph creating a dict of: {file_id : [(line_num, boolean), ...], ...} It then decomposes those into a map of {file_id : LineConditionalInterpretation(), ...} which can perform the actual conditional state determination.

API is really isCompiled() and this returns -1 or 0 or 1. 0 means NO. 1 means YES and -1 means sometimes - for re-included files in a different macro environment perhaps.

fileIdS

An unordered list of file IDs.

isCompiled(fileId, lineNum)

Returns 1 if this line is compiled, 0 if not or -1 if it is ambiguous i.e. sometimes it is and sometimes not when the file is included multiple times.

visitPre(theCcgNode, theDepth)

Capture the fileID, line number and state.

exception cpip.core.CppCond.ExceptionCppCond

Simple specialisation of an exception class for the CppCond.

exception cpip.core.CppCond.ExceptionCppCondGraph

Simple specialisation of an exception class for the CppCondGraph.

exception cpip.core.CppCond.ExceptionCppCondGraphElif

When the CppCondGraph sees an #elif preprocessing directive in the wrong sequence.

exception cpip.core.CppCond.ExceptionCppCondGraphElse

When the CppCondGraph sees an #else preprocessing directive in the wrong sequence.

exception cpip.core.CppCond.ExceptionCppCondGraphIfSection

Exception for a CppCondGraphIfSection.

exception cpip.core.CppCond.ExceptionCppCondGraphNode

When the CppCondGraphNode sees a preprocessing directive in the wrong sequence.

class cpip.core.CppCond.LineConditionalInterpretation(theList)

Class that represents the conditional compilation state of every line in a file. This takes a list of [(line_num, boolean), ...] and interprets individual line numbers as to whether they are compiled or not.

If the same file is included twice with a different macro environment then it is entirely possible that line_num is not monotonic. In any case not every line number is present, the state of any unmentioned line is the state of the last mentioned line. Thus a simple dict is not useful.

We have to sort theList by line_num and if there are duplicate line_num with different boolean values then the conditional compilation state at that point is ambiguous.

isCompiled(lineNum)

Returns 1 if this line is compiled, 0 if not or -1 if it is ambiguous i.e. sometimes it is and sometimes not when multiply included.

This requires a search for the previously mentioned line state.

Will raise a ValueError if no prior state can be found, for example if there are no conditional compilation directives in the file. In this case it is up to the caller to handle this. CppCondGraphVisitorConditionalLines does this during visitPre() by artificially inserting line 1. See CppCondGraphVisitorConditionalLines.isCompiled()
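The lookup described above can be sketched as follows. This is a minimal illustration, not CPIP's implementation: the class name LineStateSketch and its internals are hypothetical, and only the 1/0/-1 return convention and the ValueError come from the description.

```python
import bisect


class LineStateSketch:
    """Hypothetical stand-in for the behaviour described above."""

    def __init__(self, pairs):
        # Group the boolean states seen at each mentioned line number.
        states = {}
        for line_num, is_compiled in pairs:
            states.setdefault(line_num, set()).add(bool(is_compiled))
        self._lines = sorted(states)
        self._states = [states[n] for n in self._lines]

    def is_compiled(self, line_num):
        # Find the last mentioned line at or before line_num.
        idx = bisect.bisect_right(self._lines, line_num) - 1
        if idx < 0:
            raise ValueError('No prior state for line %d' % line_num)
        seen = self._states[idx]
        if len(seen) > 1:
            return -1  # Conflicting states recorded: ambiguous.
        return 1 if True in seen else 0


# Line 10 is recorded twice with different states, as could happen when a
# file is included twice in different macro environments.
lci = LineStateSketch([(1, True), (10, False), (20, True), (10, True)])
```

Unmentioned lines inherit the state of the last mentioned line, so in this sketch lines 10 to 19 are all ambiguous.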

cpip.core.CppCond.StateConstExprFileLine

alias of StateConstExprLoc

CppDiagnostic

Describes how a preprocessor class behaves under abnormal conditions.

exception cpip.core.CppDiagnostic.ExceptionCppDiagnostic

Exception class for representing CppDiagnostic.

exception cpip.core.CppDiagnostic.ExceptionCppDiagnosticPartialTokenStream

Exception class for representing partial remaining tokens.

exception cpip.core.CppDiagnostic.ExceptionCppDiagnosticUndefined

Exception class for representing undefined behaviour.

class cpip.core.CppDiagnostic.PreprocessDiagnosticKeepGoing

Sub-class that does not raise exceptions.

partialTokenStream(msg, theLoc=None)

Reports when a partial token stream exists (e.g. an unclosed comment).

msg
The main message, a string.
theLoc
The file locator e.g. FileLocation.FileLineCol. If present it must have: (fileId, lineNum, colNum) attributes.
undefined(msg, theLoc=None)

Reports when an undefined event happens.

msg
The main message, a string.
theLoc
The file locator e.g. FileLocation.FileLineCol. If present it must have: (fileId, lineNum, colNum) attributes.
class cpip.core.CppDiagnostic.PreprocessDiagnosticRaiseOnError

Sub-class that raises an exception on a #error directive.

error(msg, theLoc=None)

Reports when an error event happens.

msg
The main message, a string.
theLoc
The file locator e.g. FileLocation.FileLineCol. If present it must have: (fileId, lineNum, colNum) attributes.
class cpip.core.CppDiagnostic.PreprocessDiagnosticStd

Describes how a preprocessor class behaves under abnormal conditions.

debug(msg, theLoc=None)

Reports a debug message.

msg
The main message, a string.
theLoc
The file locator e.g. FileLocation.FileLineCol. If present it must have: (fileId, lineNum, colNum) attributes.
error(msg, theLoc=None)

Reports when an error event happens.

msg
The main message, a string.
theLoc
The file locator e.g. FileLocation.FileLineCol. If present it must have: (fileId, lineNum, colNum) attributes.
eventList

A list of events in the order that they appear. An event is a pair of strings: (type, message)

handleUnclosedComment(msg, theLoc=None)

Reports when an unclosed comment is seen at EOF.

msg
The main message, a string.
theLoc
The file locator e.g. FileLocation.FileLineCol. If present it must have: (fileId, lineNum, colNum) attributes.
implementationDefined(msg, theLoc=None)

Reports when an implementation defined event happens.

msg
The main message, a string.
theLoc
The file locator e.g. FileLocation.FileLineCol. If present it must have: (fileId, lineNum, colNum) attributes.
isDebug

Whether a call to debug() will result in any log output.

partialTokenStream(msg, theLoc=None)

Reports when a partial token stream exists (e.g. an unclosed comment).

msg
The main message, a string.
theLoc
The file locator e.g. FileLocation.FileLineCol. If present it must have: (fileId, lineNum, colNum) attributes.
undefined(msg, theLoc=None)

Reports when an undefined event happens.

msg
The main message, a string.
theLoc
The file locator e.g. FileLocation.FileLineCol. If present it must have: (fileId, lineNum, colNum) attributes.
unspecified(msg, theLoc=None)

Reports when unspecified behaviour is happening. For example order of evaluation of '#' and '##'.

msg
The main message, a string.
theLoc
The file locator e.g. FileLocation.FileLineCol. If present it must have: (fileId, lineNum, colNum) attributes.
warning(msg, theLoc=None)

Reports when a warning event happens.

msg
The main message, a string.
theLoc
The file locator e.g. FileLocation.FileLineCol. If present it must have: (fileId, lineNum, colNum) attributes.
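The reporting pattern shared by these classes (record an event as a (type, message) pair, optionally raise) can be pictured with the sketch below. It is an illustration under stated assumptions, not CPIP's code: DiagnosticSketch, RaiseOnErrorSketch and Loc are hypothetical names, the message format is invented, and RuntimeError stands in for whatever exception a real policy would raise.

```python
import collections

# Hypothetical locator carrying the three attributes the text requires.
Loc = collections.namedtuple('Loc', 'fileId lineNum colNum')


class DiagnosticSketch:
    """Hypothetical recorder: events are (type, message) pairs."""

    def __init__(self):
        self.eventList = []

    def _record(self, kind, msg, theLoc=None):
        if theLoc is not None:
            # The location is folded into the message (invented format).
            msg = '%s at %s#%d:%d' % (
                msg, theLoc.fileId, theLoc.lineNum, theLoc.colNum)
        self.eventList.append((kind, msg))

    def warning(self, msg, theLoc=None):
        self._record('warning', msg, theLoc)

    def error(self, msg, theLoc=None):
        self._record('error', msg, theLoc)


class RaiseOnErrorSketch(DiagnosticSketch):
    """Records the event, then raises, mirroring a raise-on-#error policy."""

    def error(self, msg, theLoc=None):
        DiagnosticSketch.error(self, msg, theLoc)
        raise RuntimeError(msg)


diag = DiagnosticSketch()
diag.warning('w')
diag.error('e', Loc('a.h', 3, 7))
```

Choosing the policy by sub-class keeps the caller's code identical whether it wants to stop at the first problem or collect every event and carry on.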

FileIncludeGraph

Captures the #include graph of a preprocessed file.

cpip.core.FileIncludeGraph.DUMMY_FILE_LINENUM = -1

In the graph the line number is ignored for dummy roots and this one is used instead.

cpip.core.FileIncludeGraph.DUMMY_FILE_NAME = None

The file ID for a ‘dummy’ file. This is used as the artificial root node for all the pre-includes and the ITU.

exception cpip.core.FileIncludeGraph.ExceptionFileIncludeGraph

Simple specialisation of an exception class for the FileIncludeGraph classes.

exception cpip.core.FileIncludeGraph.ExceptionFileIncludeGraphRoot

Exception for issues with dummy file IDs.

exception cpip.core.FileIncludeGraph.ExceptionFileIncludeGraphTokenCounter

Exception for issues with token counters.

class cpip.core.FileIncludeGraph.FigVisitorBase

Base class for visitors, see FigVisitorTreeNodeBase for base class for tree visitors.

visitGraph(theFigNode, theDepth, theLine)

Hierarchical visitor pattern. This is given a FileIncludeGraph as a graph node. theDepth is the current depth in the graph as an integer, theLine the line that is a non-monotonic sibling node ordinal.

class cpip.core.FileIncludeGraph.FigVisitorFileSet

Simple visitor that just collects the set of file IDs in the include graph and a count of how often they are seen.

fileNameMap

Dictionary of number of times each file is seen: {file : count, ...}.

fileNameSet

The set of file names seen.

visitGraph(theFigNode, theDepth, theLine)

Hierarchical visitor pattern.

theFigNode
A FileIncludeGraph as a graph node.
theDepth
The current depth in the graph as an integer.
theLine
The line that is a non-monotonic sibling node ordinal.
class cpip.core.FileIncludeGraph.FigVisitorTree(theNodeClass)

This visitor can visit a graph of FileIncludeGraphRoot and FileIncludeGraph that recreates a tree of Node(s) the type of which are supplied by the user. Each node instance will be constructed with either an instance of a FileIncludeGraphRoot or FileIncludeGraph or, in the case of a pseudo root node then None.

depth

Returns the current depth in this graph representation. Changes to this determine if the node is a child, sibling or ancestor.

tree()

Returns the top level node object as the only copy. This also finalises the tree.

visitGraph(theFigNode, depth, line)

Visit the given node.
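How a change in depth determines whether a visited node is a child, sibling or ancestor's sibling can be sketched with a stack. This is a hedged illustration only: TreeNodeSketch and build_tree are hypothetical, the visit sequence is a plain list of (depth, payload) pairs rather than real graph nodes, and depth 0 is assumed to mean a direct child of the pseudo root.

```python
class TreeNodeSketch:
    """Hypothetical user-supplied node type; the pseudo root holds None."""

    def __init__(self, payload):
        self.payload = payload
        self.children = []


def build_tree(visits):
    """Rebuild a tree from pre-order (depth, payload) visits."""
    root = TreeNodeSketch(None)
    # stack[0] is the pseudo root, stack[d + 1] the latest node at depth d.
    stack = [root]
    for depth, payload in visits:
        # Drop nodes at this depth or deeper: they cannot be the parent.
        del stack[depth + 1:]
        node = TreeNodeSketch(payload)
        stack[-1].children.append(node)
        stack.append(node)
    return root


tree = build_tree([(0, 'main.c'), (1, 'a.h'), (2, 'b.h'), (1, 'c.h')])
```

The sketch assumes the depth never increases by more than one between consecutive visits, which holds for a pre-order walk.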

class cpip.core.FileIncludeGraph.FigVisitorTreeNodeBase(theLineNum)

Base class for nodes created by a tree visitor. See FigVisitorBase for the base class for non-tree visitors.

addChild(theObj)

Add the object as a child.

finalise()

This will be called on finalisation. This is an opportunity for the root (None) not to accumulate properties from its immediate children for example. For depth first finalisation the child class should call finalise on each child first as this function does.

lineNum

The line number of the parent file that included me.

class cpip.core.FileIncludeGraph.FileIncludeGraph(theFile, theState, theCondition, theLogic)

Recursive class that holds a graph of include files and line numbers of the file that included them.

This class builds up a graph (actually a tree) of file includes. The insertion order is significant in that it is expected to be the order experienced by a translation unit processor. addBranch() is the way to add to the data structure.

theFile - a file ID (e.g. a path)

theState - a boolean conditional compilation state.

theCondition - a conditional compilation condition string e.g. “a >= b+2”.

theLogic - a string explanation of how the file was found.

theLogic may be taken from an IncludeHandler as a list of items, e.g. [‘<foo.h>’, ‘CP=None’, ‘sys=None’, ‘usr=include/foo.h’]. Each string after item [0] is of the form key=value where:

  1. key is a key in self.INCLUDE_ORIGIN_CODES
  2. = is the ‘=’ character.
  3. value is the result, or ‘None’ if not found.
  4. Item [0] is the invocation.
  5. Item [-1] is the final resolution.

The intermediate ones are various tries in order. So [‘<foo.h>’, ‘CP=None’, ‘sys=None’, ‘usr=include/foo.h’] would mean:

  1. ‘<foo.h>’ the include directive was: #include <foo.h>
  2. ‘CP=None’ the Current place was searched and nothing found.
  3. ‘sys=None’ the system include(s) were searched and nothing found.
  4. ‘usr=include/foo.h’ the user include(s) were searched and include/foo.h was found.

This class does not distinguish between conditional compilation states that are True or False. Nor does this class evaluate theCondition in any way, it is merely stored for representation.

acceptVisitor(visitor, depth, line)

Hierarchical visitor pattern. This accepts a visitor object and calls visitor.visitGraph(self, depth, line) on that object where depth is the current depth in the graph as an integer and line the line that is a non-monotonic sibling node ordinal.
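The hierarchical visitor pattern can be sketched as below. This is a simplified model, not CPIP's classes: NodeSketch and FileCountVisitor are hypothetical, children are assumed to be kept in a mapping of including line number to child node, and the line value passed for the root is arbitrary.

```python
import collections


class NodeSketch:
    """Hypothetical include-graph node: a file name plus {line: child}."""

    def __init__(self, file_name):
        self.fileName = file_name
        self.children = collections.OrderedDict()

    def add(self, line, child):
        self.children[line] = child
        return child

    def acceptVisitor(self, visitor, depth, line):
        # Visit this node first, then each child one level deeper.
        visitor.visitGraph(self, depth, line)
        for child_line, child in self.children.items():
            child.acceptVisitor(visitor, depth + 1, child_line)


class FileCountVisitor:
    """Counts how often each file name is seen during the walk."""

    def __init__(self):
        self.fileNameMap = collections.Counter()

    def visitGraph(self, theFigNode, theDepth, theLine):
        self.fileNameMap[theFigNode.fileName] += 1


root = NodeSketch('main.c')
a = root.add(3, NodeSketch('a.h'))
a.add(1, NodeSketch('common.h'))
root.add(4, NodeSketch('common.h'))
visitor = FileCountVisitor()
root.acceptVisitor(visitor, 0, -1)  # The root's line value is arbitrary.
```

Because the node drives the traversal, a visitor only needs to implement visitGraph() and can ignore the depth and line arguments it does not use.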

addBranch(theFileS, theLine, theIncFile, theState, theCondition, theLogic)

Adds a branch to the graph.

theFileS is a list of files that form the branch.

theLine is an integer value of the line number of the #include statement of the last named file in theFileS.

theIncFile is the file that is included.

theState is a boolean that describes the conditional compilation state.

theCondition is the conditional compilation test e.g. ‘1>0’

theLogic is a string representing how the branch was obtained.

May raise ExceptionFileIncludeGraph if:

  1. The branch is zero length.
  2. The branch does not match the existing graph (this function just immediately checks the first item on the branch but the others are done recursively).
  3. theLine is a duplicate of an existing line.
  4. The branch has missing nodes.
condComp

Returns the condition, as a string, under which this file was included e.g. "(a > b) && (1 > 0)".

condCompState

Returns the recorded conditional compilation state as a boolean.

dumpGraph(theS=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='UTF-8'>, theI='')

Writes out the graph to a stream.

fileName

Returns the current file name.

findLogic

Returns the findLogic string passed in in the constructor.

genChildNodes()

Yields each child node as a FileIncludeGraph object.

numTokens

The total number of tokens seen by the PpLexer. Returns None if not initialised. Note: This is the number of tokens for this file only, it does not include the tokens that this file might include.

numTokensIncChildren

The total number of tokens seen by the PpLexer including tokens from files included by this one. Returns None if not initialised.

May raise ExceptionFileIncludeGraphTokenCounter if the token counters have been loaded inconsistently (i.e. the children have not been loaded).

numTokensSig

The number of significant tokens seen by the PpLexer. A significant token is a non-whitespace, non-conditionally compiled token. Returns None if not initialised.

Note

This is the number of tokens for this file only, it does not include the tokens that this file might include.

numTokensSigIncChildren

The number of significant tokens seen by the PpLexer including tokens from files included by this one. A significant token is a non-whitespace, non-conditionally compiled token. Returns None if not initialised.

May raise ExceptionFileIncludeGraphTokenCounter if the token counters have been loaded inconsistently (i.e. the children have not been loaded).

retBranches()

Returns a list of lists of the branches with ‘#’ then the line number.

retLatestBranch()

Returns the branch to the last inserted leaf as a list of branch strings.

retLatestBranchDepth()

Walks the graph and returns an integer that is the depth of the latest branch.

retLatestBranchPairs()

Returns the branch to the last inserted leaf as a list of pairs (filename, integer_line).

retLatestLeaf()

Returns the last inserted leaf, a FileIncludeGraph object.

retLatestNode(theBranch)

Returns the last inserted node, a FileIncludeGraph object on the supplied branch.

This is generally used during dynamic construction by a caller that understands the state of the file include branch.

setTokenCounter(theTokCounter)

Sets the token counter for this node which is a PpTokenCount object. The PpLexer sets this as the token count for this file only. This file's #includes have a separate token counter.

tokenCounter

Gets the token counter for this node, a PpTokenCount object.

class cpip.core.FileIncludeGraph.FileIncludeGraphRoot

Root class of the file include graph. This is used when there is a virtual or dummy root. It contains a list of FileIncludeGraph objects. In this way it can represent the list of graphs that would result from a list of pre-includes followed by the ITU itself.

In practice this is used by the PpLexer for this purpose where the dummy root is represented by None.

acceptVisitor(visitor)

Hierarchical visitor pattern. This accepts a visitor object and calls visitor.visitGraph(self, depth, line) on that object where depth is the current depth in the graph as an integer and line the line that is a non-monotonic sibling node ordinal.

addGraph(theGraph)

Add a FileIncludeGraph object.

dumpGraph(theS=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='UTF-8'>)

Dump the node for debug/trace.

graph

The latest FileIncludeGraph object I have. Will raise an ExceptionFileIncludeGraphRoot if nothing there.

numTrees()

Returns the number of FileIncludeGraph objects.

FileIncludeStack

This module represents a stack of file includes as used by the PpLexer.PpLexer.

exception cpip.core.FileIncludeStack.ExceptionFileIncludeStack

Exception for FileIncludeStack object.

class cpip.core.FileIncludeStack.FileInclude(theFpo, theDiag)

Represents a single TU fragment with a PpTokeniser and a token counter.

theFpo
A FilePathOrigin object that identifies the file.
theDiag
A CppDiagnostic object to give to the PpTokeniser.
tokenCountInc(tok, isUnCond, num=1)

Increment the token counter.

tokenCounterAdd(theC)

Add a token counter to my token counter (used when a macro is declared).

class cpip.core.FileIncludeStack.FileIncludeStack(theDiagnostic)

This maintains information about the stack of file includes. This holds several stacks (or representations of them):

self._ppts
A stack of PpTokeniser.PpTokeniser objects.
self._figr
A FileIncludeGraph.FileIncludeGraphRoot for tracking the #include graph.
self._fns
A stack of file IDs as strings (e.g. the file path).
self._tcs
A PpTokenCount.PpTokenCountStack object for counting tokens.
currentFile

Returns the file ID from the top of the stack.

depth

Returns the current include depth as an integer.

fileIncludeGraphRoot

The FileIncludeGraph.FileIncludeGraphRoot object.

fileLineCol

Return an instance of FileLineCol from the current physical line column.

fileStack

Returns a copy of the stack of file IDs.

finalise()

Finalisation, may raise an ExceptionFileIncludeStack.

includeFinish()

End an #include file, returns the file ID that has been finished.

includeStart(theFpo, theLineNum, isUncond, condStr, incLogic)

Start an #include file.

theFpo
A FileLocation.FilePathOrigin object that identifies the file.
theLineNum
The integer line number of the file that includes (None if Root).
isUncond
A boolean that is the conditional compilation state.
condStr
A string of the conditional compilation stack.
incLogic
A string that describes the find include logic.
ppt

Returns the PpTokeniser from the top of the stack.

tokenCountInc(tok, isUnCond, num=1)

Increment the token counter.

tokenCounter()

Returns the Token Counter object at the tip of the stack.

tokenCounterAdd(theC)

Add a token counter to my token counter (used when a macro is declared).

IncludeHandler

Provides handlers for #including files.

class cpip.core.IncludeHandler.CppIncludeStd(theUsrDirs, theSysDirs)

Class that applies search rules for #include statements.

Search tactics based on RVCT and Berkeley UNIX search rules:

I is the usr includes.
J is the sys includes.

Size of I  Size of J  #include <...>                      #include "..."
0          0          None                                CP
0          >0         SYSTEMINCLUDEdirs                   CP, SYSTEMINCLUDEdirs
>0         0          USERINCLUDEdirs                     CP, USERINCLUDEdirs
>0         >0         SYSTEMINCLUDEdirs, USERINCLUDEdirs  CP, USERINCLUDEdirs, SYSTEMINCLUDEdirs

ISO/IEC 9899:1999 (E) 6.10.2-3 means that a failure of q-char must be retried as if it was a h-char. i.e. A failure of a q-char-sequence thus: #include "..."

Is to be retried as if it was written as a h-char-sequence thus: #include <...>

See: _includeQcharseq()
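The search table above can be expressed as a small function. This is a sketch of the table only, with a hypothetical function name; it returns the origin codes ('CP', 'usr', 'sys') in the order they would be tried and does not model the retry of a failed q-char-sequence as an h-char-sequence.

```python
def search_order(is_quoted, usr_dirs, sys_dirs):
    """Return origin codes in the order they would be tried.

    is_quoted: True for #include "...", False for #include <...>.
    """
    order = []
    if is_quoted:
        # q-char-sequence: current place first, then user, then system.
        order.append('CP')
        if usr_dirs:
            order.append('usr')
        if sys_dirs:
            order.append('sys')
    else:
        # h-char-sequence: system directories first, then user directories.
        if sys_dirs:
            order.append('sys')
        if usr_dirs:
            order.append('usr')
    return order


quoted = search_order(True, ['include'], ['/usr/include'])
angled = search_order(False, ['include'], ['/usr/include'])
```

An empty list for the angle-bracket form with no directories corresponds to the "None" cell in the first row of the table.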

INCLUDE_ORIGIN_CODES = {'usr': 'User include directories', 'comp': 'Compiler specific directories', 'sys': 'System include directories', 'TU': 'Translation unit', None: 'Not found', 'CP': 'Current Place'}

Codes for the results of a search for an include

canInclude()

Returns True if the last include succeeded.

clearFindLogic()

Clears the list of find results for a single #include statement.

clearHistory()

Clears the CP stack. This is needed if you use this class as a persistent one and it encounters an exception. You need to call this function before you can reuse it.

cpStack

Returns the current stack of current places.

cpStackPop()

Pops and returns the CP string off the current place stack. This is public so that the PpLexer can use it when processing pre-include files that might themselves include other files.

cpStackPush(theFpo)

Appends the CP from the FilePathOrigin to the current place stack. This is public so that the PpLexer can use it when processing pre-include files that might themselves include other files.

cpStackSize

Returns the size of the current stack of current places.

currentPlace

Returns the last current place or None if #include failed.

endInclude()

Notify end of #include’d file. This pops the CP stack.

finalise()

Finalise at the end of the translation unit. Might raise an ExceptionCppInclude.

findLogic

Returns a list of strings that describe _how_ the file was found. For example:

['<foo.h>', 'CP=None', 'sys=None', 'usr=include/foo.h']

Each string after [0] is of the form: key=value Where:

  1. key is a key in self.INCLUDE_ORIGIN_CODES
  2. = is the '=' character.
  3. value is the result, or ‘None’ if not found.
  4. Item [0] is the invocation
  5. Item [-1] is the final resolution.

The intermediate ones are various tries in order. So:

['<foo.h>', 'CP=None', 'sys=None', 'usr=include/foo.h']

Would mean:

  • [0]: '<foo.h>' the include directive was: #include <foo.h>
  • [1]: 'CP=None' the Current place was searched and nothing found.
  • [2]: 'sys=None' the system include(s) were searched and nothing found.
  • [3]: 'usr=include/foo.h' the user include(s) were searched and include/foo.h was found.
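A caller could unpack such a list along these lines. The helper below is hypothetical and relies only on the key=value format described above; it is not part of CPIP.

```python
def parse_find_logic(find_logic):
    """Split a findLogic list into (invocation, attempts, resolution)."""
    invocation = find_logic[0]
    attempts = []
    for item in find_logic[1:]:
        # Split on the first '=' only, since a path may itself contain '='.
        key, _, value = item.partition('=')
        attempts.append((key, None if value == 'None' else value))
    # The final attempt is the resolution, if anything was tried at all.
    resolution = attempts[-1] if attempts else None
    return invocation, attempts, resolution


result = parse_find_logic(
    ['<foo.h>', 'CP=None', 'sys=None', 'usr=include/foo.h'])
```

Here the string 'None' is mapped to Python's None so that failed attempts are easy to filter out.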
includeHeaderName(theStr)

Return the file location of a #include header-name where the header-name is a pp-token either a <h-char-sequence> or a “q-char-sequence” (including delimiters). If not None return value this also records the CP for the file.

includeNextHeaderName(theStr)

Return the file location of a #include_next header-name where the header-name is a pp-token either a <h-char-sequence> or a “q-char-sequence” (including delimiters).

This is a GCC extension, see: https://gcc.gnu.org/onlinedocs/cpp/Wrapper-Headers.html

This never records the CP for the found file (if any).

initialTu(theTuIdentifier)

Given a Translation Unit Identifier this should return a class FilePathOrigin or None for the initial translation unit. As a precaution this should include code to check that the stack of current places is empty. For example:

if len(self._cpStack) != 0:
    raise ExceptionCppInclude('setTu() with CP stack: %s' % self._cpStack)
validateCpStack()

Tests the coherence of the CP stack. A None can not be followed by a non-None.
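The coherence rule can be sketched as a single pass over the stack. The function name is hypothetical and the stack is modelled as a plain list of strings and None values; what CPIP does when the check fails is not shown here.

```python
def cp_stack_is_coherent(cp_stack):
    """Return False if any None entry is followed by a non-None entry."""
    seen_none = False
    for place in cp_stack:
        if place is None:
            seen_none = True
        elif seen_none:
            # A real current place appeared after a failed one.
            return False
    return True


ok = cp_stack_is_coherent(['src', 'src/include', None])
bad = cp_stack_is_coherent(['src', None, 'src/include'])
```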

class cpip.core.IncludeHandler.CppIncludeStdOs(theUsrDirs, theSysDirs)

This implements _searchFile() based on an OS file system call.

initialTu(theTuPath)

Given a path as a string this returns the class FilePathOrigin or None for the initial translation unit.

class cpip.core.IncludeHandler.CppIncludeStdin(theUsrDirs, theSysDirs)

This reads stdin for the ITU but delegates _searchFile() to the OS file system call.

initialTu(theTuPath)

Given a path as a string this returns the class FilePathOrigin or None for the initial translation unit.

class cpip.core.IncludeHandler.CppIncludeStringIO(theUsrDirs, theSysDirs, theInitialTuContent, theFilePathToContent)

This implements _searchFile() based on a lookup of strings that returns a StringIO file-like object.

initialTu(theTuIdentifier)

Given a path as a string this returns the class FilePathOrigin or None for the initial translation unit.

exception cpip.core.IncludeHandler.ExceptionCppInclude

Simple specialisation of an exception class for the CppInclude.

class cpip.core.IncludeHandler.FilePathOrigin(fileObj, filePath, currentPlace, origin)

FilePathOrigin is a class used externally to collect:

  • An open file object that can be read by the caller.
  • The file path of that object, wherever found.
  • The ‘current place’ of that file, wherever found. This will affect subsequent calls.
  • The origin code, i.e. how it was found.

Any or all of these attributes may be None as the methods _searchFile(), _includeQcharseq() and _includeHcharseq() return such an object (or None).

currentPlace

Alias for field number 2

fileObj

Alias for field number 0

filePath

Alias for field number 1

origin

Alias for field number 3
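"Alias for field number N" is the docstring Python gives to the fields of a collections.namedtuple, so the class can be pictured as below. This is a sketch built from the documented field order under a hypothetical name, with invented example values; it is not CPIP's source.

```python
import collections
import io

# Field order follows the documented aliases: 0, 1, 2, 3.
FilePathOriginSketch = collections.namedtuple(
    'FilePathOriginSketch', 'fileObj filePath currentPlace origin')

fpo = FilePathOriginSketch(
    io.StringIO(u'int x;\n'), 'src/spam.c', 'src', 'TU')

# Fields can be read by name, by index, or by unpacking.
file_obj, file_path, current_place, origin = fpo
```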

ItuToTokens

Converts an ITU (i.e. a file-like object) by tokenising it into extended preprocessor tokens. This does not act on any preprocessing directives.

class cpip.core.ItuToTokens.ItuToTokens(theFileObj=None, theFileId=None, theDiagnostic=None)

Tokenises a file-like object.

genTokensKeywordPpDirective()

Process the file and generate tokens. This changes the type to a keyword or preprocessing-directive if it can do so.

MacroEnv

This is an environment of macro declarations.

It implements ISO/IEC 9899:1999(E) section 6 (aka ‘C’) and ISO/IEC 14882:1998(E) section 16 (aka ‘C++’)

exception cpip.core.MacroEnv.ExceptionMacroEnv

Exception when handling MacroEnv object.

exception cpip.core.MacroEnv.ExceptionMacroEnvInvalidRedefinition

Exception for an invalid redefinition of a macro. NOTE: Under C rules (C Rationale 6.10.3) callers should merely issue a suitable diagnostic.

exception cpip.core.MacroEnv.ExceptionMacroEnvNoMacroDefined

Exception when trying to access a PpDefine that is not currently defined.

exception cpip.core.MacroEnv.ExceptionMacroIndexError

Exception when an access to a PpDefine generates an IndexError.

exception cpip.core.MacroEnv.ExceptionMacroReplacementInit

Exception in the constructor.

exception cpip.core.MacroEnv.ExceptionMacroReplacementPredefinedRedefintion

Exception for a redefinition of a macro id that is predefined.

class cpip.core.MacroEnv.MacroEnv(enableTrace=False, stdPredefMacros=None)

Represents a set of #define directives that represent a macro processing environment. This provides support for #define and #undef directives. It also provides support for macro replacement see: ISO/IEC 9899:1999 (E) 6.10.3 Macro replacement.

enableTrace
Allows calls to _debugTokenStream() that may or may not produce log output (depending on logging level). If True this makes this code run slower, typically 3x slower.
stdPredefMacros

If present should be a dictionary of: {identifier : replacement_string_\n_terminated, ...} For example:

{
    '__DATE__' : 'First of June\n',
    '__TIME__' : 'Just before lunchtime.\n',
}

Each identifier must be in STD_PREDEFINED_NAMES

allStaticMacroDependencies()

Returns a DuplexAdjacencyList() of macro dependencies for the Macro environment. All objects in the cpip.util.Tree.DuplexAdjacencyList are macro identifiers as strings.

A cpip.util.Tree.DuplexAdjacencyList can be converted to a cpip.util.Tree.Tree and that can be converted to a cpip.util.DictTree.DictTree

clear()

Clears the macro environment.

define(theGen, theFile, theLine)

Defines a macro. theGen should be in the state immediately after the #define i.e. this will consume leading whitespace and the trailing newline.

Will raise an ExceptionMacroEnvInvalidRedefinition if the redefinition is not valid. May raise a PpDefine.ExceptionCpipDefineInit (or sub class) on failure.

On success it returns the identifier of the macro as a string. The insertion is stable i.e. a valid re-definition does not replace the existing definition so that the existing state of the macro definition (file, line, reference count etc.) is preserved.

defined(theTtt, flagInvert, theFileLineCol=None)

If the PpToken theTtt is an identifier that is currently defined then this returns 1 as a PpToken, 0 as a PpToken otherwise. If the macro exists in the environment its reference count is incremented.

theFileLineCol
Is a FileLocation.FileLineCol object.

See: ISO/IEC 9899:1999 (E) 6.10.1.

genMacros(theIdentifier=None)

Generates PpDefine objects encountered during my existence. Macros that have been undefined will be generated first in order of un-definition followed by the currently defined macros in identifier order.

Macros that have been #undef’d will have the attribute isCurrentlyDefined as False.

genMacrosInScope(theIdent=None)

Generates PpDefine objects encountered during my existence and still in scope i.e. not yet un-defined.

If theIdent is not None then only that named macro will be yielded.

genMacrosOutOfScope(theIdent=None)

Generates PpDefine objects encountered during my existence but then undefined in the order of un-definition.

If theIdent is not None then only that named macro will be yielded.

getUndefMacro(theIdx)

Returns the PpDefine object from the undef list for the given index. Will raise an ExceptionMacroIndexError if the index is out of range.

hasMacro(theIdentifier)

Returns True if the environment has the macro.

NOTE: This does _not_ increment the reference count so should not be used when processing #ifdef ..., #if defined ... or #if !defined ... for those use isDefined() and defined() instead.

isDefined(theTtt, theFileLineCol=None)

Returns True if theTtt is an identifier that is currently defined, False otherwise. If True this increments the macro reference.

theFileLineCol
Is a FileLocation.FileLineCol object.

See: ISO/IEC 9899:1999 (E) 6.10.1.

macro(theIdentifier)

Returns the macro identified by the identifier. Will raise an ExceptionMacroEnvNoMacroDefined if undefined.

macroHistory(incEnv=True, onlyRef=True)

Returns the macro history as a multi-line string.

macroHistoryMap()

Returns a map of {ident : ([ints, ...], True/False), ...} Where the macro identifier is mapped to a pair where: pair[0] is a list of indexes into getUndefMacro(). pair[1] is boolean, True if the identifier is currently defined i.e. it is the value of self.hasMacro(ident). The macro can be obtained by self.macro().
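The shape of that map can be illustrated with a sketch that builds it from two simple inputs. Both inputs and the function name are hypothetical simplifications: the undef list is modelled as a list of identifier strings in order of un-definition, and the current definitions as a set of identifiers.

```python
def macro_history_map(undefined_names, current_names):
    """Return {ident: ([indexes into the undef list], is_defined), ...}."""
    history = {}
    for idx, name in enumerate(undefined_names):
        # Create the entry on first sight, then append the undef index.
        entry = history.setdefault(name, ([], name in current_names))
        entry[0].append(idx)
    for name in current_names:
        # Currently defined macros that were never undefined.
        history.setdefault(name, ([], True))
    return history


history = macro_history_map(['A', 'B', 'A'], {'A', 'C'})
```

In this example A was undefined twice and then defined again, B was undefined and never redefined, and C has only ever been defined.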

macroNotDefinedDependencies()

Returns a map of {identifier : [class FileLineColumn, ...], ...} where there has been an #ifdef and nothing is defined. Thus these macros, if present, could alter the outcome i.e. there is a dependency on them NOT being defined.

macroNotDefinedDependencyNames()

Returns an unsorted list of identifiers where there has been an #ifdef and nothing is defined. Thus these macros, if present, could alter the outcome i.e. there is a dependency on them NOT being defined.

macroNotDefinedDependencyReferences(theIdentifier)

Returns an ordered list of class FileLineColumn for an identifier where there has been an #ifdef and nothing is defined. Thus these macros, if present, could alter the outcome i.e. there is a dependency on them NOT being defined.

macros()

Returns an unsorted list of strings of current macro identifiers.

mightReplace(theTtt)

Returns True if theTtt might be able to be expanded. ‘Might’ is not ‘can’ or ‘will’ because of this:

#define FUNC(a,b) a-b
FUNC FUNC(45,3)

Becomes:

FUNC 45 -3

Thus mightReplace('FUNC', ...) is True in both cases but actual replacement only occurs once for the second FUNC.
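The difference between "might" and "will" can be sketched over a flat token list. This is an illustration only, with hypothetical names and tokens modelled as plain strings: a function-like macro name might be replaced whenever it is defined, but is only replaced when the next non-whitespace token is '('.

```python
def classify(tokens, function_like_macros):
    """Return (token, might_replace, will_replace) for each macro name."""
    results = []
    for i, tok in enumerate(tokens):
        if tok not in function_like_macros:
            continue
        # Look past whitespace for the '(' that starts an argument list.
        j = i + 1
        while j < len(tokens) and tokens[j].isspace():
            j += 1
        will = j < len(tokens) and tokens[j] == '('
        results.append((tok, True, will))
    return results


# Token stream for: FUNC FUNC(45,3)
tokens = ['FUNC', ' ', 'FUNC', '(', '45', ',', '3', ')']
outcome = classify(tokens, {'FUNC'})
```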

referencedMacroIdentifiers(sortedByRefcount=False)

Returns a list of macro identifiers that have a reference count > 0. If sortedByRefcount is True the list will be in increasing order of reference count then by name; use reverse() on the result to get decreasing order. If sortedByRefcount is False the return value is unsorted.
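The ordering can be sketched with a compound sort key. The function below is hypothetical and takes a plain {identifier: reference_count} dictionary rather than a macro environment.

```python
def referenced_identifiers(ref_counts, sorted_by_refcount=False):
    """Return identifiers with a reference count > 0."""
    names = [name for name, count in ref_counts.items() if count > 0]
    if sorted_by_refcount:
        # Increasing reference count, ties broken by name.
        names.sort(key=lambda name: (ref_counts[name], name))
    return names


counts = {'A': 2, 'B': 0, 'C': 1, 'D': 2}
increasing = referenced_identifiers(counts, sorted_by_refcount=True)
decreasing = list(reversed(increasing))
```

Note that reversing the result also reverses the order of names within the same reference count.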

replace(theTtt, theGen, theFileLineCol=None)

Given a PpToken this returns the replacement as a list of [class PpToken, ...] that is the result of the substitution of macro definitions.

theGen
Is a generator that might be used in the case of function-like macros to consume their argument lists.
theFileLineCol
Is a FileLocation.FileLineCol object.
set__FILE__(theStr)

This sets the __FILE__ macro directly.

set__LINE__(theStr)

This sets the __LINE__ macro directly.

undef(theGen, theFile, theLine)

Removes a definition from the map and adds the PpDefine to self._undefS. It returns None. If no definition exists this has no side-effects on the internal representation.

PpDefine

This handles definition, undefinition, redefinition, replacement and rescanning of macro declarations.

It implements: ISO/IEC 9899:1999(E) section 6 (aka ‘C99’) and/or: ISO/IEC 14882:1998(E) section 16 (aka ‘C++98’)

exception cpip.core.PpDefine.ExceptionCpipDefine

Exception when handling PpDefine object.

exception cpip.core.PpDefine.ExceptionCpipDefineBadArguments

Exception when scanning an argument list for a function style macro fails. NOTE: This is only raised during replacement not during initialisation.

exception cpip.core.PpDefine.ExceptionCpipDefineBadWs

Exception when bad whitespace is in a define statement. See: ISO/IEC 9899:1999(E) Section 6.10-f and ISO/IEC 14882:1998(E) 16-2

exception cpip.core.PpDefine.ExceptionCpipDefineDupeId

Exception when a function-like macro has duplicates in the identifier-list.

exception cpip.core.PpDefine.ExceptionCpipDefineInit

Exception when creating PpDefine object fails.

exception cpip.core.PpDefine.ExceptionCpipDefineInitBadLine

Exception for a bad line number given as argument.

exception cpip.core.PpDefine.ExceptionCpipDefineInvalidCmp

Exception for a redefinition where the identifiers are different.

exception cpip.core.PpDefine.ExceptionCpipDefineMissingWs

Exception when whitespace is missing between the identifier and the replacement tokens.

See: ISO/IEC 9899:1999(E) Section 6.10.3-3 and ISO/IEC 14882:1998(E) Section ???

Note

The executable, cpp, says for #define PLUS+

src.h:1:13: warning: ISO C requires whitespace after the macro name
exception cpip.core.PpDefine.ExceptionCpipDefineReplace

Exception when replacing a macro definition fails.

class cpip.core.PpDefine.PpDefine(theTokGen, theFileId, theLine)

Represents a single #define directive and performs ISO/IEC 9899:1999 (E) 6.10.3 Macro replacement.

theTokGen
A PpToken generator that is expected to generate pp-tokens that appear after the start of the #define directive from the first non-whitespace token onwards i.e. the __init__ will, itself, consume leading whitespace.
theFileId
A string that represents the file ID.
theLine
A positive integer that represents the line in theFile that the #define statement occurred.

Definition example, object-like macros:

[identifier, [replacement-list (opt)], new-line, ...]

Or function-like macros:

[
    identifier,
    lparen,
    [identifier-list(opt),],
    ')',
    replacement-list,
    new-line,
    ...
]

Note

No whitespace is allowed between the identifier and the lparen of function-like macros.

The identifier-list of parameters is stored as a list of names. The replacement-list is stored as a list of preprocessor tokens. Leading and trailing whitespace in the replacement list is removed to facilitate redefinition comparison.

CPP_CONCAT_OP = '##'

C standard definition of concatenation operator

CPP_STRINGIZE_OP = '#'

C standard definition of the stringizing operator

IDENTIFIER_SEPERATOR = ','

C standard definition of identifier separator in function-like macros

INITIAL_REF_COUNT = 0

This is what the reference count is set to on construction

LPAREN = '('

C standard definition of left parenthesis

PLACEMARKER = None

Our representation of a placemarker token

RPAREN = ')'

C standard definition of right parenthesis

STRINGIZE_WHITESPACE_CHAR = ' '

Whitespace runs are replaced by a single space ISO/IEC 9899:1999 (E) 6.10.3.2-2

VARIABLE_ARGUMENT_IDENTIFIER = '...'

Variable argument (variadic) macro definitions

VARIABLE_ARGUMENT_SUBSTITUTE = '__VA_ARGS__'

Variable argument (variadic) macro substitution

assertReplListIntegrity()

Tests that any identifier tokens in the replacement list are actually replaceable. This will raise an assertion failure if not. It is really an integrity test to see if an external entity has grabbed a reference to the replacement list and set a token to be not replaceable.

consumeFunctionPreamble(theGen)

This consumes the tokens that form the preamble of a function style macro invocation. This really means consuming whitespace and the opening LPAREN.

This will return either:

  • None - Tokens including the leading LPAREN have been consumed.
  • List of (token, token_type) if the LPAREN is not found.

For example given this:

#define t(a) a+2
t   (21) - t  ;

For the first t this would consume '   (' and return None leaving the next token to be (‘21’, ‘pp-number’).

For the second t this would consume '  ;' and return:

[
    ('  ', 'whitespace'),
    (';',   'preprocessing-op-or-punc'),
]

This allows the MacroReplacementEnv to generate the correct result:

21 +2 - t ;
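The behaviour described above can be sketched in a few lines. This is only an illustration working on plain (token, token_type) tuples; the function name consume_function_preamble and the tuple representation are assumptions made here, not the CPIP implementation:

```python
def consume_function_preamble(token_iter):
    """Consume whitespace up to and including '('.

    Returns None if the '(' was found and consumed, otherwise the list
    of (token, token_type) pairs that were read while looking for it.
    """
    consumed = []
    for tok, tok_type in token_iter:
        consumed.append((tok, tok_type))
        if tok_type == 'whitespace':
            continue
        if tok == '(':
            # Preamble found: the caller can now read the arguments.
            return None
        # First non-whitespace token is not '(': hand back what was read.
        return consumed
    # Input exhausted without finding '('.
    return consumed
```

Because the tokens come from an iterator, a None return leaves the iterator positioned on the first token of the argument list, as in the first t of the example above.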
expandArguments

The flag that says whether arguments should be expanded. For object like macros this will be False. For function like macros this will be False if there is a stringize (#) or a token pasting operator (##). True otherwise.

fileId

The file ID given as an argument in the constructor.

identifier

The macro identifier i.e. the name as a string.

incRefCount(theFileLineCol=None)

Increment the reference count. Typically callers do this when replacement is certain or in the event of definition testing.

theFileLineCol
A FileLocation.FileLineCol object.

For example:

#ifdef SPAM or defined(SPAM) etc.

Or if the macro is expanded e.g. #define SPAM_N_EGGS spam and eggs

The menu is SPAM_N_EGGS.

isCurrentlyDefined

Returns True if the current instance is a valid definition i.e. it has not been #undef’d.

isObjectTypeMacro

True if this is an object type macro and False if it is a function type macro.

isReferenced

Returns True if the reference count has been incremented since construction.

isSame(other)

Tests ‘sameness’. Returns:

  • -1 if the identifiers are different.
  • 1 if the identifiers are the same but redefinition is NOT allowed.
  • 0 if the identifiers are the same but redefinition is allowed i.e. the macros are equivalent.

isValidRefefinition(other)

Returns True if this is a valid redefinition of other, False otherwise. Will raise an ExceptionCpipDefineInvalidCmp if the identifiers are different. Will raise an ExceptionCpipDefine if either is not currently defined.

From: ISO/IEC 9899:1999 (E) 6.10.3:

  1. Two replacement lists are identical if and only if the preprocessing tokens in both have the same number, ordering, spelling, and white-space separation, where all white-space separations are considered identical.
  2. An identifier currently defined as a macro without use of lparen (an object-like macro) may be redefined by another #define preprocessing directive provided that the second definition is an object-like macro definition and the two replacement lists are identical, otherwise the program is ill-formed.
  3. An identifier currently defined as a macro using lparen (a function-like macro) may be redefined by another #define preprocessing directive provided that the second definition is a function-like macro definition that has the same number and spelling of parameters, and the two replacement lists are identical, otherwise the program is ill-formed.

See also: ISO/IEC 14882:1998(E) 16.3 Macro replacement [cpp.replace]
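The three rules quoted above can be sketched as a comparison of normalised replacement lists. The helper names normalise_replacement and is_valid_redefinition, and the use of plain (token, token_type) tuples, are assumptions for illustration; this is not how PpDefine is necessarily implemented:

```python
WS = (' ', 'whitespace')


def normalise_replacement(tokens):
    """Drop leading/trailing whitespace and collapse each inner
    whitespace run to a single marker, so that all white-space
    separations compare as identical."""
    result = []
    for tok, tok_type in tokens:
        if tok_type == 'whitespace':
            # Skip leading whitespace and merge adjacent runs.
            if result and result[-1] != WS:
                result.append(WS)
        else:
            result.append((tok, tok_type))
    if result and result[-1] == WS:
        result.pop()
    return result


def is_valid_redefinition(params_a, repl_a, params_b, repl_b):
    """params is None for an object-like macro, else a list of names."""
    if params_a != params_b:
        # Different kind of macro, or different number/spelling of parameters.
        return False
    return normalise_replacement(repl_a) == normalise_replacement(repl_b)
```

Note that whitespace separation still matters: `1 + 2` and `1+2` are different replacement lists, while `1 + 2` and `1   +  2` are identical.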

line

The line number given as an argument in the constructor.

parameters

The list of parameter names as strings for a function-like macro, or None if this is an object type macro.

refCount

Returns the current reference count as an integer less its initial value on construction.

refFileLineColS

Returns the list of FileLineCol objects where this macro was referenced.

replaceArgumentList(theArgList)

Given a list of arguments this does argument substitution and returns the replacement token list. The argument list is of the form given by retArgumentListTokens(). The caller must have replaced any macro invocations in theArgList before calling this method.

Note

For function style macros only.

replaceObjectStyleMacro()

Returns a list of [(token, token_type), ...] from the replacement of an object style macro.

replacementTokens

The list of zero or more replacement tokens as a list of PpToken.PpToken.

replacements

The list of zero or more replacement tokens as strings.

retArgumentListTokens(theGen)

For a function macro this reads the tokens following a LPAREN and returns a list of arguments where each argument is a list of PpToken objects.

Thus this function returns a list of lists of PpToken.PpToken objects, for example given this:

#define f(x,y) ...
f(a,b)

This function, when passed a,b), returns:

[
    [
        PpToken.PpToken('a', 'identifier'),
    ],
    [
        PpToken.PpToken('b', 'identifier'),
    ],
]

And an invocation of: f(1(,)2,3) i.e. this gets passed via the generator "1(,)2,3)" and returns two arguments:

[
    [
        PpToken('1', 'pp-number'),
        PpToken('(', 'preprocessing-op-or-punc'),
        PpToken(',', 'preprocessing-op-or-punc'),
        PpToken(')', 'preprocessing-op-or-punc'),
        PpToken('2', 'pp-number'),
    ],
    [
        PpToken('3', 'pp-number'),
    ],
]

So this function supports two cases:

  1. Parsing function style macro declarations.
  2. Interpreting function style macro invocations where the argument list is subject to replacement before invoking the macro.

In the case that an argument is missing a PpDefine.PLACEMARKER token is inserted. For example:

#define FUNCTION_STYLE(a,b,c) ...
FUNCTION_STYLE(,2,3)

Gives:

[
    PpDefine.PLACEMARKER,
    [
        PpToken.PpToken('2',       'pp-number'),
    ],
    [
        PpToken.PpToken('3',       'pp-number'),
    ],
]

Placemarker tokens are not used if the macro is defined with no arguments. This might raise an ExceptionCpipDefineBadArguments if the list does not match the prototype or a StopIteration if the token list is too short. This ignores leading and trailing whitespace for each argument.
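A minimal sketch of the argument splitting described above: commas only separate arguments at parenthesis depth zero, whitespace around each argument is trimmed, and an empty argument becomes a placemarker. The names split_arguments and _finish are hypothetical, tokens are plain (token, token_type) tuples rather than PpToken objects, and the sketch raises ValueError rather than the exceptions named above. It also does not special-case macros defined with no arguments:

```python
PLACEMARKER = None  # mirrors the documented PpDefine.PLACEMARKER value
PUNC = 'preprocessing-op-or-punc'


def _finish(tokens):
    """Trim leading/trailing whitespace; an empty argument is a placemarker."""
    start, end = 0, len(tokens)
    while start < end and tokens[start][1] == 'whitespace':
        start += 1
    while end > start and tokens[end - 1][1] == 'whitespace':
        end -= 1
    trimmed = tokens[start:end]
    return trimmed if trimmed else PLACEMARKER


def split_arguments(token_iter):
    """Read tokens that follow the opening '(' up to the matching ')'."""
    args, current, depth = [], [], 0
    for tok, tok_type in token_iter:
        is_punc = tok_type == PUNC
        if is_punc and tok == '(':
            depth += 1
        elif is_punc and tok == ')':
            if depth == 0:
                args.append(_finish(current))
                return args
            depth -= 1
        elif is_punc and tok == ',' and depth == 0:
            # Top-level comma: close the current argument.
            args.append(_finish(current))
            current = []
            continue
        current.append((tok, tok_type))
    raise ValueError('Token list too short: no closing parenthesis.')
```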

TODO: Raise an ExceptionCpipDefineBadArguments if there is a #define statement. e.g.:

#define f(x) x x
f (1
#undef f
#define f 2
f)
strIdentPlusParam()

Returns the identifier name and parameters if a function-like macro as a string.

strReplacements()

Returns the replacements tokens with minimised whitespace as a string.

tokenCounter

The PpTokenCount object that counts tokens that have been consumed from the input.

tokensConsumed

The total number of tokens consumed by the class.

undef(theFileId, theLineNum)

Records this instance of a macro #undef‘d at a particular file and line number. May raise an ExceptionCpipDefine if already undefined or the line number is bad.

undefFileId

The file ID where this macro was undef’d or None.

undefLine

The line number where this macro was undef’d or None.

PpLexer

Generates tokens from a C or C++ translation unit.

TODO: Fix accidental token pasting. See: TestFromCppInternalsTokenspacing. Connected to this is: TODO: Set setPrevWs flag on the token where necessary.

TODO: Preprocessor statements in arguments of function like macros. Sect. 3.9 of cpp.pdf and existing MacroEnv tests.

exception cpip.core.PpLexer.ExceptionConditionalExpression

Exception when calling eval() on conditional expressions.

exception cpip.core.PpLexer.ExceptionPpLexer

Exception when handling PpLexer object.

exception cpip.core.PpLexer.ExceptionPpLexerAlreadyGenerating

Exception when two generators are created, as the internal state would then become inconsistent.

exception cpip.core.PpLexer.ExceptionPpLexerCallStack

Exception when finding issues with the call stack or nested includes.

exception cpip.core.PpLexer.ExceptionPpLexerCallStackTooSmall

Exception when sys.getrecursionlimit() is too small.

exception cpip.core.PpLexer.ExceptionPpLexerCondLevelOutOfRange

Exception when handling a conditional token generation level.

exception cpip.core.PpLexer.ExceptionPpLexerDefine

Exception when loading predefined macro definitions.

exception cpip.core.PpLexer.ExceptionPpLexerNestedInclueLimit

Exception when nested #include limit exceeded.

exception cpip.core.PpLexer.ExceptionPpLexerNoFile

Exception when a file cannot be found.

exception cpip.core.PpLexer.ExceptionPpLexerPreInclude

Exception when loading pre-include files.

exception cpip.core.PpLexer.ExceptionPpLexerPreIncludeIncNoCp

Exception when loading a pre-include file that has no current place (e.g. a StringIO object) and the pre-include then has an #include statement.

exception cpip.core.PpLexer.ExceptionPpLexerPredefine

Exception when loading predefined macro definitions.

cpip.core.PpLexer.PREPROCESSING_DIRECTIVES = ['if', 'ifdef', 'ifndef', 'elif', 'else', 'endif', 'include', 'define', 'undef', 'line', 'error', 'pragma']

Allowable preprocessing directives

class cpip.core.PpLexer.PpLexer(tuFileId, includeHandler, preIncFiles=None, diagnostic=None, pragmaHandler=None, stdPredefMacros=None, autoDefineDateTime=True, gccExtensions=False, annotateLineFile=False)

Create a translation unit tokeniser that applies ISO/IEC 9899:1999(E) Section 6 and/or ISO/IEC 14882:1998(E) section 16.

tuFileId
A file ID that will be given to the include handler to find the translation unit. Typically this will be the file path (as a string) to the file that is the Initial Translation Unit (ITU) i.e. the file being preprocessed.
includeHandler
A handler to find #include’d files, typically an IncludeHandler.IncludeHandlerStd. This might have user and system include path information and a means of resolving file references.
preIncFiles
An ordered list of file like objects that are pre-include files. These are processed in order before the ITU is processed. Macro redefinition rules apply.
diagnostic
A diagnostic object, defaults to a CppDiagnostic.PreprocessDiagnosticStd.
pragmaHandler

A handler for #pragma statements.

This must have the attribute replaceTokens; if True then the token stream will be macro replaced before being passed to the pragma handler.

This must have a function pragma() defined that takes a non-zero length list of PpToken.PpToken the last of which will be a newline token. The tokens returned will be yielded.

stdPredefMacros

A dictionary of Standard pre-defined macros. See for example: ISO/IEC 9899:1999 (E) 6.10.8 Predefined macro names; ISO/IEC 14882:1998 (E) 16.8 Predefined macro names; N2800=08-0310 16.8 Predefined macro names.

The macros __DATE__ and __TIME__ will be automatically updated to current locale date/time (see autoDefineDateTime).

autoDefineDateTime
If True then the macros __DATE__ and __TIME__ will be automatically updated to current locale date/time. Mostly this is used for testing.
gccExtensions
Support GCC extensions. Currently just #include_next is supported.
annotateLineFile
If True then PpToken will output line number and file as cpp does.

For example:

# 22 "/usr/include/stdio.h" 3 4
# 59 "/usr/include/stdio.h" 3 4
# 1 "/usr/include/sys/cdefs.h" 1 3 4

TODO: Set flags here rather than supplying them to a generator? This would make the API simply the ctor and ppTokens/next(). Flags would be:

incWs - Include whitespace tokens.

condLevel - (0, 1, 2) thus:

0: No conditionally compiled tokens. The fileIncludeGraphRoot will not have any information about conditionally included files.

1: Conditionally compiled tokens are generated but not from conditionally included files. The fileIncludeGraphRoot will have a reference to a conditionally included file but not that included file’s includes.

2: Conditionally compiled tokens including tokens from conditionally included files. The fileIncludeGraphRoot will have all the information about conditionally included files recursively.

CALL_STACK_DEPTH_ASSUMED_PPTOKENS = 10

The call stack depth is D = A + B + C*L, where L is the number of levels of nested includes. This is A in that formula.

CALL_STACK_DEPTH_FIRST_INCLUDE = 3

B in the call stack depth formula D = A + B + C*L.

CALL_STACK_DEPTH_PER_INCLUDE = 3

C in the call stack depth formula D = A + B + C*L.
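As a worked example of the formula D = A + B + C*L with the three constants documented here, the following sketch computes the assumed call stack depth for a given include nesting level. The function names are hypothetical, and comparing the result with sys.getrecursionlimit() is an assumption suggested by ExceptionPpLexerCallStackTooSmall rather than a description of the actual check:

```python
import sys

CALL_STACK_DEPTH_ASSUMED_PPTOKENS = 10  # A
CALL_STACK_DEPTH_FIRST_INCLUDE = 3      # B
CALL_STACK_DEPTH_PER_INCLUDE = 3        # C


def required_call_stack_depth(include_levels):
    """D = A + B + C*L for L levels of nested includes."""
    return (CALL_STACK_DEPTH_ASSUMED_PPTOKENS
            + CALL_STACK_DEPTH_FIRST_INCLUDE
            + CALL_STACK_DEPTH_PER_INCLUDE * include_levels)


def recursion_limit_is_sufficient(include_levels, limit=None):
    """True if the Python recursion limit covers the assumed depth."""
    if limit is None:
        limit = sys.getrecursionlimit()
    return required_call_stack_depth(include_levels) <= limit
```

For L = 200 this gives D = 10 + 3 + 3*200 = 613.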

COND_LEVEL_DEFAULT = 0

Conditionality settings for token generation

COND_LEVEL_OPTIONS = range(0, 3)

Conditionality level (0, 1, 2)

MAX_INCLUDE_DEPTH = 200

The maximum value of nested #include’s

colNum

Returns the current column number as an integer during processing.

condCompGraph

The conditional compilation graph as a CppCond.CppCondGraph object.

condState

The conditional state as (boolean, string).

currentFile

Returns the file ID on the top of the file stack.

definedMacros

Returns a string representing the currently defined macros.

fileIncludeGraphRoot

Returns the FileIncludeGraph.FileIncludeGraphRoot object.

fileLineCol

Returns a FileLineCol object or None

fileName

Returns the current file name during processing.

fileStack

Returns the file stack.

finalise()

Finalisation, may raise any Exception.

includeDepth

Returns the integer depth of the include stack.

lineNum

Returns the current line number as an integer during processing or None.

macroEnvironment

The current Macro environment as a MacroEnv.MacroEnv object.

Caution

Write to this at your own risk. Your write might be ignored or cause undefined behaviour.

ppTokens(incWs=True, minWs=False, condLevel=0)

A generator for providing a sequence of PpToken.PpToken in accordance with section 16 of ISO/IEC 14882:1998(E).

incWs - if True then whitespace tokens are included (i.e. tok.isWs() == True).

minWs - if True then whitespace runs will be minimised to a single space or, if newline is in the whitespace run, a single newline

condLevel - if !=0 then conditionally compiled tokens will be yielded and they will have tok.isCond == True. The fileIncludeGraphRoot will be marked up with the appropriate conditionality. Levels are:

0: No conditionally compiled tokens. The fileIncludeGraphRoot will
not have any information about conditionally included files.

1: Conditionally compiled tokens are generated but not from 
conditionally included files. The fileIncludeGraphRoot will have
a reference to a conditionally included file but not that
included file's includes.

2: Conditionally compiled tokens including tokens from conditionally
included files. The fileIncludeGraphRoot will have all the
information about conditionally included files recursively.

(see _cppInclude where we check if self._condStack.isTrue():).
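The effect of the incWs and minWs flags can be illustrated with a small stand-alone filter over (token, token_type) tuples. This is a sketch under stated assumptions: the name pp_tokens_view is hypothetical, the real lexer applies these flags while generating tokens, and this sketch minimises each whitespace token individually rather than merging adjacent ones:

```python
def pp_tokens_view(tokens, incWs=True, minWs=False):
    """Yield (token, token_type) pairs, applying the whitespace flags."""
    for text, tok_type in tokens:
        if tok_type == 'whitespace':
            if not incWs:
                # Whitespace tokens are dropped entirely.
                continue
            if minWs:
                # A run containing a newline becomes a single newline,
                # any other run becomes a single space.
                text = '\n' if '\n' in text else ' '
        yield text, tok_type
```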

tuFileId

Returns the user supplied ID of the translation unit.

cpip.core.PpLexer.UNNAMED_FILE_NAME = 'Unnamed Pre-include'

Used when file objects have no name

PpToken

Represents a preprocessing Token in C/C++ source code.

cpip.core.PpToken.ENUM_NAME = {0: 'header-name', 1: 'identifier', 2: 'pp-number', 3: 'character-literal', 4: 'string-literal', 5: 'preprocessing-op-or-punc', 6: 'non-whitespace', 7: 'whitespace', 8: 'concat'}

Map of {integer : PREPROCESS_TOKEN_TYPE, ...} So this can be used thus:

if ENUM_NAME[token_type] == 'header-name':
exception cpip.core.PpToken.ExceptionCpipToken

Used by PpToken.

exception cpip.core.PpToken.ExceptionCpipTokenIllegalMerge

Used by PpToken when PpToken.merge() is called illegally.

exception cpip.core.PpToken.ExceptionCpipTokenIllegalOperation

Used by PpToken when an illegal operation is performed.

exception cpip.core.PpToken.ExceptionCpipTokenReopenForExpansion

Used by PpToken when a non-expandable token is made available for expansion.

exception cpip.core.PpToken.ExceptionCpipTokenUnknownType

Used by PpToken when the token type is out of range.

cpip.core.PpToken.LEX_PPTOKEN_TYPES = ['header-name', 'identifier', 'pp-number', 'character-literal', 'string-literal', 'preprocessing-op-or-punc', 'non-whitespace', 'whitespace', 'concat']

Types of preprocessing-token. From: ISO/IEC 14882:1998(E) 2.4 Preprocessing tokens [lex.pptoken] and ISO/IEC 9899:1999 (E) 6.4.7 Header names.

Note

Para 3 of the latter says that: "A header name preprocessing token is recognized only within a #include preprocessing directive."

So in other contexts a header-name that is a q-char-sequence should be treated as a string-literal.

This produces interesting issues in this case:

#define str(s) # s
#include str(foo.h)

The stringise operator creates a string-literal token but the #include directive expects a header-name. So in certain contexts (macro stringising followed by #include instruction) we need to ‘downcast’ a string-literal to a header-name.

See cpip.core.PpLexer.PpLexer for how this is done

cpip.core.PpToken.LEX_PPTOKEN_TYPE_ENUM_RANGE = range(0, 9)

Range of allowable enum values

cpip.core.PpToken.NAME_ENUM = {'preprocessing-op-or-punc': 5, 'concat': 8, 'string-literal': 4, 'character-literal': 3, 'header-name': 0, 'identifier': 1, 'non-whitespace': 6, 'pp-number': 2, 'whitespace': 7}

Map of {PREPROCESS_TOKEN_TYPE : integer, ...} So this can be used thus:

self._cppTokType = NAME_ENUM['header-name']
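The three module-level values LEX_PPTOKEN_TYPES, ENUM_NAME and NAME_ENUM are consistent with each other: the enumerated integer is the index of the type name in the list. The following sketch shows one way such maps can be derived and used; whether CPIP builds them this way is an assumption:

```python
LEX_PPTOKEN_TYPES = [
    'header-name', 'identifier', 'pp-number', 'character-literal',
    'string-literal', 'preprocessing-op-or-punc', 'non-whitespace',
    'whitespace', 'concat',
]

# {integer: name} and its inverse {name: integer}.
ENUM_NAME = dict(enumerate(LEX_PPTOKEN_TYPES))
NAME_ENUM = {name: value for value, name in ENUM_NAME.items()}
LEX_PPTOKEN_TYPE_ENUM_RANGE = range(len(LEX_PPTOKEN_TYPES))

# Round trip: name -> integer -> name.
token_type = NAME_ENUM['pp-number']
assert ENUM_NAME[token_type] == 'pp-number'
```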
class cpip.core.PpToken.PpToken(t, tt, lineNum=0, colNum=0, isReplacement=False)

Holds a preprocessor token, its type and whether the token can be replaced.

t is the token (a string) and tt is either an enumerated integer or a string. Internally tt is stored as an enumerated integer. If the token is an identifier then it is eligible for replacement unless marked otherwise.

SINGLE_SPACE = ' '

Representation of a single whitespace

WORD_REPLACE_MAP = {'/': '//', 'true': 'True', 'false': 'False', '||': ' or ', '&&': ' and '}

Operators that are replaced directly by Python equivalents for constant evaluation

canReplace

Flag to control whether this token is eligible for replacement

colNum

Returns the column number of the start of the token as an integer.

copy()

Returns a shallow copy of self. This is useful where the same token is added to multiple lists and then a merge() operation on one list will be seen by the others. To avoid this insert self.copy() in all but one of the lists.

evalConstExpr()

Returns a string value suitable for eval’ing in a constant expression. For numbers this removes such tiresome trivia as ‘u’, ‘L’ etc. For others it replaces ‘&&’ with ‘and’ and so on.

See ISO/IEC 14882:1998(E) 16.1 Conditional inclusion sub-section 4 i.e. section 16.1-4

and: ISO/IEC 9899:1999 (E) 6.10.1 Conditional inclusion sub-section 3 i.e. section 6.10.1-3
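A rough sketch of the translation described above, using the documented WORD_REPLACE_MAP. The function names are hypothetical and the suffix handling is a simplification (it only strips trailing u, U, l and L from pp-numbers); it is not the PpToken implementation:

```python
WORD_REPLACE_MAP = {'/': '//', 'true': 'True', 'false': 'False',
                    '||': ' or ', '&&': ' and '}


def eval_const_expr_text(tok, tok_type):
    """Return a Python-evaluable spelling of a single token."""
    if tok_type == 'pp-number':
        # Remove integer suffixes such as 'u' and 'L'.
        return tok.rstrip('uUlL')
    return WORD_REPLACE_MAP.get(tok, tok)


def const_expr_source(tokens):
    """Join the translated tokens into one expression string."""
    return ''.join(eval_const_expr_text(t, tt) for t, tt in tokens)
```

For example the tokens of `1UL && 0x10L/3` become a string that Python's eval() can evaluate, with `/` mapped to integer division.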

getIsReplacement()

Gets the flag that records that this token is the result of macro replacement

getPrevWs()

Gets the flag that records prior whitespace.

getReplace()

Gets the flag that controls whether this can be replaced.

isCond

Flag that if True indicates that the token appeared within a section that was conditionally compiled. This is False on construction and can only be set True by setIsCond()

isIdentifier()

Returns True if the token type is ‘identifier’.

isReplacement

Flag that records that this token is the result of macro replacement

isUnCond

Flag that if True indicates that the token appeared within a section that was un-conditionally compiled. This is the negation of isCond.

isWs()

Returns True if the token type is ‘whitespace’.

lineNum

Returns the line number of the start of the token as an integer.

merge(other)

This will merge by appending the other token; if they are different token types the type becomes ‘concat’.

prevWs

Flag to indicate whether this token is preceded by whitespace

replaceNewLine()

Replace any newline with a single whitespace character in-place.

See: ISO/IEC 9899:1999(E) 6.10-3 and C++ ISO/IEC 14882:1998(E) 16.3-9

This will raise an ExceptionCpipTokenIllegalOperation if I am not a whitespace token.

setIsCond()

Sets self._isCond to be True.

setIsReplacement(val)

Sets the flag that records that this token is the result of macro replacement.

setPrevWs(val)

Sets the flag that records prior whitespace.

setReplace(val)

Setter. Will raise if I am not an identifier, or if val is True and I am otherwise not expandable.

shrinkWs()

Replace all whitespace with a single ‘ ‘

This will raise an ExceptionCpipTokenIllegalOperation if I am not a whitespace token.

subst(t, tt)

Substitutes token value and type.

t

Returns the token as a string.

tokEnumToktype

Returns the token and the enumerated token type as a tuple.

tokToktype

Returns the token and the token type (as a string) as a tuple.

tt

Returns the token type as a string.

cpip.core.PpToken.tokensStr(theTokens, shortForm=True)

Given a list of tokens this returns them as a string. If shortForm is True then the lexical string is returned. If False then the PpToken representations separated by ‘ | ‘ is returned. e.g. PpToken(t="f", tt=identifier, line=True, prev=False, ?=False) | ...

PpTokenCount

Keeps a count of Preprocessing tokens.

exception cpip.core.PpTokenCount.ExceptionPpTokenCount

Exception when handling PpTokenCount object.

exception cpip.core.PpTokenCount.ExceptionPpTokenCountStack

Exception when handling PpTokenCountStack object.

class cpip.core.PpTokenCount.PpTokenCount

Maps of {token_type : integer_count, ...} self._cntrTokAll is all tokens.

__iadd__(other)

In-place add of the contents of another PpTokenCount object.

__weakref__

list of weak references to the object (if defined)

inc(tok, isUnCond, num=1)

Increment the count. tok is a PpToken, isUnCond is a boolean that is True if this is not conditionally compiled. num is the number of tokens to increment.

tokenCount(theType, isAll)

Returns the token count of a particular type. If isAll is true then the count of all tokens is returned, if False the count of unconditional tokens is returned.

tokenCountNonWs(isAll)

Returns the count of non-whitespace tokens. If isAll is true then the count of all tokens is returned, if False the count of unconditional tokens is returned.

tokenTypesAndCounts(isAll, allPossibleTypes=True)

Generator that yields (type, count) in PpToken.LEX_PPTOKEN_TYPES order where type is a string and count an integer.

If isAll is true then the count of all tokens is returned, if False the count of unconditional tokens is returned.

If allPossibleTypes is True the counts of all token types are yielded even if zero, if False then only token types encountered will be yielded i.e. all counts will be non-zero.

totalAll

The total token count.

totalAllConditional

The token count of conditional tokens.

totalAllUnconditional

The token count of unconditional tokens.

class cpip.core.PpTokenCount.PpTokenCountStack

This simply holds a stack of PpTokenCount objects that can be created and popped off the stack.

__init__()

ctor with empty stack.

__weakref__

list of weak references to the object (if defined)

close()

Finalisation, will raise an ExceptionPpTokenCountStack if there is anything on the stack.

counter()

Returns a reference to the current PpTokenCount object.

pop()

Pops the current PpTokenCount object off the stack and returns it.

push()

Add a new counter object to the stack.

PpTokeniser

Performs translation phases 0, 1, 2, 3 on C/C++ source code.

Translation phases from ISO/IEC 9899:1999 (E):

5.1.1.2 Translation phases

5.1.1.2-1 The precedence among the syntax rules of translation is specified by the following phases.

Phase 1. Physical source file multibyte characters are mapped, in an implementation defined manner, to the source character set (introducing new-line characters for end-of-line indicators) if necessary. Trigraph sequences are replaced by corresponding single-character internal representations.

Phase 2. Each instance of a backslash character (\) immediately followed by a new-line character is deleted, splicing physical source lines to form logical source lines. Only the last backslash on any physical source line shall be eligible for being part of such a splice. A source file that is not empty shall end in a new-line character, which shall not be immediately preceded by a backslash character before any such splicing takes place.

Phase 3. The source file is decomposed into preprocessing tokens and sequences of white-space characters (including comments). A source file shall not end in a partial preprocessing token or in a partial comment. Each comment is replaced by one space character. New-line characters are retained. Whether each nonempty sequence of white-space characters other than new-line is retained or replaced by one space character is implementation-defined.

TODO: Do phases 0,1,2 as generators i.e. not in memory?

TODO: Check coverage with a complete but minimal example of every token

TODO: remove self._cppTokType and have it as a return value?

TODO: Remove commented out code.

TODO: Performance of phase 1 processing.

TODO: rename next() as genPpTokens()?

TODO: Perf rewrite slice functions to take an integer argument of where in the array to start inspecting for a slice. This avoids calls to ...[x:] e.g. myCharS = myCharS[sliceIdx:] in genLexPptokenAndSeqWs.

cpip.core.PpTokeniser.COMMENT_REPLACEMENT = ' '

Comments are replaced by a single space

cpip.core.PpTokeniser.C_KEYWORDS = ('auto', 'break', 'case', 'char', 'const', 'continue', 'default', 'do', 'double', 'else', 'enum', 'extern', 'float', 'for', 'goto', 'if', 'inline', 'int', 'long', 'register', 'restrict', 'return', 'short', 'signed', 'sizeof', 'static', 'struct', 'switch', 'typedef', 'union', 'unsigned', 'void', 'volatile', 'while', '_Bool', '_Complex', '_Imaginary')

ISO/IEC 9899:1999 (E) 6.4.1 Keywords

cpip.core.PpTokeniser.DIGRAPH_TABLE = {'bitor': '|', '<%': '{', 'compl': '~', ':>': ']', '<:': '[', 'not_eq': '!=', 'not': '!', 'xor': '^', '%:': '#', '%>': '}', 'xor_eq': '^=', 'or': '||', 'bitand': '&', 'and_eq': '&=', 'or_eq': '|=', '%:%:': '##', 'and': '&&'}

Map of Digraph alternates

exception cpip.core.PpTokeniser.ExceptionCpipTokeniser

Simple specialisation of an exception class for the preprocessor.

exception cpip.core.PpTokeniser.ExceptionCpipTokeniserUcnConstraint

Specialisation for when universal character name exceeds constraints.

cpip.core.PpTokeniser.LEN_SOURCE_CHARACTER_SET = 96

Size of the source code character set

class cpip.core.PpTokeniser.PpTokeniser(theFileObj=None, theFileId=None, theDiagnostic=None)

Imitates a Preprocessor that conforms to ISO/IEC 14882:1998(E).

Takes an optional file like object. If theFileObj has a ‘name’ attribute then that will be used as the name otherwise theFileId will be used as the file name.

Implementation note: On all _slice...() and __slice...() functions: A _slice...() function takes a buffer-like object and an integer offset as arguments. The buffer-like object will be accessed by index so just needs to implement __getitem__(). On overrun or other out of bounds index an IndexError must be caught by the _slice...() function. i.e. len() should not be called on the buffer-like object, or rather, if len() (i.e. __len__()) is called a TypeError will be raised and propagated out of this class to the caller.

StrTree, for example, conforms to these requirements.

The function is expected to return an integer that represents the number of objects that can be consumed from the buffer-like object. If the return value is non-zero the PpTokeniser is side-affected in that self._cppTokType is set to a non-None value. Before doing that a test is made and if self._cppTokType is already non-None then an assertion error is raised.

The buffer-like object should not be side-affected by the _slice...() function regardless of the return value.

So a _slice...() function pattern is:

def _slice...(self, theBuf, theOfs):
    i = theOfs
    try:
        # Only access theBuf with [i] so that __getitem__() is called
        ...theBuf[i]...
        # Success as the absence of an IndexError!
        # So return the length of objects that pass
        # First test and set for type of slice found
        if i > theOfs:
            assert(self._cppTokType is None), '_cppTokType was %s now %s' % (self._cppTokType, ...)
            self._cppTokType = ...
        # NOTE: Return size of slice not the index of the end of the slice
        return i - theOfs
    except IndexError:
        pass
    # Here either return 0 on IndexError or i-theOfs
    return ...

NOTE: Functions starting with __slice... do not trap the IndexError, the caller must do that.
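A concrete instance of the pattern above, written as a stand-alone function so that it can be run. It slices a run of decimal digits and relies only on indexing, never on len(). The names slice_decimal_digits and IndexOnlyBuffer are hypothetical, and the self._cppTokType bookkeeping of the real methods is omitted:

```python
class IndexOnlyBuffer(object):
    """A buffer-like object that supports indexing but not len()."""

    def __init__(self, chars):
        self._chars = chars

    def __getitem__(self, index):
        # Raises IndexError on overrun, as the pattern requires.
        return self._chars[index]


def slice_decimal_digits(the_buf, the_ofs=0):
    """Return the number of consecutive decimal digits at the_ofs."""
    i = the_ofs
    try:
        # Only access the_buf with [i] so that __getitem__() is called.
        while the_buf[i] in '0123456789':
            i += 1
    except IndexError:
        # Ran off the end of the buffer: everything so far matched.
        pass
    # Return the size of the slice, not the index of its end.
    return i - the_ofs
```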

TODO: ISO/IEC 14882:1998(E) Escape sequences Table 5?

cppTokType

Returns the type of the last preprocessing-token found by _sliceLexPptoken().

fileLineCol

Return an instance of FileLineCol from the current physical line column.

fileLocator

Returns the FileLocation object.

fileName

Returns the ID of the file.

filterHeaderNames(theToks)

Returns a list of ‘header-name’ tokens from the supplied stream. May raise ExceptionCpipTokeniser if un-parsable or theToks has non-(whitespace, header-name).

genLexPptokenAndSeqWs(theCharS)

Generates a sequence of PpToken objects. Either:

  • a sequence of whitespace (comments are replaced with a single whitespace).
  • a pre-processing token.

This performs translation phase 3.

NOTE: Whitespace sequences are not merged so '  /**/ ' will generate three tokens each of PpToken.PpToken(' ', 'whitespace') i.e. leading whitespace, comment replaced by single space, trailing whitespace.

So this yields the tokens from translation phase 3 if supplied with the results of translation phase 2.

NOTE: This does not generate ‘header-name’ tokens as these are context dependent i.e. they are only valid in the context of a #include directive.

ISO/IEC 9899:1999 (E) 6.4.7 Header names Para 3 says that: “A header name preprocessing token is recognised only within a #include preprocessing directive.”.

initLexPhase12()

Process phases one and two and returns the result as a string.

lexPhases_0()

A non-standard phase that just reads the file and returns its contents as a list of lines (including EOL characters). May raise an ExceptionCpipTokeniser if self has been created with None or the file is unreadable.

lexPhases_1(theLineS)

ISO/IEC 14882:1998(E) 2.1 Phases of translation [lex.phases] - Phase one. Takes a list of lines (including EOL characters), replaces trigraphs and returns the new list of lines.

lexPhases_2(theLineS)

ISO/IEC 14882:1998(E) 2.1 Phases of translation [lex.phases] - Phase two. This joins physical to logical lines. NOTE: This side-effects the supplied lines and returns None.
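Phase two line splicing can be sketched as an in-place operation on a list of lines, matching the documented contract of modifying the supplied list and returning None. The function name splice_lines is hypothetical, and this sketch removes the joined line from the list, so it does not preserve any physical-to-logical line bookkeeping that the real tokeniser may keep:

```python
def splice_lines(lines):
    """Join each line ending in backslash-newline with the next line.

    Modifies ``lines`` in place and returns None.
    """
    i = 0
    while i < len(lines):
        if lines[i].endswith('\\\n') and i + 1 < len(lines):
            # Delete the backslash and newline, then append the next line.
            lines[i] = lines[i][:-2] + lines[i + 1]
            del lines[i + 1]
            # Do not advance: the joined line may end in another splice.
        else:
            i += 1
```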

next()

The token generator. On being called this performs translation phases 1, 2 and 3 (unless already done) and then generates pairs of: (preprocessing token, token type). Token type is an enumerated integer from LEX_PPTOKEN_TYPES. Preprocessing tokens include sequences of whitespace characters and these are not necessarily concatenated i.e. this generator can produce more than one whitespace token in sequence. TODO: Rename this to ppTokens() or something

pLineCol

Returns the current physical (line, column) as integers.

reduceToksToHeaderName(theToks)

This takes a list of PpTokens and returns a list of PpTokens that might have a header-name token type in them. May raise an ExceptionCpipTokeniser if tokens are not all consumed. This is used at lexer level for re-interpreting PpTokens in the context of a #include directive.

resetTokType()

Erases the memory of the previously seen token type.

substAltToken(tok)

If a PpToken is a Digraph this alters its value to its alternative. If not the supplied token is returned unchanged. There are no side effects on self.

cpip.core.PpTokeniser.TRIGRAPH_PREFIX = '?'

Note: This is redoubled

cpip.core.PpTokeniser.TRIGRAPH_SIZE = 3

Well it is a Trigraph

cpip.core.PpTokeniser.TRIGRAPH_TABLE = {"'": '^', '/': '\\', '=': '#', '!': '|', '<': '{', ')': ']', '(': '[', '>': '}', '-': '~'}

Map of Trigraph alternates after the ?? prefix
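To show how the three trigraph constants fit together, here is a minimal stand-alone sketch of trigraph replacement on a single line. The function name replace_trigraphs is hypothetical and the constants are copied from the values listed above; this is an illustration, not CPIP's implementation.

```python
# Values copied from the module constants documented above.
TRIGRAPH_PREFIX = '?'
TRIGRAPH_SIZE = 3
TRIGRAPH_TABLE = {
    "'": '^', '/': '\\', '=': '#', '!': '|', '<': '{',
    ')': ']', '(': '[', '>': '}', '-': '~',
}


def replace_trigraphs(line):
    """Return line with each ??x trigraph replaced by its single character."""
    out = []
    i = 0
    while i < len(line):
        chunk = line[i:i + TRIGRAPH_SIZE]
        if (len(chunk) == TRIGRAPH_SIZE
                and chunk[0] == TRIGRAPH_PREFIX
                and chunk[1] == TRIGRAPH_PREFIX
                and chunk[2] in TRIGRAPH_TABLE):
            # The prefix is doubled, then one table key follows.
            out.append(TRIGRAPH_TABLE[chunk[2]])
            i += TRIGRAPH_SIZE
        else:
            out.append(line[i])
            i += 1
    return ''.join(out)
```

With this sketch '??=define X ??( ??)' becomes '#define X [ ]', and a sequence such as '???' that does not end in a table key is left unchanged.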

PpWhitespace

Understands whitespacey things about source code character streams.

cpip.core.PpWhitespace.DEFINE_WHITESPACE = {' ', '\n', '\t'}

Whitespace characters that are significant in define statements. ISO/IEC 14882:1998(E) 16-2: only ' ' and '\t' as whitespace.

cpip.core.PpWhitespace.LEN_WHITESPACE_CHARACTER_SET = 5

Number of whitespace characters

cpip.core.PpWhitespace.LEX_NEWLINE = '\n'

Whitespace newline

cpip.core.PpWhitespace.LEX_WHITESPACE = {'\n', ' ', '\x0b', '\t', '\x0c'}

Whitespace characters

class cpip.core.PpWhitespace.PpWhitespace

A class that does whitespacey type things in accordance with ISO/IEC 9899:1999(E) Section 6 and ISO/IEC 14882:1998(E).

hasLeadingWhitespace(theCharS)

Returns True if any leading whitespace, False if zero length or starts with non-whitespace.

isAllMacroWhitespace(theCharS)

Returns True if theCharS is zero length or only has allowable whitespace for preprocessing macros.

ISO/IEC 14882:1998(E) 16-2: only ' ' and '\t' as whitespace.

isAllWhitespace(theCharS)

Returns True if the supplied string is all whitespace.

isBreakingWhitespace(theCharS)

Returns True if whitespace leads theCharS and that whitespace contains a newline.

preceedsNewline(theCharS)

Returns True if theCharS ends with a newline, i.e. this immediately precedes a new line.

sliceNonWhitespace(theBuf, theOfs=0)

Returns the length of non-whitespace characters that are in theBuf from position theOfs.

sliceWhitespace(theBuf, theOfs=0)

Returns the length of whitespace characters that are in theBuf from position theOfs.
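The two slice methods can be pictured with the following stand-alone sketch. The function names are hypothetical and the whitespace set is copied from LEX_WHITESPACE above; it only illustrates the "length from an offset" contract described here and is not CPIP's code.

```python
# Copied from the LEX_WHITESPACE constant documented above.
LEX_WHITESPACE = {'\n', ' ', '\x0b', '\t', '\x0c'}


def slice_whitespace(buf, ofs=0):
    """Length of the run of whitespace in buf starting at ofs."""
    n = 0
    while ofs + n < len(buf) and buf[ofs + n] in LEX_WHITESPACE:
        n += 1
    return n


def slice_non_whitespace(buf, ofs=0):
    """Length of the run of non-whitespace in buf starting at ofs."""
    n = 0
    while ofs + n < len(buf) and buf[ofs + n] not in LEX_WHITESPACE:
        n += 1
    return n
```

Alternating the two calls, each time advancing the offset by the returned length, walks a buffer as whitespace and non-whitespace runs.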

PragmaHandler

exception cpip.core.PragmaHandler.ExceptionPragmaHandler

Simple specialisation of an exception class for the PragmaHandler. If raised this will cause the PpLexer to register undefined behaviour.

exception cpip.core.PragmaHandler.ExceptionPragmaHandlerStopParsing

Exception class for the PragmaHandler to stop parsing token stream.

class cpip.core.PragmaHandler.PragmaHandlerABC

Abstract base class for a pragma handler.

isLiteral

Treat the result of pragma() literally so no further processing required.

pragma(theTokS)

Takes a list of PpTokens, processes them and should return a newline terminated string that will be preprocessed in the current environment.

replaceTokens

A boolean attribute that says whether the supplied tokens should be macro replaced before being passed to self.

class cpip.core.PragmaHandler.PragmaHandlerEcho

A pragma handler that retains the #pragma line verbatim.

isLiteral

This class is just going to echo the line back complete with the ‘#pragma’ prefix. If the PpLexer re-interpreted this it would be an infinite loop.

pragma(theTokS)

Consume and return.

replaceTokens

Tokens do not require macro replacement.

class cpip.core.PragmaHandler.PragmaHandlerNull

A pragma handler that does nothing.

pragma(theTokS)

Consume and return.

replaceTokens

Tokens do not require macro replacement.

class cpip.core.PragmaHandler.PragmaHandlerSTDC

Base class for a pragma handler that implements ISO/IEC 9899:1999 (E) 6.10.6 Pragma directive para. 2.

DIRECTIVES = ('FP_CONTRACT', 'FENV_ACCESS', 'CX_LIMITED_RANGE')

Standard C acceptable macro directives

ON_OFF_SWITCH_STATES = ('ON', 'OFF', 'DEFAULT')

Standard C macro states

STDC = 'STDC'

Standard C macro

pragma(theTokS)

Inject a macro declaration into the environment.

See ISO/IEC 9899:1999 (E) 6.10.6 Pragma directive para. 2.

replaceTokens

STDC lines do not require macro replacement.

cpip.util

BufGen

A generator class with a buffer. This allows multiple inspections of the stream issued by a generator. For example this is used by MaxMunchGen.

class cpip.util.BufGen.BufGen(theGen)

A generator class with a buffer.

gen()

Yield objects from the generator via the buffer.

lenBuf

Returns the length of the existing buffer. NOTE: This may not be the final length as the generator might not be exhausted just yet.

replace(theIdx, theLen, theValueS)

Replaces within the buffer starting at theIdx removing theLen objects and replacing them with theValueS.

slice(sliceLen)

Returns a buffer slice of length sliceLen.
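The idea of a generator with a buffer that allows repeated inspection can be sketched as follows. The class name BufferedGen and the method peek are hypothetical, and the behaviour of slice here (removing the items it returns) is an assumption rather than a statement about BufGen itself.

```python
class BufferedGen(object):
    """Hypothetical sketch: look ahead in a generator without losing items."""

    def __init__(self, gen):
        self._gen = iter(gen)
        self._buf = []

    @property
    def len_buf(self):
        """Number of items currently buffered (may grow later)."""
        return len(self._buf)

    def peek(self, idx):
        """Return item idx, pulling from the generator only as far as needed."""
        while len(self._buf) <= idx:
            try:
                self._buf.append(next(self._gen))
            except StopIteration:
                raise IndexError(idx)
        return self._buf[idx]

    def slice(self, n):
        """Remove and return the first n buffered items (assumed behaviour)."""
        head, self._buf = self._buf[:n], self._buf[n:]
        return head
```

A consumer such as a maximal munch tokeniser can peek ahead several items, decide how many belong to the current token, and then take exactly that many with slice.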

exception cpip.util.BufGen.ExceptionBufGen

Exception specialisation for BufGen.

CommonPrefix

Created on 23 Feb 2014

@author: paulross

cpip.util.CommonPrefix.lenCommonPrefix(iterable)

Returns the length of the common prefix of a list of file names. The prefix is limited to directory names.
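One way to compute such a directory-limited prefix is sketched below. The function name is hypothetical, the sketch assumes '/' as the path separator, and it counts the trailing separator in the length; whether lenCommonPrefix does the same is not stated here.

```python
import os


def len_common_dir_prefix(paths):
    """Length of the longest common prefix that ends on a directory boundary."""
    paths = list(paths)
    if not paths:
        return 0
    # commonprefix() compares character by character, so it can stop in the
    # middle of a name; trim back to the last separator.
    prefix = os.path.commonprefix(paths)
    cut = prefix.rfind('/')  # Assumption: '/' separated paths.
    return cut + 1 if cut >= 0 else 0
```

For '/usr/include/a.h' and '/usr/include/ab.h' the character-wise prefix is '/usr/include/a', which this sketch trims to '/usr/include/'.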

DictTree

A dictionary that takes a list of hashables as a key and behaves like a tree.

class cpip.util.DictTree.DictTree(valIterable=None)

A dictionary that takes a list of hashables as a key and behaves like a tree

add(k, v)

Add a key/value. k is a list of hashables.

depth()

Returns the maximum tree depth as an integer.

keys()

Return a list of keys where each key is a list of hashables.

remove(k, v=None)

Remove a key/value. k is a list of hashables.

value(k)

Value corresponding to a key or None. k is a list of hashables.

values()

Returns a list of all values.
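The following stand-alone sketch shows how a dictionary keyed by lists of hashables can behave like a tree. The class name ListKeyTree is hypothetical and the sketch treats a value of None as "no value stored", which is an assumption made for brevity, not a description of DictTree.

```python
class ListKeyTree(object):
    """Hypothetical sketch of a tree addressed by a list of hashables."""

    def __init__(self):
        self._children = {}
        self._value = None

    def add(self, k, v):
        """Store v at the node reached by following each part of k."""
        node = self
        for part in k:
            node = node._children.setdefault(part, ListKeyTree())
        node._value = v

    def value(self, k):
        """Value stored at k, or None if there is no such node."""
        node = self
        for part in k:
            node = node._children.get(part)
            if node is None:
                return None
        return node._value

    def depth(self):
        """Maximum number of key parts on any path from this node."""
        if not self._children:
            return 0
        return 1 + max(c.depth() for c in self._children.values())

    def keys(self):
        """All keys with a stored value, each as a list of parts."""
        result = []
        for part in sorted(self._children):
            child = self._children[part]
            if child._value is not None:
                result.append([part])
            result.extend([part] + tail for tail in child.keys())
        return result
```

Each key part selects one level of the tree, so keys that share leading parts share nodes.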

class cpip.util.DictTree.DictTreeHtmlTable(*args)

A sub-class of DictTree that helps writing HTML row/col span tables. Suppose we have a tree like this:

                        |- AAA
                        |
                |- AA --|- AAB
                |       |
                |       |- AAC
        |- A ---|
Root ---|       |- AB
        |       |
        |       |- AC ---- ACA
        |
        |- B
        |
        |- C ---- CA ---- CAA

And we want to represent the tree like this when laid out as an HTML table:

|-----------------------|
| A     | AA    | AAA   |
|       |       |-------|
|       |       | AAB   |
|       |       |-------|
|       |       | AAC   |
|       |---------------|
|       | AB            |
|       |---------------|
|       | AC    | ACA   |
|-----------------------|
| B                     |
|-----------------------|
| C     | CA    | CAA   |
|-----------------------|

In this example the tree is loaded branch by branch thus:

myTree = DictTreeHtmlTable()
myTree.add(('A', 'AA', 'AAA'), None)
myTree.add(('A', 'AA', 'AAB'), None)
myTree.add(('A', 'AA', 'AAC'), None)
myTree.add(('A', 'AB',), None)
myTree.add(('A', 'AC', 'ACA'), None)
myTree.add(('B',), None)
myTree.add(('C', 'CA', 'CAA'), None)

The HTML code generator can be used like this:

# Write: <table border="2" width="100%">
for anEvent in myTree.genColRowEvents():
    if anEvent == myTree.ROW_OPEN:
        # Write out the '<tr>' element
        pass
    elif anEvent == myTree.ROW_CLOSE:
        # Write out the '</tr>' element
        pass
    else:
        k, v, r, c = anEvent
        # Write '<td rowspan="%d" colspan="%d">%s</td>' % (r, c, v)
# Write: </table>

And the HTML code will look like this:

<table border="2" width="100%">
    <tr valign="top">
        <td rowspan="5">A</td>
        <td rowspan="3">AA</td>
        <td>AAA</td>
    </tr>
    <tr valign="top">
        <td>AAB</td>
    </tr>
    <tr valign="top">
        <td>AAC</td>
    </tr>
    <tr valign="top">
        <td colspan="2">AB</td>
    </tr>
    <tr valign="top">
        <td>AC</td>
        <td>ACA</td>
    </tr>
    <tr valign="top">
        <td colspan="3">B</td>
    </tr>
    <tr valign="top">
        <td>C</td>
        <td>CA</td>
        <td>CAA</td>
    </tr>
</table>

genColRowEvents()

Returns a set of events that are quadruples: (key_branch, value, rowspan_int, colspan_int). The branch is a list of keys from the branch of the tree. The rowspan and colspan are both integers. At the start of a <tr> a ROW_OPEN will be yielded and at row end (</tr>) a ROW_CLOSE will be yielded.

setColRowSpan()

Top level call that sets colspan and rowspan attributes.

exception cpip.util.DictTree.ExceptionDictTree

Exception when handling a DictTree object.

DirWalk

Provides various ways of walking a directory tree

Created on Jun 9, 2011

exception cpip.util.DirWalk.ExceptionDirWalk

Exception class for this module.

class cpip.util.DirWalk.FileInOut(filePathIn, filePathOut)

A pair of (in, out) file paths

filePathIn

Alias for field number 0

filePathOut

Alias for field number 1

cpip.util.DirWalk.dirWalk(theIn, theOut=None, theFnMatch=None, recursive=False, bigFirst=False)

Walks a directory tree generating file paths.

theIn
The input directory.
theOut
The output directory. If None then input file paths as strings will be generated. If non-None this function will yield FileInOut(in, out) objects. NOTE: This does not create the output directory structure, it is up to the caller to do that.
theFnMatch
A glob like match pattern for file names (not tested for directory names). Can be a list of strings any of which can match. If None or empty list then all files match.
recursive
Boolean to recurse into directories or not.
bigFirst
If True then the largest files in directory are given first. If False it is alphabetical.

cpip.util.DirWalk.genBigFirst(d)

Generator that yields the biggest files (name not path) first. This is fairly simple in that it only looks at the current directory, not sub-directories. Useful for multiprocessing.
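A generator with this behaviour can be sketched with the standard library alone. The function name gen_big_first is hypothetical and the tie-break on name is an assumption added to make the order deterministic; this is not CPIP's code.

```python
import os


def gen_big_first(d):
    """Yield the names of regular files in directory d, largest first."""
    sized = []
    for name in os.listdir(d):
        path = os.path.join(d, name)
        if os.path.isfile(path):
            # Sub-directories are skipped, only the given directory is read.
            sized.append((os.path.getsize(path), name))
    # Largest size first; equal sizes are ordered by name (an assumption).
    for _size, name in sorted(sized, key=lambda t: (-t[0], t[1])):
        yield name
```

Handing the largest files to worker processes first tends to reduce the time that workers sit idle at the end of a run, which is presumably why the ordering is useful for multiprocessing.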

HtmlUtils

HTML utility functions.

cpip.util.HtmlUtils.pathSplit(p)

Split a path into its components.

Returns a string that is a link to a HTML file.

theSrcPath : str
The path of the original source, this will be encoded with retHtmlFileName().
theLineNum : int
An integer line number in the target.

cpip.util.HtmlUtils.retHtmlFileName(thePath)

Creates a unique, short, human readable file name based on the input file path.

cpip.util.HtmlUtils.writeCharsAndSpan(theS, theText, theSpan)

Write theText to the stream theS. If theSpan is not None the text is enclosed in a <span class=theSpan> element.

theS
The XHTML stream.
theText : str
The text to write, must be non-empty.
theSpan : str, optional
CSS class for the text.

cpip.util.HtmlUtils.writeDictTreeAsTable(theS, theDt, tableAttrs, includeKeyTail)

Writes a DictTreeHtmlTable object as a table, for example as a directory structure.

The key list in the DictTreeHtmlTable object is the path to the file i.e. os.path.abspath(p).split(os.sep) and the value is expected to be a pair of (link, nav_text) or None.

cpip.util.HtmlUtils.writeFileListAsTable(theS, theFileLinkS, tableAttrs, includeKeyTail)

Writes a list of file names as an HTML table looking like a directory structure. theFileLinkS is a list of pairs (file_path, href). The navigation text in the cell will be the basename of the file_path.

cpip.util.HtmlUtils.writeFileListTrippleAsTable(theS, theFileLinkS, tableAttrs, includeKeyTail)

Writes a list of file names as an HTML table looking like a directory structure. theFileLinkS is a list of triples (file_name, href, nav_text).

cpip.util.HtmlUtils.writeFilePathsAsTable(valueType, theS, theKvS, tableStyle, fnTd, fnTrTh=None)

Writes file paths as a table, for example as a directory structure.

valueType
The type of the value: None | 'list' | 'set'
theS
The HTML stream.
theKvS: list
A list of pairs (file_path, value).
tableStyle: str
The style used for the table.
fnTd

A callback function that is executed for a <td> element when there is a non-None value. This is called with the following arguments:

theS
The HTML stream.
attrs : dict
A map of attrs that include the rowspan/colspan for the <td>
k : list
The key as a list of path components.
v
The value given by the caller.
fnTrTh

Callback function for the header that will be called with the following arguments:

theS
The HTML stream.
pathDepth
Maximum depth of the largest path, this can be used for <th colspan="...">File path</th>.

cpip.util.HtmlUtils.writeHtmlFileAnchor(theS, theLineNum, theText='', theClass=None, theHref=None)

Writes an anchor.

theS
The XHTML stream.
theLineNum : int
An integer line number in the target.
theText : str, optional
Navigation text.
theClass : str, optional
CSS class for the navigation text.
theHref : str, optional
The href=.

Writes a link to another HTML file that represents source code.

theS
The XHTML stream.
theSrcPath : str
The path of the original source, this will be encoded with retHtmlFileName().
theLineNum : int
An integer line number in the target.
theText : str, optional
Navigation text.
theClass : obj, optional
CSS class for the navigation text.

ListGen

Treats a list as a generator with an optional additional generator. This is used for macro replacement for example.

class cpip.util.ListGen.ListAsGenerator(theList, theGen=None)

Class that takes a list and provides a generator on that list. If the list is exhausted and a call for another object is made then it is pulled off the generator (if available).

The attribute listIsEmpty is True if the immediate list is empty.

Iterating through the result can be stopped when the list is exhausted by using the flag listIsEmpty.

To be clear about when this flag is set, suppose we have a list [0, 1, 2, 3] followed by ['A', 'B', 'C'] thus:

myObj = ListAsGenerator(range(4), ListAsGenerator(list('ABC')).next())

And we try to iterate over it with a list comprehension:

myGen = myObj.next()
myResult = [x for x in myGen if not myObj.listIsEmpty]

myResult will be [0, 1, 2] because when 3 is yielded the flag is already True, as it refers to the _next_ item.

Similarly the list comprehension:

myResult = [x for x in myGen if myObj.listIsEmpty]

Will be [3, 'A', 'B', 'C']

If you want to recover the list then this is the technique:

myResult = []
if not myObj.listIsEmpty:
    for aVal in myGen:
        myResult.append(aVal)
        if myObj.listIsEmpty:
            break

Or, to exclude the list, this is the technique:

if not myObj.listIsEmpty:
    for aVal in myGen:
        if myObj.listIsEmpty:
            break
myResult = [x for x in myGen]

The rationale for this behaviour is for generating macro replacement tokens in that the list contains tokens for re-examination and the last token may turn out to be a function like macro that needs the generator to (possibly) complete the expansion. Once that last token has been re-examined we do not want to consume any more tokens than necessary.

listIsEmpty

True if the next yield would come from the generator, not the list.

next()

Yields the next value. The attribute listIsEmpty will be set True immediately before yielding the last value of the list.
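The flag semantics can be reproduced with a short stand-alone sketch. The class ListThenGen and its attribute list_is_empty are hypothetical names; the sketch follows the documented rule that the flag turns True immediately before the last list value is yielded, and is not CPIP's implementation.

```python
class ListThenGen(object):
    """Hypothetical sketch: yield from a list, then from a generator."""

    def __init__(self, the_list, the_gen=None):
        self._list = list(the_list)
        self._gen = the_gen
        self._idx = 0

    @property
    def list_is_empty(self):
        """True if the next yield would not come from the list."""
        return self._idx >= len(self._list)

    def next(self):
        while self._idx < len(self._list):
            value = self._list[self._idx]
            # Advance first, so the flag already describes the *next* item
            # at the moment the caller receives this one.
            self._idx += 1
            yield value
        if self._gen is not None:
            for value in self._gen:
                yield value
```

With the list [10, 20, 30] followed by a generator of 'A' and 'B', filtering on "not list_is_empty" keeps [10, 20], while filtering on "list_is_empty" keeps [30, 'A', 'B'].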

MatrixRep

Makes replacements in a list of lines.

exception cpip.util.MatrixRep.ExceptionMatrixRep

Simple specialisation of an exception class for MatrixRep.

class cpip.util.MatrixRep.MatrixRep

Makes replacements in a list of lines.

addLineColRep(l, c, was, now)

Adds to the IR. No test is made to see if there is an existing or pre-existing conflicting entry or if a sequence of entries makes sense. It is expected that callers call this in line/column order of the original matrix. If not the results of a subsequent call to sideEffect() are undefined.

sideEffect(theMat)

Makes the replacement. If line/col is out of range an ExceptionMatrixRep will be raised and the state of theMat argument is undefined.
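The record-then-apply pattern described for MatrixRep can be sketched as below. The class and method names are hypothetical, and the sketch raises the built-in IndexError and ValueError instead of ExceptionMatrixRep. Applying the replacements on each line from right to left is a choice made in this sketch so that earlier columns stay valid; it is not a statement about how CPIP does it.

```python
class LineColReplacer(object):
    """Hypothetical sketch: record (line, col) replacements, apply later."""

    def __init__(self):
        self._reps = {}  # line index -> list of (col, was, now)

    def add(self, line, col, was, now):
        """Record that text ``was`` at (line, col) should become ``now``."""
        self._reps.setdefault(line, []).append((col, was, now))

    def apply(self, lines):
        """Rewrite the list of strings ``lines`` in place."""
        for line, reps in self._reps.items():
            if line >= len(lines):
                raise IndexError('line %d out of range' % line)
            text = lines[line]
            # Right to left, so columns of pending replacements do not shift.
            for col, was, now in sorted(reps, reverse=True):
                if text[col:col + len(was)] != was:
                    raise ValueError('no %r at line %d col %d' % (was, line, col))
                text = text[:col] + now + text[col + len(was):]
            lines[line] = text
```

For example, recording '??=' at column 2 and '??(' at column 8 of the line 'a ??= b ??( c' and then applying gives 'a # b [ c'.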

MaxMunchGen

Generic Maximal Munch generator.

exception cpip.util.MaxMunchGen.ExceptionMaxMunchGen

Exception specialisation for MaxMunchGen.

class cpip.util.MaxMunchGen.MaxMunchGen(theGen, theFnS, isExclusive=False, yieldReplacement=False)

Provides a generator that applies Maximal munch rules.

gen()

Yields a maximal munch. If yieldReplacement is False these will be pairs of (iterable, kind) where kind is from the function, any replacement will be done on the fly. If yieldReplacement is True these will be triples of (iterable, kind, repl) where kind and repl are from the function with repl being None if no replacement. No replacement will have been done.

TODO: Reconsider this design. Really yieldReplacement decides if the underlying generator buffer contains the replacement rather than whether self yields the replacement.

cpip.util.MaxMunchGen.anyToken(theGen)

A function that always reads one token. This can be used as the last registered function to ensure that the token stream is read to completion. The kind returned is None.
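The maximal munch rule itself is simple to sketch: at each position every registered function is tried and the longest match wins. The code below is a stand-alone illustration over a string, with hypothetical names throughout (literal, match_identifier, any_char, max_munch); it does not use MaxMunchGen or its function signature.

```python
def literal(text, kind):
    """Build a matcher for one fixed lexeme."""
    def matcher(buf, pos):
        return (len(text) if buf.startswith(text, pos) else 0), kind
    return matcher


def match_identifier(buf, pos):
    """Match letters and underscores, with digits allowed after the start."""
    end = pos
    while end < len(buf) and (buf[end].isalpha() or buf[end] == '_'
                              or (end > pos and buf[end].isdigit())):
        end += 1
    return end - pos, 'identifier'


def any_char(buf, pos):
    """Fallback that always consumes one character; its kind is None."""
    return 1, None


def max_munch(buf, matchers):
    """Yield (lexeme, kind) pairs, always taking the longest match."""
    pos = 0
    while pos < len(buf):
        best_len, best_kind = 0, None
        for fn in matchers:
            length, kind = fn(buf, pos)
            if length > best_len:  # On a tie the earlier matcher wins.
                best_len, best_kind = length, kind
        if best_len == 0:
            raise ValueError('nothing matched at offset %d' % pos)
        yield buf[pos:pos + best_len], best_kind
        pos += best_len


MATCHERS = [
    literal('>', 'gt'), literal('>>', 'shr'), literal('>>=', 'shr_assign'),
    match_identifier, any_char,
]
```

Registering any_char last plays the same role as anyToken above: it guarantees progress, so the input is always read to completion.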

OaS

Various utility functions etc. that don’t obviously fit elsewhere.

exception cpip.util.OaS.ExceptionOas

Simple specialisation of an exception class for this module.

cpip.util.OaS.indexLB(l, v)

Returns the lower bound index in a sorted list l of the value that is equal to v or the nearest lower value to v. Returns -1 if l empty or all values higher than v.

cpip.util.OaS.indexMatch(l, v)

Returns the index of v in sorted list l or -1. This uses Jon Bentley’s binary search algorithm. This uses operators > and <.

cpip.util.OaS.indexUB(l, v)

Returns the upper bound index in a sorted list l of the value that is equal to v or the nearest upper value to v. Returns -1 if l empty or all values lower than v.
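The three lookups can be expressed with the standard library bisect module, as in the sketch below. This swaps in bisect for the binary search mentioned above, uses hypothetical function names, and picks the last of several equal values for the lower bound and the first for the upper bound, which is an assumption since the text does not say how duplicates are treated.

```python
import bisect


def index_lb(l, v):
    """Index of the greatest element <= v in sorted list l, or -1."""
    return bisect.bisect_right(l, v) - 1


def index_ub(l, v):
    """Index of the smallest element >= v in sorted list l, or -1."""
    i = bisect.bisect_left(l, v)
    return i if i < len(l) else -1


def index_match(l, v):
    """Index of v in sorted list l, or -1 if v is absent."""
    i = bisect.bisect_left(l, v)
    return i if i < len(l) and l[i] == v else -1
```

For the list [10, 20, 30] the value 25 has lower bound index 1, upper bound index 2 and no exact match.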

StrTree

Treats a string as a tree.

class cpip.util.StrTree.StrTree(theIterable=None)

Initialise the class with an optional list of strings.

add(s)

Add a string.

has(s, i=0)

Returns the index of the end of s that matches a complete word in the tree, i.e. s[i:return_value] is in the dictionary. Note IndexError and KeyError are trapped here.

values()

Returns all values.
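Treating strings as a tree is essentially a trie, which the following stand-alone sketch illustrates. The class name StringTrie is hypothetical. Two details are assumptions of this sketch and not taken from StrTree: has() returns the end of the longest matching word, and it returns None when nothing matches.

```python
class StringTrie(object):
    """Hypothetical sketch of a character-by-character tree of words."""

    _END = object()  # Marker key for "a complete word ends here".

    def __init__(self, words=None):
        self._root = {}
        for w in words or ():
            self.add(w)

    def add(self, s):
        node = self._root
        for ch in s:
            node = node.setdefault(ch, {})
        node[self._END] = True

    def has(self, s, i=0):
        """End index of the longest word starting at s[i], or None."""
        node = self._root
        best = None
        j = i
        while True:
            if self._END in node:
                best = j
            if j >= len(s) or s[j] not in node:
                return best
            node = node[s[j]]
            j += 1
```

With the words '>', '>>' and '>>=' loaded, has('a>>b', 1) returns 3, so s[1:3] is the matched word '>>'.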

Tree

Represents a simple tree.

Created on 6 Mar 2014

@author: paulross

class cpip.util.Tree.DuplexAdjacencyList

Represents a set of parent/child relationships (and their inverse) as Adjacency Lists.

allChildren

Returns an unordered list of objects that have at least one parent.

allParents

Returns an unordered list of objects that have at least one child.

children(parent)

Returns all immediate children of a given parent.

hasChild(child)

Returns True if the given child has any parents.

hasParent(parent)

Returns True if the given parent has any children.

parents(child)

Returns all immediate parents of a given child.

treeChildParent(theObj)

Returns a Tree() object where the links are the relationships between child and parent. Cycles are not reproduced, i.e. if a -> b and b -> c and c -> a then:

treeChildParent('a') returns ['a', 'c', 'b',]
treeChildParent('b') returns ['b', 'a', 'c',]
treeChildParent('c') returns ['c', 'b', 'a',]

treeParentChild(theObj)

Returns a Tree() object where the links are the relationships between parent and child. Cycles are not reproduced, i.e. if a -> b and b -> c and c -> a then:

treeParentChild('a') returns ['a', 'b', 'c',]
treeParentChild('b') returns ['b', 'c', 'a',]
treeParentChild('c') returns ['c', 'a', 'b',]
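Unrolling an adjacency list into a tree without reproducing cycles can be sketched as a depth-first walk that refuses to revisit any node already on the current path. The function below is hypothetical, takes a plain dict of parent to children, and returns a flat visiting order instead of a Tree() object.

```python
def walk_without_cycles(children, start):
    """Depth-first order from start, skipping nodes already on the path.

    ``children`` maps a parent to a list of its immediate children.
    """
    order = []

    def visit(node, path):
        order.append(node)
        for child in children.get(node, ()):
            if child not in path:
                visit(child, path | {child})

    visit(start, {start})
    return order


# A three-node cycle: a -> b, b -> c, c -> a.
CYCLE = {'a': ['b'], 'b': ['c'], 'c': ['a']}
```

Starting at any node of the cycle, each of the three nodes is visited once and the walk stops when it would return to the start. Walking the inverse mapping in the same way gives the child to parent view.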

class cpip.util.Tree.Tree(obj)

Represents a simple tree of objects.

branches()

Returns all the possible branches through the tree as a list of lists of self._obj.

youngestChild

The latest child to be added, may raise IndexError if no children.

XmlWrite

Writes XML and XHTML.

class cpip.util.XmlWrite.Element(theXmlStream, theElemName, theAttrs=None)

Represents an element in a markup stream.

exception cpip.util.XmlWrite.ExceptionXml

Exception specialisation for the XML writer.

exception cpip.util.XmlWrite.ExceptionXmlEndElement

Exception specialisation for end of element.

cpip.util.XmlWrite.RAISE_ON_ERROR = True

Global flag that sets the error behaviour. If True then this module may raise an ExceptionXml and that might mask other exceptions. If False no ExceptionXml will be raised but a logging.error(...) will be written. These will not mask other Exceptions.

class cpip.util.XmlWrite.XmlStream(theFout, theEnc='utf-8', theDtdLocal=None, theId=0, mustIndent=True)

Creates and maintains an XML output stream.

characters(theString)

Encodes the string and writes it to the output.

comment(theS, newLine=False)

Writes a comment to the output stream.

endElement(name)

Ends an element.

id

A unique ID in this stream. The ID is incremented on each call.

literal(theString)

Writes theString to the output without encoding.

pI(theS)

Writes a Processing Instruction to the output stream.

startElement(name, attrs)

Opens a named element with attributes.

writeCDATA(theData)

Writes a CDATA section.

Example:

writeCSS(theCSSMap)

Writes a style sheet as a CDATA section. Expects a dict of dicts.

Example:

writeECMAScript(theScript)

Writes the ECMA script.

Example:

xmlSpacePreserve()

Suspends indentation for this element and its descendants.

cpip.util.XmlWrite.decodeString(theS)

Returns a string that is the argument decoded. May raise a TypeError.

cpip.util.XmlWrite.encodeString(theS, theCharPrefix='_')

Returns a string that is the argument encoded. RFC3548:

See section 3 of : http://www.faqs.org/rfcs/rfc3548.html

cpip.util.XmlWrite.nameFromString(theStr)

Returns a name from a string.

See http://www.w3.org/TR/1999/REC-html401-19991224/types.html#type-cdata

“ID and NAME tokens must begin with a letter ([A-Za-z]) and may be followed by any number of letters, digits ([0-9]), hyphens (“-”), underscores (“_”), colons (“:”), and periods (“.”).”

This also works in namespaces as ‘:’ is not used in the encoding.

cpip.plot

Coord

Main Classes

Most classes in this module are collections.namedtuple objects.

Class    Description                    Attributes
Dim      Linear dimension               value units
Box      A Box                          width depth
Pad      Padding around a tree object   prev next, parent child
Margin   Padding around an object       left right top bottom
Pt       A point in Cartesian space     x y

Reference

exception cpip.plot.Coord.ExceptionCoord

Exception class for representing Coordinates.

exception cpip.plot.Coord.ExceptionCoordUnitConvert

Exception raised when converting units.

cpip.plot.Coord.units()

Returns the unsorted list of acceptable units.

cpip.plot.Coord.convert(val, unitFrom, unitTo)

Convert a value from one set of units to another.
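Unit conversion of this kind is usually done through a common base unit, as in the sketch below. The table holds only the three units that appear in the examples in this section, with the ratios those examples imply (72 px and 25.4 mm to the inch). The function name and the use of ValueError are assumptions of the sketch; the module itself documents ExceptionCoordUnitConvert for conversion errors.

```python
# How many of each unit make one inch, as implied by the examples
# in this section. Other units the module may accept are not listed.
UNITS_PER_INCH = {'in': 1.0, 'px': 72.0, 'mm': 25.4}


def convert_units(val, unit_from, unit_to):
    """Convert val from unit_from to unit_to via inches."""
    if unit_from == unit_to:
        return val
    try:
        return val * UNITS_PER_INCH[unit_to] / UNITS_PER_INCH[unit_from]
    except KeyError as err:
        raise ValueError('unsupported unit: %s' % err)
```

So 72 'px' converts to 1.0 'in', and 25.4 'mm' converts to 72 'px' to within floating point error.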

class cpip.plot.Coord.Dim

Represents a dimension as an engineering value i.e. a number and units.

scale(factor)

Returns a new Dim() scaled by a factor, units are unchanged.

convert(u)

Returns a new Dim() with units changed and value converted.

__add__(other)

Overload self+other, returned result has the sum of self and other. The units chosen are self’s unless self’s units are None in which case other’s units are used (if not None).

__sub__(other)

Overload self-other, returned result has the difference of self and other. The units chosen are self’s unless self’s units are None in which case other’s units are used (if not None).

__iadd__(other)

Addition in place, value of other is converted to my units and added.

__isub__(other)

Subtraction in place, value of other is subtracted.

__lt__(other)

Returns true if self value < other value after unit conversion.

__le__(other)

Returns true if self value <= other value after unit conversion.

__eq__(other)

Returns true if self value == other value after unit conversion.

__ne__(other)

Returns true if self value != other value after unit conversion.

__gt__(other)

Returns true if self value > other value after unit conversion.

__ge__(other)

Returns true if self value >= other value after unit conversion.

class cpip.plot.Coord.Pad

Padding around another object that forms the Bounding Box. All 4 attributes are Dim() objects.

__str__()

Stringifying.

class cpip.plot.Coord.Pt

A point, an absolute x/y position on the plot area. Members are Coord.Dim().

__eq__(other)

Comparison.

__str__()

Stringifying.

convert(u)

Returns a new Pt() with units changed and value converted.

scale(factor)

Returns a new Pt() scaled by a factor, units are unchanged.

cpip.plot.Coord.baseUnitsDim(theLen)

Returns a Coord.Dim() of length and units BASE_UNITS.

cpip.plot.Coord.zeroBaseUnitsDim()

Returns a Coord.Dim() of zero length and units BASE_UNITS.

cpip.plot.Coord.zeroBaseUnitsBox()

Returns a Coord.Box() of zero dimensions and units BASE_UNITS.

cpip.plot.Coord.zeroBaseUnitsPad()

Returns a Coord.Pad() of zero dimensions and units BASE_UNITS.

cpip.plot.Coord.zeroBaseUnitsPt()

Returns a Coord.Pt() of zero dimensions and units BASE_UNITS.

cpip.plot.Coord.newPt(theP, incX=None, incY=None)

Returns a new Pt object by incrementing the existing point theP by incX and incY, each of which is a Dim() object or None.

cpip.plot.Coord.convertPt(theP, theUnits)

Returns a new point with the dimensions of theP converted to theUnits.

Examples

Coord.Dim()

Creation, addition and subtraction:

d = Coord.Dim(1, 'in') + Coord.Dim(18, 'px')
# d is 1.25 inches
d = Coord.Dim(1, 'in') - Coord.Dim(18, 'px')
# d is 0.75 inches
d += Coord.Dim(25.4, 'mm')
# d is 1.75 inches

Scaling and unit conversion returns a new object:

a = Coord.Dim(12, 'px')
b = a.scale(6.0)
# b is 72 pixels
c = b.convert('in')
# c is 1 inch

Comparison:

assert(Coord.Dim(1, 'in') == Coord.Dim(72, 'px'))
assert(Coord.Dim(1, 'in') >= Coord.Dim(72, 'px'))
assert(Coord.Dim(1, 'in') <= Coord.Dim(72, 'px'))
assert(Coord.Dim(1, 'in') > Coord.Dim(71, 'px'))
assert(Coord.Dim(1, 'in') < Coord.Dim(73, 'px'))

Coord.Pt()

Creation:

p = Coord.Pt(
        Coord.Dim(12, 'px'),
        Coord.Dim(24, 'px'),
        )
print(p)
# Prints: 'Pt(x=Dim(12px), y=Dim(24px))'
p.x # Coord.Dim(12, 'px')
p.y # Coord.Dim(24, 'px')
# Scale up by 6 and convert units
pIn = p.scale(6).convert('in')
# pIn now 'Pt(x=Dim(1in), y=Dim(2in))'

Testing

The unit tests are in test/TestCoord.py.

PlotNode

Bounding Boxes

Legend for the drawing below:

**** - Self sigma BB.
~~~~ - Self pad box
#### - Self width and depth.
.... - All children
++++ - Child[n] sigma BB.

i.e. For a child its ++++ is equivalent to my ****.

Points in the drawing below:

  • D - Self datum point.
  • S - Self plot datum point.
  • x[n] - Child datum point.
  • Pl - Parent landing point to self.
  • Pt - Parent take-off point from self.
  • P[n] - Self take off point and landing point to child n.
  • pl[n] - Child n landing point from self.
  • pt[n] - Child n take-off point to self.
  • tdc - Top dead centre.

Box .... has depth of max(Boxes(++++).width) and width max(Box(~~~~), sum(Boxes(++++).depth)).

Each instance of class knows about the following:

Boxes:

  • **** - Self sigma BB as computed Dim() objects: self.bbSigmaDepth
    and self.bbSigmaWidth. Or as computed Box() object self.bbSigma
  • ~~~~ - As computed Dim() objects: self.bbSelfWidth, self.bbSelfDepth
  • #### - Self width and depth as Dim() objects: self.width and self.depth
  • .... - All children as a Box() object: self.bbChildren

And padding between ~~~~ and .... as Dim() object self.bbSpaceChildren

i.e. not ++++ (Child[n] sigma BB): it is the caller that knows about the children.

Points: given D each instance of this class knows:

S, Pl, Pt, P[0] to P[N-1], x[0], tdc (only).

In the following diagram where lines are adjacent that means that there is no spacing between them. This diagram shows the root at top left and the children from left to right. The default plot of the include graph is to have the root at top left with the processed file centre left with the children running from top to bottom. It is felt that this is more intuitive for source code.

-|-----> x increases
 |
 |
\/
y increases

D ***************************************************************************
*                ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~                     *
*                ~                                    ~                     *
*                ~    S ### Pl ###tdc### Pt ######    ~                     *
*                ~    #                          #    ~                     *
*                ~    #                          #    ~                     *
*                ~    #         Parent           #    ~                     *
*                ~    #                          #    ~                     *
*                ~    ## P[0] ## P[c] ## P[C-1] ##    ~                     *
*                ~                                    ~                     *
*                ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~                     *
*                                 ^                                         *
*                                 | == self._bbSpaceChildren                *
*                                 |                                         *
*...........................................................................*
*.x[0] + pl[0] + pt[0] +x[c] + pl[c] + pt[c] ++++++++++++x[C-1]+pl/pt[C-1]+.*
*.+                    ++                               ++                +.*
*.+     Child[0]       ++                               ++                +.*
*.+                    ++                               ++   Child[C-1]   +.*
*.+++++++++++++++++++++++           Child[c]            ++                +.*
*.                      +                               +++++++++++++++++++.*
*.                      +                               +                  .*
*.                      +++++++++++++++++++++++++++++++++                  .*
*...........................................................................*
*****************************************************************************

Note: .... can be narrower than ~~~~

Vertices

The following show root at the left. Linking parent to child:

                    PC_land    PC_stop
                     |            |
                     x>>>>>>>>>>>>x
                    /
                   /
    x>>>>>>>>>>>>x/
    |            |
PC_roll        PC_to

PC_roll and PC_to are determined by the parent. PC_land and PC_stop are determined by the child.

And child to parent:

CP_stop     CP_land
    |          |
    x<<<<<<<<<<x\
                 \
                  \
                   x<<<<<<<<<<<<x
                   |            |
                CP_to        CP_roll

CP_roll and CP_to are determined by the child. CP_land and CP_stop are determined by the parent.

exception cpip.plot.PlotNode.ExceptionPlotNode

Exception when handling PlotNodeBbox object.

class cpip.plot.PlotNode.PlotNodeBbox

This is a class that can hold the width and depth of an object and the bounding box of self and the children. This can then compute various dimensions of self and children.

bbChildren

The bounding box of children as a Coord.Box() or None. i.e. the box ....

bbChildrenDepth

The bounding box depth of children as a Coord.Dim() or None. i.e. the depth of box ....

bbChildrenWidth

The bounding box width of children as a Coord.Dim() or None. i.e. the width of box ....

bbSelfDepth

The depth of self plus padding as a Coord.Dim(). i.e. the depth of box ~~~~

bbSelfPadding

The immediate padding around self as a Coord.Pad().

bbSelfWidth

The width of self plus padding as a Coord.Dim() or None. i.e. the width of box ~~~~

bbSigma

Bounding box of self and my children as a Coord.Box().

bbSigmaDepth

The depth of self+children as a Coord.Dim() or None in the case that I don’t exist and I have no children. i.e. the depth of box ****

bbSigmaWidth

The width of self+children as a Coord.Dim() or None in the case that I don’t exist and I have no children. i.e. the width of box ****

bbSpaceChildren

The additional distance to give to the children as a Coord.Dim().

box

The Coord.Box() of ####.

childBboxDatum(theDatum)

The point x[0] as a Coord.Pt() given theDatum as Coord.Pt() or None if no children.

depth

The immediate depth of the node, if None then no BB depth or bbSpaceChildren is allocated. i.e. the depth of box ####

extendChildBbox(theChildBbox)

Extends the child bounding box by the amount theChildBbox which should be a Coord.Box(). This extends the .... line.

hasSetArea

Returns True if width and depth are set, False otherwise.

plotPointCentre(theLd)

Returns the logical point at the centre of the box shown as #### above.

plotPointSelf(theDatum)

The point S as a Coord.Pt() given theDatum as Coord.Pt().

width

The immediate width of the node, if None then no BB width is allocated. i.e. the width of box ####

class cpip.plot.PlotNode.PlotNodeBboxBoxy

Sub-class for parent child edges that contact the corners of the box shown as #### above.

cpLand(theLd, childIndex)

The me-as-parent-from-child landing point given the logical datum as a Coord.Pt.

cpRoll(theLd)

The me-as-child-to-parent start point given the logical datum as a Coord.Pt.

cpStop(theLd, childIndex)

The me-as-parent-from-child stop point given the logical datum as a Coord.Pt.

cpTo(theLd)

The me-as-child-to-parent take off point given the logical datum as a Coord.Pt.

pcLand(theLd)

The parent-to-me-as-child landing point given the logical datum as a Coord.Pt.

pcRoll(theDatum, childIndex)

The me-as-parent-to-child logical start point given the logical datum as a Coord.Pt and the child ordinal. This gives equispaced points along the lower edge.

pcStop(theLd)

The parent-to-me-as-child stop point given the logical datum as a Coord.Pt.

pcTo(theDatum, childIndex)

The me-as-parent-to-child logical take off point given the logical datum as a Coord.Pt and the child ordinal. This gives equispaced points along the lower edge.
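
As a rough illustration of "equispaced points along the lower edge", here is a minimal sketch in plain Python. It is not CPIP code: the function name, the use of bare (x, y) tuples instead of Coord.Pt, and the choice of dividing the edge into N + 1 equal gaps for N children are all assumptions made for the example.

```python
def equispaced_lower_edge(datum, width, depth, num_children):
    """Return one (x, y) point per child, evenly spaced along the lower edge.

    datum is the logical top-left corner (x, y) of the box; y grows downwards.
    Assumption: N children split the edge into N + 1 equal gaps, so no point
    lands on a corner of the box.
    """
    x0, y0 = datum
    if num_children < 1:
        return []
    gap = width / (num_children + 1)
    return [(x0 + gap * (i + 1), y0 + depth) for i in range(num_children)]


# A box 40 wide and 8 deep with its top-left corner at (10, 20) and 3 children.
points = equispaced_lower_edge((10.0, 20.0), width=40.0, depth=8.0, num_children=3)
print(points)
```

The child ordinal (childIndex in the methods above) would then simply index into this list.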

class cpip.plot.PlotNode.PlotNodeBboxRoundy

Sub-class for parent child edges that contact the centre of the box shown as #### above.

cpLand(theDatumL, childIndex)

The me-as-parent-from-child landing point given the logical datum as a Coord.Pt.

cpRoll(theDatumL)

The me-as-child-to-parent start point given the logical datum as a Coord.Pt.

cpStop(theDatumL, childIndex)

The me-as-parent-from-child stop point given the logical datum as a Coord.Pt.

cpTo(theDatumL)

The me-as-child-to-parent take off point given the logical datum as a Coord.Pt.

pcLand(theDatumL)

The parent-to-me-as-child landing point given the logical datum as a Coord.Pt.

pcRoll(theDatumL, childIndex)

The me-as-parent-to-child logical start point given the logical datum as a Coord.Pt and the child ordinal. This gives equispaced points along the lower edge.

pcStop(theDatumL)

The parent-to-me-as-child stop point given the logical datum as a Coord.Pt.

pcTo(theDatumL, childIndex)

The me-as-parent-to-child logical take off point given the logical datum as a Coord.Pt and the child ordinal. This gives equispaced points along the lower edge.

SVGWriter

An SVG writer.

exception cpip.plot.SVGWriter.ExceptionSVGWriter

Exception class for SVGWriter.

class cpip.plot.SVGWriter.SVGCircle(theXmlStream, thePoint, theRadius, attrs=None)

A circle in SVG. See: http://www.w3.org/TR/2003/REC-SVG11-20030114/shapes.html#CircleElement

Initialise the circle with a stream, a Coord.Pt() and a Coord.Dim() object.

class cpip.plot.SVGWriter.SVGElipse(theXmlStream, ptFrom, theRadX, theRadY, attrs=None)

An ellipse in SVG. See: http://www.w3.org/TR/2003/REC-SVG11-20030114/shapes.html#EllipseElement

Initialise the ellipse with a stream, a Coord.Pt() and two Coord.Dim() objects (the x and y radii).

class cpip.plot.SVGWriter.SVGGroup(theXmlStream, attrs=None)

Initialise the group with a stream.

See: http://www.w3.org/TR/2003/REC-SVG11-20030114/struct.html#GElement

Sadly we can’t use **kwargs because of Python restrictions on keyword names. For example, stroke-width is not a valid keyword argument (although stroke_width would be). So instead we pass in an optional dictionary {string : string, ...}
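
The following minimal sketch shows why a dictionary is the natural fit here. It is plain Python and does not use SVGWriter: the helper open_tag is hypothetical and only mimics the idea of passing attributes as {string : string}.

```python
from xml.sax.saxutils import quoteattr


def open_tag(name, attrs=None):
    """Build an opening XML tag from an optional {string: string} dictionary."""
    parts = [name]
    # Sort the keys so that the output is deterministic.
    for key in sorted(attrs or {}):
        parts.append('{}={}'.format(key, quoteattr(attrs[key])))
    return '<{}>'.format(' '.join(parts))


# Writing open_tag('g', stroke-width='2') is a SyntaxError because
# 'stroke-width' is not a valid Python identifier. A dictionary key has no
# such restriction.
group_tag = open_tag('g', {'stroke': 'black', 'stroke-width': '2'})
print(group_tag)
```

With a dictionary the SVG attribute names are used verbatim, so no mapping from stroke_width to stroke-width is needed.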

class cpip.plot.SVGWriter.SVGLine(theXmlStream, ptFrom, ptTo, attrs=None)

A line in SVG. See: http://www.w3.org/TR/2003/REC-SVG11-20030114/shapes.html#LineElement

Initialise the line with a stream, and two Coord.Pt() objects.

class cpip.plot.SVGWriter.SVGPointList(theXmlStream, name, pointS, attrs)

An abstract class that takes a list of points; it is the base class of polyline and polygon.

Initialise the element with a stream, a name, and a list of Coord.Pt() objects.

NOTE: The units of the points are ignored, it is up to the caller to convert them to the User Coordinate System.
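
To illustrate the caller's responsibility, here is a small sketch in plain Python that converts points to user units first and only then formats them as the value of an SVG points attribute. The helper name and the scale factor are hypothetical, and bare (x, y) tuples stand in for Coord.Pt().

```python
def points_attribute(points):
    """Format (x, y) pairs, already in user coordinates, as an SVG points value."""
    return ' '.join('{},{}'.format(x, y) for x, y in points)


# Hypothetical scale for this example: user units per millimetre.
MM_TO_USER = 2.0

points_mm = [(0, 0), (10, 0), (10, 5)]
# The conversion happens here, in the caller, before the points are written.
points_user = [(x * MM_TO_USER, y * MM_TO_USER) for x, y in points_mm]
triangle = points_attribute(points_user)
print(triangle)
```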

class cpip.plot.SVGWriter.SVGPolygon(theXmlStream, pointS, attrs=None)

A polygon in SVG. See: http://www.w3.org/TR/2003/REC-SVG11-20030114/shapes.html#PolygonElement

Initialise the polygon with a stream, and a list of Coord.Pt() objects.

NOTE: The units of the points are ignored, it is up to the caller to convert them to the User Coordinate System.

class cpip.plot.SVGWriter.SVGPolyline(theXmlStream, pointS, attrs=None)

A polyline in SVG. See: http://www.w3.org/TR/2003/REC-SVG11-20030114/shapes.html#PolylineElement

Initialise the polyline with a stream, and a list of Coord.Pt() objects.

NOTE: The units of the points are ignored, it is up to the caller to convert them to the User Coordinate System.

class cpip.plot.SVGWriter.SVGRect(theXmlStream, thePoint, theBox, attrs=None)

Initialise the rectangle with a stream, a Coord.Pt() and a Coord.Box() object. See: http://www.w3.org/TR/2003/REC-SVG11-20030114/shapes.html#RectElement

Typical attributes: {'fill' : "blue", 'stroke' : "black", 'stroke-width' : "2", }

class cpip.plot.SVGWriter.SVGText(theXmlStream, thePoint, theFont, theSize, attrs=None)

Text in SVG. See: http://www.w3.org/TR/2003/REC-SVG11-20030114/text.html#TextElement

Initialise the text with a stream, a Coord.Pt() and font as a string and size as an integer. If thePoint is None then no location will be specified (for example for use inside a <defs> element).

class cpip.plot.SVGWriter.SVGWriter(theFile, theViewPort, rootAttrs=None, mustIndent=True)

Initialise the stream with a file and Coord.Box() object. The view port units must be the same for width and depth.

cpip.plot.SVGWriter.dimToTxt(theDim)

Converts a Coord.Dim() object to text for SVG units.
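
An SVG length is a number optionally followed by a unit identifier such as mm, in or px. The sketch below shows that idea in plain Python. It is an assumption-laden stand-in, not the CPIP implementation: Dim here is a hypothetical namedtuple with value and units fields, and length_to_text is an invented name.

```python
from collections import namedtuple

# Hypothetical stand-in for Coord.Dim(): a value plus a units string (or None).
Dim = namedtuple('Dim', 'value units')


def length_to_text(dim):
    """Return the dimension as SVG length text, e.g. '12.5mm'."""
    if dim.units is None:
        # No units: a bare number is interpreted in user units by SVG.
        return str(dim.value)
    return '{}{}'.format(dim.value, dim.units)


print(length_to_text(Dim(12.5, 'mm')))
print(length_to_text(Dim(300, None)))
```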

TreePlotTransform

Provides a means of re-interpreting the coordinate system when plotting trees so that the tree root can be top/left/bottom/right and the child order plotted anti-clockwise or clockwise.

This can convert ‘logical’ positions into ‘physical’ positions. A ‘logical’ position is one with the root of the tree at the top and the child nodes below in left-to-right (i.e. anti-clockwise) order. A ‘physical’ position is a plot-able position where the root of the tree is at the top, left, bottom or right and the child nodes are in anti-clockwise or clockwise order.

Transforming sizes and positions

If the first suffix is ‘l’ this is the “logical” coordinate system. If the first suffix is ‘p’ this is the “physical” coordinate system.

Then:

  • C - The canvas dimension, Cpw is “Canvas physical width”
  • W - Width dimension, physical and logical.
  • D - Depth dimension, physical and logical.
  • B - Box datum position (“top-left”), physical and logical, x and y.
  • P - Arbitrary point, physical and logical, x and y.

So this “logical view” of the tree graph (‘top’ and ‘-’): i.e. the root(s) is at the top and the children are written in anti-clockwise order.

 ---> x
 |
 \/
 y

<------------------------ Clw ------------------------>
|                                  To Parent
|                                     |
|             Blx, Bly -->*************************
|                         *                  |    *
Cld                       *                 Dl    *
|                         *<-------- Wl -----|--->*
|                         *                  |    *
|       Plx, Ply ->.      *                  |    *
|                         *************************
|                             |        |       |
|                        To C[0]  To C[c]   To C[C-1]

Or:

Origin  Cpw  Cpd  Wp  Dp  Bpx           Bpy           Ppx      Ppy
top     Clw  Cld  Wl  Dl  Blx           Bly           Plx      Ply
left    Cld  Clw  Dl  Wl  Bly           (Clw-Plx-Wl)  Ply      Clw-Plx
bottom  Clw  Cld  Wl  Dl  (Clw-Plx-Wl)  (Cld-Ply-Dl)  Clw-Plx  Cld-Ply
right   Cld  Clw  Dl  Wl  (Cld-Ply-Dl)  Blx           Cld-Ply  Plx

Note the diagonal top-right to bottom-left transference between each pair of columns. That is because with each successive line we are doing a 90 degree rotation (anti-clockwise) plus a +ve y translation by Clw (top->left or bottom->right) or Cld (left->bottom or right->top).
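
The canvas and arbitrary-point columns of the table can be sketched as plain Python functions. This is only an illustration of the mapping using bare numbers: the function names are hypothetical and it does not use TreePlotTransform or the Coord types.

```python
def canvas_physical(root_pos, clw, cld):
    """Return (Cpw, Cpd) for a logical canvas of width clw and depth cld."""
    if root_pos in ('top', 'bottom'):
        return clw, cld
    if root_pos in ('left', 'right'):
        # A 90 degree rotation swaps width and depth.
        return cld, clw
    raise ValueError('unknown root position: {!r}'.format(root_pos))


def point_physical(root_pos, clw, cld, plx, ply):
    """Return (Ppx, Ppy) for the logical point (plx, ply), one table row per case."""
    if root_pos == 'top':
        return plx, ply
    if root_pos == 'left':
        return ply, clw - plx
    if root_pos == 'bottom':
        return clw - plx, cld - ply
    if root_pos == 'right':
        return cld - ply, plx
    raise ValueError('unknown root position: {!r}'.format(root_pos))


# Logical canvas 100 wide by 40 deep, logical point (10, 5).
for origin in ('top', 'left', 'bottom', 'right'):
    print(origin, canvas_physical(origin, 100, 40), point_physical(origin, 100, 40, 10, 5))
```

With the root on the left the logical origin (0, 0) maps to (0, Clw), the bottom-left corner of the physical canvas, which is consistent with an anti-clockwise rotation of the tree.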

Incrementing child positions

Moving from one child to another is done in the following combinations:

Origin  ‘-’    ‘+’
top     right  left
left    up     down
bottom  left   right
right   down   up
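
The table can be read as a lookup from (root position, sweep direction) to a physical direction of travel. The sketch below encodes it in plain Python. The names are hypothetical, and it assumes the physical axes follow the same convention as the diagram above (x to the right, y downwards).

```python
# (root position, sweep direction) -> direction of travel to the next child.
CHILD_STEP = {
    ('top', '-'): 'right', ('top', '+'): 'left',
    ('left', '-'): 'up', ('left', '+'): 'down',
    ('bottom', '-'): 'left', ('bottom', '+'): 'right',
    ('right', '-'): 'down', ('right', '+'): 'up',
}

# Unit vectors, assuming x grows to the right and y grows downwards.
UNIT_VECTOR = {'right': (1, 0), 'left': (-1, 0), 'down': (0, 1), 'up': (0, -1)}


def next_child_point(pt, distance, root_pos='top', sweep_dir='-'):
    """Move the physical point pt by distance towards the next child."""
    dx, dy = UNIT_VECTOR[CHILD_STEP[(root_pos, sweep_dir)]]
    return pt[0] + dx * distance, pt[1] + dy * distance


print(next_child_point((10, 10), 5))
print(next_child_point((10, 10), 5, root_pos='left', sweep_dir='-'))
```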

exception cpip.plot.TreePlotTransform.ExceptionTreePlotTransform

Exception class for TreePlotTransform.

exception cpip.plot.TreePlotTransform.ExceptionTreePlotTransformRangeCtor

Exception class for out of range input on construction.

class cpip.plot.TreePlotTransform.TreePlotTransform(theLogicalCanvas, rootPos='top', sweepDir='-')

Provides a means of re-interpreting the coordinate system when plotting trees.

rootPosition = frozenset([‘top’, ‘bottom’, ‘left’, ‘right’]) default: ‘top’

sweepDirection = frozenset([‘+’, ‘-’]) default: ‘-’

Has functionality for interpreting width/depth to actual positions given rootPosition.

bdcL(theBlxy, theBl)

Given a logical datum (logical top left) and a logical box this returns logical bottom dead centre of a box.

bdcP(theBlxy, theBl)

Given a logical datum (logical top left) and a logical box this returns physical bottom dead centre of a box.

boxDatumP(theBlxy, theBl)

Given a logical point and logical box this returns a physical point that is the box datum (“upper left”).

boxP(theBl)

Given a logical box this returns a Coord.Box that describes the physical box.

canvasP()

Returns a Coord.Box that describes the physical canvas.

genRootPos()

Yield all possible root positions.

genSweepDir()

Yield all possible sweep directions.

incPhysicalChildPos(thePt, theDim)

Given a child physical datum point and a distance to the next child this returns the next child’s physical datum point. TODO: Remove this as redundant?

nextdcL(theBlxy, theBl)

Given a logical datum (logical top left) and a logical box this returns logical ‘next’ dead centre of a box.

positiveSweepDir

True if positive sweep, false otherwise.

postIncChildLogicalPos(thePt, theBox)

Post-increments the child logical datum point (‘top-left’) given the child logical datum point and the child.bbSigma. Returns a Coord.Pt(). This takes into account the sweep direction.

preIncChildLogicalPos(thePt, theBox)

Pre-increments the child logical datum point (‘top-left’) given the child logical datum point and the child.bbSigma. Returns a Coord.Pt(). This takes into account the sweep direction.

prevdcL(theBlxy, theBl)

Given a logical datum (logical top left) and a logical box this returns logical ‘previous’ dead centre of a box.

pt(thePt, units=None)

Given an arbitrary logical point as a Coord.Pt(), this returns the physical point as a Coord.Pt(). If units is supplied then the return value will be in those units.

startChildrenLogicalPos(thePt, theBox)

Returns the starting child logical datum point (‘top-left’) given the children logical datum point and the children.bbSigma. Returns a Coord.Pt(). This takes into account the sweep direction.

tdcL(theBlxy, theBl)

Given a logical datum (logical top left) and a logical box this returns logical top dead centre of a box.

tdcP(theBlxy, theBl)

Given a logical datum (logical top left) and a logical box this returns physical top dead centre of a box.

Usage

To use cpip in a project:

import cpip

Have a read of the CPIP Tutorials for how you can use CPIP programmatically. In there is a tutorial on how to use the PpLexer that is the equivalent of cpp: PpLexer Tutorial. There is also the FileIncludeGraph Tutorial showing how you can analyse how #include’d files are processed. This is very useful for understanding file dependencies.

Contributing

Contributions are welcome, and they are greatly appreciated! Every little bit helps, and credit will always be given.

You can contribute in many ways:

Types of Contributions

Report Bugs

Report bugs at https://github.com/paulross/cpip/issues.

If you are reporting a bug, please include:

  • Your operating system name and version.
  • Any details about your local setup that might be helpful in troubleshooting.
  • Detailed steps to reproduce the bug.

Fix Bugs

Look through the GitHub issues for bugs. Anything tagged with “bug” and “help wanted” is open to whoever wants to implement it.

Implement Features

Look through the GitHub issues for features. Anything tagged with “enhancement” and “help wanted” is open to whoever wants to implement it.

Write Documentation

cpip could always use more documentation, whether as part of the official cpip docs, in docstrings, or even on the web in blog posts, articles, and such.

Submit Feedback

The best way to send feedback is to file an issue at https://github.com/paulross/cpip/issues.

If you are proposing a feature:

  • Explain in detail how it would work.
  • Keep the scope as narrow as possible, to make it easier to implement.
  • Remember that this is a volunteer-driven project, and that contributions are welcome :)

Get Started!

Ready to contribute? Here’s how to set up cpip for local development.

  1. Fork the cpip repo on GitHub.

  2. Clone your fork locally:

    $ git clone git@github.com:your_name_here/cpip.git
    
  3. Install your local copy into a virtualenv. Assuming you have virtualenvwrapper installed, this is how you set up your fork for local development:

    $ mkvirtualenv cpip
    $ cd cpip/
    $ python setup.py develop
    
  4. Create a branch for local development:

    $ git checkout -b name-of-your-bugfix-or-feature
    

    Now you can make your changes locally.

  5. When you’re done making changes, check that your changes pass flake8 and the tests, including testing other Python versions with tox:

    $ flake8 cpip tests
    $ python setup.py test  # or: py.test
    $ tox
    

    To get flake8 and tox, just pip install them into your virtualenv.

  6. Commit your changes and push your branch to GitHub:

    $ git add .
    $ git commit -m "Your detailed description of your changes."
    $ git push origin name-of-your-bugfix-or-feature
    
  7. Submit a pull request through the GitHub website.

Pull Request Guidelines

Before you submit a pull request, check that it meets these guidelines:

  1. The pull request should include tests.
  2. If the pull request adds functionality, the docs should be updated. Put your new functionality into a function with a docstring, and add the feature to the list in README.rst.
  3. The pull request should work for Python 2.6, 2.7, 3.3, 3.4 and 3.5, and for PyPy. Check https://travis-ci.org/paulross/cpip/pull_requests and make sure that the tests pass for all supported Python versions.

Tips

To run a subset of tests:

$ py.test tests.test_cpip

Credits

Development Lead

Contributors

None yet. Why not be the first?

History

0.9.7 Beta Release (2017-10-04)

  • Minor fixes.
  • Performance optimisations.
  • Builds the CPython source tree in 5 hours with 2 CPUs.
  • Documentation improvements.

0.9.5 Beta Release (2017-10-03)

  • Migrate from sourceforge to GitHub.

0.9.1 (2014-09-03)

Version 0.9.1, various minor fixes. Tested on Python 2.7 and 3.3.

Alpha Plus Release (2014-09-04)

Fairly thorough refactor. CPIP now tested on Python 2.7, 3.3. Version 0.9.1. Updated documentation.

Alpha Release (2012-03-25)

Very little functional change. CPIP now tested on Python 2.6, 2.7, 3.2. Added loads of documentation.

Alpha Release (2011-07-14)

This is a pre-release of CPIP. It is tested on BSD/Linux, it will probably work on Windows (although some unit tests will fail on that platform).

Project started in 2008.
