MultiPassString

Converts an ITU to HTML.

class cpip.util.MultiPassString.MultiPassString(theFileObj)

Reads a file. The contents can be translated any number of times and marked with word types. The resulting characters can then be read through a BufGen, for example:

myBg = BufGen.BufGen(self._mps.genChars())
try:
    i = 0
    while True:
        print(myBg[i])
        i += 1
except IndexError:
    # The underlying generator is exhausted.
    pass
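
The loop above relies on the buffer raising IndexError once the underlying generator is exhausted. The following is a minimal sketch of that idea only. ToyBufGen and drain are hypothetical names written for illustration; this is not cpip's BufGen implementation.

```python
class ToyBufGen:
    """Hypothetical stand-in: buffers an iterator so it can be indexed."""

    def __init__(self, gen):
        self._gen = gen
        self._buf = []

    def __getitem__(self, idx):
        # Pull items from the iterator until position idx is buffered.
        while len(self._buf) <= idx:
            try:
                self._buf.append(next(self._gen))
            except StopIteration:
                # Signal exhaustion the way a list would.
                raise IndexError(idx) from None
        return self._buf[idx]


def drain(buf):
    """Collect items by index until IndexError, as in the loop above."""
    result = []
    i = 0
    try:
        while True:
            result.append(buf[i])
            i += 1
    except IndexError:
        pass
    return result


chars = drain(ToyBufGen(iter("abc")))
empty = drain(ToyBufGen(iter("")))
print(chars)
```
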
_MultiPassString__set(idx, repl)

Sets a marker.

Parameters:
  • idx (int) – Index of marker.
  • repl (str) – The marker.
Returns:

NoneType

Raises:

ExceptionMultiPass on index error.

__init__(theFileObj)

Constructor.

Parameters:theFileObj (_io.TextIOWrapper) – File-like object.
Returns:NoneType
__weakref__

list of weak references to the object (if defined)

_retZeroIndex()

Returns the index to the first non-empty token in the current list.

Returns:int – The index.
clearMarker()

Clears the mark at this point in the input.

Returns:NoneType
genChars()

Generates the current set of characters. This can be used as the generator for a BufGen, and that BufGen can be passed to the PpTokeniser _slice... functions.

genWords()

Generates pairs (word, type) from the original string.

TODO: Solve the overlap problem.

Returns:NoneType, tuple([str, str]) – a pair of (word, type)
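
As a rough illustration of how (word, type) pairs can be derived from a string plus a list of (length, type) records, here is a sketch. It assumes the words are contiguous and non-overlapping, so it does not address the overlap problem in the TODO. The function gen_word_pairs, the record layout and the type labels are hypothetical and are not taken from cpip.

```python
def gen_word_pairs(text, records):
    """Yield (word, type) for each (length, type) record, in order."""
    pos = 0
    for length, word_type in records:
        # Slice the next `length` characters and pair them with their type.
        yield text[pos:pos + length], word_type
        pos += length


# Hypothetical type labels, chosen only for this example.
records = [(3, "word"), (1, "ws"), (1, "word"), (1, "punct"), (1, "ws")]
pairs = list(gen_word_pairs("int x;\n", records))
print(pairs)
```
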
hasWord

True if the length of the current word is > 0.

prevChar

The previous character of the input. A slight nod to K&R, this is (not) a bit like putc().

removeMarkedWord(isTerm)

Remove the current marked word. isTerm is a boolean that is True if the current position is a terminal character.

For example, if you want to split a string into lines then \n is a terminal character and you would call this with isTerm=True.

However, if you were splitting a string into words and whitespace then a whitespace character following a word is not a terminal character, so at the point of receiving the whitespace character you would call this with isTerm=False.

Parameters:isTerm (bool) – True if a terminal character.
Returns:NoneType
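
One way to read isTerm is as a switch for whether the character at the current position belongs to the marked word. The sketch below shows that reading on plain strings. The function marked_span is hypothetical, and the inclusive/exclusive behaviour is an assumption drawn from the description above, not from cpip's source.

```python
def marked_span(text, marker, current, is_term):
    """Return text from marker up to current, including current if terminal."""
    # Assumption: a terminal character is part of the word it ends.
    end = current + 1 if is_term else current
    return text[marker:end]


# Splitting into lines: the newline at index 2 ends the line and belongs to it.
line = marked_span("ab\ncd", 0, 2, True)

# Splitting into words: the space at index 2 follows the word but is not part of it.
word = marked_span("ab cd", 0, 2, False)

print(repr(line), repr(word))
```
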
removeSetReplaceClear(isTerm, theType, theRepl)

This provides a helper combination function for a common operation of:

  • Removing the marked word from the output.
  • Setting the word type in the input.
  • Replacing the marked word with a replacement string.
  • Clearing the current marker.
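Parameters:
  • isTerm (bool) – True if a terminal character.
  • theType (str) – The type.
  • theRepl (str) – The replacement string.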
Returns:

NoneType

setAtMarker(theRepl)

Sets the token at the current marker to be theRepl.

Parameters:theRepl (str) – The replacement string.
Returns:NoneType
Raises:ExceptionMultiPass if no marker is present.
setMarker()

Sets a mark at this point in the input.

Returns:NoneType
setWordType(theType, isTerm)

Marks a word in the input as a word of type theType starting from the marker up to the current place.

See removeMarkedWord() for an explanation of isTerm.

Parameters:
  • theType (str) – The type.
  • isTerm (bool) – Is a terminal character.
Returns:

NoneType

wordLength

The length of the current word.

Returns:int – The length.
class cpip.util.MultiPassString.Word(wordLen, wordType)
__getnewargs__()

Return self as a plain tuple. Used by copy and pickle.

static __new__(_cls, wordLen, wordType)

Create new instance of Word(wordLen, wordType)

__repr__()

Return a nicely formatted representation string

_asdict()

Return a new OrderedDict which maps field names to their values.

classmethod _make(iterable, new=tuple.__new__, len=len)

Make a new Word object from a sequence or iterable

_replace(_self, **kwds)

Return a new Word object replacing specified fields with new values

wordLen

Alias for field number 0

wordType

Alias for field number 1
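
The methods listed for Word are the ones that collections.namedtuple generates, so the following sketch assumes Word is a two-field named tuple. The type labels used here ("identifier", "whitespace") are hypothetical values chosen for the example.

```python
from collections import namedtuple

# Assumed equivalent of cpip's Word: a (wordLen, wordType) named tuple.
Word = namedtuple("Word", "wordLen wordType")

w = Word(3, "identifier")

# Fields are available by name or by tuple unpacking.
length, kind = w

# _replace returns a new Word; the original is unchanged.
w2 = w._replace(wordLen=5)

# _make builds a Word from any iterable of the right length.
w3 = Word._make([1, "whitespace"])

# _asdict maps field names to their values.
d = w._asdict()
print(w, w2, w3, dict(d))
```
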