MultiPassString
Converts an ITU to HTML.
-
class cpip.util.MultiPassString.MultiPassString(theFileObj)
Reads a file. The file can be translated any number of times and marked with word types. The resulting characters can then be generated using BufGen, for example:

    myBg = BufGen.BufGen(self._mps.genChars())
    try:
        i = 0
        while 1:
            print(myBg[i])
            i += 1
    except IndexError:
        pass
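The loop in the example relies on the buffer raising IndexError once its generator is exhausted. A minimal sketch of that pattern follows. It uses a hypothetical stand-in class, LazyBuffer, which is not cpip's BufGen (whose internals are not shown here), and a plain string iterator in place of genChars():

```python
class LazyBuffer:
    """Hypothetical stand-in for a generator-backed buffer (not cpip's BufGen)."""

    def __init__(self, gen):
        self._gen = gen   # source generator, consumed on demand
        self._buf = []    # items pulled from the generator so far

    def __getitem__(self, idx):
        # Pull items until idx is available or the generator runs dry.
        while len(self._buf) <= idx:
            try:
                self._buf.append(next(self._gen))
            except StopIteration:
                raise IndexError(idx) from None
        return self._buf[idx]


def drain(buf):
    """Collect items by index until IndexError is raised."""
    result = []
    i = 0
    try:
        while True:
            result.append(buf[i])
            i += 1
    except IndexError:
        pass
    return result


chars = drain(LazyBuffer(iter("abc")))
print(chars)
```

Indexing rather than iterating lets a consumer look back at earlier items, which is presumably why a buffer sits between the generator and the tokeniser.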
-
_MultiPassString__set(idx, repl)
Sets a marker.
Parameters:
- idx (int) – Index of marker.
- repl (str) – The marker.
Returns: NoneType
Raises: ExceptionMultiPass on index error.
-
__init__(theFileObj)
Constructor.
Parameters: theFileObj (_io.TextIOWrapper) – File-like object.
Returns: NoneType
-
__weakref__
List of weak references to the object (if defined).
-
_retZeroIndex()
Returns the index to the first non-empty token in the current list.
Returns: int – The index.
-
clearMarker()
Clears the mark at this point in the input.
Returns: NoneType
-
genChars()
Generates the current set of characters. This can be used as the generator for the BufGen, and that BufGen can be passed to the PpTokeniser _slice... functions.
-
genWords()
Generates pairs (word, type) from the original string.
TODO: Solve the overlap problem.
Returns: NoneType, tuple([str, str]) – a pair of (word, type)
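As an illustration of how (word, type) pairs might be produced, here is a minimal sketch. The function gen_words, the mapping from start index to Word, and the type labels are all hypothetical and are not taken from cpip's implementation. The sketch also assumes the marked words do not overlap:

```python
import collections

# Assumed shape of a word record, mirroring the documented Word fields.
Word = collections.namedtuple('Word', 'wordLen wordType')


def gen_words(text, words):
    """Yield (word, type) pairs from text.

    Hypothetical helper: words maps a start index to a Word record and
    the marked spans are assumed not to overlap.
    """
    for start in sorted(words):
        w = words[start]
        yield text[start:start + w.wordLen], w.wordType


marks = {
    0: Word(3, 'keyword'),
    3: Word(1, 'whitespace'),
    4: Word(1, 'identifier'),
}
pairs = list(gen_words("int x;", marks))
print(pairs)
```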
-
hasWord
True if the length of the current word is > 0.
-
prevChar
The previous character of the input. A slight nod to K&R: this is (not) a bit like putc().
-
removeMarkedWord(isTerm)
Remove the current marked word. isTerm is a boolean that is True if the current position is a terminal character.
For example, if you want to split a string into lines then \n is a terminal character and you would call this with isTerm=True. However, if you were splitting a string into words and whitespace then a whitespace character following a word is not the terminal character, so at the point of receiving the whitespace character you would call this with isTerm=False.
Parameters: isTerm (bool) – True if a terminal character.
Returns: NoneType
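To illustrate the isTerm distinction, here is a minimal sketch using a hypothetical helper, marked_span, which is not part of cpip. It assumes, based on the isTerm description, that a terminal character belongs to the marked word while a non-terminal character does not:

```python
def marked_span(text, mark, pos, is_term):
    """Return the text from mark up to pos, including text[pos] only if is_term.

    Hypothetical helper, not part of cpip. The inclusive/exclusive
    behaviour is an assumption drawn from the isTerm description.
    """
    end = pos + 1 if is_term else pos
    return text[mark:end]


# Splitting into lines: the newline at index 3 terminates the line.
line = marked_span("abc\ndef", 0, 3, is_term=True)

# Splitting into words: the space at index 3 follows the word
# but is not part of it.
word = marked_span("abc def", 0, 3, is_term=False)

print(repr(line), repr(word))
```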
-
removeSetReplaceClear(isTerm, theType, theRepl)
This provides a helper that combines a common sequence of operations:
- Removing the marked word from the output.
- Setting the word type in the input.
- Replacing the marked word with a replacement string.
- Clearing the current marker.
Parameters:
- isTerm (bool) – See removeMarkedWord() for an explanation of this.
- theType (str) – See setWordType() for an explanation of this.
- theRepl (str) – See setAtMarker() for an explanation of this.
Returns: NoneType
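The four steps listed for removeSetReplaceClear() can be sketched with a toy model. ToyMultiPass, its attributes and the 'comment' type label are hypothetical and are not cpip's implementation. The treatment of is_term as "include the current character" is likewise an assumption:

```python
class ToyMultiPass:
    """Toy model of the marker workflow (hypothetical, not cpip's class)."""

    def __init__(self, text):
        self.out = list(text)  # one output token per input character
        self.words = {}        # start index -> (length, type)
        self.mark = None       # index of the current marker, if any
        self.pos = 0           # current position in the input

    def set_marker(self):
        self.mark = self.pos

    def remove_set_replace_clear(self, is_term, the_type, the_repl):
        if self.mark is None:
            raise ValueError('no marker present')
        end = self.pos + 1 if is_term else self.pos
        # 1. Remove the marked word from the output.
        for i in range(self.mark, end):
            self.out[i] = ''
        # 2. Record the word type against the input span.
        self.words[self.mark] = (end - self.mark, the_type)
        # 3. Place the replacement string at the marker.
        self.out[self.mark] = the_repl
        # 4. Clear the marker.
        self.mark = None


mp = ToyMultiPass("a/*x*/b")
mp.pos = 1            # at the '/' that opens the comment
mp.set_marker()
mp.pos = 5            # at the '/' that closes the comment (terminal)
mp.remove_set_replace_clear(True, 'comment', ' ')
result = ''.join(mp.out)
print(repr(result), mp.words)
```

Here a C-style comment is replaced by a single space in the output while its extent and type are recorded against the input.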
-
setAtMarker(theRepl)
Sets the token at the current marker to be theRepl.
Parameters: theRepl (str) – The replacement string.
Returns: NoneType
Raises: ExceptionMultiPass if no marker is present.
-
setMarker()
Sets a mark at this point in the input.
Returns: NoneType
-
setWordType(theType, isTerm)
Marks a word in the input as a word of type theType starting from the marker up to the current place.
See removeMarkedWord() for an explanation of isTerm.
Parameters:
- theType (str) – The type.
- isTerm (bool) – Is a terminal character.
Returns: NoneType
-
wordLength
The length of the current word.
Returns: int – The length.
-
-
class cpip.util.MultiPassString.Word(wordLen, wordType)
-
__getnewargs__()
Return self as a plain tuple. Used by copy and pickle.
-
static __new__(_cls, wordLen, wordType)
Create new instance of Word(wordLen, wordType)
-
__repr__()
Return a nicely formatted representation string
-
_asdict()
Return a new OrderedDict which maps field names to their values.
-
classmethod _make(iterable, new=tuple.__new__, len=len)
Make a new Word object from a sequence or iterable
-
_replace(_self, **kwds)
Return a new Word object replacing specified fields with new values
-
wordLen
Alias for field number 0
-
wordType
Alias for field number 1
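The members listed for Word are those of a standard collections.namedtuple. The following sketch shows them in use. The definition of Word here is an assumption inferred from the documented fields, and the type labels are made up for illustration:

```python
import collections

# Assumed definition, inferred from the documented field names.
Word = collections.namedtuple('Word', 'wordLen wordType')

w = Word(wordLen=5, wordType='comment')

# Fields are available by name and by position.
length, kind = w

# _replace() returns a new object; the original is unchanged.
shorter = w._replace(wordLen=3)

# _make() builds a Word from any sequence or iterable.
from_list = Word._make([2, 'whitespace'])

# _asdict() maps field names to values.
as_dict = dict(w._asdict())

print(w, shorter, from_list, as_dict)
```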
-