regexparser
regexparser helps with the painful situation of having a bunch of small parsing functions spread all over the code.
Frequently I have to parse text into float, int and date objects.
The regexparser.TextParser class isolates the parsing task: it groups the parsing rules in a hierarchy of classes that can be easily reused in different projects.
Install
pip install regexparser
To install from GitHub with pip:
pip install git+https://github.com/wilsonfreitas/regexparser.git
Using
Create a class that inherits from regexparser.TextParser and write methods with names starting with parse.
These methods must accept 2 arguments after self: the text that will be parsed and the re.Match that is returned by applying the regular expression to the text.
The parse* methods are called only if their regular expressions match the given text; each regular expression is set in the method's docstring.
regexparser provides a compact way of applying transformation rules, so those rules don't have to be spread throughout the code.
The following code shows how to create text parsing rules for a few text chunks in Portuguese.
from regexparser import TextParser


class PortugueseRulesParser(TextParser):
    # transform Sim and Não into boolean True and False (listed spellings only)
    def parseBoolean_ptBR(self, text, match):
        r'^(sim|Sim|SIM|n.o|N.o|N.O)$'
        return text[0].lower() == 's'

    # transform Verdadeiro and Falso into boolean True and False (listed spellings only)
    def parseBoolean_ptBR2(self, text, match):
        r'^(verdadeiro|VERDADEIRO|falso|FALSO|V|F|v|f)$'
        return text[0].lower() == 'v'

    # parses a decimal number that uses a comma as the decimal separator
    def parse_number_decimal_ptBR(self, text, match):
        r'^-?\s*\d+,\d+?$'
        text = text.replace(',', '.')
        # eval only sees text that the regular expression above has matched
        return eval(text)

    # parses a number with '.' as thousands separator and ',' as decimal separator
    def parse_number_with_thousands_ptBR(self, text, match):
        r'^-?\s*(\d+\.)+\d+,\d+?$'
        text = text.replace('.', '')
        text = text.replace(',', '.')
        return eval(text)


parser = PortugueseRulesParser()

assert parser.parse('1,1') == 1.1
assert parser.parse('-1,1') == -1.1
assert parser.parse('- 1,1') == -1.1
# text that matches no rule is returned unchanged
assert parser.parse('Wálson') == 'Wálson'
assert parser.parse('1.100,01') == 1100.01
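
The rules above never use the match argument. The sketch below shows one way it could be used, taking the capture groups to build a date. So that it runs without installing anything, it relies on MiniTextParser, a hypothetical stand-in written only for this illustration. It is not the code of regexparser.TextParser; it only mimics the behaviour described above (the regular expression is read from the docstring, and unmatched text is returned as is). The rule order (alphabetical by method name) and the use of re.match are assumptions of the sketch, as are the names DateAndIntParser, parse_date_ptBR and parse_integer.

```python
import re
from datetime import date


class MiniTextParser:
    """Hypothetical stand-in for regexparser.TextParser (not the library's code)."""

    def __init__(self):
        # Collect (compiled regex, bound method) pairs from every parse* method
        # that has a docstring; the docstring is used as the pattern.
        # dir() returns names sorted alphabetically, which fixes the rule order here.
        self.rules = []
        for name in dir(self):
            if not name.startswith('parse') or name == 'parse':
                continue
            method = getattr(self, name)
            if callable(method) and method.__doc__:
                self.rules.append((re.compile(method.__doc__), method))

    def parse(self, text):
        # Apply the first rule whose pattern matches the text.
        for regex, method in self.rules:
            match = regex.match(text)
            if match:
                return method(text, match)
        # No rule matched: return the text unchanged.
        return text


class DateAndIntParser(MiniTextParser):
    # parses dates written as dd/mm/yyyy, reading the parts from the match groups
    def parse_date_ptBR(self, text, match):
        r'^(\d{2})/(\d{2})/(\d{4})$'
        day, month, year = (int(group) for group in match.groups())
        return date(year, month, day)

    # parses integers
    def parse_integer(self, text, match):
        r'^-?\d+$'
        return int(text)


parser = DateAndIntParser()
assert parser.parse('25/12/2020') == date(2020, 12, 25)
assert parser.parse('-42') == -42
assert parser.parse('abc') == 'abc'
```

With the real library, the same two rule methods should only need DateAndIntParser to inherit from regexparser.TextParser instead, assuming it behaves as described in the Using section.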