Cell Patterns¶
Cell patterns allow you to normalize the data on a per-field basis.
fuzzytable Built-Ins¶
Let’s say you have this very simple table:
Name | DOB |
---|---|
John Doe | 1995 |
123 | 2003OCt |
Jane Doe | 1964 |
True | 2019 |
You want to ensure the first field (Name) returns only strings and the Birthday column returns integers.
# birthdays.py
from fuzzytable import FuzzyTable, FieldPattern, cellpatterns
name_field = FieldPattern(
name="name",
alias="Name",
cellpattern=cellpatterns.String,
)
birthday_field = FieldPattern(
name="birthday",
alias="DOB",
cellpattern=cellpatterns.Integer,
)
birthday_table = FuzzyTable(
path='birthdays.csv',
fields=[name_field, birthday_field],
)
>>> python birthdays.py
>>> for record in birthday_table.records
... print(record)
...
{'name': 'John Doe', 'birthday': 1995}
{'name': '123', 'birthday': 2003}
{'name': 'Jane Doe', 'birthday': 1964}
{'name': '', 'birthday': 2019}
Custom Callables¶
You’re not limited to the built-in cell patterns provided by fuzzytable. You can also create your own.
In this example, we’ll return the type of the variable in the Name
column.
# birthdays_customcallable.py
from fuzzytable import FuzzyTable, FieldPattern, cellpatterns
# This is new
def value_type(value):
return type(value)
name_field = FieldPattern(
name="name",
alias="Name",
cellpattern=value_type, # This is new.
# Note that we don't call the function here.
)
birthday_field = FieldPattern(
name="birthday",
alias="DOB",
cellpattern=cellpatterns.Integer,
)
birthday_table = FuzzyTable(
path='birthdays.csv',
fields=[name_field, birthday_field],
)
>>> python birthdays_customcallable.py
>>> for record in birthday_table.records
... print(record)
...
{'name': <class 'str'>, 'birthday': 1995}
{'name': <class 'int'>, 'birthday': 2003}
{'name': <class 'str'>, 'birthday': 1964}
{'name': <class 'bool'>, 'birthday': 2019}
Piping¶
You can also string multiple cell patterns together.
Here we’ll pipe the String
output into a custom first_char
function.
# birthdays_piping.py
from fuzzytable import FuzzyTable, FieldPattern, cellpatterns
# This is new
def first_char(value):
# Because we are piping the output of cellpattern.String
# to this function, value is guaranteed to be a string.
# Therefore, the only potential exception we have to
# handle is IndexError (in case value = '').
try:
return value[0]
except IndexError:
return ''
name_field = FieldPattern(
name="name",
alias="Name",
cellpattern=[cellpatterns.String, first_char], # This is new
)
# replace `birthday_field` with a string literal 'DOB' below.
birthday_table = FuzzyTable(
path='birthdays.csv',
fields=[name_field, 'DOB'],
)
>>> python birthdays_piping.py
>>> for record in birthday_table.records
... print(record['name'])
...
'J'
'1'
'J'
''