Cell Patterns

See the cell pattern tutorial for examples.

CellPatterns normalize the data values of a field.

Classes

Boolean([default_value]) Normalize cell values to booleans.
Digit([default_value]) Normalize cell values to a single-digit integer (0 through 9).
Float([default_value]) Normalize cell values to float.
Integer([default_value]) Normalize cell values to int.
IntegerList() Normalize cell values to list of int.
String([default_value]) Normalize cell values to str.
StringChoice(choices[, dict_use_keys, …]) Return “choice” that best fits the cell value.
StringChoiceMulti(choices[, case_sensitive]) Check cell for desired strings.
WordList([default_value]) Normalize cell values to a list of words (no digits, no punctuation).
class fuzzytable.cellpatterns.Boolean(default_value=None)

Bases: fuzzytable.patterns.cellpattern.CellPattern

Normalize cell values to booleans.

# warm_colors.py

from fuzzytable import FuzzyTable, FieldPattern, cellpatterns

iswarmcolor_field = FieldPattern(
    name="is_warm_color",
    cellpattern=cellpatterns.Boolean,
)

warmcolor_table = FuzzyTable(
    path='warm_colors.csv',
    fields=['color', iswarmcolor_field],
    approximate_match=True,
)

warm_colors.csv

color,is warm color
brown,True
green,False
yellow,yes
black,
>>> from warm_colors import warmcolor_table
>>> for record in warmcolor_table.records:
...     print(record)
...
{'color': 'brown', 'is_warm_color': True}
{'color': 'green', 'is_warm_color': False}
{'color': 'yellow', 'is_warm_color': True}
{'color': 'black', 'is_warm_color': False}
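
The normalization shown above can be pictured with a small standalone sketch. It does not import fuzzytable and is not the library's implementation: `to_bool` is a hypothetical helper, and the set of spellings treated as true is an assumption chosen to reproduce the table above.

```python
from typing import Any

# Assumed set of "truthy" spellings; fuzzytable's real list may differ.
TRUTHY = {"true", "t", "yes", "y", "1"}


def to_bool(value: Any) -> bool:
    """Hypothetical helper: coerce a raw cell value to a bool."""
    if isinstance(value, bool):
        return value
    if value is None:
        # Treat a missing cell as False, as in the 'black' row above.
        return False
    return str(value).strip().lower() in TRUTHY
```

With this sketch, `to_bool("yes")` gives `True` and an empty cell gives `False`.
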
class fuzzytable.cellpatterns.Digit(default_value=None)

Bases: fuzzytable.patterns.cellpattern.CellPattern

Normalize cell values to a single-digit integer (0 through 9).

class fuzzytable.cellpatterns.Float(default_value=None)

Bases: fuzzytable.patterns.cellpattern.CellPattern

Normalize cell values to float.

class fuzzytable.cellpatterns.Integer(default_value=None)

Bases: fuzzytable.patterns.cellpattern.CellPattern

Normalize cell values to int.

class fuzzytable.cellpatterns.IntegerList

Bases: fuzzytable.patterns.cellpattern.CellPattern

Normalize cell values to list of int.
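
As a rough illustration of what normalizing a cell to a list of int could look like, here is a standalone sketch. It is not fuzzytable's implementation: `to_int_list` is a hypothetical helper, and extracting every run of digits (ignoring signs and decimal points) is an assumption.

```python
import re
from typing import Any, List


def to_int_list(value: Any) -> List[int]:
    """Hypothetical helper: pull every run of digits out of a cell."""
    # Signs and decimal points are ignored in this sketch.
    return [int(token) for token in re.findall(r"\d+", str(value))]
```

Under these assumptions, `to_int_list("1, 2 and 30")` gives `[1, 2, 30]`, and a cell without digits gives an empty list.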

class fuzzytable.cellpatterns.String(default_value='')

Bases: fuzzytable.patterns.cellpattern.CellPattern

Normalize cell values to str.

class fuzzytable.cellpatterns.StringChoice(choices, dict_use_keys=True, default=None, approximate_match=False, min_ratio=0.6, case_sensitive=False, contains_match=True, mode=None)

Bases: fuzzytable.patterns.cellpattern.CellPattern

Return “choice” that best fits the cell value.

This pattern operates in one of three modes:

  • exact
  • approx
  • contains

# cities.py

from fuzzytable import FuzzyTable, FieldPattern, cellpatterns

state_field = FieldPattern(
    name="state",
    cellpattern=cellpatterns.StringChoice(
        choices='pennsylvania new_york north_carolina'.split(),
        case_sensitive=False,
        approximate_match=True,
        min_ratio=0.5,
    ),
)

cities_table = FuzzyTable(path='cities.csv', fields=['city', state_field],)

cities.csv

city,state
New York,New York
Philadelphia,Pennsylvania
Albany,new york
Raleigh,North Carolina
Wilmington,Delaware
>>> from cities import cities_table
>>> for record in cities_table.records:
...     print(record)
...
{'city': 'New York', 'state': 'new_york'}
{'city': 'Philadelphia', 'state': 'pennsylvania'}
{'city': 'Albany', 'state': 'new_york'}
{'city': 'Raleigh', 'state': 'north_carolina'}
{'city': 'Wilmington', 'state': None}
Parameters:
  • choices (sequence of strings, or dict whose values are sequences of strings) – the strings to match against. When a dict is given, the key is what is returned.
  • case_sensitive (bool, default False)
  • dict_use_keys (bool, default True) – if True, the keys of the dict are used as search terms.
  • default (Any, default None) – the value returned when no choice fits the cell value; may be any of the choices given.
  • approximate_match (bool, default False) – Deprecated in v0.18. To be removed in v1.0. Use mode instead. True overrides contains_match.
  • min_ratio (float within [0.0, 1.0], default 0.6)
  • contains_match (bool, default True) – Deprecated in v0.18. To be removed in v1.0. Use mode instead.
  • mode (None or str) – Choose from 'exact', 'approx', or 'contains'. mode overrides approximate_match and contains_match.
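
The three modes can be pictured with a small standalone sketch. It does not import fuzzytable and is not the library's implementation: `pick_choice` is a hypothetical helper, scoring with `difflib.SequenceMatcher` is an assumption, and so is the reading of 'contains' as "the choice appears somewhere in the cell".

```python
from difflib import SequenceMatcher
from typing import Optional, Sequence


def pick_choice(
    value: str,
    choices: Sequence[str],
    mode: str = "exact",
    min_ratio: float = 0.6,
    case_sensitive: bool = False,
    default: Optional[str] = None,
) -> Optional[str]:
    """Hypothetical helper: return the choice that best fits ``value``."""

    def norm(text: str) -> str:
        return text if case_sensitive else text.lower()

    if mode not in ("exact", "approx", "contains"):
        raise ValueError(f"unknown mode: {mode!r}")

    target = norm(value)
    if mode == "exact":
        for choice in choices:
            if norm(choice) == target:
                return choice
    elif mode == "contains":
        # Assumption: a choice matches when it appears inside the cell text.
        for choice in choices:
            if norm(choice) in target:
                return choice
    else:  # mode == "approx"
        # Score every choice and keep the best one at or above min_ratio.
        best_choice, best_ratio = None, 0.0
        for choice in choices:
            ratio = SequenceMatcher(None, norm(choice), target).ratio()
            if ratio > best_ratio:
                best_choice, best_ratio = choice, ratio
        if best_choice is not None and best_ratio >= min_ratio:
            return best_choice
    return default
```

In this sketch, `pick_choice("New York", ["pennsylvania", "new_york"], mode="approx", min_ratio=0.5)` picks `'new_york'`, while an unmatched value in 'exact' mode falls back to `default`.
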
class fuzzytable.cellpatterns.StringChoiceMulti(choices: List[str], case_sensitive=True)

Bases: fuzzytable.patterns.cellpattern.CellPattern

Check cell for desired strings. Return list of found strings.

# colors.py

from fuzzytable import FuzzyTable, FieldPattern, cellpatterns

warm_color_field = FieldPattern(
    name="warm_colors",
    cellpattern=cellpatterns.StringChoiceMulti(
        choices='red pink brown yellow'.split(),
        case_sensitive=False,
    ),
)

colors_table = FuzzyTable(
    path='colors.csv',
    fields=[warm_color_field, 'cool_colors'],
    approximate_match=True,
)

colors.csv

warm colors,cool colors
brown Yellow Red,
Brown,green
yellow red,blue
,black
>>> from colors import colors_table
>>> for record in colors_table.records:
...     print(record)
...
{'warm_colors': ['red', 'brown', 'yellow'], 'cool_colors': None}
{'warm_colors': ['brown'], 'cool_colors': 'green'}
{'warm_colors': ['red', 'yellow'], 'cool_colors': 'blue'}
{'warm_colors': [], 'cool_colors': 'black'}
Parameters:
  • choices (sequence of strings)
  • case_sensitive (bool, default True)
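
The search for several choices in one cell can be sketched in a few lines. This is a standalone illustration, not fuzzytable's implementation: `find_choices` is a hypothetical helper, and plain substring matching is an assumption.

```python
from typing import List, Sequence


def find_choices(
    value: str, choices: Sequence[str], case_sensitive: bool = True
) -> List[str]:
    """Hypothetical helper: list the choices that occur in ``value``."""
    haystack = value if case_sensitive else value.lower()
    found = []
    for choice in choices:
        needle = choice if case_sensitive else choice.lower()
        # Assumption: a choice counts as found if it is a substring of the cell.
        if needle in haystack:
            found.append(choice)
    return found
```

Results come back in the order of `choices`, not in the order they appear in the cell, which is consistent with the first record in the example output above.
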
class fuzzytable.cellpatterns.WordList(default_value=None)

Bases: fuzzytable.patterns.cellpattern.CellPattern

Normalize cell values to a list of words (no digits, no punctuation).
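
One way to picture "words, no digits, no punctuation" is the standalone sketch below. It is not fuzzytable's implementation: `to_word_list` is a hypothetical helper, and treating any run of letters as a word is an assumption.

```python
import re
from typing import Any, List


def to_word_list(value: Any) -> List[str]:
    """Hypothetical helper: keep only the runs of letters in a cell."""
    # [^\W\d_] matches a word character that is neither a digit nor "_",
    # which leaves letters only.
    return re.findall(r"[^\W\d_]+", str(value))
```

Under these assumptions, `to_word_list("Red, green; 42 blue!")` gives `['Red', 'green', 'blue']`.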