Field Sorting
Summary
- By default, ft.fields provides a list sorted by column number in ascending order.
- If you passed a sequence of desired fields (FuzzyTable(fields=...)), the fields in ft.fields are sorted in that same order.
- MultiField subfields are sorted by column number.
Details
Let’s read the following CSV file, birthdays.csv.
first_name | last_name | birthday |
---|---|---|
John | Doe | 3-Mar-85 |
Typhoid | Mary | 2-Aug-83 |
Jane | Smith | 23-Oct-46 |
Let’s extract this table using all the default settings.
Notice how ft.fields
returns the three fields in the same order they appear in the spreadsheet.
>>> import fuzzytable
>>> ft = fuzzytable.FuzzyTable('birthdays.csv')
>>> [field.name for field in ft.fields]
['first_name', 'last_name', 'birthday']
Now let’s extract the table using the fields parameter.
Notice that ft.fields now returns the fields in the order they were passed to FuzzyTable.
>>> ft = fuzzytable.FuzzyTable(
...     'birthdays.csv',
...     fields=['last name', 'first name'],
... )
>>> [field.name for field in ft.fields]
['last_name', 'first_name']
>>>
But what if you don’t want them in that order?
What if you’d prefer to have them sorted in the order they appear in the spreadsheet? Easy.
Because ft.fields
is a list, you can sort it in place. See https://docs.python.org/3/howto/sorting.html for more details.
>>> from operator import attrgetter
>>> ft.fields.sort(key=attrgetter('col_num'))
>>> [field.name for field in ft.fields]
['first_name', 'last_name']
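If you would rather leave ft.fields untouched, the built-in sorted() returns a new list instead of sorting in place. The sketch below does not use fuzzytable at all: Field is a hypothetical stand-in (a namedtuple carrying only name and col_num), and the column numbers are made-up values chosen to mirror the example above.

```python
from collections import namedtuple
from operator import attrgetter

# Hypothetical stand-in for a fuzzytable field object; only the two
# attributes used in this section are modeled. Column numbers are assumed.
Field = namedtuple("Field", ["name", "col_num"])

# Fields in the order they were requested, not spreadsheet order.
fields = [Field("last_name", 2), Field("first_name", 1)]

# sorted() builds a new list, so the requested order is preserved in `fields`.
by_column = sorted(fields, key=attrgetter("col_num"))

print([f.name for f in by_column])  # ['first_name', 'last_name']
print([f.name for f in fields])     # ['last_name', 'first_name']
```

This is useful when other code still relies on the order in which the fields were requested.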
The subfields collected by a multifield=True FieldPattern are stored in spreadsheet order.
The col_num of a multifield FieldPattern is equal to the smallest col_num of its subfields.
>>> names = fuzzytable.FieldPattern(
...     name='names',
...     alias=['last name', 'first name'],
...     multifield=True,
...     approximate_match=True,
... )
>>>
>>> ft = fuzzytable.FuzzyTable(
...     'birthdays.csv',
...     fields=[names, 'birthday'],
... )
>>>
>>> [field.name for field in ft.fields]
['names', 'birthday']
>>> namesfield = ft.get_field('names')
>>> namesfield.headers
['first_name', 'last_name']
>>> # Note that these are in their original table order, not the alias order.
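The two multifield rules described above (subfields kept in spreadsheet order, and the multifield's col_num equal to the smallest subfield col_num) can be illustrated with plain Python. This is only a sketch of the stated behavior, not fuzzytable's implementation: SubField is a hypothetical stand-in, and the column numbers are assumed values.

```python
from collections import namedtuple

# Hypothetical stand-in for a matched subfield; not a fuzzytable class.
SubField = namedtuple("SubField", ["header", "col_num"])

# Matches listed in alias order: 'last name' first, then 'first name'.
matched = [SubField("last_name", 2), SubField("first_name", 1)]

# Rule 1: subfields are kept in spreadsheet (column) order, not alias order.
subfields = sorted(matched, key=lambda s: s.col_num)
headers = [s.header for s in subfields]

# Rule 2: the multifield's own col_num is the smallest subfield col_num.
multifield_col_num = min(s.col_num for s in subfields)

print(headers)             # ['first_name', 'last_name']
print(multifield_col_num)  # 1
```

Because the multifield takes the smallest column number, sorting ft.fields by col_num places it where its first subfield appears in the spreadsheet.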