Utility functions

petl.header(table)

Return the header row for the given table. E.g.:

>>> from petl import header
>>> table = [['foo', 'bar'], ['a', 1], ['b', 2]]
>>> header(table)
['foo', 'bar']

See also fieldnames().

petl.data(table, *sliceargs)

Return a container supporting iteration over data rows in a given table. I.e., like iterdata() only a container is returned so you can iterate over it multiple times.

Changed in version 0.10.

Now returns a container, previously returned an iterator. See also iterdata().
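
The container-versus-iterator distinction can be illustrated with a minimal sketch (the class name DataView is hypothetical and this is not petl's actual implementation): each call to __iter__ restarts from the source table, so the view can be traversed any number of times.

```python
from itertools import islice

class DataView:
    """Hypothetical sketch of a re-iterable view over a table's data rows."""

    def __init__(self, table, *sliceargs):
        self.table = table
        self.sliceargs = sliceargs

    def __iter__(self):
        # restart from the source table on every iteration
        it = iter(self.table)
        next(it)  # skip the header row
        return islice(it, *self.sliceargs) if self.sliceargs else it

table = [['foo', 'bar'], ['a', 1], ['b', 2]]
rows = DataView(table)
print(list(rows))  # [['a', 1], ['b', 2]]
print(list(rows))  # [['a', 1], ['b', 2]] -- a second pass works too
```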

petl.iterdata(table, *sliceargs)

Return an iterator over the data rows for the given table. E.g.:

>>> from petl import iterdata
>>> table = [['foo', 'bar'], ['a', 1], ['b', 2]]
>>> it = iterdata(table)
>>> it.next()
['a', 1]
>>> it.next()
['b', 2]

Changed in version 0.3.

Positional arguments can be used to slice the data rows. The sliceargs are passed to itertools.islice().

Changed in version 0.10.

Renamed from “data”.

petl.dataslice(table, *args)

Deprecated since version 0.3.

Use data() instead; it supports slice arguments.

petl.fieldnames(table)

Return the string values of all fields for the given table. If the fields are strings, then this function is equivalent to header(), i.e.:

>>> from petl import header, fieldnames
>>> table = [['foo', 'bar'], ['a', 1], ['b', 2]]
>>> header(table)
['foo', 'bar']
>>> fieldnames(table)
['foo', 'bar']
>>> header(table) == fieldnames(table)
True

Allows for custom field objects, e.g.:

>>> class CustomField(object):
...     def __init__(self, id, description):
...         self.id = id
...         self.description = description
...     def __str__(self):
...         return self.id
...     def __repr__(self):
...         return 'CustomField(%r, %r)' % (self.id, self.description)
... 
>>> table = [[CustomField('foo', 'Get some foo.'), CustomField('bar', 'A lot of bar.')], 
...          ['a', 1], 
...          ['b', 2]]
>>> header(table)
[CustomField('foo', 'Get some foo.'), CustomField('bar', 'A lot of bar.')]
>>> fieldnames(table)    
['foo', 'bar']
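
The behaviour above amounts to applying str() to each cell of the header row. A one-line sketch (not petl's exact implementation), with a hypothetical Field class standing in for any custom field object:

```python
def fieldnames(table):
    # str() of each header cell; equal to header() when fields are plain strings
    return [str(f) for f in next(iter(table))]

class Field:
    """Hypothetical custom field class with a string id."""
    def __init__(self, id, description):
        self.id, self.description = id, description
    def __str__(self):
        return self.id

table = [[Field('foo', 'Get some foo.'), 'bar'], ['a', 1], ['b', 2]]
print(fieldnames(table))  # ['foo', 'bar']
```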

petl.nrows(table)

Count the number of data rows in a table. E.g.:

>>> from petl import nrows
>>> table = [['foo', 'bar'], ['a', 1], ['b', 2]]
>>> nrows(table)
2

Changed in version 0.10.

Renamed from ‘rowcount’ to ‘nrows’.

petl.look(table, *sliceargs, **kwargs)

Format a portion of the table as text for inspection in an interactive session. E.g.:

>>> from petl import look
>>> table = [['foo', 'bar'], ['a', 1], ['b', 2]]
>>> look(table)
+-------+-------+
| 'foo' | 'bar' |
+=======+=======+
| 'a'   | 1     |
+-------+-------+
| 'b'   | 2     |
+-------+-------+

Any irregularities in the length of header and/or data rows will appear as blank cells, e.g.:

>>> table = [['foo', 'bar'], ['a'], ['b', 2, True]]
>>> look(table)
+-------+-------+------+
| 'foo' | 'bar' |      |
+=======+=======+======+
| 'a'   |       |      |
+-------+-------+------+
| 'b'   | 2     | True |
+-------+-------+------+

Changed in version 0.3.

Positional arguments can be used to slice the data rows. The sliceargs are passed to itertools.islice().

Changed in version 0.8.

The properties n and p can be used to look at the next and previous rows respectively. I.e., try >>> look(table) then >>> _.n then >>> _.p.

Changed in version 0.13.

Three alternative presentation styles are available: ‘grid’, ‘simple’ and ‘minimal’, where ‘grid’ is the default. A different style can be specified using the style keyword argument, e.g.:

>>> table = [['foo', 'bar'], ['a', 1], ['b', 2]]
>>> look(table, style='simple')
=====  =====
'foo'  'bar'
=====  =====
'a'        1
'b'        2
=====  =====

>>> look(table, style='minimal')
'foo'  'bar'
'a'        1
'b'        2

The default style can also be changed, e.g.:

>>> look.default_style = 'simple'
>>> look(table)
=====  =====
'foo'  'bar'
=====  =====
'a'        1
'b'        2
=====  =====

>>> look.default_style = 'grid'
>>> look(table)
+-------+-------+
| 'foo' | 'bar' |
+=======+=======+
| 'a'   |     1 |
+-------+-------+
| 'b'   |     2 |
+-------+-------+    

See also lookall() and see().

petl.lookall(table, **kwargs)

Format the entire table as text for inspection in an interactive session.

N.B., this will load the entire table into memory.

petl.see(table, *sliceargs)

Format a portion of a table as text in a column-oriented layout for inspection in an interactive session. E.g.:

>>> from petl import see
>>> table = [['foo', 'bar'], ['a', 1], ['b', 2]]
>>> see(table)
'foo': 'a', 'b'
'bar': 1, 2

Useful for tables with a larger number of fields.

Changed in version 0.3.

Positional arguments can be used to slice the data rows. The sliceargs are passed to itertools.islice().

petl.values(table, *field, **kwargs)

Return a container supporting iteration over values in a given field or fields. I.e., like itervalues() only a container is returned so you can iterate over it multiple times.

Changed in version 0.7.

Now returns a container, previously returned an iterator. See also itervalues().

Changed in version 0.24.

Multiple fields can be provided as positional arguments. The sliceargs argument has been removed; if you need to slice the result, this function returns a container that can be sliced with standard Python slice notation.

petl.itervalues(table, *field, **kwargs)

Return an iterator over values in a given field or fields. E.g.:

>>> from petl import itervalues
>>> table = [['foo', 'bar'], ['a', True], ['b'], ['b', True], ['c', False]]
>>> foo = itervalues(table, 'foo')
>>> foo.next()
'a'
>>> foo.next()
'b'
>>> foo.next()
'b'
>>> foo.next()
'c'
>>> foo.next()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration

The field argument can be a single field name or index (starting from zero) or a tuple of field names and/or indexes.

If rows are uneven, the value of the keyword argument missing is returned.

More than one field can be selected, e.g.:

>>> table = [['foo', 'bar', 'baz'],
...          [1, 'a', True],
...          [2, 'bb', True],
...          [3, 'd', False]]
>>> foobaz = itervalues(table, 'foo', 'baz')
>>> foobaz.next()
(1, True)
>>> foobaz.next()
(2, True)
>>> foobaz.next()
(3, False)
>>> foobaz.next()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration

Changed in version 0.3.

Positional arguments can be used to slice the data rows. The sliceargs are passed to itertools.islice().

Changed in version 0.7.

In previous releases this function was known as ‘values’. Also in this release the behaviour with short rows is changed. Now for any value missing due to a short row, None is returned by default, or whatever is given by the missing keyword argument.

Changed in version 0.24.

The sliceargs argument has been removed; if you need to slice the result, use values(), which returns a container that can be sliced with standard Python slice notation, or use itertools.islice().
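
The handling of field names, indexes and short rows described above can be sketched as follows (a simplified stand-in, not petl's exact implementation):

```python
def itervalues(table, *fields, missing=None):
    """Sketch of itervalues(): yield values under the given field name(s)
    or index(es), substituting `missing` for values absent from short rows."""
    it = iter(table)
    header = [str(f) for f in next(it)]
    # accept field names or integer indices
    indices = [f if isinstance(f, int) else header.index(f) for f in fields]
    for row in it:
        vals = tuple(row[i] if i < len(row) else missing for i in indices)
        yield vals[0] if len(vals) == 1 else vals

table = [['foo', 'bar'], ['a', True], ['b'], ['b', True], ['c', False]]
print(list(itervalues(table, 'foo')))  # ['a', 'b', 'b', 'c']
print(list(itervalues(table, 'bar')))  # [True, None, True, False]
```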

petl.valueset(table, field, missing=None)

Deprecated since version 0.3.

Use set(values(table, *fields)) instead, see also values().

petl.valuecount(table, field, value, missing=None)

Count the number of occurrences of value under the given field. Returns the absolute count and relative frequency as a pair. E.g.:

>>> from petl import valuecount
>>> table = (('foo', 'bar'), ('a', 1), ('b', 2), ('b', 7))
>>> n, f = valuecount(table, 'foo', 'b')
>>> n
2
>>> f
0.6666666666666666

The field argument can be a single field name or index (starting from zero) or a tuple of field names and/or indexes.
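
The count/frequency pair is straightforward to compute; a sketch (not petl's exact implementation):

```python
def valuecount(table, field, value):
    """Sketch: absolute count and relative frequency of `value` under
    `field`, returned as a pair."""
    it = iter(table)
    i = list(next(it)).index(field)
    total = n = 0
    for row in it:
        total += 1
        if i < len(row) and row[i] == value:
            n += 1
    return n, n / total

table = (('foo', 'bar'), ('a', 1), ('b', 2), ('b', 7))
n, f = valuecount(table, 'foo', 'b')
print(n, f)  # 2 0.6666666666666666
```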

petl.valuecounts(table, *field, **kwargs)

Find distinct values for the given field and count the number and relative frequency of occurrences. Returns a table mapping values to counts, with most common values first. E.g.:

>>> from petl import look, valuecounts
>>> table = [['foo', 'bar', 'baz'],
...          ['a', True, 0.12],
...          ['a', True, 0.17],
...          ['b', False, 0.34],
...          ['b', False, 0.44],
...          ['b']]
>>> look(table)
+-------+-------+-------+
| 'foo' | 'bar' | 'baz' |
+=======+=======+=======+
| 'a'   |  True |  0.12 |
+-------+-------+-------+
| 'a'   |  True |  0.17 |
+-------+-------+-------+
| 'b'   | False |  0.34 |
+-------+-------+-------+
| 'b'   | False |  0.44 |
+-------+-------+-------+
| 'b'   |       |       |
+-------+-------+-------+

>>> look(valuecounts(table, 'foo'))
+-------+---------+-------------+
| 'foo' | 'count' | 'frequency' |
+=======+=========+=============+
| 'b'   |       3 |         0.6 |
+-------+---------+-------------+
| 'a'   |       2 |         0.4 |
+-------+---------+-------------+

>>> look(valuecounts(table, 'foo', 'bar'))
+-------+-------+---------+-------------+
| 'foo' | 'bar' | 'count' | 'frequency' |
+=======+=======+=========+=============+
| 'b'   | False |       2 |         0.4 |
+-------+-------+---------+-------------+
| 'a'   |  True |       2 |         0.4 |
+-------+-------+---------+-------------+
| 'b'   | None  |       1 |         0.2 |
+-------+-------+---------+-------------+

If rows are short, the value of the keyword argument missing is counted.

Changed in version 0.24.

Multiple fields can be given as positional arguments. If multiple fields are given, these are now treated as a compound key. Also the field name is used instead of ‘key’ in the output table.
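
The compound-key counting described above can be sketched with a Counter (a simplified stand-in, not petl's exact implementation):

```python
from collections import Counter

def valuecounts(table, *fields, missing=None):
    """Sketch: distinct values under the given field(s), with counts and
    relative frequencies, most common first. Multiple fields are treated
    as a compound key; short rows are padded with `missing`."""
    it = iter(table)
    header = [str(f) for f in next(it)]
    indices = [header.index(f) for f in fields]
    counter = Counter()
    for row in it:
        key = tuple(row[i] if i < len(row) else missing for i in indices)
        counter[key if len(key) > 1 else key[0]] += 1
    total = sum(counter.values())
    result = [list(fields) + ['count', 'frequency']]
    for v, n in counter.most_common():
        result.append((list(v) if len(fields) > 1 else [v]) + [n, n / total])
    return result

table = [['foo', 'bar'],
         ['a', True], ['a', True],
         ['b', False], ['b', False], ['b', None]]
for row in valuecounts(table, 'foo'):
    print(row)
# ['foo', 'count', 'frequency']
# ['b', 3, 0.6]
# ['a', 2, 0.4]
```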

petl.valuecounter(table, *field, **kwargs)

Find distinct values for the given field and count the number of occurrences. Returns a dict mapping values to counts. E.g.:

>>> from petl import valuecounter
>>> table = [['foo', 'bar'], ['a', True], ['b'], ['b', True], ['c', False]]
>>> c = valuecounter(table, 'foo')
>>> c['a']
1
>>> c['b']
2
>>> c['c']
1
>>> c
Counter({'b': 2, 'a': 1, 'c': 1})

The field argument can be a single field name or index (starting from zero) or a tuple of field names and/or indexes.

petl.dicts(table, *sliceargs, **kwargs)

Return a container supporting iteration over rows as dicts. I.e., like iterdicts() only a container is returned so you can iterate over it multiple times.

New in version 0.15.

petl.iterdicts(table, *sliceargs, **kwargs)

Return an iterator over the data in the table, yielding each row as a dictionary of values indexed by field name. E.g.:

>>> from petl import iterdicts
>>> table = [['foo', 'bar'], ['a', 1], ['b', 2]]
>>> it = iterdicts(table)
>>> it.next()
{'foo': 'a', 'bar': 1}
>>> it.next()
{'foo': 'b', 'bar': 2}
>>> it.next()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration

Short rows are padded, e.g.:

>>> table = [['foo', 'bar'], ['a', 1], ['b']]
>>> it = iterdicts(table)
>>> it.next()
{'foo': 'a', 'bar': 1}
>>> it.next()
{'foo': 'b', 'bar': None}
>>> it.next()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration

New in version 0.15.
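
The padding behaviour can be sketched in a few lines (not petl's exact implementation; extra values in over-long rows are simply ignored here):

```python
def iterdicts(table, missing=None):
    """Sketch: yield each data row as a dict keyed by field name,
    padding short rows with `missing`."""
    it = iter(table)
    fields = [str(f) for f in next(it)]
    for row in it:
        yield {f: (row[i] if i < len(row) else missing)
               for i, f in enumerate(fields)}

table = [['foo', 'bar'], ['a', 1], ['b']]
print(list(iterdicts(table)))
# [{'foo': 'a', 'bar': 1}, {'foo': 'b', 'bar': None}]
```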

petl.namedtuples(table, *sliceargs, **kwargs)

View the table as a container of named tuples. I.e., like iternamedtuples() only a container is returned so you can iterate over it multiple times.

New in version 0.15.

petl.iternamedtuples(table, *sliceargs, **kwargs)

Return an iterator over the data in the table, yielding each row as a named tuple.

New in version 0.15.

petl.records(table, *sliceargs, **kwargs)

Return a container supporting iteration over rows as records. I.e., like iterrecords() only a container is returned so you can iterate over it multiple times. See also dicts().

Changed in version 0.15.

Previously returned dicts, now returns hybrid objects which behave like tuples/dicts/namedtuples.

petl.iterrecords(table, *sliceargs, **kwargs)

Return an iterator over the data in the table, where rows support value access by index or field name. See also iterdicts().

Changed in version 0.15.

Previously returned dicts, now returns hybrid objects which behave like tuples/dicts/namedtuples.

petl.columns(table, missing=None)

Construct a dict mapping field names to lists of values. E.g.:

>>> from petl import columns
>>> table = [['foo', 'bar'], ['a', 1], ['b', 2], ['b', 3]]
>>> cols = columns(table)
>>> cols['foo']
['a', 'b', 'b']
>>> cols['bar']    
[1, 2, 3]

See also facetcolumns().

petl.facetcolumns(table, key, missing=None)

Like columns() but stratified by values of the given key field. E.g.:

>>> from petl import facetcolumns
>>> table = [['foo', 'bar', 'baz'], 
...          ['a', 1, True], 
...          ['b', 2, True], 
...          ['b', 3]]
>>> fc = facetcolumns(table, 'foo')
>>> fc['a']
{'baz': [True], 'foo': ['a'], 'bar': [1]}
>>> fc['b']
{'baz': [True, None], 'foo': ['b', 'b'], 'bar': [2, 3]}
>>> fc['c']
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyError: 'c'

New in version 0.8.
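
The stratification by key value can be sketched as follows (a simplified stand-in, not petl's exact implementation):

```python
def facetcolumns(table, key, missing=None):
    """Sketch: map each distinct value of the `key` field to a dict of
    field name -> list of values; short rows are padded with `missing`."""
    it = iter(table)
    fields = [str(f) for f in next(it)]
    k = fields.index(key)
    out = {}
    for row in it:
        kv = row[k] if k < len(row) else missing
        cols = out.setdefault(kv, {f: [] for f in fields})
        for i, f in enumerate(fields):
            cols[f].append(row[i] if i < len(row) else missing)
    return out

table = [['foo', 'bar', 'baz'],
         ['a', 1, True],
         ['b', 2, True],
         ['b', 3]]
fc = facetcolumns(table, 'foo')
print(fc['b'])  # {'foo': ['b', 'b'], 'bar': [2, 3], 'baz': [True, None]}
```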

petl.isunique(table, field)

Return True if there are no duplicate values for the given field(s), otherwise False. E.g.:

>>> from petl import isunique
>>> table = [['foo', 'bar'], ['a', 1], ['b'], ['b', 2], ['c', 3, True]]
>>> isunique(table, 'foo')
False
>>> isunique(table, 'bar')
True

The field argument can be a single field name or index (starting from zero) or a tuple of field names and/or indexes.

Changed in version 0.10.

Renamed from “unique”. See also petl.unique().

petl.isordered(table, key=None, reverse=False, strict=False)

Return True if the table is ordered (i.e., sorted) by the given key. E.g.:

>>> from petl import isordered, look
>>> table = [['foo', 'bar', 'baz'],
...          ['a', 1, True],
...          ['b', 3, True],
...          ['b', 2]]
>>> look(table)
+-------+-------+-------+
| 'foo' | 'bar' | 'baz' |
+=======+=======+=======+
| 'a'   | 1     | True  |
+-------+-------+-------+
| 'b'   | 3     | True  |
+-------+-------+-------+
| 'b'   | 2     |       |
+-------+-------+-------+

>>> isordered(table, key='foo')
True
>>> isordered(table, key='foo', strict=True)
False
>>> isordered(table, key='foo', reverse=True)
False

New in version 0.10.
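
The check amounts to a single pairwise pass over the key values; a sketch (single-field keys only, not petl's exact implementation):

```python
_MISSING = object()  # sentinel so that None is a valid data value

def isordered(table, key, reverse=False, strict=False):
    """Sketch: True if the data rows are sorted by `key`; strict=True
    requires strictly increasing (or, with reverse=True, strictly
    decreasing) values."""
    it = iter(table)
    i = list(next(it)).index(key)
    prev = _MISSING
    for row in it:
        v = row[i]
        if prev is not _MISSING:
            if strict:
                ok = v < prev if reverse else v > prev
            else:
                ok = v <= prev if reverse else v >= prev
            if not ok:
                return False
        prev = v
    return True

table = [['foo', 'bar'], ['a', 1], ['b', 3], ['b', 2]]
print(isordered(table, 'foo'))               # True
print(isordered(table, 'foo', strict=True))  # False
```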

petl.limits(table, field)

Find minimum and maximum values under the given field. E.g.:

>>> from petl import limits
>>> t1 = [['foo', 'bar'], ['a', 1], ['b', 2], ['b', 3]]
>>> minv, maxv = limits(t1, 'bar')
>>> minv
1
>>> maxv
3

The field argument can be a field name or index (starting from zero).

petl.stats(table, field)

Calculate basic descriptive statistics on a given field. E.g.:

>>> from petl import stats
>>> table = [['foo', 'bar', 'baz'],
...          ['A', 1, 2],
...          ['B', '2', '3.4'],
...          [u'B', u'3', u'7.8', True],
...          ['D', 'xyz', 9.0],
...          ['E', None]]
>>> stats(table, 'bar')    
{'count': 3, 'errors': 2, 'min': 1.0, 'max': 3.0, 'sum': 6.0, 'mean': 2.0}

The field argument can be a field name or index (starting from zero).

petl.lenstats(table, field)

Convenience function to report statistics on value lengths under the given field. E.g.:

>>> from petl import lenstats    
>>> table1 = [['foo', 'bar'],
...           [1, 'a'],
...           [2, 'aaa'],
...           [3, 'aa'],
...           [4, 'aaa'],
...           [5, 'aaaaaaaaaaa']]
>>> lenstats(table1, 'bar')
{'count': 5, 'errors': 0, 'min': 1.0, 'max': 11.0, 'sum': 20.0, 'mean': 4.0}

petl.stringpatterns(table, field)

Profile string patterns in the given field, returning a table of patterns, counts and frequencies. E.g.:

>>> from petl import stringpatterns, look    
>>> table = [['foo', 'bar'],
...          ['Mr. Foo', '123-1254'],
...          ['Mrs. Bar', '234-1123'],
...          ['Mr. Spo', '123-1254'],
...          [u'Mr. Baz', u'321 1434'],
...          [u'Mrs. Baz', u'321 1434'],
...          ['Mr. Quux', '123-1254-XX']]
>>> foopats = stringpatterns(table, 'foo')
>>> look(foopats)
+------------+---------+---------------------+
| 'pattern'  | 'count' | 'frequency'         |
+============+=========+=====================+
| 'Aa. Aaa'  | 3       | 0.5                 |
+------------+---------+---------------------+
| 'Aaa. Aaa' | 2       | 0.3333333333333333  |
+------------+---------+---------------------+
| 'Aa. Aaaa' | 1       | 0.16666666666666666 |
+------------+---------+---------------------+

>>> barpats = stringpatterns(table, 'bar')
>>> look(barpats)
+---------------+---------+---------------------+
| 'pattern'     | 'count' | 'frequency'         |
+===============+=========+=====================+
| '999-9999'    | 3       | 0.5                 |
+---------------+---------+---------------------+
| '999 9999'    | 2       | 0.3333333333333333  |
+---------------+---------+---------------------+
| '999-9999-AA' | 1       | 0.16666666666666666 |
+---------------+---------+---------------------+

New in version 0.5.

petl.stringpatterncounter(table, field)

Profile string patterns in the given field, returning a dict mapping patterns to counts.

New in version 0.5.

petl.rowlengths(table)

Report on row lengths found in the table. E.g.:

>>> from petl import look, rowlengths
>>> table = [['foo', 'bar', 'baz'],
...          ['A', 1, 2],
...          ['B', '2', '3.4'],
...          [u'B', u'3', u'7.8', True],
...          ['D', 'xyz', 9.0],
...          ['E', None],
...          ['F', 9]]
>>> look(rowlengths(table))
+----------+---------+
| 'length' | 'count' |
+==========+=========+
| 3        | 3       |
+----------+---------+
| 2        | 2       |
+----------+---------+
| 4        | 1       |
+----------+---------+

Useful for finding potential problems in data files.
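
A sketch of the idea using a Counter over row lengths (not petl's exact implementation):

```python
from collections import Counter

def rowlengths(table):
    """Sketch: count data rows by length, most frequent length first."""
    it = iter(table)
    next(it)  # skip the header row
    counter = Counter(len(row) for row in it)
    return [['length', 'count']] + [[length, n] for length, n in counter.most_common()]
```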

petl.typecounts(table, field, **kwargs)

Count the number of values found for each Python type and return a table mapping class names to counts and frequencies. E.g.:

>>> from petl import look, typecounts
>>> table = [['foo', 'bar', 'baz'],
...          ['A', 1, 2],
...          ['B', u'2', '3.4'],
...          [u'B', u'3', u'7.8', True],
...          ['D', u'xyz', 9.0],
...          ['E', 42]]
>>> look(typecounts(table, 'foo'))
+-----------+---------+-------------+
| 'type'    | 'count' | 'frequency' |
+===========+=========+=============+
| 'str'     | 4       | 0.8         |
+-----------+---------+-------------+
| 'unicode' | 1       | 0.2         |
+-----------+---------+-------------+

>>> look(typecounts(table, 'bar'))
+-----------+---------+-------------+
| 'type'    | 'count' | 'frequency' |
+===========+=========+=============+
| 'unicode' | 3       | 0.6         |
+-----------+---------+-------------+
| 'int'     | 2       | 0.4         |
+-----------+---------+-------------+

>>> look(typecounts(table, 'baz'))
+-----------+---------+-------------+
| 'type'    | 'count' | 'frequency' |
+===========+=========+=============+
| 'int'     | 1       | 0.25        |
+-----------+---------+-------------+
| 'float'   | 1       | 0.25        |
+-----------+---------+-------------+
| 'unicode' | 1       | 0.25        |
+-----------+---------+-------------+
| 'str'     | 1       | 0.25        |
+-----------+---------+-------------+

The field argument can be a field name or index (starting from zero).

Changed in version 0.6.

Added frequency.

petl.typecounter(table, field)

Count the number of values found for each Python type. E.g.:

>>> from petl import typecounter
>>> table = [['foo', 'bar', 'baz'],
...          ['A', 1, 2],
...          ['B', u'2', '3.4'],
...          [u'B', u'3', u'7.8', True],
...          ['D', u'xyz', 9.0],
...          ['E', 42]]
>>> typecounter(table, 'foo')
Counter({'str': 4, 'unicode': 1})
>>> typecounter(table, 'bar')
Counter({'unicode': 3, 'int': 2})
>>> typecounter(table, 'baz')
Counter({'int': 1, 'float': 1, 'unicode': 1, 'str': 1})

The field argument can be a field name or index (starting from zero).
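
The counting can be sketched in a few lines (not petl's exact implementation). Note that under Python 3 there is no separate unicode type, so the type names differ from the Python 2 session shown above:

```python
from collections import Counter

def typecounter(table, field):
    """Sketch: count values under `field` by type name; values missing
    from short rows are not counted."""
    it = iter(table)
    i = list(next(it)).index(field)
    return Counter(type(row[i]).__name__ for row in it if i < len(row))

table = [['foo', 'bar'], ['A', 1], ['B', 2.5], ['C', '3'], ['D']]
print(typecounter(table, 'bar'))
# Counter({'int': 1, 'float': 1, 'str': 1})
```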

petl.typeset(table, field)

Return a set containing all Python types found for values in the given field. E.g.:

>>> from petl import typeset
>>> table = [['foo', 'bar', 'baz'],
...          ['A', 1, '2'],
...          ['B', u'2', '3.4'],
...          [u'B', u'3', '7.8', True],
...          ['D', u'xyz', 9.0],
...          ['E', 42]]
>>> typeset(table, 'foo') 
set([<type 'str'>, <type 'unicode'>])
>>> typeset(table, 'bar') 
set([<type 'int'>, <type 'unicode'>])
>>> typeset(table, 'baz') 
set([<type 'float'>, <type 'str'>])

The field argument can be a field name or index (starting from zero).

petl.parsecounts(table, field, parsers={'int': <type 'int'>, 'float': <type 'float'>})

Count the number of str or unicode values that can be parsed as ints, floats or via custom parser functions. Return a table mapping parser names to the number of values successfully parsed and the number of errors. E.g.:

>>> from petl import look, parsecounts
>>> table = [['foo', 'bar', 'baz'],
...          ['A', 'aaa', 2],
...          ['B', u'2', '3.4'],
...          [u'B', u'3', u'7.8', True],
...          ['D', '3.7', 9.0],
...          ['E', 42]]
>>> look(parsecounts(table, 'bar'))
+---------+---------+----------+
| 'type'  | 'count' | 'errors' |
+=========+=========+==========+
| 'float' | 3       | 1        |
+---------+---------+----------+
| 'int'   | 2       | 2        |
+---------+---------+----------+

The field argument can be a field name or index (starting from zero).

petl.parsecounter(table, field, parsers={'int': <type 'int'>, 'float': <type 'float'>})

Count the number of str or unicode values under the given fields that can be parsed as ints, floats or via custom parser functions. Return a pair of Counter objects, the first mapping parser names to the number of strings successfully parsed, the second mapping parser names to the number of errors. E.g.:

>>> from petl import parsecounter
>>> table = [['foo', 'bar', 'baz'],
...          ['A', 'aaa', 2],
...          ['B', u'2', '3.4'],
...          [u'B', u'3', u'7.8', True],
...          ['D', '3.7', 9.0],
...          ['E', 42]]
>>> counter, errors = parsecounter(table, 'bar')
>>> counter
Counter({'float': 3, 'int': 2})
>>> errors
Counter({'int': 2, 'float': 1})

The field argument can be a field name or index (starting from zero).
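
The success/error tallying can be sketched as follows (a simplified stand-in, not petl's exact implementation):

```python
from collections import Counter

def parsecounter(table, field, parsers=None):
    """Sketch: apply each parser to every string value under `field`;
    return a (successes, errors) pair of Counters. Non-string values
    are skipped."""
    if parsers is None:
        parsers = {'int': int, 'float': float}
    it = iter(table)
    i = list(next(it)).index(field)
    counter, errors = Counter(), Counter()
    for row in it:
        if i < len(row) and isinstance(row[i], str):
            for name, parse in parsers.items():
                try:
                    parse(row[i])
                    counter[name] += 1
                except ValueError:
                    errors[name] += 1
    return counter, errors

table = [['foo', 'bar'],
         ['A', 'aaa'], ['B', '2'], ['B', '3'], ['D', '3.7'], ['E', 42]]
counter, errors = parsecounter(table, 'bar')
print(counter)  # Counter({'float': 3, 'int': 2})
print(errors)   # Counter({'int': 2, 'float': 1})
```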

petl.dateparser(fmt, strict=True)

Return a function to parse strings as datetime.date objects using a given format. E.g.:

>>> from petl import dateparser
>>> isodate = dateparser('%Y-%m-%d')
>>> isodate('2002-12-25')
datetime.date(2002, 12, 25)
>>> isodate('2002-02-30')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "petl/util.py", line 1032, in parser
    return datetime.strptime(value.strip(), fmt).date()
  File "/usr/lib/python2.7/_strptime.py", line 440, in _strptime
    datetime_date(year, 1, 1).toordinal() + 1
ValueError: day is out of range for month

Can be used with parsecounts(), e.g.:

>>> from petl import look, parsecounts, dateparser
>>> table = [['when', 'who'],
...          ['2002-12-25', 'Alex'],
...          ['2004-09-12', 'Gloria'],
...          ['2002-13-25', 'Marty'],
...          ['2002-02-30', 'Melman']]
>>> parsers={'date': dateparser('%Y-%m-%d')}
>>> look(parsecounts(table, 'when', parsers))
+--------+---------+----------+
| 'type' | 'count' | 'errors' |
+========+=========+==========+
| 'date' | 2       | 2        |
+--------+---------+----------+

Changed in version 0.6.

Added strict keyword argument. If strict=False then if an error occurs when parsing, the original value will be returned as-is, and no error will be raised. Allows for, e.g., incremental parsing of mixed format fields.
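
The strict/lenient behaviour can be sketched around datetime.strptime (a simplified stand-in, not petl's exact implementation):

```python
from datetime import datetime, date

def dateparser(fmt, strict=True):
    """Sketch: return a parser producing datetime.date objects."""
    def parser(value):
        try:
            return datetime.strptime(value.strip(), fmt).date()
        except ValueError:
            if strict:
                raise  # re-raise the underlying ValueError
            return value  # strict=False: pass the value through unchanged
    return parser

isodate = dateparser('%Y-%m-%d', strict=False)
print(isodate('2002-12-25'))  # 2002-12-25
print(isodate('2002-02-30'))  # 2002-02-30 (returned as-is, no exception)
```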

petl.timeparser(fmt, strict=True)

Return a function to parse strings as datetime.time objects using a given format. E.g.:

>>> from petl import timeparser
>>> isotime = timeparser('%H:%M:%S')
>>> isotime('00:00:00')
datetime.time(0, 0)
>>> isotime('13:00:00')
datetime.time(13, 0)
>>> isotime('12:00:99')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "petl/util.py", line 1046, in parser
    
  File "/usr/lib/python2.7/_strptime.py", line 328, in _strptime
    data_string[found.end():])
ValueError: unconverted data remains: 9
>>> isotime('25:00:00')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "petl/util.py", line 1046, in parser
    
  File "/usr/lib/python2.7/_strptime.py", line 325, in _strptime
    (data_string, format))
ValueError: time data '25:00:00' does not match format '%H:%M:%S'

Can be used with parsecounts(), e.g.:

>>> from petl import look, parsecounts, timeparser
>>> table = [['when', 'who'],
...          ['00:00:00', 'Alex'],
...          ['12:02:45', 'Gloria'],
...          ['25:01:01', 'Marty'],
...          ['09:70:00', 'Melman']]
>>> parsers={'time': timeparser('%H:%M:%S')}
>>> look(parsecounts(table, 'when', parsers))
+--------+---------+----------+
| 'type' | 'count' | 'errors' |
+========+=========+==========+
| 'time' | 2       | 2        |
+--------+---------+----------+

Changed in version 0.6.

Added strict keyword argument. If strict=False then if an error occurs when parsing, the original value will be returned as-is, and no error will be raised. Allows for, e.g., incremental parsing of mixed format fields.

petl.datetimeparser(fmt, strict=True)

Return a function to parse strings as datetime.datetime objects using a given format. E.g.:

>>> from petl import datetimeparser
>>> isodatetime = datetimeparser('%Y-%m-%dT%H:%M:%S')
>>> isodatetime('2002-12-25T00:00:00')
datetime.datetime(2002, 12, 25, 0, 0)
>>> isodatetime('2002-12-25T00:00:99')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "petl/util.py", line 1018, in parser
    return datetime.strptime(value.strip(), format)
  File "/usr/lib/python2.7/_strptime.py", line 328, in _strptime
    data_string[found.end():])
ValueError: unconverted data remains: 9

Can be used with parsecounts(), e.g.:

>>> from petl import look, parsecounts, datetimeparser
>>> table = [['when', 'who'],
...          ['2002-12-25T00:00:00', 'Alex'],
...          ['2004-09-12T01:10:11', 'Gloria'],
...          ['2002-13-25T00:00:00', 'Marty'],
...          ['2002-02-30T07:09:00', 'Melman']]
>>> parsers={'datetime': datetimeparser('%Y-%m-%dT%H:%M:%S')}
>>> look(parsecounts(table, 'when', parsers))
+------------+---------+----------+
| 'type'     | 'count' | 'errors' |
+============+=========+==========+
| 'datetime' | 2       | 2        |
+------------+---------+----------+

Changed in version 0.6.

Added strict keyword argument. If strict=False then if an error occurs when parsing, the original value will be returned as-is, and no error will be raised. Allows for, e.g., incremental parsing of mixed format fields.

petl.boolparser(true_strings=['true', 't', 'yes', 'y', '1'], false_strings=['false', 'f', 'no', 'n', '0'], case_sensitive=False, strict=True)

Return a function to parse strings as bool objects using a given set of string representations for True and False. E.g.:

>>> from petl import boolparser    
>>> mybool = boolparser(true_strings=['yes', 'y'], false_strings=['no', 'n'])
>>> mybool('y')
True
>>> mybool('Y')
True
>>> mybool('yes')
True
>>> mybool('No')
False
>>> mybool('nO')
False
>>> mybool('true')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "petl/util.py", line 1175, in parser
    raise ValueError('value is not one of recognised boolean strings: %r' % value)
ValueError: value is not one of recognised boolean strings: 'true'
>>> mybool('foo')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "petl/util.py", line 1175, in parser
    raise ValueError('value is not one of recognised boolean strings: %r' % value)
ValueError: value is not one of recognised boolean strings: 'foo'

Can be used with parsecounts(), e.g.:

>>> from petl import look, parsecounts, boolparser
>>> table = [['who', 'vote'],
...          ['Alex', 'yes'],
...          ['Gloria', 'N'],
...          ['Marty', 'hmmm'],
...          ['Melman', 'nope']]
>>> mybool = boolparser(true_strings=['yes', 'y'], false_strings=['no', 'n'])
>>> parsers = {'bool': mybool}
>>> look(parsecounts(table, 'vote', parsers))
+--------+---------+----------+
| 'type' | 'count' | 'errors' |
+========+=========+==========+
| 'bool' | 2       | 2        |
+--------+---------+----------+

Changed in version 0.6.

Added strict keyword argument. If strict=False then if an error occurs when parsing, the original value will be returned as-is, and no error will be raised. Allows for, e.g., incremental parsing of mixed format fields.

petl.numparser(strict=False)

Return a function that will attempt to parse the value as a number, trying int(), long(), float() and complex() in that order. If all fail, the value is returned as-is, unless strict=True, in which case the underlying exception is raised.

New in version 0.24.

petl.parsenumber(v, strict=False)

Attempt to parse the value as a number, trying int(), long(), float() and complex() in that order. If all fail, return the value as-is.

New in version 0.4.

Changed in version 0.7: Set strict=True to get an exception if parsing fails.

Deprecated since version 0.24.

Use numparser() instead.

petl.lookup(table, keyspec, valuespec=None, dictionary=None)

Load a dictionary with data from the given table. E.g.:

>>> from petl import lookup
>>> table = [['foo', 'bar'], ['a', 1], ['b', 2], ['b', 3]]
>>> lkp = lookup(table, 'foo', 'bar')
>>> lkp['a']
[1]
>>> lkp['b']
[2, 3]

If no valuespec argument is given, defaults to the whole row (as a tuple), e.g.:

>>> table = [['foo', 'bar'], ['a', 1], ['b', 2], ['b', 3]]
>>> lkp = lookup(table, 'foo')
>>> lkp['a']
[('a', 1)]
>>> lkp['b']
[('b', 2), ('b', 3)]

Compound keys are supported, e.g.:

>>> t2 = [['foo', 'bar', 'baz'],
...       ['a', 1, True],
...       ['b', 2, False],
...       ['b', 3, True],
...       ['b', 3, False]]
>>> lkp = lookup(t2, ('foo', 'bar'), 'baz')
>>> lkp[('a', 1)]
[True]
>>> lkp[('b', 2)]
[False]
>>> lkp[('b', 3)]
[True, False]
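
The key-to-list-of-values mapping can be sketched as follows (a simplified stand-in, not petl's exact implementation). Note that with a shelve-backed dictionary the in-place list appends used here only persist if the shelf is opened with writeback=True:

```python
def lookup(table, keyspec, valuespec=None, dictionary=None):
    """Sketch: map each key (or compound key tuple) to the list of values
    (or whole rows as tuples) found under it."""
    if dictionary is None:
        dictionary = {}
    it = iter(table)
    header = list(next(it))
    keys = keyspec if isinstance(keyspec, tuple) else (keyspec,)
    kidx = [header.index(k) for k in keys]
    vidx = None if valuespec is None else header.index(valuespec)
    for row in it:
        k = tuple(row[i] for i in kidx)
        k = k[0] if len(k) == 1 else k
        v = tuple(row) if vidx is None else row[vidx]
        dictionary.setdefault(k, []).append(v)
    return dictionary

table = [['foo', 'bar'], ['a', 1], ['b', 2], ['b', 3]]
lkp = lookup(table, 'foo', 'bar')
print(lkp['b'])  # [2, 3]
```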

Data can be loaded into an existing dictionary-like object, including persistent dictionaries created via the shelve module, e.g.:

>>> import shelve
>>> table = [['foo', 'bar'], ['a', 1], ['b', 2], ['b', 3]]
>>> lkp = shelve.open('mylookup.dat')
>>> lkp = lookup(table, 'foo', 'bar', lkp)
>>> lkp.close()
>>> exit()
$ python
Python 2.7.1+ (r271:86832, Apr 11 2011, 18:05:24) 
[GCC 4.5.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import shelve
>>> lkp = shelve.open('mylookup.dat')
>>> lkp['a']
[1]
>>> lkp['b']
[2, 3]

petl.lookupone(table, keyspec, valuespec=None, dictionary=None, strict=False)

Load a dictionary with data from the given table, assuming there is at most one value for each key. E.g.:

>>> from petl import lookupone
>>> table = [['foo', 'bar'], ['a', 1], ['b', 2], ['c', 2]]
>>> lkp = lookupone(table, 'foo', 'bar')
>>> lkp['a']
1
>>> lkp['b']
2
>>> lkp['c']
2

If the specified key is not unique and strict=False (default), the first value wins, e.g.:

>>> table = [['foo', 'bar'], ['a', 1], ['b', 2], ['b', 3]]
>>> lkp = lookupone(table, 'foo', 'bar', strict=False)
>>> lkp['a']
1
>>> lkp['b']
2

If the specified key is not unique and strict=True, will raise DuplicateKeyError, e.g.:

>>> table = [['foo', 'bar'], ['a', 1], ['b', 2], ['b', 3]]
>>> lkp = lookupone(table, 'foo', strict=True)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "petl/util.py", line 451, in lookupone
petl.util.DuplicateKeyError

Compound keys are supported, e.g.:

>>> t2 = [['foo', 'bar', 'baz'],
...       ['a', 1, True],
...       ['b', 2, False],
...       ['b', 3, True]]
>>> lkp = lookupone(t2, ('foo', 'bar'), 'baz')
>>> lkp[('a', 1)]
True
>>> lkp[('b', 2)]
False
>>> lkp[('b', 3)]
True

Data can be loaded into an existing dictionary-like object, including persistent dictionaries created via the shelve module, e.g.:

>>> from petl import lookupone
>>> import shelve
>>> table = [['foo', 'bar'], ['a', 1], ['b', 2], ['c', 2]]
>>> lkp = shelve.open('mylookupone.dat')
>>> lkp = lookupone(table, 'foo', 'bar', dictionary=lkp)
>>> lkp.close()
>>> exit()
$ python
Python 2.7.1+ (r271:86832, Apr 11 2011, 18:05:24) 
[GCC 4.5.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import shelve
>>> lkp = shelve.open('mylookupone.dat')
>>> lkp['a']
1
>>> lkp['b']
2
>>> lkp['c']
2

Changed in version 0.11.

Changed so that strict=False is default and first value wins.
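
The first-wins versus strict behaviour can be sketched as follows (single-field keys only for brevity; DuplicateKeyError is defined locally here and is not petl's exact implementation):

```python
class DuplicateKeyError(Exception):
    """Raised when strict=True and a key occurs more than once."""

def lookupone(table, keyspec, valuespec=None, strict=False):
    """Sketch: map each key to a single value; with strict=False the first
    value wins, with strict=True duplicates raise DuplicateKeyError."""
    it = iter(table)
    header = list(next(it))
    ki = header.index(keyspec)
    vi = None if valuespec is None else header.index(valuespec)
    d = {}
    for row in it:
        k = row[ki]
        if k in d:
            if strict:
                raise DuplicateKeyError(k)
            continue  # first value wins
        d[k] = tuple(row) if vi is None else row[vi]
    return d

table = [['foo', 'bar'], ['a', 1], ['b', 2], ['b', 3]]
print(lookupone(table, 'foo', 'bar'))  # {'a': 1, 'b': 2}
```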

petl.dictlookup(table, keyspec, dictionary=None)

Load a dictionary with data from the given table, mapping to dicts. E.g.:

>>> from petl import dictlookup
>>> table = [['foo', 'bar'], ['a', 1], ['b', 2], ['b', 3]]
>>> lkp = dictlookup(table, 'foo')
>>> lkp['a']
[{'foo': 'a', 'bar': 1}]
>>> lkp['b']
[{'foo': 'b', 'bar': 2}, {'foo': 'b', 'bar': 3}]

Compound keys are supported, e.g.:

>>> t2 = [['foo', 'bar', 'baz'],
...       ['a', 1, True],
...       ['b', 2, False],
...       ['b', 3, True],
...       ['b', 3, False]]
>>> lkp = dictlookup(t2, ('foo', 'bar'))
>>> lkp[('a', 1)]
[{'baz': True, 'foo': 'a', 'bar': 1}]
>>> lkp[('b', 2)]
[{'baz': False, 'foo': 'b', 'bar': 2}]
>>> lkp[('b', 3)]
[{'baz': True, 'foo': 'b', 'bar': 3}, {'baz': False, 'foo': 'b', 'bar': 3}]

Data can be loaded into an existing dictionary-like object, including persistent dictionaries created via the shelve module, e.g.:

>>> import shelve
>>> table = [['foo', 'bar'], ['a', 1], ['b', 2], ['b', 3]]
>>> lkp = shelve.open('mydictlookup.dat')
>>> lkp = dictlookup(table, 'foo', dictionary=lkp)
>>> lkp.close()
>>> exit()
$ python
Python 2.7.1+ (r271:86832, Apr 11 2011, 18:05:24) 
[GCC 4.5.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import shelve
>>> lkp = shelve.open('mydictlookup.dat')
>>> lkp['a']
[{'foo': 'a', 'bar': 1}]
>>> lkp['b']
[{'foo': 'b', 'bar': 2}, {'foo': 'b', 'bar': 3}]

Changed in version 0.15.

Renamed from recordlookup.

petl.dictlookupone(table, keyspec, dictionary=None, strict=False)

Load a dictionary with data from the given table, mapping to dicts, assuming there is at most one row for each key. E.g.:

>>> from petl import dictlookupone
>>> table = [['foo', 'bar'], ['a', 1], ['b', 2], ['c', 2]]
>>> lkp = dictlookupone(table, 'foo')
>>> lkp['a']
{'foo': 'a', 'bar': 1}
>>> lkp['b']
{'foo': 'b', 'bar': 2}
>>> lkp['c']
{'foo': 'c', 'bar': 2}

If the specified key is not unique and strict=False (default), the first dict wins, e.g.:

>>> table = [['foo', 'bar'], ['a', 1], ['b', 2], ['b', 3]]
>>> lkp = dictlookupone(table, 'foo')
>>> lkp['a']
{'foo': 'a', 'bar': 1}
>>> lkp['b']
{'foo': 'b', 'bar': 2}

If the specified key is not unique and strict=True, a DuplicateKeyError is raised, e.g.:

>>> table = [['foo', 'bar'], ['a', 1], ['b', 2], ['b', 3]]
>>> lkp = dictlookupone(table, 'foo', strict=True)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "petl/util.py", line 451, in dictlookupone
    raise DuplicateKeyError
petl.util.DuplicateKeyError

Compound keys are supported, e.g.:

>>> t2 = [['foo', 'bar', 'baz'],
...       ['a', 1, True],
...       ['b', 2, False],
...       ['b', 3, True]]
>>> lkp = dictlookupone(t2, ('foo', 'bar'), strict=False)
>>> lkp[('a', 1)]
{'baz': True, 'foo': 'a', 'bar': 1}
>>> lkp[('b', 2)]
{'baz': False, 'foo': 'b', 'bar': 2}
>>> lkp[('b', 3)]
{'baz': True, 'foo': 'b', 'bar': 3}

Data can be loaded into an existing dictionary-like object, including persistent dictionaries created via the shelve module, e.g.:

>>> import shelve
>>> lkp = shelve.open('mydictlookupone.dat')
>>> table = [['foo', 'bar'], ['a', 1], ['b', 2], ['c', 2]]
>>> lkp = dictlookupone(table, 'foo', dictionary=lkp)
>>> lkp.close()
>>> exit()
$ python
Python 2.7.1+ (r271:86832, Apr 11 2011, 18:05:24) 
[GCC 4.5.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import shelve
>>> lkp = shelve.open('mydictlookupone.dat')
>>> lkp['a']
{'foo': 'a', 'bar': 1}
>>> lkp['b']
{'foo': 'b', 'bar': 2}
>>> lkp['c']
{'foo': 'c', 'bar': 2}

Changed in version 0.11.

Changed so that strict=False is default and first value wins.

Changed in version 0.15.

Renamed from recordlookupone.

petl.expr(s)

Construct a function operating on a record (i.e., a dictionary representation of a data row, indexed by field name).

The expression string is converted into a lambda function by prepending the string with 'lambda rec: ', then replacing anything enclosed in curly braces (e.g., "{foo}") with a lookup on the record (e.g., "rec['foo']"), then finally calling eval().

So, e.g., the expression string "{foo} * {bar}" is converted to the function lambda rec: rec['foo'] * rec['bar'].
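The conversion described above can be sketched in a few lines. Note that make_expr is a hypothetical name used here for illustration; petl's actual implementation may differ in detail (e.g., in how it handles unusual field names):

```python
import re

def make_expr(s):
    # Replace each {field} with a lookup on the record dict, then eval
    # the resulting lambda source. Assumes field names contain no '}'.
    src = 'lambda rec: ' + re.sub(r'\{([^}]+)\}', r"rec['\1']", s)
    return eval(src)

f = make_expr('{foo} * {bar}')
print(f({'foo': 2, 'bar': 3}))  # 6
```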

petl.strjoin(s)

Return a function to join sequences using s as the separator.
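A minimal equivalent can be sketched as follows (the sketch converts items to strings before joining, which is assumed but not stated by the description above):

```python
def strjoin(sep):
    # Return a function that joins the string representations of a
    # sequence's items using the given separator.
    return lambda seq: sep.join(str(item) for item in seq)

join = strjoin(', ')
print(join(['a', 'b', 'c']))  # a, b, c
```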

petl.randomtable(numflds=5, numrows=100, wait=0)

Construct a table with random numerical data. Use numflds and numrows to specify the number of fields and rows respectively. Set wait to a float greater than zero to simulate a delay on each row generation (number of seconds per row). E.g.:

>>> from petl import randomtable, look
>>> t = randomtable(5, 10000)
>>> look(t)
+---------------------+---------------------+---------------------+----------------------+----------------------+
| 'f0'                | 'f1'                | 'f2'                | 'f3'                 | 'f4'                 |
+=====================+=====================+=====================+======================+======================+
| 0.37981479583619415 | 0.5651754962690851  | 0.5219839418441516  | 0.400507081757018    | 0.18772722969580335  |
+---------------------+---------------------+---------------------+----------------------+----------------------+
| 0.8523718373108918  | 0.9728988775985702  | 0.539819811070272   | 0.5253127991162814   | 0.032332586052070345 |
+---------------------+---------------------+---------------------+----------------------+----------------------+
| 0.15767415808765595 | 0.8723372406647985  | 0.8116271113050197  | 0.19606663402788693  | 0.02917384287810021  |
+---------------------+---------------------+---------------------+----------------------+----------------------+
| 0.29027126477145737 | 0.9458013821235983  | 0.0558711583090582  | 0.8388382491420909   | 0.533855533396786    |
+---------------------+---------------------+---------------------+----------------------+----------------------+
| 0.7299727877963395  | 0.7293822340944851  | 0.953624640847381   | 0.7161554959575555   | 0.8681001821667421   |
+---------------------+---------------------+---------------------+----------------------+----------------------+
| 0.7057077618876934  | 0.5222733323906424  | 0.26527912571554013 | 0.41069309093677264  | 0.7062831671289698   |
+---------------------+---------------------+---------------------+----------------------+----------------------+
| 0.9447075997744453  | 0.3980291877822444  | 0.5748113148854611  | 0.037655670603881974 | 0.30826709590498524  |
+---------------------+---------------------+---------------------+----------------------+----------------------+
| 0.21559911346698513 | 0.8353039675591192  | 0.5558847892537019  | 0.8561403358605812   | 0.01109608253313421  |
+---------------------+---------------------+---------------------+----------------------+----------------------+
| 0.27334411287843097 | 0.10064946027523636 | 0.7476185996637322  | 0.26201984851765325  | 0.6303996377010502   |
+---------------------+---------------------+---------------------+----------------------+----------------------+
| 0.8348722928576766  | 0.40319578510057763 | 0.3658094978577834  | 0.9829576880714145   | 0.6170025401631835   |
+---------------------+---------------------+---------------------+----------------------+----------------------+

Note that the data are generated on the fly and are not stored in memory, so this function can be used to simulate very large tables.

New in version 0.6.

See also dummytable().
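The on-the-fly behaviour noted above can be sketched with a generator-based table. The function and field names here are illustrative, not petl's implementation:

```python
import random

def random_table(numflds=5, numrows=100):
    # The header and each data row are produced lazily on demand,
    # so no rows are held in memory regardless of numrows.
    yield ['f%d' % i for i in range(numflds)]
    for _ in range(numrows):
        yield [random.random() for _ in range(numflds)]

rows = random_table(3, 10**9)  # a billion rows costs nothing up front
print(next(rows))  # header row: ['f0', 'f1', 'f2']
```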

petl.dummytable(numrows=100, fields=[('foo', partial(random.randint, 0, 100)), ('bar', partial(random.choice, ['apples', 'pears', 'bananas', 'oranges'])), ('baz', random.random)], wait=0)

Construct a table with dummy data. Use numrows to specify the number of rows. Set wait to a float greater than zero to simulate a delay on each row generation (number of seconds per row). E.g.:

>>> from petl import dummytable, look
>>> t1 = dummytable(10000)
>>> look(t1)
+-------+-----------+----------------------+
| 'foo' | 'bar'     | 'baz'                |
+=======+===========+======================+
| 98    | 'oranges' | 0.017443519200384117 |
+-------+-----------+----------------------+
| 85    | 'pears'   | 0.6126183086894914   |
+-------+-----------+----------------------+
| 43    | 'apples'  | 0.8354915052285888   |
+-------+-----------+----------------------+
| 32    | 'pears'   | 0.9612740566307508   |
+-------+-----------+----------------------+
| 35    | 'bananas' | 0.4845179128370132   |
+-------+-----------+----------------------+
| 16    | 'pears'   | 0.150174888085586    |
+-------+-----------+----------------------+
| 98    | 'bananas' | 0.22592589109877748  |
+-------+-----------+----------------------+
| 82    | 'bananas' | 0.4887849296756226   |
+-------+-----------+----------------------+
| 75    | 'apples'  | 0.8414305202212253   |
+-------+-----------+----------------------+
| 78    | 'bananas' | 0.025845900016858714 |
+-------+-----------+----------------------+

Note that the data are generated on the fly and are not stored in memory, so this function can be used to simulate very large tables.

Data generation functions can be specified via the fields keyword argument, or set on the table via the suffix notation, e.g.:

>>> import random
>>> from functools import partial
>>> t2 = dummytable(10000, fields=[('foo', random.random), ('bar', partial(random.randint, 0, 500))])
>>> t2['baz'] = partial(random.choice, ['chocolate', 'strawberry', 'vanilla'])
>>> look(t2)
+---------------------+-------+--------------+
| 'foo'               | 'bar' | 'baz'        |
+=====================+=======+==============+
| 0.04595169186388326 | 370   | 'strawberry' |
+---------------------+-------+--------------+
| 0.29252999472988905 | 90    | 'chocolate'  |
+---------------------+-------+--------------+
| 0.7939324498894116  | 146   | 'chocolate'  |
+---------------------+-------+--------------+
| 0.4964898678468417  | 123   | 'chocolate'  |
+---------------------+-------+--------------+
| 0.26250784199548494 | 327   | 'strawberry' |
+---------------------+-------+--------------+
| 0.748470693146964   | 275   | 'strawberry' |
+---------------------+-------+--------------+
| 0.8995553034254133  | 151   | 'strawberry' |
+---------------------+-------+--------------+
| 0.26331484411715367 | 211   | 'chocolate'  |
+---------------------+-------+--------------+
| 0.4740252948218193  | 364   | 'vanilla'    |
+---------------------+-------+--------------+
| 0.166428545780258   | 59    | 'vanilla'    |
+---------------------+-------+--------------+

Changed in version 0.6.

Now supports different field types, e.g., non-numeric. Previous functionality is available as randomtable().

petl.diffheaders(t1, t2)

Return the difference between the headers of the two tables as a pair of sets. E.g.:

>>> from petl import diffheaders    
>>> table1 = [['foo', 'bar', 'baz'],
...           ['a', 1, .3]]
>>> table2 = [['baz', 'bar', 'quux'],
...           ['a', 1, .3]]
>>> add, sub = diffheaders(table1, table2)
>>> add
set(['quux'])
>>> sub
set(['foo'])

New in version 0.6.

petl.diffvalues(t1, t2, f)

Return the difference between the values under the given field in the two tables, e.g.:

>>> from petl import diffvalues
>>> table1 = [['foo', 'bar'],
...           ['a', 1],
...           ['b', 3]]
>>> table2 = [['bar', 'foo'],
...           [1, 'a'],
...           [3, 'c']]
>>> add, sub = diffvalues(table1, table2, 'foo')
>>> add
set(['c'])
>>> sub
set(['b'])

New in version 0.6.

petl.heapqmergesorted(key=None, *iterables)

Return a single iterator over the given iterables, sorted by the given key function, assuming the input iterables are already sorted by the same function. (I.e., the merge part of a general merge sort.) Uses heapq.merge() for the underlying implementation. See also shortlistmergesorted().

New in version 0.9.
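The merge behaviour can be illustrated directly with the standard library's heapq.merge() (note that heapq.merge() only grew a key parameter in Python 3.5, which is presumably why petl wraps it rather than exposing it directly):

```python
import heapq

# Two inputs already sorted by the same (natural) ordering;
# heapq.merge lazily interleaves them into one sorted stream.
a = [1, 3, 5]
b = [2, 3, 6]
merged = list(heapq.merge(a, b))
print(merged)  # [1, 2, 3, 3, 5, 6]
```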

petl.shortlistmergesorted(key=None, reverse=False, *iterables)

Return a single iterator over the given iterables, sorted by the given key function, assuming the input iterables are already sorted by the same function. (I.e., the merge part of a general merge sort.) Uses min() (or max() if reverse=True) for the underlying implementation. See also heapqmergesorted().

New in version 0.9.
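The min()-based strategy can be sketched as follows: keep the current head of each input in a short list and repeatedly emit the smallest head. The name shortlist_merge is hypothetical and this is not petl's actual implementation:

```python
def shortlist_merge(key, *iterables):
    # Sentinel marking an exhausted input.
    _END = object()
    its = [iter(it) for it in iterables]
    heads = [next(it, _END) for it in its]
    keyfn = key if key is not None else (lambda v: v)
    while True:
        # Candidates are the heads of the not-yet-exhausted inputs.
        candidates = [(i, v) for i, v in enumerate(heads) if v is not _END]
        if not candidates:
            return
        # Emit the smallest head and refill from the same input.
        i, v = min(candidates, key=lambda iv: keyfn(iv[1]))
        yield v
        heads[i] = next(its[i], _END)

print(list(shortlist_merge(None, [1, 3, 5], [2, 4])))  # [1, 2, 3, 4, 5]
```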

petl.progress(table, batchsize=1000, prefix='', out=sys.stderr)

Report progress on rows passing through. E.g.:

>>> from petl import dummytable, progress, tocsv
>>> d = dummytable(100500)
>>> p = progress(d, 10000)
>>> tocsv(p, 'output.csv')
10000 rows in 0.57s (17574 rows/second); batch in 0.57s (17574 rows/second)
20000 rows in 1.13s (17723 rows/second); batch in 0.56s (17876 rows/second)
30000 rows in 1.69s (17732 rows/second); batch in 0.56s (17749 rows/second)
40000 rows in 2.27s (17652 rows/second); batch in 0.57s (17418 rows/second)
50000 rows in 2.83s (17679 rows/second); batch in 0.56s (17784 rows/second)
60000 rows in 3.39s (17694 rows/second); batch in 0.56s (17769 rows/second)
70000 rows in 3.96s (17671 rows/second); batch in 0.57s (17534 rows/second)
80000 rows in 4.53s (17677 rows/second); batch in 0.56s (17720 rows/second)
90000 rows in 5.09s (17681 rows/second); batch in 0.56s (17715 rows/second)
100000 rows in 5.66s (17675 rows/second); batch in 0.57s (17625 rows/second)
100500 rows in 5.69s (17674 rows/second)

See also clock().

New in version 0.10.

petl.clock(table)

Time how long is spent retrieving rows from the wrapped container. Enables diagnosis of which steps in a pipeline are taking the most time. E.g.:

>>> from petl import dummytable, clock, convert, progress, tocsv
>>> t1 = dummytable(100000)
>>> c1 = clock(t1)
>>> t2 = convert(c1, 'foo', lambda v: v**2)
>>> c2 = clock(t2)
>>> p = progress(c2, 10000)
>>> tocsv(p, 'dummy.csv')
10000 rows in 1.17s (8559 rows/second); batch in 1.17s (8559 rows/second)
20000 rows in 2.34s (8548 rows/second); batch in 1.17s (8537 rows/second)
30000 rows in 3.51s (8547 rows/second); batch in 1.17s (8546 rows/second)
40000 rows in 4.68s (8541 rows/second); batch in 1.17s (8522 rows/second)
50000 rows in 5.89s (8483 rows/second); batch in 1.21s (8261 rows/second)
60000 rows in 7.30s (8221 rows/second); batch in 1.40s (7121 rows/second)
70000 rows in 8.59s (8144 rows/second); batch in 1.30s (7711 rows/second)
80000 rows in 9.78s (8182 rows/second); batch in 1.18s (8459 rows/second)
90000 rows in 10.98s (8193 rows/second); batch in 1.21s (8279 rows/second)
100000 rows in 12.30s (8132 rows/second); batch in 1.31s (7619 rows/second)
100000 rows in 12.30s (8132 rows/second)
>>> # time consumed retrieving rows from t1
... c1.time
5.4099999999999895
>>> # time consumed retrieving rows from t2
... c2.time
8.740000000000006
>>> # actual time consumed by the convert step
... c2.time - c1.time 
3.330000000000016

See also progress().

New in version 0.10.

petl.rowgroupby(table, key, value=None)

Convenient adapter for itertools.groupby(). E.g.:

>>> from petl import rowgroupby, look
>>> table = [['foo', 'bar', 'baz'],
...          ['a', 1, True],
...          ['b', 3, True],
...          ['b', 2]]
>>> look(table)
+-------+-------+-------+
| 'foo' | 'bar' | 'baz' |
+=======+=======+=======+
| 'a'   | 1     | True  |
+-------+-------+-------+
| 'b'   | 3     | True  |
+-------+-------+-------+
| 'b'   | 2     |       |
+-------+-------+-------+

>>> # group entire rows
... for key, group in rowgroupby(table, 'foo'):
...     print key, list(group)
... 
a [('a', 1, True)]
b [('b', 3, True), ('b', 2)]
>>> # group specific values
... for key, group in rowgroupby(table, 'foo', 'bar'):
...     print key, list(group)
... 
a [1]
b [3, 2]

N.B., assumes the input table is already sorted by the given key.

New in version 0.10.

petl.nthword(n, sep=None)

Construct a function to return the nth word in a string. E.g.:

>>> from petl import nthword
>>> s = 'foo bar'
>>> f = nthword(0)
>>> f(s)
'foo'
>>> g = nthword(1)
>>> g(s)
'bar'

New in version 0.10.

petl.cache(table, n=10000)

Wrap the table with a cache that stores up to n rows as they are initially requested via iteration.

New in version 0.16.
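The caching behaviour can be sketched as a wrapper that fills a bounded buffer on the first pass and replays it on later passes. The class and attribute names here are illustrative, not petl's API:

```python
class CachedTable(object):
    """Sketch of a bounded row cache over an iterable table."""

    def __init__(self, source, n=10000):
        self.source = source
        self.n = n
        self._cache = None  # set once the source has been fully consumed

    def __iter__(self):
        if self._cache is not None:
            # Replay cached rows without touching the source again.
            return iter(self._cache)
        return self._iterfill()

    def _iterfill(self):
        buf = []
        complete = True
        for row in self.source:
            if len(buf) < self.n:
                buf.append(row)
            else:
                complete = False  # table too big to cache in full
            yield row
        # Keep the cache only if the whole table fitted within n rows.
        if complete:
            self._cache = buf


class CountingSource(object):
    """Helper that counts how many times it is iterated."""

    def __init__(self, rows):
        self.rows = rows
        self.iterations = 0

    def __iter__(self):
        self.iterations += 1
        return iter(self.rows)


src = CountingSource([['foo'], ['a'], ['b']])
t = CachedTable(src, n=10)
list(t)  # first pass pulls from the source and fills the cache
list(t)  # second pass is served entirely from the cache
print(src.iterations)  # 1
```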

petl.empty()

Convenience function to return an empty table. Can be useful when building up a table from a set of columns, e.g.:

>>> from petl import empty, addcolumn, look
>>> table1 = addcolumn(empty(), 'foo', ['A', 'B'])
>>> table2 = addcolumn(table1, 'bar', [1, 2])
>>> look(table2)
+-------+-------+
| 'foo' | 'bar' |
+=======+=======+
| 'A'   |     1 |
+-------+-------+
| 'B'   |     2 |
+-------+-------+

New in version 0.23.

petl.coalesce(*fields, **kwargs)

Return a function which accepts a row and returns the first non-missing value from the specified fields.
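The behaviour can be sketched as below, treating the record as a dictionary. The missing and default keyword names are assumptions made for this sketch and may not match petl's actual keyword arguments:

```python
def coalesce(*fields, **kwargs):
    # Value treated as missing (assumed to default to None) and the
    # value returned if every field is missing.
    missing = kwargs.get('missing', None)
    default = kwargs.get('default', None)

    def f(rec):
        # Return the first listed field whose value is not missing.
        for field in fields:
            v = rec[field]
            if v != missing:
                return v
        return default

    return f

f = coalesce('foo', 'bar')
print(f({'foo': None, 'bar': 2}))  # 2
```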