Transform - transforming tables

Basic transformations

petl.head(table, n=5)

Choose the first n data rows. E.g.:

>>> from petl import head, look
>>> look(table1)
+-------+-------+
| 'foo' | 'bar' |
+=======+=======+
| 'a'   | 1     |
+-------+-------+
| 'b'   | 2     |
+-------+-------+
| 'c'   | 5     |
+-------+-------+
| 'd'   | 7     |
+-------+-------+
| 'f'   | 42    |
+-------+-------+
| 'f'   | 3     |
+-------+-------+
| 'h'   | 90    |
+-------+-------+

>>> table2 = head(table1, 4)
>>> look(table2)    
+-------+-------+
| 'foo' | 'bar' |
+=======+=======+
| 'a'   | 1     |
+-------+-------+
| 'b'   | 2     |
+-------+-------+
| 'c'   | 5     |
+-------+-------+
| 'd'   | 7     |
+-------+-------+

Syntactic sugar, equivalent to rowslice(table, n).

petl.tail(table, n=5)

Choose the last n data rows.

E.g.:

>>> from petl import tail, look
>>> look(table1)
+-------+-------+
| 'foo' | 'bar' |
+=======+=======+
| 'a'   | 1     |
+-------+-------+
| 'b'   | 2     |
+-------+-------+
| 'c'   | 5     |
+-------+-------+
| 'd'   | 7     |
+-------+-------+
| 'f'   | 42    |
+-------+-------+
| 'f'   | 3     |
+-------+-------+
| 'h'   | 90    |
+-------+-------+
| 'k'   | 12    |
+-------+-------+
| 'l'   | 77    |
+-------+-------+
| 'q'   | 2     |
+-------+-------+

>>> table2 = tail(table1, 4)
>>> look(table2)    
+-------+-------+
| 'foo' | 'bar' |
+=======+=======+
| 'h'   | 90    |
+-------+-------+
| 'k'   | 12    |
+-------+-------+
| 'l'   | 77    |
+-------+-------+
| 'q'   | 2     |
+-------+-------+

See also head(), rowslice().

petl.rowslice(table, *sliceargs)

Choose a subsequence of data rows. E.g.:

>>> from petl import rowslice, look
>>> look(table1)
+-------+-------+
| 'foo' | 'bar' |
+=======+=======+
| 'a'   | 1     |
+-------+-------+
| 'b'   | 2     |
+-------+-------+
| 'c'   | 5     |
+-------+-------+
| 'd'   | 7     |
+-------+-------+
| 'f'   | 42    |
+-------+-------+

>>> table2 = rowslice(table1, 2)
>>> look(table2)
+-------+-------+
| 'foo' | 'bar' |
+=======+=======+
| 'a'   | 1     |
+-------+-------+
| 'b'   | 2     |
+-------+-------+

>>> table3 = rowslice(table1, 1, 4)
>>> look(table3)
+-------+-------+
| 'foo' | 'bar' |
+=======+=======+
| 'b'   | 2     |
+-------+-------+
| 'c'   | 5     |
+-------+-------+
| 'd'   | 7     |
+-------+-------+

>>> table4 = rowslice(table1, 0, 5, 2)
>>> look(table4)
+-------+-------+
| 'foo' | 'bar' |
+=======+=======+
| 'a'   | 1     |
+-------+-------+
| 'c'   | 5     |
+-------+-------+
| 'f'   | 42    |
+-------+-------+

Changed in version 0.3.

Positional arguments can be used to slice the data rows. The sliceargs are passed to itertools.islice().

petl.cut(table, *args, **kwargs)

Choose and/or re-order columns. E.g.:

>>> from petl import look, cut    
>>> look(table1)
+-------+-------+-------+
| 'foo' | 'bar' | 'baz' |
+=======+=======+=======+
| 'A'   | 1     | 2.7   |
+-------+-------+-------+
| 'B'   | 2     | 3.4   |
+-------+-------+-------+
| 'B'   | 3     | 7.8   |
+-------+-------+-------+
| 'D'   | 42    | 9.0   |
+-------+-------+-------+
| 'E'   | 12    |       |
+-------+-------+-------+

>>> table2 = cut(table1, 'foo', 'baz')
>>> look(table2)
+-------+-------+
| 'foo' | 'baz' |
+=======+=======+
| 'A'   | 2.7   |
+-------+-------+
| 'B'   | 3.4   |
+-------+-------+
| 'B'   | 7.8   |
+-------+-------+
| 'D'   | 9.0   |
+-------+-------+
| 'E'   | None  |
+-------+-------+

>>> # fields can also be specified by index, starting from zero
... table3 = cut(table1, 0, 2)
>>> look(table3)
+-------+-------+
| 'foo' | 'baz' |
+=======+=======+
| 'A'   | 2.7   |
+-------+-------+
| 'B'   | 3.4   |
+-------+-------+
| 'B'   | 7.8   |
+-------+-------+
| 'D'   | 9.0   |
+-------+-------+
| 'E'   | None  |
+-------+-------+

>>> # field names and indices can be mixed
... table4 = cut(table1, 'bar', 0)
>>> look(table4)
+-------+-------+
| 'bar' | 'foo' |
+=======+=======+
| 1     | 'A'   |
+-------+-------+
| 2     | 'B'   |
+-------+-------+
| 3     | 'B'   |
+-------+-------+
| 42    | 'D'   |
+-------+-------+
| 12    | 'E'   |
+-------+-------+

>>> # select a range of fields
... table5 = cut(table1, *range(0, 2))
>>> look(table5)    
+-------+-------+
| 'foo' | 'bar' |
+=======+=======+
| 'A'   | 1     |
+-------+-------+
| 'B'   | 2     |
+-------+-------+
| 'B'   | 3     |
+-------+-------+
| 'D'   | 42    |
+-------+-------+
| 'E'   | 12    |
+-------+-------+

Note that any short rows will be padded with None values (or whatever is provided via the missing keyword argument).

See also cutout().

petl.cutout(table, *args, **kwargs)

Remove fields. E.g.:

>>> from petl import cutout, look
>>> look(table1)
+-------+-------+-------+
| 'foo' | 'bar' | 'baz' |
+=======+=======+=======+
| 'A'   | 1     | 2.7   |
+-------+-------+-------+
| 'B'   | 2     | 3.4   |
+-------+-------+-------+
| 'B'   | 3     | 7.8   |
+-------+-------+-------+
| 'D'   | 42    | 9.0   |
+-------+-------+-------+
| 'E'   | 12    |       |
+-------+-------+-------+

>>> table2 = cutout(table1, 'bar')
>>> look(table2)
+-------+-------+
| 'foo' | 'baz' |
+=======+=======+
| 'A'   | 2.7   |
+-------+-------+
| 'B'   | 3.4   |
+-------+-------+
| 'B'   | 7.8   |
+-------+-------+
| 'D'   | 9.0   |
+-------+-------+
| 'E'   | None  |
+-------+-------+

See also cut().

New in version 0.3.

petl.movefield(table, field, index)

Move a field to a new position.

New in version 0.24.

petl.cat(*tables, **kwargs)

Concatenate data from two or more tables. E.g.:

>>> from petl import look, cat
>>> look(table1)
+-------+-------+
| 'foo' | 'bar' |
+=======+=======+
| 1     | 'A'   |
+-------+-------+
| 2     | 'B'   |
+-------+-------+

>>> look(table2)
+-------+-------+
| 'bar' | 'baz' |
+=======+=======+
| 'C'   | True  |
+-------+-------+
| 'D'   | False |
+-------+-------+

>>> table3 = cat(table1, table2)
>>> look(table3)
+-------+-------+-------+
| 'foo' | 'bar' | 'baz' |
+=======+=======+=======+
| 1     | 'A'   | None  |
+-------+-------+-------+
| 2     | 'B'   | None  |
+-------+-------+-------+
| None  | 'C'   | True  |
+-------+-------+-------+
| None  | 'D'   | False |
+-------+-------+-------+

>>> # can also be used to square up a single table with uneven rows
... look(table4)
+-------+-------+--------+------+
| 'foo' | 'bar' | 'baz'  |      |
+=======+=======+========+======+
| 'A'   | 1     | 2      |      |
+-------+-------+--------+------+
| 'B'   | '2'   | '3.4'  |      |
+-------+-------+--------+------+
| u'B'  | u'3'  | u'7.8' | True |
+-------+-------+--------+------+
| 'D'   | 'xyz' | 9.0    |      |
+-------+-------+--------+------+
| 'E'   | None  |        |      |
+-------+-------+--------+------+

>>> look(cat(table4))
+-------+-------+--------+
| 'foo' | 'bar' | 'baz'  |
+=======+=======+========+
| 'A'   | 1     | 2      |
+-------+-------+--------+
| 'B'   | '2'   | '3.4'  |
+-------+-------+--------+
| u'B'  | u'3'  | u'7.8' |
+-------+-------+--------+
| 'D'   | 'xyz' | 9.0    |
+-------+-------+--------+
| 'E'   | None  | None   |
+-------+-------+--------+

>>> # use the header keyword argument to specify a fixed set of fields 
... look(table5)
+-------+-------+
| 'bar' | 'foo' |
+=======+=======+
| 'A'   | 1     |
+-------+-------+
| 'B'   | 2     |
+-------+-------+

>>> table6 = cat(table5, header=['A', 'foo', 'B', 'bar', 'C'])
>>> look(table6)
+------+-------+------+-------+------+
| 'A'  | 'foo' | 'B'  | 'bar' | 'C'  |
+======+=======+======+=======+======+
| None | 1     | None | 'A'   | None |
+------+-------+------+-------+------+
| None | 2     | None | 'B'   | None |
+------+-------+------+-------+------+

>>> # using the header keyword argument with two input tables
... look(table7)
+-------+-------+
| 'bar' | 'foo' |
+=======+=======+
| 'A'   | 1     |
+-------+-------+
| 'B'   | 2     |
+-------+-------+

>>> look(table8)
+-------+-------+
| 'bar' | 'baz' |
+=======+=======+
| 'C'   | True  |
+-------+-------+
| 'D'   | False |
+-------+-------+

>>> table9 = cat(table7, table8, header=['A', 'foo', 'B', 'bar', 'C'])
>>> look(table9)
+------+-------+------+-------+------+
| 'A'  | 'foo' | 'B'  | 'bar' | 'C'  |
+======+=======+======+=======+======+
| None | 1     | None | 'A'   | None |
+------+-------+------+-------+------+
| None | 2     | None | 'B'   | None |
+------+-------+------+-------+------+
| None | None  | None | 'C'   | None |
+------+-------+------+-------+------+
| None | None  | None | 'D'   | None |
+------+-------+------+-------+------+    

Note that the tables do not need to share exactly the same fields, any missing fields will be padded with None or whatever is provided via the missing keyword argument.

Note that cat() can be used with a single table argument, in which case it has the effect of ensuring all data rows are the same length as the header row, truncating any long rows and padding any short rows with the value of the missing keyword argument.

Changed in version 0.5.

By default, the fields for the output table will be determined as the union of all fields found in the input tables. Use the header keyword argument to override this behaviour and specify a fixed set of fields for the output table.

petl.skipcomments(table, prefix)

Skip any row where the first value is a string and starts with prefix. E.g.:

>>> from petl import skipcomments, look
>>> look(table1)
+---------+-------+-------+
| '##aaa' | 'bbb' | 'ccc' |
+=========+=======+=======+
| '##mmm' |       |       |
+---------+-------+-------+
| '#foo'  | 'bar' |       |
+---------+-------+-------+
| '##nnn' | 1     |       |
+---------+-------+-------+
| 'a'     | 1     |       |
+---------+-------+-------+
| 'b'     | 2     |       |
+---------+-------+-------+

>>> table2 = skipcomments(table1, '##')
>>> look(table2)
+--------+-------+
| '#foo' | 'bar' |
+========+=======+
| 'a'    | 1     |
+--------+-------+
| 'b'    | 2     |
+--------+-------+

New in version 0.4.

petl.addfield(table, field, value=None, index=None, missing=None)

Add a field with a fixed or calculated value. E.g.:

>>> from petl import addfield, look
>>> look(table1)
+-------+-------+
| 'foo' | 'bar' |
+=======+=======+
| 'M'   | 12    |
+-------+-------+
| 'F'   | 34    |
+-------+-------+
| '-'   | 56    |
+-------+-------+

>>> # using a fixed value
... table2 = addfield(table1, 'baz', 42)
>>> look(table2)
+-------+-------+-------+
| 'foo' | 'bar' | 'baz' |
+=======+=======+=======+
| 'M'   | 12    | 42    |
+-------+-------+-------+
| 'F'   | 34    | 42    |
+-------+-------+-------+
| '-'   | 56    | 42    |
+-------+-------+-------+

>>> # calculating the value
... table2 = addfield(table1, 'baz', lambda rec: rec['bar'] * 2)
>>> look(table2)
+-------+-------+-------+
| 'foo' | 'bar' | 'baz' |
+=======+=======+=======+
| 'M'   | 12    | 24    |
+-------+-------+-------+
| 'F'   | 34    | 68    |
+-------+-------+-------+
| '-'   | 56    | 112   |
+-------+-------+-------+

>>> # an expression string can also be used via expr
... from petl import expr
>>> table3 = addfield(table1, 'baz', expr('{bar} * 2'))
>>> look(table3)
+-------+-------+-------+
| 'foo' | 'bar' | 'baz' |
+=======+=======+=======+
| 'M'   | 12    | 24    |
+-------+-------+-------+
| 'F'   | 34    | 68    |
+-------+-------+-------+
| '-'   | 56    | 112   |
+-------+-------+-------+

Changed in version 0.10.

Renamed ‘extend’ to ‘addfield’.

petl.addcolumn(table, field, col, index=None, missing=None)

Add a column of data to the table. E.g.:

>>> from petl import addcolumn, look
>>> look(table1)
+-------+-------+
| 'foo' | 'bar' |
+=======+=======+
| 'A'   | 1     |
+-------+-------+
| 'B'   | 2     |
+-------+-------+

>>> col = [True, False]
>>> table2 = addcolumn(table1, 'baz', col)
>>> look(table2)
+-------+-------+-------+
| 'foo' | 'bar' | 'baz' |
+=======+=======+=======+
| 'A'   | 1     | True  |
+-------+-------+-------+
| 'B'   | 2     | False |
+-------+-------+-------+

New in version 0.10.

petl.addrownumbers(table, start=1, step=1)

Add a field of row numbers. E.g.:

>>> from petl import addrownumbers, look
>>> look(table1)
+-------+-------+
| 'foo' | 'bar' |
+=======+=======+
| 'A'   | 9     |
+-------+-------+
| 'C'   | 2     |
+-------+-------+
| 'F'   | 1     |
+-------+-------+

>>> table2 = addrownumbers(table1)
>>> look(table2)
+-------+-------+-------+
| 'row' | 'foo' | 'bar' |
+=======+=======+=======+
| 1     | 'A'   | 9     |
+-------+-------+-------+
| 2     | 'C'   | 2     |
+-------+-------+-------+
| 3     | 'F'   | 1     |
+-------+-------+-------+

New in version 0.10.

petl.addfieldusingcontext(table, field, query)

Like addfield() but the query function is passed the previous, current and next rows, so values may be calculated based on data in adjacent rows.

>>> from petl import look, addfieldusingcontext
>>> look(table1)
+-------+-------+
| 'foo' | 'bar' |
+=======+=======+
| 'A'   |     1 |
+-------+-------+
| 'B'   |     4 |
+-------+-------+
| 'C'   |     5 |
+-------+-------+
| 'D'   |     9 |
+-------+-------+
>>> def upstream(prv, cur, nxt):
...     if prv is None:
...         return None
...     else:
...         return cur.bar - prv.bar
...
>>> def downstream(prv, cur, nxt):
...     if nxt is None:
...         return None
...     else:
...         return nxt.bar - cur.bar
...
>>> table2 = addfieldusingcontext(table1, 'baz', upstream)
>>> table3 = addfieldusingcontext(table2, 'quux', downstream)
>>> look(table3)
+-------+-------+-------+--------+
| 'foo' | 'bar' | 'baz' | 'quux' |
+=======+=======+=======+========+
| 'A'   |     1 | None  |      3 |
+-------+-------+-------+--------+
| 'B'   |     4 |     3 |      1 |
+-------+-------+-------+--------+
| 'C'   |     5 |     1 |      4 |
+-------+-------+-------+--------+
| 'D'   |     9 |     4 | None   |
+-------+-------+-------+--------+

New in version 0.24.

petl.annex(*tables, **kwargs)

Join two or more tables by row order. E.g.:

>>> from petl import annex, look
>>> look(table1)
+-------+-------+
| 'foo' | 'bar' |
+=======+=======+
| 'A'   | 9     |
+-------+-------+
| 'C'   | 2     |
+-------+-------+
| 'F'   | 1     |
+-------+-------+

>>> look(table2)
+-------+-------+
| 'foo' | 'baz' |
+=======+=======+
| 'B'   | 3     |
+-------+-------+
| 'D'   | 10    |
+-------+-------+

>>> table3 = annex(table1, table2)
>>> look(table3)    
+-------+-------+-------+-------+
| 'foo' | 'bar' | 'foo' | 'baz' |
+=======+=======+=======+=======+
| 'A'   | 9     | 'B'   | 3     |
+-------+-------+-------+-------+
| 'C'   | 2     | 'D'   | 10    |
+-------+-------+-------+-------+
| 'F'   | 1     | None  | None  |
+-------+-------+-------+-------+

New in version 0.10.

Header manipulations

petl.rename(table, *args)

Replace one or more fields in the table’s header row. E.g.:

>>> from petl import look, rename
>>> look(table1)
+-------+-------+
| 'sex' | 'age' |
+=======+=======+
| 'M'   | 12    |
+-------+-------+
| 'F'   | 34    |
+-------+-------+
| '-'   | 56    |
+-------+-------+

>>> # rename a single field
... table2 = rename(table1, 'sex', 'gender')
>>> look(table2)
+----------+-------+
| 'gender' | 'age' |
+==========+=======+
| 'M'      | 12    |
+----------+-------+
| 'F'      | 34    |
+----------+-------+
| '-'      | 56    |
+----------+-------+

>>> # rename multiple fields by passing a dictionary as the second argument
... table3 = rename(table1, {'sex': 'gender', 'age': 'age_years'})
>>> look(table3)
+----------+-------------+
| 'gender' | 'age_years' |
+==========+=============+
| 'M'      | 12          |
+----------+-------------+
| 'F'      | 34          |
+----------+-------------+
| '-'      | 56          |
+----------+-------------+

>>> # the returned table object can also be used to modify the field mapping using the suffix notation
... table4 = rename(table1)
>>> table4['sex'] = 'gender'
>>> table4['age'] = 'age_years'
>>> look(table4)
+----------+-------------+
| 'gender' | 'age_years' |
+==========+=============+
| 'M'      | 12          |
+----------+-------------+
| 'F'      | 34          |
+----------+-------------+
| '-'      | 56          |
+----------+-------------+

Changed in version 0.4.

Function signature changed to support the simple 2 argument form when renaming a single field.

Changed in version 0.23.

The field to rename can be specified as an index (i.e., integer representing field position).

petl.setheader(table, fields)

Override fields in the given table. E.g.:

>>> from petl import setheader, look
>>> look(table1)
+-------+-------+
| 'foo' | 'bar' |
+=======+=======+
| 'a'   | 1     |
+-------+-------+
| 'b'   | 2     |
+-------+-------+

>>> table2 = setheader(table1, ['foofoo', 'barbar'])
>>> look(table2)
+----------+----------+
| 'foofoo' | 'barbar' |
+==========+==========+
| 'a'      | 1        |
+----------+----------+
| 'b'      | 2        |
+----------+----------+

See also extendheader(), pushheader().

petl.extendheader(table, fields)

Extend fields in the given table. E.g.:

>>> from petl import extendheader, look
>>> look(table1)
+-------+---+-------+
| 'foo' |   |       |
+=======+===+=======+
| 'a'   | 1 | True  |
+-------+---+-------+
| 'b'   | 2 | False |
+-------+---+-------+

>>> table2 = extendheader(table1, ['bar', 'baz'])
>>> look(table2)
+-------+-------+-------+
| 'foo' | 'bar' | 'baz' |
+=======+=======+=======+
| 'a'   | 1     | True  |
+-------+-------+-------+
| 'b'   | 2     | False |
+-------+-------+-------+

See also setheader(), pushheader().

petl.pushheader(table, fields, *args)

Push rows down and prepend a header row. E.g.:

>>> from petl import pushheader, look
>>> look(table1)
+-----+---+
| 'a' | 1 |
+=====+===+
| 'b' | 2 |
+-----+---+

The header row can either be a list or positional arguments:

>>> table2 = pushheader(table1, ['foo', 'bar'])
>>> look(table2)
+-------+-------+
| 'foo' | 'bar' |
+=======+=======+
| 'a'   | 1     |
+-------+-------+
| 'b'   | 2     |
+-------+-------+

>>> table2 = pushheader(table1, 'foo', 'bar')
>>> look(table2)
+-------+-------+
| 'foo' | 'bar' |
+=======+=======+
| 'a'   | 1     |
+-------+-------+
| 'b'   | 2     |
+-------+-------+

Useful, e.g., where data are from a CSV file that has not included a header row.

petl.prefixheader(table, prefix)

Prefix all fields in the table header.

New in version 0.24.

petl.suffixheader(table, suffix)

Suffix all fields in the table header.

New in version 0.24.

petl.skip(table, n)

Skip n rows (including the header row).

E.g.:

>>> from petl import skip, look
>>> look(table1)
+--------+-------+-------+
| '#aaa' | 'bbb' | 'ccc' |
+========+=======+=======+
| '#mmm' |       |       |
+--------+-------+-------+
| 'foo'  | 'bar' |       |
+--------+-------+-------+
| 'a'    | 1     |       |
+--------+-------+-------+
| 'b'    | 2     |       |
+--------+-------+-------+

>>> table2 = skip(table1, 2)
>>> look(table2)
+-------+-------+
| 'foo' | 'bar' |
+=======+=======+
| 'a'   | 1     |
+-------+-------+
| 'b'   | 2     |
+-------+-------+

See also skipcomments().

Converting values

petl.convert(table, *args, **kwargs)

Transform values under one or more fields via arbitrary functions, method invocations or dictionary translations. E.g.:

>>> from petl import convert, look
>>> look(table1)
+-------+-------+-------+
| 'foo' | 'bar' | 'baz' |
+=======+=======+=======+
| 'A'   | '2.4' |    12 |
+-------+-------+-------+
| 'B'   | '5.7' |    34 |
+-------+-------+-------+
| 'C'   | '1.2' |    56 |
+-------+-------+-------+

>>> # using the built-in float function:
... table2 = convert(table1, 'bar', float)
>>> look(table2)
+-------+-------+-------+
| 'foo' | 'bar' | 'baz' |
+=======+=======+=======+
| 'A'   |   2.4 |    12 |
+-------+-------+-------+
| 'B'   |   5.7 |    34 |
+-------+-------+-------+
| 'C'   |   1.2 |    56 |
+-------+-------+-------+

>>> # using a lambda function::
... table3 = convert(table1, 'baz', lambda v: v*2)
>>> look(table3)
+-------+-------+-------+
| 'foo' | 'bar' | 'baz' |
+=======+=======+=======+
| 'A'   | '2.4' |    24 |
+-------+-------+-------+
| 'B'   | '5.7' |    68 |
+-------+-------+-------+
| 'C'   | '1.2' |   112 |
+-------+-------+-------+

>>> # a method of the data value can also be invoked by passing the method name
... table4 = convert(table1, 'foo', 'lower')
>>> look(table4)
+-------+-------+-------+
| 'foo' | 'bar' | 'baz' |
+=======+=======+=======+
| 'a'   | '2.4' |    12 |
+-------+-------+-------+
| 'b'   | '5.7' |    34 |
+-------+-------+-------+
| 'c'   | '1.2' |    56 |
+-------+-------+-------+

>>> # arguments to the method invocation can also be given
... table5 = convert(table1, 'foo', 'replace', 'A', 'AA')
>>> look(table5)
+-------+-------+-------+
| 'foo' | 'bar' | 'baz' |
+=======+=======+=======+
| 'AA'  | '2.4' |    12 |
+-------+-------+-------+
| 'B'   | '5.7' |    34 |
+-------+-------+-------+
| 'C'   | '1.2' |    56 |
+-------+-------+-------+

>>> # values can also be translated via a dictionary
... table7 = convert(table1, 'foo', {'A': 'Z', 'B': 'Y'})
>>> look(table7)
+-------+-------+-------+
| 'foo' | 'bar' | 'baz' |
+=======+=======+=======+
| 'Z'   | '2.4' |    12 |
+-------+-------+-------+
| 'Y'   | '5.7' |    34 |
+-------+-------+-------+
| 'C'   | '1.2' |    56 |
+-------+-------+-------+

>>> # the same conversion can be applied to multiple fields
... table8 = convert(table1, ('foo', 'bar', 'baz'), unicode)
>>> look(table8)
+-------+--------+-------+
| 'foo' | 'bar'  | 'baz' |
+=======+========+=======+
| u'A'  | u'2.4' | u'12' |
+-------+--------+-------+
| u'B'  | u'5.7' | u'34' |
+-------+--------+-------+
| u'C'  | u'1.2' | u'56' |
+-------+--------+-------+

>>> # multiple conversions can be specified at the same time
... table9 = convert(table1, {'foo': 'lower', 'bar': float, 'baz': lambda v: v*2})
>>> look(table9)
+-------+-------+-------+
| 'foo' | 'bar' | 'baz' |
+=======+=======+=======+
| 'a'   |   2.4 |    24 |
+-------+-------+-------+
| 'b'   |   5.7 |    68 |
+-------+-------+-------+
| 'c'   |   1.2 |   112 |
+-------+-------+-------+

>>> # ...or alternatively via a list
... table10 = convert(table1, ['lower', float, lambda v: v*2])
>>> look(table10)
+-------+-------+-------+
| 'foo' | 'bar' | 'baz' |
+=======+=======+=======+
| 'a'   |   2.4 |    24 |
+-------+-------+-------+
| 'b'   |   5.7 |    68 |
+-------+-------+-------+
| 'c'   |   1.2 |   112 |
+-------+-------+-------+

>>> # ...or alternatively via suffix notation on the returned table object
... table11 = convert(table1)
>>> table11['foo'] = 'lower'
>>> table11['bar'] = float
>>> table11['baz'] = lambda v: v*2
>>> look(table11)
+-------+-------+-------+
| 'foo' | 'bar' | 'baz' |
+=======+=======+=======+
| 'a'   |   2.4 |    24 |
+-------+-------+-------+
| 'b'   |   5.7 |    68 |
+-------+-------+-------+
| 'c'   |   1.2 |   112 |
+-------+-------+-------+

>>> # conversion can be conditional
... table12 = convert(table1, 'baz', lambda v: v*2, where=lambda r: r.foo == 'B')
>>> look(table12)
+-------+-------+-------+
| 'foo' | 'bar' | 'baz' |
+=======+=======+=======+
| 'A'   | '2.4' |    12 |
+-------+-------+-------+
| 'B'   | '5.7' |    68 |
+-------+-------+-------+
| 'C'   | '1.2' |    56 |
+-------+-------+-------+

>>> # conversion can access other values from the same row
... table14 = convert(table1, 'baz', lambda v, row: v * float(row.bar), pass_row=True)
>>> look(table14)
+-------+-------+--------------------+
| 'foo' | 'bar' | 'baz'              |
+=======+=======+====================+
| 'A'   | '2.4' | 28.799999999999997 |
+-------+-------+--------------------+
| 'B'   | '5.7' |              193.8 |
+-------+-------+--------------------+
| 'C'   | '1.2' |               67.2 |
+-------+-------+--------------------+

Note that either field names or indexes can be given.

Changed in version 0.11.

Now supports multiple field conversions.

Changed in version 0.22.

The where keyword argument can be given with a callable or expression which is evaluated on each row and which should return True if the conversion should be applied on that row, else False.

Changed in version 0.25.

The pass_row keyword argument can be given, which if True will mean that both the value and the containing row will be passed as arguments to the conversion function (so, i.e., the conversion function should accept two arguments).

petl.convertall(table, *args, **kwargs)

Convenience function to convert all fields in the table using a common function or mapping. See also convert().

New in version 0.4.

Changed in version 0.22.

The where keyword argument can be given with a callable or expression which is evaluated on each row and which should return True if the conversion should be applied on that row, else False.

petl.convertnumbers(table, strict=False, **kwargs)

Convenience function to convert all field values to numbers where possible. E.g.:

>>> from petl import convertnumbers, look
>>> look(table1)
+-------+-------+--------+--------+
| 'foo' | 'bar' | 'baz'  | 'quux' |
+=======+=======+========+========+
| '1'   | '3.0' | '9+3j' | 'aaa'  |
+-------+-------+--------+--------+
| '2'   | '1.3' | '7+2j' | None   |
+-------+-------+--------+--------+

>>> table2 = convertnumbers(table1)
>>> look(table2)
+-------+-------+--------+--------+
| 'foo' | 'bar' | 'baz'  | 'quux' |
+=======+=======+========+========+
| 1     | 3.0   | (9+3j) | 'aaa'  |
+-------+-------+--------+--------+
| 2     | 1.3   | (7+2j) | None   |
+-------+-------+--------+--------+

New in version 0.4.

petl.replace(table, field, a, b, **kwargs)

Convenience function to replace all occurrences of a with b under the given field. See also convert().

New in version 0.5.

Changed in version 0.22.

The where keyword argument can be given with a callable or expression which is evaluated on each row and which should return True if the conversion should be applied on that row, else False.

petl.replaceall(table, a, b, **kwargs)

Convenience function to replace all instances of a with b under all fields. See also convertall().

New in version 0.5.

Changed in version 0.22.

The where keyword argument can be given with a callable or expression which is evaluated on each row and which should return True if the conversion should be applied on that row, else False.

petl.update(table, field, value, **kwargs)

Convenience function to convert a field to a fixed value. Accepts the where keyword argument. See also convert().

New in version 0.23.

petl.fieldconvert(table, converters=None, failonerror=False, errorvalue=None, **kwargs)

Transform values in one or more fields via functions or method invocations.

Deprecated since version 0.11.

Use convert() instead.

Selecting rows

petl.select(table, *args, **kwargs)

Select rows meeting a condition. E.g.:

>>> from petl import select, look
>>> look(table1)
+-------+-------+-------+
| 'foo' | 'bar' | 'baz' |
+=======+=======+=======+
| 'a'   | 4     | 9.3   |
+-------+-------+-------+
| 'a'   | 2     | 88.2  |
+-------+-------+-------+
| 'b'   | 1     | 23.3  |
+-------+-------+-------+
| 'c'   | 8     | 42.0  |
+-------+-------+-------+
| 'd'   | 7     | 100.9 |
+-------+-------+-------+
| 'c'   | 2     |       |
+-------+-------+-------+

>>> # the second positional argument can be a function accepting a record
... table2 = select(table1, lambda rec: rec[0] == 'a' and rec[1] > 88.1)
... # table2 = select(table1, lambda rec: rec['foo'] == 'a' and rec['baz'] > 88.1)
... # table2 = select(table1, lambda rec: rec.foo == 'a' and rec.baz > 88.1)
>>> look(table2)
+-------+-------+-------+
| 'foo' | 'bar' | 'baz' |
+=======+=======+=======+
| 'a'   | 2     | 88.2  |
+-------+-------+-------+

>>> # the second positional argument can also be an expression string, which
... # will be converted to a function using expr()
... table3 = select(table1, "{foo} == 'a' and {baz} > 88.1")
>>> look(table3)
+-------+-------+-------+
| 'foo' | 'bar' | 'baz' |
+=======+=======+=======+
| 'a'   | 2     | 88.2  |
+-------+-------+-------+

>>> # the condition can also be applied to a single field
... table4 = select(table1, 'foo', lambda v: v == 'a')
>>> look(table4)
+-------+-------+-------+
| 'foo' | 'bar' | 'baz' |
+=======+=======+=======+
| 'a'   | 4     | 9.3   |
+-------+-------+-------+
| 'a'   | 2     | 88.2  |
+-------+-------+-------+

Changed in version 0.4.

The complement of the selection can be returned (i.e., the query can be inverted) by providing complement=True as a keyword argument.

petl.selectop(table, field, value, op, complement=False)

Select rows where the function op applied to the given field and the given value returns true.

Changed in version 0.4.

The complement of the selection can be returned (i.e., the query can be inverted) by providing complement=True as a keyword argument.

petl.selecteq(table, field, value, complement=False)

Select rows where the given field equals the given value.

Changed in version 0.4.

The complement of the selection can be returned (i.e., the query can be inverted) by providing complement=True as a keyword argument.

petl.selectne(table, field, value, complement=False)

Select rows where the given field does not equal the given value.

Changed in version 0.4.

The complement of the selection can be returned (i.e., the query can be inverted) by providing complement=True as a keyword argument.

petl.selectlt(table, field, value, complement=False)

Select rows where the given field is less than the given value.

Changed in version 0.4.

The complement of the selection can be returned (i.e., the query can be inverted) by providing complement=True as a keyword argument.

petl.selectle(table, field, value, complement=False)

Select rows where the given field is less than or equal to the given value.

Changed in version 0.4.

The complement of the selection can be returned (i.e., the query can be inverted) by providing complement=True as a keyword argument.

petl.selectgt(table, field, value, complement=False)

Select rows where the given field is greater than the given value.

Changed in version 0.4.

The complement of the selection can be returned (i.e., the query can be inverted) by providing complement=True as a keyword argument.

petl.selectge(table, field, value, complement=False)

Select rows where the given field is greater than or equal to the given value.

Changed in version 0.4.

The complement of the selection can be returned (i.e., the query can be inverted) by providing complement=True as a keyword argument.

petl.selectrangeopen(table, field, minv, maxv, complement=False)

Select rows where the given field is greater than or equal to minv and less than or equal to maxv.

Changed in version 0.4.

The complement of the selection can be returned (i.e., the query can be inverted) by providing complement=True as a keyword argument.

petl.selectrangeopenleft(table, field, minv, maxv, complement=False)

Select rows where the given field is greater than or equal to minv and less than maxv.

Changed in version 0.4.

The complement of the selection can be returned (i.e., the query can be inverted) by providing complement=True as a keyword argument.

petl.selectrangeopenright(table, field, minv, maxv, complement=False)

Select rows where the given field is greater than minv and less than or equal to maxv.

Changed in version 0.4.

The complement of the selection can be returned (i.e., the query can be inverted) by providing complement=True as a keyword argument.

petl.selectrangeclosed(table, field, minv, maxv, complement=False)

Select rows where the given field is greater than minv and less than maxv.

Changed in version 0.4.

The complement of the selection can be returned (i.e., the query can be inverted) by providing complement=True as a keyword argument.

petl.selectcontains(table, field, value, complement=False)

Select rows where the given field contains the given value.

New in version 0.10.

petl.selectin(table, field, value, complement=False)

Select rows where the given field is a member of the given value.

Changed in version 0.4.

The complement of the selection can be returned (i.e., the query can be inverted) by providing complement=True as a keyword argument.

petl.selectnotin(table, field, value, complement=False)

Select rows where the given field is not a member of the given value.

Changed in version 0.4.

The complement of the selection can be returned (i.e., the query can be inverted) by providing complement=True as a keyword argument.

petl.selectis(table, field, value, complement=False)

Select rows where the given field is the given value.

Changed in version 0.4.

The complement of the selection can be returned (i.e., the query can be inverted) by providing complement=True as a keyword argument.

petl.selectisnot(table, field, value, complement=False)

Select rows where the given field is not the given value.

Changed in version 0.4.

The complement of the selection can be returned (i.e., the query can be inverted) by providing complement=True as a keyword argument.

petl.selectisinstance(table, field, value, complement=False)

Select rows where the given field is an instance of the given type.

Changed in version 0.4.

The complement of the selection can be returned (i.e., the query can be inverted) by providing complement=True as a keyword argument.

petl.selectre(table, field, pattern, flags=0, complement=False)

Select rows where a regular expression search using the given pattern on the given field returns a match. E.g.:

>>> from petl import selectre, look
>>> look(table1)
+-------+-------+-------+
| 'foo' | 'bar' | 'baz' |
+=======+=======+=======+
| 'aa'  | 4     | 9.3   |
+-------+-------+-------+
| 'aaa' | 2     | 88.2  |
+-------+-------+-------+
| 'b'   | 1     | 23.3  |
+-------+-------+-------+
| 'ccc' | 8     | 42.0  |
+-------+-------+-------+
| 'bb'  | 7     | 100.9 |
+-------+-------+-------+
| 'c'   | 2     |       |
+-------+-------+-------+

>>> table2 = selectre(table1, 'foo', '[ab]{2}')
>>> look(table2)
+-------+-------+-------+
| 'foo' | 'bar' | 'baz' |
+=======+=======+=======+
| 'aa'  | 4     | 9.3   |
+-------+-------+-------+
| 'aaa' | 2     | 88.2  |
+-------+-------+-------+
| 'bb'  | 7     | 100.9 |
+-------+-------+-------+

See also re.search().

Changed in version 0.4.

The complement of the selection can be returned (i.e., the query can be inverted) by providing complement=True as a keyword argument.

petl.selecttrue(table, field, complement=False)

Select rows where the given field equals True.

petl.selectfalse(table, field, complement=False)

Select rows where the given field equals False.

petl.selectnone(table, field, complement=False)

Select rows where the given field is None.

petl.selectnotnone(table, field, complement=False)

Select rows where the given field is not None.

petl.selectusingcontext(table, query)

Select rows based on data in the current row and/or previous and next row. E.g.:

>>> from petl import look, selectusingcontext
>>> look(table1)
+-------+-------+
| 'foo' | 'bar' |
+=======+=======+
| 'A'   |     1 |
+-------+-------+
| 'B'   |     4 |
+-------+-------+
| 'C'   |     5 |
+-------+-------+
| 'D'   |     9 |
+-------+-------+

>>> def query(prv, cur, nxt):
...     return ((prv is not None and (cur.bar - prv.bar) < 2)
...             or (nxt is not None and (nxt.bar - cur.bar) < 2))
...
>>> table2 = selectusingcontext(table1, query)
>>> look(table2)
+-------+-------+
| 'foo' | 'bar' |
+=======+=======+
| 'B'   |     4 |
+-------+-------+
| 'C'   |     5 |
+-------+-------+

New in version 0.24.

petl.rowselect(table, where, complement=False)

Select rows matching a condition. The where argument should be a function accepting a hybrid row object (supports accessing values either by position or by field name) as argument and returning True or False.

Deprecated since version 0.10.

Use select() instead, it supports the same signature.

petl.recordselect(table, where, missing=None, complement=False)

Select rows matching a condition. The where argument should be a function accepting a record (row as dictionary of values indexed by field name) as argument and returning True or False.

Deprecated since version 0.9.

Use select() instead.

petl.rowlenselect(table, n, complement=False)

Select rows of length n.

Changed in version 0.4.

The complement of the selection can be returned (i.e., the query can be inverted) by providing complement=True as a keyword argument.

petl.fieldselect(table, field, where, complement=False)

Select rows matching a condition. The where argument should be a function accepting a single data value as argument and returning True or False.

Deprecated since version 0.10.

Use select() instead, it supports the same signature.

petl.facet(table, field)

Return a dictionary mapping field values to tables.

E.g.:

>>> from petl import facet, look
>>> look(table1)
+-------+-------+-------+
| 'foo' | 'bar' | 'baz' |
+=======+=======+=======+
| 'a'   | 4     | 9.3   |
+-------+-------+-------+
| 'a'   | 2     | 88.2  |
+-------+-------+-------+
| 'b'   | 1     | 23.3  |
+-------+-------+-------+
| 'c'   | 8     | 42.0  |
+-------+-------+-------+
| 'd'   | 7     | 100.9 |
+-------+-------+-------+
| 'c'   | 2     |       |
+-------+-------+-------+

>>> foo = facet(table1, 'foo')
>>> foo.keys()
['a', 'c', 'b', 'd']
>>> look(foo['a'])
+-------+-------+-------+
| 'foo' | 'bar' | 'baz' |
+=======+=======+=======+
| 'a'   | 4     | 9.3   |
+-------+-------+-------+
| 'a'   | 2     | 88.2  |
+-------+-------+-------+

>>> look(foo['c'])
+-------+-------+-------+
| 'foo' | 'bar' | 'baz' |
+=======+=======+=======+
| 'c'   | 8     | 42.0  |
+-------+-------+-------+
| 'c'   | 2     |       |
+-------+-------+-------+

See also facetcolumns().

petl.rangefacet(table, field, width, minv=None, maxv=None, presorted=False, buffersize=None, tempdir=None, cache=True)

Return a dictionary mapping ranges to tables. E.g.:

>>> from petl import rangefacet, look
>>> look(table1)
+-------+-------+
| 'foo' | 'bar' |
+=======+=======+
| 'a'   | 3     |
+-------+-------+
| 'a'   | 7     |
+-------+-------+
| 'b'   | 2     |
+-------+-------+
| 'b'   | 1     |
+-------+-------+
| 'b'   | 9     |
+-------+-------+
| 'c'   | 4     |
+-------+-------+
| 'd'   | 3     |
+-------+-------+

>>> rf = rangefacet(table1, 'bar', 2)
>>> rf.keys()
[(1, 3), (3, 5), (5, 7), (7, 9)]
>>> look(rf[(1, 3)])
+-------+-------+
| 'foo' | 'bar' |
+=======+=======+
| 'b'   | 2     |
+-------+-------+
| 'b'   | 1     |
+-------+-------+

>>> look(rf[(7, 9)])
+-------+-------+
| 'foo' | 'bar' |
+=======+=======+
| 'a'   | 7     |
+-------+-------+
| 'b'   | 9     |
+-------+-------+

Note that the last bin includes both edges.

Regular expressions

petl.search(table, *args, **kwargs)

Perform a regular expression search, returning rows that match a given pattern, either anywhere in the row or within a specific field. E.g.:

>>> from petl import search, look
>>> look(table1)
+------------+-------+--------------------------+
| 'foo'      | 'bar' | 'baz'                    |
+============+=======+==========================+
| 'orange'   | 12    | 'oranges are nice fruit' |
+------------+-------+--------------------------+
| 'mango'    | 42    | 'I like them'            |
+------------+-------+--------------------------+
| 'banana'   | 74    | 'lovely too'             |
+------------+-------+--------------------------+
| 'cucumber' | 41    | 'better than mango'      |
+------------+-------+--------------------------+

>>> # search any field
... table2 = search(table1, '.g.')
>>> look(table2)
+------------+-------+--------------------------+
| 'foo'      | 'bar' | 'baz'                    |
+============+=======+==========================+
| 'orange'   | 12    | 'oranges are nice fruit' |
+------------+-------+--------------------------+
| 'mango'    | 42    | 'I like them'            |
+------------+-------+--------------------------+
| 'cucumber' | 41    | 'better than mango'      |
+------------+-------+--------------------------+

>>> # search a specific field
... table3 = search(table1, 'foo', '.g.')
>>> look(table3)
+----------+-------+--------------------------+
| 'foo'    | 'bar' | 'baz'                    |
+==========+=======+==========================+
| 'orange' | 12    | 'oranges are nice fruit' |
+----------+-------+--------------------------+
| 'mango'  | 42    | 'I like them'            |
+----------+-------+--------------------------+

The complement of search() (i.e., the rows not found via search()) can be found via searchcomplement()

New in version 0.10.

petl.sub(table, field, pattern, repl, count=0, flags=0)

Convenience function to convert values under the given field using a regular expression substitution. See also re.sub().

New in version 0.5.

Changed in version 0.10.

Renamed ‘resub’ to ‘sub’.

petl.split(table, field, pattern, newfields=None, include_original=False, maxsplit=0, flags=0)

Add one or more new fields with values generated by splitting an existing value around occurrences of a regular expression. E.g.:

>>> from petl import split, look
>>> look(table1)
+------+------------+---------+
| 'id' | 'variable' | 'value' |
+======+============+=========+
| '1'  | 'parad1'   | '12'    |
+------+------------+---------+
| '2'  | 'parad2'   | '15'    |
+------+------------+---------+
| '3'  | 'tempd1'   | '18'    |
+------+------------+---------+
| '4'  | 'tempd2'   | '19'    |
+------+------------+---------+

>>> table2 = split(table1, 'variable', 'd', ['variable', 'day'])
>>> look(table2)
+------+---------+------------+-------+
| 'id' | 'value' | 'variable' | 'day' |
+======+=========+============+=======+
| '1'  | '12'    | 'para'     | '1'   |
+------+---------+------------+-------+
| '2'  | '15'    | 'para'     | '2'   |
+------+---------+------------+-------+
| '3'  | '18'    | 'temp'     | '1'   |
+------+---------+------------+-------+
| '4'  | '19'    | 'temp'     | '2'   |
+------+---------+------------+-------+

See also re.split().

petl.capture(table, field, pattern, newfields=None, include_original=False, flags=0, fill=None)

Add one or more new fields with values captured from an existing field searched via a regular expression. E.g.:

>>> from petl import capture, look
>>> look(table1)
+------+------------+---------+
| 'id' | 'variable' | 'value' |
+======+============+=========+
| '1'  | 'A1'       | '12'    |
+------+------------+---------+
| '2'  | 'A2'       | '15'    |
+------+------------+---------+
| '3'  | 'B1'       | '18'    |
+------+------------+---------+
| '4'  | 'C12'      | '19'    |
+------+------------+---------+

>>> table2 = capture(table1, 'variable', '(\w)(\d+)', ['treat', 'time'])
>>> look(table2)
+------+---------+---------+--------+
| 'id' | 'value' | 'treat' | 'time' |
+======+=========+=========+========+
| '1'  | '12'    | 'A'     | '1'    |
+------+---------+---------+--------+
| '2'  | '15'    | 'A'     | '2'    |
+------+---------+---------+--------+
| '3'  | '18'    | 'B'     | '1'    |
+------+---------+---------+--------+
| '4'  | '19'    | 'C'     | '12'   |
+------+---------+---------+--------+

>>> # using the include_original argument
... table3 = capture(table1, 'variable', '(\w)(\d+)', ['treat', 'time'], include_original=True)
>>> look(table3)
+------+------------+---------+---------+--------+
| 'id' | 'variable' | 'value' | 'treat' | 'time' |
+======+============+=========+=========+========+
| '1'  | 'A1'       | '12'    | 'A'     | '1'    |
+------+------------+---------+---------+--------+
| '2'  | 'A2'       | '15'    | 'A'     | '2'    |
+------+------------+---------+---------+--------+
| '3'  | 'B1'       | '18'    | 'B'     | '1'    |
+------+------------+---------+---------+--------+
| '4'  | 'C12'      | '19'    | 'C'     | '12'   |
+------+------------+---------+---------+--------+

By default the field on which the capture is performed is omitted. It can be included using the include_original argument.

See also split(), re.search().

Changed in version 0.18.

The fill parameter can be used to provide a list or tuple of values to use if the regular expression does not match. The fill parameter should contain as many values as there are capturing groups in the regular expression. If fill is None (default) then a petl.transform.TransformError will be raised on the first non-matching value.

Deduplicating rows

petl.duplicates(table, key=None, presorted=False, buffersize=None, tempdir=None, cache=True)

Select rows with duplicate values under a given key (or duplicate rows where no key is given). E.g.:

>>> from petl import duplicates, look    
>>> look(table1)
+-------+-------+-------+
| 'foo' | 'bar' | 'baz' |
+=======+=======+=======+
| 'A'   | 1     | 2.0   |
+-------+-------+-------+
| 'B'   | 2     | 3.4   |
+-------+-------+-------+
| 'D'   | 6     | 9.3   |
+-------+-------+-------+
| 'B'   | 3     | 7.8   |
+-------+-------+-------+
| 'B'   | 2     | 12.3  |
+-------+-------+-------+
| 'E'   | None  | 1.3   |
+-------+-------+-------+
| 'D'   | 4     | 14.5  |
+-------+-------+-------+

>>> table2 = duplicates(table1, 'foo')
>>> look(table2)
+-------+-------+-------+
| 'foo' | 'bar' | 'baz' |
+=======+=======+=======+
| 'B'   | 2     | 3.4   |
+-------+-------+-------+
| 'B'   | 3     | 7.8   |
+-------+-------+-------+
| 'B'   | 2     | 12.3  |
+-------+-------+-------+
| 'D'   | 6     | 9.3   |
+-------+-------+-------+
| 'D'   | 4     | 14.5  |
+-------+-------+-------+

>>> # compound keys are supported
... table3 = duplicates(table1, key=['foo', 'bar'])
>>> look(table3)
+-------+-------+-------+
| 'foo' | 'bar' | 'baz' |
+=======+=======+=======+
| 'B'   | 2     | 3.4   |
+-------+-------+-------+
| 'B'   | 2     | 12.3  |
+-------+-------+-------+

If presorted is True, it is assumed that the data are already sorted by the given key, and the buffersize, tempdir and cache arguments are ignored. Otherwise, the data are sorted, see also the discussion of the buffersize, tempdir and cache arguments under the sort() function.

See also unique() and distinct().

petl.unique(table, key=None, presorted=False, buffersize=None, tempdir=None, cache=True)

Select rows with unique values under a given key (or unique rows if no key is given). E.g.:

>>> from petl import unique, look
>>> look(table1)
+-------+-------+--------+
| 'foo' | 'bar' | 'baz'  |
+=======+=======+========+
| 'A'   | 1     | 2      |
+-------+-------+--------+
| 'B'   | '2'   | '3.4'  |
+-------+-------+--------+
| 'D'   | 'xyz' | 9.0    |
+-------+-------+--------+
| 'B'   | u'3'  | u'7.8' |
+-------+-------+--------+
| 'B'   | '2'   | 42     |
+-------+-------+--------+
| 'E'   | None  | None   |
+-------+-------+--------+
| 'D'   | 4     | 12.3   |
+-------+-------+--------+
| 'F'   | 7     | 2.3    |
+-------+-------+--------+

>>> table2 = unique(table1, 'foo')
>>> look(table2)
+-------+-------+-------+
| 'foo' | 'bar' | 'baz' |
+=======+=======+=======+
| 'A'   | 1     | 2     |
+-------+-------+-------+
| 'E'   | None  | None  |
+-------+-------+-------+
| 'F'   | 7     | 2.3   |
+-------+-------+-------+

If presorted is True, it is assumed that the data are already sorted by the given key, and the buffersize, tempdir and cache arguments are ignored. Otherwise, the data are sorted, see also the discussion of the buffersize, tempdir and cache arguments under the sort() function.

New in version 0.10.

See also duplicates() and distinct().

petl.conflicts(table, key, missing=None, include=None, exclude=None, presorted=False, buffersize=None, tempdir=None, cache=True)

Select rows with the same key value but differing in some other field. E.g.:

>>> from petl import conflicts, look    
>>> look(table1)
+-------+-------+-------+
| 'foo' | 'bar' | 'baz' |
+=======+=======+=======+
| 'A'   | 1     | 2.7   |
+-------+-------+-------+
| 'B'   | 2     | None  |
+-------+-------+-------+
| 'D'   | 3     | 9.4   |
+-------+-------+-------+
| 'B'   | None  | 7.8   |
+-------+-------+-------+
| 'E'   | None  |       |
+-------+-------+-------+
| 'D'   | 3     | 12.3  |
+-------+-------+-------+
| 'A'   | 2     | None  |
+-------+-------+-------+

>>> table2 = conflicts(table1, 'foo')
>>> look(table2)
+-------+-------+-------+
| 'foo' | 'bar' | 'baz' |
+=======+=======+=======+
| 'A'   | 1     | 2.7   |
+-------+-------+-------+
| 'A'   | 2     | None  |
+-------+-------+-------+
| 'D'   | 3     | 9.4   |
+-------+-------+-------+
| 'D'   | 3     | 12.3  |
+-------+-------+-------+

Missing values are not considered conflicts. By default, None is treated as the missing value, this can be changed via the missing keyword argument.

One or more fields can be ignored when determining conflicts by providing the exclude keyword argument. Alternatively, fields to use when determining conflicts can be specified explicitly with the include keyword argument.

If presorted is True, it is assumed that the data are already sorted by the given key, and the buffersize, tempdir and cache arguments are ignored. Otherwise, the data are sorted, see also the discussion of the buffersize, tempdir and cache arguments under the sort() function.

Changed in version 0.8.

Added the include and exclude keyword arguments. The exclude keyword argument replaces the ignore keyword argument in previous versions.

petl.distinct(table, count=None, presorted=False, buffersize=None, tempdir=None, cache=True)

Return only distinct rows in the table. See also duplicates() and unique().

New in version 0.12.

Changed in version 0.25.

The count keyword argument has been added. If this argument is not None, it will be used as the name for an additional field, and the values of the field will be the number of duplicate rows.

Unpacking compound values

petl.unpack(table, field, newfields=None, include_original=False, missing=None)

Unpack data values that are lists or tuples. E.g.:

>>> from petl import unpack, look
>>> look(table1)
+-------+------------+
| 'foo' | 'bar'      |
+=======+============+
| 1     | ['a', 'b'] |
+-------+------------+
| 2     | ['c', 'd'] |
+-------+------------+
| 3     | ['e', 'f'] |
+-------+------------+

>>> table2 = unpack(table1, 'bar', ['baz', 'quux'])
>>> look(table2)
+-------+-------+--------+
| 'foo' | 'baz' | 'quux' |
+=======+=======+========+
| 1     | 'a'   | 'b'    |
+-------+-------+--------+
| 2     | 'c'   | 'd'    |
+-------+-------+--------+
| 3     | 'e'   | 'f'    |
+-------+-------+--------+

>>> table3 = unpack(table1, 'bar', 2)
>>> look(table3)
+-------+--------+--------+
| 'foo' | 'bar1' | 'bar2' |
+=======+========+========+
| 1     | 'a'    | 'b'    |
+-------+--------+--------+
| 2     | 'c'    | 'd'    |
+-------+--------+--------+
| 3     | 'e'    | 'f'    |
+-------+--------+--------+

See also unpackdict().

Changed in version 0.23.

This function will attempt to unpack exactly the number of values as given by the number of new fields specified. If there are more values than new fields, remaining values will not be unpacked. If there are less values than new fields, missing values will be added.

petl.unpackdict(table, field, keys=None, includeoriginal=False, samplesize=1000, missing=None)

Unpack dictionary values into separate fields. E.g.:

>>> from petl import unpackdict, look
>>> look(table1)
+-------+---------------------------+
| 'foo' | 'bar'                     |
+=======+===========================+
| 1     | {'quux': 'b', 'baz': 'a'} |
+-------+---------------------------+
| 2     | {'quux': 'd', 'baz': 'c'} |
+-------+---------------------------+
| 3     | {'quux': 'f', 'baz': 'e'} |
+-------+---------------------------+

>>> table2 = unpackdict(table1, 'bar')
>>> look(table2)
+-------+-------+--------+
| 'foo' | 'baz' | 'quux' |
+=======+=======+========+
| 1     | 'a'   | 'b'    |
+-------+-------+--------+
| 2     | 'c'   | 'd'    |
+-------+-------+--------+
| 3     | 'e'   | 'f'    |
+-------+-------+--------+

New in version 0.10.

Transforming rows

petl.fieldmap(table, mappings=None, failonerror=False, errorvalue=None)

Transform a table, mapping fields arbitrarily between input and output. E.g.:

>>> from petl import fieldmap, look
>>> look(table1)
+------+----------+-------+----------+----------+
| 'id' | 'sex'    | 'age' | 'height' | 'weight' |
+======+==========+=======+==========+==========+
| 1    | 'male'   | 16    | 1.45     | 62.0     |
+------+----------+-------+----------+----------+
| 2    | 'female' | 19    | 1.34     | 55.4     |
+------+----------+-------+----------+----------+
| 3    | 'female' | 17    | 1.78     | 74.4     |
+------+----------+-------+----------+----------+
| 4    | 'male'   | 21    | 1.33     | 45.2     |
+------+----------+-------+----------+----------+
| 5    | '-'      | 25    | 1.65     | 51.9     |
+------+----------+-------+----------+----------+

>>> from collections import OrderedDict
>>> mappings = OrderedDict()
>>> # rename a field
... mappings['subject_id'] = 'id'
>>> # translate a field
... mappings['gender'] = 'sex', {'male': 'M', 'female': 'F'}
>>> # apply a calculation to a field
... mappings['age_months'] = 'age', lambda v: v * 12
>>> # apply a calculation to a combination of fields
... mappings['bmi'] = lambda rec: rec['weight'] / rec['height']**2
>>> # transform and inspect the output
... table2 = fieldmap(table1, mappings)
>>> look(table2)
+--------------+----------+--------------+--------------------+
| 'subject_id' | 'gender' | 'age_months' | 'bmi'              |
+==============+==========+==============+====================+
| 1            | 'M'      | 192          | 29.48870392390012  |
+--------------+----------+--------------+--------------------+
| 2            | 'F'      | 228          | 30.8531967030519   |
+--------------+----------+--------------+--------------------+
| 3            | 'F'      | 204          | 23.481883600555488 |
+--------------+----------+--------------+--------------------+
| 4            | 'M'      | 252          | 25.55260331279326  |
+--------------+----------+--------------+--------------------+
| 5            | '-'      | 300          | 19.0633608815427   |
+--------------+----------+--------------+--------------------+

>>> # field mappings can also be added and/or updated after the table is created
... # via the suffix notation
... table3 = fieldmap(table1)
>>> table3['subject_id'] = 'id'
>>> table3['gender'] = 'sex', {'male': 'M', 'female': 'F'}
>>> table3['age_months'] = 'age', lambda v: v * 12
>>> # use an expression string this time
... table3['bmi'] = '{weight} / {height}**2'
>>> look(table3)
+--------------+----------+--------------+--------------------+
| 'subject_id' | 'gender' | 'age_months' | 'bmi'              |
+==============+==========+==============+====================+
| 1            | 'M'      | 192          | 29.48870392390012  |
+--------------+----------+--------------+--------------------+
| 2            | 'F'      | 228          | 30.8531967030519   |
+--------------+----------+--------------+--------------------+
| 3            | 'F'      | 204          | 23.481883600555488 |
+--------------+----------+--------------+--------------------+
| 4            | 'M'      | 252          | 25.55260331279326  |
+--------------+----------+--------------+--------------------+
| 5            | '-'      | 300          | 19.0633608815427   |
+--------------+----------+--------------+--------------------+

Note also that the mapping value can be an expression string, which will be converted to a lambda function via expr().

petl.rowmap(table, rowmapper, fields, failonerror=False, missing=None)

Transform rows via an arbitrary function. E.g.:

>>> from petl import rowmap, look
>>> look(table1)
+------+----------+-------+----------+----------+
| 'id' | 'sex'    | 'age' | 'height' | 'weight' |
+======+==========+=======+==========+==========+
| 1    | 'male'   | 16    | 1.45     | 62.0     |
+------+----------+-------+----------+----------+
| 2    | 'female' | 19    | 1.34     | 55.4     |
+------+----------+-------+----------+----------+
| 3    | 'female' | 17    | 1.78     | 74.4     |
+------+----------+-------+----------+----------+
| 4    | 'male'   | 21    | 1.33     | 45.2     |
+------+----------+-------+----------+----------+
| 5    | '-'      | 25    | 1.65     | 51.9     |
+------+----------+-------+----------+----------+

>>> def rowmapper(row):
...     transmf = {'male': 'M', 'female': 'F'}
...     return [row[0],
...             transmf[row[1]] if row[1] in transmf else row[1],
...             row[2] * 12,
...             row[4] / row[3] ** 2]
...
>>> table2 = rowmap(table1, rowmapper, fields=['subject_id', 'gender', 'age_months', 'bmi'])
>>> look(table2)
+--------------+----------+--------------+--------------------+
| 'subject_id' | 'gender' | 'age_months' | 'bmi'              |
+==============+==========+==============+====================+
| 1            | 'M'      | 192          | 29.48870392390012  |
+--------------+----------+--------------+--------------------+
| 2            | 'F'      | 228          | 30.8531967030519   |
+--------------+----------+--------------+--------------------+
| 3            | 'F'      | 204          | 23.481883600555488 |
+--------------+----------+--------------+--------------------+
| 4            | 'M'      | 252          | 25.55260331279326  |
+--------------+----------+--------------+--------------------+
| 5            | '-'      | 300          | 19.0633608815427   |
+--------------+----------+--------------+--------------------+

The rowmapper function should return a single row (list or tuple).

Changed in version 0.9.

Hybrid row objects supporting data value access by either position or by field name are now passed to the rowmapper function.

petl.recordmap(table, recmapper, fields, failonerror=False)

Transform records via an arbitrary function.

Deprecated since version 0.9.

Use rowmap() insteand.

petl.rowmapmany(table, rowgenerator, fields, failonerror=False, missing=None)

Map each input row to any number of output rows via an arbitrary function. E.g.:

>>> from petl import rowmapmany, look
>>> look(table1)
+------+----------+-------+----------+----------+
| 'id' | 'sex'    | 'age' | 'height' | 'weight' |
+======+==========+=======+==========+==========+
| 1    | 'male'   | 16    | 1.45     | 62.0     |
+------+----------+-------+----------+----------+
| 2    | 'female' | 19    | 1.34     | 55.4     |
+------+----------+-------+----------+----------+
| 3    | '-'      | 17    | 1.78     | 74.4     |
+------+----------+-------+----------+----------+
| 4    | 'male'   | 21    | 1.33     |          |
+------+----------+-------+----------+----------+

>>> def rowgenerator(row):
...     transmf = {'male': 'M', 'female': 'F'}
...     yield [row[0], 'gender', transmf[row[1]] if row[1] in transmf else row[1]]
...     yield [row[0], 'age_months', row[2] * 12]
...     yield [row[0], 'bmi', row[4] / row[3] ** 2]
...
>>> table2 = rowmapmany(table1, rowgenerator, fields=['subject_id', 'variable', 'value'])
>>> look(table2)
+--------------+--------------+--------------------+
| 'subject_id' | 'variable'   | 'value'            |
+==============+==============+====================+
| 1            | 'gender'     | 'M'                |
+--------------+--------------+--------------------+
| 1            | 'age_months' | 192                |
+--------------+--------------+--------------------+
| 1            | 'bmi'        | 29.48870392390012  |
+--------------+--------------+--------------------+
| 2            | 'gender'     | 'F'                |
+--------------+--------------+--------------------+
| 2            | 'age_months' | 228                |
+--------------+--------------+--------------------+
| 2            | 'bmi'        | 30.8531967030519   |
+--------------+--------------+--------------------+
| 3            | 'gender'     | '-'                |
+--------------+--------------+--------------------+
| 3            | 'age_months' | 204                |
+--------------+--------------+--------------------+
| 3            | 'bmi'        | 23.481883600555488 |
+--------------+--------------+--------------------+
| 4            | 'gender'     | 'M'                |
+--------------+--------------+--------------------+

The rowgenerator function should yield zero or more rows (lists or tuples).

See also the melt() function.

Changed in version 0.9.

Hybrid row objects supporting data value access by either position or by field name are now passed to the rowgenerator function.

petl.recordmapmany(table, rowgenerator, fields, failonerror=False)

Map each input row (as a record) to any number of output rows via an arbitrary function.

Deprecated since version 0.9.

Use rowmapmany() instead.

petl.rowgroupmap(table, key, mapper, fields=None, missing=None, presorted=False, buffersize=None, tempdir=None, cache=True)

Group rows under the given key then apply mapper to yield zero or more output rows for each input group of rows.

New in version 0.12.

Sorting

petl.sort(table, key=None, reverse=False, buffersize=None, tempdir=None, cache=True)

Sort the table. Field names or indices (from zero) can be used to specify the key. E.g.:

>>> from petl import sort, look
>>> look(table1)
+-------+-------+
| 'foo' | 'bar' |
+=======+=======+
| 'C'   | 2     |
+-------+-------+
| 'A'   | 9     |
+-------+-------+
| 'A'   | 6     |
+-------+-------+
| 'F'   | 1     |
+-------+-------+
| 'D'   | 10    |
+-------+-------+

>>> table2 = sort(table1, 'foo')
>>> look(table2)
+-------+-------+
| 'foo' | 'bar' |
+=======+=======+
| 'A'   | 9     |
+-------+-------+
| 'A'   | 6     |
+-------+-------+
| 'C'   | 2     |
+-------+-------+
| 'D'   | 10    |
+-------+-------+
| 'F'   | 1     |
+-------+-------+

>>> # sorting by compound key is supported
... table3 = sort(table1, key=['foo', 'bar'])
>>> look(table3)
+-------+-------+
| 'foo' | 'bar' |
+=======+=======+
| 'A'   | 6     |
+-------+-------+
| 'A'   | 9     |
+-------+-------+
| 'C'   | 2     |
+-------+-------+
| 'D'   | 10    |
+-------+-------+
| 'F'   | 1     |
+-------+-------+

>>> # if no key is specified, the default is a lexical sort
... table4 = sort(table1)
>>> look(table4)
+-------+-------+
| 'foo' | 'bar' |
+=======+=======+
| 'A'   | 6     |
+-------+-------+
| 'A'   | 9     |
+-------+-------+
| 'C'   | 2     |
+-------+-------+
| 'D'   | 10    |
+-------+-------+
| 'F'   | 1     |
+-------+-------+

The buffersize argument should be an int or None.

If the number of rows in the table is less than buffersize, the table will be sorted in memory. Otherwise, the table is sorted in chunks of no more than buffersize rows, each chunk is written to a temporary file, and then a merge sort is performed on the temporary files.

If buffersize is None, the value of petl.transform.sorts.defaultbuffersize will be used. By default this is set to 100000 rows, but can be changed, e.g.:

>>> import petl.transform.sorts
>>> petl.transform.sorts.defaultbuffersize = 500000

If petl.transform.sorts.defaultbuffersize is set to None, this forces all sorting to be done entirely in memory.

Changed in version 0.16.

By default the results of the sort will be cached, and so a second pass over the sorted table will yield rows from the cache and will not repeat the sort operation. To turn off caching, set the cache argument to False.

petl.mergesort(*tables, **kwargs)

Combine multiple input tables into one sorted output table. E.g.:

>>> from petl import mergesort, look
>>> look(table1)
+-------+-------+
| 'foo' | 'bar' |
+=======+=======+
| 'A'   | 9     |
+-------+-------+
| 'C'   | 2     |
+-------+-------+
| 'D'   | 10    |
+-------+-------+
| 'A'   | 6     |
+-------+-------+
| 'F'   | 1     |
+-------+-------+

>>> look(table2)
+-------+-------+
| 'foo' | 'bar' |
+=======+=======+
| 'B'   | 3     |
+-------+-------+
| 'D'   | 10    |
+-------+-------+
| 'A'   | 10    |
+-------+-------+
| 'F'   | 4     |
+-------+-------+

>>> table3 = mergesort(table1, table2, key='foo')
>>> look(table3)
+-------+-------+
| 'foo' | 'bar' |
+=======+=======+
| 'A'   | 9     |
+-------+-------+
| 'A'   | 6     |
+-------+-------+
| 'A'   | 10    |
+-------+-------+
| 'B'   | 3     |
+-------+-------+
| 'C'   | 2     |
+-------+-------+
| 'D'   | 10    |
+-------+-------+
| 'D'   | 10    |
+-------+-------+
| 'F'   | 1     |
+-------+-------+
| 'F'   | 4     |
+-------+-------+

If the input tables are already sorted by the given key, give presorted=True as a keyword argument.

This function is equivalent to concatenating the input tables using cat() then sorting, however this function will typically be more efficient, especially if the input tables are presorted.

Keyword arguments:

  • key - field name or tuple of fields to sort by (defaults to None - lexical sort)
  • reverse - True if sort in reverse (descending) order (defaults to False)
  • presorted - True if inputs are already sorted by the given key (defaults to False)
  • missing - value to fill with when input tables have different fields (defaults to None)
  • header - specify a fixed header for the output table
  • buffersize - limit the number of rows in memory per input table when inputs are not presorted

New in version 0.9.

Joins

petl.join(left, right, key=None, lkey=None, rkey=None, presorted=False, buffersize=None, tempdir=None, cache=True, lprefix=None, rprefix=None)

Perform an equi-join on the given tables. E.g.:

>>> from petl import join, look
>>> look(table1)
+------+----------+
| 'id' | 'colour' |
+======+==========+
| 1    | 'blue'   |
+------+----------+
| 2    | 'red'    |
+------+----------+
| 3    | 'purple' |
+------+----------+

>>> look(table2)
+------+-----------+
| 'id' | 'shape'   |
+======+===========+
| 1    | 'circle'  |
+------+-----------+
| 3    | 'square'  |
+------+-----------+
| 4    | 'ellipse' |
+------+-----------+

>>> table3 = join(table1, table2, key='id')
>>> look(table3)
+------+----------+----------+
| 'id' | 'colour' | 'shape'  |
+======+==========+==========+
| 1    | 'blue'   | 'circle' |
+------+----------+----------+
| 3    | 'purple' | 'square' |
+------+----------+----------+

>>>  # if no key is given, a natural join is tried
... table4 = join(table1, table2)
>>> look(table4)
+------+----------+----------+
| 'id' | 'colour' | 'shape'  |
+======+==========+==========+
| 1    | 'blue'   | 'circle' |
+------+----------+----------+
| 3    | 'purple' | 'square' |
+------+----------+----------+

>>>  # note behaviour if the key is not unique in either or both tables
... look(table5)
+------+----------+
| 'id' | 'colour' |
+======+==========+
| 1    | 'blue'   |
+------+----------+
| 1    | 'red'    |
+------+----------+
| 2    | 'purple' |
+------+----------+

>>> look(table6)
+------+-----------+
| 'id' | 'shape'   |
+======+===========+
| 1    | 'circle'  |
+------+-----------+
| 1    | 'square'  |
+------+-----------+
| 2    | 'ellipse' |
+------+-----------+

>>> table7 = join(table5, table6, key='id')
>>> look(table7)
+------+----------+-----------+
| 'id' | 'colour' | 'shape'   |
+======+==========+===========+
| 1    | 'blue'   | 'circle'  |
+------+----------+-----------+
| 1    | 'blue'   | 'square'  |
+------+----------+-----------+
| 1    | 'red'    | 'circle'  |
+------+----------+-----------+
| 1    | 'red'    | 'square'  |
+------+----------+-----------+
| 2    | 'purple' | 'ellipse' |
+------+----------+-----------+

>>>  # compound keys are supported
... look(table8)
+------+--------+----------+
| 'id' | 'time' | 'height' |
+======+========+==========+
| 1    | 1      | 12.3     |
+------+--------+----------+
| 1    | 2      | 34.5     |
+------+--------+----------+
| 2    | 1      | 56.7     |
+------+--------+----------+

>>> look(table9)
+------+--------+----------+
| 'id' | 'time' | 'weight' |
+======+========+==========+
| 1    | 2      | 4.5      |
+------+--------+----------+
| 2    | 1      | 6.7      |
+------+--------+----------+
| 2    | 2      | 8.9      |
+------+--------+----------+

>>> table10 = join(table8, table9, key=['id', 'time'])
>>> look(table10)
+------+--------+----------+----------+
| 'id' | 'time' | 'height' | 'weight' |
+======+========+==========+==========+
| 1    | 2      | 34.5     | 4.5      |
+------+--------+----------+----------+
| 2    | 1      | 56.7     | 6.7      |
+------+--------+----------+----------+

If presorted is True, it is assumed that the data are already sorted by the given key, and the buffersize, tempdir and cache arguments are ignored. Otherwise, the data are sorted, see also the discussion of the buffersize, tempdir and cache arguments under the sort() function.

Changed in version 0.24.

Added support for left and right tables with different key fields via the lkey and rkey arguments.

petl.leftjoin(left, right, key=None, lkey=None, rkey=None, missing=None, presorted=False, buffersize=None, tempdir=None, cache=True, lprefix=None, rprefix=None)

Perform a left outer join on the given tables. E.g.:

>>> from petl import leftjoin, look
>>> look(table1)
+------+----------+
| 'id' | 'colour' |
+======+==========+
| 1    | 'blue'   |
+------+----------+
| 2    | 'red'    |
+------+----------+
| 3    | 'purple' |
+------+----------+

>>> look(table2)
+------+-----------+
| 'id' | 'shape'   |
+======+===========+
| 1    | 'circle'  |
+------+-----------+
| 3    | 'square'  |
+------+-----------+
| 4    | 'ellipse' |
+------+-----------+

>>> table3 = leftjoin(table1, table2, key='id')
>>> look(table3)
+------+----------+----------+
| 'id' | 'colour' | 'shape'  |
+======+==========+==========+
| 1    | 'blue'   | 'circle' |
+------+----------+----------+
| 2    | 'red'    | None     |
+------+----------+----------+
| 3    | 'purple' | 'square' |
+------+----------+----------+

If presorted is True, it is assumed that the data are already sorted by the given key, and the buffersize, tempdir and cache arguments are ignored. Otherwise, the data are sorted, see also the discussion of the buffersize, tempdir and cache arguments under the sort() function.

Changed in version 0.24.

Added support for left and right tables with different key fields via the lkey and rkey arguments.

petl.lookupjoin(left, right, key=None, lkey=None, rkey=None, missing=None, presorted=False, buffersize=None, tempdir=None, cache=True, lprefix=None, rprefix=None)

Perform a left join, but where the key is not unique in the right-hand table, arbitrarily choose the first row and ignore others. E.g.:

>>> from petl import lookupjoin, look
>>> look(table1)
+------+----------+--------+
| 'id' | 'color'  | 'cost' |
+======+==========+========+
| 1    | 'blue'   | 12     |
+------+----------+--------+
| 2    | 'red'    | 8      |
+------+----------+--------+
| 3    | 'purple' | 4      |
+------+----------+--------+

>>> look(table2)
+------+-----------+---------+
| 'id' | 'shape'   | 'size'  |
+======+===========+=========+
| 1    | 'circle'  | 'big'   |
+------+-----------+---------+
| 1    | 'circle'  | 'small' |
+------+-----------+---------+
| 2    | 'square'  | 'tiny'  |
+------+-----------+---------+
| 2    | 'square'  | 'big'   |
+------+-----------+---------+
| 3    | 'ellipse' | 'small' |
+------+-----------+---------+
| 3    | 'ellipse' | 'tiny'  |
+------+-----------+---------+

>>> table3 = lookupjoin(table1, table2, key='id')
>>> look(table3)
+------+----------+--------+-----------+---------+
| 'id' | 'color'  | 'cost' | 'shape'   | 'size'  |
+======+==========+========+===========+=========+
| 1    | 'blue'   | 12     | 'circle'  | 'big'   |
+------+----------+--------+-----------+---------+
| 2    | 'red'    | 8      | 'square'  | 'tiny'  |
+------+----------+--------+-----------+---------+
| 3    | 'purple' | 4      | 'ellipse' | 'small' |
+------+----------+--------+-----------+---------+

See also leftjoin().

New in version 0.11.

Changed in version 0.24.

Added support for left and right tables with different key fields via the lkey and rkey arguments.

petl.rightjoin(left, right, key=None, lkey=None, rkey=None, missing=None, presorted=False, buffersize=None, tempdir=None, cache=True, lprefix=None, rprefix=None)

Perform a right outer join on the given tables. E.g.:

>>> from petl import rightjoin, look
>>> look(table1)
+------+----------+
| 'id' | 'colour' |
+======+==========+
| 1    | 'blue'   |
+------+----------+
| 2    | 'red'    |
+------+----------+
| 3    | 'purple' |
+------+----------+

>>> look(table2)
+------+-----------+
| 'id' | 'shape'   |
+======+===========+
| 1    | 'circle'  |
+------+-----------+
| 3    | 'square'  |
+------+-----------+
| 4    | 'ellipse' |
+------+-----------+

>>> table3 = rightjoin(table1, table2, key='id')
>>> look(table3)
+------+----------+-----------+
| 'id' | 'colour' | 'shape'   |
+======+==========+===========+
| 1    | 'blue'   | 'circle'  |
+------+----------+-----------+
| 3    | 'purple' | 'square'  |
+------+----------+-----------+
| 4    | None     | 'ellipse' |
+------+----------+-----------+

If presorted is True, it is assumed that the data are already sorted by the given key, and the buffersize, tempdir and cache arguments are ignored. Otherwise, the data are sorted, see also the discussion of the buffersize, tempdir and cache arguments under the sort() function.

Changed in version 0.24.

Added support for left and right tables with different key fields via the lkey and rkey arguments.

petl.outerjoin(left, right, key=None, lkey=None, rkey=None, missing=None, presorted=False, buffersize=None, tempdir=None, cache=True, lprefix=None, rprefix=None)

Perform a full outer join on the given tables. E.g.:

>>> from petl import outerjoin, look
>>> look(table1)
+------+----------+
| 'id' | 'colour' |
+======+==========+
| 1    | 'blue'   |
+------+----------+
| 2    | 'red'    |
+------+----------+
| 3    | 'purple' |
+------+----------+

>>> look(table2)
+------+-----------+
| 'id' | 'shape'   |
+======+===========+
| 1    | 'circle'  |
+------+-----------+
| 3    | 'square'  |
+------+-----------+
| 4    | 'ellipse' |
+------+-----------+

>>> table3 = outerjoin(table1, table2, key='id')
>>> look(table3)
+------+----------+-----------+
| 'id' | 'colour' | 'shape'   |
+======+==========+===========+
| 1    | 'blue'   | 'circle'  |
+------+----------+-----------+
| 2    | 'red'    | None      |
+------+----------+-----------+
| 3    | 'purple' | 'square'  |
+------+----------+-----------+
| 4    | None     | 'ellipse' |
+------+----------+-----------+

If presorted is True, it is assumed that the data are already sorted by the given key, and the buffersize, tempdir and cache arguments are ignored. Otherwise, the data are sorted, see also the discussion of the buffersize, tempdir and cache arguments under the sort() function.

Changed in version 0.24.

Added support for left and right tables with different key fields via the lkey and rkey arguments.

petl.crossjoin(*tables, **kwargs)

Form the cartesian product of the given tables. E.g.:

>>> from petl import crossjoin, look
>>> look(table1)
+------+----------+
| 'id' | 'colour' |
+======+==========+
| 1    | 'blue'   |
+------+----------+
| 2    | 'red'    |
+------+----------+

>>> look(table2)
+------+----------+
| 'id' | 'shape'  |
+======+==========+
| 1    | 'circle' |
+------+----------+
| 3    | 'square' |
+------+----------+

>>> table3 = crossjoin(table1, table2)
>>> look(table3)
+------+----------+------+----------+
| 'id' | 'colour' | 'id' | 'shape'  |
+======+==========+======+==========+
| 1    | 'blue'   | 1    | 'circle' |
+------+----------+------+----------+
| 1    | 'blue'   | 3    | 'square' |
+------+----------+------+----------+
| 2    | 'red'    | 1    | 'circle' |
+------+----------+------+----------+
| 2    | 'red'    | 3    | 'square' |
+------+----------+------+----------+

See also join(), leftjoin(), rightjoint(), outerjoin().

petl.antijoin(left, right, key=None, lkey=None, rkey=None, presorted=False, buffersize=None, tempdir=None, cache=True)

Return rows from the left table where the key value does not occur in the right table. E.g.:

>>> from petl import antijoin, look
>>> look(table1)
+------+----------+
| 'id' | 'colour' |
+======+==========+
| 0    | 'black'  |
+------+----------+
| 1    | 'blue'   |
+------+----------+
| 2    | 'red'    |
+------+----------+
| 4    | 'yellow' |
+------+----------+
| 5    | 'white'  |
+------+----------+

>>> look(table2)
+------+----------+
| 'id' | 'shape'  |
+======+==========+
| 1    | 'circle' |
+------+----------+
| 3    | 'square' |
+------+----------+

>>> table3 = antijoin(table1, table2, key='id')
>>> look(table3)
+------+----------+
| 'id' | 'colour' |
+======+==========+
| 0    | 'black'  |
+------+----------+
| 2    | 'red'    |
+------+----------+
| 4    | 'yellow' |
+------+----------+
| 5    | 'white'  |
+------+----------+

If presorted is True, it is assumed that the data are already sorted by the given key, and the buffersize, tempdir and cache arguments are ignored. Otherwise, the data are sorted, see also the discussion of the buffersize, tempdir and cache arguments under the sort() function.

Changed in version 0.24.

Added support for left and right tables with different key fields via the lkey and rkey arguments.

petl.unjoin(table, value, key=None, autoincrement=(1, 1), presorted=False, buffersize=None, tempdir=None, cache=True)

Split a table into two tables by reversing an inner join.

E.g., if the join key is present in the table:

>>> from petl import look, unjoin
>>> look(table1)
+-------+-------+----------+
| 'foo' | 'bar' | 'baz'    |
+=======+=======+==========+
| 'A'   | 1     | 'apple'  |
+-------+-------+----------+
| 'B'   | 1     | 'apple'  |
+-------+-------+----------+
| 'C'   | 2     | 'orange' |
+-------+-------+----------+

>>> table2, table3 = unjoin(table1, 'baz', key='bar')
>>> look(table2)
+-------+-------+
| 'foo' | 'bar' |
+=======+=======+
| 'A'   | 1     |
+-------+-------+
| 'B'   | 1     |
+-------+-------+
| 'C'   | 2     |
+-------+-------+

>>> look(table3)
+-------+----------+
| 'bar' | 'baz'    |
+=======+==========+
| 1     | 'apple'  |
+-------+----------+
| 2     | 'orange' |
+-------+----------+

An integer join key can also be reconstructed, e.g.:

>>> look(table4)
+-------+----------+
| 'foo' | 'bar'    |
+=======+==========+
| 'A'   | 'apple'  |
+-------+----------+
| 'B'   | 'apple'  |
+-------+----------+
| 'C'   | 'orange' |
+-------+----------+

>>> table5, table6 = unjoin(table4, 'bar')
>>> look(table5)
+-------+----------+
| 'foo' | 'bar_id' |
+=======+==========+
| 'A'   | 1        |
+-------+----------+
| 'B'   | 1        |
+-------+----------+
| 'C'   | 2        |
+-------+----------+

>>> look(table6)
+------+----------+
| 'id' | 'bar'    |
+======+==========+
| 1    | 'apple'  |
+------+----------+
| 2    | 'orange' |
+------+----------+

New in version 0.12.

petl.hashjoin(left, right, key=None, lkey=None, rkey=None, cache=True, lprefix=None, rprefix=None)

Alternative implementation of join(), where the join is executed by constructing an in-memory lookup for the right hand table, then iterating over rows from the left hand table.

May be faster and/or more resource efficient where the right table is small and the left table is large.

New in version 0.5.

Changed in version 0.16.

Added support for caching data from right hand table (only available when key is given).

Changed in version 0.24.

Added support for left and right tables with different key fields via the lkey and rkey arguments.

petl.hashleftjoin(left, right, key=None, lkey=None, rkey=None, missing=None, cache=True, lprefix=None, rprefix=None)

Alternative implementation of leftjoin(), where the join is executed by constructing an in-memory lookup for the right hand table, then iterating over rows from the left hand table.

May be faster and/or more resource efficient where the right table is small and the left table is large.

New in version 0.5.

Changed in version 0.16.

Added support for caching data from right hand table (only available when key is given).

Changed in version 0.24.

Added support for left and right tables with different key fields via the lkey and rkey arguments.

petl.hashlookupjoin(left, right, key=None, lkey=None, rkey=None, missing=None, lprefix=None, rprefix=None)

Alternative implementation of lookupjoin(), where the join is executed by constructing an in-memory lookup for the right hand table, then iterating over rows from the left hand table.

May be faster and/or more resource efficient where the right table is small and the left table is large.

New in version 0.11.

Changed in version 0.24.

Added support for left and right tables with different key fields via the lkey and rkey arguments.

petl.hashrightjoin(left, right, key=None, lkey=None, rkey=None, missing=None, cache=True, lprefix=None, rprefix=None)

Alternative implementation of rightjoin(), where the join is executed by constructing an in-memory lookup for the left hand table, then iterating over rows from the right hand table.

May be faster and/or more resource efficient where the left table is small and the right table is large.

New in version 0.5.

Changed in version 0.16.

Added support for caching data from left hand table (only available when key is given).

Changed in version 0.24.

Added support for left and right tables with different key fields via the lkey and rkey arguments.

petl.hashantijoin(left, right, key=None, lkey=None, rkey=None)

Alternative implementation of antijoin(), where the join is executed by constructing an in-memory set for all keys found in the right hand table, then iterating over rows from the left hand table.

May be faster and/or more resource efficient where the right table is small and the left table is large.

New in version 0.5.

Changed in version 0.24.

Added support for left and right tables with different key fields via the lkey and rkey arguments.

Set operations

petl.complement(a, b, presorted=False, buffersize=None, tempdir=None, cache=True)

Return rows in a that are not in b. E.g.:

>>> from petl import complement, look
>>> look(a)
+-------+-------+-------+
| 'foo' | 'bar' | 'baz' |
+=======+=======+=======+
| 'A'   | 1     | True  |
+-------+-------+-------+
| 'C'   | 7     | False |
+-------+-------+-------+
| 'B'   | 2     | False |
+-------+-------+-------+
| 'C'   | 9     | True  |
+-------+-------+-------+

>>> look(b)
+-----+-----+-------+
| 'x' | 'y' | 'z'   |
+=====+=====+=======+
| 'B' | 2   | False |
+-----+-----+-------+
| 'A' | 9   | False |
+-----+-----+-------+
| 'B' | 3   | True  |
+-----+-----+-------+
| 'C' | 9   | True  |
+-----+-----+-------+

>>> aminusb = complement(a, b)
>>> look(aminusb)
+-------+-------+-------+
| 'foo' | 'bar' | 'baz' |
+=======+=======+=======+
| 'A'   | 1     | True  |
+-------+-------+-------+
| 'C'   | 7     | False |
+-------+-------+-------+

>>> bminusa = complement(b, a)
>>> look(bminusa)
+-----+-----+-------+
| 'x' | 'y' | 'z'   |
+=====+=====+=======+
| 'A' | 9   | False |
+-----+-----+-------+
| 'B' | 3   | True  |
+-----+-----+-------+

Note that the field names of each table are ignored - rows are simply compared following a lexical sort. See also the recordcomplement() function.

If presorted is True, it is assumed that the data are already sorted by the given key, and the buffersize, tempdir and cache arguments are ignored. Otherwise, the data are sorted, see also the discussion of the buffersize, tempdir and cache arguments under the sort() function.

petl.diff(a, b, presorted=False, buffersize=None, tempdir=None, cache=True)

Find the difference between rows in two tables. Returns a pair of tables. E.g.:

>>> from petl import diff, look
>>> look(a)
+-------+-------+-------+
| 'foo' | 'bar' | 'baz' |
+=======+=======+=======+
| 'A'   | 1     | True  |
+-------+-------+-------+
| 'C'   | 7     | False |
+-------+-------+-------+
| 'B'   | 2     | False |
+-------+-------+-------+
| 'C'   | 9     | True  |
+-------+-------+-------+

>>> look(b)
+-----+-----+-------+
| 'x' | 'y' | 'z'   |
+=====+=====+=======+
| 'B' | 2   | False |
+-----+-----+-------+
| 'A' | 9   | False |
+-----+-----+-------+
| 'B' | 3   | True  |
+-----+-----+-------+
| 'C' | 9   | True  |
+-----+-----+-------+

>>> added, subtracted = diff(a, b)
>>> # rows in b not in a
... look(added)
+-----+-----+-------+
| 'x' | 'y' | 'z'   |
+=====+=====+=======+
| 'A' | 9   | False |
+-----+-----+-------+
| 'B' | 3   | True  |
+-----+-----+-------+

>>> # rows in a not in b
... look(subtracted)
+-------+-------+-------+
| 'foo' | 'bar' | 'baz' |
+=======+=======+=======+
| 'A'   | 1     | True  |
+-------+-------+-------+
| 'C'   | 7     | False |
+-------+-------+-------+

Convenient shorthand for (complement(b, a), complement(a, b)). See also complement().

If presorted is True, it is assumed that the data are already sorted by the given key, and the buffersize, tempdir and cache arguments are ignored. Otherwise, the data are sorted, see also the discussion of the buffersize, tempdir and cache arguments under the sort() function.

petl.recordcomplement(a, b, buffersize=None, tempdir=None, cache=True)

Find records in a that are not in b. E.g.:

>>> from petl import recordcomplement, look
>>> look(a)
+-------+-------+-------+
| 'foo' | 'bar' | 'baz' |
+=======+=======+=======+
| 'A'   | 1     | True  |
+-------+-------+-------+
| 'C'   | 7     | False |
+-------+-------+-------+
| 'B'   | 2     | False |
+-------+-------+-------+
| 'C'   | 9     | True  |
+-------+-------+-------+

>>> look(b)
+-------+-------+-------+
| 'bar' | 'foo' | 'baz' |
+=======+=======+=======+
| 2     | 'B'   | False |
+-------+-------+-------+
| 9     | 'A'   | False |
+-------+-------+-------+
| 3     | 'B'   | True  |
+-------+-------+-------+
| 9     | 'C'   | True  |
+-------+-------+-------+

>>> aminusb = recordcomplement(a, b)
>>> look(aminusb)
+-------+-------+-------+
| 'foo' | 'bar' | 'baz' |
+=======+=======+=======+
| 'A'   | 1     | True  |
+-------+-------+-------+
| 'C'   | 7     | False |
+-------+-------+-------+

>>> bminusa = recordcomplement(b, a)
>>> look(bminusa)
+-------+-------+-------+
| 'bar' | 'foo' | 'baz' |
+=======+=======+=======+
| 3     | 'B'   | True  |
+-------+-------+-------+
| 9     | 'A'   | False |
+-------+-------+-------+

Note that both tables must have the same set of fields, but that the order of the fields does not matter. See also the complement() function.

See also the discussion of the buffersize, tempdir and cache arguments under the sort() function.

New in version 0.3.

petl.recorddiff(a, b, buffersize=None, tempdir=None, cache=True)

Find the difference between records in two tables. E.g.:

>>> from petl import recorddiff, look
>>> look(a)
+-------+-------+-------+
| 'foo' | 'bar' | 'baz' |
+=======+=======+=======+
| 'A'   | 1     | True  |
+-------+-------+-------+
| 'C'   | 7     | False |
+-------+-------+-------+
| 'B'   | 2     | False |
+-------+-------+-------+
| 'C'   | 9     | True  |
+-------+-------+-------+

>>> look(b)
+-------+-------+-------+
| 'bar' | 'foo' | 'baz' |
+=======+=======+=======+
| 2     | 'B'   | False |
+-------+-------+-------+
| 9     | 'A'   | False |
+-------+-------+-------+
| 3     | 'B'   | True  |
+-------+-------+-------+
| 9     | 'C'   | True  |
+-------+-------+-------+

>>> added, subtracted = recorddiff(a, b)
>>> look(added)
+-------+-------+-------+
| 'bar' | 'foo' | 'baz' |
+=======+=======+=======+
| 3     | 'B'   | True  |
+-------+-------+-------+
| 9     | 'A'   | False |
+-------+-------+-------+

>>> look(subtracted)
+-------+-------+-------+
| 'foo' | 'bar' | 'baz' |
+=======+=======+=======+
| 'A'   | 1     | True  |
+-------+-------+-------+
| 'C'   | 7     | False |
+-------+-------+-------+

Convenient shorthand for (recordcomplement(b, a), recordcomplement(a, b)). See also recordcomplement().

See also the discussion of the buffersize, tempdir and cache arguments under the sort() function.

New in version 0.3.

petl.intersection(a, b, presorted=False, buffersize=None, tempdir=None, cache=True)

Return rows in a that are also in b. E.g.:

>>> from petl import intersection, look
>>> look(table1)
+-------+-------+-------+
| 'foo' | 'bar' | 'baz' |
+=======+=======+=======+
| 'A'   | 1     | True  |
+-------+-------+-------+
| 'C'   | 7     | False |
+-------+-------+-------+
| 'B'   | 2     | False |
+-------+-------+-------+
| 'C'   | 9     | True  |
+-------+-------+-------+

>>> look(table2)
+-----+-----+-------+
| 'x' | 'y' | 'z'   |
+=====+=====+=======+
| 'B' | 2   | False |
+-----+-----+-------+
| 'A' | 9   | False |
+-----+-----+-------+
| 'B' | 3   | True  |
+-----+-----+-------+
| 'C' | 9   | True  |
+-----+-----+-------+

>>> table3 = intersection(table1, table2)
>>> look(table3)
+-------+-------+-------+
| 'foo' | 'bar' | 'baz' |
+=======+=======+=======+
| 'B'   | 2     | False |
+-------+-------+-------+
| 'C'   | 9     | True  |
+-------+-------+-------+

If presorted is True, it is assumed that the data are already sorted by the given key, and the buffersize, tempdir and cache arguments are ignored. Otherwise, the data are sorted, see also the discussion of the buffersize, tempdir and cache arguments under the sort() function.

petl.hashcomplement(a, b)

Alternative implementation of complement(), where the complement is executed by constructing an in-memory set for all rows found in the right hand table, then iterating over rows from the left hand table.

May be faster and/or more resource efficient where the right table is small and the left table is large.

New in version 0.5.

petl.hashintersection(a, b)

Alternative implementation of intersection(), where the intersection is executed by constructing an in-memory set for all rows found in the right hand table, then iterating over rows from the left hand table.

May be faster and/or more resource efficient where the right table is small and the left table is large.

New in version 0.5.

Reducing rows

petl.aggregate(table, key, aggregation=None, value=None, presorted=False, buffersize=None, tempdir=None, cache=True)

Group rows under the given key then apply aggregation functions. E.g.:

>>> from petl import aggregate, look
>>> look(table1)
+-------+-------+-------+
| 'foo' | 'bar' | 'baz' |
+=======+=======+=======+
| 'a'   |     3 |  True |
+-------+-------+-------+
| 'a'   |     7 | False |
+-------+-------+-------+
| 'b'   |     2 |  True |
+-------+-------+-------+
| 'b'   |     2 | False |
+-------+-------+-------+
| 'b'   |     9 | False |
+-------+-------+-------+

>>> # aggregate whole rows
... table2 = aggregate(table1, 'foo', len)
>>> look(table2)
+-------+---------+
| 'foo' | 'value' |
+=======+=========+
| 'a'   |       2 |
+-------+---------+
| 'b'   |       3 |
+-------+---------+
| 'c'   |       1 |
+-------+---------+

>>> # aggregate single field
... table3 = aggregate(table1, 'foo', sum, 'bar')
>>> look(table3)
+-------+---------+
| 'foo' | 'value' |
+=======+=========+
| 'a'   |      10 |
+-------+---------+
| 'b'   |      13 |
+-------+---------+
| 'c'   |       4 |
+-------+---------+

>>> # alternative signature for single field aggregation using keyword args
... table4 = aggregate(table1, key=('foo', 'bar'), aggregation=list, value=('bar', 'baz'))
>>> look(table4)
+-------+-------+-------------------------+
| 'foo' | 'bar' | 'value'                 |
+=======+=======+=========================+
| 'a'   |     3 | [(3, True)]             |
+-------+-------+-------------------------+
| 'a'   |     7 | [(7, False)]            |
+-------+-------+-------------------------+
| 'b'   |     2 | [(2, True), (2, False)] |
+-------+-------+-------------------------+
| 'b'   |     9 | [(9, False)]            |
+-------+-------+-------------------------+
| 'c'   |     4 | [(4, True)]             |
+-------+-------+-------------------------+

>>> # aggregate multiple fields
... from collections import OrderedDict
>>> from petl import strjoin
>>> aggregation = OrderedDict()
>>> aggregation['count'] = len
>>> aggregation['minbar'] = 'bar', min
>>> aggregation['maxbar'] = 'bar', max
>>> aggregation['sumbar'] = 'bar', sum
>>> aggregation['listbar'] = 'bar' # default aggregation function is list
>>> aggregation['listbarbaz'] = ('bar', 'baz'), list
>>> aggregation['bars'] = 'bar', strjoin(', ')
>>> table5 = aggregate(table1, 'foo', aggregation)
>>> look(table5)
+-------+---------+----------+----------+----------+-----------+-------------------------------------+-----------+
| 'foo' | 'count' | 'minbar' | 'maxbar' | 'sumbar' | 'listbar' | 'listbarbaz'                        | 'bars'    |
+=======+=========+==========+==========+==========+===========+=====================================+===========+
| 'a'   |       2 |        3 |        7 |       10 | [3, 7]    | [(3, True), (7, False)]             | '3, 7'    |
+-------+---------+----------+----------+----------+-----------+-------------------------------------+-----------+
| 'b'   |       3 |        2 |        9 |       13 | [2, 2, 9] | [(2, True), (2, False), (9, False)] | '2, 2, 9' |
+-------+---------+----------+----------+----------+-----------+-------------------------------------+-----------+
| 'c'   |       1 |        4 |        4 |        4 | [4]       | [(4, True)]                         | '4'       |
+-------+---------+----------+----------+----------+-----------+-------------------------------------+-----------+

>>> # can also use list or tuple to specify multiple field aggregation
... aggregation = [('count', len),
...                ('minbar', 'bar', min),
...                ('maxbar', 'bar', max),
...                ('sumbar', 'bar', sum),
...                ('listbar', 'bar'), # default aggregation function is list
...                ('listbarbaz', ('bar', 'baz'), list),
...                ('bars', 'bar', strjoin(', '))]
>>> table6 = aggregate(table1, 'foo', aggregation)
>>> look(table6)
+-------+---------+----------+----------+----------+-----------+-------------------------------------+-----------+
| 'foo' | 'count' | 'minbar' | 'maxbar' | 'sumbar' | 'listbar' | 'listbarbaz'                        | 'bars'    |
+=======+=========+==========+==========+==========+===========+=====================================+===========+
| 'a'   |       2 |        3 |        7 |       10 | [3, 7]    | [(3, True), (7, False)]             | '3, 7'    |
+-------+---------+----------+----------+----------+-----------+-------------------------------------+-----------+
| 'b'   |       3 |        2 |        9 |       13 | [2, 2, 9] | [(2, True), (2, False), (9, False)] | '2, 2, 9' |
+-------+---------+----------+----------+----------+-----------+-------------------------------------+-----------+
| 'c'   |       1 |        4 |        4 |        4 | [4]       | [(4, True)]                         | '4'       |
+-------+---------+----------+----------+----------+-----------+-------------------------------------+-----------+

>>> # can also use suffix notation
... table7 = aggregate(table1, 'foo')
>>> table7['count'] = len
>>> table7['minbar'] = 'bar', min
>>> table7['maxbar'] = 'bar', max
>>> table7['sumbar'] = 'bar', sum
>>> table7['listbar'] = 'bar' # default aggregation function is list
>>> table7['listbarbaz'] = ('bar', 'baz'), list
>>> table7['bars'] = 'bar', strjoin(', ')
>>> look(table7)
+-------+---------+----------+----------+----------+-----------+-------------------------------------+-----------+
| 'foo' | 'count' | 'minbar' | 'maxbar' | 'sumbar' | 'listbar' | 'listbarbaz'                        | 'bars'    |
+=======+=========+==========+==========+==========+===========+=====================================+===========+
| 'a'   |       2 |        3 |        7 |       10 | [3, 7]    | [(3, True), (7, False)]             | '3, 7'    |
+-------+---------+----------+----------+----------+-----------+-------------------------------------+-----------+
| 'b'   |       3 |        2 |        9 |       13 | [2, 2, 9] | [(2, True), (2, False), (9, False)] | '2, 2, 9' |
+-------+---------+----------+----------+----------+-----------+-------------------------------------+-----------+
| 'c'   |       1 |        4 |        4 |        4 | [4]       | [(4, True)]                         | '4'       |
+-------+---------+----------+----------+----------+-----------+-------------------------------------+-----------+

If presorted is True, it is assumed that the data are already sorted by the given key, and the buffersize, tempdir and cache arguments are ignored. Otherwise, the data are sorted, see also the discussion of the buffersize, tempdir and cache arguments under the sort() function.

Changed in version 0.24.

The provided key field is used in the output header instead of ‘key’. Also compound keys are output as separate columns.

petl.rangeaggregate(table, key, width, aggregation=None, value=None, minv=None, maxv=None, presorted=False, buffersize=None, tempdir=None, cache=True)

Group rows into bins then apply aggregation functions. E.g.:

>>> from petl import rangeaggregate, look, strjoin
>>> look(table1)
+-------+-------+
| 'foo' | 'bar' |
+=======+=======+
| 'a'   |     3 |
+-------+-------+
| 'a'   |     7 |
+-------+-------+
| 'b'   |     2 |
+-------+-------+
| 'b'   |     1 |
+-------+-------+
| 'b'   |     9 |
+-------+-------+

>>> # aggregate whole rows
... table2 = rangeaggregate(table1, 'bar', 2, len)
>>> look(table2)
+---------+---------+
| 'bar'   | 'value' |
+=========+=========+
| (1, 3)  |       2 |
+---------+---------+
| (3, 5)  |       3 |
+---------+---------+
| (5, 7)  |       0 |
+---------+---------+
| (7, 9)  |       1 |
+---------+---------+
| (9, 11) |       1 |
+---------+---------+

>>> # aggregate single field
... table3 = rangeaggregate(table1, 'bar', 2, list, 'foo')
>>> look(table3)
+---------+-----------------+
| 'bar'   | 'value'         |
+=========+=================+
| (1, 3)  | ['b', 'b']      |
+---------+-----------------+
| (3, 5)  | ['a', 'd', 'c'] |
+---------+-----------------+
| (5, 7)  | []              |
+---------+-----------------+
| (7, 9)  | ['a']           |
+---------+-----------------+
| (9, 11) | ['b']           |
+---------+-----------------+

>>> # aggregate single field - alternative signature using keyword args
... table4 = rangeaggregate(table1, key='bar', width=2, aggregation=list, value='foo')
>>> look(table4)
+---------+-----------------+
| 'bar'   | 'value'         |
+=========+=================+
| (1, 3)  | ['b', 'b']      |
+---------+-----------------+
| (3, 5)  | ['a', 'd', 'c'] |
+---------+-----------------+
| (5, 7)  | []              |
+---------+-----------------+
| (7, 9)  | ['a']           |
+---------+-----------------+
| (9, 11) | ['b']           |
+---------+-----------------+

>>> # aggregate multiple fields
... from collections import OrderedDict
>>> aggregation = OrderedDict()
>>> aggregation['foocount'] = len
>>> aggregation['foojoin'] = 'foo', strjoin('')
>>> aggregation['foolist'] = 'foo' # default is list
>>> table5 = rangeaggregate(table1, 'bar', 2, aggregation)
>>> look(table5)
+---------+------------+-----------+-----------------+
| 'bar'   | 'foocount' | 'foojoin' | 'foolist'       |
+=========+============+===========+=================+
| (1, 3)  |          2 | 'bb'      | ['b', 'b']      |
+---------+------------+-----------+-----------------+
| (3, 5)  |          3 | 'adc'     | ['a', 'd', 'c'] |
+---------+------------+-----------+-----------------+
| (5, 7)  |          0 | ''        | []              |
+---------+------------+-----------+-----------------+
| (7, 9)  |          1 | 'a'       | ['a']           |
+---------+------------+-----------+-----------------+
| (9, 11) |          1 | 'b'       | ['b']           |
+---------+------------+-----------+-----------------+

Changed in version 0.12.

Changed signature to simplify and make consistent with aggregate().

Changed in version 0.24.

The provided key is used in the output header instead of ‘key’.

petl.rangecounts(table, key, width, minv=None, maxv=None, presorted=False, buffersize=None, tempdir=None, cache=True)

Group rows into bins then count the number of rows in each bin. E.g.:

>>> from petl import rangecounts, look
>>> look(table1)
+-------+-------+
| 'foo' | 'bar' |
+=======+=======+
| 'a'   | 3     |
+-------+-------+
| 'a'   | 7     |
+-------+-------+
| 'b'   | 2     |
+-------+-------+
| 'b'   | 1     |
+-------+-------+
| 'b'   | 9     |
+-------+-------+
| 'c'   | 4     |
+-------+-------+
| 'd'   | 3     |
+-------+-------+

>>> table2 = rangecounts(table1, 'bar', 2)
>>> look(table2)
+---------+---------+
| 'key'   | 'value' |
+=========+=========+
| (1, 3)  | 2       |
+---------+---------+
| (3, 5)  | 3       |
+---------+---------+
| (5, 7)  | 0       |
+---------+---------+
| (7, 9)  | 1       |
+---------+---------+
| (9, 11) | 1       |
+---------+---------+

See also rangeaggregate().

petl.rowreduce(table, key, reducer, fields=None, missing=None, presorted=False, buffersize=None, tempdir=None, cache=True)

Group rows under the given key then apply reducer to produce a single output row for each input group of rows. E.g.:

>>> from petl import rowreduce, look    
>>> look(table1)
+-------+-------+
| 'foo' | 'bar' |
+=======+=======+
| 'a'   | 3     |
+-------+-------+
| 'a'   | 7     |
+-------+-------+
| 'b'   | 2     |
+-------+-------+
| 'b'   | 1     |
+-------+-------+
| 'b'   | 9     |
+-------+-------+
| 'c'   | 4     |
+-------+-------+

>>> def reducer(key, rows):
...     return [key, sum(row[1] for row in rows)]
... 
>>> table2 = rowreduce(table1, key='foo', reducer=reducer, fields=['foo', 'barsum'])
>>> look(table2)
+-------+----------+
| 'foo' | 'barsum' |
+=======+==========+
| 'a'   | 10       |
+-------+----------+
| 'b'   | 12       |
+-------+----------+
| 'c'   | 4        |
+-------+----------+

N.B., this is not strictly a “reduce” in the sense of the standard Python reduce() function, i.e., the reducer function is not applied recursively to values within a group, rather it is applied once to each row group as a whole.

See also aggregate() and fold().

Changed in version 0.12.

Was previously deprecated, now resurrected as it is a useful function in it’s own right.

petl.recordreduce(table, key, reducer, fields=None, presorted=False, buffersize=None, tempdir=None, cache=True)

Deprecated since version 0.9.

Use rowreduce() instead.

petl.rangerowreduce(table, key, width, reducer, fields=None, minv=None, maxv=None, presorted=False, buffersize=None, tempdir=None, cache=True)

Group rows into bins of a given width under the given numeric key then apply reducer to produce a single output row for each input group of rows. E.g.:

>>> from petl import rangerowreduce, look
>>> look(table1)
+-------+-------+
| 'foo' | 'bar' |
+=======+=======+
| 'a'   | 3     |
+-------+-------+
| 'a'   | 7     |
+-------+-------+
| 'b'   | 2     |
+-------+-------+
| 'b'   | 1     |
+-------+-------+
| 'b'   | 9     |
+-------+-------+
| 'c'   | 4     |
+-------+-------+

>>> def reducer(key, rows):
...     return [key[0], key[1], ''.join(row[0] for row in rows)]
... 
>>> table2 = rangerowreduce(table1, 'bar', 2, reducer=reducer, fields=['frombar', 'tobar', 'foos'])
>>> look(table2)
+-----------+---------+--------+
| 'frombar' | 'tobar' | 'foos' |
+===========+=========+========+
| 1         | 3       | 'bb'   |
+-----------+---------+--------+
| 3         | 5       | 'ac'   |
+-----------+---------+--------+
| 5         | 7       | ''     |
+-----------+---------+--------+
| 7         | 9       | 'a'    |
+-----------+---------+--------+
| 9         | 11      | 'b'    |
+-----------+---------+--------+

N.B., this is not strictly a “reduce” in the sense of the standard Python reduce() function, i.e., the reducer function is not applied recursively to values within a group, rather it is applied once to each row group as a whole.

See also rangeaggregate() and rangecounts().

Changed in version 0.12.

Was previously deprecated, now resurrected as it is a useful function in it’s own right.

petl.rangerecordreduce(table, key, width, reducer, fields=None, minv=None, maxv=None, failonerror=False, presorted=False, buffersize=None, tempdir=None, cache=True)

Reduce records grouped into bins under the given key via an arbitrary function.

Deprecated since version 0.9.

Use rangeaggregate() instead.

petl.mergeduplicates(table, key, missing=None, presorted=False, buffersize=None, tempdir=None, cache=True)

Merge duplicate rows under the given key. E.g.:

>>> from petl import mergeduplicates, look
>>> look(table1)
+-------+-------+-------+
| 'foo' | 'bar' | 'baz' |
+=======+=======+=======+
| 'A'   | 1     | 2.7   |
+-------+-------+-------+
| 'B'   | 2     | None  |
+-------+-------+-------+
| 'D'   | 3     | 9.4   |
+-------+-------+-------+
| 'B'   | None  | 7.8   |
+-------+-------+-------+
| 'E'   | None  | 42.0  |
+-------+-------+-------+
| 'D'   | 3     | 12.3  |
+-------+-------+-------+
| 'A'   | 2     | None  |
+-------+-------+-------+

>>> table2 = mergeduplicates(table1, 'foo')
>>> look(table2)
+-------+------------------+-----------------------+
| 'foo' | 'bar'            | 'baz'                 |
+=======+==================+=======================+
| 'A'   | Conflict([1, 2]) | 2.7                   |
+-------+------------------+-----------------------+
| 'B'   | 2                | 7.8                   |
+-------+------------------+-----------------------+
| 'D'   | 3                | Conflict([9.4, 12.3]) |
+-------+------------------+-----------------------+
| 'E'   | None             | 42.0                  |
+-------+------------------+-----------------------+

Missing values are overridden by non-missing values. Conflicting values are reported as an instance of the Conflict class (sub-class of frozenset).

If presorted is True, it is assumed that the data are already sorted by the given key, and the buffersize, tempdir and cache arguments are ignored. Otherwise, the data are sorted, see also the discussion of the buffersize, tempdir and cache arguments under the sort() function.

Changed in version 0.3.

Previously conflicts were reported as a list, this is changed to a tuple in version 0.3.

Changed in version 0.10.

Renamed from ‘mergereduce’ to ‘mergeduplicates’. Conflicts now reported as instance of Conflict.

petl.merge(*tables, **kwargs)

Convenience function to combine multiple tables (via mergesort()) then combine duplicate rows by merging under the given key (via mergeduplicates()). E.g.:

>>> from petl import look, merge
>>> look(table1)
+-------+-------+-------+
| 'foo' | 'bar' | 'baz' |
+=======+=======+=======+
| 1     | 'A'   | True  |
+-------+-------+-------+
| 2     | 'B'   | None  |
+-------+-------+-------+
| 4     | 'C'   | True  |
+-------+-------+-------+

>>> look(table2)
+-------+-------+--------+
| 'bar' | 'baz' | 'quux' |
+=======+=======+========+
| 'A'   | True  | 42.0   |
+-------+-------+--------+
| 'B'   | False | 79.3   |
+-------+-------+--------+
| 'C'   | False | 12.4   |
+-------+-------+--------+

>>> table3 = merge(table1, table2, key='bar')
>>> look(table3)
+-------+-------+-------------------------+--------+
| 'bar' | 'foo' | 'baz'                   | 'quux' |
+=======+=======+=========================+========+
| 'A'   | 1     | True                    | 42.0   |
+-------+-------+-------------------------+--------+
| 'B'   | 2     | False                   | 79.3   |
+-------+-------+-------------------------+--------+
| 'C'   | 4     | Conflict([False, True]) | 12.4   |
+-------+-------+-------------------------+--------+

Keyword arguments are the same as for mergesort(), except key is required.

Changed in version 0.9.

Now uses mergesort(), should be more efficient for presorted inputs.

petl.fold(table, key, f, value=None, presorted=False, buffersize=None, tempdir=None, cache=True)

Reduce rows recursively via the Python standard reduce() function. E.g.:

>>> from petl import fold, look
>>> look(table1)
+------+---------+
| 'id' | 'count' |
+======+=========+
| 1    | 3       |
+------+---------+
| 1    | 5       |
+------+---------+
| 2    | 4       |
+------+---------+
| 2    | 8       |
+------+---------+

>>> import operator
>>> table2 = fold(table1, 'id', operator.add, 'count', presorted=True)
>>> look(table2)
+-------+---------+
| 'key' | 'value' |
+=======+=========+
| 1     | 8       |
+-------+---------+
| 2     | 12      |
+-------+---------+

See also aggregate(), rowreduce().

New in version 0.10.

petl.multirangeaggregate(table, keys, widths, aggregation, value=None, mins=None, maxs=None)

Group rows at multiple levels then aggregate whole rows or specified values. E.g.:

>>> from petl import look, multirangeaggregate
>>> look(table1)
+-----+-----+-----+
| 'x' | 'y' | 'z' |
+=====+=====+=====+
| 1   | 3   | 9   |
+-----+-----+-----+
| 2   | 3   | 12  |
+-----+-----+-----+
| 4   | 2   | 17  |
+-----+-----+-----+
| 2   | 7   | 3   |
+-----+-----+-----+
| 1   | 6   | 1   |
+-----+-----+-----+

>>> table2 = multirangeaggregate(table1, keys=('x', 'y'), widths=(2, 2), aggregation=sum, mins=(0, 0), maxs=(4, 4), value='z')
>>> look(table2)
+------------------+---------+
| 'key'            | 'value' |
+==================+=========+
| ((0, 2), (0, 2)) | 0       |
+------------------+---------+
| ((0, 2), (2, 4)) | 9       |
+------------------+---------+
| ((2, 4), (0, 2)) | 0       |
+------------------+---------+
| ((2, 4), (2, 4)) | 29      |
+------------------+---------+

New in version 0.12.

petl.groupcountdistinctvalues(table, key, value)

Group by the key field then count the number of distinct values in the value field.

New in version 0.14.

petl.groupselectfirst(table, key)

Group by the key field then return the first row within each group.

New in version 0.14.

petl.groupselectmin(table, key, value)

Group by the key field then return the row with the maximum of the value field within each group. N.B., will only return one row for each group, even if multiple rows have the same (maximum) value.

New in version 0.14.

petl.groupselectmax(table, key, value)

Group by the key field then return the row with the minimum of the value field within each group. N.B., will only return one row for each group, even if multiple rows have the same (maximum) value.

New in version 0.14.

Reshaping tables

petl.melt(table, key=None, variables=None, variablefield='variable', valuefield='value')

Reshape a table, melting fields into data. E.g.:

>>> from petl import melt, look
>>> look(table1)
+------+----------+-------+
| 'id' | 'gender' | 'age' |
+======+==========+=======+
| 1    | 'F'      | 12    |
+------+----------+-------+
| 2    | 'M'      | 17    |
+------+----------+-------+
| 3    | 'M'      | 16    |
+------+----------+-------+

>>> table2 = melt(table1, 'id')
>>> look(table2)
+------+------------+---------+
| 'id' | 'variable' | 'value' |
+======+============+=========+
| 1    | 'gender'   | 'F'     |
+------+------------+---------+
| 1    | 'age'      | 12      |
+------+------------+---------+
| 2    | 'gender'   | 'M'     |
+------+------------+---------+
| 2    | 'age'      | 17      |
+------+------------+---------+
| 3    | 'gender'   | 'M'     |
+------+------------+---------+
| 3    | 'age'      | 16      |
+------+------------+---------+

>>> # compound keys are supported
... look(table3)
+------+--------+----------+----------+
| 'id' | 'time' | 'height' | 'weight' |
+======+========+==========+==========+
| 1    | 11     | 66.4     | 12.2     |
+------+--------+----------+----------+
| 2    | 16     | 53.2     | 17.3     |
+------+--------+----------+----------+
| 3    | 12     | 34.5     | 9.4      |
+------+--------+----------+----------+

>>> table4 = melt(table3, key=['id', 'time'])
>>> look(table4)
+------+--------+------------+---------+
| 'id' | 'time' | 'variable' | 'value' |
+======+========+============+=========+
| 1    | 11     | 'height'   | 66.4    |
+------+--------+------------+---------+
| 1    | 11     | 'weight'   | 12.2    |
+------+--------+------------+---------+
| 2    | 16     | 'height'   | 53.2    |
+------+--------+------------+---------+
| 2    | 16     | 'weight'   | 17.3    |
+------+--------+------------+---------+
| 3    | 12     | 'height'   | 34.5    |
+------+--------+------------+---------+
| 3    | 12     | 'weight'   | 9.4     |
+------+--------+------------+---------+

>>> # a subset of variable fields can be selected
... table5 = melt(table3, key=['id', 'time'], variables=['height'])
>>> look(table5)
+------+--------+------------+---------+
| 'id' | 'time' | 'variable' | 'value' |
+======+========+============+=========+
| 1    | 11     | 'height'   | 66.4    |
+------+--------+------------+---------+
| 2    | 16     | 'height'   | 53.2    |
+------+--------+------------+---------+
| 3    | 12     | 'height'   | 34.5    |
+------+--------+------------+---------+

See also recast().

petl.recast(table, key=None, variablefield='variable', valuefield='value', samplesize=1000, reducers=None, missing=None)

Recast molten data. E.g.:

>>> from petl import recast, look
>>> look(table1)
+------+------------+---------+
| 'id' | 'variable' | 'value' |
+======+============+=========+
| 3    | 'age'      | 16      |
+------+------------+---------+
| 1    | 'gender'   | 'F'     |
+------+------------+---------+
| 2    | 'gender'   | 'M'     |
+------+------------+---------+
| 2    | 'age'      | 17      |
+------+------------+---------+
| 1    | 'age'      | 12      |
+------+------------+---------+
| 3    | 'gender'   | 'M'     |
+------+------------+---------+

>>> table2 = recast(table1)
>>> look(table2)
+------+-------+----------+
| 'id' | 'age' | 'gender' |
+======+=======+==========+
| 1    | 12    | 'F'      |
+------+-------+----------+
| 2    | 17    | 'M'      |
+------+-------+----------+
| 3    | 16    | 'M'      |
+------+-------+----------+

>>> # specifying variable and value fields
... look(table3)
+------+----------+--------+
| 'id' | 'vars'   | 'vals' |
+======+==========+========+
| 3    | 'age'    | 16     |
+------+----------+--------+
| 1    | 'gender' | 'F'    |
+------+----------+--------+
| 2    | 'gender' | 'M'    |
+------+----------+--------+
| 2    | 'age'    | 17     |
+------+----------+--------+
| 1    | 'age'    | 12     |
+------+----------+--------+
| 3    | 'gender' | 'M'    |
+------+----------+--------+

>>> table4 = recast(table3, variablefield='vars', valuefield='vals')
>>> look(table4)
+------+-------+----------+
| 'id' | 'age' | 'gender' |
+======+=======+==========+
| 1    | 12    | 'F'      |
+------+-------+----------+
| 2    | 17    | 'M'      |
+------+-------+----------+
| 3    | 16    | 'M'      |
+------+-------+----------+

>>> # if there are multiple values for each key/variable pair, and no reducers
... # function is provided, then all values will be listed
... look(table6)
+------+--------+------------+---------+
| 'id' | 'time' | 'variable' | 'value' |
+======+========+============+=========+
| 1    | 11     | 'weight'   | 66.4    |
+------+--------+------------+---------+
| 1    | 14     | 'weight'   | 55.2    |
+------+--------+------------+---------+
| 2    | 12     | 'weight'   | 53.2    |
+------+--------+------------+---------+
| 2    | 16     | 'weight'   | 43.3    |
+------+--------+------------+---------+
| 3    | 12     | 'weight'   | 34.5    |
+------+--------+------------+---------+
| 3    | 17     | 'weight'   | 49.4    |
+------+--------+------------+---------+

>>> table7 = recast(table6, key='id')
>>> look(table7)
+------+--------------+
| 'id' | 'weight'     |
+======+==============+
| 1    | [66.4, 55.2] |
+------+--------------+
| 2    | [53.2, 43.3] |
+------+--------------+
| 3    | [34.5, 49.4] |
+------+--------------+

>>> # multiple values can be reduced via an aggregation function
... def mean(values):
...     return float(sum(values)) / len(values)
...
>>> table8 = recast(table6, key='id', reducers={'weight': mean})
>>> look(table8)
+------+--------------------+
| 'id' | 'weight'           |
+======+====================+
| 1    | 60.800000000000004 |
+------+--------------------+
| 2    | 48.25              |
+------+--------------------+
| 3    | 41.95              |
+------+--------------------+

>>> # missing values are padded with whatever is provided via the missing
... # keyword argument (None by default)
... look(table9)
+------+------------+---------+
| 'id' | 'variable' | 'value' |
+======+============+=========+
| 1    | 'gender'   | 'F'     |
+------+------------+---------+
| 2    | 'age'      | 17      |
+------+------------+---------+
| 1    | 'age'      | 12      |
+------+------------+---------+
| 3    | 'gender'   | 'M'     |
+------+------------+---------+

>>> table10 = recast(table9, key='id')
>>> look(table10)
+------+-------+----------+
| 'id' | 'age' | 'gender' |
+======+=======+==========+
| 1    | 12    | 'F'      |
+------+-------+----------+
| 2    | 17    | None     |
+------+-------+----------+
| 3    | None  | 'M'      |
+------+-------+----------+

Note that the table is scanned once to discover variables, then a second time to reshape the data and recast variables as fields. How many rows are scanned in the first pass is determined by the samplesize argument.

See also melt().

petl.transpose(table)

Transpose rows into columns. E.g.:

>>> from petl import transpose, look
>>> look(table1)
+------+----------+
| 'id' | 'colour' |
+======+==========+
| 1    | 'blue'   |
+------+----------+
| 2    | 'red'    |
+------+----------+
| 3    | 'purple' |
+------+----------+
| 5    | 'yellow' |
+------+----------+
| 7    | 'orange' |
+------+----------+

>>> table2 = transpose(table1)
>>> look(table2)
+----------+--------+-------+----------+----------+----------+
| 'id'     | 1      | 2     | 3        | 5        | 7        |
+==========+========+=======+==========+==========+==========+
| 'colour' | 'blue' | 'red' | 'purple' | 'yellow' | 'orange' |
+----------+--------+-------+----------+----------+----------+

See also recast().

petl.pivot(table, f1, f2, f3, aggfun, missing=None, presorted=False, buffersize=None, tempdir=None, cache=True)

Construct a pivot table. E.g.:

>>> from petl import pivot, look
>>> look(table1)
+----------+----------+---------+---------+
| 'region' | 'gender' | 'style' | 'units' |
+==========+==========+=========+=========+
| 'east'   | 'boy'    | 'tee'   | 12      |
+----------+----------+---------+---------+
| 'east'   | 'boy'    | 'golf'  | 14      |
+----------+----------+---------+---------+
| 'east'   | 'boy'    | 'fancy' | 7       |
+----------+----------+---------+---------+
| 'east'   | 'girl'   | 'tee'   | 3       |
+----------+----------+---------+---------+
| 'east'   | 'girl'   | 'golf'  | 8       |
+----------+----------+---------+---------+
| 'east'   | 'girl'   | 'fancy' | 18      |
+----------+----------+---------+---------+
| 'west'   | 'boy'    | 'tee'   | 12      |
+----------+----------+---------+---------+
| 'west'   | 'boy'    | 'golf'  | 15      |
+----------+----------+---------+---------+
| 'west'   | 'boy'    | 'fancy' | 8       |
+----------+----------+---------+---------+
| 'west'   | 'girl'   | 'tee'   | 6       |
+----------+----------+---------+---------+

>>> table2 = pivot(table1, 'region', 'gender', 'units', sum)
>>> look(table2)
+----------+-------+--------+
| 'region' | 'boy' | 'girl' |
+==========+=======+========+
| 'east'   | 33    | 29     |
+----------+-------+--------+
| 'west'   | 35    | 23     |
+----------+-------+--------+

>>> table3 = pivot(table1, 'region', 'style', 'units', sum)
>>> look(table3)
+----------+---------+--------+-------+
| 'region' | 'fancy' | 'golf' | 'tee' |
+==========+=========+========+=======+
| 'east'   | 25      | 22     | 15    |
+----------+---------+--------+-------+
| 'west'   | 9       | 31     | 18    |
+----------+---------+--------+-------+

>>> table4 = pivot(table1, 'gender', 'style', 'units', sum)
>>> look(table4)
+----------+---------+--------+-------+
| 'gender' | 'fancy' | 'golf' | 'tee' |
+==========+=========+========+=======+
| 'boy'    | 15      | 29     | 24    |
+----------+---------+--------+-------+
| 'girl'   | 19      | 24     | 9     |
+----------+---------+--------+-------+

See also recast().

petl.flatten(table)

Convert a table to a sequence of values in row-major order. E.g.:

>>> from petl import flatten, look
>>> look(table1)
+-------+-------+-------+
| 'foo' | 'bar' | 'baz' |
+=======+=======+=======+
| 'A'   | 1     | True  |
+-------+-------+-------+
| 'C'   | 7     | False |
+-------+-------+-------+
| 'B'   | 2     | False |
+-------+-------+-------+
| 'C'   | 9     | True  |
+-------+-------+-------+

>>> list(flatten(table1))
['A', 1, True, 'C', 7, False, 'B', 2, False, 'C', 9, True]

See also unflatten().

New in version 0.7.

petl.unflatten(*args, **kwargs)

Convert a sequence of values in row-major order into a table. E.g.:

>>> from petl import unflatten, look
>>> input = ['A', 1, True, 'C', 7, False, 'B', 2, False, 'C', 9]
>>> table = unflatten(input, 3)
>>> look(table)
+------+------+-------+
| 'f0' | 'f1' | 'f2'  |
+======+======+=======+
| 'A'  | 1    | True  |
+------+------+-------+
| 'C'  | 7    | False |
+------+------+-------+
| 'B'  | 2    | False |
+------+------+-------+
| 'C'  | 9    | None  |
+------+------+-------+

>>> # a table and field name can also be provided as arguments
... look(table1)
+---------+
| 'lines' |
+=========+
| 'A'     |
+---------+
| 1       |
+---------+
| True    |
+---------+
| 'C'     |
+---------+
| 7       |
+---------+
| False   |
+---------+
| 'B'     |
+---------+
| 2       |
+---------+
| False   |
+---------+
| 'C'     |
+---------+

>>> table2 = unflatten(table1, 'lines', 3)
>>> look(table2)
+------+------+-------+
| 'f0' | 'f1' | 'f2'  |
+======+======+=======+
| 'A'  | 1    | True  |
+------+------+-------+
| 'C'  | 7    | False |
+------+------+-------+
| 'B'  | 2    | False |
+------+------+-------+
| 'C'  | 9    | None  |
+------+------+-------+

See also flatten().

New in version 0.7.

Filling missing values

petl.filldown(table, *fields, **kwargs)

Replace missing values with non-missing values from the row above. E.g.:

>>> from petl import filldown, look
>>> look(table1)
+-------+-------+-------+
| 'foo' | 'bar' | 'baz' |
+=======+=======+=======+
| 1     | 'a'   | None  |
+-------+-------+-------+
| 1     | None  | 0.23  |
+-------+-------+-------+
| 1     | 'b'   | None  |
+-------+-------+-------+
| 2     | None  | None  |
+-------+-------+-------+
| 2     | None  | 0.56  |
+-------+-------+-------+
| 2     | 'c'   | None  |
+-------+-------+-------+
| None  | 'c'   | 0.72  |
+-------+-------+-------+

>>> table2 = filldown(table1)
>>> look(table2)
+-------+-------+-------+
| 'foo' | 'bar' | 'baz' |
+=======+=======+=======+
| 1     | 'a'   | None  |
+-------+-------+-------+
| 1     | 'a'   | 0.23  |
+-------+-------+-------+
| 1     | 'b'   | 0.23  |
+-------+-------+-------+
| 2     | 'b'   | 0.23  |
+-------+-------+-------+
| 2     | 'b'   | 0.56  |
+-------+-------+-------+
| 2     | 'c'   | 0.56  |
+-------+-------+-------+
| 2     | 'c'   | 0.72  |
+-------+-------+-------+

>>> table3 = filldown(table1, 'bar')
>>> look(table3)
+-------+-------+-------+
| 'foo' | 'bar' | 'baz' |
+=======+=======+=======+
| 1     | 'a'   | None  |
+-------+-------+-------+
| 1     | 'a'   | 0.23  |
+-------+-------+-------+
| 1     | 'b'   | None  |
+-------+-------+-------+
| 2     | 'b'   | None  |
+-------+-------+-------+
| 2     | 'b'   | 0.56  |
+-------+-------+-------+
| 2     | 'c'   | None  |
+-------+-------+-------+
| None  | 'c'   | 0.72  |
+-------+-------+-------+

>>> table4 = filldown(table1, 'bar', 'baz')
>>> look(table4)
+-------+-------+-------+
| 'foo' | 'bar' | 'baz' |
+=======+=======+=======+
| 1     | 'a'   | None  |
+-------+-------+-------+
| 1     | 'a'   | 0.23  |
+-------+-------+-------+
| 1     | 'b'   | 0.23  |
+-------+-------+-------+
| 2     | 'b'   | 0.23  |
+-------+-------+-------+
| 2     | 'b'   | 0.56  |
+-------+-------+-------+
| 2     | 'c'   | 0.56  |
+-------+-------+-------+
| None  | 'c'   | 0.72  |
+-------+-------+-------+

New in version 0.11.

petl.fillright(table, missing=None)

Replace missing values with preceding non-missing values. E.g.:

>>> from petl import fillright, look
>>> look(table1)
+-------+-------+-------+
| 'foo' | 'bar' | 'baz' |
+=======+=======+=======+
| 1     | 'a'   | None  |
+-------+-------+-------+
| 1     | None  | 0.23  |
+-------+-------+-------+
| 1     | 'b'   | None  |
+-------+-------+-------+
| 2     | None  | None  |
+-------+-------+-------+
| 2     | None  | 0.56  |
+-------+-------+-------+
| 2     | 'c'   | None  |
+-------+-------+-------+
| None  | 'c'   | 0.72  |
+-------+-------+-------+

>>> table2 = fillright(table1)
>>> look(table2)
+-------+-------+-------+
| 'foo' | 'bar' | 'baz' |
+=======+=======+=======+
| 1     | 'a'   | 'a'   |
+-------+-------+-------+
| 1     | 1     | 0.23  |
+-------+-------+-------+
| 1     | 'b'   | 'b'   |
+-------+-------+-------+
| 2     | 2     | 2     |
+-------+-------+-------+
| 2     | 2     | 0.56  |
+-------+-------+-------+
| 2     | 'c'   | 'c'   |
+-------+-------+-------+
| None  | 'c'   | 0.72  |
+-------+-------+-------+

New in version 0.11.

petl.fillleft(table, missing=None)

Replace missing values with following non-missing values. E.g.:

>>> from petl import fillleft, look
>>> look(table1)
+-------+-------+-------+
| 'foo' | 'bar' | 'baz' |
+=======+=======+=======+
| 1     | 'a'   | None  |
+-------+-------+-------+
| 1     | None  | 0.23  |
+-------+-------+-------+
| 1     | 'b'   | None  |
+-------+-------+-------+
| 2     | None  | None  |
+-------+-------+-------+
| None  | None  | 0.56  |
+-------+-------+-------+
| 2     | 'c'   | None  |
+-------+-------+-------+
| None  | 'c'   | 0.72  |
+-------+-------+-------+

>>> table2 = fillleft(table1)
>>> look(table2)
+-------+-------+-------+
| 'foo' | 'bar' | 'baz' |
+=======+=======+=======+
| 1     | 'a'   | None  |
+-------+-------+-------+
| 1     | 0.23  | 0.23  |
+-------+-------+-------+
| 1     | 'b'   | None  |
+-------+-------+-------+
| 2     | None  | None  |
+-------+-------+-------+
| 0.56  | 0.56  | 0.56  |
+-------+-------+-------+
| 2     | 'c'   | None  |
+-------+-------+-------+
| 'c'   | 'c'   | 0.72  |
+-------+-------+-------+

New in version 0.11.