petl.push - Branching Pipelines

New in version 0.10.

Introduction

This module provides some functions for setting up branching data transformation pipelines.

The general pattern is to define the pipeline by connecting components together via the pipe() method, then push data through it by calling the push() method on the component at the top of the pipeline. E.g.:

>>> from petl import fromcsv
>>> source = fromcsv('fruit.csv')
>>> from petl.push import *
>>> p = partition('fruit')
>>> p.pipe('orange', tocsv('oranges.csv'))
>>> p.pipe('banana', tocsv('bananas.csv'))
>>> p.push(source)

The pipe operator (|) can also be used to connect components in the pipeline, by analogy with the pipe character in Unix/Linux shells, e.g.:

>>> from petl import fromcsv
>>> source = fromcsv('fruit.csv')
>>> from petl.push import *
>>> p = partition('fruit')
>>> p | ('orange', tocsv('oranges.csv'))
>>> p | ('banana', tocsv('bananas.csv'))
>>> p.push(source)
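
To make the pipe()/push() pattern concrete, here is a minimal sketch in plain Python of how such a component could be wired. It is not petl's implementation: the class names Component and Collect and the accept() method are hypothetical. It only illustrates a keyed pipe() call, the | shorthand, and a push() from the top of the pipeline.

```python
from collections import defaultdict


class Collect:
    """Hypothetical sink that stores every row it receives."""

    def __init__(self):
        self.rows = []

    def accept(self, row):
        self.rows.append(row)


class Component:
    """Hypothetical branching component; routes rows by a key function."""

    def __init__(self, keyfunc):
        self.keyfunc = keyfunc
        self.connections = defaultdict(list)

    def pipe(self, key, downstream):
        # Register a downstream receiver for rows whose key matches.
        self.connections[key].append(downstream)
        return self

    def __or__(self, other):
        # `p | (key, sink)` is shorthand for `p.pipe(key, sink)`.
        key, downstream = other
        return self.pipe(key, downstream)

    def push(self, rows):
        # Send each row to every receiver registered under its key.
        for row in rows:
            for downstream in self.connections.get(self.keyfunc(row), []):
                downstream.accept(row)


oranges, bananas = Collect(), Collect()
p = Component(lambda row: row[0])
p.pipe('orange', oranges)
p | ('banana', bananas)
p.push([('orange', 12), ('banana', 2), ('orange', 3), ('kiwi', 1)])
```

In this sketch, rows whose key has no registered receiver (the 'kiwi' row) are simply dropped. That is a choice made for the illustration, not a statement about how petl behaves.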

Push Functions

petl.push.partition(discriminator)

Partition rows based on the values of a field or on the result of applying a function to each row. E.g.:

>>> from petl.push import partition, tocsv
>>> p = partition('fruit')
>>> p.pipe('orange', tocsv('oranges.csv'))
>>> p.pipe('banana', tocsv('bananas.csv'))
>>> p.push(sometable)

In the example above, rows where the value of the ‘fruit’ field equals ‘orange’ are piped to the ‘oranges.csv’ file, and rows where the ‘fruit’ field equals ‘banana’ are piped to the ‘bananas.csv’ file.
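
The two kinds of discriminator (a field name or a function of the row) can be illustrated with a small in-memory sketch. This is plain Python rather than petl code; the helper name partition_rows is hypothetical, and it returns the groups as a dict instead of piping them to files.

```python
def partition_rows(table, discriminator):
    """Group data rows by a field value or by the result of a callable."""
    rows = iter(table)
    header = next(rows)
    if callable(discriminator):
        keyfunc = discriminator
    else:
        # Treat the discriminator as a field name and look up its column.
        index = header.index(discriminator)
        keyfunc = lambda row: row[index]
    groups = {}
    for row in rows:
        groups.setdefault(keyfunc(row), []).append(row)
    return header, groups


table = [
    ('fruit', 'qty'),
    ('orange', 12),
    ('banana', 2),
    ('orange', 3),
]
# Partition by the value of the 'fruit' field.
header, by_name = partition_rows(table, 'fruit')
# Partition by the result of a function applied to each row.
_, by_size = partition_rows(table, lambda row: 'big' if row[1] > 5 else 'small')
```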

petl.push.sort(key=None, reverse=False, buffersize=None)

Sort rows based on some key field or fields. E.g.:

>>> from petl.push import sort, tocsv
>>> p = sort('foo')
>>> p.pipe(tocsv('sorted_by_foo.csv'))
>>> p.push(sometable)

petl.push.duplicates(key)

Report rows with duplicate key values. E.g.:

>>> from petl.push import duplicates, tocsv
>>> p = duplicates('foo')
>>> p.pipe(tocsv('foo_dups.csv'))
>>> p.pipe('remainder', tocsv('foo_uniq.csv'))
>>> p.push(sometable)

N.B., assumes data are already sorted by the given key.
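
The sorted-input requirement is easiest to see in a sketch that compares only adjacent rows. The following plain-Python example uses itertools.groupby for this; it is an illustration under that assumption, not petl's implementation, and the helper name split_duplicates is hypothetical.

```python
from itertools import groupby
from operator import itemgetter


def split_duplicates(table, key):
    """Split sorted data rows into (duplicates, remainder) by one field."""
    rows = iter(table)
    header = next(rows)
    getkey = itemgetter(header.index(key))
    dups, remainder = [], []
    # groupby only merges *adjacent* rows, hence the sorted-input requirement.
    for _, group in groupby(rows, key=getkey):
        group = list(group)
        (dups if len(group) > 1 else remainder).extend(group)
    return dups, remainder


table = [
    ('foo', 'bar'),
    ('a', 1),
    ('b', 2),
    ('b', 3),
    ('c', 4),
]
dups, remainder = split_duplicates(table, 'foo')
```

With unsorted input such as keys a, b, a, the two 'a' rows are not adjacent, so a streaming check like this one would report them as unique.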

petl.push.unique(key)

Report rows with unique key values. E.g.:

>>> from petl.push import unique, tocsv
>>> p = unique('foo')
>>> p.pipe(tocsv('foo_uniq.csv'))
>>> p.pipe('remainder', tocsv('foo_dups.csv'))
>>> p.push(sometable)

N.B., assumes data are already sorted by the given key. See also duplicates().

petl.push.diff()

Find rows that differ between two tables. E.g.:

>>> from petl.push import diff, tocsv
>>> p = diff()
>>> p.pipe('+', tocsv('added.csv'))
>>> p.pipe('-', tocsv('subtracted.csv'))
>>> p.pipe(tocsv('common.csv'))
>>> p.push(sometable, someothertable)

petl.push.tocsv(filename, dialect=csv.excel, **kwargs)

Push rows to a CSV file. E.g.:

>>> from petl.push import tocsv
>>> p = tocsv('example.csv')
>>> p.push(sometable)

petl.push.totsv(filename, dialect=csv.excel_tab, **kwargs)

Push rows to a tab-delimited file. E.g.:

>>> from petl.push import totsv
>>> p = totsv('example.tsv')
>>> p.push(sometable)

petl.push.topickle(filename, protocol=-1)

Push rows to a pickle file. E.g.:

>>> from petl.push import topickle
>>> p = topickle('example.pickle')
>>> p.push(sometable)