Extract/Load - reading/writing tables from files, databases and other sources

Extract (read)

The “from...” functions read a table from a file-like source or database. For everything except fromdb(), the source argument provides information about where to read the underlying data from. If the source argument is None or a string it is interpreted as follows:

  • None - read from stdin
  • string starting with ‘http://‘, ‘https://‘ or ‘ftp://‘ - read from URL
  • string ending with ‘.gz’ or ‘.bgz’ - read from file via gzip decompression
  • string ending with ‘.bz2’ - read from file via bz2 decompression
  • any other string - read directly from file

Some helper classes are also available for reading from other types of file-like sources, e.g., reading data from a Zip file, a string or a subprocess, see the section on I/O helper classes below for more information.
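These dispatch rules can be pictured with the standard library. The open_source helper below is a hypothetical sketch, not part of the petl API, and glosses over details such as text vs. bytes for URL sources:

```python
import bz2
import gzip
import sys
import urllib.request

def open_source(source, mode='rt'):
    """Hypothetical helper mirroring the dispatch rules above (not petl API)."""
    if source is None:
        return sys.stdin
    if source.startswith(('http://', 'https://', 'ftp://')):
        return urllib.request.urlopen(source)  # N.B., yields bytes, not text
    if source.endswith(('.gz', '.bgz')):
        return gzip.open(source, mode)  # transparent gzip decompression
    if source.endswith('.bz2'):
        return bz2.open(source, mode)  # transparent bz2 decompression
    return open(source, mode)  # any other string: plain file
```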

Load (write)

The “to...” functions write data from a table to a file-like source or database. For functions that accept a source argument, if the source argument is None or a string it is interpreted as follows:

  • None - write to stdout
  • string ending with ‘.gz’ or ‘.bgz’ - write to file via gzip compression
  • string ending with ‘.bz2’ - write to file via bz2 compression
  • any other string - write directly to file

Some helper classes are also available for writing to other types of file-like sources, e.g., writing to a Zip file or string buffer, see the section on I/O helper classes below for more information.

Delimited files

petl.fromcsv(source=None, dialect=<class csv.excel at 0x7f427191aae0>, **kwargs)

Wrapper for the standard csv.reader() function. Returns a table providing access to the data in the given delimited file. E.g.:

>>> import csv
>>> # set up a CSV file to demonstrate with
... with open('test.csv', 'wb') as f:
...     writer = csv.writer(f)
...     writer.writerow(['foo', 'bar'])
...     writer.writerow(['a', 1])
...     writer.writerow(['b', 2])
...     writer.writerow(['c', 2])
...
>>> # now demonstrate the use of petl.fromcsv
... from petl import fromcsv, look
>>> testcsv = fromcsv('test.csv')
>>> look(testcsv)
+-------+-------+
| 'foo' | 'bar' |
+=======+=======+
| 'a'   | '1'   |
+-------+-------+
| 'b'   | '2'   |
+-------+-------+
| 'c'   | '2'   |
+-------+-------+

The source argument is the path of the delimited file; all other keyword arguments are passed to csv.reader(). So, e.g., to override the delimiter from the default CSV dialect, provide the delimiter keyword argument.

Note that all data values are strings; any intended numeric values will need to be converted, e.g., using convert().

Supports transparent reading from URLs, .gz and .bz2 files.
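Since extra keyword arguments pass straight through to csv.reader(), overriding the dialect works exactly as it does with the standard library; e.g., parsing semicolon-delimited data (shown here with plain csv as a sketch; fromcsv('test.csv', delimiter=';') forwards the keyword the same way):

```python
import csv
import io

# semicolon-delimited data, parsed by overriding the default delimiter
data = io.StringIO("foo;bar\na;1\nb;2\n")
rows = list(csv.reader(data, delimiter=';'))
# rows: [['foo', 'bar'], ['a', '1'], ['b', '2']] -- note all values are strings
```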

petl.tocsv(table, source=None, dialect=<class csv.excel at 0x7f427191aae0>, write_header=True, **kwargs)

Write the table to a CSV file. E.g.:

>>> from petl import tocsv, look
>>> look(table)
+-------+-------+
| 'foo' | 'bar' |
+=======+=======+
| 'a'   | 1     |
+-------+-------+
| 'b'   | 2     |
+-------+-------+
| 'c'   | 2     |
+-------+-------+

>>> tocsv(table, 'test.csv')
>>> # look what it did
... from petl import fromcsv
>>> look(fromcsv('test.csv'))
+-------+-------+
| 'foo' | 'bar' |
+=======+=======+
| 'a'   | '1'   |
+-------+-------+
| 'b'   | '2'   |
+-------+-------+
| 'c'   | '2'   |
+-------+-------+

The source argument is the path of the delimited file, and the optional write_header argument specifies whether to include the field names in the delimited file. All other keyword arguments are passed to csv.writer(). So, e.g., to override the delimiter from the default CSV dialect, provide the delimiter keyword argument.

Note that if a file already exists at the given location, it will be overwritten.

Supports transparent writing to .gz and .bz2 files.

petl.appendcsv(table, source=None, dialect=<class csv.excel at 0x7f427191aae0>, **kwargs)

Append data rows to an existing CSV file. E.g.:

>>> # look at an existing CSV file
... from petl import look, fromcsv
>>> testcsv = fromcsv('test.csv')
>>> look(testcsv)
+-------+-------+
| 'foo' | 'bar' |
+=======+=======+
| 'a'   | '1'   |
+-------+-------+
| 'b'   | '2'   |
+-------+-------+
| 'c'   | '2'   |
+-------+-------+

>>> # append some data
... look(table)
+-------+-------+
| 'foo' | 'bar' |
+=======+=======+
| 'd'   | 7     |
+-------+-------+
| 'e'   | 42    |
+-------+-------+
| 'f'   | 12    |
+-------+-------+

>>> from petl import appendcsv
>>> appendcsv(table, 'test.csv')
>>> # look what it did
... look(testcsv)
+-------+-------+
| 'foo' | 'bar' |
+=======+=======+
| 'a'   | '1'   |
+-------+-------+
| 'b'   | '2'   |
+-------+-------+
| 'c'   | '2'   |
+-------+-------+
| 'd'   | '7'   |
+-------+-------+
| 'e'   | '42'  |
+-------+-------+
| 'f'   | '12'  |
+-------+-------+

The source argument is the path of the delimited file; all other keyword arguments are passed to csv.writer(). So, e.g., to override the delimiter from the default CSV dialect, provide the delimiter keyword argument.

Note that no attempt is made to check that the fields or row lengths are consistent with the existing data; the data rows from the table are simply appended to the file. See also the cat() function.

Supports transparent writing to .gz and .bz2 files.
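Appending amounts to opening the file in append mode and writing data rows without a header; a stand-alone sketch using only the csv module:

```python
import csv
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'test.csv')

# initial write: header plus data rows
with open(path, 'w', newline='') as f:
    csv.writer(f).writerows([['foo', 'bar'], ['a', 1]])

# append: open in 'a' mode and write data rows only, no header --
# this is the behaviour appendcsv() provides for a table
with open(path, 'a', newline='') as f:
    csv.writer(f).writerows([['d', 7], ['e', 42]])

with open(path, newline='') as f:
    rows = list(csv.reader(f))
# rows: [['foo', 'bar'], ['a', '1'], ['d', '7'], ['e', '42']]
```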

petl.teecsv(table, source=None, dialect=<class csv.excel at 0x7f427191aae0>, write_header=True, **kwargs)

Return a table that writes rows to a CSV file as they are iterated over.

New in version 0.25.
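The tee behaviour can be pictured as a generator that writes each row out as it passes through (a sketch, not petl's actual implementation):

```python
import csv

def tee_rows(rows, path):
    """Yield rows unchanged while also writing them to a CSV file (sketch)."""
    with open(path, 'w', newline='') as f:
        writer = csv.writer(f)
        for row in rows:
            writer.writerow(row)
            yield row
```

Because writing happens during iteration, the file is only complete once the table has been iterated all the way through.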

petl.fromtsv(source=None, dialect=<class csv.excel_tab at 0x7f427191ab48>, **kwargs)

Convenience function, as fromcsv() but with different default dialect (tab delimited).

Supports transparent reading from URLs, .gz and .bz2 files.

New in version 0.9.

petl.totsv(table, source=None, dialect=<class csv.excel_tab at 0x7f427191ab48>, write_header=True, **kwargs)

Convenience function, as tocsv() but with different default dialect (tab delimited).

Supports transparent writing to .gz and .bz2 files.

New in version 0.9.

petl.appendtsv(table, source=None, dialect=<class csv.excel_tab at 0x7f427191ab48>, **kwargs)

Convenience function, as appendcsv() but with different default dialect (tab delimited).

Supports transparent writing to .gz and .bz2 files.

New in version 0.9.

petl.teetsv(table, source=None, dialect=<class csv.excel_tab at 0x7f427191ab48>, write_header=True, **kwargs)

Convenience function, as teecsv() but with different default dialect (tab delimited).

New in version 0.25.

petl.fromucsv(source=None, dialect=<class csv.excel at 0x7f427191aae0>, encoding='utf-8', **kwargs)

Returns a table containing unicode data extracted from a delimited file via the given encoding. Like fromcsv() but accepts an additional encoding argument which should be one of the Python supported encodings. See also codecs.

New in version 0.19.

petl.toucsv(table, source=None, dialect=<class csv.excel at 0x7f427191aae0>, encoding='utf-8', write_header=True, **kwargs)

Write the table to a CSV file via the given encoding. Like tocsv() but accepts an additional encoding argument which should be one of the Python supported encodings. See also codecs.

New in version 0.19.

petl.appenducsv(table, source=None, dialect=<class csv.excel at 0x7f427191aae0>, encoding='utf-8', **kwargs)

Append the table to a CSV file via the given encoding. Like appendcsv() but accepts an additional encoding argument which should be one of the Python supported encodings. See also codecs.

New in version 0.19.

petl.teeucsv(table, source=None, dialect=<class csv.excel at 0x7f427191aae0>, encoding='utf-8', write_header=True, **kwargs)

Return a table that writes rows to a Unicode CSV file as they are iterated over.

New in version 0.25.

petl.fromutsv(source=None, dialect=<class csv.excel_tab at 0x7f427191ab48>, **kwargs)

Convenience function, as fromucsv() but with different default dialect (tab delimited).

New in version 0.19.

petl.toutsv(table, source=None, dialect=<class csv.excel_tab at 0x7f427191ab48>, write_header=True, **kwargs)

Convenience function, as toucsv() but with different default dialect (tab delimited).

New in version 0.19.

petl.appendutsv(table, source=None, dialect=<class csv.excel_tab at 0x7f427191ab48>, **kwargs)

Convenience function, as appenducsv() but with different default dialect (tab delimited).

New in version 0.19.

petl.teeutsv(table, source=None, dialect=<class csv.excel_tab at 0x7f427191ab48>, encoding='utf-8', write_header=True, **kwargs)

Convenience function, as teeucsv() but with different default dialect (tab delimited).

New in version 0.25.

Pickle files

petl.frompickle(source=None)

Returns a table providing access to the data pickled in the given file. The rows in the table should have been pickled to the file one at a time. E.g.:

>>> import pickle
>>> # set up a file to demonstrate with
... with open('test.dat', 'wb') as f:
...     pickle.dump(['foo', 'bar'], f)
...     pickle.dump(['a', 1], f)
...     pickle.dump(['b', 2], f)
...     pickle.dump(['c', 2.5], f)
...
>>> # now demonstrate the use of petl.frompickle
... from petl import frompickle, look
>>> testdat = frompickle('test.dat')
>>> look(testdat)
+-------+-------+
| 'foo' | 'bar' |
+=======+=======+
| 'a'   | 1     |
+-------+-------+
| 'b'   | 2     |
+-------+-------+
| 'c'   | 2.5   |
+-------+-------+

Supports transparent reading from URLs, .gz and .bz2 files.
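Reading rows that were pickled one at a time is simply a loop over pickle.load() until end of file; a stand-alone sketch:

```python
import pickle

def iter_pickled_rows(path):
    """Yield rows from a file of consecutively pickled rows (sketch)."""
    with open(path, 'rb') as f:
        while True:
            try:
                yield pickle.load(f)
            except EOFError:
                break  # no more pickled rows
```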

petl.topickle(table, source=None, protocol=-1)

Write the table to a pickle file. E.g.:

>>> from petl import topickle, look
>>> look(table)
+-------+-------+
| 'foo' | 'bar' |
+=======+=======+
| 'a'   | 1     |
+-------+-------+
| 'b'   | 2     |
+-------+-------+
| 'c'   | 2     |
+-------+-------+

>>> topickle(table, 'test.dat')
>>> # look what it did
... from petl import frompickle
>>> look(frompickle('test.dat'))
+-------+-------+
| 'foo' | 'bar' |
+=======+=======+
| 'a'   | 1     |
+-------+-------+
| 'b'   | 2     |
+-------+-------+
| 'c'   | 2     |
+-------+-------+

Note that if a file already exists at the given location, it will be overwritten.

The pickle file format preserves type information, i.e., reading and writing is round-trippable.

Supports transparent writing to .gz and .bz2 files.

petl.appendpickle(table, source=None, protocol=-1)

Append data to an existing pickle file. E.g.:

>>> from petl import look, frompickle
>>> # inspect an existing pickle file
... testdat = frompickle('test.dat')
>>> look(testdat)
+-------+-------+
| 'foo' | 'bar' |
+=======+=======+
| 'a'   | 1     |
+-------+-------+
| 'b'   | 2     |
+-------+-------+
| 'c'   | 2     |
+-------+-------+

>>> # append some data
... from petl import appendpickle
>>> look(table)
+-------+-------+
| 'foo' | 'bar' |
+=======+=======+
| 'd'   | 7     |
+-------+-------+
| 'e'   | 42    |
+-------+-------+
| 'f'   | 12    |
+-------+-------+

>>> appendpickle(table, 'test.dat')
>>> # look what it did
... look(testdat)
+-------+-------+
| 'foo' | 'bar' |
+=======+=======+
| 'a'   | 1     |
+-------+-------+
| 'b'   | 2     |
+-------+-------+
| 'c'   | 2     |
+-------+-------+
| 'd'   | 7     |
+-------+-------+
| 'e'   | 42    |
+-------+-------+
| 'f'   | 12    |
+-------+-------+

Note that no attempt is made to check that the fields or row lengths are consistent with the existing data; the data rows from the table are simply appended to the file. See also the cat() function.

Supports transparent writing to .gz and .bz2 files.

petl.teepickle(table, source=None, protocol=-1)

Return a table that writes rows to a pickle file as they are iterated over.

New in version 0.25.

Text files

petl.fromtext(source=None, header=('lines', ), strip=None)

Construct a table from lines in the given text file. E.g.:

>>> # example data
... with open('test.txt', 'w') as f:
...     f.write('a\t1\n')
...     f.write('b\t2\n')
...     f.write('c\t3\n')
...
>>> from petl import fromtext, look
>>> table1 = fromtext('test.txt')
>>> look(table1)
+--------------+
| 'lines'      |
+==============+
| 'a\t1'       |
+--------------+
| 'b\t2'       |
+--------------+
| 'c\t3'       |
+--------------+

The fromtext() function provides a starting point for custom handling of text files. E.g., using capture():

>>> from petl import capture
>>> table2 = capture(table1, 'lines', '(.*)\\t(.*)$', ['foo', 'bar'])
>>> look(table2)
+-------+-------+
| 'foo' | 'bar' |
+=======+=======+
| 'a'   | '1'   |
+-------+-------+
| 'b'   | '2'   |
+-------+-------+
| 'c'   | '3'   |
+-------+-------+

Supports transparent reading from URLs, .gz and .bz2 files.

Changed in version 0.4.

The str.strip() method is called on each line, which by default removes leading and trailing whitespace, including the end-of-line character; use the strip keyword argument to specify alternative characters to strip.
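The capture() step above is ordinary regular-expression group extraction; in stdlib terms:

```python
import re

lines = ['a\t1', 'b\t2', 'c\t3']  # what fromtext() yields, one value per row
pattern = re.compile(r'(.*)\t(.*)$')
# each line is split into fields by the regex capture groups
rows = [pattern.match(line).groups() for line in lines]
# rows: [('a', '1'), ('b', '2'), ('c', '3')]
```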

petl.totext(table, source=None, template=None, prologue=None, epilogue=None)

Write the table to a text file. E.g.:

>>> from petl import totext, look
>>> look(table)
+-------+-------+
| 'foo' | 'bar' |
+=======+=======+
| 'a'   | 1     |
+-------+-------+
| 'b'   | 2     |
+-------+-------+
| 'c'   | 2     |
+-------+-------+

>>> prologue = """{| class="wikitable"
... |-
... ! foo
... ! bar
... """
>>> template = """|-
... | {foo}
... | {bar}
... """
>>> epilogue = "|}"
>>> totext(table, 'test.txt', template, prologue, epilogue)
>>>
>>> # see what we did
... with open('test.txt') as f:
...     print f.read()
...
{| class="wikitable"
|-
! foo
! bar
|-
| a
| 1
|-
| b
| 2
|-
| c
| 2
|}

The template will be used to format each row via str.format.

Supports transparent writing to .gz and .bz2 files.
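Each data row is combined with the header to form a mapping of field names to values and fed to str.format(), so the template's {foo}-style placeholders pick out values; a minimal sketch of the row formatting:

```python
table = [['foo', 'bar'], ['a', 1], ['b', 2]]
header, data = table[0], table[1:]
template = '|-\n| {foo}\n| {bar}\n'
# format each row as a dict keyed by field name
body = ''.join(template.format(**dict(zip(header, row))) for row in data)
# body == '|-\n| a\n| 1\n|-\n| b\n| 2\n'
```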

petl.appendtext(table, source=None, template=None, prologue=None, epilogue=None)

Append the table to a text file.

New in version 0.19.

petl.teetext(table, source=None, template=None, prologue=None, epilogue=None)

Return a table that writes rows to a text file as they are iterated over.

New in version 0.25.

petl.fromutext(source=None, header=(u'lines', ), encoding='utf-8', strip=None)

Construct a table from lines in the given text file via the given encoding. Like fromtext() but accepts an additional encoding argument which should be one of the Python supported encodings. See also codecs.

New in version 0.19.

petl.toutext(table, source=None, encoding='utf-8', template=None, prologue=None, epilogue=None)

Write the table to a text file via the given encoding. Like totext() but accepts an additional encoding argument which should be one of the Python supported encodings. See also codecs.

New in version 0.19.

petl.appendutext(table, source=None, encoding='utf-8', template=None, prologue=None, epilogue=None)

Append the table to a text file via the given encoding. Like appendtext() but accepts an additional encoding argument which should be one of the Python supported encodings. See also codecs.

New in version 0.19.

petl.teeutext(table, source=None, encoding='utf-8', template=None, prologue=None, epilogue=None)

Return a table that writes rows to a Unicode text file as they are iterated over.

New in version 0.25.

XML files

petl.fromxml(source, *args, **kwargs)

Access data in an XML file. E.g.:

>>> from petl import fromxml, look
>>> data = """<table>
...     <tr>
...         <td>foo</td><td>bar</td>
...     </tr>
...     <tr>
...         <td>a</td><td>1</td>
...     </tr>
...     <tr>
...         <td>b</td><td>2</td>
...     </tr>
...     <tr>
...         <td>c</td><td>2</td>
...     </tr>
... </table>"""
>>> with open('example1.xml', 'w') as f:
...     f.write(data)
...
>>> table1 = fromxml('example1.xml', 'tr', 'td')
>>> look(table1)
+-------+-------+
| 'foo' | 'bar' |
+=======+=======+
| 'a'   | '1'   |
+-------+-------+
| 'b'   | '2'   |
+-------+-------+
| 'c'   | '2'   |
+-------+-------+

If the data values are stored in an attribute, provide the attribute name as an extra positional argument, e.g.:

>>> data = """<table>
...     <tr>
...         <td v='foo'/><td v='bar'/>
...     </tr>
...     <tr>
...         <td v='a'/><td v='1'/>
...     </tr>
...     <tr>
...         <td v='b'/><td v='2'/>
...     </tr>
...     <tr>
...         <td v='c'/><td v='2'/>
...     </tr>
... </table>"""
>>> with open('example2.xml', 'w') as f:
...     f.write(data)
...
>>> table2 = fromxml('example2.xml', 'tr', 'td', 'v')
>>> look(table2)
+-------+-------+
| 'foo' | 'bar' |
+=======+=======+
| 'a'   | '1'   |
+-------+-------+
| 'b'   | '2'   |
+-------+-------+
| 'c'   | '2'   |
+-------+-------+

Data values can also be extracted by providing a mapping of field names to element paths, e.g.:

>>> data = """<table>
...     <row>
...         <foo>a</foo><baz><bar v='1'/><bar v='3'/></baz>
...     </row>
...     <row>
...         <foo>b</foo><baz><bar v='2'/></baz>
...     </row>
...     <row>
...         <foo>c</foo><baz><bar v='2'/></baz>
...     </row>
... </table>"""
>>> with open('example3.xml', 'w') as f:
...     f.write(data)
...
>>> table3 = fromxml('example3.xml', 'row', {'foo': 'foo', 'bar': ('baz/bar', 'v')})
>>> look(table3)
+-------+------------+
| 'foo' | 'bar'      |
+=======+============+
| 'a'   | ('1', '3') |
+-------+------------+
| 'b'   | '2'        |
+-------+------------+
| 'c'   | '2'        |
+-------+------------+

Note that the implementation is currently not streaming, i.e., the whole document is loaded into memory.

Supports transparent reading from URLs, .gz and .bz2 files.

New in version 0.4.

Changed in version 0.6.

If multiple elements match a given field, all values are reported as a tuple.

Changed in version 0.25.

If there is more than one element name used for row values, a tuple or list of paths can be provided, e.g., fromxml('example.html', './/tr', ('th', 'td')).

For writing to an XML file, see petl.totext().
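This kind of extraction maps naturally onto the standard library's xml.etree.ElementTree; a stand-alone sketch of the first example's parse (not petl's actual implementation):

```python
import xml.etree.ElementTree as ET

data = ('<table>'
        '<tr><td>foo</td><td>bar</td></tr>'
        '<tr><td>a</td><td>1</td></tr>'
        '<tr><td>b</td><td>2</td></tr>'
        '</table>')
root = ET.fromstring(data)
# one row per matching 'tr' element, one value per 'td' element
rows = [[td.text for td in tr.findall('td')] for tr in root.findall('tr')]
# rows: [['foo', 'bar'], ['a', '1'], ['b', '2']]
```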

HTML files

petl.tohtml(table, source=None, caption=None, representation=<type 'str'>, lineterminator='\r\n')

Write the table as HTML to a file. E.g.:

>>> from petl import tohtml, look
>>> look(table)
+-------+-------+
| 'foo' | 'bar' |
+=======+=======+
| 'a'   | 1     |
+-------+-------+
| 'b'   | 2     |
+-------+-------+
| 'c'   | 2     |
+-------+-------+

>>> tohtml(table, 'test.html')

New in version 0.12.

Changed in version 0.17.1.

Added support for caption keyword argument to provide table caption in output.

petl.teehtml(table, source=None, caption=None, representation=<type 'str'>, lineterminator='\r\n')

Return a table that writes rows to an HTML file as they are iterated over.

New in version 0.25.

petl.touhtml(table, source=None, caption=None, encoding='utf-8', representation=<type 'unicode'>, lineterminator=u'\r\n')

Write the table as Unicode HTML to a file.

New in version 0.19.

petl.teeuhtml(table, source=None, caption=None, encoding='utf-8', representation=<type 'unicode'>, lineterminator='\r\n')

Return a table that writes rows to a Unicode HTML file as they are iterated over.

New in version 0.25.

JSON files

petl.fromjson(source, *args, **kwargs)

Extract data from a JSON file. The file must contain a JSON array as the top level object, and each member of the array will be treated as a row of data. E.g.:

>>> from petl import fromjson, look
>>> data = '[{"foo": "a", "bar": 1}, {"foo": "b", "bar": 2}, {"foo": "c", "bar": 2}]'
>>> with open('example1.json', 'w') as f:
...     f.write(data)
...
>>> table1 = fromjson('example1.json')
>>> look(table1)
+--------+--------+
| u'foo' | u'bar' |
+========+========+
| u'a'   | 1      |
+--------+--------+
| u'b'   | 2      |
+--------+--------+
| u'c'   | 2      |
+--------+--------+

If your JSON file does not fit this structure, you will need to parse it via json.load() and select the array to treat as the data; see also fromdicts().

Supports transparent reading from URLs, .gz and .bz2 files.

New in version 0.5.
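The mapping from a JSON array of objects to a table can be sketched with the json module; field order here is derived by sorting the keys, whereas petl's actual field ordering may differ:

```python
import json

text = '[{"foo": "a", "bar": 1}, {"foo": "b", "bar": 2}]'
dicts = json.loads(text)
# collect the union of keys across all objects as the header
fields = sorted({k for d in dicts for k in d})
table = [fields] + [[d.get(f) for f in fields] for d in dicts]
# table: [['bar', 'foo'], [1, 'a'], [2, 'b']]
```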

petl.fromdicts(dicts, header=None)

View a sequence of Python dicts as a table. E.g.:

>>> from petl import fromdicts, look
>>> dicts = [{"foo": "a", "bar": 1}, {"foo": "b", "bar": 2}, {"foo": "c", "bar": 2}]
>>> table = fromdicts(dicts)
>>> look(table)
+-------+-------+
| 'foo' | 'bar' |
+=======+=======+
| 'a'   | 1     |
+-------+-------+
| 'b'   | 2     |
+-------+-------+
| 'c'   | 2     |
+-------+-------+

See also fromjson().

New in version 0.5.

petl.tojson(table, source=None, prefix=None, suffix=None, *args, **kwargs)

Write a table in JSON format, with rows output as JSON objects. E.g.:

>>> from petl import tojson, look
>>> look(table)
+-------+-------+
| 'foo' | 'bar' |
+=======+=======+
| 'a'   | 1     |
+-------+-------+
| 'b'   | 2     |
+-------+-------+
| 'c'   | 2     |
+-------+-------+

>>> tojson(table, 'example.json')
>>> # check what it did
... with open('example.json') as f:
...     print f.read()
...
[{"foo": "a", "bar": 1}, {"foo": "b", "bar": 2}, {"foo": "c", "bar": 2}]

Note that this is currently not streaming, all data is loaded into memory before being written to the file.

Supports transparent writing to .gz and .bz2 files.

New in version 0.5.
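Going the other way, each data row is zipped with the header to form a JSON object; a sketch with the json module:

```python
import json

table = [['foo', 'bar'], ['a', 1], ['b', 2], ['c', 2]]
header, data = table[0], table[1:]
# one JSON object per data row, keyed by field name
dicts = [dict(zip(header, row)) for row in data]
text = json.dumps(dicts)
# text: '[{"foo": "a", "bar": 1}, {"foo": "b", "bar": 2}, {"foo": "c", "bar": 2}]'
```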

petl.tojsonarrays(table, source=None, prefix=None, suffix=None, output_header=False, *args, **kwargs)

Write a table in JSON format, with rows output as JSON arrays. E.g.:

>>> from petl import tojsonarrays, look
>>> look(table)
+-------+-------+
| 'foo' | 'bar' |
+=======+=======+
| 'a'   | 1     |
+-------+-------+
| 'b'   | 2     |
+-------+-------+
| 'c'   | 2     |
+-------+-------+

>>> tojsonarrays(table, 'example.json')
>>> # check what it did
... with open('example.json') as f:
...     print f.read()
...
[["a", 1], ["b", 2], ["c", 2]]

Note that this is currently not streaming, all data is loaded into memory before being written to the file.

Supports transparent writing to .gz and .bz2 files.

New in version 0.11.

Databases

petl.fromsqlite3(source, query, *args, **kwargs)

Provides access to data from an sqlite3 database file via a given query. E.g.:

>>> import sqlite3
>>> from petl import look, fromsqlite3
>>> # set up a database to demonstrate with
>>> data = [['a', 1],
...         ['b', 2],
...         ['c', 2.0]]
>>> connection = sqlite3.connect('test.db')
>>> c = connection.cursor()
>>> c.execute('create table foobar (foo, bar)')
<sqlite3.Cursor object at 0x2240b90>
>>> for row in data:
...     c.execute('insert into foobar values (?, ?)', row)
...
<sqlite3.Cursor object at 0x2240b90>
<sqlite3.Cursor object at 0x2240b90>
<sqlite3.Cursor object at 0x2240b90>
>>> connection.commit()
>>> c.close()
>>>
>>> # now demonstrate the petl.fromsqlite3 function
... foobar = fromsqlite3('test.db', 'select * from foobar')
>>> look(foobar)
+-------+-------+
| 'foo' | 'bar' |
+=======+=======+
| u'a'  | 1     |
+-------+-------+
| u'b'  | 2     |
+-------+-------+
| u'c'  | 2.0   |
+-------+-------+

Changed in version 0.10.2.

Either a database file name or a connection object can be given as the first argument.

petl.tosqlite3(table, filename_or_connection, tablename, create=False, commit=True)

Load data into a table in an sqlite3 database. Note that if the database table exists, it will be truncated, i.e., all existing rows will be deleted prior to inserting the new data. E.g.:

>>> from petl import tosqlite3, look
>>> look(table)
+-------+-------+
| 'foo' | 'bar' |
+=======+=======+
| 'a'   | 1     |
+-------+-------+
| 'b'   | 2     |
+-------+-------+
| 'c'   | 2     |
+-------+-------+

>>> # by default, if the table does not already exist, it will be created
... tosqlite3(table, 'test.db', 'foobar')
>>> # look what it did
... from petl import fromsqlite3
>>> look(fromsqlite3('test.db', 'select * from foobar'))
+-------+-------+
| 'foo' | 'bar' |
+=======+=======+
| u'a'  | 1     |
+-------+-------+
| u'b'  | 2     |
+-------+-------+
| u'c'  | 2     |
+-------+-------+

If the table does not exist and create=True then a table will be created using the field names in the table header. However, note that no type specifications will be included in the table creation statement and so column type affinities may be inappropriate.

Changed in version 0.10.2.

Either a database file name or a connection object can be given as the second argument.

Changed in version 0.21.

Default value for create argument changed to False.
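The create-and-load behaviour boils down to a CREATE TABLE statement built from the header (with no column types, hence the affinity caveat) followed by executemany(); a sketch, not petl's actual implementation:

```python
import sqlite3

table = [['foo', 'bar'], ['a', 1], ['b', 2], ['c', 2]]
header, data = table[0], table[1:]

connection = sqlite3.connect(':memory:')
# column names only, no type specifications -- sqlite3 tolerates this,
# which is why column type affinities may be inappropriate
connection.execute('create table foobar (%s)' % ', '.join(header))
placeholders = ', '.join('?' * len(header))
connection.executemany('insert into foobar values (%s)' % placeholders, data)
rows = connection.execute('select * from foobar').fetchall()
# rows: [('a', 1), ('b', 2), ('c', 2)]
```

Note that building SQL by string interpolation is acceptable in a sketch like this, but real code should validate field names first.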

petl.appendsqlite3(table, filename_or_connection, tablename, commit=True)

Load data into an existing table in an sqlite3 database. Note that the database table will be appended, i.e., the new data will be inserted into the table, and any existing rows will remain. E.g.:

>>> from petl import appendsqlite3, look
>>> look(moredata)
+-------+-------+
| 'foo' | 'bar' |
+=======+=======+
| 'd'   | 7     |
+-------+-------+
| 'e'   | 9     |
+-------+-------+
| 'f'   | 1     |
+-------+-------+

>>> appendsqlite3(moredata, 'test.db', 'foobar')
>>> # look what it did
... from petl import look, fromsqlite3
>>> look(fromsqlite3('test.db', 'select * from foobar'))
+-------+-------+
| 'foo' | 'bar' |
+=======+=======+
| u'a'  | 1     |
+-------+-------+
| u'b'  | 2     |
+-------+-------+
| u'c'  | 2     |
+-------+-------+
| u'd'  | 7     |
+-------+-------+
| u'e'  | 9     |
+-------+-------+
| u'f'  | 1     |
+-------+-------+

Changed in version 0.10.2.

Either a database file name or a connection object can be given as the second argument.

petl.fromdb(dbo, query, *args, **kwargs)

Provides access to data from any DB-API 2.0 connection via a given query. E.g., using sqlite3:

>>> import sqlite3
>>> from petl import look, fromdb
>>> connection = sqlite3.connect('test.db')
>>> table = fromdb(connection, 'select * from foobar')
>>> look(table)

E.g., using psycopg2 (assuming you’ve installed it first):

>>> import psycopg2
>>> from petl import look, fromdb
>>> connection = psycopg2.connect("dbname=test user=postgres")
>>> table = fromdb(connection, 'select * from test')
>>> look(table)

E.g., using MySQLdb (assuming you’ve installed it first):

>>> import MySQLdb
>>> from petl import look, fromdb
>>> connection = MySQLdb.connect(passwd="moonpie", db="thangs")
>>> table = fromdb(connection, 'select * from test')
>>> look(table)

Changed in version 0.10.2.

The first argument may also be a function that creates a cursor. E.g.:

>>> import psycopg2
>>> from petl import look, fromdb
>>> connection = psycopg2.connect("dbname=test user=postgres")
>>> mkcursor = lambda: connection.cursor(cursor_factory=psycopg2.extras.DictCursor)
>>> table = fromdb(mkcursor, 'select * from test')
>>> look(table)

N.B., each call to the function should return a new cursor.

Changed in version 0.18.

Added support for server-side cursors.

Note that the default behaviour of most database servers and clients is for the entire result set for each query to be sent from the server to the client. If your query returns a large result set this can result in significant memory usage at the client. Some databases support server-side cursors which provide a means for client libraries to fetch result sets incrementally, reducing memory usage at the client.
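The heart of fromdb() is plain DB-API iteration: field names come from cursor.description and rows come from iterating the cursor, which is exactly what lets server-side cursors fetch incrementally. A generic sketch (not petl's actual implementation), demonstrated with sqlite3:

```python
import sqlite3

def iter_query(connection, query):
    """Yield a header row then data rows from any DB-API 2.0 connection (sketch)."""
    cursor = connection.cursor()
    cursor.execute(query)
    yield tuple(d[0] for d in cursor.description)  # field names
    for row in cursor:  # rows are fetched as iteration proceeds
        yield tuple(row)

connection = sqlite3.connect(':memory:')
connection.execute('create table foobar (foo, bar)')
connection.executemany('insert into foobar values (?, ?)',
                       [('a', 1), ('b', 2)])
rows = list(iter_query(connection, 'select * from foobar'))
# rows: [('foo', 'bar'), ('a', 1), ('b', 2)]
```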

To use a server-side cursor with a PostgreSQL database, e.g.:

>>> import psycopg2
>>> from petl import look, fromdb
>>> connection = psycopg2.connect("dbname=test user=postgres")
>>> table = fromdb(lambda: connection.cursor(name='arbitrary'), 'select * from test')
>>> look(table)

To use a server-side cursor with a MySQL database, e.g.:

>>> import MySQLdb
>>> from petl import look, fromdb
>>> connection = MySQLdb.connect(passwd="moonpie", db="thangs")
>>> table = fromdb(lambda: connection.cursor(MySQLdb.cursors.SSCursor), 'select * from test')
>>> look(table)

For more information on server-side cursors, see the documentation for your database server and client library.

petl.todb(table, dbo, tablename, schema=None, commit=True)

Load data into an existing database table via a DB-API 2.0 connection or cursor. Note that the database table will be truncated, i.e., all existing rows will be deleted prior to inserting the new data. E.g.:

>>> from petl import look, todb
>>> look(table)
+-------+-------+
| 'foo' | 'bar' |
+=======+=======+
| 'a'   | 1     |
+-------+-------+
| 'b'   | 2     |
+-------+-------+
| 'c'   | 2     |
+-------+-------+

... using sqlite3:

>>> import sqlite3
>>> connection = sqlite3.connect('test.db')
>>> # assuming table "foobar" already exists in the database
... todb(table, connection, 'foobar')

... using psycopg2:

>>> import psycopg2
>>> connection = psycopg2.connect("dbname=test user=postgres")
>>> # assuming table "foobar" already exists in the database
... todb(table, connection, 'foobar')

... using MySQLdb:

>>> import MySQLdb
>>> connection = MySQLdb.connect(passwd="moonpie", db="thangs")
>>> # tell MySQL to use standard quote character
... connection.cursor().execute('SET SQL_MODE=ANSI_QUOTES')
>>> # load data, assuming table "foobar" already exists in the database
... todb(table, connection, 'foobar')

N.B., for MySQL the statement SET SQL_MODE=ANSI_QUOTES is required to ensure MySQL uses SQL-92 standard quote characters.
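The truncate-then-load semantics can be seen in miniature with sqlite3 (DELETE is used here for illustration; the actual SQL petl issues may differ by backend):

```python
import sqlite3

connection = sqlite3.connect(':memory:')
connection.execute('create table foobar (foo, bar)')
connection.execute("insert into foobar values ('stale', 0)")

# todb()-style load: delete all existing rows first, then insert the new data
connection.execute('delete from foobar')
connection.executemany('insert into foobar values (?, ?)',
                       [('a', 1), ('b', 2), ('c', 2)])
rows = connection.execute('select * from foobar').fetchall()
# rows: [('a', 1), ('b', 2), ('c', 2)] -- the 'stale' row is gone
```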

Changed in version 0.10.2.

A cursor can also be provided instead of a connection, e.g.:

>>> import psycopg2
>>> connection = psycopg2.connect("dbname=test user=postgres")
>>> cursor = connection.cursor()
>>> todb(table, cursor, 'foobar')

petl.appenddb(table, dbo, tablename, schema=None, commit=True)

Load data into an existing database table via a DB-API 2.0 connection or cursor. Note that the database table will be appended, i.e., the new data will be inserted into the table, and any existing rows will remain. E.g.:

>>> from petl import look, appenddb
>>> look(table)
+-------+-------+
| 'foo' | 'bar' |
+=======+=======+
| 'a'   | 1     |
+-------+-------+
| 'b'   | 2     |
+-------+-------+
| 'c'   | 2     |
+-------+-------+

... using sqlite3:

>>> import sqlite3
>>> connection = sqlite3.connect('test.db')
>>> # assuming table "foobar" already exists in the database
... appenddb(table, connection, 'foobar')

... using psycopg2:

>>> import psycopg2
>>> connection = psycopg2.connect("dbname=test user=postgres")
>>> # assuming table "foobar" already exists in the database
... appenddb(table, connection, 'foobar')

... using MySQLdb:

>>> import MySQLdb
>>> connection = MySQLdb.connect(passwd="moonpie", db="thangs")
>>> # tell MySQL to use standard quote character
... connection.cursor().execute('SET SQL_MODE=ANSI_QUOTES')
>>> # load data, appending rows to table "foobar"
... appenddb(table, connection, 'foobar')

N.B., for MySQL the statement SET SQL_MODE=ANSI_QUOTES is required to ensure MySQL uses SQL-92 standard quote characters.

Changed in version 0.10.2.

A cursor can also be provided instead of a connection, e.g.:

>>> import psycopg2
>>> connection = psycopg2.connect("dbname=test user=postgres")
>>> cursor = connection.cursor()
>>> appenddb(table, cursor, 'foobar')

I/O helper classes

The following classes are helpers for extract (from...()) and load (to...()) functions that use a file-like data source. An instance of any of the following classes can be used as the source argument to data extraction functions like fromcsv() etc., with the exception of StdoutSource which is write-only. An instance of any of the following classes can also be used as the source argument to data loading functions like tocsv() etc., with the exception of StdinSource, URLSource and PopenSource which are read-only. The behaviour of each source can usually be configured by passing arguments to the constructor, see the source code of the petl.io module for full details.

class petl.FileSource(filename, **kwargs)
class petl.GzipSource(filename, **kwargs)
class petl.BZ2Source(filename, **kwargs)
class petl.ZipSource(filename, membername, pwd=None, **kwargs)
class petl.StdinSource
class petl.StdoutSource
class petl.URLSource(*args, **kwargs)
class petl.StringSource(s=None)
class petl.PopenSource(*args, **kwargs)
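As an illustration of what a source like ZipSource wraps, reading a named member out of a Zip archive with the standard library looks like this:

```python
import csv
import io
import zipfile

# build an in-memory archive containing one CSV member
buf = io.BytesIO()
with zipfile.ZipFile(buf, 'w') as z:
    z.writestr('data.csv', 'foo,bar\na,1\nb,2\n')

# read the member back; ZipSource('archive.zip', 'data.csv') plays this role
with zipfile.ZipFile(buf) as z:
    with z.open('data.csv') as member:
        rows = list(csv.reader(io.TextIOWrapper(member)))
# rows: [['foo', 'bar'], ['a', '1'], ['b', '2']]
```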