- Added class petl.io.remotes.RemoteSource, which uses the fsspec package to read and write files on remote servers, selecting the implementation from the protocol in the URL. By @juarezr, #494.
- Removed classes petl.io.source.s3.S3Source, as it is now handled by fsspec. By @juarezr, #494.
- Removed classes petl.io.codec.zstd.ZstandardCodec, as it is now handled by fsspec. By @juarezr, #494.
- Fix bug in connection to a JDBC database using jaydebeapi. By @miguelosana, #497.
- Added functions petl.io.sources.register_writer() for registering custom source helpers for handling I/O from remote protocols. By @juarezr, #491.
- Added function petl.io.sources.register_codec() for registering custom helpers for compressing and decompressing files with other algorithms. By @juarezr, #491.
- Added classes petl.io.codec.zstd.ZstandardCodec for compressing files with XZ and the “state of the art” LZ4 and Zstandard algorithms. By @juarezr, #491.
- Added classes petl.io.source.smb.SMBSource for reading and writing files on remote servers using the protocols s3:// and smb:// in the URL. By @juarezr, #491.
The parameters to the
petl.io.xlsx.fromxlsx() function have changed
in this release. The parameters
col_offset are no longer
supported. Please use
- A new configuration option failonerror has been added to the petl.config module. This option affects various transformation functions including petl.transform.maps.rowmapmany(). The option can have values True (raise any exceptions encountered during conversion), False (silently use a given errorvalue if any exceptions arise during conversion) or “inline” (use any exceptions as the output value). The default value is False, which maintains compatibility with previous releases. By @bmaggard, #460, #406, #365.
- A new function petl.util.timing.log_progress() has been added, which behaves in a similar way to petl.util.timing.progress() but writes to a Python logger. By @dusktreader, #408, #407.
- Added new function petl.transform.regex.splitdown() for splitting a value into multiple rows. By @John-Dennert, #430, #386.
- Added new function petl.transform.basics.addfields() to add multiple new fields at a time. By @mjumbewu, #417.
- Pass through keyword arguments to
xlrd.open_workbook(). By @gjunqueira, #470, #473.
- Added new function
petl.io.xlsx.appendxlsx(). By @victormpa and @alimanfoo, #424, #475.
- Fixes for upstream API changes in openpyxl and intervaltree modules. N.B., the arguments to petl.io.xlsx.fromxlsx() for specifying row and column offsets have changed to match openpyxl. (#472 - @alimanfoo).
- Exposed read_only argument in petl.io.xlsx.fromxlsx() and set default to False to prevent truncation of files created by LibreOffice. By @mbelmadani, #457.
- Added support for reading from remote sources with gzip or bz2 compression (#463 - @H-Max).
- The function petl.transform.dedup.distinct() has been fixed for the case where None values appear in the table. By @bmaggard, #414, #412.
- Changed keyed sorts so that comparisons are only by keys. By @DiegoEPaez, #466.
- Documentation improvements by @gamesbook (#458).
- petl.transform.basics.addrownumbers() now supports a “field” argument to allow specifying the name of the new field to be added (#366, #367 - @thatneat).
- Fix to petl.io.xlsx.fromxlsx() to ensure that the underlying workbook is closed after iteration is complete (#387 - @mattkatz).
- Resolve compatibility issues with newer versions of openpyxl (#393, #394 - @henryrizzi).
- Fix deprecation warnings from openpyxl (#447, #445 - @scardine; #449 - @alimanfoo).
- Changed exceptions to use standard exception classes instead of ArgumentError (#396 - @bmaggard).
- Add support for non-numeric quoting in CSV files (#377, #378 - @vilos).
- Fix bug in handling of mode in MemorySource (#403 - @bmaggard).
- Added a get() method to the Record class (#401, #402 - @dusktreader).
- Added ability to make constraints optional, i.e., support validation on optional fields (#399, #400 - @dusktreader).
- Added support for CSV files without a header row (#421 - @LupusUmbrae).
- Documentation fixes (#379 - @DeanWay; #381 - @PabloCastellano).
- Fixed petl.transform.reshape.melt() to work with non-string key argument (#209).
- Added example to docstring of petl.transform.dedup.conflicts() to illustrate how to analyse the source of conflicts when rows are merged from multiple tables (#256).
- Added functions for working with bcolz ctables, see
- Added example in docstring for
- Added function petl.transform.basics.stack() as a simpler alternative to petl.transform.basics.cat(). Also, the behaviour of petl.transform.basics.cat() has changed for tables where the header row contains duplicate fields. This was part of addressing a bug in petl.transform.basics.addfield() for tables where the header contains duplicate fields (#327).
- Change in behaviour of petl.io.json.fromdicts() to preserve ordering of keys if ordered dicts are used. Also added petl.transform.headers.sortheader() to deal with unordered cases (#332).
- Added keyword strict to functions in the petl.transform.setops module to enable users to enforce strict set-like behaviour if desired (#333).
- Added epilogue argument to petl.util.vis.display() to enable further customisation of content of table display in Jupyter notebooks (#337).
- Added petl.transform.selects.biselect() as a convenience for obtaining two tables, one with rows matching a condition, the other with rows not matching the condition (#339).
- Changed petl.io.json.fromdicts() to avoid making two passes through the data (#341).
- Changed petl.transform.basics.addfieldusingcontext() to enable running calculations (#343).
- Fix behaviour of join functions when tables have no non-key fields (#345).
- Fix incorrect default value for ‘errors’ argument when using codec module (#347).
- Added some documentation on how to write extension classes, see Introduction (#349).
- Fix issue with unicode field names (#350).
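The three failonerror modes described in the list above can be illustrated with a small stand-alone sketch. This is not petl's implementation: the helper name convert_value and its signature are hypothetical, and only the three behaviours (raise, substitute errorvalue, return the exception inline) are taken from the release note.

```python
def convert_value(func, value, failonerror=False, errorvalue=None):
    """Apply func to value, handling errors in a failonerror-like way.

    Hypothetical helper for illustration only; not part of petl.
    """
    try:
        return func(value)
    except Exception as exc:
        if failonerror == 'inline':
            # "inline": use the exception itself as the output value
            return exc
        if failonerror:
            # True: propagate the exception to the caller
            raise
        # False: silently fall back to the given errorvalue
        return errorvalue


values = ['1', 'x', '3']
silent = [convert_value(int, v, failonerror=False, errorvalue=-1) for v in values]
inline = [convert_value(int, v, failonerror='inline') for v in values]
```

Here silent holds -1 in place of the unparseable value, while inline holds the ValueError instance at that position. The 'inline' check comes before the truthiness check because a non-empty string is itself truthy.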
Version 1.0 is a new major release of petl. The main purpose of
version 1.0 is to introduce support for Python 3.4, in addition to the
existing support for Python 2.6 and 2.7. Much of the functionality
available in petl versions 0.x has remained unchanged in version 1.0,
and most existing code that uses petl should work unchanged with
version 1.0 or with minor changes. However, there have been a number
of API changes, and some functionality has been migrated from the
petlx package, as described below.
If you have any questions about migrating to version 1.0 or find any problems or issues please email email@example.com.
Text file encoding
Version 1.0 unifies the API for working with ASCII and non-ASCII encoded text files, including CSV and HTML.
The following functions now accept an ‘encoding’ argument, which
defaults to the value of
‘utf-8’): fromcsv, tocsv, appendcsv, teecsv, fromtsv,
totsv, appendtsv, teetsv, fromtext, totext, appendtext,
The following functions have been removed as they are now redundant: fromucsv, toucsv, appenducsv, teeucsv, fromutsv, toutsv, appendutsv, teeutsv, fromutext, toutext, appendutext, touhtml, teeuhtml.
To migrate code, in most cases it should be possible to simply replace ‘fromucsv’ with ‘fromcsv’, etc.
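As a rough aid for this migration, the renaming can be scripted. The sketch below is an assumption-laden helper, not a petl tool: the name migrate_source is hypothetical, and the mapping simply applies the "drop the u" pattern to the removed names listed above, so review the result before relying on it.

```python
import re

# Removed unicode-specific names mapped to their replacements. The mapping
# follows the "drop the u" pattern suggested above; verify it for your code.
RENAMES = {
    'fromucsv': 'fromcsv', 'toucsv': 'tocsv',
    'appenducsv': 'appendcsv', 'teeucsv': 'teecsv',
    'fromutsv': 'fromtsv', 'toutsv': 'totsv',
    'appendutsv': 'appendtsv', 'teeutsv': 'teetsv',
    'fromutext': 'fromtext', 'toutext': 'totext',
    'appendutext': 'appendtext',
    'touhtml': 'tohtml', 'teeuhtml': 'teehtml',
}

# Match whole identifiers only, so e.g. "myfromucsv" is left alone.
_PATTERN = re.compile(r'\b(' + '|'.join(map(re.escape, RENAMES)) + r')\b')


def migrate_source(text):
    """Return text with removed function names replaced (hypothetical helper)."""
    return _PATTERN.sub(lambda m: RENAMES[m.group(1)], text)


old = "table = etl.fromucsv('data.csv')\netl.toutext(table, 'out.txt')"
new = migrate_source(old)
```

A plain textual replacement like this cannot tell code from comments or string literals, which is one reason the note above says "in most cases".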
petl.fluent and petl.interactive
The functionality previously available through the petl.fluent and petl.interactive modules is now available through the root petl module.
This means two things.
First, it is now possible to use either functional or fluent (i.e.,
object-oriented) styles of programming with the root petl
module, as described in the introductory section on
Functional and object-oriented programming styles.
Second, the default representation of table objects uses the
petl.util.vis.look() function, so you can simply return a table
from the prompt to inspect it, as described in the introductory
section on Interactive use.
The petl.fluent and petl.interactive modules have been removed as they are now redundant.
To migrate code, it should be possible to simply replace “import petl.fluent as etl” or “import petl.interactive as etl” with “import petl as etl”.
Note that the automatic caching behaviour of the petl.interactive
module has not been retained. If you want to enable caching
behaviour for a particular table, make an explicit call to the
petl.util.materialise.cache() function. See also
IPython notebook integration
In version 1.0
petl table container objects implement
_repr_html_() so they can be returned from a cell in an IPython notebook
and will automatically format as an HTML table.
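The mechanism behind this is the IPython rich display protocol: any object with a _repr_html_() method is rendered using the HTML string that method returns. The following is a minimal stand-alone sketch of that protocol; the HtmlTable class is hypothetical and is not how petl builds its own HTML.

```python
from html import escape


class HtmlTable:
    """Toy table container; hypothetical, not petl's implementation."""

    def __init__(self, rows):
        # The first row is treated as the header, as in petl tables.
        self.rows = list(rows)

    def _repr_html_(self):
        # IPython/Jupyter calls this method to obtain a rich HTML display.
        header, *body = self.rows
        parts = ['<table>', '<tr>']
        parts += ['<th>%s</th>' % escape(str(h)) for h in header]
        parts.append('</tr>')
        for row in body:
            parts.append('<tr>')
            parts += ['<td>%s</td>' % escape(str(v)) for v in row]
            parts.append('</tr>')
        parts.append('</table>')
        return ''.join(parts)


markup = HtmlTable([['foo', 'bar'], ['a', 1], ['b', None]])._repr_html_()
```

Returning an HtmlTable instance as the last expression of a notebook cell would display the table; outside a notebook, the method can simply be called directly as shown.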
petl.util.vis.displayall() functions have been migrated across
from the petlx.ipython package. If you are working within the
IPython notebook these functions give greater control over how tables
are rendered. For some examples, see:
Database extract/load functions
The petl.io.db.todb() function now supports automatic table
creation, inferring a schema from data in the table to be loaded. This
functionality has been migrated across from the petlx package, and
requires SQLAlchemy to be installed.
The functions fromsqlite3, tosqlite3 and appendsqlite3 have been
removed as they duplicate functionality available from the existing
petl.io.db.appenddb(). These existing functions have been
modified so that if a string is provided as the dbo argument it is
interpreted as the name of an
sqlite3 file. It should be
possible to migrate code by simply replacing ‘fromsqlite3’ with
Other functions removed or renamed
The following functions have been removed because they are overly complicated and/or hardly ever used. If you use any of these functions and would like to see them re-instated then please email firstname.lastname@example.org: rangefacet, rangerowreduce, rangeaggregate, rangecounts, multirangeaggregate, lenstats.
The following functions were marked as deprecated in petl 0.x and have
been removed in version 1.0: dataslice (use data instead),
fieldconvert (use convert instead), fieldselect (use select instead),
parsenumber (use numparser instead), recordmap (use rowmap instead),
recordmapmany (use rowmapmany instead), recordreduce (use rowreduce
instead), recordselect (use rowselect instead), valueset (use
The following functions are no longer available in the root
petl namespace, but are still available from a subpackage if
you really need them: iterdata (use data instead), iterdicts
(use dicts instead), iternamedtuples (use namedtuples instead),
iterrecords (use records instead), itervalues (use values
The following functions have been renamed: isordered (renamed to issorted), StringSource (renamed to MemorySource).
The function selectre has been removed as it duplicates functionality, use search instead.
Sorting and comparison
A major difference between Python 2 and Python 3 involves comparison
and sorting of objects of different types. Python 3 is a lot stricter
about what you can compare with what, e.g.,
None < 1 < 'foo' works
in Python 2.x but raises an exception in Python 3. The strict
comparison behaviour of Python 3 is generally a problem for typical
uses of petl, where data can be highly heterogeneous and a
column in a table may have a mixture of values of many different
types, including None for missing values.
To maintain the usability of
petl in this type of scenario, and
to ensure that the behaviour of
petl is as consistent as
possible across different Python versions, the
petl.transform.sorts.sort() function and anything that depends
on it (as well as any other functions making use of rich comparisons)
emulate the relaxed comparison behaviour that is available under
Python 2.x. In fact
petl goes further than this, allowing
comparison of a wider range of types than is possible under Python 2.x
As the underlying code to achieve this has been completely reworked,
there may be inconsistencies or unexpected behaviour, so it’s worth
testing carefully the results of any code previously run using
petl 0.x, especially if you are also migrating from Python 2 to Python 3.
The different comparison behaviour under different Python versions may also give unexpected results when selecting rows of a table. E.g., the following will work under Python 2.x but raise an exception under Python 3.4:
>>> import petl as etl
>>> table = [['foo', 'bar'],
...          ['a', 1],
...          ['b', None]]
>>> # raises exception under Python 3
... etl.select(table, 'bar', lambda v: v > 0)
To get the more relaxed behaviour under Python 3.4, use the
petl.transform.selects.selectgt function, or wrap values with etl.Comparable, e.g.:
>>> # works under Python 3
... etl.selectgt(table, 'bar', 0)
+-----+-----+
| foo | bar |
+=====+=====+
| 'a' |   1 |
+-----+-----+

>>> # or ...
... etl.select(table, 'bar', lambda v: v > etl.Comparable(0))
+-----+-----+
| foo | bar |
+=====+=====+
| 'a' |   1 |
+-----+-----+
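To make the idea of emulating relaxed ordering more concrete, here is a stand-alone sketch of a sort key that lets None, numbers and strings be sorted together under Python 3. It is not petl's Comparable implementation: the function name relaxed_key and the exact ordering rules (None first, then real numbers, then other values grouped by type name) are assumptions chosen to resemble Python 2 behaviour.

```python
import numbers


def relaxed_key(value):
    """Sort key allowing mixed-type comparison (hypothetical helper)."""
    if value is None:
        # None sorts before everything else
        return (0, '', 0)
    if isinstance(value, numbers.Real):
        # all real numbers (int, float, bool) compare with each other
        return (1, '', value)
    # other values are grouped by type name, then compared within the type
    return (2, type(value).__name__, value)


mixed = ['foo', 3, None, 1.5, 'bar', 2]
ordered = sorted(mixed, key=relaxed_key)
```

Calling sorted(mixed) without the key raises TypeError under Python 3, whereas the keyed sort places None first, then the numbers, then the strings. Values of the same non-numeric type that do not support ordering (dicts, for example) would still fail with this simple key.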
New extract/load modules
Several new extract/load modules have been added, migrating functionality previously available from the petlx package:
- Excel .xls files (xlrd/xlwt)
- Excel .xlsx files (openpyxl)
- Arrays (NumPy)
- DataFrames (pandas)
- HDF5 files (PyTables)
- Text indexes (Whoosh)
These modules all have dependencies on third party packages, but these
have been kept as optional dependencies so are not required for
New validate function
The petl.transform.validation.validate() function has been
added to provide a convenient interface when validating a table
against a set of constraints.
New intervals module
A new module has been added providing transformation functions based on intervals, migrating functionality previously available from the petlx package:
This module requires the intervaltree module.
New configuration module
All configuration variables have been brought together into a new
petl.config module. See the source code for the variables
available; they should be self-explanatory.
petl.push moved to petlx
The petl.push module remains in an experimental state and has
been moved to the petlx extensions project.
Argument names and other minor changes
Argument names for a small number of functions have been changed to create consistency across the API.
There are some other minor changes as well. If you are migrating from
petl version 0.x the best thing is to run your code and inspect
any errors. Email email@example.com if you have any