Development
If you find the openpyxl project intriguing and want to contribute a new awesome feature, fix a nasty bug or improve the documentation, this section will guide you through setting up your development environment.
We will look into the coding standards and version control system workflows used, as well as cloning the openpyxl code to your local machine, setting up a virtual Python environment, running tests and building the documentation.
Getting the source
The source code of openpyxl is hosted on Heptapod as a Mercurial project which you can download using e.g. the GUI client SourceTree by Atlassian. If you prefer working with the command line you can use the following:
$ hg clone https://foss.heptapod.net/openpyxl/openpyxl
$ hg up 3.1
Please note that the default branch should never be used for development work. For bug fixes and minor patches you should base your work on the branch of the current release, e.g. 3.1. New features should generally be based on the development branch of the next minor version. If in doubt, get in touch with the openpyxl development team.
It is worthwhile to add an upstream remote reference to the original repository, so that you can update your fork with the latest changes. To do so, add the following to the .hg/hgrc file:
[paths]
default = ...
openpyxl-master = https://foss.heptapod.net/openpyxl/openpyxl
You can then grab any new changes using:
$ hg pull openpyxl-master
After that you should create a virtual environment using virtualenv and install the project requirements and the project itself:
$ cd openpyxl
$ virtualenv openpyxl-env
Activate the environment using:
$ source openpyxl-env/bin/activate # or ./openpyxl-env/Scripts/activate on Windows
Install the dev and prod dependencies and the package itself using:
(openpyxl-env) $ pip install -U -r requirements.txt
(openpyxl-env) $ pip install -e .
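Before installing, it can help to confirm that the interpreter really is the one from the virtual environment. The following is a minimal sketch using only the standard library; the helper name in_virtual_environment is made up for this illustration and is not part of openpyxl:

```python
import sys


def in_virtual_environment():
    """Return True if the running interpreter belongs to a virtual environment."""
    # venv and recent virtualenv releases point sys.prefix at the environment,
    # while sys.base_prefix keeps pointing at the base installation.
    base_prefix = getattr(sys, "base_prefix", sys.prefix)
    # Older virtualenv releases set sys.real_prefix instead.
    return hasattr(sys, "real_prefix") or sys.prefix != base_prefix


if __name__ == "__main__":
    print("virtual environment active:", in_virtual_environment())
```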
Running tests
Note that contributions to the project without tests will not be accepted.
We use pytest as the test runner with pytest-cov for coverage information and pytest-flakes for static code analysis.
To run all the tests you need to either execute:
(openpyxl-env) $ pytest -xrf openpyxl # -x stops at the first failure, -rf adds a summary of failed tests
Or use tox to run the tests on different Python versions and configurations:
$ tox openpyxl
Coverage
The goal is 100 % coverage for unit tests - data types and utility functions. Coverage information can be obtained using:
pytest --cov openpyxl
Organisation
Tests should preferably be at package / module level, e.g. openpyxl/cell. This makes testing and getting statistics for code under development easier:
pytest --cov openpyxl/cell openpyxl/cell
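As a sketch of what a package-level test module could look like: the file location and the helper column_letter below are hypothetical stand-ins, chosen so that the example runs without openpyxl installed. pytest collects functions whose names start with test_ and treats a plain assert as the check.

```python
# Hypothetical file: openpyxl/cell/tests/test_example.py


def column_letter(index):
    """Stand-in for the code under test: 1-based column index to letters."""
    if index < 1:
        raise ValueError("column index must be >= 1")
    letters = ""
    while index:
        index, remainder = divmod(index - 1, 26)
        letters = chr(ord("A") + remainder) + letters
    return letters


def test_single_letter():
    assert column_letter(1) == "A"
    assert column_letter(26) == "Z"


def test_double_letter():
    assert column_letter(27) == "AA"
    assert column_letter(52) == "AZ"


def test_invalid_index():
    # Written without pytest.raises so the sketch has no third-party imports
    try:
        column_letter(0)
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError")
```

Keeping such a file next to the package it covers means that running pytest on that package directory collects it and reports coverage for that package alone.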
Checking XML
Use the openpyxl.tests.helper.compare_xml function to compare generated and expected fragments of XML.
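The idea behind such a comparison is that two fragments should count as equal when they differ only in attribute order or insignificant whitespace. The sketch below illustrates this with the standard library's xml.etree.ElementTree.canonicalize (Python 3.8 and later). It is not the implementation of compare_xml, and the function name xml_fragments_match is an assumption made for this example.

```python
from xml.etree.ElementTree import canonicalize


def xml_fragments_match(generated, expected):
    """Compare two XML strings after canonicalisation (C14N 2.0)."""
    # Canonicalisation normalises attribute order; strip_text=True also
    # drops whitespace around text content, so indentation is ignored.
    return canonicalize(generated, strip_text=True) == canonicalize(
        expected, strip_text=True
    )


generated = '<c r="A1" t="n"><v>1</v></c>'
expected = """
<c t="n" r="A1">
  <v>1</v>
</c>
"""

same = xml_fragments_match(generated, expected)
different = xml_fragments_match(generated, '<c r="A1" t="n"><v>2</v></c>')
```

Here same is expected to be true because only layout and attribute order differ, while different is false because the cell value changed.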
Schema validation
When working on code to generate XML it is possible to validate that the generated XML conforms to the published specification. Note, this won’t necessarily guarantee that everything is fine but is preferable to reverse engineering!
Microsoft Tools
Along with the SDK, Microsoft also has a “Productivity Tool” for working with Office OpenXML.
This allows you to quickly inspect or compare whole Excel files. Unfortunately, validation errors contain many false positives. The tool also contains links to the specification and implementers’ notes.
File Support and Specifications
The primary aim of openpyxl is to support reading and writing Microsoft Excel 2010 files. These are zipped OOXML files that are specified by ECMA 376 and ISO 29500.
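Because the files are ordinary zip archives of XML parts, the standard library is enough to look inside one. The sketch below builds a toy archive in memory instead of reading a real workbook. The two part names are typical for .xlsx packages, but the contents are placeholders and the result is not a valid Excel file; the function name list_xml_parts is made up for this example.

```python
import io
import zipfile
from xml.etree import ElementTree


def list_xml_parts(file_obj):
    """Return (part name, root tag) for every XML part in a zip archive."""
    parts = []
    with zipfile.ZipFile(file_obj) as archive:
        for name in archive.namelist():
            if name.endswith(".xml"):
                root = ElementTree.fromstring(archive.read(name))
                parts.append((name, root.tag))
    return parts


# Build a toy package in memory (placeholder content, not a real workbook)
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("[Content_Types].xml", "<Types/>")
    archive.writestr("xl/workbook.xml", "<workbook><sheets/></workbook>")

buffer.seek(0)
parts = list_xml_parts(buffer)
for name, tag in parts:
    print(name, tag)
```

For a real file, the path of an .xlsx workbook can be passed to list_xml_parts instead of the buffer.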
Where possible we try to support files generated by other libraries or programs, but can’t guarantee it, because often these do not strictly adhere to the above format.
Support of Python Versions
Python 3.6 and upwards are supported.
Coding style
We follow PEP 8 for the coding style, except when implementing attributes for round tripping. Even in those cases you are encouraged to use Python data conventions (booleans, None, etc.). Note exceptions to this convention in docstrings.
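To illustrate the exception: an attribute that mirrors an XML attribute keeps the XML spelling so that it can be written back unchanged, while its value uses Python types. The class RowDimensionSketch, its attributes and the helper to_bool are hypothetical and only loosely modelled on spreadsheet XML; they are not openpyxl's classes.

```python
from xml.etree import ElementTree


def to_bool(value):
    """Map an XML boolean attribute to True/False, or None when absent."""
    if value is None:
        return None
    return value in ("1", "true")


class RowDimensionSketch:
    """Hypothetical class whose attribute names mirror the XML, not PEP 8.

    customHeight and hidden keep the XML spelling for round tripping, but
    hold Python booleans or None rather than "1" / "0" strings.
    """

    def __init__(self, customHeight=None, hidden=None):
        self.customHeight = customHeight
        self.hidden = hidden

    @classmethod
    def from_xml(cls, xml):
        node = ElementTree.fromstring(xml)
        return cls(
            customHeight=to_bool(node.get("customHeight")),
            hidden=to_bool(node.get("hidden")),
        )


row = RowDimensionSketch.from_xml('<row customHeight="1"/>')
```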
Contributing
Contributions in the form of pull requests are always welcome. Don’t forget to add yourself to the list of authors!
Branch naming convention
We use a “major.minor.patch” numbering system, i.e. 3.1.2. Development branches are named after “major.minor” releases. In general, API changes will only happen in major releases but there will be exceptions. Always communicate API changes to the mailing list before making them. If you are changing an API, try to implement a fallback (with a deprecation warning) for the old behaviour.
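A fallback for a changed API can be as small as keeping the old name as a thin wrapper that warns and delegates. This is a generic sketch with the standard warnings module; the class Report and both method names are hypothetical, and openpyxl may handle deprecations differently.

```python
import warnings


class Report:
    """Hypothetical class used only to illustrate the pattern."""

    def __init__(self, rows):
        self.rows = list(rows)

    def row_count(self):
        """New API."""
        return len(self.rows)

    def get_row_count(self):
        """Old API, kept as a fallback that delegates to the new method."""
        warnings.warn(
            "get_row_count() is deprecated, use row_count() instead",
            DeprecationWarning,
            stacklevel=2,  # attribute the warning to the caller
        )
        return self.row_count()


report = Report([1, 2, 3])
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # make sure the warning is not filtered out
    old_result = report.get_row_count()
```

The old call still returns the same value, and the recorded warning tells callers which name to switch to.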
The “default branch” is used for releases and always has changes from a development branch merged in. It should never be the target for a pull request.
Pull Requests
Pull requests should be submitted to the current, unreleased development branch. E.g. if the current release is 3.1.2, pull requests should be made to the 3.1 branch. Exceptions are bug fixes to released versions, which should be made to the relevant release branch and merged upstream into development.
Please use tox to test code for different submissions before making a pull request. This is especially important for picking up problems across Python versions.
Documentation
Remember to update the documentation when adding or changing features. Check that the documentation is syntactically correct:
tox -e doc
Benchmarking
Benchmarking and profiling are ongoing tasks. Contributions to these are very welcome as we know there is a lot to do.
Memory Use
There is a tox profile for long-running memory benchmarks using the memory_utils package:
tox -e memory
Pympler
As openpyxl does not include any internal memory benchmarking tools, the Python pympler package was used during the testing of styles to profile the memory usage in openpyxl.reader.excel.read_style_table():
# in openpyxl/reader/style.py
from pympler import muppy, summary

def read_style_table(xml_source):
    ...
    if cell_xfs is not None:  # ~ line 47
        initialState = summary.summarize(muppy.get_objects())  # Capture the initial state
        for index, cell_xfs_node in enumerate(cell_xfs_nodes):
            ...
            table[index] = new_style
        finalState = summary.summarize(muppy.get_objects())  # Capture the final state
        diff = summary.get_diff(initialState, finalState)  # Compare
        summary.print_(diff)
pympler.summary.print_() prints a report of object memory usage to the console, allowing the comparison of different methods and examination of memory usage. A useful future development would be to construct a benchmarking package to measure the performance of different components.
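The same before / after pattern can be reproduced without third-party packages. The sketch below swaps pympler for the standard library's tracemalloc module, which tracks allocations made while tracing is active rather than summarising all live objects; build_table is a hypothetical stand-in for the code being profiled.

```python
import tracemalloc


def build_table(size):
    """Stand-in workload: allocate one small dict per index."""
    return {index: {"style": index} for index in range(size)}


tracemalloc.start()
before = tracemalloc.take_snapshot()  # Capture the initial state
table = build_table(10_000)
after = tracemalloc.take_snapshot()  # Capture the final state
tracemalloc.stop()

# Compare the snapshots, grouping allocations by source line
diff = after.compare_to(before, "lineno")
for stat in diff[:5]:
    print(stat)
```

Each printed entry names a source line together with the change in allocated size and block count between the two snapshots.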