Introduction
We welcome and encourage contributions of all kinds, such as:
Tickets with issue reports or feature requests
Documentation improvements
Code, both PRs and (especially) PR reviews
In addition to submitting new PRs, we have a healthy tradition of community members reviewing each other’s PRs. Doing so is a great way to help the community as well as get more familiar with Rust and the relevant codebases.
How to develop
This assumes that you have Rust and Cargo installed. We use the workflow recommended by pyo3 and maturin.
Bootstrap:
# fetch this repo
git clone git@github.com:apache/arrow-datafusion-python.git
# prepare development environment (used to build wheel / install in development)
python3 -m venv venv
# activate the venv
source venv/bin/activate
# update pip itself if necessary
python -m pip install -U pip
# install dependencies (for Python 3.8+)
python -m pip install -r requirements-310.txt
The tests rely on test data in git submodules.
git submodule init
git submodule update
Whenever Rust code changes (through your own changes or via git pull), rebuild and rerun the tests:
# make sure you activate the venv using "source venv/bin/activate" first
maturin develop
python -m pytest
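The reminder above to activate the venv can also be checked from Python before rebuilding. This is a minimal sketch, assuming the environment was created with the standard venv module, in which case sys.prefix differs from sys.base_prefix. The function name in_virtualenv is hypothetical and not part of this repository.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the running interpreter belongs to a virtual environment.

    Inside an environment created by ``python3 -m venv``, ``sys.prefix`` points
    at the environment while ``sys.base_prefix`` points at the base install.
    """
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    if in_virtualenv():
        print("venv is active:", sys.prefix)
    else:
        print('no venv detected; run "source venv/bin/activate" first')
```

Running this with the interpreter you intend to use for maturin develop shows whether the build would be installed into the venv or into the base Python installation.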
Running & Installing pre-commit hooks
arrow-datafusion-python uses pre-commit to lint code locally, which reduces the number of commits that ultimately fail in CI due to linter errors. Using the pre-commit hooks is optional but helps keep PRs clean and concise.
Our pre-commit hooks can be installed by running pre-commit install, which installs them in your ARROW_DATAFUSION_PYTHON_ROOT/.git/hooks directory. The hooks then run each time you perform a commit and abort the commit if an offending lint is found, allowing you to make changes locally before pushing.
The pre-commit hooks can also be run ad hoc, without installing them, by running pre-commit run --all-files.
Guidelines for Separating Python and Rust Code
Version 40 of datafusion-python introduced Python wrappers around the pyo3-generated code to vastly improve the user experience. (See the blog post and pull request for more details.)
Mostly, the Python code is limited to pure wrappers with type hints and good docstrings, but there are a few cases where the code does more:
Trivial aliases like array_append() and list_append().
Simple type conversion, like from a path to a string of the path or from number to lit(number).
The additional code makes an API much more pythonic, like we do for named_struct() (see source code).
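The three cases above can be illustrated with a small, self-contained sketch. It does not import datafusion: the Expr class, lit(), every _raw_* function, and load_table() are hypothetical stand-ins for the pyo3-generated layer, written only to show the shape of each kind of wrapper. The real signatures in the source code may differ.

```python
from __future__ import annotations

import pathlib


class Expr:
    """Hypothetical stand-in for an expression class generated by pyo3."""

    def __init__(self, text: str) -> None:
        self.text = text


def lit(value: object) -> Expr:
    """Wrap a plain Python value as a literal expression (stand-in)."""
    return Expr(repr(value))


# Hypothetical "raw" functions standing in for the Rust bindings.
def _raw_array_append(array: Expr, element: Expr) -> Expr:
    return Expr(f"array_append({array.text}, {element.text})")


def _raw_named_struct(*args: Expr) -> Expr:
    # The raw call takes a flat sequence: name1, value1, name2, value2, ...
    return Expr("named_struct(" + ", ".join(a.text for a in args) + ")")


def _raw_load_table(path: str) -> str:
    return f"load_table({path})"


# Case 1: a pure wrapper with type hints and a docstring, plus a trivial alias.
def array_append(array: Expr, element: Expr) -> Expr:
    """Append an element to the end of an array."""
    return _raw_array_append(array, element)


def list_append(array: Expr, element: Expr) -> Expr:
    """Alias for array_append()."""
    return array_append(array, element)


# Case 2: simple type conversion, here from a path to a string of the path.
def load_table(path: str | pathlib.Path) -> str:
    """Accept either a str or a pathlib.Path and pass a str to the raw layer."""
    return _raw_load_table(str(path))


# Case 3: extra code that makes the API more pythonic.
def named_struct(name_pairs: list[tuple[str, Expr]]) -> Expr:
    """Build a struct from (name, value) pairs instead of a flat argument list."""
    flat: list[Expr] = []
    for name, value in name_pairs:
        flat.append(lit(name))  # plain str name converted to a literal
        flat.append(value)
    return _raw_named_struct(*flat)


if __name__ == "__main__":
    print(list_append(lit([1, 2]), lit(3)).text)
    print(load_table(pathlib.Path("data") / "example.csv"))
    print(named_struct([("a", lit(1)), ("b", lit(2))]).text)
```

In this sketch the wrappers stay thin: each one documents and types the call, converts arguments where that is cheap and unambiguous, and otherwise delegates to the raw function.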
Update Dependencies
To change test dependencies, edit requirements.in and run:
# install pip-tools (this only needs to be done once); consider running it in the venv
python -m pip install pip-tools
python -m piptools compile --generate-hashes -o requirements-310.txt
To update dependencies, run the same command with -U:
python -m piptools compile -U --generate-hashes -o requirements-310.txt
More details are available in the pip-tools project documentation.