HOWTOs#
How to update the version of Rust used in CI tests#
Make a PR to update the rust-toolchain file in the root of the repository.
Adding new functions#
Implementation
| Function type | Location to implement | Trait to implement | Macros to use | Example |
|---|---|---|---|---|
| Scalar | | | | |
| Nested | | | | |
| Aggregate | | | | |
| Window | | | | |
| Table | | | | |
The macros simplify some boilerplate, such as ensuring that a DataFrame API compatible function is also created.
Ensure new functions are properly exported through the subproject mod.rs or lib.rs.
Functions should preferably provide documentation via the #[user_doc(...)] attribute so their documentation can be included in the SQL reference documentation (see the Documentation section below).
Scalar functions are further grouped into modules for families of functions (e.g. string, math, datetime). Functions should be added to the relevant module; if a new module needs to be created, a new Rust feature should also be added so that DataFusion users can conditionally compile the modules as needed.
Aggregate functions can optionally implement a GroupsAccumulator for better performance.
Spark compatible functions are located in a separate crate but otherwise follow the same steps, though all function types (e.g. scalar, nested, aggregate) are grouped together in a single location.
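To illustrate why a GroupsAccumulator can be faster, here is a minimal sketch of the idea using only the standard library. The traits SimpleAccumulator and SimpleGroupsAccumulator, and both sum types, are hypothetical simplifications written for this example. They are not DataFusion's actual Accumulator or GroupsAccumulator signatures, which operate on Arrow arrays and have more methods.

```rust
// Hypothetical, simplified traits for illustration only; these are NOT
// DataFusion's `Accumulator` / `GroupsAccumulator` signatures.

/// Per-group style: one accumulator instance is created for every group.
trait SimpleAccumulator {
    fn update(&mut self, values: &[i64]);
    fn evaluate(&self) -> i64;
}

/// Grouped style: a single instance holds the state of every group.
trait SimpleGroupsAccumulator {
    /// `group_indices[i]` is the group that `values[i]` belongs to.
    fn update_batch(&mut self, values: &[i64], group_indices: &[usize], total_num_groups: usize);
    fn evaluate(&self) -> Vec<i64>;
}

#[derive(Default)]
struct SumAccumulator {
    sum: i64,
}

impl SimpleAccumulator for SumAccumulator {
    fn update(&mut self, values: &[i64]) {
        self.sum += values.iter().sum::<i64>();
    }

    fn evaluate(&self) -> i64 {
        self.sum
    }
}

#[derive(Default)]
struct SumGroupsAccumulator {
    // One slot per group, stored contiguously.
    sums: Vec<i64>,
}

impl SimpleGroupsAccumulator for SumGroupsAccumulator {
    fn update_batch(&mut self, values: &[i64], group_indices: &[usize], total_num_groups: usize) {
        // Grow the flat state vector once instead of allocating per group.
        if self.sums.len() < total_num_groups {
            self.sums.resize(total_num_groups, 0);
        }
        // A single pass over the batch updates every group's state.
        for (&value, &group) in values.iter().zip(group_indices) {
            self.sums[group] += value;
        }
    }

    fn evaluate(&self) -> Vec<i64> {
        self.sums.clone()
    }
}
```

In this sketch the grouped variant avoids one allocation and one dynamic call per group, which is the kind of saving that matters when a query produces many groups.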
Testing
Prefer adding sqllogictest integration tests where the function is called via SQL against
well-known data and returns an expected result. Check the existing test files to see whether
there is an appropriate file to add test cases to; otherwise create a new file. See the
sqllogictest documentation for details on how to construct these tests.
Ensure edge cases and null inputs are considered in these tests.
If a behaviour cannot be tested via sqllogictest (e.g. testing simplify(), needing to be
tested in isolation from the optimizer, or difficulty constructing exact input via sqllogictest)
then tests can be added as Rust unit tests in the implementation module, though these should be
kept minimal where possible.
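When a Rust unit test is warranted, it can live next to the implementation in a test module. The following is a minimal sketch of that layout, covering a null input and two edge cases. The function char_count and the module name char_count_tests are hypothetical stand-ins, with Option modelling a SQL NULL; they are not taken from DataFusion.

```rust
/// Hypothetical helper standing in for a function's core logic.
/// `None` models a SQL NULL input. It is not part of DataFusion.
fn char_count(input: Option<&str>) -> Option<usize> {
    input.map(|s| s.chars().count())
}

#[cfg(test)]
mod char_count_tests {
    use super::*;

    #[test]
    fn null_input_returns_null() {
        assert_eq!(char_count(None), None);
    }

    #[test]
    fn empty_and_multibyte_inputs() {
        // Edge case: the empty string has zero characters.
        assert_eq!(char_count(Some("")), Some(0));
        // Edge case: U+00E9 is one character but two bytes in UTF-8.
        assert_eq!(char_count(Some("\u{e9}")), Some(1));
    }
}
```

Running cargo test in the crate that contains the module executes these tests.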
Documentation
Run the documentation update script ./dev/update_function_docs.sh, which updates the relevant
markdown documents (see the documents for scalar,
aggregate and window functions).
Do not manually edit the markdown documents after running the script, as manual changes will be overwritten on the next execution.
Reference GitHub issue which introduced this behaviour
How to display plans graphically#
The query plans represented by LogicalPlan nodes can be graphically
rendered using Graphviz.
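To do so, save the output of the display_graphviz function to a file:
use std::fs::File;
use std::io::Write;
// Create plan somehow...
let mut output = File::create("/tmp/plan.dot")?;
// Propagate write errors instead of silently dropping the Result
write!(output, "{}", plan.display_graphviz())?;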
Then, use the dot command line tool to render it into a file that
can be displayed. For example, the following command creates a
/tmp/plan.pdf file:
dot -Tpdf < /tmp/plan.dot > /tmp/plan.pdf
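If plans are dumped from several places, the file-writing step can be wrapped in a small helper. The sketch below assumes only that the value returned by display_graphviz() implements Display; write_dot is a hypothetical helper name and is not part of DataFusion.

```rust
use std::fmt::Display;
use std::fs::File;
use std::io::{self, Write};
use std::path::Path;

/// Writes any `Display` value (for example the value returned by
/// `plan.display_graphviz()`) to `path`, propagating I/O errors.
/// `write_dot` is a hypothetical helper name, not a DataFusion API.
fn write_dot(path: &Path, graph: impl Display) -> io::Result<()> {
    let mut output = File::create(path)?;
    write!(output, "{}", graph)?;
    Ok(())
}
```

A call such as write_dot(Path::new("/tmp/plan.dot"), plan.display_graphviz())? would then produce the file that the dot command above renders.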
How to format .md documents#
We use prettier to format .md files.
You can either install it globally with npm i -g prettier or run it as a standalone binary with npx.
Using npx requires a working Node.js environment. Upgrading to the latest prettier is recommended (by adding
--upgrade to the npm command).
$ prettier --version
2.3.0
After you’ve confirmed your prettier version, you can format all the .md files:
prettier -w {datafusion,datafusion-cli,datafusion-examples,dev,docs}/**/*.md
How to format .toml files#
We use taplo to format .toml files.
To install via cargo:
cargo install taplo-cli --locked
Refer to the taplo installation documentation for other ways to install it.
$ taplo --version
taplo 0.9.0
After you’ve confirmed your taplo version, you can format all the .toml files:
taplo fmt
How to update protobuf/gen dependencies#
For the proto and proto-common crates, the prost/tonic code is generated by running their respective ./regen.sh scripts,
which in turn invokes the Rust binary located in ./gen.
This is necessary after modifying the protobuf definitions or altering the dependencies of ./gen, and requires a
valid installation of protoc (see installation instructions for details).
# From repository root
# proto-common
./datafusion/proto-common/regen.sh
# proto
./datafusion/proto/regen.sh