HOWTOs

How to update the version of Rust used in CI tests

Make a PR to update the rust-toolchain file in the root of the repository.

How to add a new scalar function

Below is a checklist of what you need to do to add a new scalar function to DataFusion:

Add the actual implementation of the function to a new module file within:
- here for arrays, maps and structs functions
- here for crypto functions
- here for datetime functions
- here for encoding functions
- here for math functions
- here for regex functions
- here for string functions
- here for unicode functions
- create a new module here for other functions.
New function modules (for example, a vector module) should be gated behind a Rust feature (for example, vector_expressions) so that DataFusion users can enable or disable the new module as desired.
Implement the function by implementing the ScalarUDFImpl trait for the function struct.
- See the advanced_udf.rs example for an example implementation
- Add tests for the new function
To connect the implementation of the function, add the following to the mod.rs file:
- a mod xyz; where xyz is the new module file
- a call to make_udf_function!(..);
- an item in export_functions!(..);
In sqllogictest/test_files, add new sqllogictest integration tests that call the function through SQL against well-known data and check that it returns the expected result.
- Documentation for sqllogictest here
Add SQL reference documentation here
- An example of this being done can be seen here
- Run ./dev/update_function_docs.sh to update docs
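
The feature-gating recommendation in the checklist above can be sketched as follows. This is a minimal illustration rather than DataFusion's actual module layout: it assumes the crate's Cargo.toml declares a `vector_expressions` feature, and the `vector` module, its `dot_product` helper, and `enabled_modules` are hypothetical names introduced only for this example.

```rust
// Sketch of feature-gating a new function module.
// Assumption: Cargo.toml contains
//   [features]
//   vector_expressions = []
// All names below are hypothetical and used for illustration only.

/// Compiled only when the `vector_expressions` feature is enabled.
#[cfg(feature = "vector_expressions")]
pub mod vector {
    /// Hypothetical helper standing in for a real function implementation.
    pub fn dot_product(a: &[f64], b: &[f64]) -> f64 {
        a.iter().zip(b).map(|(x, y)| x * y).sum()
    }
}

/// Lists the function modules compiled into this build.
/// "math" and "string" stand in for modules that are always present.
pub fn enabled_modules() -> Vec<&'static str> {
    let mut modules = vec!["math", "string"];
    // `cfg!` evaluates to a compile-time boolean for the feature flag.
    if cfg!(feature = "vector_expressions") {
        modules.push("vector");
    }
    modules
}
```

With this layout, a user who builds without the feature never compiles the `vector` module, while one who enables it (for example with `cargo build --features vector_expressions`) gets it included.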

How to add a new aggregate function

Below is a checklist of what you need to do to add a new aggregate function to DataFusion:

Add the actual implementation of an Accumulator and AggregateExpr.
In datafusion/expr/src, add:
- a new variant to AggregateFunction
- a new entry to FromStr with the name of the function as called by SQL
- a new line in return_type with the expected return type of the function, given an incoming type
- a new line in signature with the signature of the function (number and types of its arguments)
- a new line in create_aggregate_expr mapping the built-in to the implementation
- tests for the function
In sqllogictest/test_files, add new sqllogictest integration tests that call the function through SQL against well-known data and check that it returns the expected result.
- Documentation for sqllogictest here
Add SQL reference documentation here
- An example of this being done can be seen here
- Run ./dev/update_function_docs.sh to update docs
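
To make the role of an accumulator in the checklist above more concrete, here is a simplified, standard-library-only sketch of the idea: consume batches of values, expose a partial state, merge partial states from other partitions, and produce a final value. The `SimpleAccumulator` trait and `AvgAccumulator` struct are hypothetical stand-ins written for this example; they are not DataFusion's actual Accumulator trait, whose methods operate on Arrow arrays and scalar values.

```rust
/// Hypothetical, simplified stand-in for the accumulator concept.
/// This is NOT DataFusion's `Accumulator` trait.
pub trait SimpleAccumulator {
    /// Consume one batch of nullable input values.
    fn update_batch(&mut self, values: &[Option<f64>]);
    /// Export the partial state so it can be merged elsewhere.
    fn state(&self) -> (f64, u64);
    /// Fold in a partial state produced by another accumulator.
    fn merge(&mut self, state: (f64, u64));
    /// Produce the final result; `None` if no non-null input was seen.
    fn evaluate(&self) -> Option<f64>;
}

/// Computes an average by tracking a running sum and count.
#[derive(Debug, Default)]
pub struct AvgAccumulator {
    sum: f64,
    count: u64,
}

impl SimpleAccumulator for AvgAccumulator {
    fn update_batch(&mut self, values: &[Option<f64>]) {
        // `flatten` skips nulls (`None`), as SQL aggregates typically do.
        for v in values.iter().flatten() {
            self.sum += v;
            self.count += 1;
        }
    }

    fn state(&self) -> (f64, u64) {
        (self.sum, self.count)
    }

    fn merge(&mut self, state: (f64, u64)) {
        self.sum += state.0;
        self.count += state.1;
    }

    fn evaluate(&self) -> Option<f64> {
        if self.count == 0 {
            None
        } else {
            Some(self.sum / self.count as f64)
        }
    }
}
```

Keeping a mergeable partial state (sum and count) rather than a running average is what allows partitions to be aggregated independently and combined afterwards.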

How to display plans graphically

The query plans represented by LogicalPlan nodes can be graphically rendered using Graphviz.

To do so, save the output of the display_graphviz function to a file:

use std::{fs::File, io::Write};

// Create plan somehow...
let mut output = File::create("/tmp/plan.dot")?;
// `write!` returns a Result, so propagate any I/O error with `?`
write!(output, "{}", plan.display_graphviz())?;

Then, use the dot command line tool to render it into a file that can be displayed. For example, the following command creates a /tmp/plan.pdf file:

dot -Tpdf < /tmp/plan.dot > /tmp/plan.pdf
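
If you prefer to drive both steps from Rust, the following sketch writes DOT text to a file and then invokes Graphviz through std::process::Command. It is an illustration under assumptions: the helper names `write_dot`, `dot_command`, and `render_pdf` are hypothetical, the DOT text is passed in as a plain string (in DataFusion it would come from formatting `plan.display_graphviz()`), and the `dot` binary must be installed and on your PATH for rendering to succeed.

```rust
use std::fs;
use std::io;
use std::path::Path;
use std::process::{Command, ExitStatus};

/// Write DOT text to `path`.
/// Hypothetical helper; the text would normally be produced by
/// `format!("{}", plan.display_graphviz())`.
pub fn write_dot(path: &Path, dot_text: &str) -> io::Result<()> {
    fs::write(path, dot_text)
}

/// Build the equivalent of `dot -Tpdf <dot_path> -o <pdf_path>`
/// without running it.
pub fn dot_command(dot_path: &Path, pdf_path: &Path) -> Command {
    let mut cmd = Command::new("dot");
    cmd.arg("-Tpdf").arg(dot_path).arg("-o").arg(pdf_path);
    cmd
}

/// Run Graphviz and wait for it to finish.
/// Returns an error if `dot` cannot be started (for example, if it
/// is not installed); check the returned status for render failures.
pub fn render_pdf(dot_path: &Path, pdf_path: &Path) -> io::Result<ExitStatus> {
    dot_command(dot_path, pdf_path).status()
}
```

A typical call sequence would be `write_dot(Path::new("/tmp/plan.dot"), &dot_text)?` followed by `render_pdf(Path::new("/tmp/plan.dot"), Path::new("/tmp/plan.pdf"))?`.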

How to format `.md` documents

We are using prettier to format .md files.

You can either install it globally with npm i -g prettier or run it as a standalone binary with npx. Using npx requires a working Node environment. Upgrading to the latest prettier is recommended (by adding --upgrade to the npm command).

$ prettier --version
2.3.0

After you’ve confirmed your prettier version, you can format all the .md files:

prettier -w {datafusion,datafusion-cli,datafusion-examples,dev,docs}/**/*.md

How to format `.toml` files

We use taplo to format .toml files.

For Rust developers, you can install it via:

cargo install taplo-cli --locked

Refer to the Installation section for other ways to install it.

$ taplo --version
taplo 0.9.0

After you’ve confirmed your taplo version, you can format all the .toml files:

taplo fmt

How to update protobuf/gen dependencies

The prost/tonic code can be generated by running ./regen.sh, which in turn invokes the Rust binary located in ./gen.

This is necessary after modifying the protobuf definitions or altering the dependencies of ./gen, and requires a valid installation of protoc (see installation instructions for details).

./regen.sh

How to add/edit documentation for UDFs

Documentation for UDFs is generated from code (related GitHub issue). To generate the Markdown, run ./dev/update_function_docs.sh.

This is necessary after adding a new UDF implementation or modifying an existing one in a way that requires a documentation update.

./dev/update_function_docs.sh
