HOWTOs

How to add a new scalar function

Below is a checklist of what you need to do to add a new scalar function to DataFusion:

  • Add the actual implementation of the function to a new module file within:

    • here for array, map and struct functions

    • here for crypto functions

    • here for datetime functions

    • here for encoding functions

    • here for math functions

    • here for regex functions

    • here for string functions

    • here for unicode functions

    • create a new module here for other functions.

  • New function modules (for example, a vector module) should use a Rust feature (for example, vector_expressions) to allow DataFusion users to enable or disable the new module as desired.

  • Implement the function by implementing the ScalarUDFImpl trait for the function struct.

    • See advanced_udf.rs for an example implementation

    • Add tests for the new function

  • To connect the implementation of the function, add the following to the mod.rs file:

    • a mod xyz; where xyz is the new module file

    • a call to make_udf_function!(..);

    • an item in export_functions!(..);

  • In sqllogictest/test_files, add new sqllogictest integration tests in which the function is called through SQL against well-known data and returns the expected result.

    • Documentation for sqllogictest here

  • Add SQL reference documentation here

    • An example of this being done can be seen here

    • Run ./dev/update_function_docs.sh to update docs
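
The feature-gating item in the checklist above can be illustrated with a small sketch. It assumes a Cargo feature named vector_expressions (the example name used above) is declared under [features] in the crate's Cargo.toml. The vector module, its dot_product function, and available_functions are hypothetical names invented for illustration and are not part of DataFusion.

```rust
// Assumes Cargo.toml declares the feature, for example:
//   [features]
//   vector_expressions = []

/// Hypothetical module that is only compiled when the feature is enabled.
#[cfg(feature = "vector_expressions")]
pub mod vector {
    /// Hypothetical helper: dot product of two slices (extra elements are ignored).
    pub fn dot_product(a: &[f64], b: &[f64]) -> f64 {
        a.iter().zip(b.iter()).map(|(x, y)| x * y).sum()
    }
}

/// Returns the names of the functions compiled into this build.
pub fn available_functions() -> Vec<&'static str> {
    #[allow(unused_mut)]
    let mut names = vec!["abs"];
    // This statement is removed at compile time when the feature is disabled.
    #[cfg(feature = "vector_expressions")]
    names.push("dot_product");
    names
}

fn main() {
    println!("{:?}", available_functions());
}
```

With this layout, users who do not need the module can build without the feature and the gated code is left out of the binary entirely.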

How to add a new aggregate function

Below is a checklist of what you need to do to add a new aggregate function to DataFusion:

  • Add the actual implementation of an Accumulator and AggregateExpr.

  • In datafusion/expr/src, add:

    • a new variant to AggregateFunction

    • a new entry to FromStr with the name of the function as called by SQL

    • a new line in return_type with the expected return type of the function, given an incoming type

    • a new line in signature with the signature of the function (number and types of its arguments)

    • a new line in create_aggregate_expr mapping the built-in to the implementation

    • tests for the function

  • In sqllogictest/test_files, add new sqllogictest integration tests in which the function is called through SQL against well-known data and returns the expected result.

    • Documentation for sqllogictest here

  • Add SQL reference documentation here

    • An example of this being done can be seen here

    • Run ./dev/update_function_docs.sh to update docs
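
To make the role of the Accumulator in the first step more concrete, here is a simplified sketch of the accumulate-then-merge pattern using only the standard library. The SimpleAccumulator trait and MeanAccumulator struct are hypothetical stand-ins invented for illustration. DataFusion's actual Accumulator trait works on Arrow arrays and scalar values and has different signatures, so treat this as a picture of the idea rather than of the API.

```rust
/// Hypothetical, simplified stand-in for an accumulator (not DataFusion's trait).
trait SimpleAccumulator {
    /// Fold a batch of raw input values into the running state.
    fn update_batch(&mut self, values: &[f64]);
    /// Export the intermediate state so another accumulator can merge it.
    fn state(&self) -> (f64, u64);
    /// Fold another accumulator's exported state into this one.
    fn merge(&mut self, state: (f64, u64));
    /// Produce the final result, or None if no rows were seen.
    fn evaluate(&self) -> Option<f64>;
}

/// Computes a mean by tracking a running sum and row count.
#[derive(Default)]
struct MeanAccumulator {
    sum: f64,
    count: u64,
}

impl SimpleAccumulator for MeanAccumulator {
    fn update_batch(&mut self, values: &[f64]) {
        self.sum += values.iter().sum::<f64>();
        self.count += values.len() as u64;
    }

    fn state(&self) -> (f64, u64) {
        (self.sum, self.count)
    }

    fn merge(&mut self, state: (f64, u64)) {
        self.sum += state.0;
        self.count += state.1;
    }

    fn evaluate(&self) -> Option<f64> {
        if self.count == 0 {
            None
        } else {
            Some(self.sum / self.count as f64)
        }
    }
}

fn main() {
    // Two partitions are aggregated independently...
    let mut p1 = MeanAccumulator::default();
    p1.update_batch(&[1.0, 2.0]);
    let mut p2 = MeanAccumulator::default();
    p2.update_batch(&[3.0, 4.0]);

    // ...and their partial states are merged into a final accumulator.
    let mut total = MeanAccumulator::default();
    total.merge(p1.state());
    total.merge(p2.state());
    println!("{:?}", total.evaluate()); // prints "Some(2.5)"
}
```

Keeping the sum and count as the intermediate state, rather than the mean itself, is what allows partial results from separate partitions to be combined correctly.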

How to display plans graphically

The query plans represented by LogicalPlan nodes can be graphically rendered using Graphviz.

To do so, save the output of the display_graphviz function to a file:

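use std::{fs::File, io::Write};

// Create plan somehow...
let mut output = File::create("/tmp/plan.dot")?;
// `?` propagates any write error instead of silently discarding it
write!(output, "{}", plan.display_graphviz())?;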

Then, use the dot command line tool to render it into a file that can be displayed. For example, the following command creates a /tmp/plan.pdf file:

dot -Tpdf < /tmp/plan.dot > /tmp/plan.pdf
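
The file-writing step above can be tried without DataFusion by substituting any value that implements Display. In this sketch, FakePlan is a hypothetical stand-in for the value returned by plan.display_graphviz(), the DOT text it emits is made up for illustration, and the output goes to the system temporary directory rather than a hard-coded /tmp path so the example is portable.

```rust
use std::fmt;
use std::fs::{self, File};
use std::io::Write;
use std::path::{Path, PathBuf};

/// Hypothetical stand-in for the value returned by `plan.display_graphviz()`.
struct FakePlan;

impl fmt::Display for FakePlan {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Made-up DOT content; a real plan would emit its own nodes and edges.
        writeln!(f, "digraph {{")?;
        writeln!(f, "  scan -> filter;")?;
        write!(f, "}}")
    }
}

/// Writes anything that implements `Display` to `path`, propagating I/O errors.
fn write_dot(plan: &impl fmt::Display, path: &Path) -> std::io::Result<()> {
    let mut output = File::create(path)?;
    write!(output, "{}", plan)?;
    Ok(())
}

fn main() -> std::io::Result<()> {
    let path: PathBuf = std::env::temp_dir().join("plan.dot");
    write_dot(&FakePlan, &path)?;
    // Read the file back to confirm what was written.
    println!("{}", fs::read_to_string(&path)?);
    Ok(())
}
```

The resulting file can then be passed to the dot command in the same way as a file produced from a real plan.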

How to format .md documents

We are using prettier to format .md files.

You can either install it globally with npm i -g prettier or run it as a standalone binary with npx. Using npx requires a working Node environment. Upgrading to the latest prettier is recommended (by adding --upgrade to the npm command).

$ prettier --version
2.3.0

After you’ve confirmed your prettier version, you can format all the .md files:

prettier -w {datafusion,datafusion-cli,datafusion-examples,dev,docs}/**/*.md

How to format .toml files

We use taplo to format .toml files.

For Rust developers, you can install it via:

cargo install taplo-cli --locked

Refer to the Installation section for other ways to install it.

$ taplo --version
taplo 0.9.0

After you’ve confirmed your taplo version, you can format all the .toml files:

taplo fmt

How to update protobuf/gen dependencies

The prost/tonic code can be generated by running ./regen.sh, which in turn invokes the Rust binary located in gen.

This is necessary after modifying the protobuf definitions or altering the dependencies of gen, and requires a valid installation of protoc (see installation instructions for details).

./regen.sh