Spark-Compatible Functions¶
Ballista provides an optional spark-compat Cargo feature that enables Spark-compatible scalar, aggregate, and window functions from the datafusion-spark crate.
Enabling the Feature¶
The spark-compat feature must be explicitly enabled at build time. It is not enabled by default.
Building from Source¶
To build Ballista components with Spark-compatible functions:
# Build all components with spark-compat feature
cargo build --features spark-compat --release
# Build scheduler only
cargo build -p ballista-scheduler --features spark-compat --release
# Build executor only
cargo build -p ballista-executor --features spark-compat --release
# Build CLI with spark-compat
cargo build -p ballista-cli --features spark-compat --release
For more installation options, see Installing with Cargo.
What’s Included¶
When the spark-compat feature is enabled, Ballista’s function registry automatically includes additional functions from the datafusion-spark crate.
Note: For a comprehensive list of available functions, refer to the datafusion-spark crate documentation. These functions are provided in addition to DataFusion’s default functions.
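As a rough illustration of what "automatically includes" means for a build-time feature, the following is a minimal, self-contained sketch. It is not Ballista's actual registry code: `build_registry` and the name sets are hypothetical, and a real registry stores function implementations rather than plain names. It only shows how a Cargo feature flag can decide which names end up registered.

```rust
use std::collections::BTreeSet;

/// Hypothetical stand-in for a function registry: just a set of function names.
/// The lint allowance silences the "unknown feature" warning that appears when
/// this file is built in a crate that does not declare `spark-compat`.
#[allow(unexpected_cfgs)]
fn build_registry() -> BTreeSet<&'static str> {
    let mut names = BTreeSet::new();

    // Illustrative subset of functions that are always present.
    for name in ["upper", "length", "exp"] {
        names.insert(name);
    }

    // `cfg!` evaluates to a compile-time constant: true only when the crate
    // is built with `--features spark-compat`.
    if cfg!(feature = "spark-compat") {
        // Illustrative subset of Spark-style functions named in this guide.
        for name in ["sha1", "expm1", "conv"] {
            names.insert(name);
        }
    }

    names
}

fn main() {
    let registry = build_registry();
    println!("spark-compat enabled: {}", cfg!(feature = "spark-compat"));
    println!("registered functions: {:?}", registry);
}
```

A real crate would more likely use the `#[cfg(feature = "spark-compat")]` attribute so that the optional dependency is not compiled at all when the feature is off; `cfg!` is used here only to keep the sketch short.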
Scalar Functions¶
Spark-compatible scalar functions provide additional string, mathematical, and cryptographic operations.
Aggregate Functions¶
Spark-compatible aggregate functions extend DataFusion’s built-in aggregations with additional statistical and analytical functions.
Window Functions¶
Spark-compatible window functions provide additional analytical capabilities for windowed operations.
Usage Examples¶
Once the spark-compat feature is enabled at build time, the functions are automatically available in SQL queries:
Example 1: Using SHA-1 Hash Function¶
SELECT sha1('Ballista') AS hash_value;
Output:
+------------------------------------------+
| hash_value                               |
+------------------------------------------+
| 8b8e1f0e55f8f0e3c7a8... (hex string)     |
+------------------------------------------+
Example 2: Using expm1 for Precision¶
SELECT
expm1(0.001) AS precise_value,
exp(0.001) - 1 AS standard_value;
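For small x, exp(x) is very close to 1, so computing exp(x) - 1 directly cancels most of the significant digits. The expm1 function computes the same quantity without that subtraction and therefore keeps better numerical precision for values near zero.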
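The precision difference can be reproduced outside of SQL. The sketch below uses Rust's standard `f64::exp_m1` as a stand-in for the SQL `expm1` function (an assumption made for illustration; it does not call Ballista). It uses a smaller input than the query above, `1e-10`, so the effect is easier to see. The helpers `rel_err` and `taylor_expm1` are hypothetical names introduced here.

```rust
/// Relative error of `approx` against a reference value.
fn rel_err(approx: f64, reference: f64) -> f64 {
    ((approx - reference) / reference).abs()
}

/// Reference value for e^x - 1 built from the first Taylor terms.
/// Only meaningful for |x| much smaller than 1 (an assumption of this sketch).
fn taylor_expm1(x: f64) -> f64 {
    x + x * x / 2.0 + x * x * x / 6.0
}

fn main() {
    let x = 1e-10_f64;
    let reference = taylor_expm1(x);

    // Naive form: exp(x) rounds to a value near 1.0, then the subtraction
    // discards most of the digits that carried information about x.
    let naive = x.exp() - 1.0;

    // Direct form: computes e^x - 1 without the intermediate rounding near 1.0.
    let direct = x.exp_m1();

    println!("naive  = {:e}, relative error = {:e}", naive, rel_err(naive, reference));
    println!("direct = {:e}, relative error = {:e}", direct, rel_err(direct, reference));
}
```

Running this should show a relative error for the naive form that is many orders of magnitude larger than for the direct form.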
Example 3: Combining with DataFusion Functions¶
Spark-compatible functions work alongside DataFusion’s built-in functions:
SELECT
name,
upper(name) AS name_upper, -- DataFusion function
sha1(name) AS name_hash, -- Spark-compat function
length(name) AS name_length -- DataFusion function
FROM users;
Use Cases¶
The spark-compat feature is useful in the following situations:
Migrating from Spark: Familiar function names and behaviors ease the transition.
Cross-Platform Queries: Queries can use similar functions across Spark and Ballista environments.
Specific Function Needs: You need particular Spark-style functions (such as sha1 or conv) that aren’t in DataFusion’s default set.
Team Familiarity: Your team is more familiar with Spark’s function library.
See Also¶
datafusion-spark crate - Source of the Spark-compatible functions
Installing with Cargo - Detailed installation instructions