# Spark-Compatible Functions

Ballista provides an optional `spark-compat` Cargo feature that enables Spark-compatible scalar, aggregate, and window functions from the [datafusion-spark](https://crates.io/crates/datafusion-spark) crate.

## Enabling the Feature

The `spark-compat` feature must be explicitly enabled at build time. It is _not_ enabled by default.

### Building from Source

To build Ballista components with Spark-compatible functions:

```bash
# Build all components with spark-compat feature
cargo build --features spark-compat --release

# Build scheduler only
cargo build -p ballista-scheduler --features spark-compat --release

# Build executor only
cargo build -p ballista-executor --features spark-compat --release

# Build CLI with spark-compat
cargo build -p ballista-cli --features spark-compat --release
```

For more installation options, see [Installing with Cargo](deployment/cargo-install.md).

## What's Included

When the `spark-compat` feature is enabled, Ballista's function registry automatically includes additional functions from the `datafusion-spark` crate:

> **Note:** For a comprehensive list of available functions, refer to the [datafusion-spark crate documentation](https://docs.rs/datafusion-spark/latest/datafusion_spark/).

These functions are provided in addition to DataFusion's default functions.

### Scalar Functions

Spark-compatible scalar functions provide additional string, mathematical, and cryptographic operations.

### Aggregate Functions

Spark-compatible aggregate functions extend DataFusion's built-in aggregations with additional statistical and analytical functions.

### Window Functions

Spark-compatible window functions provide additional analytical capabilities for windowed operations.

## Usage Examples

Once the `spark-compat` feature is enabled at build time, the functions are automatically available in SQL queries.

### Example 1: Using SHA-1 Hash Function

```sql
SELECT sha1('Ballista') AS hash_value;
```

Output (a 40-character hex digest, abbreviated here):

```
+------------------------------------------+
| hash_value                               |
+------------------------------------------+
| 8b8e1f0e55f8f0e3c7a8... (hex string)     |
+------------------------------------------+
```

### Example 2: Using expm1 for Precision

```sql
SELECT
    expm1(0.001) AS precise_value,
    exp(0.001) - 1 AS standard_value;
```

The `expm1` function gives better numerical accuracy than computing `exp(x) - 1` directly when `x` is close to zero, because the direct form subtracts two nearly equal numbers and loses significant digits.

### Example 3: Combining with DataFusion Functions

Spark-compatible functions work alongside DataFusion's built-in functions:

```sql
SELECT
    name,
    upper(name) AS name_upper,     -- DataFusion function
    sha1(name) AS name_hash,       -- Spark-compat function
    length(name) AS name_length    -- DataFusion function
FROM users;
```

## Use Cases

The `spark-compat` feature is useful in the following situations:

- **Migrating from Spark**: You want to ease the transition with familiar function names and behaviors
- **Cross-Platform Queries**: You write queries that use similar functions across Spark and Ballista environments
- **Specific Function Needs**: You need particular Spark-style functions (such as `sha1` or `conv`) that aren't in DataFusion's default set
- **Team Familiarity**: Your team is more familiar with Spark's function library

## See Also

- [datafusion-spark crate](https://crates.io/crates/datafusion-spark) - Source of the Spark-compatible functions
- [Installing with Cargo](deployment/cargo-install.md) - Detailed installation instructions
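
## Appendix: Checking the `expm1` Accuracy Claim

The accuracy difference described in Example 2 can be reproduced outside Ballista. The following is a minimal sketch that uses only Rust's standard-library `f64::exp_m1`; it does not call Ballista or `datafusion-spark`, and whether the Spark-compatible `expm1` is implemented on top of this method is an assumption. The helper names (`naive_expm1`, `taylor_reference`, `relative_error`) and the input `1e-10` are hypothetical and chosen for illustration only.

```rust
/// Naive formulation: computes exp(x) and then subtracts 1.
/// For x near zero, exp(x) rounds to a value very close to 1.0,
/// so the subtraction cancels most of the significant digits.
fn naive_expm1(x: f64) -> f64 {
    x.exp() - 1.0
}

/// Reference value from the Taylor series x + x^2/2 + x^3/6.
/// Only suitable for very small |x|, where the omitted terms are
/// far below f64 precision.
fn taylor_reference(x: f64) -> f64 {
    x + x * x / 2.0 + x * x * x / 6.0
}

/// Relative error of `approx` with respect to `reference`.
fn relative_error(approx: f64, reference: f64) -> f64 {
    ((approx - reference) / reference).abs()
}

fn main() {
    let x = 1e-10_f64;
    let reference = taylor_reference(x);

    let naive = naive_expm1(x);
    let precise = x.exp_m1(); // standard-library equivalent of expm1

    println!(
        "exp(x) - 1 : {:e} (relative error {:e})",
        naive,
        relative_error(naive, reference)
    );
    println!(
        "exp_m1(x)  : {:e} (relative error {:e})",
        precise,
        relative_error(precise, reference)
    );
}
```

With 64-bit floats, the naive form should show a relative error many orders of magnitude larger than `exp_m1` for this input. The gap narrows as `x` moves away from zero, which is why the two columns in Example 2 look nearly identical at `0.001`.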