Features#
General#
SQL Parser
SQL Query Planner
DataFrame API
Parallel query execution
Streaming Execution
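As a rough illustration of what streaming execution means in practice, the sketch below pulls batches through a small operator chain one at a time, so a downstream limit can stop the scan early. This is a conceptual model in plain Python, not DataFusion's implementation; `scan`, `filter_batches`, `limit`, and `read_log` are hypothetical names introduced only for this example.

```python
def scan(batches, read_log):
    """Yield batches one at a time, recording which ones were actually read."""
    for i, batch in enumerate(batches):
        read_log.append(i)
        yield batch


def filter_batches(stream, predicate):
    """Apply a row-level predicate to each batch as it arrives."""
    for batch in stream:
        yield [row for row in batch if predicate(row)]


def limit(stream, n):
    """Emit at most n rows, then stop pulling from upstream."""
    if n <= 0:
        return
    remaining = n
    for batch in stream:
        out = batch[:remaining]
        remaining -= len(out)
        if out:
            yield out
        if remaining == 0:
            return


read_log = []
batches = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]]
plan = limit(filter_batches(scan(batches, read_log), lambda r: r % 2 == 0), 3)
rows = [row for batch in plan for row in batch]
```

Because each operator is a generator, only the first two batches are read before the limit is satisfied; the remaining batches are never touched.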
Optimizations#
Query Optimizer
Constant folding
Join Reordering
Limit Pushdown
Projection push down
Predicate push down
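To make one of the listed optimizations concrete, here is a minimal sketch of constant folding over a toy expression tree. It is an illustration of the general technique only; the `Lit`, `Col`, and `BinOp` classes and the `fold_constants` function are hypothetical and do not correspond to DataFusion's own expression types or optimizer rules.

```python
import operator
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Lit:
    value: int


@dataclass(frozen=True)
class Col:
    name: str


@dataclass(frozen=True)
class BinOp:
    op: str
    left: "Expr"
    right: "Expr"


Expr = Union[Lit, Col, BinOp]

# Operators the toy folder knows how to evaluate at plan time.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def fold_constants(expr):
    """Replace sub-expressions made only of literals with their computed value."""
    if isinstance(expr, BinOp):
        left = fold_constants(expr.left)
        right = fold_constants(expr.right)
        if isinstance(left, Lit) and isinstance(right, Lit):
            return Lit(OPS[expr.op](left.value, right.value))
        return BinOp(expr.op, left, right)
    return expr


# price + (2 * 3) becomes price + 6
folded = fold_constants(BinOp("+", Col("price"), BinOp("*", Lit(2), Lit(3))))
```

The column reference cannot be evaluated until execution, so only the literal-only subtree is collapsed.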
SQL Support#
Type coercion
Projection (SELECT)
Filter (WHERE)
Filter post-aggregate (HAVING)
Sorting (ORDER BY)
Limit (LIMIT)
Aggregate (GROUP BY)
cast / try_cast
Aggregate Functions (SUM, MEDIAN, and many more)
Schema Queries
SHOW TABLES
SHOW COLUMNS FROM <table/view>
SHOW CREATE TABLE <view>
Basic SQL Information Schema (TABLES, VIEWS, COLUMNS)
Full SQL Information Schema support
Support for nested types (ARRAY/LIST and STRUCT)
Read support
Write support
Field access (col['field'] and col[1])
struct
Postgres JSON operators (->, ->>, etc.)
Subqueries
Common Table Expressions (CTE)
Set Operations (UNION [ALL], INTERSECT [ALL], EXCEPT [ALL])
Joins (INNER, LEFT, RIGHT, FULL, CROSS)
Window Functions
Empty (OVER())
Partitioning and ordering (OVER(PARTITION BY <..> ORDER BY <..>))
Custom Window (OVER(ORDER BY time ROWS BETWEEN 2 PRECEDING AND 0 FOLLOWING))
User Defined Window and Aggregate Functions
Catalogs
Schemas (CREATE / DROP SCHEMA)
Tables (CREATE / DROP TABLE, CREATE TABLE AS SELECT)
Data Insert
INSERT INTO
COPY .. INTO ..
CSV
JSON
Parquet
Avro
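The custom window frame listed above (ROWS BETWEEN 2 PRECEDING AND 0 FOLLOWING) covers the current row plus the two rows before it. The sketch below mimics that frame for a SUM in plain Python, assuming the input is already ordered by time. It is an illustration of the frame semantics, not DataFusion code, and `rolling_sum` is a hypothetical helper.

```python
def rolling_sum(values, preceding=2, following=0):
    """Sum each row's frame: `preceding` rows before through `following` rows after."""
    out = []
    for i in range(len(values)):
        start = max(0, i - preceding)               # frame is clipped at the first row
        end = min(len(values), i + following + 1)   # and at the last row
        out.append(sum(values[start:end]))
    return out


# Values are assumed to be sorted by the ORDER BY column already.
sums = rolling_sum([10, 20, 30, 40])
```

The first rows have shorter frames because there are fewer than two preceding rows to include.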
Runtime#
Streaming Grouping
Streaming Window Evaluation
Memory limits enforced
Spilling (to disk) Sort
Spilling (to disk) Grouping
Spilling (to disk) Sort Merge Join
Spilling (to disk) Hash Join
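For readers unfamiliar with spilling, the following is a minimal sketch of a sort that spills to disk: sorted runs are written to temporary files once an in-memory budget is reached, then merged. It shows the general external merge sort idea only and makes no claim about how DataFusion implements it; `external_sort` and the `max_in_memory` parameter (counted in rows here for simplicity) are hypothetical.

```python
import heapq
import os
import tempfile


def _write_run(sorted_values):
    """Spill one sorted run to a temporary file and return its path."""
    fd, path = tempfile.mkstemp(suffix=".run")
    with os.fdopen(fd, "w") as f:
        for v in sorted_values:
            f.write(f"{v}\n")
    return path


def _read_run(path):
    """Stream a spilled run back one value at a time."""
    with open(path) as f:
        for line in f:
            yield int(line)


def external_sort(values, max_in_memory=3):
    """Sort integers while holding at most max_in_memory of them in the buffer."""
    run_paths = []
    buffer = []
    for v in values:
        buffer.append(v)
        if len(buffer) >= max_in_memory:
            run_paths.append(_write_run(sorted(buffer)))
            buffer = []
    if buffer:
        run_paths.append(_write_run(sorted(buffer)))
    try:
        # k-way merge of the sorted runs
        return list(heapq.merge(*(_read_run(p) for p in run_paths)))
    finally:
        for p in run_paths:
            os.remove(p)


result = external_sort([9, 1, 7, 3, 8, 2, 5])
```

A real engine would track bytes rather than row counts and would stream the merged output instead of collecting it into a list.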
Data Sources#
In addition to allowing arbitrary data sources via the TableProvider
trait, DataFusion includes built-in support for the following formats:
CSV
Parquet
Primitive and Nested Types
Row Group and Data Page pruning on min/max statistics
Row Group pruning on Bloom Filters
Predicate push down (late materialization), not enabled by default
JSON
Avro
Arrow
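Row group pruning on min/max statistics, listed under Parquet above, can be pictured as an interval-overlap check: a row group is skipped when its value range cannot contain any row matching the filter. The sketch below illustrates the idea for a range filter on a single integer column. The `RowGroupStats` class and `prune_row_groups` function are hypothetical and are not part of DataFusion or any Parquet library.

```python
from dataclasses import dataclass


@dataclass
class RowGroupStats:
    index: int
    min_value: int
    max_value: int


def prune_row_groups(stats, lower, upper):
    """Return indexes of row groups whose [min, max] range overlaps [lower, upper]."""
    return [
        s.index
        for s in stats
        if s.max_value >= lower and s.min_value <= upper
    ]


stats = [
    RowGroupStats(0, 1, 100),
    RowGroupStats(1, 101, 200),
    RowGroupStats(2, 201, 300),
]
# Filter equivalent to: WHERE col BETWEEN 150 AND 220
kept = prune_row_groups(stats, lower=150, upper=220)
```

Row group 0 is skipped without being read, since its maximum value is below the filter's lower bound.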