Features
General
- SQL Parser 
- SQL Query Planner 
- DataFrame API 
- Parallel query execution 
- Streaming Execution 
Optimizations
- Query Optimizer 
- Constant folding 
- Join reordering
- Limit pushdown
- Projection pushdown
- Predicate pushdown
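As an illustration of what constant folding means, here is a minimal sketch on a toy expression tree. The `Expr` enum and `fold_constants` function are hypothetical names invented for this example; they are not DataFusion's types or its optimizer, and integer overflow is not handled.

```rust
/// A tiny expression tree, just enough to illustrate constant folding.
/// (Hypothetical type for this sketch, not DataFusion's `Expr`.)
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Literal(i64),
    Column(String),
    Add(Box<Expr>, Box<Expr>),
    Mul(Box<Expr>, Box<Expr>),
}

/// Recursively replace sub-expressions built only from literals
/// with the literal they evaluate to.
fn fold_constants(expr: Expr) -> Expr {
    match expr {
        Expr::Add(l, r) => match (fold_constants(*l), fold_constants(*r)) {
            // Both sides are known at plan time: evaluate now.
            (Expr::Literal(a), Expr::Literal(b)) => Expr::Literal(a + b),
            // Otherwise keep the node, with folded children.
            (l, r) => Expr::Add(Box::new(l), Box::new(r)),
        },
        Expr::Mul(l, r) => match (fold_constants(*l), fold_constants(*r)) {
            (Expr::Literal(a), Expr::Literal(b)) => Expr::Literal(a * b),
            (l, r) => Expr::Mul(Box::new(l), Box::new(r)),
        },
        // Literals and column references are already as simple as possible.
        other => other,
    }
}
```

With this sketch, an expression such as `x + (2 * 3)` is rewritten to `x + 6` once, at planning time, instead of multiplying for every row.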
SQL Support
- Type coercion 
- Projection (SELECT)
- Filter (WHERE)
- Filter post-aggregate (HAVING)
- Sorting (ORDER BY)
- Limit (LIMIT)
- Aggregate (GROUP BY)
- cast / try_cast
- Aggregate Functions (SUM, MEDIAN, and many more)
- Schema Queries
  - SHOW TABLES
  - SHOW COLUMNS FROM <table/view>
  - SHOW CREATE TABLE <view>
  - Basic SQL Information Schema (TABLES, VIEWS, COLUMNS)
  - Full SQL Information Schema support
- Support for nested types (ARRAY/LIST and STRUCT)
  - Read support
  - Write support
  - Field access (col['field'] and col[1])
  - struct
  - Postgres JSON operators (->, ->>, etc.)
- Subqueries 
- Common Table Expressions (CTE) 
- Set Operations (UNION [ALL], INTERSECT [ALL], EXCEPT [ALL])
- Joins (INNER, LEFT, RIGHT, FULL, CROSS)
- Window Functions
  - Empty (OVER())
  - Partitioning and ordering (OVER(PARTITION BY <..> ORDER BY <..>))
  - Custom Window (ORDER BY time ROWS BETWEEN 2 PRECEDING AND 0 FOLLOWING)
  - User Defined Window and Aggregate Functions
- Catalogs
  - Schemas (CREATE / DROP SCHEMA)
  - Tables (CREATE / DROP TABLE, CREATE TABLE AS SELECT)
- Data Insert
  - INSERT INTO
  - COPY .. INTO ..
  - CSV
  - JSON
  - Parquet
  - Avro
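To make the custom window frame `ROWS BETWEEN 2 PRECEDING AND 0 FOLLOWING` listed above more concrete, the following sketch computes a sum over such a frame for every row of a single, already ordered partition. The function `rows_frame_sum` is a hypothetical helper written for this example; it only mirrors the frame semantics and says nothing about how DataFusion evaluates window functions internally.

```rust
/// Sum over the frame `ROWS BETWEEN <preceding> PRECEDING AND <following> FOLLOWING`
/// for every row of one partition. `values` is assumed to be already sorted
/// by the window's ORDER BY key. (Hypothetical helper for illustration.)
fn rows_frame_sum(values: &[i64], preceding: usize, following: usize) -> Vec<i64> {
    (0..values.len())
        .map(|i| {
            // The frame is clipped at the partition boundaries.
            let start = i.saturating_sub(preceding);
            let end = i.saturating_add(following).min(values.len() - 1);
            values[start..=end].iter().sum::<i64>()
        })
        .collect()
}
```

For the values `1, 2, 3, 4, 5` with two preceding rows and no following rows, each output row is the sum of the current row and up to two rows before it, giving `1, 3, 6, 9, 12`.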
 
Runtime
- Streaming Grouping 
- Streaming Window Evaluation 
- Memory limits enforced 
- Spilling (to disk) Sort 
- Spilling (to disk) Grouping 
- Spilling (to disk) Sort Merge Join 
- Spilling (to disk) Hash Join 
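The interaction between an enforced memory limit and a spilling sort can be sketched roughly as follows. Everything here is a toy under stated assumptions: `BoundedPool`, `sort_with_spill`, and `merge_runs` are hypothetical names, spilled runs are kept in a `Vec` as a stand-in for files on disk, and the limit is assumed to fit at least one row. It is not DataFusion's implementation.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// Toy memory pool with a hard byte limit (hypothetical, illustration only).
struct BoundedPool {
    limit: usize,
    used: usize,
}

impl BoundedPool {
    fn new(limit: usize) -> Self {
        Self { limit, used: 0 }
    }

    /// Reserve `bytes`, or fail and leave the accounting unchanged.
    fn try_grow(&mut self, bytes: usize) -> Result<(), String> {
        if self.used + bytes > self.limit {
            Err(format!("cannot reserve {bytes} bytes, {} in use", self.used))
        } else {
            self.used += bytes;
            Ok(())
        }
    }

    /// Return `bytes` to the pool.
    fn shrink(&mut self, bytes: usize) {
        self.used = self.used.saturating_sub(bytes);
    }
}

/// K-way merge of sorted runs using a min-heap of (value, run, position).
fn merge_runs(runs: &[Vec<i64>]) -> Vec<i64> {
    let mut heap = BinaryHeap::new();
    for (run_idx, run) in runs.iter().enumerate() {
        if let Some(&first) = run.first() {
            heap.push(Reverse((first, run_idx, 0usize)));
        }
    }
    let mut out = Vec::new();
    while let Some(Reverse((value, run_idx, pos))) = heap.pop() {
        out.push(value);
        if let Some(&next) = runs[run_idx].get(pos + 1) {
            heap.push(Reverse((next, run_idx, pos + 1)));
        }
    }
    out
}

/// Buffer rows until the pool refuses a reservation, then sort the buffer and
/// "spill" it as a run. Returns the sorted output and the number of spills.
fn sort_with_spill(input: &[i64], pool: &mut BoundedPool) -> (Vec<i64>, usize) {
    let row_size = std::mem::size_of::<i64>();
    let mut buffer: Vec<i64> = Vec::new();
    let mut runs: Vec<Vec<i64>> = Vec::new(); // stand-in for spill files on disk
    for &value in input {
        if pool.try_grow(row_size).is_err() {
            // Over the limit: sort what we hold, spill it, release its memory.
            buffer.sort();
            pool.shrink(buffer.len() * row_size);
            runs.push(std::mem::take(&mut buffer));
            pool.try_grow(row_size).expect("limit must fit at least one row");
        }
        buffer.push(value);
    }
    let spills = runs.len();
    // The final in-memory buffer becomes the last run.
    buffer.sort();
    pool.shrink(buffer.len() * row_size);
    runs.push(buffer);
    (merge_runs(&runs), spills)
}
```

The point of the design is that the operator asks the pool before growing and reacts to a refusal by spilling, rather than allocating first and failing later.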
Data Sources
In addition to allowing arbitrary data sources via the TableProvider
trait, DataFusion includes built-in support for the following formats:
- CSV 
- Parquet
  - Primitive and Nested Types
  - Row Group and Data Page pruning on min/max statistics
  - Row Group pruning on Bloom Filters
  - Predicate pushdown (late materialization), not enabled by default
- JSON 
- Avro 
- Arrow
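The idea behind row group pruning on min/max statistics, mentioned under Parquet above, can be sketched as below. `RowGroupStats`, `Predicate`, `may_match`, and `prune_row_groups` are hypothetical, simplified names for a single integer column; real Parquet statistics and DataFusion's pruning logic cover many more types and predicate shapes.

```rust
/// Min/max statistics for one column of one row group
/// (hypothetical, simplified to a single i64 column).
#[derive(Debug, Clone, Copy)]
struct RowGroupStats {
    min: i64,
    max: i64,
}

/// Predicates of the form `col <op> literal`.
#[derive(Debug, Clone, Copy)]
enum Predicate {
    Eq(i64),
    Gt(i64),
    Lt(i64),
}

/// True when the statistics cannot rule out a matching row.
/// A `false` result proves the row group can be skipped entirely.
fn may_match(stats: RowGroupStats, pred: Predicate) -> bool {
    match pred {
        Predicate::Eq(v) => stats.min <= v && v <= stats.max,
        Predicate::Gt(v) => stats.max > v,
        Predicate::Lt(v) => stats.min < v,
    }
}

/// Indices of the row groups that still have to be read.
fn prune_row_groups(groups: &[RowGroupStats], pred: Predicate) -> Vec<usize> {
    groups
        .iter()
        .enumerate()
        .filter(|(_, stats)| may_match(**stats, pred))
        .map(|(idx, _)| idx)
        .collect()
}
```

Pruning of this kind is conservative: a surviving row group may still contain no matching rows, so the filter is applied again to the rows that are actually read.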