Features¶
General¶
SQL Parser
SQL Query Planner
DataFrame API
Parallel query execution
Streaming Execution
Optimizations¶
Query Optimizer
Constant folding
Join Reordering
Limit Pushdown
Projection push down
Predicate push down
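As an illustration of what constant folding means, the sketch below rewrites a tiny expression tree so that literal-only subtrees are evaluated once at planning time. The node types (Lit, Col, BinOp) and the fold_constants function are hypothetical names invented for this example; they are not DataFusion's types or its optimizer implementation.

```python
from dataclasses import dataclass
from typing import Union


# Hypothetical expression nodes for illustration only; not DataFusion types.
@dataclass(frozen=True)
class Lit:
    value: int


@dataclass(frozen=True)
class Col:
    name: str


@dataclass(frozen=True)
class BinOp:
    op: str  # one of "+", "-", "*"
    left: "Expr"
    right: "Expr"


Expr = Union[Lit, Col, BinOp]

_OPS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
}


def fold_constants(expr: Expr) -> Expr:
    """Recursively replace literal-only subtrees with a single literal."""
    if isinstance(expr, BinOp):
        left = fold_constants(expr.left)
        right = fold_constants(expr.right)
        if isinstance(left, Lit) and isinstance(right, Lit):
            return Lit(_OPS[expr.op](left.value, right.value))
        return BinOp(expr.op, left, right)
    return expr


# price * (2 + 3) becomes price * 5; the column reference is left untouched.
folded = fold_constants(BinOp("*", Col("price"), BinOp("+", Lit(2), Lit(3))))
```

The rewrite only touches subtrees whose inputs are all literals, so the result of the query is unchanged while per-row work is reduced.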
SQL Support¶
Type coercion
Projection (SELECT)
Filter (WHERE)
Filter post-aggregate (HAVING)
Sorting (ORDER BY)
Limit (LIMIT)
Aggregate (GROUP BY)
cast / try_cast
Aggregate Functions (SUM, MEDIAN, and many more)
Schema Queries
SHOW TABLES
SHOW COLUMNS FROM <table/view>
SHOW CREATE TABLE <view>
Basic SQL Information Schema (TABLES, VIEWS, COLUMNS)
Full SQL Information Schema support
Support for nested types (ARRAY/LIST and STRUCT)
Read support
Write support
Field access (col['field'] and col[1])
struct
Postgres JSON operators (->, ->>, etc.)
Subqueries
Common Table Expressions (CTE)
Set Operations (UNION [ALL], INTERSECT [ALL], EXCEPT [ALL])
Joins (INNER, LEFT, RIGHT, FULL, CROSS)
Window Functions
Empty (OVER())
Partitioning and ordering (OVER(PARTITION BY <..> ORDER BY <..>))
Custom Window (ORDER BY time ROWS BETWEEN 2 PRECEDING AND 0 FOLLOWING)
User Defined Window and Aggregate Functions
Catalogs
Schemas (CREATE / DROP SCHEMA)
Tables (CREATE / DROP TABLE, CREATE TABLE AS SELECT)
Data Insert
INSERT INTO
COPY .. INTO ..
CSV
JSON
Parquet
Avro
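To make the custom window frame listed above concrete, here is a minimal sketch of what ORDER BY time ROWS BETWEEN 2 PRECEDING AND 0 FOLLOWING selects for each row: the current row plus up to two rows before it in time order. The function rolling_sum and the row layout (dictionaries with time and value keys) are assumptions made for this example; it models the frame semantics in plain Python and does not call DataFusion.

```python
def rolling_sum(rows, preceding=2, following=0):
    """Sum `value` over a ROWS frame around each row, after ordering by `time`."""
    ordered = sorted(rows, key=lambda r: r["time"])
    out = []
    for i, row in enumerate(ordered):
        # The frame is clipped at the partition edges, as a ROWS frame would be.
        start = max(0, i - preceding)
        end = min(len(ordered), i + following + 1)
        frame = ordered[start:end]
        out.append((row["time"], sum(r["value"] for r in frame)))
    return out


rows = [
    {"time": 3, "value": 30},
    {"time": 1, "value": 10},
    {"time": 2, "value": 20},
    {"time": 4, "value": 40},
]
result = rolling_sum(rows)
```

A frame ending at 0 FOLLOWING ends at the current row, so the first rows of the partition simply see a shorter frame.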
Runtime¶
Streaming Grouping
Streaming Window Evaluation
Memory limits enforced
Spilling (to disk) Sort
Spilling (to disk) Grouping
Spilling (to disk) Joins
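The general idea behind a spilling sort can be sketched as an external merge sort: sort bounded batches in memory, write each sorted run to a temporary file, then merge the runs. The function external_sort and its max_in_memory parameter are hypothetical names for this example; this is a simplified illustration of the technique, not a description of how DataFusion implements spilling.

```python
import heapq
import tempfile


def external_sort(values, max_in_memory=3):
    """Yield integers in sorted order, buffering at most `max_in_memory` at a time."""
    runs = []    # one temporary file per sorted run spilled to disk
    buffer = []  # in-memory batch, bounded by max_in_memory

    def spill():
        run = tempfile.TemporaryFile(mode="w+")
        run.writelines(f"{v}\n" for v in sorted(buffer))
        run.seek(0)
        runs.append(run)
        buffer.clear()

    for v in values:
        buffer.append(v)
        if len(buffer) >= max_in_memory:
            spill()
    if buffer:
        spill()

    # heapq.merge consumes each sorted run incrementally instead of loading it whole.
    merged = heapq.merge(*[(int(line) for line in run) for run in runs])
    try:
        yield from merged
    finally:
        for run in runs:
            run.close()


sorted_values = list(external_sort([5, 1, 4, 2, 3, 9, 7]))
```

With the default limit of three, the seven inputs above are spilled as three runs and then merged.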
Data Sources¶
In addition to allowing arbitrary data sources via the TableProvider
trait, DataFusion includes built-in support for the following formats:
CSV
Parquet
Primitive and Nested Types
Row Group and Data Page pruning on min/max statistics
Row Group pruning on Bloom Filters
Predicate push down (late materialization), not enabled by default
JSON
Avro
Arrow
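Row group pruning on min/max statistics, listed under Parquet above, can be pictured as follows: each row group records the minimum and maximum of a column, and a range filter can skip any group whose range cannot overlap the filter. The RowGroupStats class and prune_row_groups function are hypothetical stand-ins written for this sketch; they do not reflect DataFusion's or the Parquet library's actual API.

```python
from dataclasses import dataclass


@dataclass
class RowGroupStats:
    # Hypothetical stand-in for per-row-group column statistics in file metadata.
    index: int
    min_value: int
    max_value: int


def prune_row_groups(stats, lower, upper):
    """Return indexes of row groups that may contain values in [lower, upper]."""
    return [
        s.index
        for s in stats
        if s.max_value >= lower and s.min_value <= upper
    ]


stats = [
    RowGroupStats(0, 1, 100),
    RowGroupStats(1, 101, 200),
    RowGroupStats(2, 201, 300),
]
# A filter such as `WHERE x BETWEEN 150 AND 180` only needs row group 1.
to_read = prune_row_groups(stats, 150, 180)
```

Pruning of this kind is conservative: a kept row group may still contain no matching rows, but a skipped one cannot contain any.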