Configuration¶

Let’s look at how we can configure DataFusion. When creating a SessionContext, you can pass in a SessionConfig and RuntimeEnvBuilder object. These two cover a wide range of options.

from datafusion import RuntimeEnvBuilder, SessionConfig, SessionContext

# create a session context with default settings
ctx = SessionContext()
print(ctx)

# create a session context with explicit runtime and config settings
runtime = RuntimeEnvBuilder().with_disk_manager_os().with_fair_spill_pool(10000000)
config = (
    SessionConfig()
    .with_create_default_catalog_and_schema(True)
    .with_default_catalog_and_schema("foo", "bar")
    .with_target_partitions(8)
    .with_information_schema(True)
    .with_repartition_joins(False)
    .with_repartition_aggregations(False)
    .with_repartition_windows(False)
    .with_parquet_pruning(False)
    .set("datafusion.execution.parquet.pushdown_filters", "true")
)
ctx = SessionContext(config, runtime)
print(ctx)

You can read more about available SessionConfig options in the rust DataFusion Configuration guide, and about RuntimeEnvBuilder options in the rust online API documentation.

Custom Table Provider

SQL