Parquet

DataFusion Java reads Parquet through two entry points on SessionContext: registerParquet to expose a file as a named table, and readParquet to get a DataFrame directly.

Register a table

ctx.registerParquet("orders", "/path/to/orders.parquet");

try (DataFrame df = ctx.sql("SELECT * FROM orders LIMIT 10")) {
    df.show();
}

The file’s footer is read at registration time. The table remains in the catalog for the lifetime of the SessionContext.
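
Because the footer is read at registration, a truncated or non-Parquet file is worth catching before you register it. The sketch below is a standalone pre-check using only the JDK; it is not part of DataFusion Java and does not reproduce what the library does internally. It relies on the plain (unencrypted) Parquet layout, where a file ends with a 4-byte little-endian footer length followed by the magic bytes PAR1. The class name ParquetFooterCheck and the method footerLength are hypothetical names chosen for illustration.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;

// Hypothetical helper, not part of DataFusion Java.
public class ParquetFooterCheck {
    private static final byte[] MAGIC = "PAR1".getBytes(StandardCharsets.US_ASCII);

    /**
     * Returns the footer metadata length in bytes, or throws IOException
     * if the file does not end with a plain Parquet trailer.
     */
    public static int footerLength(Path file) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            long size = raf.length();
            // Smallest possible layout: leading magic (4) + footer length (4) + trailing magic (4).
            if (size < 12) {
                throw new IOException("Too small to be a Parquet file: " + file);
            }
            // The last 8 bytes are the footer length followed by the trailing magic.
            byte[] tail = new byte[8];
            raf.seek(size - 8);
            raf.readFully(tail);
            for (int i = 0; i < MAGIC.length; i++) {
                if (tail[4 + i] != MAGIC[i]) {
                    throw new IOException("Missing PAR1 trailer: " + file);
                }
            }
            int length = ByteBuffer.wrap(tail, 0, 4).order(ByteOrder.LITTLE_ENDIAN).getInt();
            // The footer must fit between the leading magic and the 8-byte trailer.
            if (length < 0 || length > size - 12) {
                throw new IOException("Implausible footer length " + length + ": " + file);
            }
            return length;
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(footerLength(Path.of(args[0])));
    }
}
```

Calling ParquetFooterCheck.footerLength(Path.of("/path/to/orders.parquet")) before registerParquet turns an obviously damaged file into an IOException with a clear message. It only validates the trailer, so a file that passes can still fail later if its metadata is corrupt.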

Read a DataFrame directly

try (DataFrame df = ctx.readParquet("/path/to/orders.parquet")) {
    df.show();
}

readParquet does not register anything in the catalog; it returns a DataFrame over the file directly.

ParquetReadOptions

Both entry points also accept an optional ParquetReadOptions argument to tune the underlying read. Construct one directly and chain setters:

ParquetReadOptions opts = new ParquetReadOptions()
    .fileExtension(".parquet");

ctx.registerParquet("orders", "/path/to/orders.parquet", opts);
// or pass the same options to readParquet
try (DataFrame df = ctx.readParquet("/path/to/orders.parquet", opts)) {
    df.show();
}

The supported setters track what DataFusion exposes on its Rust ParquetReadOptions builder. Inspect the class on the Java side for the exact setters available in the version you are using.
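
One way to inspect the class without opening its source is reflection. The sketch below lists the public instance methods of a class that return the class itself, which is the usual shape of a chainable setter. It uses only the JDK. FluentSetterLister and DemoOptions are hypothetical names: DemoOptions is a stand-in so the example runs without DataFusion Java on the classpath, and its hypotheticalFlag setter is invented for illustration and is not a real ParquetReadOptions setter.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper, not part of DataFusion Java.
public class FluentSetterLister {

    /** Lists public instance methods that return the declaring type, formatted as name(ParamTypes). */
    public static List<String> fluentSetters(Class<?> type) {
        return Arrays.stream(type.getMethods())
                .filter(m -> !Modifier.isStatic(m.getModifiers()))
                .filter(m -> m.getReturnType() == type)
                .map(FluentSetterLister::signature)
                .distinct()
                .sorted()
                .collect(Collectors.toList());
    }

    private static String signature(Method m) {
        String params = Arrays.stream(m.getParameterTypes())
                .map(Class::getSimpleName)
                .collect(Collectors.joining(", "));
        return m.getName() + "(" + params + ")";
    }

    /** Stand-in options class so the sketch runs on its own; not the real ParquetReadOptions. */
    public static class DemoOptions {
        public DemoOptions fileExtension(String extension) { return this; }
        public DemoOptions hypotheticalFlag(boolean value) { return this; } // invented for the demo
        public String describe() { return "demo"; }                         // not a setter, filtered out
    }

    public static void main(String[] args) throws ClassNotFoundException {
        // Pass a fully qualified class name to inspect a real class on the classpath.
        Class<?> type = args.length > 0 ? Class.forName(args[0]) : DemoOptions.class;
        fluentSetters(type).forEach(System.out::println);
    }
}
```

To use it against the real class, run it with the fully qualified name of ParquetReadOptions from the DataFusion Java version on your classpath as the first argument. The heuristic only matches methods whose return type is exactly the inspected class, so setters that return void or a supertype are not listed.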