Parquet
DataFusion Java reads Parquet through two entry points on SessionContext:
registerParquet, which exposes a file as a named table that SQL queries can
reference, and readParquet, which returns a DataFrame directly.
Register a table
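// Expose the file in the catalog under the table name "orders".
ctx.registerParquet("orders", "/path/to/orders.parquet");
// Query it by name; try-with-resources closes the DataFrame afterwards.
try (DataFrame df = ctx.sql("SELECT * FROM orders LIMIT 10")) {
    df.show();
}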
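registerParquet reads the file's footer (the Parquet metadata that holds the
schema) when the table is registered, not when it is first queried. The table
then stays in the catalog for the lifetime of the SessionContext.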
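To make "reading the footer" concrete: a Parquet file begins with the 4-byte magic PAR1 and ends with its file metadata (schema, row-group locations, statistics), followed by a 4-byte little-endian length of that metadata and the magic PAR1 again. The sketch below uses only java.nio and is independent of DataFusion Java; the class name ParquetFooter is hypothetical. It locates the footer and returns the metadata length but does not decode the Thrift-encoded metadata, so it illustrates the file layout, not how DataFusion performs registration.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Arrays;

/** Hypothetical helper: locates the footer of a Parquet file (no Thrift decoding). */
final class ParquetFooter {
    private static final byte[] MAGIC = "PAR1".getBytes(StandardCharsets.US_ASCII);

    /** Returns the length in bytes of the metadata block stored in the footer. */
    static int metadataLength(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            long size = ch.size();
            // Leading "PAR1" + 4-byte length + trailing "PAR1" is the bare minimum.
            if (size < 12) {
                throw new IOException("too small to be a Parquet file: " + size + " bytes");
            }
            // The last 8 bytes hold the metadata length (little-endian int32) and "PAR1".
            ByteBuffer tail = ByteBuffer.allocate(8).order(ByteOrder.LITTLE_ENDIAN);
            readFully(ch, tail, size - 8);
            tail.flip();
            int length = tail.getInt();
            byte[] magic = new byte[4];
            tail.get(magic);
            if (!Arrays.equals(magic, MAGIC)) {
                throw new IOException("trailing PAR1 magic not found");
            }
            // The metadata must fit between the leading magic and the 8-byte tail.
            if (length < 0 || length > size - 12) {
                throw new IOException("invalid metadata length: " + length);
            }
            return length;
        }
    }

    /** Fills buf from the channel starting at position, looping over short reads. */
    private static void readFully(FileChannel ch, ByteBuffer buf, long position) throws IOException {
        while (buf.hasRemaining()) {
            if (ch.read(buf, position + buf.position()) < 0) {
                throw new IOException("unexpected end of file");
            }
        }
    }
}
```

Locating the footer only requires the last 8 bytes plus the metadata block, independent of how much row data precedes it.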
Read a DataFrame directly
try (DataFrame df = ctx.readParquet("/path/to/orders.parquet")) {
    df.show();
}
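readParquet does not add anything to the catalog; it returns a DataFrame for
the file without requiring a table name. Because nothing is registered, the
file cannot be referenced by name from ctx.sql.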
ParquetReadOptions
Both entry points have overloads that take a ParquetReadOptions argument to
tune the underlying read. Create one with its constructor and chain setters:
// Select input files by the ".parquet" extension.
ParquetReadOptions opts = new ParquetReadOptions()
        .fileExtension(".parquet");
// Pass the same options to either entry point.
ctx.registerParquet("orders", "/path/to/orders.parquet", opts);
// or
try (DataFrame df = ctx.readParquet("/path/to/orders.parquet", opts)) {
    df.show();
}
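The fileExtension setter corresponds to the Rust option of the same purpose, which DataFusion documents as selecting only files with the given extension for input; it is mainly relevant when the path names a directory. Assuming the Java setter behaves the same way, the selection amounts to a suffix match on file names, as in the following sketch. ExtensionFilter is a hypothetical helper, not part of DataFusion Java, and it only inspects the directory's immediate entries, a simplification.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical helper approximating extension-based file selection; not DataFusion code. */
final class ExtensionFilter {
    /**
     * Returns the regular files directly inside dir whose names end with extension,
     * sorted so the result is deterministic. Subdirectories are not descended into.
     */
    static List<Path> select(Path dir, String extension) throws IOException {
        // Files.list holds an open directory handle, so close the stream when done.
        try (Stream<Path> entries = Files.list(dir)) {
            return entries
                    .filter(Files::isRegularFile)
                    .filter(p -> p.getFileName().toString().endsWith(extension))
                    .sorted()
                    .collect(Collectors.toList());
        }
    }
}
```

For a directory holding part-0.parquet, part-1.parquet and a _SUCCESS marker, ExtensionFilter.select(dir, ".parquet") keeps the two part-*.parquet files and skips the marker.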
The setters follow the options on DataFusion's Rust ParquetReadOptions
builder, but coverage can differ between releases. Check the Java
ParquetReadOptions class in the version you use for the exact set.
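One way to inspect the class without reading its source is reflection. The hypothetical helper below lists public, non-static, single-argument methods whose return type is the class itself, which is the shape of chainable setters such as fileExtension. It is a heuristic: a setter with more than one argument, or one returning a different type, would be missed.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical helper: lists methods shaped like fluent setters on a class. */
final class FluentSetters {
    /**
     * Returns "name(ParamType)" for every public instance method that takes
     * exactly one argument and returns cls itself, sorted and de-duplicated.
     */
    static List<String> of(Class<?> cls) {
        return Arrays.stream(cls.getMethods())
                .filter(m -> !Modifier.isStatic(m.getModifiers()))
                .filter(m -> !m.isBridge())           // skip compiler-generated bridges
                .filter(m -> m.getParameterCount() == 1)
                .filter(m -> m.getReturnType() == cls) // chainable: returns its own type
                .map(m -> m.getName() + "(" + m.getParameterTypes()[0].getSimpleName() + ")")
                .sorted()
                .distinct()
                .collect(Collectors.toList());
    }
}
```

With DataFusion Java on the classpath, FluentSetters.of(ParquetReadOptions.class).forEach(System.out::println) would list the chainable setters; assuming fileExtension takes a String, as the example on this page suggests, it would appear as fileExtension(String). The helper works on any class, for example StringBuilder, whose list includes append(String).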