Class SessionContext
- All Implemented Interfaces:
AutoCloseable
Instances are not thread-safe. Concurrent calls to any of sql(java.lang.String),
registerParquet(java.lang.String, java.lang.String), or close() from different threads can produce a use-after-free
on the native side. Callers must externally synchronize, or confine each context to a single
thread.
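The single-thread confinement option can be sketched with a dedicated executor. This is a minimal illustration under stated assumptions, not part of the library: Confined and its methods are hypothetical names, and a real caller would pass a SessionContext where the generic resource appears.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Function;

// Hypothetical helper (not a library class): routes every use of a
// non-thread-safe AutoCloseable resource through one owner thread.
final class Confined<R extends AutoCloseable> implements AutoCloseable {
    private final ExecutorService owner = Executors.newSingleThreadExecutor();
    private final R resource;

    Confined(R resource) {
        this.resource = resource;
    }

    // Runs `action` on the owner thread and blocks until it returns.
    <T> T call(Function<R, T> action) {
        return runOnOwner(() -> action.apply(resource));
    }

    // close() is queued on the same thread, so it cannot overlap a call().
    @Override
    public void close() {
        try {
            runOnOwner(() -> {
                resource.close();
                return null;
            });
        } finally {
            owner.shutdown();
        }
    }

    private <T> T runOnOwner(Callable<T> task) {
        try {
            return owner.submit(task).get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for the owner thread", e);
        } catch (ExecutionException e) {
            // Surface the failure raised on the owner thread to the caller.
            throw new RuntimeException(e.getCause());
        }
    }
}
```

Because the executor has exactly one thread, calls from any number of caller threads are serialized, and close() runs only after previously submitted work has finished. Calls made after close() are rejected by the executor rather than reaching the closed resource.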
-
Constructor Summary
Constructors -
Method Summary
Modifier and Type | Method | Description
static SessionContextBuilder | builder() | Start configuring a SessionContext.
void | close() |
void | deregisterTable(String name) | Removes the table with the given name from this session.
DataFrame | fromProto(byte[] planBytes) | Decode a DataFusion-Proto LogicalPlanNode and return a lazy DataFrame.
DataFrame | fromSubstrait(byte[] planBytes) | Decode a Substrait Plan message and return a lazy DataFrame.
String | getOption(String key) | Read the current value of a datafusion.* config key.
 | memoryUsage() | Snapshot the session's memory pool: bytes currently held and the peak observed since this session was created.
DataFrame | readArrow(String path) | Read an Arrow IPC file as a DataFrame without registering it.
DataFrame | readArrow(String path, ArrowReadOptions options) | Read an Arrow IPC file as a DataFrame with the supplied ArrowReadOptions.
DataFrame | readAvro(String path) | Read an Avro file as a DataFrame without registering it.
DataFrame | readAvro(String path, AvroReadOptions options) | Read an Avro file as a DataFrame with the supplied AvroReadOptions.
DataFrame | readCsv(String path) | Read a CSV file as a DataFrame without registering it.
DataFrame | readCsv(String path, CsvReadOptions options) | Read a CSV file as a DataFrame with the supplied CsvReadOptions.
DataFrame | readJson(String path) | Read a newline-delimited JSON file as a DataFrame without registering it.
DataFrame | readJson(String path, NdJsonReadOptions options) | Read a newline-delimited JSON file as a DataFrame with the supplied NdJsonReadOptions.
DataFrame | readParquet(String path) | Read a parquet file as a DataFrame without registering it.
DataFrame | readParquet(String path, ParquetReadOptions options) | Read a parquet file as a DataFrame with the supplied ParquetReadOptions.
void | registerArrow(String name, String path) | Register an Arrow IPC file (or directory of Arrow IPC files) as a table.
void | registerArrow(String name, String path, ArrowReadOptions options) | Register an Arrow IPC file (or directory of Arrow IPC files) as a table with the supplied ArrowReadOptions.
void | registerAvro(String name, String path) | Register an Avro file (or directory of Avro files) as a table.
void | registerAvro(String name, String path, AvroReadOptions options) | Register an Avro file (or directory of Avro files) as a table with the supplied AvroReadOptions.
void | registerCsv(String name, String path) |
void | registerCsv(String name, String path, CsvReadOptions options) | Register a CSV file (or directory of CSV files) as a table with the supplied CsvReadOptions.
void | registerJson(String name, String path) |
void | registerJson(String name, String path, NdJsonReadOptions options) | Register a newline-delimited JSON file (or directory of NDJSON files) as a table with the supplied NdJsonReadOptions.
void | registerParquet(String name, String path) |
void | registerParquet(String name, String path, ParquetReadOptions options) | Register a parquet file as a table with the supplied ParquetReadOptions.
void | registerTable(String name, TableProvider provider) | Register a Java-implemented TableProvider under name.
void | registerUdf(ScalarUdf udf) | Register a Java-implemented scalar UDF.
 | runtimeStats() | Snapshot operational counters from the underlying Tokio runtime: worker count, busy time, queue depth, etc.
DataFrame | sql(String query) | Parse and plan query, returning a lazy DataFrame.
boolean | tableExists(String name) | Returns true if a table with the given name is registered in this session.
org.apache.arrow.vector.types.pojo.Schema | tableSchema(String tableName) | Return the Arrow Schema of a registered table.
-
Constructor Details
-
SessionContext
public SessionContext()
-
-
Method Details
-
builder
Start configuring a SessionContext.
-
sql
Parse and plan query, returning a lazy DataFrame. The query is not executed until DataFrame.collect(org.apache.arrow.memory.BufferAllocator) is called.
-
fromProto
Decode a DataFusion-Proto LogicalPlanNode and return a lazy DataFrame. The plan is not executed until DataFrame.collect(org.apache.arrow.memory.BufferAllocator) is called.
The bytes must be a serialized datafusion.LogicalPlanNode (see org.apache.datafusion.protobuf.LogicalPlanNode).
- Throws:
RuntimeException - if the bytes are not a valid LogicalPlanNode or if logical planning fails.
-
fromSubstrait
Decode a Substrait Plan message and return a lazy DataFrame. The plan is not executed until DataFrame.collect(org.apache.arrow.memory.BufferAllocator) or DataFrame.executeStream(org.apache.arrow.memory.BufferAllocator) is called.
planBytes must be a serialised substrait.proto.Plan. The plan is translated to a DataFusion DataFrame against this context's catalog: any tables referenced by the plan must already be registered (see registerCsv(java.lang.String, java.lang.String), registerParquet(java.lang.String, java.lang.String), etc.).
This entry point lets Java callers compile plans elsewhere (Calcite via Isthmus, custom planners, or any other Substrait-emitting tool) and hand them to DataFusion without round-tripping through SQL.
Substrait support is gated behind the substrait Cargo feature on the native crate and is off by default. Rebuild the native crate with cargo build -p datafusion-jni --features substrait (or ... --features substrait,protoc for hermetic builds that vendor protoc via cmake) to enable it. If invoked against a native binary built without the feature, this method throws RuntimeException pointing at the flag.
- Throws:
IllegalArgumentException - if planBytes is null.
IllegalStateException - if this context is closed.
RuntimeException - if the bytes are not a valid substrait.proto.Plan, if Substrait→DataFusion translation fails (e.g. the plan references an unregistered table), or if the native crate was built without the substrait feature.
-
memoryUsage
Snapshot the session's memory pool: bytes currently held and the peak observed since this session was created. Thread-safe; can be polled while queries run.
For multi-tenant attribution, place each tenant in its own SessionContext. Within a single session, the snapshot is the sum across all in-flight queries -- there is no per-DataFrame breakdown today.
The session's MemoryPool is wrapped transparently with a tracking adapter at construction time; the wrapper layers on top of whatever pool SessionContextBuilder.memoryLimit(long, double) produced (or DataFusion's default unbounded pool) and does not change pool semantics (limits, eviction, spilling).
What this counts: bytes reserved against the MemoryPool -- operator state for sorts, hash joins, aggregates, repartition buffers, and anything else that uses DataFusion's MemoryReservation machinery during execution.
What this does not count: memory held outside the pool, including record-batch buffers materialised by DataFrame.cache() (stored in an in-memory MemTable as plain Vec<RecordBatch> with no reservation), record-batch buffers that have crossed the FFI boundary into Arrow's Java allocator, and JVM-side allocations. Operator-level reservations are released as the plan unwinds, so a query that runs to completion typically returns currentBytes to ~0 even if the result set is large.
- Throws:
IllegalStateException - if this context is closed.
RuntimeException - if the native side has not registered a tracker for this handle (should not happen in practice -- tracker registration is done by the constructor).
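The relationship between the "currently held" and "peak" figures can be pictured as two counters updated on every reservation change. The sketch below is a conceptual Java illustration only, not the native tracking adapter; TrackingCounter, grow, shrink, and peakBytes are hypothetical names chosen for this example.

```java
import java.util.concurrent.atomic.AtomicLong;

// Conceptual stand-in for a tracking adapter: it observes grow/shrink
// calls and records the bytes currently reserved plus the highest total seen.
final class TrackingCounter {
    private final AtomicLong current = new AtomicLong();
    private final AtomicLong peak = new AtomicLong();

    // Called when an operator reserves `bytes` more memory.
    void grow(long bytes) {
        long now = current.addAndGet(bytes);
        // Raise the peak only if the new total exceeds it.
        peak.accumulateAndGet(now, Math::max);
    }

    // Called when an operator releases `bytes`.
    void shrink(long bytes) {
        current.addAndGet(-bytes);
    }

    long currentBytes() {
        return current.get();
    }

    long peakBytes() {
        return peak.get();
    }
}
```

Under this model, a query that reserves 150 bytes in total and then releases them leaves the current figure at 0 while the peak stays at 150, which matches the behaviour described above for queries that run to completion.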
-
runtimeStats
Snapshot operational counters from the underlying Tokio runtime: worker count, busy time, queue depth, etc. Thread-safe; can be polled while queries run.
The runtime is process-wide rather than per-session because the JNI library drives a single shared multi-threaded Tokio runtime. The SessionContext handle is checked only to ensure the caller still has a live session; the values returned are not session-specific.
Requires the runtime-metrics Cargo feature on the native crate (off by default). Rebuild with:
RUSTFLAGS="--cfg tokio_unstable" cargo build -p datafusion-jni --features runtime-metrics
If invoked against a native binary built without the feature, this method throws RuntimeException with a message pointing at the rebuild command.
- Throws:
IllegalStateException - if this context is closed.
RuntimeException - if the native crate was built without the runtime-metrics feature.
-
tableSchema
Return the Arrow Schema of a registered table. Transferred via Arrow IPC; no BufferAllocator is required because a schema carries no buffer data.
- Throws:
RuntimeException - if tableName is not registered in this context.
-
getOption
Read the current value of a datafusion.* config key. The key must be one DataFusion recognises (see SessionContextBuilder.setOption(String, String) for examples and the upstream configuration reference for the full list).
datafusion.runtime.* keys (memory limit, temp directory, cache sizes, etc.) are not yet supported by this getter and will throw. Use the typed SessionContextBuilder.memoryLimit(long, double) and SessionContextBuilder.tempDirectory(String) setters at construction time instead. Round-trip support for the runtime subtree is tracked as a follow-up.
- Returns:
the current value as a string, or null if the key is recognised but has no value set and no default.
- Throws:
IllegalArgumentException - if key is null.
RuntimeException - if the key is not recognised by DataFusion or is in the datafusion.runtime.* subtree.
IllegalStateException - if this context is closed.
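Since keys in the datafusion.runtime.* subtree are rejected, a caller may want to filter them out before invoking the getter. The following is a hedged sketch: OptionReads and readOrDefault are hypothetical helpers, and the getter parameter stands in for a method reference to the real getter (for example ctx::getOption) so the example stays self-contained.

```java
import java.util.function.Function;

// Hypothetical helper (not a library class) for reading config keys safely.
final class OptionReads {
    private static final String RUNTIME_PREFIX = "datafusion.runtime.";

    // True for keys in the datafusion.runtime.* subtree, which the getter
    // described above does not support.
    static boolean isRuntimeKey(String key) {
        return key.startsWith(RUNTIME_PREFIX);
    }

    // `getter` is a stand-in for the real lookup. Runtime keys and keys with
    // no value fall back to `fallback` instead of reaching the getter.
    static String readOrDefault(Function<String, String> getter, String key, String fallback) {
        if (key == null) {
            throw new IllegalArgumentException("key must not be null");
        }
        if (isRuntimeKey(key)) {
            return fallback;
        }
        String value = getter.apply(key);
        return value != null ? value : fallback;
    }
}
```

Returning a fallback for runtime keys is a design choice made for this sketch; a caller that prefers to fail loudly could throw instead. Unrecognised keys are deliberately not intercepted here, so the exception from the underlying getter still propagates.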
-
registerCsv
-
registerCsv
Register a CSV file (or directory of CSV files) as a table with the supplied CsvReadOptions.
- Throws:
RuntimeException - if registration fails (path not found, schema inference error, etc.).
-
readCsv
Read a CSV file as a DataFrame without registering it.
-
readCsv
Read a CSV file as a DataFrame with the supplied CsvReadOptions.
- Throws:
RuntimeException - if the read fails.
-
registerJson
-
registerJson
Register a newline-delimited JSON file (or directory of NDJSON files) as a table with the supplied NdJsonReadOptions.
- Throws:
RuntimeException - if registration fails (path not found, schema inference error, etc.).
-
readJson
Read a newline-delimited JSON file as a DataFrame without registering it.
-
readJson
Read a newline-delimited JSON file as a DataFrame with the supplied NdJsonReadOptions.
- Throws:
RuntimeException - if the read fails.
-
registerParquet
-
registerParquet
Register a parquet file as a table with the supplied ParquetReadOptions.
- Throws:
RuntimeException - if registration fails (path not found, schema mismatch, etc.).
-
readParquet
Read a parquet file as a DataFrame without registering it.
-
readParquet
Read a parquet file as a DataFrame with the supplied ParquetReadOptions.
- Throws:
RuntimeException - if the read fails.
-
registerArrow
Register an Arrow IPC file (or directory of Arrow IPC files) as a table.
-
registerArrow
Register an Arrow IPC file (or directory of Arrow IPC files) as a table with the supplied ArrowReadOptions.
- Throws:
IllegalArgumentException - if any of name, path, or options is null.
RuntimeException - if registration fails (path not found, schema mismatch, etc.).
-
readArrow
Read an Arrow IPC file as a DataFrame without registering it.
-
readArrow
Read an Arrow IPC file as a DataFrame with the supplied ArrowReadOptions.
- Throws:
IllegalArgumentException - if path or options is null.
RuntimeException - if the read fails.
-
registerAvro
Register an Avro file (or directory of Avro files) as a table.
-
registerAvro
Register an Avro file (or directory of Avro files) as a table with the supplied AvroReadOptions.
- Throws:
IllegalArgumentException - if any of name, path, or options is null.
RuntimeException - if registration fails (path not found, schema mismatch, etc.).
-
readAvro
Read an Avro file as a DataFrame without registering it.
-
readAvro
Read an Avro file as a DataFrame with the supplied AvroReadOptions.
- Throws:
IllegalArgumentException - if path or options is null.
RuntimeException - if the read fails.
-
registerUdf
Register a Java-implemented scalar UDF. After registration, the function can be invoked by SQL via the UDF's name or referenced in DataFusion plans deserialised with fromProto(byte[]).
The UDF is registered with an exact signature: the runtime will reject calls whose argument types do not match the declared ScalarFunction.argFields() exactly.
- Throws:
RuntimeException - if registration fails (e.g., name already registered with an incompatible signature, schema serialisation failure).
-
registerTable
Register a Java-implemented TableProvider under name. SQL queries that reference name call back into provider to fetch batches.
TableProvider.schema() is called once here, on the calling thread, and cached on the native side. TableProvider.scan(org.apache.arrow.memory.BufferAllocator) is called once per query that touches the table, on a Tokio worker thread; it must return a fresh, independent ArrowReader on every call, with its buffers allocated from the BufferAllocator the framework supplies.
This is the Java counterpart to DataFusion's Rust SessionContext::register_table.
- Throws:
IllegalArgumentException - if name or provider is null.
IllegalStateException - if provider.schema() returns null, or this context is closed.
RuntimeException - if native registration fails.
-
tableExists
Returns true if a table with the given name is registered in this session.
This is the Java counterpart to DataFusion's Rust SessionContext::table_exist.
- Throws:
IllegalStateException - if this context is closed.
-
deregisterTable
Removes the table with the given name from this session. Does nothing if no table with that name is registered.
This is the Java counterpart to DataFusion's Rust SessionContext::deregister_table.
- Throws:
IllegalStateException - if this context is closed.
-
close
public void close()
- Specified by:
close in interface AutoCloseable
-