Class SessionContext

java.lang.Object
org.apache.datafusion.SessionContext
All Implemented Interfaces:
AutoCloseable

public final class SessionContext extends Object implements AutoCloseable
A DataFusion session context.

Instances are not thread-safe, except where a method's documentation says otherwise (only memoryUsage() and runtimeStats() are documented as thread-safe). Concurrent calls to any of sql(java.lang.String), registerParquet(java.lang.String, java.lang.String), or close() from different threads can produce a use-after-free on the native side. Callers must synchronize externally or confine each context to a single thread.
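
One way to confine a context to a single thread is to route every call through a dedicated single-thread executor. The sketch below is illustrative only: Confined, ConfinedDemo, and FakeContext are hypothetical names that are not part of this library, and FakeContext stands in for a real context so the example runs without the native binary.

```java
import java.util.Set;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Function;

/** Hypothetical helper: confines a non-thread-safe resource to one owner thread. */
final class Confined<R extends AutoCloseable> implements AutoCloseable {
    private final ExecutorService owner = Executors.newSingleThreadExecutor();
    private final R resource;

    Confined(Callable<R> factory) {
        R created;
        try {
            created = runOnOwner(factory); // the factory runs on the owner thread
        } catch (RuntimeException e) {
            owner.shutdown(); // do not leak the thread if creation fails
            throw e;
        }
        this.resource = created;
    }

    /** Runs the action on the owner thread and blocks until it returns. */
    <T> T call(Function<R, T> action) {
        return runOnOwner(() -> action.apply(resource));
    }

    /** Closes the resource on the owner thread, then stops that thread. */
    @Override
    public void close() {
        try {
            runOnOwner(() -> { resource.close(); return null; });
        } finally {
            owner.shutdown();
        }
    }

    private <T> T runOnOwner(Callable<T> task) {
        try {
            return owner.submit(task).get();
        } catch (ExecutionException e) {
            Throwable cause = e.getCause();
            if (cause instanceof RuntimeException) {
                throw (RuntimeException) cause; // keep the original exception type
            }
            throw new IllegalStateException(cause);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for owner thread", e);
        }
    }
}

/** Demo with a stand-in resource that records which threads touch it. */
final class ConfinedDemo {
    static final class FakeContext implements AutoCloseable {
        final Set<String> threads = ConcurrentHashMap.newKeySet();

        int plan(String query) {
            threads.add(Thread.currentThread().getName());
            return query.length();
        }

        @Override
        public void close() {
            threads.add(Thread.currentThread().getName());
        }
    }

    /** Calls plan() from four caller threads; returns the number of distinct threads that touched the resource. */
    static int distinctThreadsTouchingResource() {
        FakeContext fake = new FakeContext();
        try (Confined<FakeContext> ctx = new Confined<FakeContext>(() -> fake)) {
            Thread[] callers = new Thread[4];
            for (int i = 0; i < callers.length; i++) {
                callers[i] = new Thread(() -> ctx.call(c -> c.plan("SELECT 1")));
                callers[i].start();
            }
            for (Thread t : callers) {
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return fake.threads.size();
    }
}
```

With the documented API, the same wrapper could presumably be created as new Confined<SessionContext>(SessionContext::new) and used as ctx.call(c -> c.sql("SELECT 1")). This page does not say whether the returned DataFrame may be used from another thread, so the cautious choice is to collect it inside the same action. Calling call from within an action would make the owner thread wait on itself, so actions must not re-enter the wrapper.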

  • Constructor Details

    • SessionContext

      public SessionContext()
  • Method Details

    • builder

      public static SessionContextBuilder builder()
      Start configuring a SessionContext.
    • sql

      public DataFrame sql(String query)
      Parse and plan query, returning a lazy DataFrame. The query is not executed until DataFrame.collect(org.apache.arrow.memory.BufferAllocator) is called.
    • fromProto

      public DataFrame fromProto(byte[] planBytes)
      Decode a DataFusion-Proto LogicalPlanNode and return a lazy DataFrame. The plan is not executed until DataFrame.collect(org.apache.arrow.memory.BufferAllocator) is called.

      The bytes must be a serialised datafusion.LogicalPlanNode (see org.apache.datafusion.protobuf.LogicalPlanNode).

      Throws:
      RuntimeException - if the bytes are not a valid LogicalPlanNode or if logical planning fails.
    • fromSubstrait

      public DataFrame fromSubstrait(byte[] planBytes)
      Decode a Substrait Plan message and return a lazy DataFrame. The plan is not executed until DataFrame.collect(org.apache.arrow.memory.BufferAllocator) or DataFrame.executeStream(org.apache.arrow.memory.BufferAllocator) is called.

      planBytes must be a serialised substrait.proto.Plan. The plan is translated to a DataFusion DataFrame against this context's catalog: any tables referenced by the plan must already be registered (see registerCsv(java.lang.String, java.lang.String), registerParquet(java.lang.String, java.lang.String), etc.).

      This entry point lets Java callers compile plans elsewhere — Calcite via Isthmus, custom planners, or any other Substrait-emitting tool — and hand them to DataFusion without round-tripping through SQL.

      Substrait support is gated behind the substrait Cargo feature on the native crate and is off by default. To enable it, rebuild the native crate with cargo build -p datafusion-jni --features substrait (or ... --features substrait,protoc for hermetic builds that vendor protoc via cmake). If invoked against a native binary built without the feature, this method throws a RuntimeException whose message points at the flag.

      Throws:
      IllegalArgumentException - if planBytes is null.
      IllegalStateException - if this context is closed.
      RuntimeException - if the bytes are not a valid substrait.proto.Plan, if Substrait→DataFusion translation fails (e.g. the plan references an unregistered table), or if the native crate was built without the substrait feature.
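
      IllegalArgumentException and IllegalStateException are both subclasses of RuntimeException, so a caller who wants to tell the three documented cases apart has to catch the specific types first. The sketch below shows that ordering under stated assumptions: SubstraitErrors and Outcome are hypothetical names, and the decoder parameter is a stand-in for a method reference such as ctx::fromSubstrait so the example compiles without the library.

```java
import java.util.function.Function;

/** Hypothetical helper that classifies the failures listed under Throws. */
final class SubstraitErrors {
    enum Outcome { OK, BAD_ARGUMENT, CONTEXT_CLOSED, PLAN_OR_BUILD_FAILURE }

    /**
     * Runs the decoder and maps its result to an Outcome.
     * The decoder is a stand-in for ctx::fromSubstrait (an assumption for illustration).
     */
    static Outcome classify(Function<byte[], ?> decoder, byte[] planBytes) {
        try {
            decoder.apply(planBytes);
            return Outcome.OK;
        } catch (IllegalArgumentException e) {
            // Documented for a null planBytes argument.
            return Outcome.BAD_ARGUMENT;
        } catch (IllegalStateException e) {
            // Documented for a closed context.
            return Outcome.CONTEXT_CLOSED;
        } catch (RuntimeException e) {
            // Invalid plan bytes, failed translation, or a native build
            // without the substrait feature; only the message distinguishes them.
            return Outcome.PLAN_OR_BUILD_FAILURE;
        }
    }
}
```

      Placing the RuntimeException clause first would not compile, because the two more specific clauses after it would be unreachable.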
    • memoryUsage

      public MemoryUsage memoryUsage()
      Snapshot the session's memory pool: bytes currently held and the peak observed since this session was created. Thread-safe; can be polled while queries run.

      For multi-tenant attribution, place each tenant in its own SessionContext. Within a single session, the snapshot is the sum across all in-flight queries; there is no per-DataFrame breakdown today.

      The session's MemoryPool is transparently wrapped with a tracking adapter at construction time. The wrapper sits on top of whatever pool SessionContextBuilder.memoryLimit(long, double) produced (or DataFusion's default unbounded pool) and does not change pool semantics (limits, eviction, spilling).

      What this counts: bytes reserved against the MemoryPool, that is, operator state for sorts, hash joins, aggregates, repartition buffers, and anything else that uses DataFusion's MemoryReservation machinery during execution.

      What this does not count: memory held outside the pool, including record-batch buffers materialised by DataFrame.cache() (stored in an in-memory MemTable as plain Vec<RecordBatch> with no reservation), record-batch buffers that have crossed the FFI boundary into Arrow's Java allocator, and JVM-side allocations. Operator-level reservations are released as the plan unwinds, so a query that runs to completion typically returns currentBytes to ~0 even if the result set is large.

      Throws:
      IllegalStateException - if this context is closed.
      RuntimeException - if the native side has not registered a tracker for this handle (should not happen in practice, because tracker registration is done by the constructor).
    • runtimeStats

      public RuntimeStats runtimeStats()
      Snapshot operational counters from the underlying Tokio runtime: worker count, busy time, queue depth, etc. Thread-safe; can be polled while queries run.

      The runtime is process-wide rather than per-session because the JNI library drives a single shared multi-threaded Tokio runtime. The SessionContext handle is checked only to ensure the caller still has a live session; the values returned are not session-specific.

      Requires the runtime-metrics Cargo feature on the native crate (off by default). Rebuild with:

        RUSTFLAGS="--cfg tokio_unstable" cargo build -p datafusion-jni --features runtime-metrics

      If invoked against a native binary built without the feature, this method throws RuntimeException with a message pointing at the rebuild command.

      Throws:
      IllegalStateException - if this context is closed.
      RuntimeException - if the native crate was built without the runtime-metrics feature.
    • tableSchema

      public org.apache.arrow.vector.types.pojo.Schema tableSchema(String tableName)
      Return the Arrow Schema of a registered table. The schema is transferred via Arrow IPC; no BufferAllocator is required because a schema carries no buffer data.
      Throws:
      RuntimeException - if tableName is not registered in this context.
    • getOption

      public String getOption(String key)
      Read the current value of a datafusion.* config key. The key must be one DataFusion recognises (see SessionContextBuilder.setOption(String, String) for examples and the upstream configuration reference for the full list).

      datafusion.runtime.* keys (memory limit, temp directory, cache sizes, etc.) are not yet supported by this getter and will throw. Use the typed SessionContextBuilder.memoryLimit(long, double) and SessionContextBuilder.tempDirectory(String) setters at construction time instead. Round-trip support for the runtime subtree is tracked as a follow-up.

      Returns:
      the current value as a string, or null if the key is recognised but has no value set and no default.
      Throws:
      IllegalArgumentException - if key is null.
      RuntimeException - if the key is not recognised by DataFusion or is in the datafusion.runtime.* subtree.
      IllegalStateException - if this context is closed.
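
      Because the getter can return null and throws for the datafusion.runtime.* subtree, callers may want a small guard around it. The following is a minimal sketch, not part of this library: OptionReader is a hypothetical name, and the getter parameter is a stand-in for a method reference such as ctx::getOption.

```java
import java.util.Optional;
import java.util.function.Function;

/** Hypothetical guard around a getOption-style getter. */
final class OptionReader {
    /** Keys under this prefix are documented as unsupported by the getter. */
    static final String RUNTIME_PREFIX = "datafusion.runtime.";

    /**
     * Returns the option value, or empty when the key is in the runtime
     * subtree or the getter returns null (recognised key, no value, no default).
     * Unrecognised keys still surface as the getter's RuntimeException.
     */
    static Optional<String> read(Function<String, String> getter, String key) {
        if (key == null) {
            throw new IllegalArgumentException("key must not be null");
        }
        if (key.startsWith(RUNTIME_PREFIX)) {
            return Optional.empty(); // skip the call that is documented to throw
        }
        return Optional.ofNullable(getter.apply(key));
    }
}
```

      A call such as OptionReader.read(ctx::getOption, "datafusion.execution.batch_size") would then yield an Optional rather than a nullable String. Treating runtime keys as empty is a design choice of this sketch; a caller that needs to distinguish "unsupported" from "unset" should let the exception propagate instead.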
    • registerCsv

      public void registerCsv(String name, String path)
      Register a CSV file (or directory of CSV files) as a table.
    • registerCsv

      public void registerCsv(String name, String path, CsvReadOptions options)
      Register a CSV file (or directory of CSV files) as a table with the supplied CsvReadOptions.
      Throws:
      RuntimeException - if registration fails (path not found, schema inference error, etc.).
    • readCsv

      public DataFrame readCsv(String path)
      Read a CSV file as a DataFrame without registering it.
    • readCsv

      public DataFrame readCsv(String path, CsvReadOptions options)
      Read a CSV file as a DataFrame with the supplied CsvReadOptions.
      Throws:
      RuntimeException - if the read fails.
    • registerJson

      public void registerJson(String name, String path)
      Register a newline-delimited JSON file (or directory of NDJSON files) as a table.
    • registerJson

      public void registerJson(String name, String path, NdJsonReadOptions options)
      Register a newline-delimited JSON file (or directory of NDJSON files) as a table with the supplied NdJsonReadOptions.
      Throws:
      RuntimeException - if registration fails (path not found, schema inference error, etc.).
    • readJson

      public DataFrame readJson(String path)
      Read a newline-delimited JSON file as a DataFrame without registering it.
    • readJson

      public DataFrame readJson(String path, NdJsonReadOptions options)
      Read a newline-delimited JSON file as a DataFrame with the supplied NdJsonReadOptions.
      Throws:
      RuntimeException - if the read fails.
    • registerParquet

      public void registerParquet(String name, String path)
      Register a Parquet file as a table.
    • registerParquet

      public void registerParquet(String name, String path, ParquetReadOptions options)
      Register a Parquet file as a table with the supplied ParquetReadOptions.
      Throws:
      RuntimeException - if registration fails (path not found, schema mismatch, etc.).
    • readParquet

      public DataFrame readParquet(String path)
      Read a Parquet file as a DataFrame without registering it.
    • readParquet

      public DataFrame readParquet(String path, ParquetReadOptions options)
      Read a Parquet file as a DataFrame with the supplied ParquetReadOptions.
      Throws:
      RuntimeException - if the read fails.
    • registerArrow

      public void registerArrow(String name, String path)
      Register an Arrow IPC file (or directory of Arrow IPC files) as a table.
    • registerArrow

      public void registerArrow(String name, String path, ArrowReadOptions options)
      Register an Arrow IPC file (or directory of Arrow IPC files) as a table with the supplied ArrowReadOptions.
      Throws:
      IllegalArgumentException - if any of name, path, or options is null.
      RuntimeException - if registration fails (path not found, schema mismatch, etc.).
    • readArrow

      public DataFrame readArrow(String path)
      Read an Arrow IPC file as a DataFrame without registering it.
    • readArrow

      public DataFrame readArrow(String path, ArrowReadOptions options)
      Read an Arrow IPC file as a DataFrame with the supplied ArrowReadOptions.
      Throws:
      IllegalArgumentException - if path or options is null.
      RuntimeException - if the read fails.
    • registerAvro

      public void registerAvro(String name, String path)
      Register an Avro file (or directory of Avro files) as a table.
    • registerAvro

      public void registerAvro(String name, String path, AvroReadOptions options)
      Register an Avro file (or directory of Avro files) as a table with the supplied AvroReadOptions.
      Throws:
      IllegalArgumentException - if any of name, path, or options is null.
      RuntimeException - if registration fails (path not found, schema mismatch, etc.).
    • readAvro

      public DataFrame readAvro(String path)
      Read an Avro file as a DataFrame without registering it.
    • readAvro

      public DataFrame readAvro(String path, AvroReadOptions options)
      Read an Avro file as a DataFrame with the supplied AvroReadOptions.
      Throws:
      IllegalArgumentException - if path or options is null.
      RuntimeException - if the read fails.
    • registerUdf

      public void registerUdf(ScalarUdf udf)
      Register a Java-implemented scalar UDF. After registration, the function can be invoked from SQL by the UDF's name or referenced in DataFusion plans deserialised with fromProto(byte[]).

      The UDF is registered with an exact signature: the runtime will reject calls whose argument types do not match the declared ScalarFunction.argFields() exactly.

      Throws:
      RuntimeException - if registration fails (e.g., name already registered with an incompatible signature, schema serialisation failure).
    • registerTable

      public void registerTable(String name, TableProvider provider)
      Register a Java-implemented TableProvider under name. SQL queries that reference name call back into provider to fetch batches.

      TableProvider.schema() is called once here, on the calling thread, and cached on the native side. TableProvider.scan(org.apache.arrow.memory.BufferAllocator) is called once per query that touches the table, on a Tokio worker thread; it must return a fresh, independent ArrowReader on every call, with its buffers allocated from the BufferAllocator the framework supplies.

      This is the Java counterpart to DataFusion's Rust SessionContext::register_table.

      Throws:
      IllegalArgumentException - if name or provider is null.
      IllegalStateException - if provider.schema() returns null, or this context is closed.
      RuntimeException - if native registration fails.
    • tableExists

      public boolean tableExists(String name)
      Returns true if a table with the given name is registered in this session.

      This is the Java counterpart to DataFusion's Rust SessionContext::table_exist.

      Throws:
      IllegalStateException - if this context is closed.
    • deregisterTable

      public void deregisterTable(String name)
      Removes the table with the given name from this session. Does nothing if no table with that name is registered.

      This is the Java counterpart to DataFusion's Rust SessionContext::deregister_table.

      Throws:
      IllegalStateException - if this context is closed.
    • close

      public void close()
      Specified by:
      close in interface AutoCloseable