Interface TableProvider

All Known Implementing Classes:
SimpleTableProvider

public interface TableProvider
A Java-implemented table that can be registered with a SessionContext via SessionContext.registerTable(String, TableProvider). Mirrors the role of DataFusion's Rust TableProvider trait, but currently exposes only the methods needed for a full table scan. Future versions may add filter/projection pushdown and multi-partition support as default methods, so that existing implementations keep working.

SimpleTableProvider is a ready-made implementation for the common case of "I have a schema and a function that returns an ArrowReader".

Each call to scan(BufferAllocator) must return a fresh, independent ArrowReader so that queries which touch the table more than once (self-joins, UNION ALL, repeated reads) work correctly. The returned reader is closed by the framework when the stream ends.

The schema returned by schema() is captured once at registration time. Every batch produced by every ArrowReader returned from scan(BufferAllocator) must conform to it; a mismatch fails the query.
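The two rules above (a fresh reader per scan, and every batch conforming to the schema) can be illustrated with a minimal, library-free sketch. Everything in it is a hypothetical stand-in rather than the real API: SketchTable models a provider, a list of column names models the Schema, and a plain Iterator models the ArrowReader. It only shows why handing out a shared reader breaks a query that scans the table twice.

```java
import java.util.Iterator;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical stand-in, NOT the real TableProvider: a "schema" is a list of
// column names and a "reader" is an Iterator over rows.
final class SketchTable {
    private final List<String> schema;
    private final Supplier<Iterator<List<Object>>> readerFactory;

    SketchTable(List<String> schema, Supplier<Iterator<List<Object>>> readerFactory) {
        this.schema = schema;
        this.readerFactory = readerFactory;
    }

    List<String> schema() {
        return schema;
    }

    // Delegates to the factory; whether the result is fresh depends on the factory.
    Iterator<List<Object>> scan() {
        return readerFactory.get();
    }
}

final class ScanContractSketch {
    static final List<List<Object>> ROWS =
            List.of(List.<Object>of(1, "a"), List.<Object>of(2, "b"));

    // Correct: the factory builds a new iterator on every scan.
    static SketchTable freshPerScan() {
        return new SketchTable(List.of("id", "name"), ROWS::iterator);
    }

    // Broken: every scan hands back the same single-use iterator.
    static SketchTable sharedReader() {
        Iterator<List<Object>> shared = ROWS.iterator();
        return new SketchTable(List.of("id", "name"), () -> shared);
    }

    // Plays the framework's role: consume one reader to its end and reject
    // any row whose width differs from the table's schema.
    static int drain(SketchTable table, Iterator<List<Object>> reader) {
        int rows = 0;
        while (reader.hasNext()) {
            if (reader.next().size() != table.schema().size()) {
                throw new IllegalStateException("row does not match schema");
            }
            rows++;
        }
        return rows;
    }

    public static void main(String[] args) {
        SketchTable good = freshPerScan();
        // Two scans, as in a self-join: both see all rows.
        System.out.println(drain(good, good.scan()) + " " + drain(good, good.scan())); // prints "2 2"

        SketchTable bad = sharedReader();
        // The second scan receives an already exhausted reader.
        System.out.println(drain(bad, bad.scan()) + " " + drain(bad, bad.scan())); // prints "2 0"
    }
}
```

In a real implementation the same idea would presumably mean constructing a new ArrowReader inside scan(BufferAllocator) rather than caching one in a field.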

  • Method Summary

    Modifier and Type
    Method
    Description
    org.apache.arrow.vector.ipc.ArrowReader
    scan(org.apache.arrow.memory.BufferAllocator allocator)
    Open a fresh batch stream for this table.
    org.apache.arrow.vector.types.pojo.Schema
    schema()
    The fixed schema of this table.
  • Method Details

    • schema

      org.apache.arrow.vector.types.pojo.Schema schema()
      The fixed schema of this table. Called once, at registration time.
    • scan

      org.apache.arrow.vector.ipc.ArrowReader scan(org.apache.arrow.memory.BufferAllocator allocator)
      Open a fresh batch stream for this table. Called once per physical scan of the table — a single query may invoke this more than once (self-joins, UNION ALL over the same table, etc.).

      Each invocation MUST return an independent ArrowReader. The reader's schema MUST equal schema(). The reader's buffers MUST be allocated from allocator (or from a child of it) — the framework needs the reader's allocator hierarchy to share a root with the one it passes here. The allocator contract mirrors the one on ScalarFunction.evaluate(org.apache.arrow.memory.BufferAllocator, org.apache.datafusion.ScalarFunctionArgs).
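The "share a root" requirement can be pictured with a small, library-free model of an allocator tree. SketchAllocator below is a hypothetical stand-in, not Arrow's BufferAllocator, and whether the framework performs exactly this check is an assumption. In Arrow itself, the corresponding operations are BufferAllocator.newChildAllocator(name, initReservation, maxAllocation) for deriving a child and BufferAllocator.getRoot() for finding the root.

```java
// Hypothetical stand-in for an allocator tree; NOT Arrow's BufferAllocator.
final class SketchAllocator {
    private final String name;
    private final SketchAllocator parent; // null for a root allocator

    private SketchAllocator(String name, SketchAllocator parent) {
        this.name = name;
        this.parent = parent;
    }

    static SketchAllocator newRoot(String name) {
        return new SketchAllocator(name, null);
    }

    // Derives a child that stays inside this allocator's hierarchy.
    SketchAllocator newChild(String childName) {
        return new SketchAllocator(childName, this);
    }

    // Walks parent links until the top of the hierarchy is reached.
    SketchAllocator root() {
        SketchAllocator current = this;
        while (current.parent != null) {
            current = current.parent;
        }
        return current;
    }

    boolean sharesRootWith(SketchAllocator other) {
        return root() == other.root();
    }

    @Override
    public String toString() {
        return name;
    }
}

final class AllocatorContractSketch {
    public static void main(String[] args) {
        SketchAllocator frameworkRoot = SketchAllocator.newRoot("framework");
        // The allocator the framework would hand to scan(...).
        SketchAllocator passedToScan = frameworkRoot.newChild("query");

        // Allowed: the reader allocates from a child of the passed allocator.
        SketchAllocator readerAllocator = passedToScan.newChild("my-table-scan");
        System.out.println(readerAllocator.sharesRootWith(passedToScan)); // prints "true"

        // Not allowed: the reader allocates from an unrelated root of its own.
        SketchAllocator privateRoot = SketchAllocator.newRoot("private");
        System.out.println(privateRoot.sharesRootWith(passedToScan)); // prints "false"
    }
}
```

The practical consequence for an implementation is to pass allocator, or a child derived from it, to whatever builds the reader, instead of creating a separate root allocator inside scan.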