Interface TableProvider
All Known Implementing Classes:
SimpleTableProvider
Registered with a SessionContext via SessionContext.registerTable(String, TableProvider). Mirrors the role of DataFusion's Rust
TableProvider trait, but currently exposes only the methods needed for a full table
scan. Future versions may add filter/projection pushdown and multi-partition support as default
methods so that existing implementations keep working.
SimpleTableProvider is a ready-made implementation for the common case of "I have a
schema and a function that returns an ArrowReader".
Each call to scan(BufferAllocator) must return a fresh, independent ArrowReader so that queries which touch the table more than once (self-joins, UNION ALL,
repeated reads) work correctly. The returned reader is closed by the framework when the stream
ends.
The schema returned by schema() is captured once at registration time. Every batch
produced by every ArrowReader returned from scan(BufferAllocator) must conform
to it; a mismatch fails the query.
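
The fresh-reader requirement described above can be illustrated with a minimal sketch that uses only the JDK. The types BatchReader and ToyTableProvider, and the helpers fromBatches and countRows, are hypothetical stand-ins invented for this illustration. They are not part of Arrow or of this library, and the allocator argument is omitted. The sketch only shows why each scan needs its own cursor: two scans of the same table, as in a self-join, must each see every row.

```java
import java.util.Iterator;
import java.util.List;

public class FreshReaderSketch {
    /** Hypothetical stand-in for ArrowReader: yields batches, then is closed. */
    interface BatchReader extends Iterator<List<Integer>>, AutoCloseable {
        @Override
        void close();
    }

    /** Hypothetical stand-in for TableProvider, without the allocator argument. */
    interface ToyTableProvider {
        BatchReader scan();
    }

    /** Builds a provider that opens a new reader over the same batches on every scan. */
    static ToyTableProvider fromBatches(List<List<Integer>> batches) {
        return () -> {
            // A new iterator per call: no cursor state is shared between scans.
            Iterator<List<Integer>> it = batches.iterator();
            return new BatchReader() {
                @Override
                public boolean hasNext() {
                    return it.hasNext();
                }

                @Override
                public List<Integer> next() {
                    return it.next();
                }

                @Override
                public void close() {
                    // Nothing to release in this toy model.
                }
            };
        };
    }

    /** Drains one reader and closes it when the stream ends, as the framework would. */
    static int countRows(BatchReader reader) {
        int rows = 0;
        try (BatchReader r = reader) {
            while (r.hasNext()) {
                rows += r.next().size();
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        ToyTableProvider table = fromBatches(List.of(List.of(1, 2), List.of(3)));
        // A self-join scans the same table twice; both scans must see all rows.
        int left = countRows(table.scan());
        int right = countRows(table.scan());
        System.out.println(left + " " + right); // prints "3 3"
    }
}
```

Had the provider cached a single reader and handed it out on every call, the second scan in this model would start from an exhausted cursor and count zero rows. That is the failure mode the independence requirement is meant to rule out.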
Method Details
schema
org.apache.arrow.vector.types.pojo.Schema schema()
The fixed schema of this table. Called once, at registration time.
scan
org.apache.arrow.vector.ipc.ArrowReader scan(org.apache.arrow.memory.BufferAllocator allocator)
Open a fresh batch stream for this table. Called once per physical scan of the table; a single query may invoke this more than once (self-joins, UNION ALL over the same table, etc.).
Each invocation MUST return an independent ArrowReader. The reader's schema MUST equal schema(). The reader's buffers MUST be allocated from allocator (or from a child of it), because the framework needs the reader's allocator hierarchy to share a root with the one it passes here. The allocator contract mirrors the one on ScalarFunction.evaluate(org.apache.arrow.memory.BufferAllocator, org.apache.datafusion.ScalarFunctionArgs).