datafusion.record_batch

This module provides the classes for handling record batches.

These are typically the result of datafusion.dataframe.DataFrame.execute_stream() operations.

Classes

RecordBatch

This class is essentially a wrapper for pa.RecordBatch.

RecordBatchStream

This class represents a stream of record batches.

Module Contents

class datafusion.record_batch.RecordBatch(record_batch: datafusion._internal.RecordBatch)

This class is essentially a wrapper for pa.RecordBatch.

This constructor is generally not called by the end user.

See RecordBatchStream for how instances of this class are produced.

__arrow_c_array__(requested_schema: object | None = None) tuple[object, object]

Export the record batch via the Arrow C Data Interface.

This allows zero-copy interchange with libraries that support the Arrow PyCapsule interface.

Parameters:

requested_schema – Attempt to provide the record batch using this schema. Only straightforward projections such as column selection or reordering are applied.

Returns:

Two Arrow PyCapsule objects representing the ArrowArray and ArrowSchema.

to_pyarrow() pyarrow.RecordBatch

Convert to pa.RecordBatch.

class datafusion.record_batch.RecordBatchStream(record_batch_stream: datafusion._internal.RecordBatchStream)

This class represents a stream of record batches.

These are typically the result of an execute_stream() operation.

This constructor is typically not called by the end user.

__aiter__() typing_extensions.Self

Return an asynchronous iterator over record batches.

async __anext__() RecordBatch

Return the next RecordBatch in the stream asynchronously.

__iter__() typing_extensions.Self

Return an iterator over record batches.

__next__() RecordBatch

Return the next RecordBatch in the stream.

next() RecordBatch

See __next__() for the iterator function.
