datafusion.record_batch¶
This module provides classes for handling record batches. These are typically the result of datafusion.dataframe.DataFrame.execute_stream() operations.
Classes¶
- RecordBatch: This class is essentially a wrapper for pa.RecordBatch.
- RecordBatchStream: This class represents a stream of record batches.
Module Contents¶
- class datafusion.record_batch.RecordBatch(record_batch: datafusion._internal.RecordBatch)¶
This class is essentially a wrapper for pa.RecordBatch.
This constructor is generally not called by the end user. See the RecordBatchStream iterator for generating this class.
- __arrow_c_array__(requested_schema: object | None = None) tuple[object, object]¶
Export the record batch via the Arrow C Data Interface.
This allows zero-copy interchange with libraries that support the Arrow PyCapsule interface.
- Parameters:
requested_schema – Attempt to provide the record batch using this schema. Only straightforward projections such as column selection or reordering are applied.
- Returns:
Two Arrow PyCapsule objects representing the ArrowArray and ArrowSchema.
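Because any object implementing `__arrow_c_array__` can be consumed by Arrow-aware libraries, the wrapper pattern described above can be sketched in plain Python. The sketch below is a hypothetical stand-in (not datafusion's implementation) showing how such a wrapper forwards the protocol to an inner batch object:

```python
class BatchWrapper:
    """Hypothetical wrapper that forwards the Arrow C Data Interface
    to an inner object, mirroring RecordBatch.__arrow_c_array__."""

    def __init__(self, inner):
        # `inner` would typically be a pyarrow.RecordBatch or any other
        # object that implements __arrow_c_array__.
        self.inner = inner

    def __arrow_c_array__(self, requested_schema=None):
        # Delegate to the inner object, which returns a two-tuple of
        # PyCapsules: (ArrowArray, ArrowSchema).
        return self.inner.__arrow_c_array__(requested_schema)
```

With pyarrow installed, a consumer such as `pyarrow.record_batch(BatchWrapper(batch))` could then ingest the wrapped data zero-copy via the same interface.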
- to_pyarrow() pyarrow.RecordBatch¶
Convert to
pa.RecordBatch.
- record_batch¶
- class datafusion.record_batch.RecordBatchStream(record_batch_stream: datafusion._internal.RecordBatchStream)¶
This class represents a stream of record batches.
These are typically the result of an execute_stream() operation.
This constructor is typically not called by the end user.
- __aiter__() typing_extensions.Self¶
Return an asynchronous iterator over record batches.
- async __anext__() RecordBatch¶
Return the next RecordBatch in the stream asynchronously.
- __iter__() typing_extensions.Self¶
Return an iterator over record batches.
- __next__() RecordBatch¶
Return the next RecordBatch in the stream.
- next() RecordBatch¶
See __next__() for the iterator function.
- rbs¶
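As the method list above shows, RecordBatchStream supports both synchronous (`__iter__`/`__next__`) and asynchronous (`__aiter__`/`__anext__`) iteration. The following is a minimal pure-Python sketch of that dual protocol using a hypothetical in-memory stream (not datafusion's implementation, which pulls batches from query execution):

```python
import asyncio


class FakeRecordBatchStream:
    """Hypothetical stream implementing both the sync and async
    iterator protocols, mirroring RecordBatchStream's interface."""

    def __init__(self, batches):
        self._batches = list(batches)
        self._pos = 0

    # Synchronous protocol: `for batch in stream: ...`
    def __iter__(self):
        return self

    def __next__(self):
        if self._pos >= len(self._batches):
            raise StopIteration
        batch = self._batches[self._pos]
        self._pos += 1
        return batch

    # Explicit alias, like RecordBatchStream.next()
    next = __next__

    # Asynchronous protocol: `async for batch in stream: ...`
    def __aiter__(self):
        return self

    async def __anext__(self):
        try:
            return self.__next__()
        except StopIteration:
            # Async iteration signals exhaustion differently.
            raise StopAsyncIteration


async def collect(stream):
    # Drain the stream with `async for`.
    return [batch async for batch in stream]


sync_result = list(FakeRecordBatchStream(["b0", "b1"]))
async_result = asyncio.run(collect(FakeRecordBatchStream(["b0", "b1"])))
```

Supporting both protocols lets the same stream be consumed from ordinary scripts or from async code without draining it eagerly into memory.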