datafusion.catalog¶
Data catalog providers.
Classes¶
Catalog | DataFusion data catalog.
CatalogList | DataFusion data catalog list.
CatalogProvider | Abstract class for defining a Python based Catalog Provider.
CatalogProviderList | Abstract class for defining a Python based Catalog Provider List.
Schema | DataFusion Schema.
SchemaProvider | Abstract class for defining a Python based Schema Provider.
Table | A DataFusion table.
Module Contents¶
- class datafusion.catalog.Catalog(catalog: datafusion._internal.catalog.RawCatalog)¶
DataFusion data catalog.
This constructor is not typically called by the end user.
- __repr__() str¶
Print a string representation of the catalog.
- deregister_schema(name: str, cascade: bool = True) Schema | None¶
Deregister a schema from this catalog.
- static memory_catalog(ctx: datafusion.SessionContext | None = None) Catalog¶
Create an in-memory catalog provider.
- names() set[str]¶
This is an alias for schema_names.
- register_schema(name: str, schema: Schema | SchemaProvider | SchemaProviderExportable) Schema | None¶
Register a schema with this catalog.
- schema_names() set[str]¶
Returns the set of schema names in this catalog.
- catalog¶
- class datafusion.catalog.CatalogList(catalog_list: datafusion._internal.catalog.RawCatalogList)¶
DataFusion data catalog list.
This constructor is not typically called by the end user.
- __repr__() str¶
Print a string representation of the catalog list.
- catalog(name: str = 'datafusion') Catalog¶
Returns the catalog with the given name from this catalog list.
- catalog_names() set[str]¶
Returns the set of catalog names in this catalog list.
- static memory_catalog(ctx: datafusion.SessionContext | None = None) CatalogList¶
Create an in-memory catalog provider list.
- names() set[str]¶
This is an alias for catalog_names.
- register_catalog(name: str, catalog: Catalog | CatalogProvider | CatalogProviderExportable) Catalog | None¶
Register a catalog with this catalog list.
- catalog_list¶
- class datafusion.catalog.CatalogProvider¶
Bases: abc.ABC
Abstract class for defining a Python based Catalog Provider.
- deregister_schema(name: str, cascade: bool) None¶
Remove a schema from this catalog.
This method is optional. If your catalog provides a fixed list of schemas, you do not need to implement this method.
- Parameters:
name – The name of the schema to remove.
cascade – If true, deregister the tables within the schema.
- register_schema(name: str, schema: SchemaProviderExportable | SchemaProvider | Schema) None¶
Add a schema to this catalog.
This method is optional. If your catalog provides a fixed list of schemas, you do not need to implement this method.
- abstract schema_names() set[str]¶
Set of the names of all schemas in this catalog.
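As a rough illustration of the interface above, the following sketch implements the three documented methods on top of a plain dict. To stay runnable without DataFusion installed, it subclasses a local stand-in, `CatalogProviderSketch`, rather than the real `datafusion.catalog.CatalogProvider`. The stand-in, `DictCatalog`, and the dict storage are all hypothetical, and the real base class may require abstract methods beyond those listed here.

```python
from __future__ import annotations

from abc import ABC, abstractmethod


class CatalogProviderSketch(ABC):
    """Local stand-in that mirrors the documented method names only."""

    @abstractmethod
    def schema_names(self) -> set[str]:
        """Return the names of all schemas in this catalog."""

    def register_schema(self, name: str, schema: object) -> None:
        """Optional hook; assumed here to do nothing by default."""

    def deregister_schema(self, name: str, cascade: bool) -> None:
        """Optional hook; assumed here to do nothing by default."""


class DictCatalog(CatalogProviderSketch):
    """Hypothetical catalog that keeps its schemas in a dict."""

    def __init__(self) -> None:
        self._schemas: dict[str, object] = {}

    def schema_names(self) -> set[str]:
        return set(self._schemas)

    def register_schema(self, name: str, schema: object) -> None:
        self._schemas[name] = schema

    def deregister_schema(self, name: str, cascade: bool) -> None:
        # `cascade` is accepted to match the documented signature; this
        # sketch simply drops the entry and ignores unknown names.
        self._schemas.pop(name, None)
```

In real use the same method bodies would sit on a subclass of `CatalogProvider`. A catalog with a fixed set of schemas could omit the two optional methods.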
- class datafusion.catalog.CatalogProviderList¶
Bases: abc.ABC
Abstract class for defining a Python based Catalog Provider List.
- abstract catalog(name: str) CatalogProviderExportable | CatalogProvider | Catalog | None¶
Retrieve a specific catalog from this catalog list.
- abstract catalog_names() set[str]¶
Set of the names of all catalogs in this catalog list.
- register_catalog(name: str, catalog: CatalogProviderExportable | CatalogProvider | Catalog) None¶
Add a catalog to this catalog list.
This method is optional. If your catalog list provides a fixed list of catalogs, you do not need to implement this method.
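The fixed-list case mentioned above might look like the sketch below, which implements only the two abstract methods and leaves `register_catalog` out. `CatalogProviderListSketch` is a local stand-in used so the snippet runs without DataFusion, and `FixedCatalogList` is a hypothetical name. A real implementation would subclass `datafusion.catalog.CatalogProviderList` and return catalog objects of the documented types instead of arbitrary values.

```python
from __future__ import annotations

from abc import ABC, abstractmethod


class CatalogProviderListSketch(ABC):
    """Local stand-in that mirrors the two documented abstract methods."""

    @abstractmethod
    def catalog(self, name: str) -> object | None:
        """Return the catalog registered under `name`, or None."""

    @abstractmethod
    def catalog_names(self) -> set[str]:
        """Return the names of all catalogs in this list."""


class FixedCatalogList(CatalogProviderListSketch):
    """Hypothetical read-only list built once from a mapping."""

    def __init__(self, catalogs: dict[str, object]) -> None:
        # Copy the mapping so later changes by the caller do not leak in.
        self._catalogs = dict(catalogs)

    def catalog(self, name: str) -> object | None:
        # Unknown names yield None, matching the documented return type.
        return self._catalogs.get(name)

    def catalog_names(self) -> set[str]:
        return set(self._catalogs)
```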
- class datafusion.catalog.Schema(schema: datafusion._internal.catalog.RawSchema)¶
DataFusion Schema.
This constructor is not typically called by the end user.
- __repr__() str¶
Print a string representation of the schema.
- deregister_table(name: str) None¶
Deregister a table provider from this schema.
- static memory_schema(ctx: datafusion.SessionContext | None = None) Schema¶
Create an in-memory schema provider.
- names() set[str]¶
This is an alias for table_names.
- register_table(name: str, table: Table | datafusion.context.TableProviderExportable | datafusion.DataFrame | pyarrow.dataset.Dataset) None¶
Register a table in this schema.
- table_exist(name: str) bool¶
Determines if a table exists in this schema.
- table_names() set[str]¶
Returns the set of names of all tables in this schema.
- _raw_schema¶
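The in-memory constructors documented for `Catalog` and `Schema` can be combined as in the sketch below. It uses only calls listed on this page, with their default arguments. The function name `build_sales_catalog` and the schema name "sales" are illustrative, and the import is placed inside the function so that defining it does not require DataFusion to be installed.

```python
def build_sales_catalog():
    """Return an in-memory catalog holding one empty schema named "sales"."""
    # Imported lazily: DataFusion is only needed when the function is called.
    from datafusion.catalog import Catalog, Schema

    catalog = Catalog.memory_catalog()
    # An in-memory Schema is one of the accepted `schema` argument types.
    catalog.register_schema("sales", Schema.memory_schema())
    return catalog
```

After `catalog = build_sales_catalog()`, the set returned by `catalog.schema_names()` (or its alias `catalog.names()`) should include "sales".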
- class datafusion.catalog.SchemaProvider¶
Bases: abc.ABC
Abstract class for defining a Python based Schema Provider.
- deregister_table(name: str, cascade: bool) None¶
Remove a table from this schema.
This method is optional. If your schema provides a fixed list of tables, you do not need to implement this method.
- owner_name() str | None¶
Returns the owner of the schema.
This is an optional method. The default return is None.
- register_table(name: str, table: Table | datafusion.context.TableProviderExportable | Any) None¶
Add a table to this schema.
This method is optional. If your schema provides a fixed list of tables, you do not need to implement this method.
- abstract table_exist(name: str) bool¶
Returns true if the table exists in this schema.
- abstract table_names() set[str]¶
Set of the names of all tables in this schema.
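A minimal sketch of a schema provider with a mutable table set is shown below. As an assumption made to keep the snippet runnable without DataFusion, it subclasses a local stand-in (`SchemaProviderSketch`) instead of `datafusion.catalog.SchemaProvider`. `DictSchema` and its dict storage are hypothetical, and the real base class may declare further abstract methods not listed on this page.

```python
from __future__ import annotations

from abc import ABC, abstractmethod


class SchemaProviderSketch(ABC):
    """Local stand-in that mirrors the documented method names only."""

    @abstractmethod
    def table_names(self) -> set[str]:
        """Return the names of all tables in this schema."""

    @abstractmethod
    def table_exist(self, name: str) -> bool:
        """Return True if a table called `name` exists."""

    def owner_name(self) -> str | None:
        # The documented default return value is None.
        return None


class DictSchema(SchemaProviderSketch):
    """Hypothetical schema that keeps its tables in a dict."""

    def __init__(self) -> None:
        self._tables: dict[str, object] = {}

    def table_names(self) -> set[str]:
        return set(self._tables)

    def table_exist(self, name: str) -> bool:
        return name in self._tables

    def register_table(self, name: str, table: object) -> None:
        self._tables[name] = table

    def deregister_table(self, name: str, cascade: bool) -> None:
        # `cascade` is accepted to match the documented signature; this
        # sketch has no dependent objects, so it only drops the entry.
        self._tables.pop(name, None)
```

`owner_name` is deliberately not overridden, so `DictSchema` inherits the `None` default described above.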
- class datafusion.catalog.Table(table: Table | datafusion.context.TableProviderExportable | datafusion.DataFrame | pyarrow.dataset.Dataset, ctx: datafusion.SessionContext | None = None)¶
A DataFusion table.
Internally we currently support the following types of tables:
Tables created using built-in DataFusion methods, such as reading from CSV or Parquet
pyarrow datasets
DataFusion DataFrames, which will be converted into a view
Externally provided tables implemented with the FFI PyCapsule interface (advanced)
Constructor.
- __repr__() str¶
Print a string representation of the table.
- static from_dataset(dataset: pyarrow.dataset.Dataset) Table¶
Turn a pyarrow.dataset Dataset into a Table.
- __slots__ = ('_inner',)¶
- _inner¶
- property kind: str¶
Returns the kind of table.
- property schema: pyarrow.Schema¶
Returns the schema associated with this table.
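To illustrate the pyarrow dataset case, the sketch below wraps a small in-memory dataset with `Table.from_dataset` and reads the two documented properties. The function name and the single-column sample data are made up for the example. Imports are placed inside the function, so it needs pyarrow and DataFusion only when called.

```python
def describe_dataset_table():
    """Wrap a tiny pyarrow dataset in a Table and return (kind, schema)."""
    # Imported lazily: both packages are only needed when this is called.
    import pyarrow as pa
    import pyarrow.dataset as ds
    from datafusion.catalog import Table

    # An in-memory dataset built from a one-column pyarrow table.
    dataset = ds.dataset(pa.table({"id": [1, 2, 3]}))
    table = Table.from_dataset(dataset)

    # `kind` is a str and `schema` is a pyarrow.Schema, per the properties above.
    return table.kind, table.schema
```

The returned schema should contain the single `id` column. The exact string reported by `kind` is not specified here.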