datafusion.catalog

Data catalog providers.

Classes

Catalog

DataFusion data catalog.

CatalogList

DataFusion data catalog list.

CatalogProvider

Abstract class for defining a Python-based Catalog Provider.

CatalogProviderList

Abstract class for defining a Python-based Catalog Provider List.

Schema

DataFusion Schema.

SchemaProvider

Abstract class for defining a Python-based Schema Provider.

Table

A DataFusion table.

Module Contents

class datafusion.catalog.Catalog(catalog: datafusion._internal.catalog.RawCatalog)

DataFusion data catalog.

This constructor is not typically called by the end user.

__repr__() → str

Return a string representation of the catalog.

deregister_schema(name: str, cascade: bool = True) → Schema | None

Deregister a schema from this catalog.

static memory_catalog(ctx: datafusion.SessionContext | None = None) → Catalog

Create an in-memory catalog provider.

names() → set[str]

This is an alias for schema_names.

register_schema(name: str, schema: Schema | SchemaProvider | SchemaProviderExportable) → Schema | None

Register a schema with this catalog.

schema(name: str = 'public') → Schema

Returns the schema with the given name from this catalog.

schema_names() → set[str]

Returns the set of schema names in this catalog.

catalog

class datafusion.catalog.CatalogList(catalog_list: datafusion._internal.catalog.RawCatalogList)

DataFusion data catalog list.

This constructor is not typically called by the end user.

__repr__() → str

Return a string representation of the catalog list.

catalog(name: str = 'datafusion') → Catalog

Returns the catalog with the given name from this catalog list.

catalog_names() → set[str]

Returns the set of catalog names in this catalog list.

static memory_catalog(ctx: datafusion.SessionContext | None = None) → CatalogList

Create an in-memory catalog provider list.

names() → set[str]

This is an alias for catalog_names.

register_catalog(name: str, catalog: Catalog | CatalogProvider | CatalogProviderExportable) → Catalog | None

Register a catalog with this catalog list.

catalog_list

class datafusion.catalog.CatalogProvider

Bases: abc.ABC

Abstract class for defining a Python-based Catalog Provider.

deregister_schema(name: str, cascade: bool) → None

Remove a schema from this catalog.

This method is optional. If your catalog provides a fixed list of schemas, you do not need to implement this method.

Parameters:
  • name – The name of the schema to remove.

  • cascade – If True, also deregister the tables within the schema.

register_schema(name: str, schema: SchemaProviderExportable | SchemaProvider | Schema) → None

Add a schema to this catalog.

This method is optional. If your catalog provides a fixed list of schemas, you do not need to implement this method.

abstract schema(name: str) → Schema | None

Retrieve a specific schema from this catalog.

abstract schema_names() → set[str]

Set of the names of all schemas in this catalog.

class datafusion.catalog.CatalogProviderList

Bases: abc.ABC

Abstract class for defining a Python-based Catalog Provider List.

abstract catalog(name: str) → CatalogProviderExportable | CatalogProvider | Catalog | None

Retrieve a specific catalog from this catalog list.

abstract catalog_names() → set[str]

Set of the names of all catalogs in this catalog list.

register_catalog(name: str, catalog: CatalogProviderExportable | CatalogProvider | Catalog) → None

Add a catalog to this catalog list.

This method is optional. If your catalog list provides a fixed set of catalogs, you do not need to implement this method.

class datafusion.catalog.Schema(schema: datafusion._internal.catalog.RawSchema)

DataFusion Schema.

This constructor is not typically called by the end user.

__repr__() → str

Return a string representation of the schema.

deregister_table(name: str) → None

Deregister a table provider from this schema.

static memory_schema(ctx: datafusion.SessionContext | None = None) → Schema

Create an in-memory schema provider.

names() → set[str]

This is an alias for table_names.

register_table(name: str, table: Table | datafusion.context.TableProviderExportable | datafusion.DataFrame | pyarrow.dataset.Dataset) → None

Register a table in this schema.

table(name: str) → Table

Return the table with the given name from this schema.

table_exist(name: str) → bool

Returns True if a table with the given name exists in this schema.

table_names() → set[str]

Returns the set of names of all tables in this schema.

_raw_schema

class datafusion.catalog.SchemaProvider

Bases: abc.ABC

Abstract class for defining a Python-based Schema Provider.

deregister_table(name: str, cascade: bool) → None

Remove a table from this schema.

This method is optional. If your schema provides a fixed list of tables, you do not need to implement this method.

owner_name() → str | None

Returns the owner of the schema.

This method is optional. By default it returns None.

register_table(name: str, table: Table | datafusion.context.TableProviderExportable | Any) → None

Add a table to this schema.

This method is optional. If your schema provides a fixed list of tables, you do not need to implement this method.

abstract table(name: str) → Table | None

Retrieve a specific table from this schema.

abstract table_exist(name: str) → bool

Returns True if the table exists in this schema.

abstract table_names() → set[str]

Set of the names of all tables in this schema.

class datafusion.catalog.Table(table: Table | datafusion.context.TableProviderExportable | datafusion.DataFrame | pyarrow.dataset.Dataset, ctx: datafusion.SessionContext | None = None)

A DataFusion table.

Internally we currently support the following types of tables:

  • Tables created using built-in DataFusion methods, such as reading from CSV or Parquet

  • pyarrow datasets

  • DataFusion DataFrames, which will be converted into a view

  • Externally provided tables implemented with the FFI PyCapsule interface (advanced)

Constructor. Accepts any of the table types listed above, plus an optional SessionContext as ctx.

__repr__() → str

Return a string representation of the table.

static from_dataset(dataset: pyarrow.dataset.Dataset) → Table

Turn a pyarrow.dataset Dataset into a Table.

__slots__ = ('_inner',)

_inner

property kind: str

Returns the kind of table.

property schema: pyarrow.Schema

Returns the schema associated with this table.