datafusion.catalog

Data catalog providers.

Classes

Catalog

DataFusion data catalog.

CatalogProvider

Abstract class for defining a Python-based Catalog Provider.

Schema

DataFusion Schema.

SchemaProvider

Abstract class for defining a Python-based Schema Provider.

Table

DataFusion table.

Module Contents

class datafusion.catalog.Catalog(catalog: datafusion._internal.catalog.RawCatalog)

DataFusion data catalog.

This constructor is not typically called by the end user.

__repr__() str

Return a string representation of the catalog.

database(name: str = 'public') Schema

Returns the database with the given name from this catalog.

deregister_schema(name: str, cascade: bool = True) Schema | None

Deregister a schema from this catalog.

static memory_catalog() Catalog

Create an in-memory catalog provider.

names() set[str]

This is an alias for schema_names.

register_schema(name, schema) Schema | None

Register a schema with this catalog.

schema(name: str = 'public') Schema

Returns the schema with the given name from this catalog.

schema_names() set[str]

Returns the set of schema names in this catalog.

catalog
class datafusion.catalog.CatalogProvider

Bases: abc.ABC

Abstract class for defining a Python-based Catalog Provider.

deregister_schema(name: str, cascade: bool) None

Remove a schema from this catalog.

This method is optional. If your catalog provides a fixed list of schemas, you do not need to implement this method.

Parameters:
  • name – The name of the schema to remove.

  • cascade – If true, deregister the tables within the schema.

register_schema(name: str, schema: SchemaProviderExportable | SchemaProvider | Schema) None

Add a schema to this catalog.

This method is optional. If your catalog provides a fixed list of schemas, you do not need to implement this method.

abstract schema(name: str) Schema | None

Retrieve a specific schema from this catalog.

abstract schema_names() set[str]

Set of the names of all schemas in this catalog.
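As an illustration of the interface above, here is a minimal sketch of a provider that keeps its schemas in a plain dictionary. The class name DictCatalogProvider, the "sales" schema name, and the placeholder object stored as a schema are hypothetical. The import fallback is an assumption added only so the sketch also runs where datafusion is not installed. A real provider would store SchemaProvider or Schema instances.

```python
from __future__ import annotations

try:
    from datafusion.catalog import CatalogProvider
except ImportError:
    # Assumption: fall back to a bare ABC so the sketch runs without datafusion.
    from abc import ABC as CatalogProvider


class DictCatalogProvider(CatalogProvider):
    """Hypothetical catalog provider backed by a dict of schemas."""

    def __init__(self) -> None:
        self._schemas: dict[str, object] = {}

    # Required: the two abstract methods.
    def schema_names(self) -> set[str]:
        return set(self._schemas)

    def schema(self, name: str):
        # Returns None when the schema is unknown.
        return self._schemas.get(name)

    # Optional: only needed when schemas can change at runtime.
    def register_schema(self, name: str, schema) -> None:
        self._schemas[name] = schema

    def deregister_schema(self, name: str, cascade: bool) -> None:
        # This sketch ignores `cascade`; the stored schema is simply dropped.
        self._schemas.pop(name, None)


provider = DictCatalogProvider()
provider.register_schema("sales", object())  # placeholder, not a real schema
print(provider.schema_names())
```

A provider with a fixed list of schemas could omit register_schema and deregister_schema entirely, as noted above.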

class datafusion.catalog.Schema(schema: datafusion._internal.catalog.RawSchema)

DataFusion Schema.

This constructor is not typically called by the end user.

__repr__() str

Return a string representation of the schema.

deregister_table(name: str) None

Deregister a table provider from this schema.

static memory_schema() Schema

Create an in-memory schema provider.

names() set[str]

This is an alias for table_names.

register_table(name, table) None

Register a table provider in this schema.

table(name: str) Table

Return the table with the given name from this schema.

table_names() set[str]

Returns the set of all table names in this schema.

_raw_schema
class datafusion.catalog.SchemaProvider

Bases: abc.ABC

Abstract class for defining a Python-based Schema Provider.

deregister_table(name, cascade: bool) None

Remove a table from this schema.

This method is optional. If your schema provides a fixed list of tables, you do not need to implement this method.

owner_name() str | None

Returns the owner of the schema.

This is an optional method. The default return is None.

register_table(name: str, table: Table) None

Add a table to this schema.

This method is optional. If your schema provides a fixed list of tables, you do not need to implement this method.

abstract table(name: str) Table | None

Retrieve a specific table from this schema.

abstract table_exist(name: str) bool

Returns true if the table exists in this schema.

abstract table_names() set[str]

Set of the names of all tables in this schema.
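The schema provider interface above can be sketched in the same way. The class name DictSchemaProvider, the "orders" table name, and the placeholder object stored as a table are hypothetical, and the import fallback is an assumption that lets the sketch run without datafusion installed. A real provider would store Table instances.

```python
from __future__ import annotations

try:
    from datafusion.catalog import SchemaProvider
except ImportError:
    # Assumption: fall back to a bare ABC so the sketch runs without datafusion.
    from abc import ABC as SchemaProvider


class DictSchemaProvider(SchemaProvider):
    """Hypothetical schema provider backed by a dict of tables."""

    def __init__(self) -> None:
        self._tables: dict[str, object] = {}

    # Required: the three abstract methods.
    def table_names(self) -> set[str]:
        return set(self._tables)

    def table(self, name: str):
        # Returns None when the table is unknown.
        return self._tables.get(name)

    def table_exist(self, name: str) -> bool:
        return name in self._tables

    # Optional: only needed when tables can change at runtime.
    def register_table(self, name: str, table) -> None:
        self._tables[name] = table

    def deregister_table(self, name: str, cascade: bool) -> None:
        # This sketch ignores `cascade`; the stored table is simply dropped.
        self._tables.pop(name, None)


schema_provider = DictSchemaProvider()
schema_provider.register_table("orders", object())  # placeholder, not a real Table
print(schema_provider.table_exist("orders"))
```

Keeping table_exist consistent with table_names, as the shared dictionary does here, avoids a provider that reports a table as present but cannot list it.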

class datafusion.catalog.Table(table: datafusion._internal.catalog.RawTable)

DataFusion table.

This constructor is not typically called by the end user.

__repr__() str

Return a string representation of the table.

static from_dataset(dataset: pyarrow.dataset.Dataset) Table

Turn a pyarrow Dataset into a Table.

property kind: str

Returns the kind of table.

property schema: pyarrow.Schema

Returns the schema associated with this table.

table