datafusion.dataframe_formatter#

HTML formatting utilities for DataFusion DataFrames.

Classes#

CellFormatter

Protocol for cell value formatters.

DataFrameHtmlFormatter

Configurable HTML formatter for DataFusion DataFrames.

DefaultStyleProvider

Default implementation of StyleProvider.

FormatterManager

Manager class for the global DataFrame HTML formatter instance.

StyleProvider

Protocol for HTML style providers.

Functions#

_refresh_formatter_reference(→ None)

Refresh formatter reference in any modules using it.

_validate_bool(→ None)

Validate that a parameter is a boolean.

_validate_formatter_parameters(→ int)

Validate all formatter parameters and return resolved max_rows value.

_validate_positive_int(→ None)

Validate that a parameter is a positive integer.

configure_formatter(→ None)

Configure the global DataFrame HTML formatter.

get_formatter(→ DataFrameHtmlFormatter)

Get the current global DataFrame HTML formatter.

reset_formatter(→ None)

Reset the global DataFrame HTML formatter to default settings.

set_formatter(→ None)

Set the global DataFrame HTML formatter.

Module Contents#

class datafusion.dataframe_formatter.CellFormatter#

Bases: Protocol

Protocol for cell value formatters.

__call__(value: Any) str#

Format a cell value to string representation.
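
As a minimal sketch, any callable that accepts a value and returns a string should satisfy this protocol. The function name `format_float` is hypothetical and not part of the module.

```python
from typing import Any


def format_float(value: Any) -> str:
    """Hypothetical cell formatter: render a number with two decimal places."""
    return f"{value:.2f}"


print(format_float(3.14159))
```

A callable like this could be passed as the `formatter` argument of `register_formatter`, for example with `float` as the `type_class`.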

class datafusion.dataframe_formatter.DataFrameHtmlFormatter(max_cell_length: int = 25, max_width: int = 1000, max_height: int = 300, max_memory_bytes: int = 2 * 1024 * 1024, min_rows: int = 10, max_rows: int | None = None, repr_rows: int | None = None, enable_cell_expansion: bool = True, custom_css: str | None = None, show_truncation_message: bool = True, style_provider: StyleProvider | None = None, use_shared_styles: bool = True)#

Configurable HTML formatter for DataFusion DataFrames.

This class handles the HTML rendering of DataFrames for display in Jupyter notebooks and other rich display contexts.

This class supports extension through composition. Key extension points:
  • Provide a custom StyleProvider for styling cells and headers

  • Register custom formatters for specific types

  • Provide custom cell builders for specialized cell rendering

Parameters:
  • max_cell_length – Maximum characters to display in a cell before truncation

  • max_width – Maximum width of the HTML table in pixels

  • max_height – Maximum height of the HTML table in pixels

  • max_memory_bytes – Maximum memory in bytes for rendered data (default: 2MB)

  • min_rows – Minimum number of rows to display (must be <= max_rows)

  • max_rows – Maximum number of rows to display in repr output

  • repr_rows – Deprecated alias for max_rows

  • enable_cell_expansion – Whether to add expand/collapse buttons for long cell values

  • custom_css – Additional CSS to include in the HTML output

  • show_truncation_message – Whether to display a message when data is truncated

  • style_provider – Custom provider for cell and header styles

  • use_shared_styles – Whether to load styles and scripts only once per notebook session

Initialize the HTML formatter.

Parameters:
  • max_cell_length – Maximum length of cell content before truncation.

  • max_width – Maximum width of the displayed table in pixels.

  • max_height – Maximum height of the displayed table in pixels.

  • max_memory_bytes – Maximum memory in bytes for rendered data. Helps prevent performance issues with large datasets.

  • min_rows – Minimum number of rows to display even if memory limit is reached. Must not exceed max_rows.

  • max_rows – Maximum number of rows to display. Takes precedence over memory limits when fewer rows are requested.

  • repr_rows – Deprecated alias for max_rows. Use max_rows instead.

  • enable_cell_expansion – Whether to allow cells to expand when clicked.

  • custom_css – Custom CSS to apply to the HTML table.

  • show_truncation_message – Whether to show a message indicating that content has been truncated.

  • style_provider – Provider of CSS styles for the HTML table. If None, DefaultStyleProvider is used.

  • use_shared_styles – Whether to use shared styles across multiple tables. This improves performance when displaying many DataFrames in a single notebook.

Raises:
  • ValueError – If max_cell_length, max_width, max_height, max_memory_bytes, min_rows, or max_rows is not a positive integer, or if min_rows exceeds max_rows.

  • TypeError – If enable_cell_expansion, show_truncation_message, or use_shared_styles is not a boolean, or if custom_css is provided but is not a string, or if style_provider is provided but does not implement the StyleProvider protocol.

_build_expandable_cell(formatted_value: str, row_count: int, col_idx: int, table_uuid: str) str#

Build an expandable cell for long content.

Build the HTML footer with JavaScript and messages.

_build_html_header() list[str]#

Build the HTML header with CSS styles.

_build_regular_cell(formatted_value: str) str#

Build a regular table cell.

_build_table_body(batches: list, table_uuid: str) list[str]#

Build the HTML table body with data rows.

_build_table_container_start() list[str]#

Build the opening tags for the table container.

_build_table_header(schema: Any) list[str]#

Build the HTML table header with column names.

_format_cell_value(value: Any) str#

Format a cell value for display.

Uses registered type formatters if available.

Parameters:

value – The cell value to format

Returns:

Formatted cell value as string
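
The lookup of registered type formatters can be pictured with the sketch below. It is an illustration only: the names `type_formatters`, `register`, and `format_cell_value` are hypothetical stand-ins, and the `isinstance` matching with a `str()` fallback is an assumption about the behavior, not a copy of the library's code.

```python
from typing import Any, Callable

# Hypothetical stand-in for the formatter's registry of type formatters.
type_formatters: dict[type, Callable[[Any], str]] = {}


def register(type_class: type, formatter: Callable[[Any], str]) -> None:
    """Associate a formatter callable with a Python type."""
    type_formatters[type_class] = formatter


def format_cell_value(value: Any) -> str:
    """Use the first registered formatter whose type matches, else str()."""
    for type_class, formatter in type_formatters.items():
        # Assumption: matching is done with isinstance, so subclasses match too.
        if isinstance(value, type_class):
            return formatter(value)
    return str(value)


register(float, lambda v: f"{v:.2f}")
print(format_cell_value(3.14159))
print(format_cell_value("abc"))
```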

_get_cell_value(column: Any, row_idx: int) Any#

Extract a cell value from a column.

Parameters:
  • column – Arrow array

  • row_idx – Row index

Returns:

The raw cell value

_get_default_css() str#

Get default CSS styles for the HTML table.

_get_javascript() str#

Get JavaScript code for interactive elements.

format_html(batches: list, schema: Any, has_more: bool = False, table_uuid: str | None = None) str#

Format record batches as HTML.

This method is used by DataFrame’s _repr_html_ implementation and can be called directly when custom HTML rendering is needed.

Parameters:
  • batches – List of Arrow RecordBatch objects

  • schema – Arrow Schema object

  • has_more – Whether there are more batches not shown

  • table_uuid – Unique ID for the table, used for JavaScript interactions

Returns:

HTML string representation of the data

Raises:

TypeError – If schema is invalid and no batches are provided

format_str(batches: list, schema: Any, has_more: bool = False, table_uuid: str | None = None) str#

Format record batches as a string.

This method is used by DataFrame’s __repr__ implementation and can be called directly when string rendering is needed.

Parameters:
  • batches – List of Arrow RecordBatch objects

  • schema – Arrow Schema object

  • has_more – Whether there are more batches not shown

  • table_uuid – Unique ID for the table, used for JavaScript interactions

Returns:

String representation of the data

Raises:

TypeError – If schema is invalid and no batches are provided

register_formatter(type_class: type, formatter: CellFormatter) None#

Register a custom formatter for a specific data type.

Parameters:
  • type_class – The type to register a formatter for

  • formatter – Function that takes a value of the given type and returns a formatted string

set_custom_cell_builder(builder: collections.abc.Callable[[Any, int, int, str], str]) None#

Set a custom cell builder function.

Parameters:

builder – Function that takes (value, row, col, table_id) and returns HTML
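
A builder with the documented `(value, row, col, table_id)` shape might look like the sketch below. The function name and the red styling for negative numbers are hypothetical, and the sketch assumes the builder is expected to return a complete `<td>` element.

```python
import html
from typing import Any


def highlight_negative_cell(value: Any, row: int, col: int, table_id: str) -> str:
    """Hypothetical cell builder: render negative numbers in red."""
    # Escape the text so cell content cannot inject markup.
    text = html.escape(str(value))
    is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
    style = ' style="color: red;"' if is_number and value < 0 else ""
    return f"<td{style}>{text}</td>"


print(highlight_negative_cell(-1, 0, 0, "demo-table"))
```

The `row`, `col`, and `table_id` arguments are unused here, but they would allow a builder to generate unique element IDs per cell.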

set_custom_header_builder(builder: collections.abc.Callable[[Any], str]) None#

Set a custom header builder function.

Parameters:

builder – Function that takes a field and returns HTML
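
A header builder could be sketched as follows. The sketch assumes the field exposes a `name` attribute, as Arrow fields do, and that the builder returns a complete `<th>` element. `build_header` and `demo_field` are hypothetical names.

```python
import html
from types import SimpleNamespace
from typing import Any


def build_header(field: Any) -> str:
    """Hypothetical header builder: upper-case the column name in a <th>."""
    # Upper-case first, then escape, so HTML entities are left intact.
    name = html.escape(str(field.name).upper())
    return f"<th>{name}</th>"


# Stand-in for an Arrow field, used only so the sketch runs without pyarrow.
demo_field = SimpleNamespace(name="price")
print(build_header(demo_field))
```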

_custom_cell_builder: collections.abc.Callable[[Any, int, int, str], str] | None = None#
_custom_header_builder: collections.abc.Callable[[Any], str] | None = None#
_max_rows = None#
_type_formatters: dict[type, CellFormatter]#
custom_css = None#
enable_cell_expansion = True#
max_cell_length = 25#
max_height = 300#
max_memory_bytes = 2097152#
property max_rows: int#

Get the maximum number of rows to display.

Returns:

The maximum number of rows to display in repr output

max_width = 1000#
min_rows = 10#
property repr_rows: int#

Get the maximum number of rows (deprecated name).

Deprecated: Use max_rows instead. This property is provided for backward compatibility.

Returns:

The maximum number of rows to display

show_truncation_message = True#
style_provider#
use_shared_styles = True#

class datafusion.dataframe_formatter.DefaultStyleProvider#

Default implementation of StyleProvider.

get_cell_style() str#

Get the CSS style for table cells.

Returns:

CSS style string

get_header_style() str#

Get the CSS style for header cells.

Returns:

CSS style string

class datafusion.dataframe_formatter.FormatterManager#

Manager class for the global DataFrame HTML formatter instance.

classmethod get_formatter() DataFrameHtmlFormatter#

Get the current global DataFrame HTML formatter.

Returns:

The global HTML formatter instance

classmethod set_formatter(formatter: DataFrameHtmlFormatter) None#

Set the global DataFrame HTML formatter.

Parameters:

formatter – The formatter instance to use globally

_default_formatter: DataFrameHtmlFormatter#
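
The manager follows a class-level singleton pattern: one formatter is stored on the class and shared by every caller. The sketch below illustrates that pattern with hypothetical stand-in classes (`Formatter`, `Manager`). It is not the library's implementation.

```python
class Formatter:
    """Hypothetical stand-in for DataFrameHtmlFormatter."""

    def __init__(self, max_cell_length: int = 25) -> None:
        self.max_cell_length = max_cell_length


class Manager:
    """Holds one class-level formatter shared by all callers."""

    _default_formatter = Formatter()

    @classmethod
    def get_formatter(cls) -> Formatter:
        return cls._default_formatter

    @classmethod
    def set_formatter(cls, formatter: Formatter) -> None:
        # Rebinding the class attribute changes what every caller sees next.
        cls._default_formatter = formatter


Manager.set_formatter(Formatter(max_cell_length=100))
print(Manager.get_formatter().max_cell_length)
```
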
class datafusion.dataframe_formatter.StyleProvider#

Bases: Protocol

Protocol for HTML style providers.

get_cell_style() str#

Get the CSS style for table cells.

get_header_style() str#

Get the CSS style for header cells.
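
Because this is a protocol, a provider only needs the two methods above and does not have to inherit from anything. The class name `CompactStyleProvider` and its CSS values are hypothetical.

```python
class CompactStyleProvider:
    """Hypothetical style provider with tighter padding."""

    def get_cell_style(self) -> str:
        return "padding: 2px 6px; border: 1px solid #ddd;"

    def get_header_style(self) -> str:
        return "padding: 2px 6px; font-weight: bold; background-color: #f0f0f0;"


provider = CompactStyleProvider()
print(provider.get_header_style())
```

An instance could then be supplied through the `style_provider` parameter of `DataFrameHtmlFormatter`.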

datafusion.dataframe_formatter._refresh_formatter_reference() None#

Refresh formatter reference in any modules using it.

This helps ensure that changes to the formatter are reflected in existing DataFrames that might be caching the formatter reference.

datafusion.dataframe_formatter._validate_bool(value: Any, param_name: str) None#

Validate that a parameter is a boolean.

Parameters:
  • value – The value to validate

  • param_name – Name of the parameter (used in error message)

Raises:

TypeError – If the value is not a boolean

datafusion.dataframe_formatter._validate_formatter_parameters(max_cell_length: int, max_width: int, max_height: int, max_memory_bytes: int, min_rows: int, max_rows: int | None, repr_rows: int | None, enable_cell_expansion: bool, show_truncation_message: bool, use_shared_styles: bool, custom_css: str | None, style_provider: Any) int#

Validate all formatter parameters and return resolved max_rows value.

Parameters:
  • max_cell_length – Maximum cell length value to validate

  • max_width – Maximum width value to validate

  • max_height – Maximum height value to validate

  • max_memory_bytes – Maximum memory bytes value to validate

  • min_rows – Minimum rows to display value to validate

  • max_rows – Maximum rows value to validate (None means use default)

  • repr_rows – Deprecated repr_rows value to validate

  • enable_cell_expansion – Boolean expansion flag to validate

  • show_truncation_message – Boolean message flag to validate

  • use_shared_styles – Boolean styles flag to validate

  • custom_css – Custom CSS string to validate

  • style_provider – Style provider object to validate

Returns:

The resolved max_rows value after handling repr_rows deprecation

Raises:
  • ValueError – If any numeric parameter is invalid or constraints are violated

  • TypeError – If any parameter has invalid type

  • DeprecationWarning – If repr_rows parameter is used
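
The resolution of `max_rows` from the deprecated `repr_rows` alias might be sketched as below. Several details are assumptions made for illustration: the function name `resolve_max_rows`, the default of 10 used when neither value is given, and the rule that an explicit `max_rows` wins when both are supplied.

```python
import warnings
from typing import Optional


def resolve_max_rows(
    max_rows: Optional[int],
    repr_rows: Optional[int],
    min_rows: int,
    default_max_rows: int = 10,  # hypothetical default, not taken from the module
) -> int:
    """Return the effective max_rows, honoring the deprecated repr_rows alias."""
    resolved = max_rows
    if repr_rows is not None:
        warnings.warn(
            "repr_rows is deprecated; use max_rows instead",
            DeprecationWarning,
            stacklevel=2,
        )
        # Assumption: an explicit max_rows wins when both are supplied.
        if resolved is None:
            resolved = repr_rows
    if resolved is None:
        resolved = default_max_rows
    # Documented constraint: min_rows must not exceed max_rows.
    if min_rows > resolved:
        raise ValueError("min_rows must not exceed max_rows")
    return resolved


print(resolve_max_rows(max_rows=20, repr_rows=None, min_rows=10))
```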

datafusion.dataframe_formatter._validate_positive_int(value: Any, param_name: str) None#

Validate that a parameter is a positive integer.

Parameters:
  • value – The value to validate

  • param_name – Name of the parameter (used in error message)

Raises:

ValueError – If the value is not a positive integer
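
A check of this kind can be sketched in a few lines. The function below is a hypothetical re-implementation for illustration. Rejecting `bool` values, which are technically `int` subclasses in Python, is an assumption and may differ from the library's behavior.

```python
from typing import Any


def validate_positive_int(value: Any, param_name: str) -> None:
    """Raise ValueError unless value is an int greater than zero."""
    # Assumption: bool is rejected even though it subclasses int.
    if isinstance(value, bool) or not isinstance(value, int) or value <= 0:
        raise ValueError(f"{param_name} must be a positive integer")


validate_positive_int(25, "max_cell_length")  # passes silently
try:
    validate_positive_int(0, "max_width")
except ValueError as exc:
    print(exc)
```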

datafusion.dataframe_formatter.configure_formatter(**kwargs: Any) None#

Configure the global DataFrame HTML formatter.

This function creates a new formatter with the provided configuration and sets it as the global formatter for all DataFrames.

Parameters:

**kwargs – Formatter configuration parameters like max_cell_length, max_width, max_height, enable_cell_expansion, etc.

Raises:

ValueError – If any invalid parameters are provided

Example

>>> from datafusion.dataframe_formatter import configure_formatter
>>> configure_formatter(
...     max_cell_length=50,
...     max_height=500,
...     enable_cell_expansion=True,
...     use_shared_styles=True
... )

datafusion.dataframe_formatter.get_formatter() DataFrameHtmlFormatter#

Get the current global DataFrame HTML formatter.

This function is used by the DataFrame._repr_html_ implementation to access the shared formatter instance. It can also be used directly when custom HTML rendering is needed.

Returns:

The global HTML formatter instance

Example

>>> from datafusion.dataframe_formatter import get_formatter
>>> formatter = get_formatter()
>>> formatter.max_cell_length = 50  # Increase cell length

datafusion.dataframe_formatter.reset_formatter() None#

Reset the global DataFrame HTML formatter to default settings.

This function creates a new formatter with default configuration and sets it as the global formatter for all DataFrames.

Example

>>> from datafusion.dataframe_formatter import reset_formatter
>>> reset_formatter()  # Reset formatter to default settings

datafusion.dataframe_formatter.set_formatter(formatter: DataFrameHtmlFormatter) None#

Set the global DataFrame HTML formatter.

Parameters:

formatter – The formatter instance to use globally

Example

>>> from datafusion.dataframe_formatter import DataFrameHtmlFormatter, set_formatter
>>> custom_formatter = DataFrameHtmlFormatter(max_cell_length=100)
>>> set_formatter(custom_formatter)