datafusion.html_formatter

HTML formatting utilities for DataFusion DataFrames.

Classes

CellFormatter

Protocol for cell value formatters.

DataFrameHtmlFormatter

Configurable HTML formatter for DataFusion DataFrames.

DefaultStyleProvider

Default implementation of StyleProvider.

FormatterManager

Manager class for the global DataFrame HTML formatter instance.

StyleProvider

Protocol for HTML style providers.

Functions

_refresh_formatter_reference(→ None)

Refresh formatter reference in any modules using it.

_validate_bool(→ None)

Validate that a parameter is a boolean.

_validate_positive_int(→ None)

Validate that a parameter is a positive integer.

configure_formatter(→ None)

Configure the global DataFrame HTML formatter.

get_formatter(→ DataFrameHtmlFormatter)

Get the current global DataFrame HTML formatter.

reset_formatter(→ None)

Reset the global DataFrame HTML formatter to default settings.

reset_styles_loaded_state(→ None)

Reset the styles loaded state to force reloading of styles.

set_formatter(→ None)

Set the global DataFrame HTML formatter.

Module Contents

class datafusion.html_formatter.CellFormatter

Bases: Protocol

Protocol for cell value formatters.

__call__(value: Any) str

Format a cell value to string representation.
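
As an illustration, any callable that accepts one value and returns a string matches this protocol structurally. The sketch below uses a hypothetical `format_float` function and a hypothetical `CurrencyFormatter` class. Neither name is part of datafusion.

```python
from typing import Any


def format_float(value: Any) -> str:
    """Hypothetical formatter: render a number with two decimal places."""
    return f"{value:.2f}"


class CurrencyFormatter:
    """Hypothetical callable class; its instances also match the protocol."""

    def __init__(self, symbol: str = "$") -> None:
        self.symbol = symbol

    def __call__(self, value: Any) -> str:
        # Thousands separator plus two decimal places, prefixed by the symbol.
        return f"{self.symbol}{value:,.2f}"
```

Either callable could then be passed as the `formatter` argument of `register_formatter`.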

class datafusion.html_formatter.DataFrameHtmlFormatter(max_cell_length: int = 25, max_width: int = 1000, max_height: int = 300, max_memory_bytes: int = 2 * 1024 * 1024, min_rows_display: int = 20, repr_rows: int = 10, enable_cell_expansion: bool = True, custom_css: str | None = None, show_truncation_message: bool = True, style_provider: StyleProvider | None = None, use_shared_styles: bool = True)

Configurable HTML formatter for DataFusion DataFrames.

This class handles the HTML rendering of DataFrames for display in Jupyter notebooks and other rich display contexts.

This class supports extension through composition. Key extension points:
  • Provide a custom StyleProvider for styling cells and headers

  • Register custom formatters for specific types

  • Provide custom cell builders for specialized cell rendering

Parameters:
  • max_cell_length – Maximum characters to display in a cell before truncation

  • max_width – Maximum width of the HTML table in pixels

  • max_height – Maximum height of the HTML table in pixels

  • max_memory_bytes – Maximum memory in bytes for rendered data (default: 2MB)

  • min_rows_display – Minimum number of rows to display

  • repr_rows – Default number of rows to display in repr output

  • enable_cell_expansion – Whether to add expand/collapse buttons for long cell values

  • custom_css – Additional CSS to include in the HTML output

  • show_truncation_message – Whether to display a message when data is truncated

  • style_provider – Custom provider for cell and header styles

  • use_shared_styles – Whether to load styles and scripts only once per notebook session

Initialize the HTML formatter.

Parameters:
  • max_cell_length (int, default 25) – Maximum length of cell content before truncation.

  • max_width (int, default 1000) – Maximum width of the displayed table in pixels.

  • max_height (int, default 300) – Maximum height of the displayed table in pixels.

  • max_memory_bytes (int, default 2097152 (2MB)) – Maximum memory in bytes for rendered data.

  • min_rows_display (int, default 20) – Minimum number of rows to display.

  • repr_rows (int, default 10) – Default number of rows to display in repr output.

  • enable_cell_expansion (bool, default True) – Whether to allow cells to expand when clicked.

  • custom_css (str, optional) – Custom CSS to apply to the HTML table.

  • show_truncation_message (bool, default True) – Whether to show a message indicating that content has been truncated.

  • style_provider (StyleProvider, optional) – Provider of CSS styles for the HTML table. If None, DefaultStyleProvider is used.

  • use_shared_styles (bool, default True) – Whether to use shared styles across multiple tables.

Raises:
  • ValueError – If max_cell_length, max_width, max_height, max_memory_bytes, min_rows_display, or repr_rows is not a positive integer.

  • TypeError – If enable_cell_expansion, show_truncation_message, or use_shared_styles is not a boolean, or if custom_css is provided but is not a string, or if style_provider is provided but does not implement the StyleProvider protocol.

_build_expandable_cell(formatted_value: str, row_count: int, col_idx: int, table_uuid: str) str

Build an expandable cell for long content.

Build the HTML footer with JavaScript and messages.

_build_html_header() list[str]

Build the HTML header with CSS styles.

_build_regular_cell(formatted_value: str) str

Build a regular table cell.

_build_table_body(batches: list, table_uuid: str) list[str]

Build the HTML table body with data rows.

_build_table_container_start() list[str]

Build the opening tags for the table container.

_build_table_header(schema: Any) list[str]

Build the HTML table header with column names.

_format_cell_value(value: Any) str

Format a cell value for display.

Uses registered type formatters if available.

Parameters:

value – The cell value to format

Returns:

Formatted cell value as string
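
One way to picture type-based formatting is a dictionary lookup keyed by type, with `str()` as the fallback. The following is a minimal sketch under that assumption, not the library's actual implementation. `type_formatters` and `format_value` are hypothetical names.

```python
from typing import Any, Callable

# Hypothetical registry mapping a type to a formatter callable.
type_formatters: dict[type, Callable[[Any], str]] = {
    float: lambda v: f"{v:.2f}",
    bool: lambda v: "yes" if v else "no",
}


def format_value(value: Any) -> str:
    """Use the first registered formatter whose type matches, else str()."""
    for type_class, formatter in type_formatters.items():
        if isinstance(value, type_class):
            return formatter(value)
    return str(value)
```

Values with no registered formatter, such as strings or integers in this sketch, fall through to `str()`.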

_get_cell_value(column: Any, row_idx: int) Any

Extract a cell value from a column.

Parameters:
  • column – Arrow array

  • row_idx – Row index

Returns:

The raw cell value

_get_default_css() str

Get default CSS styles for the HTML table.

_get_javascript() str

Get JavaScript code for interactive elements.

format_html(batches: list, schema: Any, has_more: bool = False, table_uuid: str | None = None) str

Format record batches as HTML.

This method is used by DataFrame’s _repr_html_ implementation and can be called directly when custom HTML rendering is needed.

Parameters:
  • batches – List of Arrow RecordBatch objects

  • schema – Arrow Schema object

  • has_more – Whether there are more batches not shown

  • table_uuid – Unique ID for the table, used for JavaScript interactions

Returns:

HTML string representation of the data

Raises:

TypeError – If schema is invalid and no batches are provided

classmethod is_styles_loaded() bool

Check if HTML styles have been loaded in the current session.

This method is primarily intended for debugging UI rendering issues related to style loading.

Returns:

True if styles have been loaded, False otherwise

Example

>>> from datafusion.html_formatter import DataFrameHtmlFormatter
>>> DataFrameHtmlFormatter.is_styles_loaded()
False

register_formatter(type_class: type, formatter: CellFormatter) None

Register a custom formatter for a specific data type.

Parameters:
  • type_class – The type to register a formatter for

  • formatter – Function that takes a value of the given type and returns a formatted string

set_custom_cell_builder(builder: Callable[[Any, int, int, str], str]) None

Set a custom cell builder function.

Parameters:

builder – Function that takes (value, row, col, table_id) and returns HTML
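
A cell builder matching the `(value, row, col, table_id)` signature might look like the sketch below. The function name `highlight_negative_cell`, the `id` attribute layout, and the red styling are illustrative assumptions, not library behavior.

```python
import html
from typing import Any


def highlight_negative_cell(value: Any, row: int, col: int, table_id: str) -> str:
    """Hypothetical builder: escape the value and color negative numbers red."""
    text = html.escape(str(value))
    is_negative = isinstance(value, (int, float)) and value < 0
    style = "color: red;" if is_negative else ""
    # The id format below is made up for this example.
    return f'<td id="{table_id}-r{row}-c{col}" style="{style}">{text}</td>'
```

Such a function would be passed to `set_custom_cell_builder` on a formatter instance. Escaping the value with `html.escape` keeps raw cell content from being interpreted as markup.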

set_custom_header_builder(builder: Callable[[Any], str]) None

Set a custom header builder function.

Parameters:

builder – Function that takes a field and returns HTML
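
A header builder could be sketched as follows, assuming the field object exposes a `name` attribute (as Arrow schema fields do). `uppercase_header` and `demo_field` are hypothetical names used only for this example.

```python
import html
from types import SimpleNamespace
from typing import Any


def uppercase_header(field: Any) -> str:
    """Hypothetical builder: render the field name in upper case."""
    # Upper-case first, then escape, so HTML entities are left intact.
    name = html.escape(str(field.name).upper())
    return f"<th>{name}</th>"


# Stand-in object for demonstration; a real call would receive a schema field.
demo_field = SimpleNamespace(name="price")
demo_html = uppercase_header(demo_field)
```

Such a function would be passed to `set_custom_header_builder` on a formatter instance.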

_custom_cell_builder: Callable[[Any, int, int, str], str] | None = None
_custom_header_builder: Callable[[Any], str] | None = None
_styles_loaded = False
_type_formatters: dict[type, CellFormatter]
custom_css = None
enable_cell_expansion = True
max_cell_length = 25
max_height = 300
max_memory_bytes = 2097152
max_width = 1000
min_rows_display = 20
repr_rows = 10
show_truncation_message = True
style_provider
use_shared_styles = True

class datafusion.html_formatter.DefaultStyleProvider

Default implementation of StyleProvider.

get_cell_style() str

Get the CSS style for table cells.

Returns:

CSS style string

get_header_style() str

Get the CSS style for header cells.

Returns:

CSS style string

class datafusion.html_formatter.FormatterManager

Manager class for the global DataFrame HTML formatter instance.

classmethod get_formatter() DataFrameHtmlFormatter

Get the current global DataFrame HTML formatter.

Returns:

The global HTML formatter instance

classmethod set_formatter(formatter: DataFrameHtmlFormatter) None

Set the global DataFrame HTML formatter.

Parameters:

formatter – The formatter instance to use globally

_default_formatter: DataFrameHtmlFormatter
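
The general pattern of a manager that holds one shared instance at class level can be sketched as below. This is an illustration of the pattern only. `DemoFormatter` and `DemoFormatterManager` are hypothetical stand-ins and do not reproduce the library's code.

```python
from dataclasses import dataclass


@dataclass
class DemoFormatter:
    """Hypothetical stand-in for a formatter with a single setting."""

    max_cell_length: int = 25


class DemoFormatterManager:
    """Sketch of a manager that stores one shared instance on the class."""

    _default_formatter: DemoFormatter = DemoFormatter()

    @classmethod
    def get_formatter(cls) -> DemoFormatter:
        return cls._default_formatter

    @classmethod
    def set_formatter(cls, formatter: DemoFormatter) -> None:
        cls._default_formatter = formatter
```

Because the instance lives on the class, every caller of `get_formatter` sees the replacement made by `set_formatter`.
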
class datafusion.html_formatter.StyleProvider

Bases: Protocol

Protocol for HTML style providers.

get_cell_style() str

Get the CSS style for table cells.

get_header_style() str

Get the CSS style for header cells.
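
A class satisfies this protocol by defining both methods. The sketch below uses a hypothetical `CompactStyleProvider` whose CSS values are chosen purely for illustration.

```python
class CompactStyleProvider:
    """Hypothetical provider returning inline CSS declarations."""

    def get_cell_style(self) -> str:
        return "border: 1px solid #ccc; padding: 2px 6px; text-align: left;"

    def get_header_style(self) -> str:
        return "border: 1px solid #999; padding: 2px 6px; font-weight: bold;"
```

An instance could then be supplied as the `style_provider` argument of `DataFrameHtmlFormatter`.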

datafusion.html_formatter._refresh_formatter_reference() None

Refresh formatter reference in any modules using it.

This helps ensure that changes to the formatter are reflected in existing DataFrames that might be caching the formatter reference.

datafusion.html_formatter._validate_bool(value: Any, param_name: str) None

Validate that a parameter is a boolean.

Parameters:
  • value – The value to validate

  • param_name – Name of the parameter (used in error message)

Raises:

TypeError – If the value is not a boolean

datafusion.html_formatter._validate_positive_int(value: Any, param_name: str) None

Validate that a parameter is a positive integer.

Parameters:
  • value – The value to validate

  • param_name – Name of the parameter (used in error message)

Raises:

ValueError – If the value is not a positive integer
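
The two validators described above can be approximated as follows. This is a minimal sketch based only on the documented exceptions. The function bodies, the error messages, and the explicit rejection of `bool` in the integer check are assumptions.

```python
from typing import Any


def validate_positive_int(value: Any, param_name: str) -> None:
    """Sketch: raise ValueError unless value is an int greater than zero."""
    # bool is a subclass of int, so it is rejected explicitly (assumption).
    if isinstance(value, bool) or not isinstance(value, int) or value <= 0:
        raise ValueError(f"{param_name} must be a positive integer")


def validate_bool(value: Any, param_name: str) -> None:
    """Sketch: raise TypeError unless value is a bool."""
    if not isinstance(value, bool):
        raise TypeError(f"{param_name} must be a boolean")
```

Passing `param_name` lets the error message identify which argument was rejected.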

datafusion.html_formatter.configure_formatter(**kwargs: Any) None

Configure the global DataFrame HTML formatter.

This function creates a new formatter with the provided configuration and sets it as the global formatter for all DataFrames.

Parameters:

**kwargs – Formatter configuration parameters like max_cell_length, max_width, max_height, enable_cell_expansion, etc.

Raises:

ValueError – If any invalid parameters are provided

Example

>>> from datafusion.html_formatter import configure_formatter
>>> configure_formatter(
...     max_cell_length=50,
...     max_height=500,
...     enable_cell_expansion=True,
...     use_shared_styles=True
... )

datafusion.html_formatter.get_formatter() DataFrameHtmlFormatter

Get the current global DataFrame HTML formatter.

This function is used by the DataFrame._repr_html_ implementation to access the shared formatter instance. It can also be used directly when custom HTML rendering is needed.

Returns:

The global HTML formatter instance

Example

>>> from datafusion.html_formatter import get_formatter
>>> formatter = get_formatter()
>>> formatter.max_cell_length = 50  # Increase cell length

datafusion.html_formatter.reset_formatter() None

Reset the global DataFrame HTML formatter to default settings.

This function creates a new formatter with default configuration and sets it as the global formatter for all DataFrames.

Example

>>> from datafusion.html_formatter import reset_formatter
>>> reset_formatter()  # Reset formatter to default settings

datafusion.html_formatter.reset_styles_loaded_state() None

Reset the styles loaded state to force reloading of styles.

This can be useful when switching between notebook sessions or when styles need to be refreshed.

Example

>>> from datafusion.html_formatter import reset_styles_loaded_state
>>> reset_styles_loaded_state()  # Force styles to reload in next render

datafusion.html_formatter.set_formatter(formatter: DataFrameHtmlFormatter) None

Set the global DataFrame HTML formatter.

Parameters:

formatter – The formatter instance to use globally

Example

>>> from datafusion.html_formatter import DataFrameHtmlFormatter, set_formatter
>>> custom_formatter = DataFrameHtmlFormatter(max_cell_length=100)
>>> set_formatter(custom_formatter)