datafusion.dataframe_formatter¶
HTML formatting utilities for DataFusion DataFrames.
Classes¶
CellFormatter | Protocol for cell value formatters.
DataFrameHtmlFormatter | Configurable HTML formatter for DataFusion DataFrames.
DefaultStyleProvider | Default implementation of StyleProvider.
FormatterManager | Manager class for the global DataFrame HTML formatter instance.
StyleProvider | Protocol for HTML style providers.
Functions¶
_refresh_formatter_reference() | Refresh formatter reference in any modules using it.
_validate_bool() | Validate that a parameter is a boolean.
_validate_formatter_parameters() | Validate all formatter parameters and return resolved max_rows value.
_validate_positive_int() | Validate that a parameter is a positive integer.
configure_formatter() | Configure the global DataFrame HTML formatter.
get_formatter() | Get the current global DataFrame HTML formatter.
reset_formatter() | Reset the global DataFrame HTML formatter to default settings.
set_formatter() | Set the global DataFrame HTML formatter.
Module Contents¶
- class datafusion.dataframe_formatter.CellFormatter¶
Bases: Protocol
Protocol for cell value formatters.
- __call__(value: Any) str¶
Format a cell value to string representation.
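Because CellFormatter is a protocol, any callable that takes one value and returns a string should satisfy it structurally. The sketch below illustrates this with a local stand-in protocol (`CellFormatterLike`, a hypothetical name) instead of importing the library, so it runs with the standard library alone.

```python
from typing import Any, Protocol


class CellFormatterLike(Protocol):
    """Local stand-in mirroring the documented __call__(value) -> str shape."""

    def __call__(self, value: Any) -> str: ...


def format_float(value: Any) -> str:
    """Render a number with two decimal places."""
    return f"{value:.2f}"


class Percent:
    """A callable object works as well as a plain function."""

    def __call__(self, value: Any) -> str:
        return f"{value * 100:.1f}%"


# Both satisfy the protocol structurally; no inheritance is required.
formatters: list[CellFormatterLike] = [format_float, Percent()]
rendered = [fmt(0.25) for fmt in formatters]
```

Either kind of callable could then be passed wherever the API expects a CellFormatter, for example as the second argument to register_formatter.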
- class datafusion.dataframe_formatter.DataFrameHtmlFormatter(max_cell_length: int = 25, max_width: int = 1000, max_height: int = 300, max_memory_bytes: int = 2 * 1024 * 1024, min_rows: int = 10, max_rows: int | None = None, repr_rows: int | None = None, enable_cell_expansion: bool = True, custom_css: str | None = None, show_truncation_message: bool = True, style_provider: StyleProvider | None = None, use_shared_styles: bool = True)¶
Configurable HTML formatter for DataFusion DataFrames.
This class handles the HTML rendering of DataFrames for display in Jupyter notebooks and other rich display contexts.
This class supports extension through composition. Key extension points:
- Provide a custom StyleProvider for styling cells and headers
- Register custom formatters for specific types
- Provide custom cell builders for specialized cell rendering
- Parameters:
max_cell_length – Maximum characters to display in a cell before truncation
max_width – Maximum width of the HTML table in pixels
max_height – Maximum height of the HTML table in pixels
max_memory_bytes – Maximum memory in bytes for rendered data (default: 2MB)
min_rows – Minimum number of rows to display (must be <= max_rows)
max_rows – Maximum number of rows to display in repr output
repr_rows – Deprecated alias for max_rows
enable_cell_expansion – Whether to add expand/collapse buttons for long cell values
custom_css – Additional CSS to include in the HTML output
show_truncation_message – Whether to display a message when data is truncated
style_provider – Custom provider for cell and header styles
use_shared_styles – Whether to load styles and scripts only once per notebook session
Initialize the HTML formatter.
- Parameters:
max_cell_length – Maximum length of cell content before truncation.
max_width – Maximum width of the displayed table in pixels.
max_height – Maximum height of the displayed table in pixels.
max_memory_bytes – Maximum memory in bytes for rendered data. Helps prevent performance issues with large datasets.
min_rows – Minimum number of rows to display even if memory limit is reached. Must not exceed max_rows.
max_rows – Maximum number of rows to display. Takes precedence over memory limits when fewer rows are requested.
repr_rows – Deprecated alias for max_rows. Use max_rows instead.
enable_cell_expansion – Whether to allow cells to expand when clicked.
custom_css – Custom CSS to apply to the HTML table.
show_truncation_message – Whether to show a message indicating that content has been truncated.
style_provider – Provider of CSS styles for the HTML table. If None, DefaultStyleProvider is used.
use_shared_styles – Whether to use shared styles across multiple tables. This improves performance when displaying many DataFrames in a single notebook.
- Raises:
ValueError – If max_cell_length, max_width, max_height, max_memory_bytes, min_rows or max_rows is not a positive integer, or if min_rows exceeds max_rows.
TypeError – If enable_cell_expansion, show_truncation_message, or use_shared_styles is not a boolean, or if custom_css is provided but is not a string, or if style_provider is provided but does not implement the StyleProvider protocol.
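The interaction between max_memory_bytes, min_rows and max_rows can be easier to follow as code. The sketch below is one plausible reading of the parameter descriptions above, not the library's implementation: `rows_to_display` and its `row_sizes` argument (per-row sizes in bytes) are hypothetical, and the real formatter may account for memory differently, for example per batch.

```python
def rows_to_display(row_sizes, max_memory_bytes, min_rows, max_rows):
    """Return how many leading rows to show under the documented limits.

    row_sizes is a hypothetical list of per-row sizes in bytes.
    """
    if min_rows > max_rows:
        raise ValueError("min_rows must not exceed max_rows")

    shown = 0
    used = 0
    for size in row_sizes:
        if shown >= max_rows:
            break  # max_rows is a hard cap, regardless of remaining memory
        over_budget = used + size > max_memory_bytes
        if over_budget and shown >= min_rows:
            break  # memory limit applies only once min_rows are shown
        used += size
        shown += 1
    return shown


# Fifty rows of 100 bytes each under different limits.
sizes = [100] * 50
memory_bound = rows_to_display(sizes, max_memory_bytes=1500, min_rows=10, max_rows=20)
floor_bound = rows_to_display(sizes, max_memory_bytes=300, min_rows=10, max_rows=20)
cap_bound = rows_to_display(sizes, max_memory_bytes=10_000, min_rows=10, max_rows=20)
```

In this reading, the memory limit stops the output at 15 rows, min_rows keeps 10 rows even when the budget only covers 3, and max_rows caps the output at 20 when memory would allow more.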
- _build_expandable_cell(formatted_value: str, row_count: int, col_idx: int, table_uuid: str) str¶
Build an expandable cell for long content.
Build the HTML footer with JavaScript and messages.
- _build_html_header() list[str]¶
Build the HTML header with CSS styles.
- _build_regular_cell(formatted_value: str) str¶
Build a regular table cell.
- _build_table_body(batches: list, table_uuid: str) list[str]¶
Build the HTML table body with data rows.
- _build_table_container_start() list[str]¶
Build the opening tags for the table container.
- _build_table_header(schema: Any) list[str]¶
Build the HTML table header with column names.
- _format_cell_value(value: Any) str¶
Format a cell value for display.
Uses registered type formatters if available.
- Parameters:
value – The cell value to format
- Returns:
Formatted cell value as string
- _get_cell_value(column: Any, row_idx: int) Any¶
Extract a cell value from a column.
- Parameters:
column – Arrow array
row_idx – Row index
- Returns:
The raw cell value
- _get_default_css() str¶
Get default CSS styles for the HTML table.
- _get_javascript() str¶
Get JavaScript code for interactive elements.
- format_html(batches: list, schema: Any, has_more: bool = False, table_uuid: str | None = None) str¶
Format record batches as HTML.
This method is used by DataFrame’s _repr_html_ implementation and can be called directly when custom HTML rendering is needed.
- Parameters:
batches – List of Arrow RecordBatch objects
schema – Arrow Schema object
has_more – Whether there are more batches not shown
table_uuid – Unique ID for the table, used for JavaScript interactions
- Returns:
HTML string representation of the data
- Raises:
TypeError – If schema is invalid and no batches are provided
- format_str(batches: list, schema: Any, has_more: bool = False, table_uuid: str | None = None) str¶
Format record batches as a string.
This method is used by DataFrame’s __repr__ implementation and can be called directly when string rendering is needed.
- Parameters:
batches – List of Arrow RecordBatch objects
schema – Arrow Schema object
has_more – Whether there are more batches not shown
table_uuid – Unique ID for the table, used for JavaScript interactions
- Returns:
String representation of the data
- Raises:
TypeError – If schema is invalid and no batches are provided
- register_formatter(type_class: type, formatter: CellFormatter) None¶
Register a custom formatter for a specific data type.
- Parameters:
type_class – The type to register a formatter for
formatter – Function that takes a value of the given type and returns a formatted string
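Type-keyed formatter registration, as described for register_formatter and _format_cell_value, can be sketched with a plain dictionary. This is an illustration under stated assumptions, not the library's code: the names `register` and `format_value` are hypothetical, and the lookup rule (first registered type matching via isinstance, falling back to str) is an assumption the source does not spell out.

```python
from typing import Any, Callable

# Maps a Python type to the callable used to render values of that type.
type_formatters: dict[type, Callable[[Any], str]] = {}


def register(type_class: type, formatter: Callable[[Any], str]) -> None:
    """Remember a formatter for one type (hypothetical helper)."""
    type_formatters[type_class] = formatter


def format_value(value: Any) -> str:
    """Use the first registered formatter whose type matches, else str()."""
    for type_class, formatter in type_formatters.items():
        if isinstance(value, type_class):
            return formatter(value)
    return str(value)


register(float, lambda v: f"{v:.2f}")
register(bytes, lambda v: v.hex())

formatted = [format_value(3.14159), format_value(b"\x01\xff"), format_value(7)]
```

Values whose type has no registered formatter, such as the integer above, fall through to the default string conversion.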
- set_custom_cell_builder(builder: collections.abc.Callable[[Any, int, int, str], str]) None¶
Set a custom cell builder function.
- Parameters:
builder – Function that takes (value, row, col, table_id) and returns HTML
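A cell builder with the documented (value, row, col, table_id) shape might look like the sketch below. The function name `highlight_negative` is hypothetical, and returning a complete `<td>` element is an assumption; the source only states that the builder returns HTML.

```python
import html
from typing import Any


def highlight_negative(value: Any, row: int, col: int, table_id: str) -> str:
    """Render one cell, colouring negative numbers red."""
    # Escape the text so cell content cannot inject markup.
    text = html.escape(str(value))
    is_negative = isinstance(value, (int, float)) and value < 0
    style = "color: red;" if is_negative else ""
    return f'<td id="{table_id}-r{row}-c{col}" style="{style}">{text}</td>'


negative_cell = highlight_negative(-5, 0, 1, "t1")
text_cell = highlight_negative("<b>", 2, 0, "t1")
```

A function like this would be passed to set_custom_cell_builder on a formatter instance.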
- set_custom_header_builder(builder: collections.abc.Callable[[Any], str]) None¶
Set a custom header builder function.
- Parameters:
builder – Function that takes a field and returns HTML
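A header builder receives a field and returns HTML. The sketch below assumes the field exposes a `name` attribute (as Arrow fields do) and uses a `SimpleNamespace` stand-in so that it runs without Arrow installed; `upper_header` is a hypothetical name, and returning a full `<th>` element is an assumption.

```python
import html
from types import SimpleNamespace
from typing import Any


def upper_header(field: Any) -> str:
    """Render a header cell with the column name in upper case."""
    return f"<th>{html.escape(field.name.upper())}</th>"


# Stand-in for a schema field; only the name attribute is used here.
field = SimpleNamespace(name="price<usd>")
header_html = upper_header(field)
```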
- _custom_cell_builder: collections.abc.Callable[[Any, int, int, str], str] | None = None¶
- _custom_header_builder: collections.abc.Callable[[Any], str] | None = None¶
- _max_rows = None¶
- _type_formatters: dict[type, CellFormatter]¶
- custom_css = None¶
- enable_cell_expansion = True¶
- max_cell_length = 25¶
- max_height = 300¶
- max_memory_bytes = 2097152¶
- property max_rows: int¶
Get the maximum number of rows to display.
- Returns:
The maximum number of rows to display in repr output
- max_width = 1000¶
- min_rows = 10¶
- property repr_rows: int¶
Get the maximum number of rows (deprecated name).
Deprecated: Use max_rows instead. This property is provided for backward compatibility.
- Returns:
The maximum number of rows to display
- show_truncation_message = True¶
- style_provider¶
- class datafusion.dataframe_formatter.DefaultStyleProvider¶
Default implementation of StyleProvider.
- get_cell_style() str¶
Get the CSS style for table cells.
- Returns:
CSS style string
- get_header_style() str¶
Get the CSS style for header cells.
- Returns:
CSS style string
- class datafusion.dataframe_formatter.FormatterManager¶
Manager class for the global DataFrame HTML formatter instance.
- classmethod get_formatter() DataFrameHtmlFormatter¶
Get the current global DataFrame HTML formatter.
- Returns:
The global HTML formatter instance
- classmethod set_formatter(formatter: DataFrameHtmlFormatter) None¶
Set the global DataFrame HTML formatter.
- Parameters:
formatter – The formatter instance to use globally
- _default_formatter: DataFrameHtmlFormatter¶
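The manager pattern described here, a class-level default instance reached through classmethods, can be sketched in a few lines. `Formatter` and `Manager` below are simplified hypothetical stand-ins for DataFrameHtmlFormatter and FormatterManager, not the library's classes.

```python
class Formatter:
    """Minimal stand-in holding a single setting."""

    def __init__(self, max_cell_length: int = 25) -> None:
        self.max_cell_length = max_cell_length


class Manager:
    """Holds one shared formatter instance at class level."""

    _default_formatter = Formatter()

    @classmethod
    def get_formatter(cls) -> Formatter:
        return cls._default_formatter

    @classmethod
    def set_formatter(cls, formatter: Formatter) -> None:
        cls._default_formatter = formatter


original = Manager.get_formatter()
replacement = Formatter(max_cell_length=100)
Manager.set_formatter(replacement)
```

Because the instance lives on the class, every caller of get_formatter sees the replacement after set_formatter runs.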
- class datafusion.dataframe_formatter.StyleProvider¶
Bases: Protocol
Protocol for HTML style providers.
- get_cell_style() str¶
Get the CSS style for table cells.
- get_header_style() str¶
Get the CSS style for header cells.
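A custom style provider only needs the two methods listed above. The sketch below uses a local runtime-checkable stand-in (`StyleProviderLike`, a hypothetical name) so the structural check can be shown without importing the library; the CSS values are arbitrary examples.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class StyleProviderLike(Protocol):
    """Local stand-in mirroring the two documented methods."""

    def get_cell_style(self) -> str: ...

    def get_header_style(self) -> str: ...


class CompactStyleProvider:
    """Tighter padding than a typical default; values are illustrative."""

    def get_cell_style(self) -> str:
        return "padding: 2px 6px; border: 1px solid #ddd;"

    def get_header_style(self) -> str:
        return "padding: 2px 6px; font-weight: bold; background: #f5f5f5;"


provider = CompactStyleProvider()
# runtime_checkable protocols verify that the methods exist, not their types.
is_provider = isinstance(provider, StyleProviderLike)
```

An object like `provider` could then be supplied as the style_provider argument of DataFrameHtmlFormatter.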
- datafusion.dataframe_formatter._refresh_formatter_reference() None¶
Refresh formatter reference in any modules using it.
This helps ensure that changes to the formatter are reflected in existing DataFrames that might be caching the formatter reference.
- datafusion.dataframe_formatter._validate_bool(value: Any, param_name: str) None¶
Validate that a parameter is a boolean.
- Parameters:
value – The value to validate
param_name – Name of the parameter (used in error message)
- Raises:
TypeError – If the value is not a boolean
- datafusion.dataframe_formatter._validate_formatter_parameters(max_cell_length: int, max_width: int, max_height: int, max_memory_bytes: int, min_rows: int, max_rows: int | None, repr_rows: int | None, enable_cell_expansion: bool, show_truncation_message: bool, use_shared_styles: bool, custom_css: str | None, style_provider: Any) int¶
Validate all formatter parameters and return resolved max_rows value.
- Parameters:
max_cell_length – Maximum cell length value to validate
max_width – Maximum width value to validate
max_height – Maximum height value to validate
max_memory_bytes – Maximum memory bytes value to validate
min_rows – Minimum rows to display value to validate
max_rows – Maximum rows value to validate (None means use default)
repr_rows – Deprecated repr_rows value to validate
enable_cell_expansion – Boolean expansion flag to validate
show_truncation_message – Boolean message flag to validate
use_shared_styles – Boolean styles flag to validate
custom_css – Custom CSS string to validate
style_provider – Style provider object to validate
- Returns:
The resolved max_rows value after handling repr_rows deprecation
- Raises:
ValueError – If any numeric parameter is invalid or constraints are violated
TypeError – If any parameter has invalid type
DeprecationWarning – If repr_rows parameter is used
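Resolving a deprecated alias typically means emitting a DeprecationWarning and then choosing one value. The sketch below shows that pattern under assumptions the source does not confirm: `resolve_max_rows` is a hypothetical name, the fallback of 10 is a hypothetical default, and letting max_rows win when both arguments are given is an assumption.

```python
import warnings

DEFAULT_MAX_ROWS = 10  # hypothetical fallback; the source does not state it


def resolve_max_rows(max_rows, repr_rows, default=DEFAULT_MAX_ROWS):
    """Pick the effective row limit, warning when the old name is used."""
    if repr_rows is not None:
        warnings.warn(
            "repr_rows is deprecated; use max_rows instead",
            DeprecationWarning,
            stacklevel=2,
        )
    if max_rows is not None:
        return max_rows  # assumption: the new name takes precedence
    if repr_rows is not None:
        return repr_rows
    return default


# Capture the warning so it can be inspected instead of printed.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    resolved = resolve_max_rows(max_rows=None, repr_rows=25)
```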
- datafusion.dataframe_formatter._validate_positive_int(value: Any, param_name: str) None¶
Validate that a parameter is a positive integer.
- Parameters:
value – The value to validate
param_name – Name of the parameter (used in error message)
- Raises:
ValueError – If the value is not a positive integer
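The two validators follow a common pattern: check the value and raise with the parameter name in the message. The sketch below mirrors the documented exception types (ValueError for non-positive integers, TypeError for non-booleans); the function bodies and message wording are assumptions, not the library's code.

```python
from typing import Any


def validate_positive_int(value: Any, param_name: str) -> None:
    """Raise ValueError unless value is an integer greater than zero."""
    if not isinstance(value, int) or value <= 0:
        raise ValueError(f"{param_name} must be a positive integer")


def validate_bool(value: Any, param_name: str) -> None:
    """Raise TypeError unless value is exactly a bool."""
    if not isinstance(value, bool):
        raise TypeError(f"{param_name} must be a boolean")


def error_for(check, value: Any, param_name: str):
    """Return the raised exception's class name, or None if the check passes."""
    try:
        check(value, param_name)
    except (TypeError, ValueError) as exc:
        return type(exc).__name__
    return None


results = [
    error_for(validate_positive_int, 25, "max_cell_length"),
    error_for(validate_positive_int, 0, "max_width"),
    error_for(validate_bool, True, "use_shared_styles"),
    error_for(validate_bool, 1, "use_shared_styles"),
]
```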
- datafusion.dataframe_formatter.configure_formatter(**kwargs: Any) None¶
Configure the global DataFrame HTML formatter.
This function creates a new formatter with the provided configuration and sets it as the global formatter for all DataFrames.
- Parameters:
**kwargs – Formatter configuration parameters like max_cell_length, max_width, max_height, enable_cell_expansion, etc.
- Raises:
ValueError – If any invalid parameters are provided
Example
>>> from datafusion.dataframe_formatter import configure_formatter
>>> configure_formatter(
...     max_cell_length=50,
...     max_height=500,
...     enable_cell_expansion=True,
...     use_shared_styles=True,
... )
- datafusion.dataframe_formatter.get_formatter() DataFrameHtmlFormatter¶
Get the current global DataFrame HTML formatter.
This function is used by the DataFrame._repr_html_ implementation to access the shared formatter instance. It can also be used directly when custom HTML rendering is needed.
- Returns:
The global HTML formatter instance
Example
>>> from datafusion.dataframe_formatter import get_formatter
>>> formatter = get_formatter()
>>> formatter.max_cell_length = 50  # Increase cell length
- datafusion.dataframe_formatter.reset_formatter() None¶
Reset the global DataFrame HTML formatter to default settings.
This function creates a new formatter with default configuration and sets it as the global formatter for all DataFrames.
Example
>>> from datafusion.dataframe_formatter import reset_formatter
>>> reset_formatter()  # Reset formatter to default settings
- datafusion.dataframe_formatter.set_formatter(formatter: DataFrameHtmlFormatter) None¶
Set the global DataFrame HTML formatter.
- Parameters:
formatter – The formatter instance to use globally
Example
>>> from datafusion.dataframe_formatter import DataFrameHtmlFormatter, set_formatter
>>> custom_formatter = DataFrameHtmlFormatter(max_cell_length=100)
>>> set_formatter(custom_formatter)