CSV
Reading a CSV file is straightforward with read_csv():
from datafusion import SessionContext
ctx = SessionContext()
df = ctx.read_csv("file.csv")
An alternative is to use register_csv(), which registers the file as a named table that you can then retrieve with table():
ctx.register_csv("file", "file.csv")
df = ctx.table("file")
If you require additional control over how to read the CSV file, you can use
CsvReadOptions to set a variety of options.
from datafusion import CsvReadOptions
options = (
    CsvReadOptions()
    .with_has_header(True)  # File contains a header row
    .with_delimiter(";")  # Use ; as the delimiter instead of ,
    .with_comment("#")  # Skip lines starting with #
    .with_escape("\\")  # Escape character
    .with_null_regex(r"^(null|NULL|N/A)$")  # Treat these as NULL
    .with_truncated_rows(True)  # Allow rows to have incomplete columns
    .with_file_compression_type("gzip")  # Read gzipped CSV
    .with_file_extension(".gz")  # File extension other than .csv
)
df = ctx.read_csv("data.csv.gz", options=options)
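To try these options out, you need a file that actually uses them. The sketch below builds one with the Python standard library only: a gzipped, semicolon-delimited file with a header row, a comment line, and an `N/A` value. The column names and values are invented for illustration; only the file layout matters.

```python
import gzip
import tempfile
from pathlib import Path

# Hypothetical contents matching the options above:
# header row, ';' delimiter, '#' comment line, 'N/A' as a null marker.
rows = [
    "sensor;reading",
    "# readings below are placeholders",
    "a;1.5",
    "b;N/A",
]

gz_path = Path(tempfile.mkdtemp()) / "data.csv.gz"
with gzip.open(gz_path, "wt", encoding="utf-8", newline="") as f:
    f.write("\n".join(rows) + "\n")

# Read the file back to confirm its layout before handing it to DataFusion.
with gzip.open(gz_path, "rt", encoding="utf-8") as f:
    lines = f.read().splitlines()
data_lines = [ln for ln in lines if not ln.startswith("#")]
header = data_lines[0].split(";")
print(header, len(data_lines) - 1, "data rows")

# The resulting path can then be passed to the reader shown above, e.g.
# ctx.read_csv(str(gz_path), options=options)
```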
Details for all CSV reading options can be found on the DataFusion documentation site.