Supported Spark Data Sources

Parquet

When spark.comet.scan.enabled is enabled, Comet performs Parquet scans natively, provided that all data types in the schema are supported. When this option is not enabled, the scan falls back to Spark. In that case, enabling spark.comet.convert.parquet.enabled converts the scanned data into Arrow format immediately, so that subsequent operations can run natively, although the conversion itself may not be efficient.
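The interaction between the two Parquet options can be summarized as a small decision function. This is only an illustrative sketch in Python, not Comet's actual implementation: the function name and the returned labels are hypothetical, and the handling of a schema with unsupported types while the native scan is enabled is an assumption (the text above only states that the native scan requires all types to be supported).

```python
def parquet_scan_path(scan_enabled: bool,
                      convert_enabled: bool,
                      all_types_supported: bool) -> str:
    """Describe how a Parquet scan would be handled (illustrative model only).

    scan_enabled        -> value of spark.comet.scan.enabled
    convert_enabled     -> value of spark.comet.convert.parquet.enabled
    all_types_supported -> whether every data type in the schema is supported
    """
    # Native scan requires the option to be on and a fully supported schema.
    if scan_enabled and all_types_supported:
        return "comet-native-scan"

    # Otherwise Spark performs the scan. Assumption: the Arrow conversion
    # option is honored the same way whether the fallback was caused by the
    # disabled option or by an unsupported data type.
    if convert_enabled:
        return "spark-scan-then-arrow-conversion"

    return "spark-scan"


if __name__ == "__main__":
    print(parquet_scan_path(True, False, True))
    print(parquet_scan_path(False, True, True))
    print(parquet_scan_path(False, False, True))
```

The three calls at the bottom correspond to the three cases described above: a native scan, a Spark scan followed by conversion to Arrow, and a plain Spark scan.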

CSV

Comet does not provide a native CSV scan. However, when spark.comet.convert.csv.enabled is enabled, the data is converted into Arrow format immediately, so that subsequent operations can run natively.

JSON

Comet does not provide a native JSON scan. However, when spark.comet.convert.json.enabled is enabled, the data is converted into Arrow format immediately, so that subsequent operations can run natively.
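As a rough illustration of turning on the CSV and JSON conversion options, the following Python sketch renders a set of settings as --conf arguments for spark-submit. The helper name conf_flags is hypothetical, and every other setting a Comet deployment needs is deliberately omitted; only the two option names come from the text above.

```python
def conf_flags(settings: dict) -> list:
    """Render Spark settings as a flat list of spark-submit arguments."""
    flags = []
    for key, value in settings.items():
        # spark-submit accepts each setting as: --conf key=value
        flags += ["--conf", f"{key}={value}"]
    return flags


# Enable Arrow conversion for both text-based sources discussed here.
convert_settings = {
    "spark.comet.convert.csv.enabled": "true",
    "spark.comet.convert.json.enabled": "true",
}

if __name__ == "__main__":
    print(" ".join(conf_flags(convert_settings)))
    # prints: --conf spark.comet.convert.csv.enabled=true --conf spark.comet.convert.json.enabled=true
```

The same settings could equally be placed in a Spark configuration file or set on a session builder; the flag form is used here only because it keeps the example free of dependencies.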