# Example Usage In this example some simple processing is performed on the [`example.csv`](https://github.com/apache/datafusion/blob/main/datafusion/core/tests/data/example.csv) file. Even [`more code examples`](https://github.com/apache/datafusion/tree/main/datafusion-examples) attached to the project. ## Add published DataFusion dependency Find latest available Datafusion version on [DataFusion's crates.io] page. Add the dependency to your `Cargo.toml` file: ```toml datafusion = "latest_version" tokio = { version = "1.0", features = ["rt-multi-thread"] } ``` ## Run a SQL query against data stored in a CSV ```rust use datafusion::prelude::*; #[tokio::main] async fn main() -> datafusion::error::Result<()> { // register the table let ctx = SessionContext::new(); ctx.register_csv("example", "tests/data/example.csv", CsvReadOptions::new()).await?; // create a plan to run a SQL query let df = ctx.sql("SELECT a, MIN(b) FROM example WHERE a <= b GROUP BY a LIMIT 100").await?; // execute and print results df.show().await?; Ok(()) } ``` See [the SQL API](../library-user-guide/using-the-sql-api.md) section of the library guide for more information on the SQL API. ## Use the DataFrame API to process data stored in a CSV ```rust use datafusion::prelude::*; use datafusion::functions_aggregate::expr_fn::min; #[tokio::main] async fn main() -> datafusion::error::Result<()> { // create the dataframe let ctx = SessionContext::new(); let df = ctx.read_csv("tests/data/example.csv", CsvReadOptions::new()).await?; let df = df.filter(col("a").lt_eq(col("b")))? .aggregate(vec![col("a")], vec![min(col("b"))])? .limit(0, Some(100))?; // execute and print results df.show().await?; Ok(()) } ``` ## Output from both examples ```text +---+--------+ | a | MIN(b) | +---+--------+ | 1 | 2 | +---+--------+ ``` ## Arrow Versions Many of DataFusion's public APIs use types from the [`arrow`] and [`parquet`] crates, so if you use `arrow` in your project, the `arrow` version must match that used by DataFusion. You can check the required version on [DataFusion's crates.io] page. The easiest way to ensure the versions match is to use the `arrow` exported by DataFusion, for example: ```rust use datafusion::arrow::datatypes::Schema; ``` For example, [DataFusion `25.0.0` dependencies] require `arrow` `39.0.0`. If instead you used `arrow` `40.0.0` in your project you may see errors such as: ```text mismatched types [E0308] expected `Schema`, found `arrow_schema::Schema` Note: `arrow_schema::Schema` and `Schema` have similar names, but are actually distinct types Note: `arrow_schema::Schema` is defined in crate `arrow_schema` Note: `Schema` is defined in crate `arrow_schema` Note: perhaps two different versions of crate `arrow_schema` are being used? Note: associated function defined here ``` Or calling `downcast_ref` on an `ArrayRef` may return `None` unexpectedly. [`arrow`]: https://docs.rs/arrow/latest/arrow/ [`parquet`]: https://docs.rs/parquet/latest/parquet/ [datafusion's crates.io]: https://crates.io/crates/datafusion [datafusion `26.0.0` dependencies]: https://crates.io/crates/datafusion/26.0.0/dependencies ## Identifiers and Capitalization Please be aware that all identifiers are effectively made lower-case in SQL, so if your csv file has capital letters (ex: `Name`) you must put your column name in double quotes or the examples won't work. To illustrate this behavior, consider the [`capitalized_example.csv`](../../../datafusion/core/tests/data/capitalized_example.csv) file: ## Run a SQL query against data stored in a CSV: ```rust use datafusion::prelude::*; #[tokio::main] async fn main() -> datafusion::error::Result<()> { // register the table let ctx = SessionContext::new(); ctx.register_csv("example", "tests/data/capitalized_example.csv", CsvReadOptions::new()).await?; // create a plan to run a SQL query let df = ctx.sql("SELECT \"A\", MIN(b) FROM example WHERE \"A\" <= c GROUP BY \"A\" LIMIT 100").await?; // execute and print results df.show().await?; Ok(()) } ``` ## Use the DataFrame API to process data stored in a CSV: ```rust use datafusion::prelude::*; use datafusion::functions_aggregate::expr_fn::min; #[tokio::main] async fn main() -> datafusion::error::Result<()> { // create the dataframe let ctx = SessionContext::new(); let df = ctx.read_csv("tests/data/capitalized_example.csv", CsvReadOptions::new()).await?; let df = df // col will parse the input string, hence requiring double quotes to maintain the capitalization .filter(col("\"A\"").lt_eq(col("c")))? // alternatively use ident to pass in an unqualified column name directly without parsing .aggregate(vec![ident("A")], vec![min(col("b"))])? .limit(0, Some(100))?; // execute and print results df.show().await?; Ok(()) } ``` ## Output from both examples ```text +---+--------+ | A | MIN(b) | +---+--------+ | 2 | 1 | | 1 | 2 | +---+--------+ ```