DataFrame API¶
A DataFrame represents a logical set of rows with the same named columns, similar to a Pandas DataFrame or Spark DataFrame.
DataFrames are typically created by calling a method on SessionContext
, such
as read_csv
, and can then be modified by calling the transformation methods,
such as filter
, select
, aggregate
, and limit
to build up a query
definition.
The query can be executed by calling the collect
method.
DataFusion DataFrames use lazy evaluation, meaning that each transformation
creates a new plan but does not actually perform any immediate actions. This
approach allows for the overall plan to be optimized before execution. The plan
is evaluated (executed) when an action method is invoked, such as collect
.
See the Library Users Guide for more details.
The DataFrame API is well documented in the API reference on docs.rs.
Please refer to the Expressions Reference for more information on
building logical expressions (Expr
) to use with the DataFrame API.
Example¶
The DataFrame struct is part of DataFusion’s prelude
and can be imported with
the following statement.
use datafusion::prelude::*;
Here is a minimal example showing the execution of a query using the DataFrame API.
use datafusion::prelude::*;
use datafusion::error::Result;
use datafusion::functions_aggregate::expr_fn::min;
#[tokio::main]
async fn main() -> Result<()> {
let ctx = SessionContext::new();
let df = ctx.read_csv("tests/data/example.csv", CsvReadOptions::new()).await?;
let df = df.filter(col("a").lt_eq(col("b")))?
.aggregate(vec![col("a")], vec![min(col("b"))])?
.limit(0, Some(100))?;
// Print results
df.show().await?;
Ok(())
}