# DataFrame and SQL DataFusion Java supports two query interfaces: SQL strings via `SessionContext.sql(String)`, and a programmatic DataFrame API. ## SQL ```java try (DataFrame df = ctx.sql("SELECT a, b FROM t WHERE a > 10")) { df.show(); } ``` `sql(String)` plans the query and returns a `DataFrame`. Execution does not start until you pull results. ## DataFrame transformations The DataFrame API exposes `select`, `filter`, `limit`, `distinct`, `dropColumns`, and `withColumnRenamed`. ```java try (DataFrame df = ctx.readParquet("/path/to/orders.parquet")) { try (DataFrame filtered = df.filter("o_orderpriority = '1-URGENT'")) { filtered.show(); } } ``` Each transformation returns a new `DataFrame` that must be closed. ## Pulling results Three patterns are available: **Stream as Arrow batches.** Use `collect(allocator)` to pull the result set as Arrow record batches via the [Arrow C Data Interface]: ```java try (DataFrame df = ctx.sql("SELECT ..."); ArrowReader reader = df.collect(allocator)) { while (reader.loadNextBatch()) { var batch = reader.getVectorSchemaRoot(); // process batch... } } ``` [Arrow C Data Interface]: https://arrow.apache.org/docs/format/CDataInterface.html **Count rows.** `df.count()` returns the row count without materializing the rows in the JVM. **Print for inspection.** `df.show()` and `df.show(int n)` print results to standard output. Useful for exploration; not appropriate for production code paths. ## Schema introspection To get the schema of a registered table without running a query: ```java org.apache.arrow.vector.types.pojo.Schema schema = ctx.tableSchema("orders"); ``` ## Plan input A DataFusion logical plan can be deserialized from `datafusion-proto` bytes via `SessionContext.fromProto(byte[])`. The `datafusion-proto` Java classes are generated by the Maven build. This is useful for accepting plans produced by other DataFusion-aware tooling.