Uses of Class org.apache.datafusion.DataFrame (Apache DataFusion Java 0.2.0-SNAPSHOT)

Packages that use DataFrame

Package

Description

org.apache.datafusion

Uses of DataFrame in org.apache.datafusion

Methods in org.apache.datafusion that return DataFrame

Modifier and Type

Method

Description

DataFrame

DataFrame.cache()

Materialise this DataFrame into an in-memory table and return a new DataFrame that scans it.

DataFrame

DataFrame.describe()

Compute summary statistics (count, null_count, mean, std, min, max, median) over this DataFrame's columns and return them as a new DataFrame.

DataFrame

DataFrame.distinct()

Deduplicate rows across all columns.

DataFrame

DataFrame.dropColumns(String... columnNames)

Drop the named columns.

DataFrame

DataFrame.except(DataFrame other)

Rows present in this DataFrame but not in other, keeping duplicates from the receiver (SQL EXCEPT ALL).

DataFrame

DataFrame.exceptDistinct(DataFrame other)

Rows present in this DataFrame but not in other, deduplicated (SQL EXCEPT).

DataFrame

DataFrame.explain(boolean verbose, boolean analyze)

Return a new DataFrame whose rows describe the plan that would execute this DataFrame.

DataFrame

DataFrame.filter(String predicate)

Apply a SQL predicate to produce a filtered DataFrame.

DataFrame

SessionContext.fromProto(byte[] planBytes)

Decode a DataFusion-Proto LogicalPlanNode and return a lazy DataFrame.

DataFrame

SessionContext.fromSubstrait(byte[] planBytes)

Decode a Substrait Plan message and return a lazy DataFrame.

DataFrame

DataFrame.intersect(DataFrame other)

Rows present in both this DataFrame and other, keeping duplicates from the receiver (SQL INTERSECT ALL).

DataFrame

DataFrame.intersectDistinct(DataFrame other)

Rows present in both this DataFrame and other, deduplicated (SQL INTERSECT).

DataFrame

DataFrame.join(DataFrame right, JoinType type, String[] leftCols, String[] rightCols)

Equi-join this DataFrame with right on the named columns, using the given JoinType.

DataFrame

DataFrame.join(DataFrame right, JoinType type, String[] leftCols, String[] rightCols, String filter)

Equi-join this DataFrame with right, restricting the result with a residual SQL filter parsed against the combined schema (left columns followed by right columns; columns may be qualified with the relation alias when ambiguous).

DataFrame

DataFrame.joinOn(DataFrame right, JoinType type, String... predicates)

Join this DataFrame with right using arbitrary SQL predicates parsed against the combined schema.

DataFrame

DataFrame.limit(int fetch)

Take the first fetch rows.

DataFrame

DataFrame.limit(int skip, int fetch)

Skip skip rows, then take the next fetch rows.

DataFrame

SessionContext.readArrow(String path)

Read an Arrow IPC file as a DataFrame without registering it.

DataFrame

SessionContext.readArrow(String path, ArrowReadOptions options)

Read an Arrow IPC file as a DataFrame with the supplied ArrowReadOptions.

DataFrame

SessionContext.readAvro(String path)

Read an Avro file as a DataFrame without registering it.

DataFrame

SessionContext.readAvro(String path, AvroReadOptions options)

Read an Avro file as a DataFrame with the supplied AvroReadOptions.

DataFrame

SessionContext.readCsv(String path)

Read a CSV file as a DataFrame without registering it.

DataFrame

SessionContext.readCsv(String path, CsvReadOptions options)

Read a CSV file as a DataFrame with the supplied CsvReadOptions.

DataFrame

SessionContext.readJson(String path)

Read a newline-delimited JSON file as a DataFrame without registering it.

DataFrame

SessionContext.readJson(String path, NdJsonReadOptions options)

Read a newline-delimited JSON file as a DataFrame with the supplied NdJsonReadOptions.

DataFrame

SessionContext.readParquet(String path)

Read a Parquet file as a DataFrame without registering it.

DataFrame

SessionContext.readParquet(String path, ParquetReadOptions options)

Read a Parquet file as a DataFrame with the supplied ParquetReadOptions.

DataFrame

DataFrame.repartitionHash(int numPartitions, String... columns)

Repartition this DataFrame by hashing the named columns into numPartitions output partitions. v1 supports column-name keys only; expression keys are deferred until the Java binding gains an Expr builder.

DataFrame

DataFrame.repartitionRoundRobin(int numPartitions)

Repartition this DataFrame using a round-robin scheme across numPartitions output partitions.

DataFrame

DataFrame.select(String... columnNames)

Project the listed columns into a new DataFrame.

DataFrame

DataFrame.sort(SortExpr... exprs)

Order the rows by the supplied sort keys.

DataFrame

SessionContext.sql(String query)

Parse and plan query, returning a lazy DataFrame.

DataFrame

DataFrame.union(DataFrame other)

Concatenate this DataFrame with other by column position, keeping all duplicates (SQL UNION ALL).

DataFrame

DataFrame.unionByName(DataFrame other)

Concatenate this DataFrame with other by column name, keeping all duplicates.

DataFrame

DataFrame.unionByNameDistinct(DataFrame other)

Concatenate this DataFrame with other by column name, removing duplicates.

DataFrame

DataFrame.unionDistinct(DataFrame other)

Concatenate this DataFrame with other by column position, removing duplicates (SQL UNION DISTINCT, equivalent to plain UNION in standard SQL).

DataFrame

DataFrame.unnestColumns(String... columns)

Expand list or struct columns into rows or fields, with default UnnestOptions.

DataFrame

DataFrame.unnestColumns(UnnestOptions options, String... columns)

Expand list or struct columns into rows or fields with the supplied UnnestOptions.

DataFrame

DataFrame.withColumn(String name, String expr)

Add a column to this DataFrame computed from a SQL expression.

DataFrame

DataFrame.withColumnRenamed(String oldName, String newName)

Rename a column.
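
The two limit overloads listed above describe a skip-then-fetch row selection. As a rough illustration of that semantics only, the sketch below applies the same idea to a plain java.util.List. It does not call the DataFusion binding, and the names LimitSketch and limit are hypothetical helpers introduced for this example.

```java
import java.util.List;
import java.util.stream.Collectors;

/** Illustrative only: models limit(skip, fetch) on an in-memory list, not on a DataFrame. */
final class LimitSketch {

    /** Skip the first {@code skip} rows, then keep at most {@code fetch} rows. */
    static <T> List<T> limit(List<T> rows, int skip, int fetch) {
        if (skip < 0 || fetch < 0) {
            throw new IllegalArgumentException("skip and fetch must be non-negative");
        }
        return rows.stream().skip(skip).limit(fetch).collect(Collectors.toList());
    }

    /** Single-argument form: the same as skipping zero rows. */
    static <T> List<T> limit(List<T> rows, int fetch) {
        return limit(rows, 0, fetch);
    }

    public static void main(String[] args) {
        List<Integer> rows = List.of(10, 20, 30, 40, 50);
        System.out.println(limit(rows, 2));    // prints [10, 20]
        System.out.println(limit(rows, 1, 3)); // prints [20, 30, 40]
    }
}
```

Asking for more rows than remain after the skip simply yields the remaining rows in this model. Whether the binding behaves identically at those edges is an assumption worth checking against the library.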

Methods in org.apache.datafusion with parameters of type DataFrame

Modifier and Type

Method

Description

DataFrame

DataFrame.except(DataFrame other)

Rows present in this DataFrame but not in other, keeping duplicates from the receiver (SQL EXCEPT ALL).

DataFrame

DataFrame.exceptDistinct(DataFrame other)

Rows present in this DataFrame but not in other, deduplicated (SQL EXCEPT).

DataFrame

DataFrame.intersect(DataFrame other)

Rows present in both this DataFrame and other, keeping duplicates from the receiver (SQL INTERSECT ALL).

DataFrame

DataFrame.intersectDistinct(DataFrame other)

Rows present in both this DataFrame and other, deduplicated (SQL INTERSECT).

DataFrame

DataFrame.join(DataFrame right, JoinType type, String[] leftCols, String[] rightCols)

Equi-join this DataFrame with right on the named columns, using the given JoinType.

DataFrame

DataFrame.join(DataFrame right, JoinType type, String[] leftCols, String[] rightCols, String filter)

Equi-join this DataFrame with right, restricting the result with a residual SQL filter parsed against the combined schema (left columns followed by right columns; columns may be qualified with the relation alias when ambiguous).

DataFrame

DataFrame.joinOn(DataFrame right, JoinType type, String... predicates)

Join this DataFrame with right using arbitrary SQL predicates parsed against the combined schema.

DataFrame

DataFrame.union(DataFrame other)

Concatenate this DataFrame with other by column position, keeping all duplicates (SQL UNION ALL).

DataFrame

DataFrame.unionByName(DataFrame other)

Concatenate this DataFrame with other by column name, keeping all duplicates.

DataFrame

DataFrame.unionByNameDistinct(DataFrame other)

Concatenate this DataFrame with other by column name, removing duplicates.

DataFrame

DataFrame.unionDistinct(DataFrame other)

Concatenate this DataFrame with other by column position, removing duplicates (SQL UNION DISTINCT, equivalent to plain UNION in standard SQL).
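
The four union methods above differ along two axes: whether columns are matched by position or by name, and whether duplicate rows are kept. The sketch below models that distinction on small in-memory tables. It is an illustration under stated assumptions, not the binding's implementation: UnionSketch, Table, and the helper methods are hypothetical, and the by-name helper assumes both inputs carry the same set of column names.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;

/** Illustrative only: models union by position and by name on in-memory rows. */
final class UnionSketch {

    /** A tiny stand-in for a table: ordered column names plus rows of values. */
    static final class Table {
        final List<String> columns;
        final List<List<Object>> rows;

        Table(List<String> columns, List<List<Object>> rows) {
            this.columns = columns;
            this.rows = rows;
        }
    }

    /** Concatenate by column position, keeping duplicates (UNION ALL). */
    static Table union(Table left, Table right) {
        if (left.columns.size() != right.columns.size()) {
            throw new IllegalArgumentException("column counts differ");
        }
        List<List<Object>> out = new ArrayList<>(left.rows);
        out.addAll(right.rows);
        return new Table(left.columns, out);
    }

    /** Concatenate by column name: reorder right's values into left's column order first. */
    static Table unionByName(Table left, Table right) {
        List<List<Object>> aligned = new ArrayList<>();
        for (List<Object> row : right.rows) {
            List<Object> reordered = new ArrayList<>();
            for (String name : left.columns) {
                int idx = right.columns.indexOf(name);
                if (idx < 0) {
                    // Simplifying assumption of this sketch: every column must exist on both sides.
                    throw new IllegalArgumentException("missing column: " + name);
                }
                reordered.add(row.get(idx));
            }
            aligned.add(reordered);
        }
        return union(left, new Table(left.columns, aligned));
    }

    /** Remove duplicate rows, keeping first-seen order (models the *Distinct variants). */
    static Table distinct(Table t) {
        return new Table(t.columns, new ArrayList<>(new LinkedHashSet<>(t.rows)));
    }

    public static void main(String[] args) {
        Table a = new Table(List.of("id", "name"), List.of(List.of(1, "x"), List.of(2, "y")));
        Table b = new Table(List.of("name", "id"), List.of(List.of("x", 1)));
        System.out.println(union(a, b).rows);                 // prints [[1, x], [2, y], [x, 1]]
        System.out.println(unionByName(a, b).rows);           // prints [[1, x], [2, y], [1, x]]
        System.out.println(distinct(unionByName(a, b)).rows); // prints [[1, x], [2, y]]
    }
}
```

The first printed result shows why positional union can silently place values under the wrong column when the two inputs order their columns differently. Matching by name avoids that, and only then does deduplication recognise the repeated row.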
