# SELECT syntax The queries in DataFusion scan data from tables and return 0 or more rows. Please be aware that column names in queries are made lower-case, but not on the inferred schema. Accordingly, if you want to query against a capitalized field, make sure to use double quotes. Please see this [example](https://datafusion.apache.org/user-guide/example-usage.html) for clarification. In this documentation we describe the SQL syntax in DataFusion. DataFusion supports the following syntax for queries: [ [WITH](#with-clause) with_query [, ...] ]
[SELECT](#select-clause) [ ALL | DISTINCT ] select_expr [, ...]
[ [FROM](#from-clause) from_item [, ...] ]
[ [JOIN](#join-clause) join_item [, ...] ]
[ [WHERE](#where-clause) condition ]
[ [GROUP BY](#group-by-clause) grouping_element [, ...] ]
[ [HAVING](#having-clause) condition]
[ [QUALIFY](#qualify-clause) condition]
[ [UNION](#union-clause) [ ALL | select ]
[ [ORDER BY](#order-by-clause) expression [ ASC | DESC ][, ...] ]
[ [LIMIT](#limit-clause) count ]
[ [EXCLUDE | EXCEPT](#exclude-and-except-clause) ]
[Pipe operators](#pipe-operators)
## WITH clause A with clause allows to give names for queries and reference them by name. ```sql WITH x AS (SELECT a, MAX(b) AS b FROM t GROUP BY a) SELECT a, b FROM x; ``` ## SELECT clause Example: ```sql SELECT a, b, a + b FROM table ``` The `DISTINCT` quantifier can be added to make the query return all distinct rows. By default `ALL` will be used, which returns all the rows. ```sql SELECT DISTINCT person, age FROM employees ``` ## FROM clause Example: ```sql SELECT t.a FROM table AS t ``` ## WHERE clause Example: ```sql SELECT a FROM table WHERE a > 10 ``` ## JOIN clause DataFusion supports `INNER JOIN`, `LEFT OUTER JOIN`, `RIGHT OUTER JOIN`, `FULL OUTER JOIN`, `NATURAL JOIN`, `CROSS JOIN`, `LEFT SEMI JOIN`, `RIGHT SEMI JOIN`, `LEFT ANTI JOIN`, and `RIGHT ANTI JOIN`. The following examples are based on this table: ```sql select * from x; +----------+----------+ | column_1 | column_2 | +----------+----------+ | 1 | 2 | +----------+----------+ ``` ### INNER JOIN The keywords `JOIN` or `INNER JOIN` define a join that only shows rows where there is a match in both tables. ```sql SELECT * FROM x INNER JOIN x y ON x.column_1 = y.column_1; +----------+----------+----------+----------+ | column_1 | column_2 | column_1 | column_2 | +----------+----------+----------+----------+ | 1 | 2 | 1 | 2 | +----------+----------+----------+----------+ ``` ### LEFT OUTER JOIN The keywords `LEFT JOIN` or `LEFT OUTER JOIN` define a join that includes all rows from the left table even if there is not a match in the right table. When there is no match, null values are produced for the right side of the join. ```sql SELECT * FROM x LEFT JOIN x y ON x.column_1 = y.column_2; +----------+----------+----------+----------+ | column_1 | column_2 | column_1 | column_2 | +----------+----------+----------+----------+ | 1 | 2 | | | +----------+----------+----------+----------+ ``` ### RIGHT OUTER JOIN The keywords `RIGHT JOIN` or `RIGHT OUTER JOIN` define a join that includes all rows from the right table even if there is not a match in the left table. When there is no match, null values are produced for the left side of the join. ```sql SELECT * FROM x RIGHT JOIN x y ON x.column_1 = y.column_2; +----------+----------+----------+----------+ | column_1 | column_2 | column_1 | column_2 | +----------+----------+----------+----------+ | | | 1 | 2 | +----------+----------+----------+----------+ ``` ### FULL OUTER JOIN The keywords `FULL JOIN` or `FULL OUTER JOIN` define a join that is effectively a union of a `LEFT OUTER JOIN` and `RIGHT OUTER JOIN`. It will show all rows from the left and right side of the join and will produce null values on either side of the join where there is not a match. ```sql SELECT * FROM x FULL OUTER JOIN x y ON x.column_1 = y.column_2; +----------+----------+----------+----------+ | column_1 | column_2 | column_1 | column_2 | +----------+----------+----------+----------+ | 1 | 2 | | | | | | 1 | 2 | +----------+----------+----------+----------+ ``` ### NATURAL JOIN A `NATURAL JOIN` defines an inner join based on common column names found between the input tables. When no common column names are found, it behaves like a `CROSS JOIN`. ```sql SELECT * FROM x NATURAL JOIN x y; +----------+----------+ | column_1 | column_2 | +----------+----------+ | 1 | 2 | +----------+----------+ ``` ### CROSS JOIN A `CROSS JOIN` produces a cartesian product that matches every row in the left side of the join with every row in the right side of the join. ```sql SELECT * FROM x CROSS JOIN x y; +----------+----------+----------+----------+ | column_1 | column_2 | column_1 | column_2 | +----------+----------+----------+----------+ | 1 | 2 | 1 | 2 | +----------+----------+----------+----------+ ``` ### LEFT SEMI JOIN The `LEFT SEMI JOIN` returns all rows from the left table that have at least one matching row in the right table, and projects only the columns from the left table. ```sql SELECT * FROM x LEFT SEMI JOIN x y ON x.column_1 = y.column_1; +----------+----------+ | column_1 | column_2 | +----------+----------+ | 1 | 2 | +----------+----------+ ``` ### RIGHT SEMI JOIN The `RIGHT SEMI JOIN` returns all rows from the right table that have at least one matching row in the left table, and only projects the columns from the right table. ```sql SELECT * FROM x RIGHT SEMI JOIN x y ON x.column_1 = y.column_1; +----------+----------+ | column_1 | column_2 | +----------+----------+ | 1 | 2 | +----------+----------+ ``` ### LEFT ANTI JOIN The `LEFT ANTI JOIN` returns all rows from the left table that do not have any matching row in the right table, projecting only the left table’s columns. ```sql SELECT * FROM x LEFT ANTI JOIN x y ON x.column_1 = y.column_1; +----------+----------+ | column_1 | column_2 | +----------+----------+ +----------+----------+ ``` ### RIGHT ANTI JOIN The `RIGHT ANTI JOIN` returns all rows from the right table that do not have any matching row in the left table, projecting only the right table’s columns. ```sql SELECT * FROM x RIGHT ANTI JOIN x y ON x.column_1 = y.column_1; +----------+----------+ | column_1 | column_2 | +----------+----------+ +----------+----------+ ``` ## GROUP BY clause Example: ```sql SELECT a, b, MAX(c) FROM table GROUP BY a, b ``` Some aggregation functions accept optional ordering requirement, such as `ARRAY_AGG`. If a requirement is given, aggregation is calculated in the order of the requirement. Example: ```sql SELECT a, b, ARRAY_AGG(c, ORDER BY d) FROM table GROUP BY a, b ``` ## HAVING clause Example: ```sql SELECT a, b, MAX(c) FROM table GROUP BY a, b HAVING MAX(c) > 10 ``` ## QUALIFY clause Example: ```sql SELECT ROW_NUMBER() OVER (PARTITION BY region) AS rk FROM table QUALIFY rk > 1; ``` ## UNION clause Example: ```sql SELECT a, b, c FROM table1 UNION ALL SELECT a, b, c FROM table2 ``` ## ORDER BY clause Orders the results by the referenced expression. By default it uses ascending order (`ASC`). This order can be changed to descending by adding `DESC` after the order-by expressions. Examples: ```sql SELECT age, person FROM table ORDER BY age; SELECT age, person FROM table ORDER BY age DESC; SELECT age, person FROM table ORDER BY age, person DESC; ``` ## LIMIT clause Limits the number of rows to be a maximum of `count` rows. `count` should be a non-negative integer. Example: ```sql SELECT age, person FROM table LIMIT 10 ``` ## EXCLUDE and EXCEPT clause Excluded named columns from query results. Example selecting all columns except for `age` and `person`: ```sql SELECT * EXCEPT(age, person) FROM table; ``` ```sql SELECT * EXCLUDE(age, person) FROM table; ``` ## Pipe operators Some SQL dialects (e.g. BigQuery) support the pipe operator `|>`. The SQL dialect can be set like this: ```sql set datafusion.sql_parser.dialect = 'BigQuery'; ``` DataFusion currently supports the following pipe operators: - [WHERE](#pipe_where) - [ORDER BY](#pipe_order_by) - [LIMIT](#pipe_limit) - [SELECT](#pipe_select) - [EXTEND](#pipe_extend) - [AS](#pipe_as) - [UNION](#pipe_union) - [INTERSECT](#pipe_intersect) - [EXCEPT](#pipe_except) - [AGGREGATE](#pipe_aggregate) - [JOIN](#pipe_join) (pipe_where)= ### WHERE ```sql select * from range(0,10) |> where value < 2; +-------+ | value | +-------+ | 0 | | 1 | +-------+ ``` (pipe_order_by)= ### ORDER BY ```sql select * from range(0,3) |> order by value desc; +-------+ | value | +-------+ | 2 | | 1 | | 0 | +-------+ ``` (pipe_limit)= ### LIMIT ```sql select * from range(0,3) |> order by value desc |> limit 1; +-------+ | value | +-------+ | 2 | +-------+ ``` (pipe_select)= ### SELECT ```sql select * from range(0,3) |> select value + 10; +---------------------------+ | range().value + Int64(10) | +---------------------------+ | 10 | | 11 | | 12 | +---------------------------+ ``` (pipe_extend)= ### EXTEND ```sql select * from range(0,3) |> extend -value AS minus_value; +-------+-------------+ | value | minus_value | +-------+-------------+ | 0 | 0 | | 1 | -1 | | 2 | -2 | +-------+-------------+ ``` (pipe_as)= ### AS ```sql select * from range(0,3) |> as my_range |> SELECT my_range.value; +-------+ | value | +-------+ | 0 | | 1 | | 2 | +-------+ ``` (pipe_union)= ### UNION ```sql select * from range(0,3) |> union all ( select * from range(3,6) ); +-------+ | value | +-------+ | 0 | | 1 | | 2 | | 3 | | 4 | | 5 | +-------+ ``` (pipe_intersect)= ### INTERSECT ```sql select * from range(0,100) |> INTERSECT DISTINCT ( select 3 ); +-------+ | value | +-------+ | 3 | +-------+ ``` (pipe_except)= ### EXCEPT ```sql select * from range(0,10) |> EXCEPT DISTINCT (select * from range(5,10)); +-------+ | value | +-------+ | 0 | | 1 | | 2 | | 3 | | 4 | +-------+ ``` (pipe_aggregate)= ### AGGREGATE ```sql select * from range(0,3) |> aggregate sum(value) AS total; +-------+ | total | +-------+ | 3 | +-------+ ``` (pipe_join)= ### JOIN ```sql ( SELECT 'apples' AS item, 2 AS sales UNION ALL SELECT 'bananas' AS item, 5 AS sales ) |> AS produce_sales |> LEFT JOIN ( SELECT 'apples' AS item, 123 AS id ) AS produce_data ON produce_sales.item = produce_data.item |> SELECT produce_sales.item, sales, id; +--------+-------+------+ | item | sales | id | +--------+-------+------+ | apples | 2 | 123 | | bananas| 5 | NULL | +--------+-------+------+ ```