Output field name semantics#

This specification documents how field names in output record batches should be generated based on given user queries. The field name rules apply to DataFusion queries planned from both SQL queries and Dataframe APIs.

Field name rules#

All bare column field names MUST not contain relation/table qualifier.
- Both SELECT t1.id, SELECT id and df.select_columns(&["id"]) SHOULD result in field name: id
All compound column field names MUST contain relation/table qualifier.
- SELECT foo + bar SHOULD result in field name: table.foo PLUS table.bar
Function names MUST be converted to lowercase.
- SELECT AVG(c1) SHOULD result in field name: avg(table.c1)
Literal string MUST not be wrapped with quotes or double quotes.
- SELECT 'foo' SHOULD result in field name: foo
Operator expressions MUST be wrapped with parentheses.
- SELECT -2 SHOULD result in field name: (- 2)
Operator and operand MUST be separated by spaces.
- SELECT 1+2 SHOULD result in field name: (1 + 2)
Function arguments MUST be separated by a comma , and a space.
- SELECT f(c1,c2) and df.select(vec![f.udf("f")?.call(vec![col("c1"), col("c2")])]) SHOULD result in field name: f(table.c1, table.c2)

Appendices#

Examples and comparison with other systems#

Data schema for test sample queries:

CREATE TABLE t1 (id INT, a VARCHAR(5));
INSERT INTO t1 (id, a) VALUES (1, 'foo');
INSERT INTO t1 (id, a) VALUES (2, 'bar');

CREATE TABLE t2 (id INT, b VARCHAR(5));
INSERT INTO t2 (id, b) VALUES (1, 'hello');
INSERT INTO t2 (id, b) VALUES (2, 'world');

Projected columns#

Query:

SELECT t1.id, a, t2.id, b
FROM t1
JOIN t2 ON t1.id = t2.id

DataFusion Arrow record batches output:

id	a	id	b
1	foo	1	hello
2	bar	2	world

Spark, MySQL 8 and PostgreSQL 13 output:

id	a	id	b
1	foo	1	hello
2	bar	2	world

SQLite 3 output:

id	a	b
1	foo	hello
2	bar	world

Function transformed columns#

Query:

SELECT ABS(t1.id), abs(-id) FROM t1;

DataFusion Arrow record batches output:

abs(t1.id)	abs((- t1.id))
1	1
2	2

Spark output:

abs(id)	abs((- id))
1	1
2	2

MySQL 8 output:

ABS(t1.id)	abs(-id)
1	1
2	2

PostgreSQL 13 output:

abs	abs
1	1
2	2

SQlite 3 output:

ABS(t1.id)	abs(-id)
1	1
2	2

Function with operators#

Query:

SELECT t1.id + ABS(id), ABS(id * t1.id) FROM t1;

DataFusion Arrow record batches output:

t1.id + abs(t1.id)	abs(t1.id * t1.id)
2	1
4	4

Spark output:

id + abs(id)	abs(id * id)
2	1
4	4

MySQL 8 output:

t1.id + ABS(id)	ABS(id * t1.id)
2	1
4	4

PostgreSQL output:

?column?	abs
2	1
4	4

SQLite output:

t1.id + ABS(id)	ABS(id * t1.id)
2	1
4	4

Project literals#

Query:

SELECT 1, 2+5, 'foo_bar';

DataFusion Arrow record batches output:

1	(2 + 5)	foo_bar
1	7	foo_bar

Spark output:

1	(2 + 5)	foo_bar
1	7	foo_bar

MySQL output:

1	2+5	foo_bar
1	7	foo_bar

PostgreSQL output:

?column?	?column?	?column?
1	7	foo_bar

SQLite 3 output:

1	2+5	‘foo_bar’
1	7	foo_bar