Usage

Run datafusion-cli --help to see the current usage:

Apache Arrow <dev@arrow.apache.org>
Command Line Client for DataFusion query engine.

USAGE:
    datafusion-cli [OPTIONS]

OPTIONS:
    -b, --batch-size <BATCH_SIZE>
            The batch size of each query, or use DataFusion default

    -c, --command <COMMAND>...
            Execute the given command string(s), then exit

        --color
            Enables console syntax highlighting

    -f, --file <FILE>...
            Execute commands from file(s), then exit

        --format <FORMAT>
            [default: table] [possible values: csv, tsv, table, json, nd-json]

    -h, --help
            Print help information

    -m, --memory-limit <MEMORY_LIMIT>
            The memory pool limitation (e.g. '10g'), default to None (no limit)

        --maxrows <MAXROWS>
            The max number of rows to display for 'Table' format
            [possible values: numbers(0/10/...), inf(no limit)] [default: 40]

        --mem-pool-type <MEM_POOL_TYPE>
            Specify the memory pool type 'greedy' or 'fair', default to 'greedy'

        --top-memory-consumers <TOP_MEMORY_CONSUMERS>
            The number of top memory consumers to display when query fails due to memory exhaustion. To disable memory consumer tracking, set this value to 0 [default: 3]

    -d, --disk-limit <DISK_LIMIT>
            Available disk space for spilling queries (e.g. '10g'), default to None (uses DataFusion's default value of '100g')

    -p, --data-path <DATA_PATH>
            Path to your data, default to current directory

    -q, --quiet
            Reduce printing other than the results and work quietly

    -r, --rc <RC>...
            Run the provided files on startup instead of ~/.datafusionrc

    -V, --version
            Print version information
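As a minimal sketch of how these options combine, the snippet below runs a single statement non-interactively and prints the result as CSV. It uses only flags listed in the help text above (--quiet, --format, --command). The helper name run_query_csv and the sample query are illustrative assumptions, not part of datafusion-cli.

```shell
# Hypothetical helper: run one SQL statement and print the result as CSV.
# --command accepts several values, so it is placed last on the command line.
run_query_csv() {
    datafusion-cli --quiet --format csv --command "$1"
}

# Only attempt the query when the binary is actually installed.
if command -v datafusion-cli >/dev/null 2>&1; then
    run_query_csv "SELECT 1 AS one;"
fi
```

Because --command makes the CLI exit after the statement runs, a wrapper like this is suited to scripts rather than interactive sessions.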

Commands

Available commands inside DataFusion CLI are:

  • Quit

> \q

  • Help

> \?

  • ListTables

> \d

  • DescribeTable

> \d table_name

  • QuietMode

> \quiet [true|false]

  • List functions

> \h

  • Search and describe function

> \h function

Supported SQL

In addition to the normal SQL supported in DataFusion, datafusion-cli supports the following statements and commands:

SHOW ALL [VERBOSE]

Show configuration options

> show all;

+-------------------------------------------------+---------+
| name                                            | value   |
+-------------------------------------------------+---------+
| datafusion.execution.batch_size                 | 8192    |
| datafusion.execution.coalesce_batches           | true    |
| datafusion.execution.time_zone                  | UTC     |
| datafusion.explain.logical_plan_only            | false   |
| datafusion.explain.physical_plan_only           | false   |
| datafusion.optimizer.filter_null_join_keys      | false   |
| datafusion.optimizer.skip_failed_rules          | true    |
+-------------------------------------------------+---------+

SHOW <OPTION>

Show a specific configuration option

> show datafusion.execution.batch_size;

+-------------------------------------------------+---------+
| name                                            | value   |
+-------------------------------------------------+---------+
| datafusion.execution.batch_size                 | 8192    |
+-------------------------------------------------+---------+

SET <OPTION> TO <VALUE>

Set configuration options

> SET datafusion.execution.batch_size to 1024;

Configuration Options

All available configuration options can be seen using SHOW ALL as described above.

You can change the configuration options using environment variables. For each option, datafusion-cli looks for an environment variable whose name is the option name in upper case with every . replaced by _.

For example, to set datafusion.execution.batch_size to 1024 you would set the DATAFUSION_EXECUTION_BATCH_SIZE environment variable appropriately:

$ DATAFUSION_EXECUTION_BATCH_SIZE=1024 datafusion-cli
DataFusion CLI v12.0.0
> show all;
+-------------------------------------------------+---------+
| name                                            | value   |
+-------------------------------------------------+---------+
| datafusion.execution.batch_size                 | 1024    |
| datafusion.execution.coalesce_batches           | true    |
| datafusion.execution.time_zone                  | UTC     |
| datafusion.explain.logical_plan_only            | false   |
| datafusion.explain.physical_plan_only           | false   |
| datafusion.optimizer.filter_null_join_keys      | false   |
| datafusion.optimizer.skip_failed_rules          | true    |
+-------------------------------------------------+---------+
8 rows in set. Query took 0.002 seconds.
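The naming rule described above (upper-case the option name and replace every . with _) can be sketched as a small shell helper. The function name option_to_env is a hypothetical helper for illustration and is not provided by datafusion-cli.

```shell
# Hypothetical helper: derive the environment variable name for a
# configuration option by upper-casing it and replacing '.' with '_'.
option_to_env() {
    printf '%s\n' "$1" | tr '[:lower:]' '[:upper:]' | tr '.' '_'
}

# Print the variable name for the batch size option.
option_to_env datafusion.execution.batch_size
```

A helper like this can be handy when generating environment settings for several options from a script.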

You can also change configuration options using the SET statement:

$ datafusion-cli
DataFusion CLI v13.0.0
> show datafusion.execution.batch_size;
+---------------------------------+---------+
| name                            | value   |
+---------------------------------+---------+
| datafusion.execution.batch_size | 8192    |
+---------------------------------+---------+
1 row in set. Query took 0.011 seconds.

> set datafusion.execution.batch_size to 1024;
0 rows in set. Query took 0.000 seconds.

> show datafusion.execution.batch_size;
+---------------------------------+---------+
| name                            | value   |
+---------------------------------+---------+
| datafusion.execution.batch_size | 1024    |
+---------------------------------+---------+
1 row in set. Query took 0.005 seconds.

Functions

datafusion-cli comes with built-in functions that are not included in the DataFusion SQL engine. See the DataFusion CLI specific functions section for details.