Introduction

Welcome to the User Guide for the Python bindings of Arrow DataFusion. This guide aims to provide an introduction to DataFusion through various examples and highlight the most effective ways of using it.

Installation

DataFusion is a Python library and, as such, can be installed via pip from PyPI.

pip install datafusion

You can verify the installation by running:

In [1]: import datafusion

In [2]: datafusion.__version__
Out[2]: '44.0.0'
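
If you want to run the same check from a script rather than an interactive session, the sketch below is one way to do it. It relies only on the standard library's importlib.metadata; the helper name installed_version is hypothetical and not part of DataFusion.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed.

    Hypothetical helper for illustration; it is not part of DataFusion.
    """
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


version = installed_version("datafusion")
if version is None:
    print("datafusion is not installed; run: pip install datafusion")
else:
    print(f"datafusion {version} is installed")
```

The version string printed depends on the release you have installed, so it may differ from the one shown above.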

In this documentation we will also show some examples of how DataFusion integrates with Jupyter notebooks. To install JupyterLab and start a session, run:

pip install jupyterlab
jupyter lab

To demonstrate working with DataFusion, we need a data source. Later sections of the tutorial cover the available options for data sources. This first example uses a Pokemon dataset that you can download here.

With that file saved as pokemon.csv in your working directory, you can use the following Python example to view the DataFrame in DataFusion.

In [3]: from datafusion import SessionContext

In [4]: ctx = SessionContext()

In [5]: df = ctx.read_csv("pokemon.csv")

In [6]: df.show()
DataFrame()
+----+---------------------------+--------+--------+-------+----+--------+---------+---------+---------+-------+------------+-----------+
| #  | Name                      | Type 1 | Type 2 | Total | HP | Attack | Defense | Sp. Atk | Sp. Def | Speed | Generation | Legendary |
+----+---------------------------+--------+--------+-------+----+--------+---------+---------+---------+-------+------------+-----------+
| 1  | Bulbasaur                 | Grass  | Poison | 318   | 45 | 49     | 49      | 65      | 65      | 45    | 1          | false     |
| 2  | Ivysaur                   | Grass  | Poison | 405   | 60 | 62     | 63      | 80      | 80      | 60    | 1          | false     |
| 3  | Venusaur                  | Grass  | Poison | 525   | 80 | 82     | 83      | 100     | 100     | 80    | 1          | false     |
| 3  | VenusaurMega Venusaur     | Grass  | Poison | 625   | 80 | 100    | 123     | 122     | 120     | 80    | 1          | false     |
| 4  | Charmander                | Fire   |        | 309   | 39 | 52     | 43      | 60      | 50      | 65    | 1          | false     |
| 5  | Charmeleon                | Fire   |        | 405   | 58 | 64     | 58      | 80      | 65      | 80    | 1          | false     |
| 6  | Charizard                 | Fire   | Flying | 534   | 78 | 84     | 78      | 109     | 85      | 100   | 1          | false     |
| 6  | CharizardMega Charizard X | Fire   | Dragon | 634   | 78 | 130    | 111     | 130     | 85      | 100   | 1          | false     |
| 6  | CharizardMega Charizard Y | Fire   | Flying | 634   | 78 | 104    | 78      | 159     | 115     | 100   | 1          | false     |
| 7  | Squirtle                  | Water  |        | 314   | 44 | 48     | 65      | 50      | 64      | 43    | 1          | false     |
| 8  | Wartortle                 | Water  |        | 405   | 59 | 63     | 80      | 65      | 80      | 58    | 1          | false     |
| 9  | Blastoise                 | Water  |        | 530   | 79 | 83     | 100     | 85      | 105     | 78    | 1          | false     |
| 9  | BlastoiseMega Blastoise   | Water  |        | 630   | 79 | 103    | 120     | 135     | 115     | 78    | 1          | false     |
| 10 | Caterpie                  | Bug    |        | 195   | 45 | 30     | 35      | 20      | 20      | 45    | 1          | false     |
| 11 | Metapod                   | Bug    |        | 205   | 50 | 20     | 55      | 25      | 25      | 30    | 1          | false     |
| 12 | Butterfree                | Bug    | Flying | 395   | 60 | 45     | 50      | 90      | 80      | 70    | 1          | false     |
| 13 | Weedle                    | Bug    | Poison | 195   | 40 | 35     | 30      | 20      | 20      | 50    | 1          | false     |
| 14 | Kakuna                    | Bug    | Poison | 205   | 45 | 25     | 50      | 25      | 25      | 35    | 1          | false     |
| 15 | Beedrill                  | Bug    | Poison | 395   | 65 | 90     | 40      | 45      | 80      | 75    | 1          | false     |
| 15 | BeedrillMega Beedrill     | Bug    | Poison | 495   | 65 | 150    | 40      | 15      | 80      | 145   | 1          | false     |
+----+---------------------------+--------+--------+-------+----+--------+---------+---------+---------+-------+------------+-----------+

If you are working in a Jupyter notebook, you can also call display on the DataFrame to render it as a table, which may be easier to read.

display(df)
[Image: rendered table showing the Pokemon DataFrame]