Basic Operations
In this section, you will learn how to inspect a DataFrame: preview its rows, view its schema, convert it to pandas, and compute summary statistics.
In [1]: from datafusion import SessionContext
In [2]: import random
In [3]: ctx = SessionContext()
In [4]: df = ctx.from_pydict({
   ...:     "nrs": [1, 2, 3, 4, 5],
   ...:     "names": ["python", "ruby", "java", "haskell", "go"],
   ...:     "random": random.sample(range(1000), 5),
   ...:     "groups": ["A", "A", "B", "C", "B"],
   ...: })
   ...:
In [5]: df
Out[5]:
DataFrame()
+-----+---------+--------+--------+
| nrs | names   | random | groups |
+-----+---------+--------+--------+
| 1   | python  | 97     | A      |
| 2   | ruby    | 46     | A      |
| 3   | java    | 344    | B      |
| 4   | haskell | 855    | C      |
| 5   | go      | 11     | B      |
+-----+---------+--------+--------+
Use limit() to view the first rows of the DataFrame:
In [6]: df.limit(2)
Out[6]:
DataFrame()
+-----+--------+--------+--------+
| nrs | names  | random | groups |
+-----+--------+--------+--------+
| 1   | python | 97     | A      |
| 2   | ruby   | 46     | A      |
+-----+--------+--------+--------+
Use schema() to display the column names and types of the DataFrame:
In [7]: df.schema()
Out[7]:
nrs: int64
names: string
random: int64
groups: string
The to_pandas() method uses pyarrow to convert the result to a pandas DataFrame: it collects the record batches, assembles them into an Arrow table, and then converts that table to a pandas DataFrame.
In [8]: df.to_pandas()
Out[8]:
   nrs    names  random groups
0    1   python      97      A
1    2     ruby      46      A
2    3     java     344      B
3    4  haskell     855      C
4    5       go      11      B
describe() shows a quick statistical summary of your data:
In [9]: df.describe()
Out[9]:
DataFrame()
+------------+--------------------+-------+--------------------+--------+
| describe   | nrs                | names | random             | groups |
+------------+--------------------+-------+--------------------+--------+
| count      | 5.0                | 5     | 5.0                | 5      |
| null_count | 0.0                | 0     | 0.0                | 0      |
| mean       | 3.0                | null  | 270.6              | null   |
| std        | 1.5811388300841898 | null  | 351.74038721761826 | null   |
| min        | 1.0                | go    | 11.0               | A      |
| max        | 5.0                | ruby  | 855.0              | C      |
| median     | 3.0                | null  | 97.0               | null   |
+------------+--------------------+-------+--------------------+--------+
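To make the numeric rows of the summary concrete, the sketch below recomputes them for the two numeric columns using only Python's standard library. Judging from the values in the output above, the std row corresponds to the sample standard deviation (dividing by n - 1), which is what statistics.stdev computes. The helper `summarize` is hypothetical and not part of datafusion.

```python
import statistics

# Values copied from the sample output above.
nrs = [1, 2, 3, 4, 5]
random_col = [97, 46, 344, 855, 11]


def summarize(values):
    """Recompute the numeric rows of the summary for one column."""
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "std": statistics.stdev(values),  # sample standard deviation (n - 1)
        "min": min(values),
        "max": max(values),
        "median": statistics.median(values),
    }


nrs_summary = summarize(nrs)
random_summary = summarize(random_col)
print(nrs_summary)
print(random_summary)
```

For the string columns, the summary above reports null for mean, std, and median, while min and max are the smallest and largest strings ("go" and "ruby" for names).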