Basic Operations
This section shows how to inspect a DataFrame: preview its rows, view its schema, convert it to pandas, and summarize its contents.
In [1]: from datafusion import SessionContext
In [2]: import random
In [3]: ctx = SessionContext()
In [4]: df = ctx.from_pydict({
...:     "nrs": [1, 2, 3, 4, 5],
...:     "names": ["python", "ruby", "java", "haskell", "go"],
...:     "random": random.sample(range(1000), 5),
...:     "groups": ["A", "A", "B", "C", "B"],
...: })
...:
In [5]: df
Out[5]:
DataFrame()
+-----+---------+--------+--------+
| nrs | names   | random | groups |
+-----+---------+--------+--------+
| 1   | python  | 889    | A      |
| 2   | ruby    | 991    | A      |
| 3   | java    | 672    | B      |
| 4   | haskell | 14     | C      |
| 5   | go      | 713    | B      |
+-----+---------+--------+--------+
Use limit() to view the top rows of the frame:
In [6]: df.limit(2)
Out[6]:
DataFrame()
+-----+--------+--------+--------+
| nrs | names  | random | groups |
+-----+--------+--------+--------+
| 1   | python | 889    | A      |
| 2   | ruby   | 991    | A      |
+-----+--------+--------+--------+
Display the columns of the DataFrame using schema():
In [7]: df.schema()
Out[7]:
nrs: int64
names: string
random: int64
groups: string
The to_pandas() method uses pyarrow to convert to a pandas DataFrame: it collects the record batches, passes them to an Arrow table, and then converts that table to a pandas DataFrame.
In [8]: df.to_pandas()
Out[8]:
   nrs    names  random groups
0    1   python     889      A
1    2     ruby     991      A
2    3     java     672      B
3    4  haskell      14      C
4    5       go     713      B
describe() shows a quick statistical summary of your data:
In [9]: df.describe()
Out[9]:
DataFrame()
+------------+--------------------+-------+--------------------+--------+
| describe   | nrs                | names | random             | groups |
+------------+--------------------+-------+--------------------+--------+
| count      | 5.0                | 5     | 5.0                | 5      |
| null_count | 0.0                | 0     | 0.0                | 0      |
| mean       | 3.0                | null  | 655.8              | null   |
| std        | 1.5811388300841898 | null  | 381.50452159836846 | null   |
| min        | 1.0                | go    | 14.0               | A      |
| max        | 5.0                | ruby  | 991.0              | C      |
| median     | 3.0                | null  | 713.0              | null   |
+------------+--------------------+-------+--------------------+--------+
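To make the rows of the summary concrete, the figures for the nrs column can be recomputed with Python's standard library. This is an independent cross-check rather than how describe() is implemented. The std value in the table is consistent with the sample standard deviation (dividing by n - 1), which is what statistics.stdev computes.

```python
import statistics

# Values of the "nrs" column from the example frame.
nrs = [1, 2, 3, 4, 5]

summary = {
    "count": len(nrs),
    "mean": statistics.mean(nrs),
    # Sample standard deviation (divides by n - 1).
    "std": statistics.stdev(nrs),
    "min": min(nrs),
    "max": max(nrs),
    "median": statistics.median(nrs),
}

for name, value in summary.items():
    print(f"{name}: {value}")
```

For the string columns, mean, std, and median are reported as null, while min and max follow string ordering, which is why names shows go and ruby.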