NumPy Integration
PyArrow supports conversion in both directions between NumPy arrays and Arrow Arrays.
NumPy to Arrow
To convert a NumPy array to Arrow, one can simply call the pyarrow.array()
factory function.
>>> import numpy as np
>>> import pyarrow as pa
>>> data = np.arange(10, dtype='int16')
>>> arr = pa.array(data)
>>> arr
<pyarrow.lib.Int16Array object at 0x7fb1d1e6ae58>
[
0,
1,
2,
3,
4,
5,
6,
7,
8,
9
]
Conversion from NumPy supports a wide range of input dtypes, including structured dtypes and strings.
Arrow to NumPy
In the reverse direction, the to_numpy() method produces a view of an Arrow
Array for use with NumPy.
This is limited to primitive types for which NumPy has the same physical
representation as Arrow, and requires that the Arrow data contain no nulls.
>>> import numpy as np
>>> import pyarrow as pa
>>> arr = pa.array([4, 5, 6], type=pa.int32())
>>> view = arr.to_numpy()
>>> view
array([4, 5, 6], dtype=int32)
For more complex data types, you have to use the to_pandas()
method (which will construct a NumPy array with pandas semantics for, e.g.,
representation of null values).