Python bindings
This is the documentation of the Python API of Apache Arrow. For more details on the Arrow format and other language bindings see the parent documentation.
The Arrow Python bindings (also named “PyArrow”) have first-class integration with NumPy, pandas, and built-in Python objects. They are based on the C++ implementation of Arrow.
Here we detail the usage of the Python API for Arrow and the leaf libraries that add functionality such as reading Apache Parquet files into Arrow structures.
- Installing PyArrow
- Memory and IO Interfaces
- Data Types and In-Memory Data Model
- Compute Functions
- Streaming, Serialization, and IPC
- Filesystem Interface
- Filesystem Interface (legacy)
- The Plasma In-Memory Object Store
- NumPy Integration
- Pandas Integration
- Timestamps
- Reading CSV files
- Feather File Format
- Reading JSON files
- Reading and Writing the Apache Parquet Format
  - Obtaining pyarrow with Parquet Support
  - Reading and Writing Single Files
  - Finer-grained Reading and Writing
  - Inspecting the Parquet File Metadata
  - Data Type Handling
  - Compression, Encoding, and File Compatibility
  - Partitioned Datasets (Multiple Files)
  - Writing to Partitioned Datasets
  - Reading from Partitioned Datasets
  - Using with Spark
  - Multithreaded Reads
  - Reading a Parquet File from Azure Blob storage
- Tabular Datasets
- CUDA Integration
- Extending pyarrow
- Using pyarrow from C++ and Cython Code
- API Reference
- Getting Involved
- Benchmarks