Apache Arrow
Apache Arrow is a development platform for in-memory analytics. It contains a set of technologies that enable big data systems to process and move data fast. It specifies a standardized language-independent columnar memory format for flat and hierarchical data, organized for efficient analytic operations on modern hardware.
The project is developing a multi-language collection of libraries for solving systems problems related to in-memory analytical data processing. This includes such topics as:
- Zero-copy shared memory and RPC-based data movement
- Reading and writing file formats (like CSV, Apache ORC, and Apache Parquet)
- In-memory analytics and query processing
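To make the columnar idea concrete, here is a minimal conceptual sketch in plain Python. It does not use any Arrow library and does not reproduce Arrow's actual memory format; the sample records and the names `rows`, `cities`, `temps`, `view`, and `tail` are invented for illustration. It only shows why storing each field in its own contiguous buffer suits analytic scans, and how a buffer can be shared without copying.

```python
from array import array

# Row-oriented layout: one Python tuple per record (hypothetical sample data).
rows = [("Oslo", 3.5), ("Lima", 19.0), ("Oslo", 4.5)]

# Column-oriented layout: one sequence per field. array("d") stores float64
# values back to back in a single contiguous buffer. This is an analogy for
# a fixed-width column, not Arrow's real layout.
cities = [city for city, _ in rows]
temps = array("d", (temp for _, temp in rows))

# An analytic operation reads only the column it needs.
mean_temp = sum(temps) / len(temps)

# memoryview exposes the same buffer without copying, so a slice is a
# zero-copy window onto the column rather than a new array.
view = memoryview(temps)
tail = view[1:]

# Writing through the original array is visible through the slice,
# which shows that both refer to the same memory.
temps[1] = 20.0
print(mean_temp, tail[0])
```

The Arrow libraries listed below provide this kind of layout as a standardized, language-independent format, so the same buffers can be handed between processes and languages without conversion.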
Specifications and Protocols
Libraries
- Implementation Status
- C/GLib
- C++
- C#
- Go
- Java
- JavaScript
- Julia
- MATLAB
- Python
  - Installing PyArrow
  - Memory and IO Interfaces
  - Data Types and In-Memory Data Model
  - Compute Functions
  - Streaming, Serialization, and IPC
  - Filesystem Interface
  - Filesystem Interface (legacy)
  - The Plasma In-Memory Object Store
  - NumPy Integration
  - Pandas Integration
  - Timestamps
  - Reading CSV files
  - Feather File Format
  - Reading JSON files
  - Reading and Writing the Apache Parquet Format
  - Tabular Datasets
  - CUDA Integration
  - Extending pyarrow
  - Using pyarrow from C++ and Cython Code
  - API Reference
  - Getting Involved
  - Benchmarks
- R
- Ruby
- Rust