Reading JSON files
Arrow allows reading line-separated JSON files as Arrow tables. Each independent JSON object in the input file is converted to a row in the target Arrow table.
Basic usage
A JSON file is read from an arrow::io::InputStream.
```cpp
#include "arrow/json/api.h"

{
   // ...
   arrow::MemoryPool* pool = arrow::default_memory_pool();
   std::shared_ptr<arrow::io::InputStream> input = ...;

   auto read_options = arrow::json::ReadOptions::Defaults();
   auto parse_options = arrow::json::ParseOptions::Defaults();

   // Instantiate TableReader from input stream and options
   auto maybe_reader =
       arrow::json::TableReader::Make(pool, input, read_options, parse_options);
   if (!maybe_reader.ok()) {
      // Handle TableReader instantiation error...
   }
   std::shared_ptr<arrow::json::TableReader> reader = *maybe_reader;

   // Read table from JSON file
   auto maybe_table = reader->Read();
   if (!maybe_table.ok()) {
      // Handle JSON read error
      // (for example a JSON syntax error or failed type conversion)
   }
   std::shared_ptr<arrow::Table> table = *maybe_table;
}
```
Data types
Since JSON values are typed, the possible Arrow data types on output depend on the input value types. Top-level JSON values should always be objects. The fields of top-level objects are taken to represent columns in the Arrow data. For each name/value pair in a JSON object, there are two possible modes of deciding the output data type:
- if the name is in ParseOptions::explicit_schema, conversion of the JSON value to the corresponding Arrow data type is attempted;
- otherwise, the Arrow data type is determined via type inference on the JSON value, trying out a number of Arrow data types in order.
The following tables show the possible combinations for each of those two modes.
Explicit conversions:

| JSON value type | Allowed Arrow data types |
|---|---|
| Null | Any (including Null) |
| Number | All Integer types, Float32, Float64, Date32, Date64, Time32, Time64 |
| Boolean | Boolean |
| String | Binary, LargeBinary, String, LargeString, Timestamp |
| Array | List |
| Object (nested) | Struct |
Implicit type inference:

| JSON value type | Inferred Arrow data types (in order) |
|---|---|
| Null | Null, any other |
| Number | Int64, Float64 |
| Boolean | Boolean |
| String | Timestamp (with seconds unit), String |
| Array | List |
| Object (nested) | Struct |