Reading JSON files
Arrow allows reading line-separated JSON files as Arrow tables. Each independent JSON object in the input file is converted to a row in the target Arrow table.
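To make the input format concrete, here is a minimal sketch using only the standard library. `SplitJsonLines` and `kSampleInput` are hypothetical names invented for this illustration and are not part of Arrow; Arrow's own reader splits and parses the input differently. The sketch only shows the one-object-per-row model described above.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (not part of Arrow): splits line-separated JSON text
// into one string per JSON object, mirroring the one-object-per-row model.
std::vector<std::string> SplitJsonLines(const std::string& text) {
  std::vector<std::string> rows;
  std::istringstream stream(text);
  std::string line;
  while (std::getline(stream, line)) {
    if (!line.empty()) rows.push_back(line);  // blank lines are skipped in this sketch
  }
  return rows;
}

// Three independent JSON objects, one per line. Under the model above they
// would map to three rows with the columns "id" and "name".
const char kSampleInput[] =
    "{\"id\": 1, \"name\": \"a\"}\n"
    "{\"id\": 2, \"name\": \"b\"}\n"
    "{\"id\": 3, \"name\": null}\n";
```

Calling `SplitJsonLines(kSampleInput)` yields one string per object, which corresponds to the number of rows expected in the resulting table.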
Basic usage
A JSON file is read from an `InputStream`.
```cpp
#include "arrow/json/api.h"

{
   // ...
   arrow::Status st;
   arrow::MemoryPool* pool = arrow::default_memory_pool();
   std::shared_ptr<arrow::io::InputStream> input = ...;

   auto read_options = arrow::json::ReadOptions::Defaults();
   auto parse_options = arrow::json::ParseOptions::Defaults();

   // Instantiate TableReader from input stream and options
   std::shared_ptr<arrow::json::TableReader> reader;
   st = arrow::json::TableReader::Make(pool, input, read_options,
                                       parse_options, &reader);
   if (!st.ok()) {
      // Handle TableReader instantiation error...
   }

   std::shared_ptr<arrow::Table> table;
   // Read table from JSON file
   st = reader->Read(&table);
   if (!st.ok()) {
      // Handle JSON read error
      // (for example a JSON syntax error or failed type conversion)
   }
}
```
Data types
Since JSON values are typed, the possible Arrow data types on output depend on the input value types. Top-level JSON values should always be objects. The fields of top-level objects are taken to represent columns in the Arrow data. For each name/value pair in a JSON object, there are two possible modes of deciding the output data type:
- if the name is in `ParseOptions::explicit_schema`, conversion of the JSON value to the corresponding Arrow data type is attempted;
- otherwise, the Arrow data type is determined via type inference on the JSON value, trying out a number of Arrow data types in order.
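The choice between the two modes can be sketched as a simple lookup. This is an illustration only: `ExplicitSchema`, `Mode` and `ChooseMode` are hypothetical names, and the schema is modeled here as a plain map from field name to a type name rather than as a real Arrow schema object.

```cpp
#include <map>
#include <string>

// Hypothetical stand-in for an explicit schema: field name -> Arrow type name.
using ExplicitSchema = std::map<std::string, std::string>;

enum class Mode { kExplicitConversion, kTypeInference };

// Chooses the mode for one name/value pair: fields named in the explicit
// schema are converted to the requested type, all others are inferred.
Mode ChooseMode(const ExplicitSchema& schema, const std::string& field_name) {
  return schema.count(field_name) > 0 ? Mode::kExplicitConversion
                                      : Mode::kTypeInference;
}
```

With a schema that only names `id`, a field called `id` goes through explicit conversion while a field called `name` falls back to type inference.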
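The following tables show the possible combinations for each of those two modes: the first lists the conversions allowed with an explicit schema, the second lists the types tried during type inference.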
JSON value type | Allowed Arrow data types |
---|---|
Null | Any (including Null) |
Number | All Integer types, Float32, Float64, Date32, Date64, Time32, Time64 |
Boolean | Boolean |
String | Binary, LargeBinary, String, LargeString, Timestamp |
Array | List |
Object (nested) | Struct |
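The explicit-conversion table can be expressed as a small lookup, which may be handy for validating a schema before handing it to the reader. This is a sketch under stated assumptions: `JsonKind` and `IsAllowedConversion` are hypothetical names, Arrow types are represented by their names as plain strings, and "all Integer types" is taken to mean the signed and unsigned 8- to 64-bit integers.

```cpp
#include <set>
#include <string>

// Hypothetical classification of JSON values (not an Arrow type).
enum class JsonKind { kNull, kNumber, kBoolean, kString, kArray, kObject };

// Returns true if the table above allows converting a JSON value of the
// given kind to the named Arrow data type.
bool IsAllowedConversion(JsonKind kind, const std::string& arrow_type) {
  static const std::set<std::string> kNumberTargets = {
      "Int8",   "Int16",  "Int32",   "Int64",   "UInt8",  "UInt16", "UInt32",
      "UInt64", "Float32", "Float64", "Date32", "Date64", "Time32", "Time64"};
  static const std::set<std::string> kStringTargets = {
      "Binary", "LargeBinary", "String", "LargeString", "Timestamp"};

  switch (kind) {
    case JsonKind::kNull:
      return true;  // a JSON null is accepted for any Arrow type
    case JsonKind::kNumber:
      return kNumberTargets.count(arrow_type) > 0;
    case JsonKind::kBoolean:
      return arrow_type == "Boolean";
    case JsonKind::kString:
      return kStringTargets.count(arrow_type) > 0;
    case JsonKind::kArray:
      return arrow_type == "List";
    case JsonKind::kObject:
      return arrow_type == "Struct";
  }
  return false;
}
```

For example, a JSON number may target `Float32`, whereas a JSON string may not target `Int64`.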
JSON value type | Inferred Arrow data types (in order) |
---|---|
Null | Null, any other |
Number | Int64, Float64 |
Boolean | Boolean |
String | Timestamp (with seconds unit), String |
Array | List |
Object (nested) | Struct |
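The "in order" rule for scalar values can be illustrated with a small standard-library sketch: each candidate type is tried in turn and the first one that fits is used. `InferNumberType`, `LooksLikeTimestamp` and `InferStringType` are hypothetical helpers, not Arrow functions. The timestamp check is a deliberate simplification that only accepts the `YYYY-MM-DD hh:mm:ss` shape (with a space or `T` separator); the formats Arrow actually recognizes may differ, and the handling of integers that overflow 64 bits is this sketch's own choice.

```cpp
#include <cctype>
#include <cerrno>
#include <cstddef>
#include <cstdlib>
#include <string>

// Number: try Int64 first, fall back to Float64.
std::string InferNumberType(const std::string& token) {
  errno = 0;
  char* end = nullptr;
  std::strtoll(token.c_str(), &end, 10);
  // The token is an integer only if strtoll consumed every character.
  const bool consumed_all =
      !token.empty() && end == token.c_str() + token.size();
  // Assumption of this sketch: out-of-range integers fall back to Float64.
  return (consumed_all && errno != ERANGE) ? "Int64" : "Float64";
}

// Simplified timestamp check: "dddd-dd-dd?dd:dd:dd", where 'd' is a digit
// and '?' is either a space or 'T'.
bool LooksLikeTimestamp(const std::string& s) {
  static const std::string kPattern = "dddd-dd-dd?dd:dd:dd";
  if (s.size() != kPattern.size()) return false;
  for (std::size_t i = 0; i < s.size(); ++i) {
    const char p = kPattern[i];
    if (p == 'd') {
      if (!std::isdigit(static_cast<unsigned char>(s[i]))) return false;
    } else if (p == '?') {
      if (s[i] != ' ' && s[i] != 'T') return false;
    } else if (s[i] != p) {
      return false;
    }
  }
  return true;
}

// String: try Timestamp (seconds unit) first, fall back to String.
std::string InferStringType(const std::string& value) {
  return LooksLikeTimestamp(value) ? "Timestamp[s]" : "String";
}
```

In this sketch, `42` is classified as Int64 and `4.5` as Float64, while a string is treated as a timestamp only when it matches the simplified pattern.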