Data Types¶
- 
enum 
arrow::Type::type¶ Main data type enumeration.
This enumeration provides a quick way to interrogate the category of a DataType instance.
Values:
- 
enumerator 
NA¶ A NULL type having no physical storage.
- 
enumerator 
BOOL¶ Boolean as 1 bit, LSB bit-packed ordering.
- 
enumerator 
UINT8¶ Unsigned 8-bit little-endian integer.
- 
enumerator 
INT8¶ Signed 8-bit little-endian integer.
- 
enumerator 
UINT16¶ Unsigned 16-bit little-endian integer.
- 
enumerator 
INT16¶ Signed 16-bit little-endian integer.
- 
enumerator 
UINT32¶ Unsigned 32-bit little-endian integer.
- 
enumerator 
INT32¶ Signed 32-bit little-endian integer.
- 
enumerator 
UINT64¶ Unsigned 64-bit little-endian integer.
- 
enumerator 
INT64¶ Signed 64-bit little-endian integer.
- 
enumerator 
HALF_FLOAT¶ 2-byte floating point value
- 
enumerator 
FLOAT¶ 4-byte floating point value
- 
enumerator 
DOUBLE¶ 8-byte floating point value
- 
enumerator 
STRING¶ UTF8 variable-length string as List<Char>
- 
enumerator 
BINARY¶ Variable-length bytes (no guarantee of UTF8-ness)
- 
enumerator 
FIXED_SIZE_BINARY¶ Fixed-size binary. Each value occupies the same number of bytes.
- 
enumerator 
DATE32¶ int32_t days since the UNIX epoch
- 
enumerator 
DATE64¶ int64_t milliseconds since the UNIX epoch
- 
enumerator 
TIMESTAMP¶ Exact timestamp encoded with int64 since UNIX epoch. Default unit millisecond.
- 
enumerator 
TIME32¶ Time as signed 32-bit integer, representing either seconds or milliseconds since midnight.
- 
enumerator 
TIME64¶ Time as signed 64-bit integer, representing either microseconds or nanoseconds since midnight.
- 
enumerator 
INTERVAL_MONTHS¶ YEAR_MONTH interval in SQL style.
- 
enumerator 
INTERVAL_DAY_TIME¶ DAY_TIME interval in SQL style.
- 
enumerator 
DECIMAL128¶ Precision- and scale-based decimal type with 128 bits.
- 
enumerator 
DECIMAL¶ Defined for backward-compatibility.
- 
enumerator 
DECIMAL256¶ Precision- and scale-based decimal type with 256 bits.
- 
enumerator 
LIST¶ A list of some logical data type.
- 
enumerator 
STRUCT¶ Struct of logical types.
- 
enumerator 
SPARSE_UNION¶ Sparse unions of logical types.
- 
enumerator 
DENSE_UNION¶ Dense unions of logical types.
- 
enumerator 
DICTIONARY¶ Dictionary-encoded type, also called “categorical” or “factor” in other programming languages.
Holds the dictionary value type but not the dictionary itself, which is part of the ArrayData struct.
- 
enumerator 
MAP¶ Map, a repeated struct logical type.
- 
enumerator 
EXTENSION¶ Custom data type, implemented by user.
- 
enumerator 
FIXED_SIZE_LIST¶ Fixed size list of some logical type.
- 
enumerator 
DURATION¶ Measure of elapsed time in either seconds, milliseconds, microseconds or nanoseconds.
- 
enumerator 
LARGE_STRING¶ Like STRING, but with 64-bit offsets.
- 
enumerator 
LARGE_BINARY¶ Like BINARY, but with 64-bit offsets.
- 
enumerator 
LARGE_LIST¶ Like LIST, but with 64-bit offsets.
- 
enumerator 
MAX_ID¶ 
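The BOOL entry above describes booleans as 1 bit each in LSB bit-packed ordering. The following is a minimal standalone sketch of that packing scheme, not Arrow code; the helper names GetPackedBit, SetPackedBit and PackBools are hypothetical and chosen only for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Standalone model of LSB-first bit packing (hypothetical helpers, not Arrow API).
// Value i lives in byte i / 8, at bit position i % 8 counted from the
// least significant bit.
inline bool GetPackedBit(const std::vector<uint8_t>& bits, std::size_t i) {
  return ((bits[i / 8] >> (i % 8)) & 1) != 0;
}

inline void SetPackedBit(std::vector<uint8_t>& bits, std::size_t i, bool value) {
  const uint8_t mask = static_cast<uint8_t>(1u << (i % 8));
  if (value) {
    bits[i / 8] = static_cast<uint8_t>(bits[i / 8] | mask);
  } else {
    bits[i / 8] = static_cast<uint8_t>(bits[i / 8] & ~mask);
  }
}

// Pack a sequence of booleans into ceil(n / 8) bytes, all padding bits zero.
inline std::vector<uint8_t> PackBools(const std::vector<bool>& values) {
  std::vector<uint8_t> bits((values.size() + 7) / 8, 0);
  for (std::size_t i = 0; i < values.size(); ++i) {
    SetPackedBit(bits, i, values[i]);
  }
  return bits;
}
```

With this layout, ten booleans occupy two bytes, and the tenth value (index 9) is stored in the second-lowest bit of the second byte.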
- 
class 
arrow::DataType: public arrow::detail::Fingerprintable¶ Base class for all data types.
Data types in this library are all logical. They can be expressed as either a primitive physical type (bytes or bits of some fixed size), a nested type consisting of other data types, or another data type (e.g. a timestamp encoded as an int64).
Simple datatypes may be entirely described by their Type::type id, but complex datatypes are usually parametric.
Subclassed by arrow::BaseBinaryType, arrow::ExtensionType, arrow::FixedWidthType, arrow::NestedType, arrow::NullType
Public Functions
- 
bool 
Equals(const DataType &other, bool check_metadata = false) const¶ Return whether the types are equal.
Types that are logically convertible from one to another (e.g. List<UInt8> and Binary) are NOT equal.
- 
const std::vector<std::shared_ptr<Field>> &
fields() const¶ Returns the children fields associated with this type.
- 
int 
num_fields() const¶ Returns the number of children fields associated with this type.
- 
std::string 
ToString() const = 0¶ A string representation of the type, including any children.
- 
size_t 
Hash() const¶ Return hash value (excluding metadata in child fields)
- 
std::string 
name() const = 0¶ A string name of the type, omitting any child fields.
- Note
 Experimental API
- Since
 0.7.0
- 
DataTypeLayout 
layout() const = 0¶ Return the data type layout.
Children are not included.
- Note
 Experimental API
Factory functions¶
These functions are recommended for creating data types. They may return new objects or existing singletons, depending on the type requested.
- 
std::shared_ptr<DataType> 
boolean()¶ Return a BooleanType instance.
- 
std::shared_ptr<DataType> 
uint16()¶ Return a UInt16Type instance.
- 
std::shared_ptr<DataType> 
uint32()¶ Return a UInt32Type instance.
- 
std::shared_ptr<DataType> 
uint64()¶ Return a UInt64Type instance.
- 
std::shared_ptr<DataType> 
float16()¶ Return a HalfFloatType instance.
- 
std::shared_ptr<DataType> 
float64()¶ Return a DoubleType instance.
- 
std::shared_ptr<DataType> 
utf8()¶ Return a StringType instance.
- 
std::shared_ptr<DataType> 
large_utf8()¶ Return a LargeStringType instance.
- 
std::shared_ptr<DataType> 
binary()¶ Return a BinaryType instance.
- 
std::shared_ptr<DataType> 
large_binary()¶ Return a LargeBinaryType instance.
- 
std::shared_ptr<DataType> 
date32()¶ Return a Date32Type instance.
- 
std::shared_ptr<DataType> 
date64()¶ Return a Date64Type instance.
- 
std::shared_ptr<DataType> 
fixed_size_binary(int32_t byte_width)¶ Create a FixedSizeBinaryType instance.
- 
std::shared_ptr<DataType> 
decimal(int32_t precision, int32_t scale)¶ Create a Decimal128Type or Decimal256Type instance depending on the precision.
- 
std::shared_ptr<DataType> 
decimal128(int32_t precision, int32_t scale)¶ Create a Decimal128Type instance.
- 
std::shared_ptr<DataType> 
decimal256(int32_t precision, int32_t scale)¶ Create a Decimal256Type instance.
Create a LargeListType instance from its child Field type.
Create a LargeListType instance from its child DataType.
Create a MapType instance from its key and value DataTypes.
Create a MapType instance from its key DataType and value field.
The field override is provided to communicate nullability of the value.
Create a FixedSizeListType instance from its child Field type.
Create a FixedSizeListType instance from its child DataType.
- 
std::shared_ptr<DataType> 
duration(TimeUnit::type unit)¶ Return a DurationType instance for the given unit.
- 
std::shared_ptr<DataType> 
day_time_interval()¶ Return a DayTimeIntervalType instance.
- 
std::shared_ptr<DataType> 
month_interval()¶ Return a MonthIntervalType instance.
- 
std::shared_ptr<DataType> 
timestamp(TimeUnit::type unit)¶ Create a TimestampType instance from its unit.
- 
std::shared_ptr<DataType> 
timestamp(TimeUnit::type unit, const std::string &timezone)¶ Create a TimestampType instance from its unit and timezone.
- 
std::shared_ptr<DataType> 
time32(TimeUnit::type unit)¶ Create a 32-bit time type instance.
Unit can be either SECOND or MILLI
- 
std::shared_ptr<DataType> 
time64(TimeUnit::type unit)¶ Create a 64-bit time type instance.
Unit can be either MICRO or NANO
Create a StructType instance.
- 
std::shared_ptr<DataType> 
sparse_union(FieldVector child_fields, std::vector<int8_t> type_codes = {})¶ Create a SparseUnionType instance.
- 
std::shared_ptr<DataType> 
dense_union(FieldVector child_fields, std::vector<int8_t> type_codes = {})¶ Create a DenseUnionType instance.
- 
std::shared_ptr<DataType> 
sparse_union(const ArrayVector &children, std::vector<std::string> field_names = {}, std::vector<int8_t> type_codes = {})¶ Create a SparseUnionType instance.
- 
std::shared_ptr<DataType> 
dense_union(const ArrayVector &children, std::vector<std::string> field_names = {}, std::vector<int8_t> type_codes = {})¶ Create a DenseUnionType instance.
Create a UnionType instance.
Create a DictionaryType instance.
- Parameters
 [in] index_type: the type of the dictionary indices (must be a signed integer)
 [in] dict_type: the type of the values in the variable dictionary
 [in] ordered: true if the order of the dictionary values has semantic meaning and should be preserved where possible
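The date32() and date64() factories in this section produce types that count days and milliseconds since the UNIX epoch, respectively. As a rough illustration of how the two encodings relate, here is a standalone sketch; the function names are hypothetical, and rounding toward negative infinity for pre-1970 values is a choice made for this example rather than documented library behavior.

```cpp
#include <cstdint>

// Number of milliseconds in one day: 24 * 60 * 60 * 1000.
const int64_t kMillisPerDay = 86400000;

// Days since epoch -> milliseconds since epoch (hypothetical helper).
inline int64_t Date32ToDate64(int32_t days) {
  return static_cast<int64_t>(days) * kMillisPerDay;
}

// Milliseconds since epoch -> days since epoch, using floor division so
// that instants before 1970-01-01 map to the day that contains them.
inline int32_t Date64ToDate32(int64_t millis) {
  int64_t days = millis / kMillisPerDay;
  if (millis % kMillisPerDay < 0) {
    --days;
  }
  return static_cast<int32_t>(days);
}
```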
Concrete type subclasses¶
Primitive¶
- 
class 
arrow::NullType: public arrow::DataType¶ Concrete type class for always-null data.
Public Functions
- 
std::string 
ToString() const override¶ A string representation of the type, including any children.
- 
DataTypeLayout 
layout() const override¶ Return the data type layout.
Children are not included.
- Note
 Experimental API
- 
std::string 
name() const override¶ A string name of the type, omitting any child fields.
- Note
 Experimental API
- Since
 0.7.0
- 
class 
arrow::BooleanType: public arrow::detail::CTypeImpl<BooleanType, PrimitiveCType, Type::BOOL, bool>¶ Concrete type class for boolean data.
Public Functions
- 
DataTypeLayout 
layout() const override¶ Return the data type layout.
Children are not included.
- Note
 Experimental API
- 
class 
Int8Type: public arrow::detail::IntegerTypeImpl<Int8Type, Type::INT8, int8_t>¶ Concrete type class for signed 8-bit integer data.
- 
class 
Int16Type: public arrow::detail::IntegerTypeImpl<Int16Type, Type::INT16, int16_t>¶ Concrete type class for signed 16-bit integer data.
- 
class 
Int32Type: public arrow::detail::IntegerTypeImpl<Int32Type, Type::INT32, int32_t>¶ Concrete type class for signed 32-bit integer data.
- 
class 
Int64Type: public arrow::detail::IntegerTypeImpl<Int64Type, Type::INT64, int64_t>¶ Concrete type class for signed 64-bit integer data.
- 
class 
UInt8Type: public arrow::detail::IntegerTypeImpl<UInt8Type, Type::UINT8, uint8_t>¶ Concrete type class for unsigned 8-bit integer data.
- 
class 
UInt16Type: public arrow::detail::IntegerTypeImpl<UInt16Type, Type::UINT16, uint16_t>¶ Concrete type class for unsigned 16-bit integer data.
- 
class 
UInt32Type: public arrow::detail::IntegerTypeImpl<UInt32Type, Type::UINT32, uint32_t>¶ Concrete type class for unsigned 32-bit integer data.
- 
class 
UInt64Type: public arrow::detail::IntegerTypeImpl<UInt64Type, Type::UINT64, uint64_t>¶ Concrete type class for unsigned 64-bit integer data.
- 
class 
HalfFloatType: public arrow::detail::CTypeImpl<HalfFloatType, FloatingPointType, Type::HALF_FLOAT, uint16_t>¶ Concrete type class for 16-bit floating-point data.
- 
class 
FloatType: public arrow::detail::CTypeImpl<FloatType, FloatingPointType, Type::FLOAT, float>¶ Concrete type class for 32-bit floating-point data (C “float”)
- 
class 
DoubleType: public arrow::detail::CTypeImpl<DoubleType, FloatingPointType, Type::DOUBLE, double>¶ Concrete type class for 64-bit floating-point data (C “double”)
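HalfFloatType above uses uint16_t as its C type, so a half-float value reaches user code as a raw 16-bit pattern. Assuming the pattern follows the IEEE 754 binary16 layout (1 sign bit, 5 exponent bits, 10 mantissa bits), it could be decoded along these lines. HalfBitsToFloat is a hypothetical helper, not part of the API documented here.

```cpp
#include <cmath>
#include <cstdint>
#include <limits>

// Decode an IEEE 754 binary16 bit pattern into a float (hypothetical helper).
inline float HalfBitsToFloat(uint16_t bits) {
  const int sign = (bits >> 15) & 0x1;
  const int exponent = (bits >> 10) & 0x1F;
  const int mantissa = bits & 0x3FF;

  float magnitude;
  if (exponent == 0) {
    // Zero or subnormal: mantissa * 2^-24.
    magnitude = std::ldexp(static_cast<float>(mantissa), -24);
  } else if (exponent == 0x1F) {
    // All-ones exponent: infinity when the mantissa is zero, NaN otherwise.
    magnitude = (mantissa == 0) ? std::numeric_limits<float>::infinity()
                                : std::numeric_limits<float>::quiet_NaN();
  } else {
    // Normal: (1024 + mantissa) * 2^(exponent - 15 - 10).
    magnitude = std::ldexp(static_cast<float>(1024 + mantissa), exponent - 25);
  }
  return sign ? -magnitude : magnitude;
}
```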
Binary-like¶
- 
class 
arrow::BinaryType: public arrow::BaseBinaryType¶ Concrete type class for variable-size binary data.
Subclassed by arrow::StringType
Public Functions
- 
DataTypeLayout 
layout() const override¶ Return the data type layout.
Children are not included.
- Note
 Experimental API
- 
std::string 
ToString() const override¶ A string representation of the type, including any children.
- 
std::string 
name() const override¶ A string name of the type, omitting any child fields.
- Note
 Experimental API
- Since
 0.7.0
- 
class 
arrow::StringType: public arrow::BinaryType¶ Concrete type class for variable-size string data, utf8-encoded.
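The variable-size binary and string types differ from their LARGE_ counterparts only in offset width (32-bit versus 64-bit). A standalone model of an offsets-plus-data layout may make that distinction concrete. VarBinaryColumn is a hypothetical struct written for this example; it is not an Arrow class and does not model validity (null) information.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Standalone model of a variable-size binary column (not Arrow code).
// OffsetT would be int32_t for the regular types and int64_t for the
// LARGE_ variants.
template <typename OffsetT>
struct VarBinaryColumn {
  std::vector<OffsetT> offsets{0};  // always one more entry than values
  std::string data;                 // all values concatenated back to back

  void Append(const std::string& value) {
    data += value;
    offsets.push_back(static_cast<OffsetT>(data.size()));
  }

  std::size_t length() const { return offsets.size() - 1; }

  // Value i spans data[offsets[i], offsets[i + 1]).
  std::string Get(std::size_t i) const {
    const std::size_t start = static_cast<std::size_t>(offsets[i]);
    const std::size_t end = static_cast<std::size_t>(offsets[i + 1]);
    return data.substr(start, end - start);
  }
};
```

In this model, a 32-bit offset type caps the total concatenated data at what an int32_t can index, which is the practical reason a 64-bit variant is useful.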
- 
class 
arrow::FixedSizeBinaryType: public arrow::FixedWidthType, public arrow::ParametricType¶ Concrete type class for fixed-size binary data.
Subclassed by arrow::DecimalType
Public Functions
- 
std::string 
ToString() const override¶ A string representation of the type, including any children.
- 
std::string 
name() const override¶ A string name of the type, omitting any child fields.
- Note
 Experimental API
- Since
 0.7.0
- 
DataTypeLayout 
layout() const override¶ Return the data type layout.
Children are not included.
- Note
 Experimental API
- 
class 
arrow::Decimal128Type: public arrow::DecimalType¶ Concrete type class for 128-bit decimal data.
Public Functions
- 
Decimal128Type(int32_t precision, int32_t scale)¶ Decimal128Type constructor that aborts on invalid input.
- 
std::string 
ToString() const override¶ A string representation of the type, including any children.
- 
std::string 
name() const override¶ A string name of the type, omitting any child fields.
- Note
 Experimental API
- Since
 0.7.0
Public Static Functions
- 
Result<std::shared_ptr<DataType>> 
Make(int32_t precision, int32_t scale)¶ Decimal128Type constructor that returns an error on invalid input.
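Decimal128Type offers two construction paths: a constructor that aborts on invalid input and a static Make that returns an error instead. The sketch below models only the error-returning style in standalone C++. It assumes that a valid 128-bit decimal precision lies in the range 1 to 38; DecimalParams and MakeDecimalParams are hypothetical names, and the real Make returns a Result rather than a bool.

```cpp
#include <cstdint>
#include <string>

// Hypothetical plain struct standing in for a validated decimal type.
struct DecimalParams {
  int32_t precision;
  int32_t scale;
};

// Assumed limits for a 128-bit decimal: 1 to 38 significant digits.
const int32_t kMinPrecision = 1;
const int32_t kMaxPrecision128 = 38;

// Error-returning construction: on failure, *error is filled in and *out
// is left untouched, so the caller decides how to react.
inline bool MakeDecimalParams(int32_t precision, int32_t scale,
                              DecimalParams* out, std::string* error) {
  if (precision < kMinPrecision || precision > kMaxPrecision128) {
    *error = "Decimal precision out of range [1, 38]: " +
             std::to_string(precision);
    return false;
  }
  out->precision = precision;
  out->scale = scale;
  return true;
}
```

The error-returning path suits parameters that come from untrusted input such as user-supplied schemas, where aborting the process would be inappropriate.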
Nested¶
- 
class 
arrow::ListType: public arrow::BaseListType¶ Concrete type class for list data.
List data is nested data where each value is a variable number of child items. Lists can be recursively nested, for example list(list(int32)).
Subclassed by arrow::MapType
Public Functions
- 
DataTypeLayout 
layout() const override¶ Return the data type layout.
Children are not included.
- Note
 Experimental API
- 
std::string 
ToString() const override¶ A string representation of the type, including any children.
- 
std::string 
name() const override¶ A string name of the type, omitting any child fields.
- Note
 Experimental API
- Since
 0.7.0
- 
class 
arrow::MapType: public arrow::ListType¶ Concrete type class for map data.
Map data is nested data where each value is a variable number of key-item pairs. Maps can be recursively nested, for example map(utf8, map(utf8, int32)).
- 
class 
arrow::StructType: public arrow::NestedType¶ Concrete type class for struct data.
Public Functions
- 
DataTypeLayout 
layout() const override¶ Return the data type layout.
Children are not included.
- Note
 Experimental API
- 
std::string 
ToString() const override¶ A string representation of the type, including any children.
- 
std::string 
name() const override¶ A string name of the type, omitting any child fields.
- Note
 Experimental API
- Since
 0.7.0
- 
std::shared_ptr<Field> 
GetFieldByName(const std::string &name) const¶ Returns null if name not found.
- 
std::vector<std::shared_ptr<Field>> 
GetAllFieldsByName(const std::string &name) const¶ Return all fields having this name.
- 
int 
GetFieldIndex(const std::string &name) const¶ Returns -1 if name not found or if there are multiple fields having the same name.
- 
std::vector<int> 
GetAllFieldIndices(const std::string &name) const¶ Return the indices of all fields having this name in sorted order.
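The StructType lookup functions above differ in how they treat duplicate field names: GetFieldIndex returns -1 both when the name is missing and when it is ambiguous, while GetAllFieldIndices returns every match in sorted order. The following standalone sketch mirrors those documented semantics over a plain list of names; AllFieldIndices and FieldIndex are hypothetical helpers, not the library implementation.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Indices of every field called `name`, in ascending order.
inline std::vector<int> AllFieldIndices(const std::vector<std::string>& names,
                                        const std::string& name) {
  std::vector<int> out;
  for (std::size_t i = 0; i < names.size(); ++i) {
    if (names[i] == name) {
      out.push_back(static_cast<int>(i));
    }
  }
  return out;  // ascending by construction
}

// Index of the single field called `name`, or -1 when the name is absent
// or shared by more than one field.
inline int FieldIndex(const std::vector<std::string>& names,
                      const std::string& name) {
  const std::vector<int> all = AllFieldIndices(names, name);
  return all.size() == 1 ? all[0] : -1;
}
```

Because -1 covers two distinct cases, a caller that needs to tell "missing" from "ambiguous" would check the size of the full index list instead.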
- 
class 
arrow::UnionType: public arrow::NestedType¶ Concrete type class for union data.
Subclassed by arrow::DenseUnionType, arrow::SparseUnionType
Public Functions
- 
DataTypeLayout 
layout() const override¶ Return the data type layout.
Children are not included.
- Note
 Experimental API
- 
std::string 
ToString() const override¶ A string representation of the type, including any children.
- 
const std::vector<int8_t> &
type_codes() const¶ The array of logical type ids.
For example, the first type in the union might be denoted by the id 5 (instead of 0).
- 
const std::vector<int> &
child_ids() const¶ An array mapping logical type ids to physical child ids.
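To illustrate the relationship between type_codes() and child_ids(), here is a standalone sketch that builds a lookup table from logical type ids to physical child positions. It assumes type codes are non-negative int8_t values (0 to 127) and uses -1 for unused codes; both constants and the BuildChildIds name are assumptions made for this example.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

const int kMaxTypeCode = 127;    // assumed upper bound for a type code
const int kInvalidChildId = -1;  // assumed marker for "no such child"

// Build a table where table[type_code] is the position of the child that
// carries that code. Negative codes are skipped in this sketch.
inline std::vector<int> BuildChildIds(const std::vector<int8_t>& type_codes) {
  std::vector<int> child_ids(static_cast<std::size_t>(kMaxTypeCode) + 1,
                             kInvalidChildId);
  for (std::size_t i = 0; i < type_codes.size(); ++i) {
    if (type_codes[i] < 0) {
      continue;
    }
    child_ids[static_cast<std::size_t>(type_codes[i])] = static_cast<int>(i);
  }
  return child_ids;
}
```

For a union whose two children carry the codes 5 and 10, looking up code 5 yields child 0 and code 10 yields child 1, matching the "id 5 instead of 0" case mentioned above.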
Dictionary-encoded¶
- 
class 
arrow::DictionaryType: public arrow::FixedWidthType¶ Dictionary-encoded value type with data-dependent dictionary.
Indices are represented by any integer types.
Public Functions
- 
std::string 
ToString() const override¶ A string representation of the type, including any children.
- 
std::string 
name() const override¶ A string name of the type, omitting any child fields.
- Note
 Experimental API
- Since
 0.7.0
- 
DataTypeLayout 
layout() const override¶ Return the data type layout.
Children are not included.
- Note
 Experimental API
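As a conceptual aid for the dictionary-encoded type, the sketch below encodes a list of strings into integer indices plus a dictionary of distinct values. It is a standalone model, not Arrow code: DictionaryEncoded and DictionaryEncode are hypothetical names, and int32_t is just one possible signed index type.

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical result type: one index per input value, plus the
// distinct values in order of first appearance.
struct DictionaryEncoded {
  std::vector<int32_t> indices;
  std::vector<std::string> dictionary;
};

inline DictionaryEncoded DictionaryEncode(const std::vector<std::string>& values) {
  DictionaryEncoded out;
  std::unordered_map<std::string, int32_t> seen;  // value -> dictionary slot
  for (const std::string& v : values) {
    auto it = seen.find(v);
    if (it == seen.end()) {
      // First occurrence: assign the next free slot and record the value.
      it = seen.emplace(v, static_cast<int32_t>(out.dictionary.size())).first;
      out.dictionary.push_back(v);
    }
    out.indices.push_back(it->second);
  }
  return out;
}
```

This mirrors the split described earlier: the type knows the index and value types, while the dictionary contents themselves travel with the data.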
Fields and Schemas¶
Create a Field instance.
- Parameters
 name: the field name
 type: the field value type
 nullable: whether the values are nullable, default true
 metadata: any custom key-value metadata, default null
Create a Schema instance.
- Return
 schema shared_ptr to Schema
- Parameters
 fields: the schema’s fields
 metadata: any custom key-value metadata, default null
- 
class 
arrow::Field: public arrow::detail::Fingerprintable¶ The combination of a field name and data type, with optional metadata.
Fields are used to describe the individual constituents of a nested DataType or a Schema.
A field’s metadata is represented by a KeyValueMetadata instance, which holds arbitrary key-value pairs.
Public Functions
- 
std::shared_ptr<const KeyValueMetadata> 
metadata() const¶ Return the field’s attached metadata.
- 
bool 
HasMetadata() const¶ Return whether the field has non-empty metadata.
Return a copy of this field with the given metadata attached to it.
EXPERIMENTAL: Return a copy of this field with the given metadata merged with existing metadata (any colliding keys will be overridden by the passed metadata)
- 
std::shared_ptr<Field> 
RemoveMetadata() const¶ Return a copy of this field without any metadata attached to it.
Return a copy of this field with the replaced type.
- 
std::shared_ptr<Field> 
WithName(const std::string &name) const¶ Return a copy of this field with the replaced name.
- 
std::shared_ptr<Field> 
WithNullable(bool nullable) const¶ Return a copy of this field with the replaced nullability.
- 
Result<std::shared_ptr<Field>> 
MergeWith(const Field &other, MergeOptions options = MergeOptions::Defaults()) const¶ Merge the current field with a field of the same name.
The two fields must be compatible, i.e. they must have the same name and either the same type or compatible types according to options.
The metadata of the current field is preserved; the metadata of the other field is discarded.
- 
bool 
Equals(const Field &other, bool check_metadata = false) const¶ Indicate whether the fields are equal.
- Return
 true if fields are equal, false otherwise.
- Parameters
 [in] other: field to check equality with.
 [in] check_metadata: controls whether metadata is also checked for equality.
- 
bool 
IsCompatibleWith(const Field &other) const¶ Indicate whether the fields are compatible.
See the criteria of MergeWith.
- Return
 true if fields are compatible, false otherwise.
- 
std::string 
ToString(bool show_metadata = false) const¶ Return a string representation of the field.
- Parameters
 [in] show_metadata: when true, if KeyValueMetadata is non-empty, print keys and values in the output
- 
const std::string &
name() const¶ Return the field name.
- 
bool 
nullable() const¶ Return whether the field is nullable.
- 
struct 
MergeOptions¶ Options that control the behavior of MergeWith.
Options are to be added to allow type conversions, including integer widening, promotion from integer to float, or conversion to or from boolean.
- 
class 
arrow::Schema: public arrow::detail::Fingerprintable, public arrow::util::EqualityComparable<Schema>, public arrow::util::ToStringOstreamable<Schema>¶ Sequence of arrow::Field objects describing the columns of a record batch or table data structure.
Public Functions
- 
bool 
Equals(const Schema &other, bool check_metadata = false) const¶ Returns true if all of the schema fields are equal.
- 
int 
num_fields() const¶ Return the number of fields (columns) in the schema.
- 
const std::shared_ptr<Field> &
field(int i) const¶ Return the ith schema element. Does not bounds-check.
- 
std::shared_ptr<Field> 
GetFieldByName(const std::string &name) const¶ Returns null if name not found.
- 
std::vector<std::shared_ptr<Field>> 
GetAllFieldsByName(const std::string &name) const¶ Return all fields having this name.
- 
int 
GetFieldIndex(const std::string &name) const¶ Returns -1 if name not found.
- 
std::vector<int> 
GetAllFieldIndices(const std::string &name) const¶ Return the indices of all fields having this name.
- 
Status 
CanReferenceFieldsByNames(const std::vector<std::string> &names) const¶ Indicate whether the fields named names can be found unambiguously in the schema.
- 
const std::shared_ptr<const KeyValueMetadata> &
metadata() const¶ The custom key-value metadata, if any.
- Return
 metadata may be null
- 
std::string 
ToString(bool show_metadata = false) const¶ Render a string representation of the schema suitable for debugging.
- Parameters
 [in] show_metadata: when true, if KeyValueMetadata is non-empty, print keys and values in the output
Replace key-value metadata with new metadata.
- Return
 new Schema
- Parameters
 [in] metadata: new KeyValueMetadata
- 
bool