Array Builders

class arrow::ArrayBuilder

Base class for all data array builders.

This class provides facilities for incrementally building the null bitmap (see the Append methods) and, as a side effect, for tracking the current number of slots and the null count.
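The bookkeeping described here can be pictured with a small, dependency-free model. This is only a sketch: ToyValidityBuilder and its members are hypothetical names, not part of Arrow, and the real builders manage their buffers differently. It assumes the least-significant-bit-first bit order that the Arrow columnar format uses for validity bitmaps.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Toy model only: NOT Arrow's implementation. It grows a bit-packed
// validity bitmap while tracking the slot count and the null count.
class ToyValidityBuilder {
 public:
  // Append one slot; is_valid == false marks the slot as null.
  void Append(bool is_valid) {
    if (length_ % 8 == 0) bitmap_.push_back(0);  // start a new byte
    if (is_valid) {
      bitmap_[length_ / 8] |= static_cast<uint8_t>(1u << (length_ % 8));
    } else {
      ++null_count_;  // the bit stays 0, which means "null"
    }
    ++length_;
  }

  bool IsValid(int64_t i) const {
    assert(i >= 0 && i < length_);
    return ((bitmap_[i / 8] >> (i % 8)) & 1) != 0;
  }

  int64_t length() const { return length_; }
  int64_t null_count() const { return null_count_; }

 private:
  std::vector<uint8_t> bitmap_;  // one bit per slot, LSB first
  int64_t length_ = 0;
  int64_t null_count_ = 0;
};
```

In this model, appending valid, null, valid yields a length of 3 and a null count of 1, with only bits 0 and 2 of the first bitmap byte set.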

Note

Users are expected to use builders as one of the concrete types below. For example, an ArrayBuilder* pointing to a BinaryBuilder should be downcast before use.

Subclassed by arrow::BaseBinaryBuilder< TYPE >, arrow::BaseListBuilder< TYPE >, arrow::BasicUnionBuilder, arrow::BooleanBuilder, arrow::FixedSizeBinaryBuilder, arrow::FixedSizeListBuilder, arrow::internal::AdaptiveIntBuilderBase, arrow::internal::DictionaryBuilderBase< BuilderType, T >, arrow::internal::DictionaryBuilderBase< BuilderType, NullType >, arrow::MapBuilder, arrow::NullBuilder, arrow::NumericBuilder< T >, arrow::StructBuilder, arrow::BaseBinaryBuilder< BinaryType >, arrow::BaseBinaryBuilder< LargeBinaryType >, arrow::BaseListBuilder< LargeListType >, arrow::BaseListBuilder< ListType >, arrow::internal::DictionaryBuilderBase< AdaptiveIntBuilder, T >, arrow::internal::DictionaryBuilderBase< Int32Builder, T >, arrow::NumericBuilder< DayTimeIntervalType >

Public Functions

ArrayBuilder *child(int i)

For nested types.

Since the child builders are owned by this class instance, we skip shared pointers and just return a raw pointer.

Status Resize(int64_t capacity)

Ensure that enough memory has been allocated to fit the indicated number of total elements in the builder, including any that have already been appended.

Does not account for reallocations that may be due to variable size data, like binary values. To make space for incremental appends, use Reserve instead.

Return

Status

Parameters
  • [in] capacity: the minimum number of total array values to accommodate. Must be greater than the current capacity.

Status Reserve(int64_t additional_capacity)

Ensure that there is enough space allocated to append the indicated number of elements without any further reallocation.

Overallocation is used in order to minimize the impact of incremental Reserve() calls. Note that additional_capacity is relative to the current number of elements rather than to the current capacity, so calls to Reserve() which are not interspersed with addition of new elements may not increase the capacity.

Return

Status

Parameters
  • [in] additional_capacity: the number of additional array values
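The difference between "relative to the current number of elements" and "relative to the current capacity" is easy to miss, so here is a minimal model of the counters involved. ToyCapacity, AppendOne and the doubling policy are assumptions made for this sketch; they are not Arrow code, and the actual overallocation strategy may differ.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Toy model only (hypothetical names, not Arrow code): tracks just the
// length and capacity counters to show what Reserve() is relative to.
class ToyCapacity {
 public:
  // Make room for `additional_capacity` more elements beyond the current
  // length (not beyond the current capacity).
  void Reserve(int64_t additional_capacity) {
    assert(additional_capacity >= 0);
    const int64_t needed = length_ + additional_capacity;
    if (needed > capacity_) {
      // Overallocation by doubling is an assumption of this sketch.
      capacity_ = std::max(needed, capacity_ * 2);
    }
  }

  // Stand-in for appending one element into already reserved space.
  void AppendOne() {
    assert(length_ < capacity_);
    ++length_;
  }

  int64_t length() const { return length_; }
  int64_t capacity() const { return capacity_; }

 private:
  int64_t length_ = 0;
  int64_t capacity_ = 0;
};
```

Calling Reserve(10) twice in a row leaves the capacity at 10 in this model. Only after ten elements have been appended does a further Reserve(10) grow it.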

void Reset()

Reset the builder.

Status AppendNull() = 0

Append a null value to builder.

Status AppendNulls(int64_t length) = 0

Append a number of null values to builder.

Status AppendEmptyValue() = 0

Append a non-null value to builder.

The appended value is an implementation detail, but the corresponding memory slot is guaranteed to be initialized. This method is useful when appending a null value to a parent nested type.

Status AppendEmptyValues(int64_t length) = 0

Append a number of non-null values to builder.

The appended values are an implementation detail, but the corresponding memory slot is guaranteed to be initialized. This method is useful when appending null values to a parent nested type.

Status Advance(int64_t elements)

For cases where raw data was memcpy’d into the internal buffers, allows us to advance the length of the builder.

It is the caller's responsibility to use this function correctly.
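The copy-then-advance pattern can be sketched without Arrow as follows. ToyInt32Buffer, mutable_tail and CopyAndAdvance are hypothetical names introduced for illustration, and the model returns bool where a real builder would return a Status.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Toy model only (hypothetical names, not Arrow code).
class ToyInt32Buffer {
 public:
  explicit ToyInt32Buffer(int64_t capacity)
      : data_(static_cast<size_t>(capacity)) {}

  // Pointer to the first unused slot; the caller may write raw data here.
  int32_t* mutable_tail() { return data_.data() + length_; }

  // Declare that `elements` slots past the current length now hold data.
  // Returns false if that would exceed the allocated capacity.
  bool Advance(int64_t elements) {
    if (elements < 0 || length_ + elements > capacity()) return false;
    length_ += elements;
    return true;
  }

  int32_t value(int64_t i) const {
    assert(i >= 0 && i < length_);
    return data_[static_cast<size_t>(i)];
  }
  int64_t length() const { return length_; }
  int64_t capacity() const { return static_cast<int64_t>(data_.size()); }

 private:
  std::vector<int32_t> data_;
  int64_t length_ = 0;
};

// Bulk-copy `n` values into the tail, then advance the logical length.
inline bool CopyAndAdvance(ToyInt32Buffer* buf, const int32_t* src, int64_t n) {
  if (n < 0 || buf->length() + n > buf->capacity()) return false;
  std::memcpy(buf->mutable_tail(), src,
              static_cast<size_t>(n) * sizeof(int32_t));
  return buf->Advance(n);
}
```

Nothing in Advance checks that the slots were actually written, which is why the copy and the length update are kept together in one helper here.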

Status FinishInternal(std::shared_ptr<ArrayData> *out) = 0

Return result of builder as an internal generic ArrayData object.

Resets the builder, except for dictionary builders.

Return

Status

Parameters
  • [out] out: the finalized ArrayData object

Status Finish(std::shared_ptr<Array> *out)

Return result of builder as an Array object.

The builder is reset except for DictionaryBuilder.

Return

Status

Parameters
  • [out] out: the finalized Array object

Result<std::shared_ptr<Array>> Finish()

Return result of builder as an Array object.

The builder is reset except for DictionaryBuilder.

Return

The finalized Array object

std::shared_ptr<DataType> type() const = 0

Return the type of the built Array.

Concrete builder subclasses

class arrow::NullBuilder : public arrow::ArrayBuilder

Public Functions

Status AppendNulls(int64_t length) final

Append the specified number of null elements.

Status AppendNull() final

Append a single null element.

Status AppendEmptyValues(int64_t length) final

Append a number of non-null values to builder.

The appended values are an implementation detail, but the corresponding memory slot is guaranteed to be initialized. This method is useful when appending null values to a parent nested type.

Status AppendEmptyValue() final

Append a non-null value to builder.

The appended value is an implementation detail, but the corresponding memory slot is guaranteed to be initialized. This method is useful when appending a null value to a parent nested type.

Status FinishInternal(std::shared_ptr<ArrayData> *out) override

Return result of builder as an internal generic ArrayData object.

Resets the builder, except for dictionary builders.

Return

Status

Parameters
  • [out] out: the finalized ArrayData object

std::shared_ptr<DataType> type() const override

Return the type of the built Array.

class arrow::BooleanBuilder : public arrow::ArrayBuilder

Public Functions

Status AppendNulls(int64_t length) final

Write nulls as uint8_t* (0 value indicates null) into pre-allocated memory.

Status AppendNull() final

Append a null value to builder.

Status AppendEmptyValue() final

Append a non-null value to builder.

The appended value is an implementation detail, but the corresponding memory slot is guaranteed to be initialized. This method is useful when appending a null value to a parent nested type.

Status AppendEmptyValues(int64_t length) final

Append a number of non-null values to builder.

The appended values are an implementation detail, but the corresponding memory slot is guaranteed to be initialized. This method is useful when appending null values to a parent nested type.

Status Append(const bool val)

Scalar append.

void UnsafeAppend(const bool val)

Scalar append, without checking for capacity.

Status AppendValues(const uint8_t *values, int64_t length, const uint8_t *valid_bytes = NULLPTR)

Append a sequence of elements in one shot.

Return

Status

Parameters
  • [in] values: a contiguous array of bytes, where any non-zero byte is treated as true (1)

  • [in] length: the number of values to append

  • [in] valid_bytes: an optional sequence of bytes where non-zero indicates a valid (non-null) value
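To make the byte conventions of this overload concrete, the following sketch packs byte-per-value input into bit-packed value and validity buffers. ToyBooleanColumn and this free AppendValues function are hypothetical stand-ins, not the Arrow implementation. Leaving the value bit of a null slot at zero is a choice made for this sketch.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Toy model only (hypothetical names, not Arrow code).
struct ToyBooleanColumn {
  std::vector<uint8_t> values;    // bit-packed, LSB first
  std::vector<uint8_t> validity;  // bit-packed, LSB first
  int64_t length = 0;
  int64_t null_count = 0;
};

// values[i] != 0 means true; valid_bytes[i] != 0 means non-null.
// A null valid_bytes pointer means every appended slot is valid.
inline void AppendValues(ToyBooleanColumn* col, const uint8_t* values,
                         int64_t length,
                         const uint8_t* valid_bytes = nullptr) {
  assert(length >= 0);
  for (int64_t i = 0; i < length; ++i) {
    const int64_t pos = col->length;
    if (pos % 8 == 0) {  // both buffers grow one byte at a time
      col->values.push_back(0);
      col->validity.push_back(0);
    }
    const uint8_t bit = static_cast<uint8_t>(1u << (pos % 8));
    const bool valid = (valid_bytes == nullptr) || (valid_bytes[i] != 0);
    if (valid) {
      col->validity[pos / 8] |= bit;
      if (values[i] != 0) col->values[pos / 8] |= bit;
    } else {
      ++col->null_count;  // value bit is left at 0 in this sketch
    }
    ++col->length;
  }
}
```

With values {1, 0, 255, 1} and valid_bytes {1, 1, 1, 0}, the model stores true, false, true, null: the byte 255 counts as true, and the last slot is null regardless of its value byte.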

Status AppendValues(const uint8_t *values, int64_t length, const std::vector<bool> &is_valid)

Append a sequence of elements in one shot.

Return

Status

Parameters
  • [in] values: a contiguous C array of values

  • [in] length: the number of values to append

  • [in] is_valid: an std::vector<bool> indicating valid (1) or null (0). Equal in length to values

Status AppendValues(const std::vector<uint8_t> &values, const std::vector<bool> &is_valid)

Append a sequence of elements in one shot.

Return

Status

Parameters
  • [in] values: a std::vector of bytes

  • [in] is_valid: an std::vector<bool> indicating valid (1) or null (0). Equal in length to values

Status AppendValues(const std::vector<uint8_t> &values)

Append a sequence of elements in one shot.

Return

Status

Parameters
  • [in] values: a std::vector of bytes

Status AppendValues(const std::vector<bool> &values, const std::vector<bool> &is_valid)

Append a sequence of elements in one shot.

Return

Status

Parameters
  • [in] values: an std::vector<bool> indicating true (1) or false

  • [in] is_valid: an std::vector<bool> indicating valid (1) or null (0). Equal in length to values

Status AppendValues(const std::vector<bool> &values)

Append a sequence of elements in one shot.

Return

Status

Parameters
  • [in] values: an std::vector<bool> indicating true (1) or false

template<typename ValuesIter>
Status AppendValues(ValuesIter values_begin, ValuesIter values_end)

Append a sequence of elements in one shot.

Return

Status

Parameters
  • [in] values_begin: InputIterator to the beginning of the values

  • [in] values_end: InputIterator pointing to the end of the values

template<typename ValuesIter, typename ValidIter>
enable_if_t<!std::is_pointer<ValidIter>::value, Status> AppendValues(ValuesIter values_begin, ValuesIter values_end, ValidIter valid_begin)

Append a sequence of elements in one shot, with a specified nullmap.

Return

Status

Parameters
  • [in] values_begin: InputIterator to the beginning of the values

  • [in] values_end: InputIterator pointing to the end of the values

  • [in] valid_begin: InputIterator with elements indicating valid (1) or null (0) values

Status FinishInternal(std::shared_ptr<ArrayData> *out) override

Return result of builder as an internal generic ArrayData object.

Resets the builder, except for dictionary builders.

Return

Status

Parameters
  • [out] out: the finalized ArrayData object

void Reset() override

Reset the builder.

Status Resize(int64_t capacity) override

Ensure that enough memory has been allocated to fit the indicated number of total elements in the builder, including any that have already been appended.

Does not account for reallocations that may be due to variable size data, like binary values. To make space for incremental appends, use Reserve instead.

Return

Status

Parameters
  • [in] capacity: the minimum number of total array values to accommodate. Must be greater than the current capacity.

std::shared_ptr<DataType> type() const override

Return the type of the built Array.

template<typename T>
class arrow::NumericBuilder : public arrow::ArrayBuilder

Base class for all Builders that emit an Array of a scalar numerical type.

Public Functions

Status Append(const value_type val)

Append a single scalar and increase the size if necessary.

Status AppendNulls(int64_t length) final

Write nulls as uint8_t* (0 value indicates null) into pre-allocated memory. The memory at the corresponding data slot is set to 0 to prevent uninitialized memory access.

Status AppendNull() final

Append a single null element.

Status AppendEmptyValue() final

Append an empty element.

Status AppendEmptyValues(int64_t length) final

Append several empty elements.

void Reset() override

Reset the builder.

Status Resize(int64_t capacity) override

Ensure that enough memory has been allocated to fit the indicated number of total elements in the builder, including any that have already been appended.

Does not account for reallocations that may be due to variable size data, like binary values. To make space for incremental appends, use Reserve instead.

Return

Status

Parameters
  • [in] capacity: the minimum number of total array values to accommodate. Must be greater than the current capacity.

Status AppendValues(const value_type *values, int64_t length, const uint8_t *valid_bytes = NULLPTR)

Append a sequence of elements in one shot.

Return

Status

Parameters
  • [in] values: a contiguous C array of values

  • [in] length: the number of values to append

  • [in] valid_bytes: an optional sequence of bytes where non-zero indicates a valid (non-null) value

Status AppendValues(const value_type *values, int64_t length, const std::vector<bool> &is_valid)

Append a sequence of elements in one shot.

Return

Status

Parameters
  • [in] values: a contiguous C array of values

  • [in] length: the number of values to append

  • [in] is_valid: an std::vector<bool> indicating valid (1) or null (0). Equal in length to values

Status AppendValues(const std::vector<value_type> &values, const std::vector<bool> &is_valid)

Append a sequence of elements in one shot.

Return

Status

Parameters
  • [in] values: a std::vector of values

  • [in] is_valid: an std::vector<bool> indicating valid (1) or null (0). Equal in length to values

Status AppendValues(const std::vector<value_type> &values)

Append a sequence of elements in one shot.

Return

Status

Parameters
  • [in] values: a std::vector of values

Status FinishInternal(std::shared_ptr<ArrayData> *out) override

Return result of builder as an internal generic ArrayData object.

Resets the builder, except for dictionary builders.

Return

Status

Parameters
  • [out] out: the finalized ArrayData object

template<typename ValuesIter>
Status AppendValues(ValuesIter values_begin, ValuesIter values_end)

Append a sequence of elements in one shot.

Return

Status

Parameters
  • [in] values_begin: InputIterator to the beginning of the values

  • [in] values_end: InputIterator pointing to the end of the values

template<typename ValuesIter, typename ValidIter>
enable_if_t<!std::is_pointer<ValidIter>::value, Status> AppendValues(ValuesIter values_begin, ValuesIter values_end, ValidIter valid_begin)

Append a sequence of elements in one shot, with a specified nullmap.

Return

Status

Parameters
  • [in] values_begin: InputIterator to the beginning of the values

  • [in] values_end: InputIterator pointing to the end of the values

  • [in] valid_begin: InputIterator with elements indicating valid (1) or null (0) values.

void UnsafeAppend(const value_type val)

Append a single scalar under the assumption that the underlying Buffer is large enough.

This method does not capacity-check; make sure to call Reserve beforehand.
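The usual reason to pair Reserve with UnsafeAppend is to pay for the capacity check once instead of once per element. The sketch below models that pattern with a hypothetical ToyInt64Builder. It is not Arrow code, its growth policy is an assumption, and it omits the null bitmap entirely.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Toy model only (hypothetical names, not Arrow code).
class ToyInt64Builder {
 public:
  // Ensure room for `additional` elements beyond the current length.
  void Reserve(int64_t additional) {
    assert(additional >= 0);
    const int64_t needed = length_ + additional;
    if (needed > capacity()) {
      values_.resize(static_cast<size_t>(std::max(needed, capacity() * 2)));
    }
  }

  // Checked append: grows the storage on demand.
  void Append(int64_t v) {
    Reserve(1);
    UnsafeAppend(v);
  }

  // Unchecked append: the caller must have reserved space beforehand.
  void UnsafeAppend(int64_t v) {
    assert(length_ < capacity());
    values_[static_cast<size_t>(length_++)] = v;
  }

  int64_t value(int64_t i) const {
    assert(i >= 0 && i < length_);
    return values_[static_cast<size_t>(i)];
  }
  int64_t length() const { return length_; }
  int64_t capacity() const { return static_cast<int64_t>(values_.size()); }

 private:
  std::vector<int64_t> values_;
  int64_t length_ = 0;
};

// Append 0..n-1 with a single capacity check up front.
inline void FillIota(ToyInt64Builder* builder, int64_t n) {
  builder->Reserve(n);
  for (int64_t i = 0; i < n; ++i) builder->UnsafeAppend(i);
}
```

In a release build the assert inside UnsafeAppend disappears, so skipping the Reserve call would write past the end of the storage. That is the hazard the documentation warns about.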

std::shared_ptr<DataType> type() const override

Return the type of the built Array.

class arrow::BinaryBuilder : public arrow::BaseBinaryBuilder<BinaryType>

Builder class for variable-length binary data.

Subclassed by arrow::StringBuilder

Public Functions

std::shared_ptr<DataType> type() const override

Return the type of the built Array.

class arrow::StringBuilder : public arrow::BinaryBuilder

Builder class for UTF8 strings.

Public Functions

std::shared_ptr<DataType> type() const override

Return the type of the built Array.

class arrow::FixedSizeBinaryBuilder : public arrow::ArrayBuilder

Subclassed by arrow::Decimal128Builder, arrow::Decimal256Builder

Public Functions

Status AppendNull() final

Append a null value to builder.

Status AppendNulls(int64_t length) final

Append a number of null values to builder.

Status AppendEmptyValue() final

Append a non-null value to builder.

The appended value is an implementation detail, but the corresponding memory slot is guaranteed to be initialized. This method is useful when appending a null value to a parent nested type.

Status AppendEmptyValues(int64_t length) final

Append a number of non-null values to builder.

The appended values are an implementation detail, but the corresponding memory slot is guaranteed to be initialized. This method is useful when appending null values to a parent nested type.

Status ReserveData(int64_t elements)

Ensures there is enough allocated capacity to append the indicated number of bytes to the value data buffer without additional allocations.

void Reset() override

Reset the builder.

Status Resize(int64_t capacity) override

Ensure that enough memory has been allocated to fit the indicated number of total elements in the builder, including any that have already been appended.

Does not account for reallocations that may be due to variable size data, like binary values. To make space for incremental appends, use Reserve instead.

Return

Status

Parameters
  • [in] capacity: the minimum number of total array values to accommodate. Must be greater than the current capacity.

Status FinishInternal(std::shared_ptr<ArrayData> *out) override

Return result of builder as an internal generic ArrayData object.

Resets the builder, except for dictionary builders.

Return

Status

Parameters
  • [out] out: the finalized ArrayData object

int64_t value_data_length() const

Return

size of values buffer so far

const uint8_t *GetValue(int64_t i) const

Temporary access to a value.

This pointer becomes invalid on the next modifying operation.

util::string_view GetView(int64_t i) const

Temporary access to a value.

This view becomes invalid on the next modifying operation.

std::shared_ptr<DataType> type() const override

Return the type of the built Array.

class arrow::Decimal128Builder : public arrow::FixedSizeBinaryBuilder

Public Functions

Status FinishInternal(std::shared_ptr<ArrayData> *out) override

Return result of builder as an internal generic ArrayData object.

Resets the builder, except for dictionary builders.

Return

Status

Parameters
  • [out] out: the finalized ArrayData object

std::shared_ptr<DataType> type() const override

Return the type of the built Array.

void Reset() override

Reset the builder.

class arrow::ListBuilder : public arrow::BaseListBuilder<ListType>

Builder class for variable-length list array value types.

To use this class, you must append values to the child array builder and use the Append function to delimit each distinct list value (once the values have been appended to the child array) or use the bulk API to append a sequence of offsets and null values.

A note on types: per arrow/type.h, all types in the C++ implementation are logical, so even though this class always builds a list array, it can represent multiple different logical types. If no logical type is provided at construction time, the class defaults to List<T>, where T is taken from the value_builder/values that the object is constructed with.

Public Functions

BaseListBuilder(MemoryPool *pool, std::shared_ptr<ArrayBuilder> const &value_builder, const std::shared_ptr<DataType> &type)

Use this constructor to incrementally build the value array along with offsets and null bitmap.

class arrow::StructBuilder : public arrow::ArrayBuilder

The Append, Resize and Reserve methods act on the StructBuilder itself.

Make sure that the corresponding methods of all child builders are called consistently, so that the data structure stays consistent.

Public Functions

StructBuilder(const std::shared_ptr<DataType> &type, MemoryPool *pool, std::vector<std::shared_ptr<ArrayBuilder>> field_builders)

If any of field_builders has indeterminate type, this builder will also.

Status FinishInternal(std::shared_ptr<ArrayData> *out) override

Return result of builder as an internal generic ArrayData object.

Resets the builder, except for dictionary builders.

Return

Status

Parameters
  • [out] out: the finalized ArrayData object

Status AppendValues(int64_t length, const uint8_t *valid_bytes)

The null bitmap is of equal length to every child field, and any zero byte is considered a null for that field, but users must use the append methods or advance methods of the child builders independently to insert data.

Status Append(bool is_valid = true)

Append an element to the Struct.

The Append method of each child builder must be called independently to maintain data-structure consistency.

Status AppendNull() final

Append a null value.

Automatically appends an empty value to each child builder.

Status AppendNulls(int64_t length) final

Append multiple null values.

Automatically appends empty values to each child builder.
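The consistency requirement is that the parent and every child end up with the same number of slots. The sketch below models that with a hypothetical two-field ToyStructBuilder. It is not Arrow code, and the placeholder values it writes for null structs (0 and an empty string) are choices made for this sketch.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Toy model only (hypothetical names, not Arrow code): a struct of
// {id: int64, name: string} with one validity flag per struct slot.
struct ToyStructBuilder {
  std::vector<int64_t> ids;        // child 0
  std::vector<std::string> names;  // child 1
  std::vector<bool> validity;      // parent-level validity

  // Parent-only append: the caller must append to every child separately.
  void Append(bool is_valid = true) { validity.push_back(is_valid); }

  // Null struct: each child receives a placeholder value so that all
  // lengths stay aligned.
  void AppendNull() {
    validity.push_back(false);
    ids.push_back(0);
    names.emplace_back();
  }

  bool IsConsistent() const {
    return ids.size() == validity.size() && names.size() == validity.size();
  }

  // Number of struct slots; only meaningful when the children are aligned.
  size_t FinishLength() const {
    assert(IsConsistent());
    return validity.size();
  }
};
```

Calling Append() without also appending to both children leaves this model inconsistent, whereas AppendNull() keeps it aligned on its own.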

Status AppendEmptyValue() final

Append a non-null value to builder.

The appended value is an implementation detail, but the corresponding memory slot is guaranteed to be initialized. This method is useful when appending a null value to a parent nested type.

Status AppendEmptyValues(int64_t length) final

Append a number of non-null values to builder.

The appended values are an implementation detail, but the corresponding memory slot is guaranteed to be initialized. This method is useful when appending null values to a parent nested type.

void Reset() override

Reset the builder.

std::shared_ptr<DataType> type() const override

Return the type of the built Array.

template<typename T>
class arrow::DictionaryBuilder : public arrow::internal::DictionaryBuilderBase<AdaptiveIntBuilder, T>

A DictionaryArray builder that uses AdaptiveIntBuilder to return the smallest index size that can accommodate the dictionary indices.

Public Functions

Status AppendIndices(const int64_t *values, int64_t length, const uint8_t *valid_bytes = NULLPTR)

Append dictionary indices directly without modifying the memo.

NOTE: Experimental API
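Dictionary encoding with a memo table and an adaptively sized index type can be sketched as follows. ToyDictionaryBuilder is a hypothetical string-only model, not Arrow code. Its IndexBitWidth function assumes signed index types of 8, 16, 32 or 64 bits and recomputes the width on demand rather than widening a buffer incrementally.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Toy model only (hypothetical names, not Arrow code).
class ToyDictionaryBuilder {
 public:
  // Look the value up in the memo, adding it to the dictionary if new,
  // then record its index.
  void Append(const std::string& value) {
    auto it = memo_.find(value);
    if (it == memo_.end()) {
      const int64_t new_index = static_cast<int64_t>(dictionary_.size());
      it = memo_.emplace(value, new_index).first;
      dictionary_.push_back(value);
    }
    indices_.push_back(it->second);
  }

  // Record precomputed indices without touching the memo or dictionary.
  // This sketch only checks that they are non-negative.
  void AppendIndices(const int64_t* values, int64_t length) {
    for (int64_t i = 0; i < length; ++i) {
      assert(values[i] >= 0);
      indices_.push_back(values[i]);
    }
  }

  // Smallest signed integer width (in bits) that holds every index so far.
  int IndexBitWidth() const {
    int64_t max_index = 0;
    for (int64_t idx : indices_) max_index = std::max(max_index, idx);
    if (max_index <= INT8_MAX) return 8;
    if (max_index <= INT16_MAX) return 16;
    if (max_index <= INT32_MAX) return 32;
    return 64;
  }

  const std::vector<std::string>& dictionary() const { return dictionary_; }
  const std::vector<int64_t>& indices() const { return indices_; }

 private:
  std::unordered_map<std::string, int64_t> memo_;
  std::vector<std::string> dictionary_;
  std::vector<int64_t> indices_;
};
```

Appending "a", "b", "a" gives a two-entry dictionary and the indices 0, 1, 0, which fit in 8 bits. Once more than 128 distinct values have been appended, the model reports 16 bits.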