Array Builders

class arrow::ArrayBuilder

Base class for all data array builders.

This class provides facilities for incrementally building the null bitmap (see the Append methods) and, as a side effect, for tracking the current number of slots and the null count.

Note

Users are expected to use builders through one of the concrete types below. For example, an ArrayBuilder* pointing to a BinaryBuilder should be downcast to BinaryBuilder* before use.

Subclassed by arrow::BaseBinaryBuilder< LargeBinaryType >, arrow::BaseBinaryBuilder< BinaryType >, arrow::BaseListBuilder< LargeListType >, arrow::BaseListBuilder< ListType >, arrow::NumericBuilder< DayTimeIntervalType >, arrow::NumericBuilder< MonthDayNanoIntervalType >, arrow::internal::DictionaryBuilderBase< Int32Builder, T >, arrow::internal::DictionaryBuilderBase< AdaptiveIntBuilder, T >, arrow::BaseBinaryBuilder< TYPE >, arrow::BaseListBuilder< TYPE >, arrow::BasicUnionBuilder, arrow::BooleanBuilder, arrow::FixedSizeBinaryBuilder, arrow::FixedSizeListBuilder, arrow::MapBuilder, arrow::NullBuilder, arrow::NumericBuilder< T >, arrow::StructBuilder, arrow::internal::AdaptiveIntBuilderBase, arrow::internal::DictionaryBuilderBase< BuilderType, T >, arrow::internal::DictionaryBuilderBase< BuilderType, NullType >

Public Functions

inline ArrayBuilder *child(int i)

For nested types.

Since the child builders are owned by this builder instance, a raw (non-owning) pointer is returned instead of a shared pointer.

virtual Status Resize(int64_t capacity)

Ensure that enough memory has been allocated to fit the indicated number of total elements in the builder, including any that have already been appended.

Does not account for reallocations that may be due to variable size data, like binary values. To make space for incremental appends, use Reserve instead.

Parameters

capacity[in] the minimum number of total array values to accommodate. Must be greater than the current capacity.

Returns

Status

inline Status Reserve(int64_t additional_capacity)

Ensure that there is enough space allocated to append the indicated number of elements without any further reallocation.

Overallocation is used in order to minimize the impact of incremental Reserve() calls. Note that additional_capacity is relative to the current number of elements rather than to the current capacity, so calls to Reserve() which are not interspersed with addition of new elements may not increase the capacity.

Parameters

additional_capacity[in] the number of additional array values

Returns

Status

virtual void Reset()

Reset the builder.

virtual Status AppendNull() = 0

Append a null value to the builder.

virtual Status AppendNulls(int64_t length) = 0

Append a number of null values to the builder.

virtual Status AppendEmptyValue() = 0

Append a non-null value to the builder.

The appended value is an implementation detail, but the corresponding memory slot is guaranteed to be initialized. This method is useful when appending a null value to a parent nested type.

virtual Status AppendEmptyValues(int64_t length) = 0

Append a number of non-null values to the builder.

The appended values are an implementation detail, but the corresponding memory slots are guaranteed to be initialized. This method is useful when appending null values to a parent nested type.

inline Status AppendScalar(const Scalar &scalar)

Append a value from a scalar.

inline virtual Status AppendArraySlice(const ArrayData &array, int64_t offset, int64_t length)

Append a range of values from an array.

The given array must be the same type as the builder.

Status Advance(int64_t elements)

For cases where raw data was memcpy’d into the internal buffers, advances the length of the builder by the given number of elements.

Use this function with care: it is your responsibility to ensure that the slots being advanced over actually contain valid data.

virtual Status FinishInternal(std::shared_ptr<ArrayData> *out) = 0

Return result of builder as an internal generic ArrayData object.

Resets the builder, except for DictionaryBuilder.

Parameters

out[out] the finalized ArrayData object

Returns

Status

Status Finish(std::shared_ptr<Array> *out)

Return result of builder as an Array object.

The builder is reset except for DictionaryBuilder.

Parameters

out[out] the finalized Array object

Returns

Status

Result<std::shared_ptr<Array>> Finish()

Return result of builder as an Array object.

The builder is reset except for DictionaryBuilder.

Returns

The finalized Array object

virtual std::shared_ptr<DataType> type() const = 0

Return the type of the built Array.

Concrete builder subclasses

Primitive

class arrow::NullBuilder : public arrow::ArrayBuilder

Public Functions

inline virtual Status AppendNulls(int64_t length) final

Append the specified number of null elements.

inline virtual Status AppendNull() final

Append a single null element.

inline virtual Status AppendEmptyValues(int64_t length) final

Append a number of non-null values to the builder.

The appended values are an implementation detail, but the corresponding memory slots are guaranteed to be initialized. This method is useful when appending null values to a parent nested type.

inline virtual Status AppendEmptyValue() final

Append a non-null value to the builder.

The appended value is an implementation detail, but the corresponding memory slot is guaranteed to be initialized. This method is useful when appending a null value to a parent nested type.

inline virtual Status AppendArraySlice(const ArrayData&, int64_t, int64_t length) override

Append a range of values from an array.

The given array must be the same type as the builder.

virtual Status FinishInternal(std::shared_ptr<ArrayData> *out) override

Return result of builder as an internal generic ArrayData object.

Resets the builder, except for DictionaryBuilder.

Parameters

out[out] the finalized ArrayData object

Returns

Status

inline virtual std::shared_ptr<DataType> type() const override

Return the type of the built Array.

class arrow::BooleanBuilder : public arrow::ArrayBuilder

Public Functions

inline virtual Status AppendNulls(int64_t length) final

Append a number of null values to the builder.

inline virtual Status AppendNull() final

Append a null value to the builder.

inline virtual Status AppendEmptyValue() final

Append a non-null value to the builder.

The appended value is an implementation detail, but the corresponding memory slot is guaranteed to be initialized. This method is useful when appending a null value to a parent nested type.

inline virtual Status AppendEmptyValues(int64_t length) final

Append a number of non-null values to the builder.

The appended values are an implementation detail, but the corresponding memory slots are guaranteed to be initialized. This method is useful when appending null values to a parent nested type.

inline Status Append(const bool val)

Scalar append.

inline void UnsafeAppend(const bool val)

Scalar append, without checking for capacity.

Status AppendValues(const uint8_t *values, int64_t length, const uint8_t *valid_bytes = NULLPTR)

Append a sequence of elements in one shot.

Parameters
  • values[in] a contiguous array of bytes (non-zero means true)

  • length[in] the number of values to append

  • valid_bytes[in] an optional sequence of bytes where non-zero indicates a valid (non-null) value

Returns

Status

Status AppendValues(const uint8_t *values, int64_t length, const uint8_t *validity, int64_t offset)

Append a sequence of elements in one shot.

Parameters
  • values[in] a bitmap of values

  • length[in] the number of values to append

  • validity[in] a validity bitmap to copy (may be null)

  • offset[in] an offset into the values and validity bitmaps

Returns

Status

Status AppendValues(const uint8_t *values, int64_t length, const std::vector<bool> &is_valid)

Append a sequence of elements in one shot.

Parameters
  • values[in] a contiguous C array of values

  • length[in] the number of values to append

  • is_valid[in] a std::vector<bool> indicating valid (1) or null (0); must be equal in length to values

Returns

Status

Status AppendValues(const std::vector<uint8_t> &values, const std::vector<bool> &is_valid)

Append a sequence of elements in one shot.

Parameters
  • values[in] a std::vector of bytes

  • is_valid[in] a std::vector<bool> indicating valid (1) or null (0); must be equal in length to values

Returns

Status

Status AppendValues(const std::vector<uint8_t> &values)

Append a sequence of elements in one shot.

Parameters

values[in] a std::vector of bytes

Returns

Status

Status AppendValues(const std::vector<bool> &values, const std::vector<bool> &is_valid)

Append a sequence of elements in one shot.

Parameters
  • values[in] a std::vector<bool> indicating true (1) or false (0)

  • is_valid[in] a std::vector<bool> indicating valid (1) or null (0); must be equal in length to values

Returns

Status

Status AppendValues(const std::vector<bool> &values)

Append a sequence of elements in one shot.

Parameters

values[in] a std::vector<bool> indicating true (1) or false (0)

Returns

Status

template<typename ValuesIter>
inline Status AppendValues(ValuesIter values_begin, ValuesIter values_end)

Append a sequence of elements in one shot.

Parameters
  • values_begin[in] InputIterator to the beginning of the values

  • values_end[in] InputIterator pointing to the end of the values

Returns

Status

template<typename ValuesIter, typename ValidIter>
inline enable_if_t<!std::is_pointer<ValidIter>::value, Status> AppendValues(ValuesIter values_begin, ValuesIter values_end, ValidIter valid_begin)

Append a sequence of elements in one shot, with a specified nullmap.

Parameters
  • values_begin[in] InputIterator to the beginning of the values

  • values_end[in] InputIterator pointing to the end of the values

  • valid_begin[in] InputIterator with elements indicating valid (1) or null (0) values

Returns

Status

inline virtual Status AppendArraySlice(const ArrayData &array, int64_t offset, int64_t length) override

Append a range of values from an array.

The given array must be the same type as the builder.

virtual Status FinishInternal(std::shared_ptr<ArrayData> *out) override

Return result of builder as an internal generic ArrayData object.

Resets the builder, except for DictionaryBuilder.

Parameters

out[out] the finalized ArrayData object

Returns

Status

virtual void Reset() override

Reset the builder.

virtual Status Resize(int64_t capacity) override

Ensure that enough memory has been allocated to fit the indicated number of total elements in the builder, including any that have already been appended.

Does not account for reallocations that may be due to variable size data, like binary values. To make space for incremental appends, use Reserve instead.

Parameters

capacity[in] the minimum number of total array values to accommodate. Must be greater than the current capacity.

Returns

Status

inline virtual std::shared_ptr<DataType> type() const override

Return the type of the built Array.

Temporal

Binary-like

Nested

Dictionary-encoded

template<typename T>
class arrow::DictionaryBuilder : public arrow::internal::DictionaryBuilderBase<AdaptiveIntBuilder, T>

A DictionaryArray builder that uses AdaptiveIntBuilder to return the smallest index size that can accommodate the dictionary indices.

Public Functions

inline Status AppendIndices(const int64_t *values, int64_t length, const uint8_t *valid_bytes = NULLPTR)

Append dictionary indices directly, without modifying the memo table.

NOTE: Experimental API