pyarrow.Schema

class pyarrow.Schema

Bases: pyarrow.lib._Weakrefable

__init__(*args, **kwargs)

Initialize self. See help(type(self)) for accurate signature.

Methods

__init__(*args, **kwargs)

Initialize self.

add_metadata(self, metadata)

append(self, Field field)

Append a field at the end of the schema.

empty_table(self)

Provide an empty table according to the schema.

equals(self, Schema other, …)

Test if this schema is equal to the other

field(self, i)

Select a field by its column name or numeric index.

field_by_name(self, name)

Access a field by its name rather than the column index.

from_pandas(type cls, df[, preserve_index])

Returns implied schema from dataframe

get_all_field_indices(self, name)

Return sorted list of indices for fields with the given name

get_field_index(self, name)

Return index of field with given unique name.

insert(self, int i, Field field)

Add a field at position i to the schema.

remove(self, int i)

Remove the field at index i from the schema.

remove_metadata(self)

Create new schema without metadata, if any

serialize(self[, memory_pool])

Write Schema to Buffer as encapsulated IPC message

set(self, int i, Field field)

Replace a field at position i in the schema.

to_string(self[, truncate_metadata, …])

Return human-readable representation of Schema

with_metadata(self, metadata)

Add metadata as dict of string keys and values to Schema

Attributes

metadata

names

The schema’s field names.

pandas_metadata

Return deserialized-from-JSON pandas metadata field (if it exists)

types

The schema’s field types.

add_metadata(self, metadata)

append(self, Field field)

Append a field at the end of the schema.

Unlike Python’s list.append(), this method returns a new object and leaves the original Schema unmodified.

Parameters

field (Field) –

Returns

schema (Schema) – New object with appended field.

empty_table(self)

Provide an empty table according to the schema.

Returns

table (pyarrow.Table)

equals(self, Schema other, bool check_metadata=False)

Test if this schema is equal to the other

Parameters
  • other (pyarrow.Schema) –

  • check_metadata (bool, default False) – Key/value metadata must be equal too

Returns

is_equal (bool)

field(self, i)

Select a field by its column name or numeric index.

Parameters

i (int or str) – Numeric index or column name of the field to select.

Returns

pyarrow.Field

field_by_name(self, name)

Access a field by its name rather than the column index.

Parameters

name (str) –

Returns

field (pyarrow.Field)

from_pandas(type cls, df, preserve_index=None)

Returns implied schema from dataframe

Parameters
  • df (pandas.DataFrame) –

  • preserve_index (bool, optional) – Whether to store the index as an additional column (or columns, for MultiIndex) in the resulting schema. The default of None stores the index as a column, except for a RangeIndex, which is stored as metadata only. Use preserve_index=True to force it to be stored as a column.

Returns

pyarrow.Schema

Examples

>>> import pandas as pd
>>> import pyarrow as pa
>>> df = pd.DataFrame({
...     'int': [1, 2],
...     'str': ['a', 'b']
... })
>>> pa.Schema.from_pandas(df)
int: int64
str: string
__index_level_0__: int64

get_all_field_indices(self, name)

Return sorted list of indices for fields with the given name

get_field_index(self, name)

Return the index of the field with the given unique name. Returns -1 if the name is not found or is duplicated.

insert(self, int i, Field field)

Add a field at position i to the schema.

Parameters
  • i (int) –

  • field (Field) –

Returns

schema (Schema)

metadata

names

The schema’s field names.

Returns

list of str

pandas_metadata

Return deserialized-from-JSON pandas metadata field (if it exists)

remove(self, int i)

Remove the field at index i from the schema.

Parameters

i (int) –

Returns

schema (Schema)

remove_metadata(self)

Create new schema without metadata, if any

Returns

schema (pyarrow.Schema)

serialize(self, memory_pool=None)

Write Schema to Buffer as encapsulated IPC message

Parameters

memory_pool (MemoryPool, default None) – Uses default memory pool if not specified

Returns

serialized (Buffer)

set(self, int i, Field field)

Replace a field at position i in the schema.

Parameters
  • i (int) –

  • field (Field) –

Returns

schema (Schema)

to_string(self, truncate_metadata=True, show_field_metadata=True, show_schema_metadata=True)

Return human-readable representation of Schema

Parameters
  • truncate_metadata (boolean, default True) – Limit metadata key/value display to a single line of ~80 characters or less

  • show_field_metadata (boolean, default True) – Display Field-level KeyValueMetadata

  • show_schema_metadata (boolean, default True) – Display Schema-level KeyValueMetadata

Returns

str (the formatted output)

types

The schema’s field types.

Returns

list of DataType

with_metadata(self, metadata)

Add metadata as dict of string keys and values to Schema

Parameters

metadata (dict) – Keys and values must be string-like / coercible to bytes

Returns

schema (pyarrow.Schema)