Nullable Boolean Data Type

New in version 1.0.0.

Indexing with NA values

pandas does not allow indexing with NA values. Attempting to do so will raise a ValueError.

In [1]: s = pd.Series([1, 2, 3])

In [2]: mask = pd.array([True, False, pd.NA], dtype="boolean")

In [3]: s[mask]
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-3-b9628bf46992> in <module>
----> 1 s[mask]

~/scipy/pandas/pandas/core/series.py in __getitem__(self, key)
    836             key = list(key)
    837 
--> 838         if com.is_bool_indexer(key):
    839             key = check_bool_indexer(self.index, key)
    840 

~/scipy/pandas/pandas/core/common.py in is_bool_indexer(key)
    142             if is_extension_array_dtype(key.dtype):
    143                 if np.any(key.isna()):
--> 144                     raise ValueError(na_msg)
    145             return True
    146     elif isinstance(key, list):

ValueError: cannot mask with array containing NA / NaN values

The missing values will need to be explicitly filled with True or False prior to using the array as a mask.

In [4]: s[mask.fillna(False)]
Out[4]: 
0    1
dtype: int64
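The fill value decides whether positions with a missing mask entry are kept or dropped. The following is a minimal sketch of both choices, reusing the s and mask objects from the session above; the variable names dropped_na and kept_na are illustrative and not part of pandas.

```python
import pandas as pd

s = pd.Series([1, 2, 3])
mask = pd.array([True, False, pd.NA], dtype="boolean")

# Filling NA with False excludes the position whose mask entry is missing.
dropped_na = s[mask.fillna(False)].tolist()

# Filling NA with True includes that position instead.
kept_na = s[mask.fillna(True)].tolist()

print(dropped_na)  # [1]
print(kept_na)     # [1, 3]
```

Which fill value is appropriate depends on what a missing mask entry should mean for the selection at hand.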

Kleene Logical Operations

arrays.BooleanArray implements Kleene Logic (sometimes called three-value logic) for logical operations like & (and), | (or) and ^ (exclusive-or).

This table demonstrates the results for every combination. These operations are symmetrical, so flipping the left- and right-hand side makes no difference in the result.

Expression	Result
`True & True`	`True`
`True & False`	`False`
`True & NA`	`NA`
`False & False`	`False`
`False & NA`	`False`
`NA & NA`	`NA`
`True | True`	`True`
`True | False`	`True`
`True | NA`	`True`
`False | False`	`False`
`False | NA`	`NA`
`NA | NA`	`NA`
`True ^ True`	`False`
`True ^ False`	`True`
`True ^ NA`	`NA`
`False ^ False`	`False`
`False ^ NA`	`NA`
`NA ^ NA`	`NA`

When an NA is present in an operation, the output value is NA only if the result cannot be determined solely based on the other input. For example, True | NA is True, because both True | True and True | False are True. In that case, we don’t actually need to consider the value of the NA.

On the other hand, True & NA is NA. The result depends on whether the NA really is True or False, since True & True is True, but True & False is False, so we can’t determine the output.
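The rules described above can be checked directly on boolean arrays. This is a small sketch that pairs up the six combinations from the table elementwise; the helper as_list is a hypothetical convenience defined here, not a pandas function, and it maps NA to None so the results are easy to compare.

```python
import pandas as pd

# Operands covering the six unordered pairs of {True, False, NA},
# in the same order as the rows of the table.
left = pd.array([True, True, True, False, False, pd.NA], dtype="boolean")
right = pd.array([True, False, pd.NA, False, pd.NA, pd.NA], dtype="boolean")


def as_list(arr):
    """Convert a boolean array to a plain list, using None for NA."""
    return [None if pd.isna(x) else bool(x) for x in arr]


and_result = as_list(left & right)
or_result = as_list(left | right)
xor_result = as_list(left ^ right)

print(and_result)  # [True, False, None, False, False, None]
print(or_result)   # [True, True, True, False, None, None]
print(xor_result)  # [False, True, None, False, None, None]

# Swapping the operands gives the same results.
assert as_list(right & left) == and_result
assert as_list(right | left) == or_result
assert as_list(right ^ left) == xor_result
```

Note that ^ yields NA whenever either operand is NA, since neither possible value of the missing operand can fix the result on its own.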

This differs from how np.nan behaves in logical operations. pandas treats np.nan as always False in the output.

With or (|):

In [5]: pd.Series([True, False, np.nan], dtype="object") | True
Out[5]: 
0     True
1     True
2    False
dtype: bool

In [6]: pd.Series([True, False, np.nan], dtype="boolean") | True
Out[6]: 
0    True
1    True
2    True
dtype: boolean

With and (&):

In [7]: pd.Series([True, False, np.nan], dtype="object") & True
Out[7]: 
0     True
1    False
2    False
dtype: bool

In [8]: pd.Series([True, False, np.nan], dtype="boolean") & True
Out[8]: 
0     True
1    False
2     <NA>
dtype: boolean