New in version 1.0.0.
pandas does not allow indexing with NA values. Attempting to do so will raise a ValueError.
    In [1]: s = pd.Series([1, 2, 3])

    In [2]: mask = pd.array([True, False, pd.NA], dtype="boolean")

    In [3]: s[mask]
    ---------------------------------------------------------------------------
    ValueError                                Traceback (most recent call last)
    <ipython-input-3-b9628bf46992> in <module>
    ----> 1 s[mask]

    ~/scipy/pandas/pandas/core/series.py in __getitem__(self, key)
        836             key = list(key)
        837
    --> 838         if com.is_bool_indexer(key):
        839             key = check_bool_indexer(self.index, key)
        840

    ~/scipy/pandas/pandas/core/common.py in is_bool_indexer(key)
        142         if is_extension_array_dtype(key.dtype):
        143             if np.any(key.isna()):
    --> 144                 raise ValueError(na_msg)
        145             return True
        146     elif isinstance(key, list):

    ValueError: cannot mask with array containing NA / NaN values
The missing values will need to be explicitly filled with True or False prior to using the array as a mask.
    In [4]: s[mask.fillna(False)]
    Out[4]:
    0    1
    dtype: int64
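Which fill value to use depends on what a missing entry should mean for the selection. The following is a minimal sketch, reusing the `s` and `mask` from the example above; the variable names `exclude_unknown` and `include_unknown` are illustrative only.

```python
import pandas as pd

s = pd.Series([1, 2, 3])
mask = pd.array([True, False, pd.NA], dtype="boolean")

# Treat "unknown" as "do not select": the row at the NA position is dropped.
exclude_unknown = s[mask.fillna(False)]

# Treat "unknown" as "select": the row at the NA position is kept.
include_unknown = s[mask.fillna(True)]

print(exclude_unknown.tolist())  # [1]
print(include_unknown.tolist())  # [1, 3]
```

Filling explicitly makes the intended handling of missing entries visible at the point where the mask is applied.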
arrays.BooleanArray implements Kleene logic (sometimes called three-valued logic) for logical operations like & (and), | (or) and ^ (exclusive-or).
This table demonstrates the results for every combination. These operations are symmetrical, so flipping the left- and right-hand side makes no difference in the result.
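===============  ======
Expression       Result
===============  ======
True & True      True
True & False     False
True & NA        NA
False & False    False
False & NA       False
NA & NA          NA
True | True      True
True | False     True
True | NA        True
False | False    False
False | NA       NA
NA | NA          NA
True ^ True      False
True ^ False     True
True ^ NA        NA
False ^ False    False
False ^ NA       NA
NA ^ NA          NA
===============  ======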
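The combinations listed here can also be generated programmatically. The sketch below uses the scalar `pd.NA`, which follows the same three-valued rules in logical operations; the helper `label` and the names `OPERATORS`, `VALUES` and `rows` are illustrative and not part of pandas.

```python
import operator
from itertools import combinations_with_replacement

import pandas as pd

OPERATORS = [("&", operator.and_), ("|", operator.or_), ("^", operator.xor)]
VALUES = [True, False, pd.NA]


def label(value):
    """Render pd.NA as 'NA' and booleans as 'True' / 'False'."""
    return "NA" if value is pd.NA else str(value)


# Build (expression, result) pairs: six unordered combinations per operator.
rows = [
    (f"{label(left)} {symbol} {label(right)}", label(op(left, right)))
    for symbol, op in OPERATORS
    for left, right in combinations_with_replacement(VALUES, 2)
]

for expression, result in rows:
    print(f"{expression:<15} {result}")
```

Because the operations are symmetrical, only the unordered pairs are enumerated, giving 18 rows in total.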
When an NA is present in an operation, the output value is NA only if the result cannot be determined solely based on the other input. For example, True | NA is True, because both True | True and True | False are True. In that case, we don’t actually need to consider the value of the NA.
On the other hand, True & NA is NA. The result depends on whether the NA really is True or False, since True & True is True, but True & False is False, so we can’t determine the output.
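The reasoning above can be expressed as a small pure-Python sketch: substitute both possible values for each missing input and return a definite result only if every substitution agrees. This is an illustration of the rule, not how pandas implements it; `NA` (represented here by `None`) and the function `kleene` are hypothetical names for this example.

```python
import operator
from itertools import product

NA = None  # stand-in for a missing value in this sketch (assumption)


def kleene(op, left, right):
    """Return op(left, right), or NA if the unknown inputs change the outcome."""
    left_options = [True, False] if left is NA else [left]
    right_options = [True, False] if right is NA else [right]
    # Collect every outcome the operation could have once the NAs are resolved.
    outcomes = {op(a, b) for a, b in product(left_options, right_options)}
    return outcomes.pop() if len(outcomes) == 1 else NA


print(kleene(operator.or_, True, NA))   # True: both substitutions give True
print(kleene(operator.and_, True, NA))  # None: the substitutions disagree
```

Each NA is resolved independently, which is why `NA ^ NA` is also NA under this rule.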
This differs from how np.nan behaves in logical operations. pandas treats np.nan as always False in the output.
In or (|):
    In [5]: pd.Series([True, False, np.nan], dtype="object") | True
    Out[5]:
    0     True
    1     True
    2    False
    dtype: bool

    In [6]: pd.Series([True, False, np.nan], dtype="boolean") | True
    Out[6]:
    0    True
    1    True
    2    True
    dtype: boolean
In and (&):
    In [7]: pd.Series([True, False, np.nan], dtype="object") & True
    Out[7]:
    0     True
    1    False
    2    False
    dtype: bool

    In [8]: pd.Series([True, False, np.nan], dtype="boolean") & True
    Out[8]:
    0     True
    1    False
    2     <NA>
    dtype: boolean