This is a major release from 0.22.0 and includes a number of API changes, deprecations, new features, enhancements, and performance improvements along with a large number of bug fixes. We recommend that all users upgrade to this version.
Highlights include:
Round-trippable JSON format with ‘table’ orient.
Instantiation from dicts respects order for Python 3.6+.
Dependent column arguments for assign.
Merging / sorting on a combination of columns and index levels.
Extending pandas with custom types.
Excluding unobserved categories from groupby.
Changes to make output shape of DataFrame.apply consistent.
Check the API Changes and deprecations before updating.
Warning
Starting January 1, 2019, pandas feature releases will support Python 3 only. See Dropping Python 2.7 for more.
What’s new in v0.23.0
New features
JSON read/write round-trippable with orient='table'
.assign() accepts dependent arguments
Merging on a combination of columns and index levels
Sorting by a combination of columns and index levels
Extending pandas with custom types (experimental)
New observed keyword for excluding unobserved categories in groupby
Rolling/Expanding.apply() accepts raw=False to pass a Series to the function
DataFrame.interpolate has gained the limit_area kwarg
get_dummies now supports dtype argument
Timedelta mod method
.rank() handles inf values when NaN are present
Series.str.cat has gained the join kwarg
DataFrame.astype performs column-wise conversion to Categorical
Other enhancements
Backwards incompatible API changes
Dependencies have increased minimum versions
Instantiation from dicts preserves dict insertion order for python 3.6+
Deprecate Panel
pandas.core.common removals
Changes to make output of DataFrame.apply consistent
Concatenation will no longer sort
Build changes
Index division by zero fills correctly
Extraction of matching patterns from strings
Default value for the ordered parameter of CategoricalDtype
Better pretty-printing of DataFrames in a terminal
Datetimelike API changes
Other API changes
Deprecations
Removal of prior version deprecations/changes
Performance improvements
Documentation changes
Bug fixes
Datetimelike
Timedelta
Timezones
Offsets
Numeric
Strings
Indexing
MultiIndex
I/O
Plotting
Groupby/resample/rolling
Sparse
Reshaping
Other
Contributors
A DataFrame can now be written to and subsequently read back via JSON while preserving metadata through usage of the orient='table' argument (see GH18912 and GH9146). Previously, none of the available orient values guaranteed the preservation of dtypes and index names, amongst other metadata.
In [1]: df = pd.DataFrame({'foo': [1, 2, 3, 4],
   ...:                    'bar': ['a', 'b', 'c', 'd'],
   ...:                    'baz': pd.date_range('2018-01-01', freq='d', periods=4),
   ...:                    'qux': pd.Categorical(['a', 'b', 'c', 'c'])},
   ...:                   index=pd.Index(range(4), name='idx'))
   ...: 

In [2]: df
Out[2]: 
     foo bar        baz qux
idx                        
0      1   a 2018-01-01   a
1      2   b 2018-01-02   b
2      3   c 2018-01-03   c
3      4   d 2018-01-04   c

[4 rows x 4 columns]

In [3]: df.dtypes
Out[3]: 
foo             int64
bar            object
baz    datetime64[ns]
qux          category
Length: 4, dtype: object

In [4]: df.to_json('test.json', orient='table')

In [5]: new_df = pd.read_json('test.json', orient='table')

In [6]: new_df
Out[6]: 
     foo bar        baz qux
idx                        
0      1   a 2018-01-01   a
1      2   b 2018-01-02   b
2      3   c 2018-01-03   c
3      4   d 2018-01-04   c

[4 rows x 4 columns]

In [7]: new_df.dtypes
Out[7]: 
foo             int64
bar            object
baz    datetime64[ns]
qux          category
Length: 4, dtype: object
Please note that the string 'index' is not supported as an index name with the round-trip format, as it is used by default in write_json to indicate a missing index name.
In [8]: df.index.name = 'index'

In [9]: df.to_json('test.json', orient='table')

In [10]: new_df = pd.read_json('test.json', orient='table')

In [11]: new_df
Out[11]: 
   foo bar        baz qux
0    1   a 2018-01-01   a
1    2   b 2018-01-02   b
2    3   c 2018-01-03   c
3    4   d 2018-01-04   c

[4 rows x 4 columns]

In [12]: new_df.dtypes
Out[12]: 
foo             int64
bar            object
baz    datetime64[ns]
qux          category
Length: 4, dtype: object
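The round trip can also be checked without touching the file system. The sketch below is an illustration, not part of the release itself: it serializes the same frame to an in-memory string and reads it back, wrapping the string in io.StringIO (an assumption made so the example also runs cleanly on newer pandas versions, which discourage passing literal JSON to read_json).

```python
from io import StringIO

import pandas as pd

# Same shape of frame as in the session above: int, string, datetime and
# categorical columns plus a named index.
df = pd.DataFrame(
    {
        "foo": [1, 2, 3, 4],
        "bar": ["a", "b", "c", "d"],
        "baz": pd.date_range("2018-01-01", freq="D", periods=4),
        "qux": pd.Categorical(["a", "b", "c", "c"]),
    },
    index=pd.Index(range(4), name="idx"),
)

# With no path argument, to_json returns the JSON document as a string.
payload = df.to_json(orient="table")

# The embedded Table Schema lets read_json restore dtypes and the index name.
new_df = pd.read_json(StringIO(payload), orient="table")

print(new_df.dtypes)
print(new_df.index.name)
```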
DataFrame.assign() now accepts dependent keyword arguments for Python 3.6 and later (see also PEP 468). Later keyword arguments may now refer to earlier ones if the argument is a callable. See the DataFrame.assign() documentation for details (GH14207).
In [13]: df = pd.DataFrame({'A': [1, 2, 3]})

In [14]: df
Out[14]: 
   A
0  1
1  2
2  3

[3 rows x 1 columns]

In [15]: df.assign(B=df.A, C=lambda x: x['A'] + x['B'])
Out[15]: 
   A  B  C
0  1  1  2
1  2  2  4
2  3  3  6

[3 rows x 3 columns]
This may subtly change the behavior of your code when you’re using .assign() to update an existing column. Previously, callables referring to other variables being updated would get the “old” values.
Previous behavior:
In [2]: df = pd.DataFrame({"A": [1, 2, 3]})

In [3]: df.assign(A=lambda df: df.A + 1, C=lambda df: df.A * -1)
Out[3]: 
   A  C
0  2 -1
1  3 -2
2  4 -3
New behavior:
In [16]: df.assign(A=df.A + 1, C=lambda df: df.A * -1)
Out[16]: 
   A  C
0  2 -2
1  3 -3
2  4 -4

[3 rows x 2 columns]
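A minimal sketch of the left-to-right evaluation order, assuming Python 3.6+ keyword ordering: the callable for C sees the B column created earlier in the same call, and a callable that follows an update to A sees the updated A. The column values are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3]})

# B is created first, so the callable for C can already use it.
out = df.assign(B=lambda x: x["A"] * 10, C=lambda x: x["A"] + x["B"])

# A is updated first, so C is computed from the *new* A (2, 3, 4).
updated = df.assign(A=lambda x: x["A"] + 1, C=lambda x: x["A"] * -1)

print(out)
print(updated)
```

assign() returns a new object, so df itself keeps only its original A column.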
Strings passed to DataFrame.merge() as the on, left_on, and right_on parameters may now refer to either column names or index level names. This enables merging DataFrame instances on a combination of index levels and columns without resetting indexes. See the Merge on columns and levels documentation section. (GH14355)
In [17]: left_index = pd.Index(['K0', 'K0', 'K1', 'K2'], name='key1')

In [18]: left = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3'],
   ....:                      'B': ['B0', 'B1', 'B2', 'B3'],
   ....:                      'key2': ['K0', 'K1', 'K0', 'K1']},
   ....:                     index=left_index)
   ....: 

In [19]: right_index = pd.Index(['K0', 'K1', 'K2', 'K2'], name='key1')

In [20]: right = pd.DataFrame({'C': ['C0', 'C1', 'C2', 'C3'],
   ....:                       'D': ['D0', 'D1', 'D2', 'D3'],
   ....:                       'key2': ['K0', 'K0', 'K0', 'K1']},
   ....:                      index=right_index)
   ....: 

In [21]: left.merge(right, on=['key1', 'key2'])
Out[21]: 
       A   B key2   C   D
key1                     
K0    A0  B0   K0  C0  D0
K1    A2  B2   K0  C1  D1
K2    A3  B3   K1  C3  D3

[3 rows x 5 columns]
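For comparison, the sketch below (trimmed to fewer columns than the session above) contrasts the new mixed level/column merge with the reset_index() workaround it replaces. The manual route is shown only as an illustration of what "without resetting indexes" saves you.

```python
import pandas as pd

left = pd.DataFrame(
    {"A": ["A0", "A1", "A2", "A3"], "key2": ["K0", "K1", "K0", "K1"]},
    index=pd.Index(["K0", "K0", "K1", "K2"], name="key1"),
)
right = pd.DataFrame(
    {"C": ["C0", "C1", "C2", "C3"], "key2": ["K0", "K0", "K0", "K1"]},
    index=pd.Index(["K0", "K1", "K2", "K2"], name="key1"),
)

# 'key1' is an index level and 'key2' is a column; both are accepted by on=.
merged = left.merge(right, on=["key1", "key2"])

# Workaround: turn the level into a column, merge, then restore the index.
manual = (
    left.reset_index()
    .merge(right.reset_index(), on=["key1", "key2"])
    .set_index("key1")
)

print(merged)
```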
Strings passed to DataFrame.sort_values() as the by parameter may now refer to either column names or index level names. This enables sorting DataFrame instances by a combination of index levels and columns without resetting indexes. See the Sorting by Indexes and Values documentation section. (GH14353)
# Build MultiIndex
In [22]: idx = pd.MultiIndex.from_tuples([('a', 1), ('a', 2), ('a', 2),
   ....:                                  ('b', 2), ('b', 1), ('b', 1)])
   ....: 

In [23]: idx.names = ['first', 'second']

# Build DataFrame
In [24]: df_multi = pd.DataFrame({'A': np.arange(6, 0, -1)},
   ....:                         index=idx)
   ....: 

In [25]: df_multi
Out[25]: 
              A
first second   
a     1       6
      2       5
      2       4
b     2       3
      1       2
      1       1

[6 rows x 1 columns]

# Sort by 'second' (index) and 'A' (column)
In [26]: df_multi.sort_values(by=['second', 'A'])
Out[26]: 
              A
first second   
b     1       1
      1       2
a     1       6
b     2       3
a     2       4
      2       5

[6 rows x 1 columns]
Pandas now supports storing array-like objects that aren’t necessarily 1-D NumPy arrays as columns in a DataFrame or values in a Series. This allows third-party libraries to implement extensions to NumPy’s types, similar to how pandas implemented categoricals, datetimes with timezones, periods, and intervals.
As a demonstration, we’ll use cyberpandas, which provides an IPArray type for storing IP addresses.
In [1]: from cyberpandas import IPArray

In [2]: values = IPArray([
   ...:     0,
   ...:     3232235777,
   ...:     42540766452641154071740215577757643572
   ...: ])
   ...: 
IPArray isn’t a normal 1-D NumPy array, but because it’s a pandas ExtensionArray, it can be stored properly inside pandas’ containers.
In [3]: ser = pd.Series(values)

In [4]: ser
Out[4]: 
0                         0.0.0.0
1                     192.168.1.1
2    2001:db8:85a3::8a2e:370:7334
dtype: ip
Notice that the dtype is ip. The missing value semantics of the underlying array are respected:
In [5]: ser.isna()
Out[5]: 
0     True
1    False
2    False
dtype: bool
For more, see the extension types documentation. If you build an extension array, publicize it on our ecosystem page.
Grouping by a categorical includes the unobserved categories in the output. When grouping by multiple categorical columns, this means you get the cartesian product of all the categories, including combinations where there are no observations, which can result in a large number of groups. We have added a keyword observed to control this behavior; it defaults to observed=False for backward compatibility. (GH14942, GH8138, GH15217, GH17594, GH8669, GH20583, GH20902)
In [27]: cat1 = pd.Categorical(["a", "a", "b", "b"],
   ....:                       categories=["a", "b", "z"], ordered=True)
   ....: 

In [28]: cat2 = pd.Categorical(["c", "d", "c", "d"],
   ....:                       categories=["c", "d", "y"], ordered=True)
   ....: 

In [29]: df = pd.DataFrame({"A": cat1, "B": cat2, "values": [1, 2, 3, 4]})

In [30]: df['C'] = ['foo', 'bar'] * 2

In [31]: df
Out[31]: 
   A  B  values    C
0  a  c       1  foo
1  a  d       2  bar
2  b  c       3  foo
3  b  d       4  bar

[4 rows x 4 columns]
To show all values, the previous behavior:
In [32]: df.groupby(['A', 'B', 'C'], observed=False).count()
Out[32]: 
            values
A B C             
a c bar        NaN
    foo        1.0
  d bar        1.0
    foo        NaN
  y bar        NaN
...            ...
z c foo        NaN
  d bar        NaN
    foo        NaN
  y bar        NaN
    foo        NaN

[18 rows x 1 columns]
To show only observed values:
In [33]: df.groupby(['A', 'B', 'C'], observed=True).count()
Out[33]: 
            values
A B C             
a c foo          1
  d bar          1
b c foo          1
  d bar          1

[4 rows x 1 columns]
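To see the size of the cartesian product directly, here is a small sketch grouping on the two categorical columns only (the C column is left out for brevity). Passing observed explicitly is an assumption in favor of portability, since later pandas versions change the default.

```python
import pandas as pd

cat1 = pd.Categorical(["a", "a", "b", "b"], categories=["a", "b", "z"])
cat2 = pd.Categorical(["c", "d", "c", "d"], categories=["c", "d", "y"])
df = pd.DataFrame({"A": cat1, "B": cat2, "values": [1, 2, 3, 4]})

# observed=False: every combination of categories, 3 x 3 = 9 groups;
# unobserved groups sum to 0.
all_groups = df.groupby(["A", "B"], observed=False)["values"].sum()

# observed=True: only the 4 (A, B) pairs that actually occur.
seen_groups = df.groupby(["A", "B"], observed=True)["values"].sum()

print(len(all_groups), len(seen_groups))
```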
For pivoting operations, this behavior is already controlled by the dropna keyword:
In [34]: cat1 = pd.Categorical(["a", "a", "b", "b"],
   ....:                       categories=["a", "b", "z"], ordered=True)
   ....: 

In [35]: cat2 = pd.Categorical(["c", "d", "c", "d"],
   ....:                       categories=["c", "d", "y"], ordered=True)
   ....: 

In [36]: df = pd.DataFrame({"A": cat1, "B": cat2, "values": [1, 2, 3, 4]})

In [37]: df
Out[37]: 
   A  B  values
0  a  c       1
1  a  d       2
2  b  c       3
3  b  d       4

[4 rows x 3 columns]
In [38]: pd.pivot_table(df, values='values', index=['A', 'B'],
   ....:                dropna=True)
   ....: 
Out[38]: 
     values
A B        
a c       1
  d       2
b c       3
  d       4

[4 rows x 1 columns]

In [39]: pd.pivot_table(df, values='values', index=['A', 'B'],
   ....:                dropna=False)
   ....: 
Out[39]: 
     values
A B        
a c     1.0
  d     2.0
  y     NaN
b c     3.0
  d     4.0
  y     NaN
z c     NaN
  d     NaN
  y     NaN

[9 rows x 1 columns]
Series.rolling().apply(), DataFrame.rolling().apply(), Series.expanding().apply(), and DataFrame.expanding().apply() have gained a raw=None parameter. This is similar to DataFrame.apply(). If raw=True, an np.ndarray is passed to the applied function; if raw=False, a Series is passed. The default is None, which preserves backward compatibility by behaving like True and sending an np.ndarray. In a future version the default will be changed to False, sending a Series. (GH5071, GH20584)
In [40]: s = pd.Series(np.arange(5), np.arange(5) + 1)

In [41]: s
Out[41]: 
1    0
2    1
3    2
4    3
5    4
Length: 5, dtype: int64
Pass a Series:
In [42]: s.rolling(2, min_periods=1).apply(lambda x: x.iloc[-1], raw=False)
Out[42]: 
1    0.0
2    1.0
3    2.0
4    3.0
5    4.0
Length: 5, dtype: float64
Mimic the original behavior of passing an ndarray:

In [43]: s.rolling(2, min_periods=1).apply(lambda x: x[-1], raw=True)
Out[43]: 
1    0.0
2    1.0
3    2.0
4    3.0
5    4.0
Length: 5, dtype: float64
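A quick way to confirm what the applied function receives is to record its argument type. The helper last_value below is hypothetical, written only for this illustration; it handles both shapes so the same function works with raw=True and raw=False.

```python
import numpy as np
import pandas as pd

s = pd.Series(np.arange(5), index=np.arange(5) + 1)
seen_types = []


def last_value(window):
    # Hypothetical helper: remember what rolling().apply() passed in,
    # then return the last element of the window.
    seen_types.append(type(window))
    if isinstance(window, pd.Series):
        return window.iloc[-1]  # positional access on a labelled Series
    return window[-1]  # plain ndarray indexing


as_series = s.rolling(2, min_periods=1).apply(last_value, raw=False)
as_array = s.rolling(2, min_periods=1).apply(last_value, raw=True)

print(set(seen_types))
```

Using .iloc on the Series path matters here: the index starts at 1, so label-based window[-1] would not mean "last element".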
DataFrame.interpolate() has gained a limit_area parameter to allow further control of which NaNs are replaced. Use limit_area='inside' to fill only NaNs surrounded by valid values, or use limit_area='outside' to fill only NaNs outside the existing valid values while preserving those inside. (GH16284) See the full interpolation documentation for more.
In [44]: ser = pd.Series([np.nan, np.nan, 5, np.nan, np.nan,
   ....:                  np.nan, 13, np.nan, np.nan])
   ....: 

In [45]: ser
Out[45]: 
0     NaN
1     NaN
2     5.0
3     NaN
4     NaN
5     NaN
6    13.0
7     NaN
8     NaN
Length: 9, dtype: float64
Fill one consecutive inside value in both directions
In [46]: ser.interpolate(limit_direction='both', limit_area='inside', limit=1)
Out[46]: 
0     NaN
1     NaN
2     5.0
3     7.0
4     NaN
5    11.0
6    13.0
7     NaN
8     NaN
Length: 9, dtype: float64
Fill all consecutive outside values backward
In [47]: ser.interpolate(limit_direction='backward', limit_area='outside')
Out[47]: 
0     5.0
1     5.0
2     5.0
3     NaN
4     NaN
5     NaN
6    13.0
7     NaN
8     NaN
Length: 9, dtype: float64
Fill all consecutive outside values in both directions
In [48]: ser.interpolate(limit_direction='both', limit_area='outside')
Out[48]: 
0     5.0
1     5.0
2     5.0
3     NaN
4     NaN
5     NaN
6    13.0
7    13.0
8    13.0
Length: 9, dtype: float64
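The two modes are complementary, which a shorter series makes easy to see. In this sketch (illustrative values), 'inside' fills only the interior gap and 'outside' fills only the leading and trailing gaps, copying the nearest valid value outward.

```python
import numpy as np
import pandas as pd

ser = pd.Series([np.nan, 1.0, np.nan, 3.0, np.nan])

# 'inside': only the NaN bracketed by valid values is filled (linearly).
inside = ser.interpolate(limit_area="inside")

# 'outside': only leading/trailing NaNs are filled; the interior gap stays NaN.
outside = ser.interpolate(limit_direction="both", limit_area="outside")

print(inside.tolist())
print(outside.tolist())
```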
get_dummies() now accepts a dtype argument, which specifies a dtype for the new columns. The default remains uint8. (GH18330)
In [49]: df = pd.DataFrame({'a': [1, 2], 'b': [3, 4], 'c': [5, 6]})

In [50]: pd.get_dummies(df, columns=['c']).dtypes
Out[50]: 
a      int64
b      int64
c_5    uint8
c_6    uint8
Length: 4, dtype: object

In [51]: pd.get_dummies(df, columns=['c'], dtype=bool).dtypes
Out[51]: 
a      int64
b      int64
c_5     bool
c_6     bool
Length: 4, dtype: object
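A further sketch with dtype=float, which can be convenient when the indicator columns feed straight into numeric code. The frame is a trimmed, illustrative version of the one above; passing dtype explicitly also keeps the result independent of the default, which later pandas versions changed.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "c": [5, 6]})

# Indicator columns c_5 and c_6 are created as float64 instead of the default.
dummies = pd.get_dummies(df, columns=["c"], dtype=float)

print(dummies.dtypes)
```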
mod (%) and divmod operations are now defined on Timedelta objects when operating with either timedelta-like or with numeric arguments. See the documentation here. (GH19365)
In [52]: td = pd.Timedelta(hours=37)

In [53]: td % pd.Timedelta(minutes=45)
Out[53]: Timedelta('0 days 00:15:00')
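divmod() returns both parts at once. For the values above, 37 hours is 2220 minutes, which is 49 steps of 45 minutes plus a 15-minute remainder; the sketch checks that the pieces recombine to the original duration.

```python
import pandas as pd

td = pd.Timedelta(hours=37)
step = pd.Timedelta(minutes=45)

# (number of whole steps, leftover duration)
quotient, remainder = divmod(td, step)

# 2220 min = 49 * 45 min + 15 min
print(quotient, remainder)
```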
In previous versions, .rank() would assign inf elements NaN as their ranks. Now ranks are calculated properly. (GH6945)
In [54]: s = pd.Series([-np.inf, 0, 1, np.nan, np.inf])

In [55]: s
Out[55]: 
0   -inf
1    0.0
2    1.0
3    NaN
4    inf
Length: 5, dtype: float64
Previous behavior:

In [11]: s.rank()
Out[11]: 
0    1.0
1    2.0
2    3.0
3    NaN
4    NaN
dtype: float64
Current behavior:
In [56]: s.rank()
Out[56]: 
0    1.0
1    2.0
2    3.0
3    NaN
4    4.0
Length: 5, dtype: float64
Furthermore, previously if you ranked inf or -inf values together with NaN values, the calculation would not distinguish NaN from infinity when using the 'top' or 'bottom' na_option.
In [57]: s = pd.Series([np.nan, np.nan, -np.inf, -np.inf])

In [58]: s
Out[58]: 
0    NaN
1    NaN
2   -inf
3   -inf
Length: 4, dtype: float64
Previous behavior:

In [15]: s.rank(na_option='top')
Out[15]: 
0    2.5
1    2.5
2    2.5
3    2.5
dtype: float64
Current behavior:

In [59]: s.rank(na_option='top')
Out[59]: 
0    1.5
1    1.5
2    3.5
3    3.5
Length: 4, dtype: float64
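Both fixes can be checked together in a short script: inf is ranked as the largest value instead of being treated as missing, and with na_option='top' the NaNs occupy the lowest ranks, separate from -inf.

```python
import numpy as np
import pandas as pd

s = pd.Series([-np.inf, 0, 1, np.nan, np.inf])

# inf gets the highest rank; NaN keeps a NaN rank by default.
ranks = s.rank()

# NaNs take ranks 1-2 (averaged to 1.5); the two -inf values take 3-4 (3.5).
t = pd.Series([np.nan, np.nan, -np.inf, -np.inf])
top_ranks = t.rank(na_option="top")

print(ranks.tolist())
print(top_ranks.tolist())
```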
These bugs were squashed:
Bug in DataFrame.rank() and Series.rank() when method='dense' and pct=True, in which percentile ranks were not computed relative to the number of distinct observations (GH15630)
Bug in Series.rank() and DataFrame.rank() when ascending=False failed to return correct ranks for infinity if NaN were present (GH19538)
Bug in DataFrameGroupBy.rank() where ranks were incorrect when both infinity and NaN were present (GH20561)
Previously, Series.str.cat() did not, in contrast to most of pandas, align Series on their index before concatenation (see GH18657). The method has now gained a keyword join to control the manner of alignment; see the examples below and the documentation.
In v0.23, join defaults to None (meaning no alignment), but this default will change to 'left' in a future version of pandas.
In [60]: s = pd.Series(['a', 'b', 'c', 'd'])

In [61]: t = pd.Series(['b', 'd', 'e', 'c'], index=[1, 3, 4, 2])

In [62]: s.str.cat(t)
Out[62]: 
0    NaN
1     bb
2     cc
3     dd
Length: 4, dtype: object

In [63]: s.str.cat(t, join='left', na_rep='-')
Out[63]: 
0    a-
1    bb
2    cc
3    dd
Length: 4, dtype: object
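The join keyword accepts the usual alignment modes. The sketch below compares 'left' and 'inner' on the same two Series; to_dict() is used so the comparison does not depend on the order of the resulting index.

```python
import pandas as pd

s = pd.Series(["a", "b", "c", "d"])
t = pd.Series(["b", "d", "e", "c"], index=[1, 3, 4, 2])

# 'left': keep s's labels 0..3; label 0 has no partner in t, so na_rep fills it.
left = s.str.cat(t, join="left", na_rep="-")

# 'inner': keep only the labels present in both Series (1, 2, 3).
inner = s.str.cat(t, join="inner")

print(left.to_dict())
print(inner.to_dict())
```

Note that t's label 4 ('e') is dropped in both cases, because neither mode keeps labels that exist only in the other Series.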
Furthermore, Series.str.cat() now works for CategoricalIndex as well (previously raised a ValueError; see GH20842).
DataFrame.astype() can now perform column-wise conversion to Categorical by supplying the string 'category' or a CategoricalDtype. Previously, attempting this would raise a NotImplementedError. See the Object creation section of the documentation for more details and examples. (GH12860, GH18099)
Supplying the string 'category' performs column-wise conversion, with only labels appearing in a given column set as categories:
In [64]: df = pd.DataFrame({'A': list('abca'), 'B': list('bccd')})

In [65]: df = df.astype('category')

In [66]: df['A'].dtype
Out[66]: CategoricalDtype(categories=['a', 'b', 'c'], ordered=False)

In [67]: df['B'].dtype
Out[67]: CategoricalDtype(categories=['b', 'c', 'd'], ordered=False)
Supplying a CategoricalDtype will make the categories in each column consistent with the supplied dtype:
In [68]: from pandas.api.types import CategoricalDtype

In [69]: df = pd.DataFrame({'A': list('abca'), 'B': list('bccd')})

In [70]: cdt = CategoricalDtype(categories=list('abcd'), ordered=True)

In [71]: df = df.astype(cdt)

In [72]: df['A'].dtype
Out[72]: CategoricalDtype(categories=['a', 'b', 'c', 'd'], ordered=True)

In [73]: df['B'].dtype
Out[73]: CategoricalDtype(categories=['a', 'b', 'c', 'd'], ordered=True)
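Side by side, the two forms differ only in where the categories come from: per column from the data for 'category', or from the supplied dtype for a CategoricalDtype. A compact check:

```python
import pandas as pd
from pandas.api.types import CategoricalDtype

df = pd.DataFrame({"A": list("abca"), "B": list("bccd")})

# 'category': each column gets only the labels it actually contains.
per_column = df.astype("category")

# A CategoricalDtype: every column shares the same categories and ordering.
shared = df.astype(CategoricalDtype(categories=list("abcd"), ordered=True))

print(per_column["B"].cat.categories.tolist())
print(shared["B"].cat.categories.tolist())
```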
Unary + is now permitted for Series and DataFrame as a numeric operator (GH16073)
Better support for to_excel() output with the xlsxwriter engine. (GH16149)
pandas.tseries.frequencies.to_offset() now accepts leading ‘+’ signs e.g. ‘+1h’. (GH18171)
MultiIndex.unique() now supports the level= argument, to get unique values from a specific index level (GH17896)
pandas.io.formats.style.Styler now has method hide_index() to determine whether the index will be rendered in output (GH14194)
pandas.io.formats.style.Styler now has method hide_columns() to determine whether columns will be hidden in output (GH14194)
Improved wording of ValueError raised in to_datetime() when unit= is passed with a non-convertible value (GH14350)
Series.fillna() now accepts a Series or a dict as a value for a categorical dtype (GH17033)
pandas.read_clipboard() updated to use qtpy, falling back to PyQt5 and then PyQt4, adding compatibility with Python3 and multiple python-qt bindings (GH17722)
Improved wording of ValueError raised in read_csv() when the usecols argument cannot match all columns. (GH17301)
DataFrame.corrwith() now silently drops non-numeric columns when passed a Series. Before, an exception was raised (GH18570).
IntervalIndex now supports time zone aware Interval objects (GH18537, GH18538)
Series() / DataFrame() tab completion also returns identifiers in the first level of a MultiIndex(). (GH16326)
read_excel() has gained the nrows parameter (GH16645)
DataFrame.append() can now preserve the type of the calling DataFrame’s columns in more cases (e.g. if both are CategoricalIndex) (GH18359)
DataFrame.to_json() and Series.to_json() now accept an index argument which allows the user to exclude the index from the JSON output (GH17394)
IntervalIndex.to_tuples() has gained the na_tuple parameter to control whether NA is returned as a tuple of NA, or NA itself (GH18756)
Categorical.rename_categories, CategoricalIndex.rename_categories and Series.cat.rename_categories can now take a callable as their argument (GH18862)
Interval and IntervalIndex have gained a length attribute (GH18789)
Resampler objects now have a functioning pipe method. Previously, calls to pipe were diverted to the mean method (GH17905).
is_scalar() now returns True for DateOffset objects (GH18943).
DataFrame.pivot() now accepts a list for the values= kwarg (GH17160).
Added pandas.api.extensions.register_dataframe_accessor(), pandas.api.extensions.register_series_accessor(), and pandas.api.extensions.register_index_accessor(), which let libraries downstream of pandas register custom accessors like .cat on pandas objects. See Registering Custom Accessors for more (GH14781).
IntervalIndex.astype now supports conversions between subtypes when passed an IntervalDtype (GH19197)
IntervalIndex and its associated constructor methods (from_arrays, from_breaks, from_tuples) have gained a dtype parameter (GH19262)
Added pandas.core.groupby.SeriesGroupBy.is_monotonic_increasing() and pandas.core.groupby.SeriesGroupBy.is_monotonic_decreasing() (GH17015)
For subclassed DataFrames, DataFrame.apply() will now preserve the Series subclass (if defined) when passing the data to the applied function (GH19822)
DataFrame.from_dict() now accepts a columns argument that can be used to specify the column names when orient='index' is used (GH18529)
Added option display.html.use_mathjax so MathJax can be disabled when rendering tables in Jupyter notebooks (GH19856, GH19824)
DataFrame.replace() now supports the method parameter, which can be used to specify the replacement method when to_replace is a scalar, list or tuple and value is None (GH19632)
Timestamp.month_name(), DatetimeIndex.month_name(), and Series.dt.month_name() are now available (GH12805)
Timestamp.day_name() and DatetimeIndex.day_name() are now available to return day names with a specified locale (GH12806)
DataFrame.to_sql() now performs a multi-value insert if the underlying connection supports it, rather than inserting row by row. SQLAlchemy dialects supporting multi-value inserts include: mysql, postgresql, sqlite and any dialect with supports_multivalues_insert. (GH14315, GH8953)
read_html() now accepts a displayed_only keyword argument to control whether or not hidden elements are parsed (True by default) (GH20027)
read_html() now reads all <tbody> elements in a <table>, not just the first. (GH20690)
quantile() and quantile() now accept the interpolation keyword, linear by default (GH20497)
zip compression is supported via compression='zip' in DataFrame.to_pickle(), Series.to_pickle(), DataFrame.to_csv(), Series.to_csv(), DataFrame.to_json(), Series.to_json(). (GH17778)
WeekOfMonth constructor now supports n=0 (GH20517).
DataFrame and Series now support the matrix multiplication (@) operator for Python >= 3.5 (GH10259)
Updated DataFrame.to_gbq() and pandas.read_gbq() signature and documentation to reflect changes from the Pandas-GBQ library version 0.4.0. Adds intersphinx mapping to Pandas-GBQ library. (GH20564)
Added new writer for exporting Stata dta files in version 117, StataWriter117. This format supports exporting strings with lengths up to 2,000,000 characters (GH16450)
to_hdf() and read_hdf() now accept an errors keyword argument to control encoding error handling (GH20835)
cut() has gained the duplicates='raise'|'drop' option to control whether to raise on duplicated edges (GH20947)
date_range(), timedelta_range(), and interval_range() now return a linearly spaced index if start, end, and periods are specified, but freq is not. (GH20808, GH20983, GH20976)
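A sketch of the linearly spaced ranges from the last item in the list above. In these functions the upper bound is passed as end; with start, end and periods given and freq omitted, the points are spread evenly between the two endpoints (inclusive).

```python
import pandas as pd

# 5 points over 24 hours: a 6-hour spacing.
dates = pd.date_range(start="2018-01-01", end="2018-01-02", periods=5)

# 3 points over one day: 0, 12 hours, 1 day.
deltas = pd.timedelta_range(start="0 days", end="1 day", periods=3)

# 4 equal-width intervals covering [0, 1].
intervals = pd.interval_range(start=0, end=1, periods=4)

print(dates)
print(deltas)
print(intervals)
```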
We have updated our minimum supported versions of dependencies (GH15184). If installed, we now require:
Package          Minimum Version  Required  Issue
---------------  ---------------  --------  -------
python-dateutil  2.5.0            X         GH15184
openpyxl         2.4.0
beautifulsoup4   4.2.1                      GH20082
setuptools       24.2.0                     GH20698
Until Python 3.6, dicts in Python had no formally defined ordering. For Python 3.6 and later, dicts are ordered by insertion order (see PEP 468). Pandas will use the dict’s insertion order when creating a Series or DataFrame from a dict if you’re using Python 3.6 or higher. (GH19884)
Previous behavior (and current behavior if on Python < 3.6):
In [16]: pd.Series({'Income': 2000,
   ....:            'Expenses': -1500,
   ....:            'Taxes': -200,
   ....:            'Net result': 300})
Out[16]: 
Expenses     -1500
Income        2000
Net result     300
Taxes         -200
dtype: int64
Note the Series above is ordered alphabetically by the index values.
New behavior (for Python >= 3.6):
In [74]: pd.Series({'Income': 2000,
   ....:            'Expenses': -1500,
   ....:            'Taxes': -200,
   ....:            'Net result': 300})
   ....: 
Out[74]: 
Income        2000
Expenses     -1500
Taxes         -200
Net result     300
Length: 4, dtype: int64
Notice that the Series is now ordered by insertion order. This new behavior is used for all relevant pandas types (Series, DataFrame, SparseSeries and SparseDataFrame).
If you wish to retain the old behavior while using Python >= 3.6, you can use .sort_index():
In [75]: pd.Series({'Income': 2000,
   ....:            'Expenses': -1500,
   ....:            'Taxes': -200,
   ....:            'Net result': 300}).sort_index()
   ....: 
Out[75]: 
Expenses     -1500
Income        2000
Net result     300
Taxes         -200
Length: 4, dtype: int64
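The same behavior as a short script that can be run on any Python 3.7+ interpreter, where dict insertion order is guaranteed by the language:

```python
import pandas as pd

data = {"Income": 2000, "Expenses": -1500, "Taxes": -200, "Net result": 300}

# The index follows the dict's insertion order.
ordered = pd.Series(data)

# sort_index() restores the old alphabetical ordering if code depends on it.
alphabetical = ordered.sort_index()

print(ordered.index.tolist())
print(alphabetical.index.tolist())
```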
Panel was deprecated in the 0.20.x release, showing as a DeprecationWarning. Using Panel will now show a FutureWarning. The recommended way to represent 3-D data is with a MultiIndex on a DataFrame via the to_frame() method, or with the xarray package. Pandas provides a to_xarray() method to automate this conversion (GH13563, GH18324).
In [75]: import pandas._testing as tm

In [76]: p = tm.makePanel()

In [77]: p
Out[77]: 
<class 'pandas.core.panel.Panel'>
Dimensions: 3 (items) x 3 (major_axis) x 4 (minor_axis)
Items axis: ItemA to ItemC
Major_axis axis: 2000-01-03 00:00:00 to 2000-01-05 00:00:00
Minor_axis axis: A to D
Convert to a MultiIndex DataFrame
In [78]: p.to_frame()
Out[78]: 
                     ItemA     ItemB     ItemC
major      minor                              
2000-01-03 A      0.469112  0.721555  0.404705
           B     -1.135632  0.271860 -1.039268
           C      0.119209  0.276232 -1.344312
           D     -2.104569  0.113648 -0.109050
2000-01-04 A     -0.282863 -0.706771  0.577046
           B      1.212112 -0.424972 -0.370647
           C     -1.044236 -1.087401  0.844885
           D     -0.494929 -1.478427  1.643563
2000-01-05 A     -1.509059 -1.039575 -1.715002
           B     -0.173215  0.567020 -1.157892
           C     -0.861849 -0.673690  1.075770
           D      1.071804  0.524988 -1.469388

[12 rows x 3 columns]
Convert to an xarray DataArray
In [79]: p.to_xarray()
Out[79]: 
<xarray.DataArray (items: 3, major_axis: 3, minor_axis: 4)>
array([[[ 0.469112, -1.135632,  0.119209, -2.104569],
        [-0.282863,  1.212112, -1.044236, -0.494929],
        [-1.509059, -0.173215, -0.861849,  1.071804]],

       [[ 0.721555,  0.27186 ,  0.276232,  0.113648],
        [-0.706771, -0.424972, -1.087401, -1.478427],
        [-1.039575,  0.56702 , -0.67369 ,  0.524988]],

       [[ 0.404705, -1.039268, -1.344312, -0.10905 ],
        [ 0.577046, -0.370647,  0.844885,  1.643563],
        [-1.715002, -1.157892,  1.07577 , -1.469388]]])
Coordinates:
  * items       (items) object 'ItemA' 'ItemB' 'ItemC'
  * major_axis  (major_axis) datetime64[ns] 2000-01-03 2000-01-04 2000-01-05
  * minor_axis  (minor_axis) object 'A' 'B' 'C' 'D'
The following error & warning messages have been removed from pandas.core.common (GH13634, GH19769):
PerformanceWarning
UnsupportedFunctionCall
UnsortedIndexError
AbstractMethodError
These are available from pandas.errors (since 0.19.0).
DataFrame.apply() was inconsistent when applying an arbitrary user-defined function that returned a list-like with axis=1. Several bugs and inconsistencies are resolved. If the applied function returns a Series, then pandas will return a DataFrame; otherwise a Series will be returned, including when a list-like (e.g. a tuple or list) is returned (GH16353, GH17437, GH17970, GH17348, GH17892, GH18573, GH17602, GH18775, GH18901, GH18919).
In [76]: df = pd.DataFrame(np.tile(np.arange(3), 6).reshape(6, -1) + 1,
   ....:                   columns=['A', 'B', 'C'])
   ....: 

In [77]: df
Out[77]: 
   A  B  C
0  1  2  3
1  1  2  3
2  1  2  3
3  1  2  3
4  1  2  3
5  1  2  3

[6 rows x 3 columns]
Previous behavior: if the returned shape happened to match the length of original columns, this would return a DataFrame. If the return shape did not match, a Series with lists was returned.
In [3]: df.apply(lambda x: [1, 2, 3], axis=1)
Out[3]: 
   A  B  C
0  1  2  3
1  1  2  3
2  1  2  3
3  1  2  3
4  1  2  3
5  1  2  3

In [4]: df.apply(lambda x: [1, 2], axis=1)
Out[4]: 
0    [1, 2]
1    [1, 2]
2    [1, 2]
3    [1, 2]
4    [1, 2]
5    [1, 2]
dtype: object
New behavior: When the applied function returns a list-like, this will now always return a Series.
In [78]: df.apply(lambda x: [1, 2, 3], axis=1)
Out[78]: 
0    [1, 2, 3]
1    [1, 2, 3]
2    [1, 2, 3]
3    [1, 2, 3]
4    [1, 2, 3]
5    [1, 2, 3]
Length: 6, dtype: object

In [79]: df.apply(lambda x: [1, 2], axis=1)
Out[79]: 
0    [1, 2]
1    [1, 2]
2    [1, 2]
3    [1, 2]
4    [1, 2]
5    [1, 2]
Length: 6, dtype: object
To have expanded columns, you can use result_type='expand':
In [80]: df.apply(lambda x: [1, 2, 3], axis=1, result_type='expand')
Out[80]: 
   0  1  2
0  1  2  3
1  1  2  3
2  1  2  3
3  1  2  3
4  1  2  3
5  1  2  3

[6 rows x 3 columns]
To broadcast the result across the original columns (the old behaviour for list-likes of the correct length), you can use result_type='broadcast'. The shape must match the original columns.
In [81]: df.apply(lambda x: [1, 2, 3], axis=1, result_type='broadcast')
Out[81]: 
   A  B  C
0  1  2  3
1  1  2  3
2  1  2  3
3  1  2  3
4  1  2  3
5  1  2  3

[6 rows x 3 columns]
Returning a Series allows one to control the exact return structure and column names:
In [82]: df.apply(lambda x: pd.Series([1, 2, 3], index=['D', 'E', 'F']), axis=1)
Out[82]: 
   D  E  F
0  1  2  3
1  1  2  3
2  1  2  3
3  1  2  3
4  1  2  3
5  1  2  3

[6 rows x 3 columns]
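The four return shapes described above can be checked in a single script. The row function ignores its input on purpose, so only the result_type handling is being exercised.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.tile(np.arange(3), 6).reshape(6, -1) + 1, columns=["A", "B", "C"])

# Default: a list-like result gives a Series of lists.
as_lists = df.apply(lambda row: [1, 2, 3], axis=1)

# 'expand': list elements become new columns labelled 0, 1, 2.
expanded = df.apply(lambda row: [1, 2, 3], axis=1, result_type="expand")

# 'broadcast': the result keeps the original column labels A, B, C.
broadcast = df.apply(lambda row: [1, 2, 3], axis=1, result_type="broadcast")

# Returning a Series controls the column names explicitly.
named = df.apply(lambda row: pd.Series([1, 2, 3], index=["D", "E", "F"]), axis=1)
```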
In a future version of pandas pandas.concat() will no longer sort the non-concatenation axis when it is not already aligned. The current behavior is the same as the previous (sorting), but now a warning is issued when sort is not specified and the non-concatenation axis is not aligned (GH4588).
In [83]: df1 = pd.DataFrame({"a": [1, 2], "b": [1, 2]}, columns=['b', 'a'])

In [84]: df2 = pd.DataFrame({"a": [4, 5]})

In [85]: pd.concat([df1, df2])
Out[85]: 
     b  a
0  1.0  1
1  2.0  2
0  NaN  4
1  NaN  5

[4 rows x 2 columns]
To keep the previous behavior (sorting) and silence the warning, pass sort=True:
In [86]: pd.concat([df1, df2], sort=True)
Out[86]: 
   a    b
0  1  1.0
1  2  2.0
0  4  NaN
1  5  NaN

[4 rows x 2 columns]
To accept the future behavior (no sorting), pass sort=False.
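Passing sort explicitly makes the column order deterministic either way. A sketch using the same two frames as above:

```python
import pandas as pd

df1 = pd.DataFrame({"a": [1, 2], "b": [1, 2]}, columns=["b", "a"])
df2 = pd.DataFrame({"a": [4, 5]})

# sort=False keeps the columns in the order they are first encountered.
unsorted = pd.concat([df1, df2], sort=False)

# sort=True sorts the non-concatenation axis (here, the column labels).
sorted_cols = pd.concat([df1, df2], sort=True)

print(list(unsorted.columns), list(sorted_cols.columns))
```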
Note that this change also applies to DataFrame.append(), which has also received a sort keyword for controlling this behavior.
Building pandas for development now requires cython >= 0.24 (GH18613)
Building from source now explicitly requires setuptools in setup.py (GH18113)
Updated conda recipe to be in compliance with conda-build 3.0+ (GH18002)
Division operations on Index and subclasses will now fill division of positive numbers by zero with np.inf, division of negative numbers by zero with -np.inf and 0 / 0 with np.nan. This matches existing Series behavior. (GH19322, GH19347)
Previous behavior:

In [6]: index = pd.Int64Index([-1, 0, 1])

In [7]: index / 0
Out[7]: Int64Index([0, 0, 0], dtype='int64')

# Previous behavior yielded different results depending on the type of zero in the divisor
In [8]: index / 0.0
Out[8]: Float64Index([-inf, nan, inf], dtype='float64')

In [9]: index = pd.UInt64Index([0, 1])

In [10]: index / np.array([0, 0], dtype=np.uint64)
Out[10]: UInt64Index([0, 0], dtype='uint64')

In [11]: pd.RangeIndex(1, 5) / 0
ZeroDivisionError: integer division or modulo by zero
Current behavior:

In [87]: index = pd.Int64Index([-1, 0, 1])

# division by zero gives -infinity where negative,
# +infinity where positive, and NaN for 0 / 0
In [88]: index / 0
Out[88]: Float64Index([-inf, nan, inf], dtype='float64')

# The result of division by zero should not depend on
# whether the zero is int or float
In [89]: index / 0.0
Out[89]: Float64Index([-inf, nan, inf], dtype='float64')

In [90]: index = pd.UInt64Index([0, 1])

In [91]: index / np.array([0, 0], dtype=np.uint64)
Out[91]: Float64Index([nan, inf], dtype='float64')

In [92]: pd.RangeIndex(1, 5) / 0
Out[92]: Float64Index([inf, inf, inf, inf], dtype='float64')
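As a hedged sketch, the same rules can be checked against a recent pandas. It assumes a version in which the dedicated Int64Index/UInt64Index classes have been folded into pd.Index, so plain pd.Index stands in for them here.

```python
import numpy as np
import pandas as pd

# pd.Index infers an int64 dtype here (stand-in for the old Int64Index)
idx = pd.Index([-1, 0, 1])

by_int_zero = idx / 0      # [-inf, nan, inf]
by_float_zero = idx / 0.0  # same result regardless of the zero's type

# Unsigned integers follow the same rule: 0 / 0 -> nan, 1 / 0 -> inf
uidx = pd.Index(np.array([0, 1], dtype=np.uint64))
by_uint_zero = uidx / np.array([0, 0], dtype=np.uint64)

print(by_int_zero)
print(by_uint_zero)
```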
By default, extracting matching patterns from strings with str.extract() used to return a Series if a single group was being extracted (and a DataFrame if more than one group was extracted). As of pandas 0.23.0, str.extract() always returns a DataFrame, unless expand is set to False. Finally, None was previously an accepted value for the expand parameter (equivalent to False), but it now raises a ValueError. (GH11386)
Previous behavior:

In [1]: s = pd.Series(['number 10', '12 eggs'])

In [2]: extracted = s.str.extract(r'.*(\d\d).*')

In [3]: extracted
Out[3]:
0    10
1    12
dtype: object

In [4]: type(extracted)
Out[4]: pandas.core.series.Series
New behavior:

In [93]: s = pd.Series(['number 10', '12 eggs'])

In [94]: extracted = s.str.extract(r'.*(\d\d).*')

In [95]: extracted
Out[95]:
    0
0  10
1  12

[2 rows x 1 columns]

In [96]: type(extracted)
Out[96]: pandas.core.frame.DataFrame
To restore previous behavior, simply set expand to False:
In [97]: s = pd.Series(['number 10', '12 eggs'])

In [98]: extracted = s.str.extract(r'.*(\d\d).*', expand=False)

In [99]: extracted
Out[99]:
0    10
1    12
Length: 2, dtype: object

In [100]: type(extracted)
Out[100]: pandas.core.series.Series
The default value of the ordered parameter for CategoricalDtype has changed from False to None to allow updating of categories without impacting ordered. Behavior should remain consistent for downstream objects, such as Categorical (GH18790)
In previous versions, the default value for the ordered parameter was False. This could potentially lead to the ordered parameter unintentionally being changed from True to False when users attempt to update categories if ordered is not explicitly specified, as it would silently default to False. The new behavior for ordered=None is to retain the existing value of ordered.
In [2]: from pandas.api.types import CategoricalDtype

In [3]: cat = pd.Categorical(list('abcaba'), ordered=True, categories=list('cba'))

In [4]: cat
Out[4]:
[a, b, c, a, b, a]
Categories (3, object): [c < b < a]

In [5]: cdt = CategoricalDtype(categories=list('cbad'))

In [6]: cat.astype(cdt)
Out[6]:
[a, b, c, a, b, a]
Categories (4, object): [c < b < a < d]
Notice in the example above that the converted Categorical has retained ordered=True. Had the default value for ordered remained as False, the converted Categorical would have become unordered, despite ordered=False never being explicitly specified. To change the value of ordered, explicitly pass it to the new dtype, e.g. CategoricalDtype(categories=list('cbad'), ordered=False).
Note that the unintentional conversion of ordered discussed above did not arise in previous versions due to separate bugs that prevented astype from doing any type of category to category conversion (GH10696, GH18593). These bugs have been fixed in this release, and motivated changing the default value of ordered.
Previously, the default value for the maximum number of columns was pd.options.display.max_columns=20. This meant that relatively wide data frames would not fit within the terminal width, and pandas would introduce line breaks to display these 20 columns. This resulted in an output that was relatively difficult to read:
If Python runs in a terminal, the maximum number of columns is now determined automatically so that the printed data frame fits within the current terminal width (pd.options.display.max_columns=0) (GH17023). If Python runs as a Jupyter kernel (such as the Jupyter QtConsole or a Jupyter notebook, as well as in many IDEs), this value cannot be inferred automatically and is thus set to 20 as in previous versions. In a terminal, this results in a much nicer output:
Note that if you don’t like the new default, you can always set this option yourself. To revert to the old setting, you can run this line:
pd.options.display.max_columns = 20
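If you only need the old limit temporarily, pd.option_context scopes the change to a with block and restores the previous value afterwards. This is a minimal sketch, and the 30-column frame is illustrative data only.

```python
import pandas as pd

# Whatever default applies in this session (0 in a terminal, 20 otherwise)
before = pd.get_option("display.max_columns")

# Temporarily restore the old limit of 20 columns for one block of code
with pd.option_context("display.max_columns", 20):
    inside = pd.get_option("display.max_columns")
    print(pd.DataFrame([range(30)]))  # printed with at most 20 columns

# The previous value is restored on leaving the block
after = pd.get_option("display.max_columns")
```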
The default Timedelta constructor now accepts an ISO 8601 Duration string as an argument (GH19040)
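The following small illustration shows the ISO 8601 form next to the equivalent keyword construction. The specific duration is an arbitrary example.

```python
import pandas as pd

# ISO 8601 duration: 1 day, 2 hours, 3 minutes, 4 seconds
td = pd.Timedelta("P1DT2H3M4S")

# Equivalent construction from keyword components
expected = pd.Timedelta(days=1, hours=2, minutes=3, seconds=4)

print(td == expected)  # True
```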
Subtracting NaT from a Series with dtype='datetime64[ns]' returns a Series with dtype='timedelta64[ns]' instead of dtype='datetime64[ns]' (GH18808)
Addition or subtraction of NaT from TimedeltaIndex will return TimedeltaIndex instead of DatetimeIndex (GH19124)
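Here is a minimal sketch of both NaT rules, using arbitrary dates. The checks use the dtype kind ('m' for timedelta) rather than an exact unit string, on the assumption that newer pandas versions may choose a resolution other than ns.

```python
import pandas as pd

ser = pd.Series(pd.to_datetime(["2018-01-01", "2018-01-02"]))

# datetime64 - NaT is a difference of two points in time -> timedelta64 (all NaT)
diff = ser - pd.NaT
print(diff.dtype)

# TimedeltaIndex +/- NaT stays a TimedeltaIndex (all NaT)
tdi = pd.TimedeltaIndex(["1 day", "2 days"])
shifted = tdi + pd.NaT
print(type(shifted).__name__)  # TimedeltaIndex
```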
DatetimeIndex.shift() and TimedeltaIndex.shift() will now raise NullFrequencyError (which subclasses ValueError, which was raised in older versions) when the index object frequency is None (GH19147)
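The following sketch shows the new exception and why existing ValueError handlers keep working. The irregular dates are chosen only so that the index has no frequency.

```python
import pandas as pd
from pandas.errors import NullFrequencyError

# Irregular dates: no frequency is attached, so idx.freq is None
idx = pd.DatetimeIndex(["2018-01-01", "2018-01-03", "2018-01-04"])

caught = None
try:
    idx.shift(1)
except ValueError as err:
    # NullFrequencyError subclasses ValueError, so older handlers still catch it
    caught = err

# With an explicit freq, shifting works without a stored frequency
shifted = idx.shift(1, freq="D")
```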
Addition and subtraction of NaN from a Series with dtype='timedelta64[ns]' will raise a TypeError instead of treating the NaN as NaT (GH19274)
NaT division with datetime.timedelta will now return NaN instead of raising (GH17876)
Operations between a Series with dtype='datetime64[ns]' and a PeriodIndex will now correctly raise TypeError (GH18850)
Subtraction of Series with timezone-aware dtype='datetime64[ns]' with mis-matched timezones will raise TypeError instead of ValueError (GH18817)
Timestamp will no longer silently ignore unused or invalid tz or tzinfo keyword arguments (GH17690)
Timestamp will no longer silently ignore invalid freq arguments (GH5168)
CacheableOffset and WeekDay are no longer available in the pandas.tseries.offsets module (GH17830)
pandas.tseries.frequencies.get_freq_group() and pandas.tseries.frequencies.DAYS are removed from the public API (GH18034)
Series.truncate() and DataFrame.truncate() will raise a ValueError if the index is not sorted instead of an unhelpful KeyError (GH17935)
Series.first and DataFrame.first will now raise a TypeError rather than NotImplementedError when index is not a DatetimeIndex (GH20725).
Series.last and DataFrame.last will now raise a TypeError rather than NotImplementedError when index is not a DatetimeIndex (GH20725).
Restricted DateOffset keyword arguments. Previously, DateOffset subclasses allowed arbitrary keyword arguments which could lead to unexpected behavior. Now, only valid arguments will be accepted. (GH17176, GH18226).
pandas.merge() provides a more informative error message when trying to merge on timezone-aware and timezone-naive columns (GH15800)
For DatetimeIndex and TimedeltaIndex with freq=None, addition or subtraction of integer-dtyped array or Index will raise NullFrequencyError instead of TypeError (GH19895)
Timestamp constructor now accepts a nanosecond keyword or positional argument (GH18898)
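Here is a minimal sketch using the keyword form. The date is arbitrary.

```python
import pandas as pd

# Build a Timestamp from components, including sub-microsecond precision
ts = pd.Timestamp(year=2018, month=1, day=1, nanosecond=5)

print(ts.nanosecond)  # 5
```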
DatetimeIndex will now raise an AttributeError when the tz attribute is set after instantiation (GH3746)
DatetimeIndex with a pytz timezone will now return a consistent pytz timezone (GH18595)
Series.astype() and Index.astype() with an incompatible dtype will now raise a TypeError rather than a ValueError (GH18231)
Series construction with an object-dtyped tz-aware datetime and dtype=object specified will now return an object-dtyped Series; previously this would infer the datetime dtype (GH18231)
A Series of dtype=category constructed from an empty dict will now have categories of dtype=object rather than dtype=float64, consistent with the case in which an empty list is passed (GH18515)
All-NaN levels in a MultiIndex are now assigned float rather than object dtype, promoting consistency with Index (GH17929).
Level names of a MultiIndex (when not None) are now required to be unique: trying to create a MultiIndex with repeated names will raise a ValueError (GH18872)
Both construction and renaming of Index/MultiIndex with non-hashable name/names will now raise TypeError (GH20527)
Index.map() can now accept Series and dictionary input objects (GH12756, GH18482, GH18509).
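The following short sketch covers both mapper types. Labels missing from a dict mapper come back as NaN, which is why the dict result here is float.

```python
import pandas as pd

idx = pd.Index(["a", "b", "c"])

# dict mapper: "c" has no entry, so it maps to NaN
mapped_dict = idx.map({"a": 1, "b": 2})

# Series mapper: values are looked up by the Series' index labels
mapped_series = idx.map(pd.Series([10, 20, 30], index=["a", "b", "c"]))

print(mapped_dict)
print(mapped_series)
```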
DataFrame.unstack() will now default to filling with np.nan for object columns. (GH12815)
IntervalIndex constructor will raise if the closed parameter conflicts with how the input data is inferred to be closed (GH18421)
Inserting missing values into indexes will work for all types of indexes and automatically insert the correct type of missing value (NaN, NaT, etc.) regardless of the type passed in (GH18295)
When created with duplicate labels, MultiIndex now raises a ValueError. (GH17464)
Series.fillna() now raises a TypeError instead of a ValueError when passed a list, tuple or DataFrame as a value (GH18293)
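Here is a minimal sketch of the stricter check, together with the fill values that remain valid: a scalar, or a dict keyed by index label. The data is illustrative.

```python
import pandas as pd

s = pd.Series([1.0, None, 3.0])

error = None
try:
    s.fillna([0.0])  # a list is not an accepted fill value
except TypeError as err:
    error = err

filled = s.fillna(0.0)                # scalar fill
filled_by_label = s.fillna({1: 2.0})  # dict keyed by index label
```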
pandas.DataFrame.merge() no longer casts a float column to object when merging on int and float columns (GH16572)
pandas.merge() now raises a ValueError when trying to merge on incompatible data types (GH9780)
The default NA value for UInt64Index has changed from 0 to NaN, which impacts methods that mask with NA, such as UInt64Index.where() (GH18398)
Refactored setup.py to use find_packages instead of explicitly listing out all subpackages (GH18535)
Rearranged the order of keyword arguments in read_excel() to align with read_csv() (GH16672)
wide_to_long() previously kept numeric-like suffixes as object dtype. Now they are cast to numeric if possible (GH17627)
In read_excel(), the comment argument is now exposed as a named parameter (GH18735)
The options html.border and mode.use_inf_as_null were deprecated in prior versions; they will now show a FutureWarning rather than a DeprecationWarning (GH19003)
IntervalIndex and IntervalDtype no longer support categorical, object, and string subtypes (GH19016)
IntervalDtype now returns True when compared against 'interval' regardless of subtype, and IntervalDtype.name now returns 'interval' regardless of subtype (GH18980)
KeyError now raises instead of ValueError in drop(), drop(), drop(), drop() when dropping a non-existent element in an axis with duplicates (GH19186)
Series.to_csv() now accepts a compression argument that works in the same way as the compression argument in DataFrame.to_csv() (GH18958)
Set operations (union, difference…) on IntervalIndex with incompatible index types will now raise a TypeError rather than a ValueError (GH19329)
DateOffset objects render more simply, e.g. <DateOffset: days=1> instead of <DateOffset: kwds={'days': 1}> (GH19403)
Categorical.fillna now validates its value and method keyword arguments. It now raises when both or none are specified, matching the behavior of Series.fillna() (GH19682)
pd.to_datetime('today') now returns a datetime, consistent with pd.Timestamp('today'); previously pd.to_datetime('today') returned a .normalized() datetime (GH19935)
Series.str.replace() now takes an optional regex keyword which, when set to False, uses literal string replacement rather than regex replacement (GH16808)
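The following small sketch contrasts literal and regex replacement. It deliberately uses a two-character pattern ('a.'), on the assumption that some pandas versions treated single-character patterns literally even with regex=True.

```python
import pandas as pd

s = pd.Series(["a.b", "axb"])

# regex=False: "a." is matched literally (letter a followed by a dot)
literal = s.str.replace("a.", "Z", regex=False)

# regex=True: "." matches any character, so both strings are affected
pattern = s.str.replace("a.", "Z", regex=True)

print(literal.tolist())  # ['Zb', 'axb']
print(pattern.tolist())  # ['Zb', 'Zb']
```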
DatetimeIndex.strftime() and PeriodIndex.strftime() now return an Index instead of a numpy array to be consistent with similar accessors (GH20127)
Constructing a Series from a list of length 1 no longer broadcasts this list when a longer index is specified (GH19714, GH20391).
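This sketch shows the now-rejected call and the two unambiguous alternatives. The labels are arbitrary.

```python
import pandas as pd

labels = ["a", "b", "c"]

error = None
try:
    pd.Series([1], index=labels)  # length-1 list vs. length-3 index
except ValueError as err:
    error = err

broadcast = pd.Series(1, index=labels)            # a scalar is broadcast
explicit = pd.Series([1] * len(labels), index=labels)  # list matches the index
```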
DataFrame.to_dict() with orient='index' no longer casts int columns to float for a DataFrame with only int and float columns (GH18580)
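Here is a minimal sketch with one int and one float column. The int values are expected to come back as ints rather than floats.

```python
import pandas as pd

df = pd.DataFrame({"i": [1, 2], "f": [1.5, 2.5]})

records = df.to_dict(orient="index")
print(records)  # {0: {'i': 1, 'f': 1.5}, 1: {'i': 2, 'f': 2.5}}
```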
A user-defined function that is passed to Series.rolling().aggregate(), DataFrame.rolling().aggregate(), or their expanding counterparts will now always be passed a Series rather than an np.array; only .apply() has the raw keyword (see the Rolling/Expanding.apply() raw=False entry under New features). This is consistent with the signatures of .aggregate() across pandas (GH20584)
Rolling and Expanding types raise NotImplementedError upon iteration (GH11704).
Series.from_array and SparseSeries.from_array are deprecated. Use the normal constructor Series(..) and SparseSeries(..) instead (GH18213).
DataFrame.as_matrix is deprecated. Use DataFrame.values instead (GH18458).
Series.asobject, DatetimeIndex.asobject, PeriodIndex.asobject and TimedeltaIndex.asobject have been deprecated. Use .astype(object) instead (GH18572)
Grouping by a tuple of keys now emits a FutureWarning and is deprecated. In the future, a tuple passed to 'by' will always refer to a single key that is the actual tuple, instead of treating the tuple as multiple keys. To retain the previous behavior, use a list instead of a tuple (GH18314)
Series.valid is deprecated. Use Series.dropna() instead (GH18800).
read_excel() has deprecated the skip_footer parameter. Use skipfooter instead (GH18836)
ExcelFile.parse() has deprecated sheetname in favor of sheet_name for consistency with read_excel() (GH20920).
The is_copy attribute is deprecated and will be removed in a future version (GH18801).
IntervalIndex.from_intervals is deprecated in favor of the IntervalIndex constructor (GH19263)
DataFrame.from_items is deprecated. Use DataFrame.from_dict() instead, or DataFrame.from_dict(OrderedDict()) if you wish to preserve the key order (GH17320, GH17312)
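Here is a minimal sketch of the replacement, using illustrative (key, values) pairs. OrderedDict is shown for Python versions where plain dict order is not guaranteed.

```python
from collections import OrderedDict

import pandas as pd

items = [("b", [1, 2]), ("a", [3, 4])]

# Replacement for the deprecated DataFrame.from_items(items)
df = pd.DataFrame.from_dict(dict(items))               # insertion-ordered dict
df_ordered = pd.DataFrame.from_dict(OrderedDict(items))  # order on any Python

print(list(df.columns))  # ['b', 'a']
```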
Indexing a MultiIndex or a Float64Index with a list containing some missing keys will now show a FutureWarning, which is consistent with other types of indexes (GH17758).
The broadcast parameter of .apply() is deprecated in favor of result_type='broadcast' (GH18577)
The reduce parameter of .apply() is deprecated in favor of result_type='reduce' (GH18577)
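Here is a minimal sketch of both replacements on a two-column frame. The lambdas are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

# Replacement for broadcast=True: each row's scalar sum is broadcast back
# to the original shape, keeping the original index and columns
broadcast = df.apply(lambda row: row.sum(), axis=1, result_type="broadcast")

# Replacement for reduce=True: list-like results stay a Series of lists
# instead of being expanded into columns
reduced = df.apply(lambda row: [row["a"], row["b"]], axis=1, result_type="reduce")

print(broadcast)
print(reduced)
```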
The order parameter of factorize() is deprecated and will be removed in a future release (GH19727)
Timestamp.weekday_name, DatetimeIndex.weekday_name, and Series.dt.weekday_name are deprecated in favor of Timestamp.day_name(), DatetimeIndex.day_name(), and Series.dt.day_name() (GH12806)
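The following short sketch shows the new spelling at the scalar, index, and Series levels. 15 May 2018 is used only as a sample date.

```python
import pandas as pd

ts = pd.Timestamp("2018-05-15")
print(ts.day_name())  # Tuesday

idx = pd.date_range("2018-05-14", periods=3, freq="D")
index_names = list(idx.day_name())                  # DatetimeIndex level
series_names = pd.Series(idx).dt.day_name().tolist()  # Series.dt level

print(index_names)  # ['Monday', 'Tuesday', 'Wednesday']
```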
pandas.tseries.plotting.tsplot is deprecated. Use Series.plot() instead (GH18627)
Index.summary() is deprecated and will be removed in a future version (GH18217)
NDFrame.get_ftype_counts() is deprecated and will be removed in a future version (GH18243)
The convert_datetime64 parameter in DataFrame.to_records() has been deprecated and will be removed in a future version. The NumPy bug motivating this parameter has been resolved. The default value for this parameter has also changed from True to None (GH18160).
Series.rolling().apply(), DataFrame.rolling().apply(), Series.expanding().apply(), and DataFrame.expanding().apply() have deprecated passing an np.array by default. One will need to pass the new raw parameter to be explicit about what is passed (GH20584)
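Here is a minimal sketch of passing raw explicitly. The window function (last minus first value) is an arbitrary example, written once against a Series window and once against an ndarray window.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, 2.0, 4.0, 8.0])

# raw=False: each window arrives as a Series, so .iloc is available
via_series = s.rolling(2).apply(lambda w: w.iloc[-1] - w.iloc[0], raw=False)

# raw=True: each window arrives as a NumPy ndarray (typically faster)
via_ndarray = s.rolling(2).apply(lambda w: w[-1] - w[0], raw=True)

print(via_series.tolist())  # [nan, 1.0, 2.0, 4.0]
```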
The data, base, strides, flags and itemsize properties of the Series and Index classes have been deprecated and will be removed in a future version (GH20419).
DatetimeIndex.offset is deprecated. Use DatetimeIndex.freq instead (GH20716)
Floor division between an integer ndarray and a Timedelta is deprecated. Divide by Timedelta.value instead (GH19761)
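Here is a minimal sketch of the suggested replacement. It assumes the integers are nanosecond counts, since Timedelta.value is expressed in nanoseconds.

```python
import numpy as np
import pandas as pd

one_second = pd.Timedelta("1s")

# Assumed input: raw nanosecond counts stored as integers
nanos = np.array([1_500_000_000, 3_000_000_000], dtype=np.int64)

# Divide by the Timedelta's integer nanosecond value, not by the Timedelta
whole_seconds = nanos // one_second.value

print(whole_seconds)  # [1 3]
```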
Setting PeriodIndex.freq (which was not guaranteed to work correctly) is deprecated. Use PeriodIndex.asfreq() instead (GH20678)
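Here is a minimal sketch of changing a PeriodIndex frequency with asfreq() instead of assigning to .freq. The monthly range is illustrative.

```python
import pandas as pd

monthly = pd.period_range("2018-01", periods=3, freq="M")

# Convert to daily periods anchored at the end or start of each month
month_ends = monthly.asfreq("D", how="end")
month_starts = monthly.asfreq("D", how="start")

print([str(p) for p in month_ends])  # ['2018-01-31', '2018-02-28', '2018-03-31']
```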
Index.get_duplicates() is deprecated and will be removed in a future version (GH20239)
The previous default behavior of negative indices in Categorical.take is deprecated. In a future version it will change from meaning missing values to meaning positional indices from the right. The future behavior is consistent with Series.take() (GH20664).
Passing multiple axes to the axis parameter in DataFrame.dropna() has been deprecated and will be removed in a future version (GH20987)
Warnings against the obsolete usage Categorical(codes, categories), which were emitted for instance when the first two arguments to Categorical() had different dtypes, and recommended the use of Categorical.from_codes, have now been removed (GH8074)
The levels and labels attributes of a MultiIndex can no longer be set directly (GH4039).
pd.tseries.util.pivot_annual has been removed (deprecated since v0.19). Use pivot_table instead (GH18370)
pd.tseries.util.isleapyear has been removed (deprecated since v0.19). Use the .is_leap_year property of datetime-like objects instead (GH18370)
pd.ordered_merge has been removed (deprecated since v0.19). Use pd.merge_ordered instead (GH18459)
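Here is a minimal sketch of the replacement. The frames are illustrative, and the example relies on the default how='outer'.

```python
import pandas as pd

left = pd.DataFrame({"key": ["a", "c"], "lval": [1, 2]})
right = pd.DataFrame({"key": ["b", "c"], "rval": [3, 4]})

# Ordered outer merge on "key" (replacement for pd.ordered_merge)
merged = pd.merge_ordered(left, right, on="key")

print(merged)
```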
The SparseList class has been removed (GH14007)
The pandas.io.wb and pandas.io.data stub modules have been removed (GH13735)
Categorical.from_array has been removed (GH13854)
The freq and how parameters have been removed from the rolling/expanding/ewm methods of DataFrame and Series (deprecated since v0.18). Instead, resample before calling the methods. (GH18601 & GH18668)
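Here is a minimal sketch of resampling first and then applying the window. The daily series and the 2-day target frequency are illustrative assumptions.

```python
import pandas as pd

s = pd.Series(
    [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    index=pd.date_range("2018-01-01", periods=6, freq="D"),
)

# Resample to the target frequency first...
two_day_means = s.resample("2D").mean()     # 1.5, 3.5, 5.5
# ...then apply the rolling window on the resampled data
smoothed = two_day_means.rolling(2).mean()  # NaN, 2.5, 4.5
```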
DatetimeIndex.to_datetime, Timestamp.to_datetime, PeriodIndex.to_datetime, and Index.to_datetime have been removed (GH8254, GH14096, GH14113)
read_csv() has dropped the skip_footer parameter (GH13386)
read_csv() has dropped the as_recarray parameter (GH13373)
read_csv() has dropped the buffer_lines parameter (GH13360)
read_csv() has dropped the compact_ints and use_unsigned parameters (GH13323)
The Timestamp class has dropped the offset attribute in favor of freq (GH13593)
The Series, Categorical, and Index classes have dropped the reshape method (GH13012)
pandas.tseries.frequencies.get_standard_freq has been removed in favor of pandas.tseries.frequencies.to_offset(freq).rule_code (GH13874)
The freqstr keyword has been removed from pandas.tseries.frequencies.to_offset in favor of freq (GH13874)
The Panel4D and PanelND classes have been removed (GH13776)
The Panel class has dropped the to_long and toLong methods (GH19077)
The options display.line_width and display.height are removed in favor of display.width and display.max_rows respectively (GH4391, GH19107)
The labels attribute of the Categorical class has been removed in favor of Categorical.codes (GH7768)
The flavor parameter has been removed from the to_sql() method (GH13611)
The modules pandas.tools.hashing and pandas.util.hashing have been removed (GH16223)
The top-level functions pd.rolling_*, pd.expanding_* and pd.ewm* have been removed (deprecated since v0.18). Instead, use the DataFrame/Series methods rolling, expanding and ewm (GH18723)
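The method forms below correspond to the removed top-level functions. The pairing with pd.rolling_mean, pd.expanding_sum and pd.ewma reflects our reading of the old names, and the data is illustrative.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0])

# pd.rolling_mean(s, 2)     -> method form
rolled = s.rolling(2).mean()                   # NaN, 1.5, 2.5
# pd.expanding_sum(s)       -> method form
cumulative = s.expanding().sum()               # 1.0, 3.0, 6.0
# pd.ewma(s, com=1, adjust=False) -> method form
smoothed = s.ewm(com=1, adjust=False).mean()   # 1.0, 1.5, 2.25
```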
Imports from pandas.core.common for functions such as is_datetime64_dtype are now removed. These are located in pandas.api.types. (GH13634, GH19769)
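Here is a minimal sketch of importing from the supported location. The sample data is illustrative.

```python
import pandas as pd
from pandas.api.types import is_datetime64_dtype  # previously pandas.core.common

dates = pd.Series(pd.to_datetime(["2018-01-01"]))
ints = pd.Series([1])

print(is_datetime64_dtype(dates))  # True
print(is_datetime64_dtype(ints))   # False
```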
The infer_dst keyword in Series.tz_localize(), DatetimeIndex.tz_localize() and DatetimeIndex has been removed. infer_dst=True is equivalent to ambiguous='infer', and infer_dst=False to ambiguous='raise' (GH7963).
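This sketch applies the ambiguous='infer' replacement to the 2011-11-06 US Eastern fall-back transition, where 01:00 occurs twice. It assumes a time zone database is available for 'America/New_York'.

```python
import datetime

import pandas as pd

# Wall-clock times across the fall-back transition: 01:00 appears twice,
# first in EDT (UTC-4) and then in EST (UTC-5)
naive = pd.DatetimeIndex(
    ["2011-11-06 00:00", "2011-11-06 01:00", "2011-11-06 01:00", "2011-11-06 02:00"]
)

# Equivalent of the removed infer_dst=True: infer DST from the ordering
localized = naive.tz_localize("America/New_York", ambiguous="infer")

print(localized)
```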
When .resample() was changed from an eager to a lazy operation, like .groupby() in v0.18.0, we put in place compatibility (with a FutureWarning), so operations would continue to work. This is now fully removed, so a Resampler will no longer forward compat operations (GH20554)
Removed the long-deprecated axis=None parameter from .replace() (GH20271)
Indexers on Series or DataFrame no longer create a reference cycle (GH17956)
Added a keyword argument, cache, to to_datetime() that improves the performance of converting duplicate datetime arguments (GH11665)
DateOffset arithmetic performance is improved (GH18218)
Converting a Series of Timedelta objects to days, seconds, etc. is sped up through vectorization of the underlying methods (GH18092)
Improved performance of .map() with a Series/dict input (GH15081)
The overridden Timedelta properties of days, seconds and microseconds have been removed, leveraging their built-in Python versions instead (GH18242)
Series construction will reduce the number of copies made of the input data in certain cases (GH17449)
Improved performance of Series.dt.date() and DatetimeIndex.date() (GH18058)
Improved performance of Series.dt.time() and DatetimeIndex.time() (GH18461)
Improved performance of IntervalIndex.symmetric_difference() (GH18475)
Improved performance of DatetimeIndex and Series arithmetic operations with Business-Month and Business-Quarter frequencies (GH18489)
Series() / DataFrame() tab completion is limited to 100 values for better performance (GH18587)
Improved performance of DataFrame.median() with axis=1 when bottleneck is not installed (GH16468)
Improved performance of MultiIndex.get_loc() for large indexes, at the cost of a reduction in performance for small ones (GH18519)
Improved performance of MultiIndex.remove_unused_levels() when there are no unused levels, at the cost of a reduction in performance when there are (GH19289)
Improved performance of Index.get_loc() for non-unique indexes (GH19478)
Improved performance of pairwise .rolling() and .expanding() with .cov() and .corr() operations (GH17917)
Improved performance of pandas.core.groupby.GroupBy.rank() (GH15779)
Improved performance of variable .rolling() on .min() and .max() (GH19521)
Improved performance of pandas.core.groupby.GroupBy.ffill() and pandas.core.groupby.GroupBy.bfill() (GH11296)
Improved performance of pandas.core.groupby.GroupBy.any() and pandas.core.groupby.GroupBy.all() (GH15435)
Improved performance of pandas.core.groupby.GroupBy.pct_change() (GH19165)
Improved performance of Series.isin() in the case of categorical dtypes (GH20003)
Improved performance of getattr(Series, attr) when the Series has certain index types. This manifested in slow printing of large Series with a DatetimeIndex (GH19764)
Fixed a performance regression for GroupBy.nth() and GroupBy.last() with some object columns (GH19283)
Improved performance of pandas.core.arrays.Categorical.from_codes() (GH18501)
Thanks to all of the contributors who participated in the Pandas Documentation Sprint, which took place on March 10th. We had about 500 participants from over 30 locations across the world. You should notice that many of the API docstrings have greatly improved.
There were too many simultaneous contributions to include a release note for each improvement, but this GitHub search should give you an idea of how many docstrings were improved.
Special thanks to Marc Garcia for organizing the sprint. For more information, read the NumFOCUS blogpost recapping the sprint.
Changed spelling of “numpy” to “NumPy”, and “python” to “Python”. (GH19017)
Consistency when introducing code samples, using either colon or period. Rewrote some sentences for greater clarity, added more dynamic references to functions, methods and classes. (GH18941, GH18948, GH18973, GH19017)
Added a reference to DataFrame.assign() in the concatenate section of the merging documentation (GH18665)
A class of bugs was introduced in pandas 0.21 with CategoricalDtype that affects the correctness of operations like merge, concat, and indexing when comparing multiple unordered Categorical arrays that have the same categories, but in a different order. We highly recommend upgrading or manually aligning your categories before doing these operations.
Bug in Categorical.equals returning the wrong result when comparing two unordered Categorical arrays with the same categories, but in a different order (GH16603)
Bug in pandas.api.types.union_categoricals() returning the wrong result for unordered categoricals with the categories in a different order. This affected pandas.concat() with Categorical data (GH19096).
Bug in pandas.merge() returning the wrong result when joining on an unordered Categorical that had the same categories but in a different order (GH19551)
Bug in CategoricalIndex.get_indexer() returning the wrong result when target was an unordered Categorical that had the same categories as self but in a different order (GH19551)
Bug in Index.astype() with a categorical dtype where the resultant index is not converted to a CategoricalIndex for all types of index (GH18630)
Bug in Series.astype() and Categorical.astype() where existing categorical data does not get updated (GH10696, GH18593)
Bug in Series.str.split() with expand=True incorrectly raising an IndexError on empty strings (GH20002).
Bug in Index constructor with dtype=CategoricalDtype(...) where categories and ordered are not maintained (GH19032)
Bug in Series constructor with scalar and dtype=CategoricalDtype(...) where categories and ordered are not maintained (GH19565)
Bug in Categorical.__iter__ not converting to Python types (GH19909)
Bug in pandas.factorize() returning the unique codes for the uniques. This now returns a Categorical with the same dtype as the input (GH19721)
Bug in pandas.factorize() including an item for missing values in the uniques return value (GH19721)
Bug in Series.take() with categorical data interpreting -1 in indices as missing value markers, rather than the last element of the Series (GH20664)
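Here is a minimal sketch of the fixed behavior on illustrative data.

```python
import pandas as pd

s = pd.Series(pd.Categorical(["a", "b", "c"]))

# -1 is a position counted from the end, as for any other dtype
last = s.take([-1])

print(last.tolist())  # ['c']
```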
Bug in Series.__sub__() where subtracting a non-nanosecond np.datetime64 object from a Series gave incorrect results (GH7996)
Bug in DatetimeIndex and TimedeltaIndex where addition and subtraction of zero-dimensional integer arrays gave incorrect results (GH19012)
Bug in DatetimeIndex and TimedeltaIndex where adding or subtracting an array-like of DateOffset objects either raised (np.array, pd.Index) or broadcast incorrectly (pd.Series) (GH18849)
Bug in Series.__add__() where adding a Series with dtype timedelta64[ns] to a timezone-aware DatetimeIndex incorrectly dropped timezone information (GH13905)
Adding a Period object to a datetime or Timestamp object will now correctly raise a TypeError (GH17983)
Bug in Timestamp where comparison with an array of Timestamp objects would result in a RecursionError (GH15183)
Bug in Series floor-division where operating on a scalar timedelta raises an exception (GH18846)
Bug in DatetimeIndex where the repr was not showing high-precision time values at the end of a day (e.g., 23:59:59.999999999) (GH19030)
Bug in .astype() where converting to non-ns timedelta units would hold the incorrect dtype (GH19176, GH19223, GH12425)
Bug in subtracting Series from NaT incorrectly returning NaT (GH19158)
Bug in Series.truncate() which raises TypeError with a monotonic PeriodIndex (GH17717)
Bug in pct_change() where using periods and freq returned outputs of different lengths (GH7292)
Bug in comparison of DatetimeIndex against None or datetime.date objects raising TypeError for == and != comparisons instead of all-False and all-True, respectively (GH19301)
Bug in Timestamp and to_datetime() where a string representing a barely out-of-bounds timestamp would be incorrectly rounded down instead of raising OutOfBoundsDatetime (GH19382)
Bug in Timestamp.floor() and DatetimeIndex.floor() where timestamps far in the future and past were not rounded correctly (GH19206)
Bug in to_datetime() where passing an out-of-bounds datetime with errors='coerce' and utc=True would raise OutOfBoundsDatetime instead of parsing to NaT (GH19612)
Bug in DatetimeIndex and TimedeltaIndex addition and subtraction where the name of the returned object was not always set consistently (GH19744)
Bug in DatetimeIndex and TimedeltaIndex addition and subtraction where operations with numpy arrays raised TypeError (GH19847)
Bug in DatetimeIndex and TimedeltaIndex where setting the freq attribute was not fully supported (GH20678)
Bug in Timedelta.__mul__() where multiplying by NaT returned NaT instead of raising a TypeError (GH19819)
Bug in Series with dtype='timedelta64[ns]' where addition or subtraction of TimedeltaIndex had results cast to dtype='int64' (GH17250)
Bug in Series with dtype='timedelta64[ns]' where addition or subtraction of TimedeltaIndex could return a Series with an incorrect name (GH19043)
Bug in Timedelta.__floordiv__() and Timedelta.__rfloordiv__() dividing by many incompatible numpy objects was incorrectly allowed (GH18846)
Bug where dividing a scalar timedelta-like object with TimedeltaIndex performed the reciprocal operation (GH19125)
Bug in TimedeltaIndex where division by a Series would return a TimedeltaIndex instead of a Series (GH19042)
Bug in Timedelta.__add__(), Timedelta.__sub__() where adding or subtracting a np.timedelta64 object would return another np.timedelta64 instead of a Timedelta (GH19738)
Bug in Timedelta.__floordiv__(), Timedelta.__rfloordiv__() where operating with a Tick object would raise a TypeError instead of returning a numeric value (GH19738)
Bug in Period.asfreq() where periods near datetime(1, 1, 1) could be converted incorrectly (GH19643, GH19834)
Bug in Timedelta.total_seconds() causing precision errors, for example Timedelta('30S').total_seconds()==30.000000000000004 (GH19458)
Bug in Timedelta.__rmod__() where operating with a numpy.timedelta64 returned a timedelta64 object instead of a Timedelta (GH19820)
Multiplication of TimedeltaIndex by TimedeltaIndex will now raise TypeError instead of raising ValueError in cases of length mismatch (GH19333)
Bug in indexing a TimedeltaIndex with a np.timedelta64 object which was raising a TypeError (GH20393)
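A short script like the following can check several of the Timedelta fixes above. It is a sketch that assumes pandas 0.23.0 or later with NumPy installed. The durations are arbitrary, and the lowercase '30s' spelling is used in place of '30S'.

```python
import numpy as np
import pandas as pd

# GH19458: total_seconds() should be exact for whole-second durations
seconds = pd.Timedelta("30s").total_seconds()

# GH19738: adding a np.timedelta64 should return a Timedelta, not a np.timedelta64
summed = pd.Timedelta("1 day") + np.timedelta64(1, "h")

# GH19738: floor division by a Tick offset should return a plain number
hours = pd.Timedelta("1 day") // pd.offsets.Hour()

print(seconds, type(summed).__name__, summed, hours)
```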
Bug where creating a Series from an array that contains both tz-naive and tz-aware values would result in a Series whose dtype is tz-aware instead of object (GH16406)
Bug in comparison of timezone-aware DatetimeIndex against NaT incorrectly raising TypeError (GH19276)
Bug in DatetimeIndex.astype() when converting between timezone-aware dtypes and when converting from timezone-aware to timezone-naive (GH18951)
Bug in comparing DatetimeIndex, which failed to raise TypeError when attempting to compare timezone-aware and timezone-naive datetimelike objects (GH18162)
Bug in localization of a naive datetime string in a Series constructor with a datetime64[ns, tz] dtype (GH174151)
Timestamp.replace() will now handle daylight saving time transitions gracefully (GH18319)
Bug in tz-aware DatetimeIndex where addition/subtraction with a TimedeltaIndex or array with dtype='timedelta64[ns]' was incorrect (GH17558)
Bug in DatetimeIndex.insert() where inserting NaT into a timezone-aware index incorrectly raised (GH16357)
Bug in DataFrame constructor where a tz-aware DatetimeIndex and a given column name would result in an empty DataFrame (GH19157)
Bug in Timestamp.tz_localize() where localizing a timestamp near the minimum or maximum valid values could overflow and return a timestamp with an incorrect nanosecond value (GH12677)
Bug where iterating over a DatetimeIndex that was localized with a fixed timezone offset rounded nanosecond precision to microseconds (GH19603)
Bug in DataFrame.diff() that raised an IndexError with tz-aware values (GH18578)
Bug in melt() that converted tz-aware dtypes to tz-naive (GH15785)
Bug in DataFrame.count() that raised a ValueError if DataFrame.dropna() was called for a single column with timezone-aware values (GH13407)
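The following sketch exercises two of the timezone fixes above (GH19276 and GH18578). It assumes pandas 0.23.0 or later. UTC is an arbitrary choice of zone, and any tz-aware index should behave the same way.

```python
import pandas as pd

# A tz-aware index; UTC is an arbitrary choice of zone
dti = pd.date_range("2018-01-01", periods=3, tz="UTC")

# GH19276: comparing with NaT no longer raises TypeError
eq_nat = dti == pd.NaT   # expected: all False
ne_nat = dti != pd.NaT   # expected: all True

# GH18578: diff() on tz-aware values returns timedeltas instead of raising IndexError
deltas = pd.DataFrame({"ts": dti}).diff()["ts"]

print(eq_nat, ne_nat, deltas.tolist())
```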
Bug in WeekOfMonth and Week where addition and subtraction did not roll correctly (GH18510, GH18672, GH18864)
Bug in WeekOfMonth and LastWeekOfMonth where the constructor's default keyword arguments raised ValueError (GH19142)
Bug in FY5253Quarter, LastWeekOfMonth where rollback and rollforward behavior was inconsistent with addition and subtraction behavior (GH18854)
Bug in FY5253 where datetime addition and subtraction incremented incorrectly for dates on the year-end but not normalized to midnight (GH18854)
Bug in FY5253 where date offsets could incorrectly raise an AssertionError in arithmetic operations (GH14774)
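The sketch below illustrates the offset fixes above. It builds WeekOfMonth and LastWeekOfMonth with their default arguments and checks that rolling a date agrees with adding or subtracting the offset. It assumes pandas 0.23.0 or later, where the defaults are weekday=0 (Monday) and, for WeekOfMonth, week=0. The sample date is arbitrary.

```python
import pandas as pd

# GH19142: the default constructors no longer raise ValueError
last_monday = pd.offsets.LastWeekOfMonth()   # last Monday of each month
first_monday = pd.offsets.WeekOfMonth()      # first Monday of each month

d = pd.Timestamp("2018-01-10")  # a Wednesday, not on either offset

# GH18854: for a date that is not on the offset, rolling forward/back
# should agree with adding/subtracting one increment of the offset
fwd = last_monday.rollforward(d)   # last Monday of January 2018
back = last_monday.rollback(d)     # last Monday of December 2017
next_first = d + first_monday      # first Monday of February 2018

print(fwd, back, next_first)
```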
Bug in Series constructor with an int or float list where specifying dtype=str, dtype='str' or dtype='U' failed to convert the data elements to strings (GH16605)
Bug in Index multiplication and division methods where operating with a Series would return an Index object instead of a Series object (GH19042)
Bug in the DataFrame constructor in which data containing very large positive or very large negative numbers was causing OverflowError (GH18584)
Bug in Index constructor with dtype='uint64' where int-like floats were not coerced to UInt64Index (GH18400)
Bug in DataFrame flex arithmetic (e.g. df.add(other, fill_value=foo)) where a fill_value other than None failed to raise NotImplementedError in corner cases where either the frame or other has length zero (GH19522)
Multiplication and division of numeric-dtyped Index objects with timedelta-like scalars returns TimedeltaIndex instead of raising TypeError (GH19333)
Bug where NaN was returned instead of 0 by Series.pct_change() and DataFrame.pct_change() when fill_method is not None (GH19873)
Bug in Series.str.get() raising KeyError when the values contain dictionaries and the requested index is not among their keys (GH20671)
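The sketch below checks a few of the numeric and string fixes above, assuming pandas 0.23.0 or later. The input values are arbitrary, and pct_change() is called with its default arguments.

```python
import pandas as pd

# GH16605: dtype=str converts integer data to strings
as_text = pd.Series([1, 2, 3], dtype=str)

# GH19873: an unchanged value gives a 0.0 percent change rather than NaN
# (the first element is NaN because it has no predecessor)
changes = pd.Series([10.0, 10.0, 20.0]).pct_change()

# GH20671: str.get() on dict values gives a missing value for an absent key
records = pd.Series([{"a": 1, "b": 2}, {"b": 3}])
a_values = records.str.get("a")

print(as_text.tolist(), changes.tolist(), a_values.tolist())
```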
Bug in Index construction from list of mixed type tuples (GH18505)
Bug in Index.drop() when passing a list of both tuples and non-tuples (GH18304)
Bug in DataFrame.drop(), Panel.drop(), Series.drop(), Index.drop() where no KeyError is raised when dropping a non-existent element from an axis that contains duplicates (GH19186)
Bug in indexing a datetimelike Index that raised ValueError instead of IndexError (GH18386).
Index.to_series() now accepts index and name kwargs (GH18699)
DatetimeIndex.to_series() now accepts index and name kwargs (GH18699)
Bug where indexing a non-scalar value from a Series with a non-unique Index returned the value flattened (GH17610)
Bug in indexing with iterator containing only missing keys, which raised no error (GH20748)
Fixed inconsistency in .ix between list and scalar keys when the index has integer dtype and does not include the desired keys (GH20753)
Bug in __setitem__ when indexing a DataFrame with a 2-d boolean ndarray (GH18582)
Bug in str.extractall where an empty Index was returned instead of the appropriate MultiIndex when there were no matches (GH19034)
Bug in IntervalIndex where empty and purely NA data was constructed inconsistently depending on the construction method (GH18421)
Bug in IntervalIndex.symmetric_difference() where the symmetric difference with a non-IntervalIndex did not raise (GH18475)
Bug in IntervalIndex where set operations that returned an empty IntervalIndex had the wrong dtype (GH19101)
Bug in DataFrame.drop_duplicates() where no KeyError is raised when passing in columns that don’t exist on the DataFrame (GH19726)
Bug in Index subclass constructors that ignored unexpected keyword arguments (GH19348)
Bug in Index.difference() when taking difference of an Index with itself (GH20040)
Bug in DataFrame.first_valid_index() and DataFrame.last_valid_index() in the presence of entire rows of NaNs in the middle of the values (GH20499)
Bug in IntervalIndex where some indexing operations were not supported for overlapping or non-monotonic uint64 data (GH20636)
Bug in Series.is_unique where extraneous output was shown in stderr if the Series contained objects with __ne__ defined (GH20661)
Bug in .loc assignment with a single-element list-like that incorrectly assigned it as a list (GH19474)
Bug in partial string indexing on a Series/DataFrame with a monotonic decreasing DatetimeIndex (GH19362)
Bug in performing in-place operations on a DataFrame with a duplicate Index (GH17105)
Bug in IntervalIndex.get_loc() and IntervalIndex.get_indexer() when used with an IntervalIndex containing a single interval (GH17284, GH20921)
Bug in .loc with a uint64 indexer (GH20722)
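The sketch below shows the new index and name keywords of Index.to_series() and two of the KeyError fixes listed above. It assumes pandas 0.23.0 or later. The labels and the column name "missing" are made up for the demo.

```python
import pandas as pd

# GH18699: Index.to_series() accepts index= and name=
idx = pd.Index([10, 20, 30])
s = idx.to_series(index=["a", "b", "c"], name="value")

# GH19186: dropping a non-existent label from a duplicated axis raises KeyError
dup = pd.Series([1, 2], index=["x", "x"])
try:
    dup.drop("y")
    drop_raised = False
except KeyError:
    drop_raised = True

# GH19726: drop_duplicates() raises KeyError for columns that don't exist
df = pd.DataFrame({"k": [1, 1]})
try:
    df.drop_duplicates(subset=["missing"])
    dedup_raised = False
except KeyError:
    dedup_raised = True

print(s.to_dict(), s.name, drop_raised, dedup_raised)
```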
Bug in MultiIndex.__contains__() where non-tuple keys would return True even if they had been dropped (GH19027)
Bug in MultiIndex.set_labels() which would cause casting (and potentially clipping) of the new labels if the level argument is not 0 or a list like [0, 1, … ] (GH19057)
Bug in MultiIndex.get_level_values() which would return an invalid index on level of ints with missing values (GH17924)
Bug in MultiIndex.unique() when called on empty MultiIndex (GH20568)
Bug in MultiIndex.unique() which would not preserve level names (GH20570)
Bug in MultiIndex.remove_unused_levels() which would fill nan values (GH18417)
Bug in MultiIndex.from_tuples() which would fail to take zipped tuples in Python 3 (GH18434)
Bug in MultiIndex.get_loc() which would fail to automatically cast values between float and int (GH18818, GH15994)
Bug in MultiIndex.get_loc() which would cast boolean to integer labels (GH19086)
Bug in MultiIndex.get_loc() which would fail to locate keys containing NaN (GH18485)
Bug in MultiIndex.get_loc() which would fail on a large MultiIndex when levels had different dtypes (GH18520)
Bug in indexing where nested indexers having only numpy arrays were handled incorrectly (GH19686)
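Several of the MultiIndex fixes above can be observed directly, as in this sketch, which assumes pandas 0.23.0 or later. The level values and names ("key", "num") are arbitrary.

```python
import pandas as pd

# GH18434: from_tuples() accepts zipped tuples (an iterator) in Python 3
mi = pd.MultiIndex.from_tuples(zip(["a", "b", "b"], [1, 2, 2]), names=["key", "num"])

# GH20570: unique() keeps the level names
uniq = mi.unique()

# GH19027: a dropped first-level label is no longer reported as present
trimmed = mi.drop("a")

print(list(uniq), list(uniq.names), "a" in trimmed)
```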
read_html() now rewinds seekable IO objects after parse failure, before attempting to parse with a new parser. If a parser errors and the object is non-seekable, an informative error is raised suggesting the use of a different parser (GH17975)
DataFrame.to_html() now has an option to add an id to the leading <table> tag (GH8496)
Bug in read_msgpack() when a non-existent file is passed in Python 2 (GH15296)
Bug in read_csv() where a MultiIndex with duplicate columns was not being mangled appropriately (GH18062)
Bug in read_csv() where missing values were not being handled properly when keep_default_na=False with dictionary na_values (GH19227)
Bug in read_csv() causing heap corruption on 32-bit, big-endian architectures (GH20785)
Bug in read_sas() where a file with 0 variables gave an AttributeError incorrectly. Now it gives an EmptyDataError (GH18184)
Bug in DataFrame.to_latex() where pairs of braces meant to serve as invisible placeholders were escaped (GH18667)
Bug in DataFrame.to_latex() where a NaN in a MultiIndex would cause an IndexError or incorrect output (GH14249)
Bug in DataFrame.to_latex() where a non-string index-level name would result in an AttributeError (GH19981)
Bug in DataFrame.to_latex() where the combination of an index name and the index_names=False option would result in incorrect output (GH18326)
Bug in DataFrame.to_latex() where a MultiIndex with an empty string as its name would result in incorrect output (GH18669)
Bug in DataFrame.to_latex() where missing space characters caused wrong escaping and produced invalid LaTeX in some cases (GH20859)
Bug in read_json() where large numeric values were causing an OverflowError (GH18842)
Bug in DataFrame.to_parquet() where an exception was raised if the write destination is S3 (GH19134)
Interval is now supported in DataFrame.to_excel() for all Excel file types (GH19242)
Timedelta is now supported in DataFrame.to_excel() for all Excel file types (GH19242, GH9155, GH19900)
Bug in pandas.io.stata.StataReader.value_labels() raising an AttributeError when called on very old files. Now returns an empty dict (GH19417)
Bug in read_pickle() when unpickling objects with TimedeltaIndex or Float64Index created with pandas prior to version 0.20 (GH19939)
Bug in pandas.io.json.json_normalize() where sub-records are not properly normalized if any sub-record values are NoneType (GH20030)
Bug in the usecols parameter of read_csv() where an error was not raised correctly when passing a string (GH20529)
Bug in HDFStore.keys() where reading a file with a soft link caused an exception (GH20523)
Bug in HDFStore.select_column() where a key which is not a valid store raised an AttributeError instead of a KeyError (GH17912)
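Two of the I/O changes above are shown in this sketch: the table id option of DataFrame.to_html() (exposed as the table_id keyword) and dictionary na_values combined with keep_default_na=False in read_csv(). It assumes pandas 0.23.0 or later and reads from an in-memory buffer rather than a real file. The id "results" is arbitrary.

```python
import io

import pandas as pd

# GH8496: to_html() can put an id attribute on the leading <table> tag
html = pd.DataFrame({"a": [1, 2]}).to_html(table_id="results")

# GH19227: with keep_default_na=False and a dict for na_values, only the
# listed strings become NaN; the literal text "NA" is kept as data
csv = io.StringIO("a,b\nNA,x\nfoo,y\n")
df = pd.read_csv(csv, keep_default_na=False, na_values={"a": ["foo"]})

print('id="results"' in html, df["a"].tolist())
```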
Better error message when attempting to plot but matplotlib is not installed (GH19810).
DataFrame.plot() now raises a ValueError when the x or y argument is improperly formed (GH18671)
Bug in DataFrame.plot() where x and y arguments given as positions caused the wrong columns to be referenced for line, bar and area plots (GH20056)
Bug in formatting tick labels with datetime.time() and fractional seconds (GH18478).
Series.plot.kde() has exposed the args ind and bw_method in the docstring (GH18461). The argument ind may now also be an integer (number of sample points).
DataFrame.plot() now supports passing multiple columns to the y argument (GH19699)
Bug when grouping by a single column and aggregating with a class like list or tuple (GH18079)
Fixed regression in DataFrame.groupby() which would not emit an error when called with a tuple key not in the index (GH18798)
Bug in DataFrame.resample() which silently ignored unsupported (or mistyped) options for label, closed and convention (GH19303)
Bug in DataFrame.groupby() where tuples were interpreted as lists of keys rather than as keys (GH17979, GH18249)
Bug in DataFrame.groupby() where aggregation by first/last/min/max was causing timestamps to lose precision (GH19526)
Bug in DataFrame.transform() where particular aggregation functions were being incorrectly cast to match the dtype(s) of the grouped data (GH19200)
Bug in DataFrame.groupby() when passing the on= kwarg and subsequently using .apply() (GH17813)
Bug in DataFrame.resample().aggregate not raising a KeyError when aggregating a non-existent column (GH16766, GH19566)
Bug in DataFrameGroupBy.cumsum() and DataFrameGroupBy.cumprod() when skipna was passed (GH19806)
Bug in DataFrame.resample() that dropped timezone information (GH13238)
Bug in DataFrame.groupby() where transformations using np.all and np.any were raising a ValueError (GH20653)
Bug in DataFrame.resample() where ffill, bfill, pad, backfill, fillna, interpolate, and asfreq were ignoring loffset. (GH20744)
Bug in DataFrame.groupby() when applying a function that has mixed data types and the user-supplied function can fail on the grouping column (GH20949)
Bug in DataFrameGroupBy.rolling().apply() where operations performed against the associated DataFrameGroupBy object could impact the inclusion of the grouped item(s) in the result (GH14013)
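The sketch below checks that groupby aggregations keep nanosecond precision (GH19526) and that resample() keeps the index timezone (GH13238). It assumes pandas 0.23.0 or later. The timestamps, the daily frequency and the UTC zone are arbitrary choices.

```python
import pandas as pd

# GH19526: first/last/min/max keep nanosecond precision
df = pd.DataFrame({
    "g": [1, 1],
    "ts": [pd.Timestamp("2018-01-01 00:00:00.000000001"),
           pd.Timestamp("2018-01-01 00:00:00.000000002")],
})
first_ts = df.groupby("g")["ts"].first().iloc[0]
max_ts = df.groupby("g")["ts"].max().iloc[0]

# GH13238: resample() keeps the timezone of the index
idx = pd.date_range("2018-01-01", periods=4, freq="D", tz="UTC")
s = pd.Series(range(4), index=idx)
resampled = s.resample("2D").sum()   # bins: Jan 1-2 and Jan 3-4

print(first_ts, max_ts, resampled.index.tz, resampled.tolist())
```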
Bug in which creating a SparseDataFrame from a dense Series or an unsupported type raised an uncontrolled exception (GH19374)
Bug in SparseDataFrame.to_csv causing exception (GH19384)
Bug in SparseSeries.memory_usage which caused segfault by accessing non sparse elements (GH19368)
Bug in constructing a SparseArray where, if data is a scalar and index is defined, the result was coerced to float64 regardless of the scalar's dtype (GH19163)
Bug in DataFrame.merge() where referencing a CategoricalIndex by name via the by kwarg would raise a KeyError (GH20777)
Bug in DataFrame.stack() which fails trying to sort mixed type levels under Python 3 (GH18310)
Bug in DataFrame.unstack() which casts int to float if columns is a MultiIndex with unused levels (GH17845)
Bug in DataFrame.unstack() which raises an error if index is a MultiIndex with unused labels on the unstacked level (GH18562)
Fixed construction of a Series from a dict containing NaN as key (GH18480)
Fixed construction of a DataFrame from a dict containing NaN as key (GH18455)
Disabled construction of a Series where len(index) > len(data) = 1, which previously would broadcast the data item, and now raises a ValueError (GH18819)
Suppressed error in the construction of a DataFrame from a dict containing scalar values when the corresponding keys are not included in the passed index (GH18600)
Fixed (changed from object to float64) dtype of DataFrame initialized with axes, no data, and dtype=int (GH19646)
Bug in Series.rank() where Series containing NaT modifies the Series inplace (GH18521)
Bug in cut() which fails when using read-only arrays (GH18773)
Bug in DataFrame.pivot_table() which fails when the aggfunc arg is of type string. The behavior is now consistent with other methods like agg and apply (GH18713)
Bug in DataFrame.merge() in which merging using Index objects as vectors raised an Exception (GH19038)
Bug in DataFrame.stack(), DataFrame.unstack(), Series.unstack() which were not returning subclasses (GH15563)
Bug in timezone comparisons, manifesting as a conversion of the index to UTC in .concat() (GH18523)
Bug in concat() where concatenating sparse and dense series returned a SparseDataFrame instead of a DataFrame (GH18914, GH18686, and GH16874)
Improved error message for DataFrame.merge() when there is no common merge key (GH19427)
Bug in DataFrame.join() which does an outer instead of a left join when being called with multiple DataFrames and some have non-unique indices (GH19624)
Series.rename() now accepts axis as a kwarg (GH18589)
Bug in rename() where an Index of same-length tuples was converted to a MultiIndex (GH19497)
Comparisons between Series and Index would return a Series with an incorrect name, ignoring the Index’s name attribute (GH19582)
Bug in qcut() where datetime and timedelta data with NaT present raised a ValueError (GH19768)
Bug in DataFrame.iterrows(), which would infer strings not compliant with ISO8601 as datetimes (GH19671)
Bug in Series constructor with Categorical where a ValueError is not raised when an index of different length is given (GH19342)
Bug in DataFrame.astype() where column metadata is lost when converting to categorical or a dictionary of dtypes (GH19920)
Bug in cut() and qcut() where timezone information was dropped (GH19872)
Bug in Series constructor with dtype=str, which previously raised in some cases (GH19853)
Bug in get_dummies() and select_dtypes() where duplicate column names caused incorrect behavior (GH20848)
Bug in isna(), which could not handle ambiguous typed lists (GH20675)
Bug in concat() which raises an error when concatenating TZ-aware dataframes and all-NaT dataframes (GH12396)
Bug in concat() which raises an error when concatenating empty TZ-aware series (GH18447)
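Some of the reshaping and constructor fixes above can be checked with the following sketch, assuming pandas 0.23.0 or later. The data is arbitrary.

```python
import numpy as np
import pandas as pd

# GH18480: NaN can be used as a dict key when building a Series
s = pd.Series({np.nan: 1, "a": 2})

# GH18819: a single data item is no longer broadcast to a longer index
try:
    pd.Series([1], index=["x", "y"])
    broadcast_raised = False
except ValueError:
    broadcast_raised = True

# GH18713: pivot_table() accepts aggfunc as a string, like agg and apply
df = pd.DataFrame({"k": ["p", "p", "q"], "v": [1, 2, 3]})
table = df.pivot_table(index="k", values="v", aggfunc="sum")

print(s.tolist(), broadcast_raised, table["v"].to_dict())
```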
Improved error message when attempting to use a Python keyword as an identifier in a numexpr backed query (GH18221)
Bug in pandas.get_option(), which raised KeyError rather than OptionError when looking up a non-existent option key in some cases (GH19789)
Bug in testing.assert_series_equal() and testing.assert_frame_equal() for Series or DataFrames with differing unicode data (GH20503)
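The sketch below exercises the get_option() and testing fixes from the list above. It assumes pandas 0.23.0 or later. The option name is deliberately fictitious, and the check relies on OptionError being a subclass of KeyError.

```python
import pandas as pd

# GH19789: an unknown option name raises OptionError; OptionError
# subclasses KeyError, so it can also be caught as one
try:
    pd.get_option("no_such_option_for_demo")  # hypothetical, deliberately missing
    error_name = None
except KeyError as exc:
    error_name = type(exc).__name__

# GH20503: differing unicode data is reported as an ordinary assertion failure
try:
    pd.testing.assert_series_equal(pd.Series(["café"]), pd.Series(["cafe"]))
    mismatch_reported = False
except AssertionError:
    mismatch_reported = True

print(error_name, mismatch_reported)
```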
A total of 328 people contributed patches to this release. People with a “+” by their names contributed a patch for the first time.
Aaron Critchley
AbdealiJK +
Adam Hooper +
Albert Villanova del Moral
Alejandro Giacometti +
Alejandro Hohmann +
Alex Rychyk
Alexander Buchkovsky
Alexander Lenail +
Alexander Michael Schade
Aly Sivji +
Andreas Költringer +
Andrew
Andrew Bui +
András Novoszáth +
Andy Craze +
Andy R. Terrel
Anh Le +
Anil Kumar Pallekonda +
Antoine Pitrou +
Antonio Linde +
Antonio Molina +
Antonio Quinonez +
Armin Varshokar +
Artem Bogachev +
Avi Sen +
Azeez Oluwafemi +
Ben Auffarth +
Bernhard Thiel +
Bhavesh Poddar +
BielStela +
Blair +
Bob Haffner
Brett Naul +
Brock Mendel
Bryce Guinta +
Carlos Eduardo Moreira dos Santos +
Carlos García Márquez +
Carol Willing
Cheuk Ting Ho +
Chitrank Dixit +
Chris
Chris Burr +
Chris Catalfo +
Chris Mazzullo
Christian Chwala +
Cihan Ceyhan +
Clemens Brunner
Colin +
Cornelius Riemenschneider
Crystal Gong +
DaanVanHauwermeiren
Dan Dixey +
Daniel Frank +
Daniel Garrido +
Daniel Sakuma +
DataOmbudsman +
Dave Hirschfeld
Dave Lewis +
David Adrián Cañones Castellano +
David Arcos +
David C Hall +
David Fischer
David Hoese +
David Lutz +
David Polo +
David Stansby
Dennis Kamau +
Dillon Niederhut
Dimitri +
Dr. Irv
Dror Atariah
Eric Chea +
Eric Kisslinger
Eric O. LEBIGOT (EOL) +
FAN-GOD +
Fabian Retkowski +
Fer Sar +
Gabriel de Maeztu +
Gianpaolo Macario +
Giftlin Rajaiah
Gilberto Olimpio +
Gina +
Gjelt +
Graham Inggs +
Grant Roch
Grant Smith +
Grzegorz Konefał +
Guilherme Beltramini
HagaiHargil +
Hamish Pitkeathly +
Hammad Mashkoor +
Hannah Ferchland +
Hans
Haochen Wu +
Hissashi Rocha +
Iain Barr +
Ibrahim Sharaf ElDen +
Ignasi Fosch +
Igor Conrado Alves de Lima +
Igor Shelvinskyi +
Imanflow +
Ingolf Becker
Israel Saeta Pérez
Iva Koevska +
Jakub Nowacki +
Jan F-F +
Jan Koch +
Jan Werkmann
Janelle Zoutkamp +
Jason Bandlow +
Jaume Bonet +
Jay Alammar +
Jeff Reback
JennaVergeynst
Jimmy Woo +
Jing Qiang Goh +
Joachim Wagner +
Joan Martin Miralles +
Joel Nothman
Joeun Park +
John Cant +
Johnny Metz +
Jon Mease
Jonas Schulze +
Jongwony +
Jordi Contestí +
Joris Van den Bossche
José F. R. Fonseca +
Jovixe +
Julio Martinez +
Jörg Döpfert
KOBAYASHI Ittoku +
Kate Surta +
Kenneth +
Kevin Kuhl
Kevin Sheppard
Krzysztof Chomski
Ksenia +
Ksenia Bobrova +
Kunal Gosar +
Kurtis Kerstein +
Kyle Barron +
Laksh Arora +
Laurens Geffert +
Leif Walsh
Liam Marshall +
Liam3851 +
Licht Takeuchi
Liudmila +
Ludovico Russo +
Mabel Villalba +
Manan Pal Singh +
Manraj Singh
Marc +
Marc Garcia
Marco Hemken +
Maria del Mar Bibiloni +
Mario Corchero +
Mark Woodbridge +
Martin Journois +
Mason Gallo +
Matias Heikkilä +
Matt Braymer-Hayes
Matt Kirk +
Matt Maybeno +
Matthew Kirk +
Matthew Rocklin +
Matthew Roeschke
Matthias Bussonnier +
Max Mikhaylov +
Maxim Veksler +
Maximilian Roos
Maximiliano Greco +
Michael Penkov
Michael Röttger +
Michael Selik +
Michael Waskom
Mie~~~
Mike Kutzma +
Ming Li +
Mitar +
Mitch Negus +
Montana Low +
Moritz Münst +
Mortada Mehyar
Myles Braithwaite +
Nate Yoder
Nicholas Ursa +
Nick Chmura
Nikos Karagiannakis +
Nipun Sadvilkar +
Nis Martensen +
Noah +
Noémi Éltető +
Olivier Bilodeau +
Ondrej Kokes +
Onno Eberhard +
Paul Ganssle +
Paul Mannino +
Paul Reidy
Paulo Roberto de Oliveira Castro +
Pepe Flores +
Peter Hoffmann
Phil Ngo +
Pietro Battiston
Pranav Suri +
Priyanka Ojha +
Pulkit Maloo +
README Bot +
Ray Bell +
Riccardo Magliocchetti +
Ridhwan Luthra +
Robert Meyer
Robin
Robin Kiplang’at +
Rohan Pandit +
Rok Mihevc +
Rouz Azari
Ryszard T. Kaleta +
Sam Cohan
Sam Foo
Samir Musali +
Samuel Sinayoko +
Sangwoong Yoon
SarahJessica +
Sharad Vijalapuram +
Shubham Chaudhary +
SiYoungOh +
Sietse Brouwer
Simone Basso +
Stefania Delprete +
Stefano Cianciulli +
Stephen Childs +
StephenVoland +
Stijn Van Hoey +
Sven
Talitha Pumar +
Tarbo Fukazawa +
Ted Petrou +
Thomas A Caswell
Tim Hoffmann +
Tim Swast
Tom Augspurger
Tommy +
Tulio Casagrande +
Tushar Gupta +
Tushar Mittal +
Upkar Lidder +
Victor Villas +
Vince W +
Vinícius Figueiredo +
Vipin Kumar +
WBare
Wenhuan +
Wes Turner
William Ayd
Wilson Lin +
Xbar
Yaroslav Halchenko
Yee Mey
Yeongseon Choe +
Yian +
Yimeng Zhang
ZhuBaohe +
Zihao Zhao +
adatasetaday +
akielbowicz +
akosel +
alinde1 +
amuta +
bolkedebruin
cbertinato
cgohlke
charlie0389 +
chris-b1
csfarkas +
dajcs +
deflatSOCO +
derestle-htwg
discort
dmanikowski-reef +
donK23 +
elrubio +
fivemok +
fjdiod
fjetter +
froessler +
gabrielclow
gfyoung
ghasemnaddaf
h-vetinari +
himanshu awasthi +
ignamv +
jayfoad +
jazzmuesli +
jbrockmendel
jen w +
jjames34 +
joaoavf +
joders +
jschendel
juan huguet +
l736x +
luzpaz +
mdeboc +
miguelmorin +
miker985
miquelcamprodon +
orereta +
ottiP +
peterpanmj +
rafarui +
raph-m +
readyready15728 +
rmihael +
samghelms +
scriptomation +
sfoo +
stefansimik +
stonebig
tmnhat2001 +
tomneep +
topper-123
tv3141 +
verakai +
xpvpc +
zhanghui +