This is a minor release from 0.15.1 and includes a large number of bug fixes along with several new features, enhancements, and performance improvements. A small number of API changes were necessary to fix existing bugs. We recommend that all users upgrade to this version.
Enhancements
API Changes
Performance Improvements
Bug Fixes
Indexing in MultiIndex beyond lex-sort depth is now supported, though a lexically sorted index will perform better. (GH2646)
In [1]: df = pd.DataFrame({'jim':[0, 0, 1, 1],
   ...:                    'joe':['x', 'x', 'z', 'y'],
   ...:                    'jolie':np.random.rand(4)}).set_index(['jim', 'joe'])
   ...:

In [2]: df
Out[2]:
            jolie
jim joe
0   x    0.126970
    x    0.966718
1   z    0.260476
    y    0.897237

[4 rows x 1 columns]

In [3]: df.index.lexsort_depth
Out[3]: 1

# in prior versions this would raise a KeyError
# will now show a PerformanceWarning
In [4]: df.loc[(1, 'z')]
Out[4]:
            jolie
jim joe
1   z    0.260476

[1 rows x 1 columns]

# lexically sorting
In [5]: df2 = df.sort_index()

In [6]: df2
Out[6]:
            jolie
jim joe
0   x    0.126970
    x    0.966718
1   y    0.897237
    z    0.260476

[4 rows x 1 columns]

In [7]: df2.index.lexsort_depth
Out[7]: 2

In [8]: df2.loc[(1,'z')]
Out[8]:
            jolie
jim joe
1   z    0.260476

[1 rows x 1 columns]
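The same lookup can be sketched as a standalone script. This is a minimal illustration, not part of the release itself: fixed values replace the random column so the result is reproducible, and it assumes a pandas version that exposes `pd.errors.PerformanceWarning` and `MultiIndex.is_monotonic_increasing`.

```python
import warnings

import pandas as pd

# Deterministic values stand in for np.random.rand(4) (an assumption for reproducibility).
df = pd.DataFrame(
    {"jim": [0, 0, 1, 1], "joe": ["x", "x", "z", "y"], "jolie": [0.1, 0.2, 0.3, 0.4]}
).set_index(["jim", "joe"])

with warnings.catch_warnings():
    # A lookup past the lex-sort depth may emit a PerformanceWarning; silence it here.
    warnings.simplefilter("ignore", category=pd.errors.PerformanceWarning)
    unsorted_hit = df.loc[(1, "z")]

# Sorting the index once keeps later lookups off the slower path.
df_sorted = df.sort_index()
sorted_hit = df_sorted.loc[(1, "z")]
```

Sorting once up front is usually preferable when many lookups follow, since the unsorted path pays its cost on every access.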
Bug in unique of Series with category dtype, which returned all categories regardless of whether they were “used” or not (see GH8559 for the discussion). Previous behaviour was to return all categories:
In [3]: cat = pd.Categorical(['a', 'b', 'a'], categories=['a', 'b', 'c'])

In [4]: cat
Out[4]:
[a, b, a]
Categories (3, object): [a < b < c]

In [5]: cat.unique()
Out[5]: array(['a', 'b', 'c'], dtype=object)
Now, only the categories that actually occur in the array are returned:
In [9]: cat = pd.Categorical(['a', 'b', 'a'], categories=['a', 'b', 'c'])

In [10]: cat.unique()
Out[10]:
[a, b]
Categories (2, object): [a, b]
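As a small sketch of the distinction, the observed values and the declared categories can be compared side by side. The variable names are illustrative only; the dtype attached to the result of `unique()` may differ between pandas versions, so only the values are inspected here.

```python
import pandas as pd

cat = pd.Categorical(["a", "b", "a"], categories=["a", "b", "c"])

observed = list(cat.unique())    # values that actually occur, in order of appearance
declared = list(cat.categories)  # every declared category, used or not
```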
Series.all and Series.any now support the level and skipna parameters. Series.all, Series.any, Index.all, and Index.any no longer support the out and keepdims parameters, which existed for compatibility with ndarray. Various index types no longer support the all and any aggregation functions and will now raise TypeError. (GH8302).
Allow equality comparisons of Series with a categorical dtype and object dtype; previously these would raise TypeError (GH8938)
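A minimal sketch of such a comparison, assuming two Series of equal length; the names `cat_series` and `obj_series` are hypothetical:

```python
import pandas as pd

cat_series = pd.Series(["a", "b", "c"], dtype="category")
obj_series = pd.Series(["a", "x", "c"], dtype=object)  # plain object dtype

# Elementwise equality; ordering comparisons (<, >) are a separate matter.
matches = cat_series == obj_series
```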
Bug in NDFrame: conflicting attribute/column names now behave consistently between getting and setting. Previously, when both a column and attribute named y existed, data.y would return the attribute, while data.y = z would update the column (GH8994)
In [11]: data = pd.DataFrame({'x': [1, 2, 3]})

In [12]: data.y = 2

In [13]: data['y'] = [2, 4, 6]

In [14]: data
Out[14]:
   x  y
0  1  2
1  2  4
2  3  6

[3 rows x 2 columns]

# this assignment was inconsistent
In [15]: data.y = 5
Old behavior:
In [6]: data.y
Out[6]: 2

In [7]: data['y'].values
Out[7]: array([5, 5, 5])
New behavior:
In [16]: data.y
Out[16]: 5

In [17]: data['y'].values
Out[17]: array([2, 4, 6])
Timestamp('now') is now equivalent to Timestamp.now() in that it returns the local time rather than UTC. Also, Timestamp('today') is now equivalent to Timestamp.today() and both have tz as a possible argument. (GH9000)
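The calls described above might be used as follows. This is a sketch only; the choice of "UTC" as the timezone is an assumption made for illustration.

```python
import pandas as pd

local_now = pd.Timestamp("now")            # naive, local wall-clock time
utc_now = pd.Timestamp.now(tz="UTC")       # timezone-aware
utc_today = pd.Timestamp.today(tz="UTC")   # today() accepts tz as well
```

Passing tz explicitly is the safer choice when the result is compared against timezone-aware data.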
Fix negative step support for label-based slices (GH8753)
Old behavior:

In [1]: s = pd.Series(np.arange(3), ['a', 'b', 'c'])
Out[1]:
a    0
b    1
c    2
dtype: int64

In [2]: s.loc['c':'a':-1]
Out[2]:
c    2
dtype: int64
New behavior:

In [18]: s = pd.Series(np.arange(3), ['a', 'b', 'c'])

In [19]: s.loc['c':'a':-1]
Out[19]:
c    2
b    1
a    0
Length: 3, dtype: int64
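A short sketch of the fixed behavior with a second step size added for comparison; the step of -2 is an extra illustration that does not appear in the notes above.

```python
import numpy as np
import pandas as pd

s = pd.Series(np.arange(3), index=["a", "b", "c"])

reversed_slice = s.loc["c":"a":-1]  # label slices include both endpoints
every_other = s.loc["c":"a":-2]     # walks backwards two labels at a time
```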
Categorical enhancements:
Added ability to export Categorical data to Stata (GH8633). See here for limitations of categorical variables exported to Stata data files.
Added flag order_categoricals to StataReader and read_stata to select whether to order imported categorical data (GH8836). See here for more information on importing categorical variables from Stata data files.
Added ability to write Categorical data to and read it from HDF5 (GH7621). Queries work the same as if it were an object array. However, the category dtyped data is stored in a more efficient manner. See here for an example and caveats w.r.t. prior versions of pandas.
Added support for searchsorted() on Categorical class (GH8420).
Other enhancements:
Added the ability to specify the SQL type of columns when writing a DataFrame to a database (GH8778). For example, specifying to use the sqlalchemy String type instead of the default Text type for string columns:
from sqlalchemy.types import String
data.to_sql('data_dtype', engine, dtype={'Col_1': String})  # noqa F821
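The snippet above relies on an existing `data` frame and a sqlalchemy `engine`. A self-contained variant can be sketched with the standard-library sqlite3 driver instead; note that this swaps in the sqlite3 fallback mode, where dtype values are SQL type strings rather than sqlalchemy types. The table contents and the `VARCHAR(20)` type are assumptions chosen for illustration.

```python
import sqlite3

import pandas as pd

data = pd.DataFrame({"Col_1": ["a", "b"], "Col_2": [1, 2]})

conn = sqlite3.connect(":memory:")
# With a plain sqlite3 connection, the dtype mapping takes SQL type strings.
data.to_sql("data_dtype", conn, index=False, dtype={"Col_1": "VARCHAR(20)"})

# PRAGMA table_info yields (cid, name, type, notnull, default, pk) per column.
declared_types = {
    row[1]: row[2] for row in conn.execute("PRAGMA table_info(data_dtype)")
}
conn.close()
```

Inspecting `declared_types` shows which SQL type was recorded for each column, which is a quick way to confirm that the override took effect.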
Series.all and Series.any now support the level and skipna parameters (GH8302):
In [20]: s = pd.Series([False, True, False], index=[0, 0, 1])

In [21]: s.any(level=0)
Out[21]:
0     True
1    False
Length: 2, dtype: bool
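Newer pandas releases no longer accept level on these reductions (assumption: the reader may be on pandas 2.0 or later). A groupby over the index level gives the same per-label result and also works on older versions; this is an equivalent formulation, not the call introduced by this release.

```python
import pandas as pd

s = pd.Series([False, True, False], index=[0, 0, 1])

# Group rows by their index label, then reduce each group.
any_per_label = s.groupby(level=0).any()
all_per_label = s.groupby(level=0).all()
```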
Panel now supports the all and any aggregation functions. (GH8302):
>>> p = pd.Panel(np.random.rand(2, 5, 4) > 0.1)
>>> p.all()
       0      1      2      3
0   True   True   True   True
1   True  False   True   True
2   True   True   True   True
3  False   True  False   True
4   True   True   True   True
Added support for utcfromtimestamp(), fromtimestamp(), and combine() on Timestamp class (GH5351).
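A brief sketch of two of these constructors, mirroring their `datetime.datetime` counterparts. The date, time, and POSIX value are arbitrary examples; `fromtimestamp` interprets the value in the machine's local timezone, so its result is not pinned down here.

```python
from datetime import date, time

import pandas as pd

# Build a Timestamp from separate date and time objects.
combined = pd.Timestamp.combine(date(2014, 12, 12), time(9, 30))

# Convert a POSIX timestamp (seconds since the epoch) to local time.
from_posix = pd.Timestamp.fromtimestamp(1418376600)
```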
Added Google Analytics (pandas.io.ga) basic documentation (GH8835). See here.
Timedelta arithmetic returns NotImplemented in unknown cases, allowing extensions by custom classes (GH8813).
Timedelta now supports arithmetic with numpy.ndarray objects of the appropriate dtype (numpy 1.8 or newer only) (GH8884).
Added Timedelta.to_timedelta64() method to the public API (GH8884).
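The two Timedelta additions above can be sketched together. The array of hour offsets is an assumed example of an ndarray with an appropriate (timedelta64) dtype.

```python
import numpy as np
import pandas as pd

td = pd.Timedelta("1 day")
as_numpy = td.to_timedelta64()  # numpy.timedelta64 scalar

hours = np.array([1, 2], dtype="timedelta64[h]")
shifted = td + hours  # elementwise addition; the result is a timedelta64 ndarray
```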
Added gbq.generate_bq_schema() function to the gbq module (GH8325).
Series now works with map objects the same way as generators (GH8909).
Added context manager to HDFStore for automatic closing (GH8791).
to_datetime gains an exact keyword; when it is False, the input no longer has to match the provided format string exactly. exact defaults to True, so exact matching remains the default (GH8904)
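A minimal sketch of the difference, assuming an input string that embeds a date inside other text; the string and format are invented for illustration.

```python
import pandas as pd

text = "report 12/12/2014 final"
fmt = "%d/%m/%Y"

# With exact=False the format may match anywhere inside the string.
loose = pd.to_datetime(text, format=fmt, exact=False)

# With the default exact=True the whole string must match the format.
try:
    pd.to_datetime(text, format=fmt)
    strict_failed = False
except ValueError:
    strict_failed = True
```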
Added an axvlines boolean option to the parallel_coordinates plot function that determines whether vertical lines are drawn; the default is True
Added ability to read table footers to read_html (GH8552)
to_sql now infers data types of non-NA values for columns that contain NA values and have dtype object (GH8778).
Reduce memory usage when skiprows is an integer in read_csv (GH8681)
Performance boost for to_datetime conversions when format= is passed together with exact=False (GH8904)
Bug in concat of Series with category dtype which were coercing to object. (GH8641)
Bug in Timestamp-Timestamp not returning a Timedelta type and datelike-datelike ops with timezones (GH8865)
Made the timezone mismatch exception consistent (either a tz operated with None or with an incompatible timezone); it now raises TypeError rather than ValueError (a couple of edge cases only) (GH8865)
Bug in using a pd.Grouper(key=...) with no level/axis or level only (GH8795, GH8866)
Report a TypeError when invalid/no parameters are passed in a groupby (GH8015)
Bug in packaging pandas with py2app/cx_Freeze (GH8602, GH8831)
Bug in groupby signatures that didn’t include *args or **kwargs (GH8733).
io.data.Options now raises RemoteDataError when no expiry dates are available from Yahoo and when it receives no data from Yahoo (GH8761), (GH8783).
Unclear error message in csv parsing when passing dtype and names and the parsed data is a different data type (GH8833)
Bug in slicing a MultiIndex with an empty list and at least one boolean indexer (GH8781)
Timedelta kwargs may now be numpy ints and floats (GH8757).
Fixed several outstanding bugs for Timedelta arithmetic and comparisons (GH8813, GH5963, GH5436).
sql_schema now generates dialect appropriate CREATE TABLE statements (GH8697)
slice string method now takes step into account (GH8754)
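For the slice string method, a step can be sketched as follows; the sample strings are arbitrary.

```python
import pandas as pd

s = pd.Series(["abcdef", "pandas"])

# Take every second character from the start of each string.
every_second = s.str.slice(start=0, stop=None, step=2)
```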
Bug in BlockManager where setting values with different type would break block integrity (GH8850)
Bug in DatetimeIndex when using time object as key (GH8667)
Bug in merge where how='left' and sort=False would not preserve left frame order (GH7331)
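The order-preserving behavior of the fixed merge can be sketched with two small frames. The column names `key`, `lval`, and `rval` are hypothetical.

```python
import pandas as pd

left = pd.DataFrame({"key": ["b", "a", "c"], "lval": [1, 2, 3]})
right = pd.DataFrame({"key": ["a", "b", "c"], "rval": [10, 20, 30]})

# A left join without sorting keeps the rows in the left frame's order.
merged = left.merge(right, on="key", how="left", sort=False)
```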
Bug in MultiIndex.reindex where reindexing at level would not reorder labels (GH4088)
Bug in certain operations with dateutil timezones, manifesting with dateutil 2.3 (GH8639)
Regression in DatetimeIndex iteration with a Fixed/Local offset timezone (GH8890)
Bug in to_datetime when parsing nanoseconds using the %f format (GH8989)
Fix: The font size was only set on x axis if vertical or the y axis if horizontal. (GH8765)
Fixed division by 0 when reading big csv files in python 3 (GH8621)
Bug in outputting a MultiIndex using to_html with index=False, which would add an extra column (GH8452)
Imported categorical variables from Stata files retain the ordinal information in the underlying data (GH8836).
Defined the .size attribute across NDFrame objects to provide compatibility with numpy >= 1.9.1, where np.array_split was previously buggy (GH8846)
Skip testing of histogram plots for matplotlib <= 1.2 (GH8648).
Bug where get_data_google returned object dtypes (GH3995)
Bug in DataFrame.stack(..., dropna=False) when the DataFrame’s columns is a MultiIndex whose labels do not reference all its levels. (GH8844)
Fixed the option context so that it is applied on __enter__ (GH8514)
Bug in resample that causes a ValueError when resampling across multiple days and the last offset is not calculated from the start of the range (GH8683)
Bug where DataFrame.plot(kind='scatter') fails when checking if an np.array is in the DataFrame (GH8852)
Bug in pd.infer_freq/DataFrame.inferred_freq that prevented proper sub-daily frequency inference when the index contained DST days (GH8772).
Bug where index name was still used when plotting a series with use_index=False (GH8558).
Bugs when trying to stack multiple columns, when some (or all) of the level names are numbers (GH8584).
Bug in MultiIndex where __contains__ returns wrong result if index is not lexically sorted or unique (GH7724)
Bug in CSV parsing with trailing whitespace in skipped rows (GH8679, GH8661, GH8983)
Regression where Timestamp did not parse the ‘Z’ zone designator for UTC (GH8771)
Bug in StataWriter that wrote strings with 244 characters irrespective of their actual size (GH8969)
Fixed ValueError raised by cummin/cummax when datetime64 Series contains NaT. (GH8965)
Bug where DataReader returned object dtype if there were missing values (GH8980)
Bug in plotting if sharex was enabled and index was a timeseries, would show labels on multiple axes (GH3964).
Bug where passing a unit to the TimedeltaIndex constructor applied the conversion to nanoseconds twice (GH9011).
Bug in plotting of a period-like array (GH9012)
A total of 49 people contributed patches to this release. People with a “+” by their names contributed a patch for the first time.
Aaron Staple
Angelos Evripiotis +
Artemy Kolchinsky
Benoit Pointet +
Brian Jacobowski +
Charalampos Papaloizou +
Chris Warth +
David Stephens
Fabio Zanini +
Francesc Via +
Henry Kleynhans +
Jake VanderPlas +
Jan Schulz
Jeff Reback
Jeff Tratner
Joris Van den Bossche
Kevin Sheppard
Matt Suggit +
Matthew Brett
Phillip Cloud
Rupert Thompson +
Scott E Lasley +
Stephan Hoyer
Stephen Simmons +
Sylvain Corlay +
Thomas Grainger +
Tiago Antao +
Tom Augspurger
Trent Hauck
Victor Chaves +
Victor Salgado +
Vikram Bhandoh +
WANG Aiyong
Will Holmgren +
behzad nouri
broessli +
charalampos papaloizou +
immerrr
jnmclarty
jreback
mgilbert +
onesandzeroes
peadarcoyle +
rockg
seth-p
sinhrks
unutbu
wavedatalab +
Åsmund Hjulstad +