closes #14930
Author: Jeff Reback <jeff@reback.net>
Closes #14933 from jreback/perf and squashes the following commits:
dc32b39 [Jeff Reback] PERF: fix getitem unique_check / initialization issue
(cherry picked from commit 07c83eedba)
Errors other than IndexingError and KeyError now bubble up appropriately.
closes #14554
Author: Chris Ham <chris@christopher-ham.com>
Closes #14912 from clham/gh14554-b and squashes the following commits:
458c0cc [Chris Ham] CLN: Resubmit of GH14700. Fixes GH14554. Errors other than IndexingError and KeyError now bubble up appropriately.
(cherry picked from commit 3ccb50131b)
closes #14894
Fix usage of fast_multiget with an index, which was always throwing an
exception that was then caught; add an ASV benchmark that shows a slight
improvement
Author: Nate Yoder <nate@whistle.com>
Closes #14895 from nateyoder/series_dict_index and squashes the following commits:
56be091 [Nate Yoder] Update whatsnew and fix pep8 issue
5f05fdc [Nate Yoder] Fix usage of fast_multiget with index which was always throwing an exception that was then caught; add ASV that show slight improvement
(cherry picked from commit e503d40ace)
closes #14872
Author: Rodolfo Fernandez <opensourceworkAR@users.noreply.github.com>
Closes #14905 from RodolfoRFR/pandas-14872-e and squashes the following commits:
18802b4 [Rodolfo Fernandez] added 'self' to test_dtype_utc function in pandas/tests/series/test_missing
e0c6c7c [Rodolfo Fernandez] added line to whatsnew v0.19.2 and test to test_missing.py in series folder
e4ba7e0 [Rodolfo Fernandez] removed all references to _DATELIKE_DTYPES from /pandas/core/missing.py
5d37ce8 [Rodolfo Fernandez] added is_datetime64tz_dtype and changed evaluation from 'values' to dtype
19eecb2 [Rodolfo Fernandez] fixed style errors using flake8
59b91a1 [Rodolfo Fernandez] test modified
5a59eac [Rodolfo Fernandez] test modified
bc68bf7 [Rodolfo Fernandez] test modified
ba83fc8 [Rodolfo Fernandez] test
b7358de [Rodolfo Fernandez] bug fixed
(cherry picked from commit f3c5a427cc)
Patches the following behaviour when `na_values` is passed in as a
dictionary:
1. Prevent aliasing in case `na_values` was defined in a broader scope.
2. Respect column indices as keys when doing NA conversions.
Closes #14203.
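As a rough illustration of the two points above, the sketch below builds a fresh mapping instead of modifying the caller's dict, and resolves integer keys to column names by position. The helper `normalize_na_values` and its arguments are hypothetical names chosen for this example; this is a simplified model, not the parser's actual code.

```python
def normalize_na_values(na_values, columns):
    """Return a new dict keyed by column name (hypothetical helper)."""
    normalized = {}
    for key, values in na_values.items():
        # Treat an integer key that is not itself a column label as a position.
        if isinstance(key, int) and key not in columns:
            key = columns[key]
        # Build a fresh set so the caller's containers are never mutated.
        normalized[key] = set(values)
    return normalized


columns = ["a", "b"]
user_na_values = {0: ["-999"], "b": ["missing"]}
result = normalize_na_values(user_na_values, columns)
```

Because the function only reads from `user_na_values`, a dict defined in a broader scope can be reused for later calls without carrying over state from an earlier one.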
Author: gfyoung <gfyoung17@gmail.com>
Closes #14751 from gfyoung/csv-na-values-patching and squashes the following commits:
cac422c [gfyoung] BUG: Respect column indices for dict-like na_values
1439c27 [gfyoung] BUG: Prevent aliasing of dict na_values
(cherry picked from commit dd8cba2767)
closes #13936
Author: Christopher C. Aycock <christopher.aycock@twosigma.com>
Closes #14783 from chrisaycock/GH13936 and squashes the following commits:
ffcf0c2 [Christopher C. Aycock] Added test to reject float16; fixed typos
1f208a8 [Christopher C. Aycock] Use tuple representation instead of strings
77eb47b [Christopher C. Aycock] Merge master branch into GH13936
89256f0 [Christopher C. Aycock] Test 8-bit integers and raise error on 16-bit floats; add comments
0ad1687 [Christopher C. Aycock] Fixed whatsnew
2bce3cc [Christopher C. Aycock] Revert dict back to PyObjectHashTable in response to code review
fafbb02 [Christopher C. Aycock] Updated benchmarks to reflect new ASV setup
5eeb7d9 [Christopher C. Aycock] Merge master into GH13936
c33c4cb [Christopher C. Aycock] Merge branch 'master' into GH13936
46cc309 [Christopher C. Aycock] Update documentation
f01142c [Christopher C. Aycock] Merge master branch
75157fc [Christopher C. Aycock] merge_asof() has type specializations and can take multiple 'by' parameters (#13936)
(cherry picked from commit e7df7516ff)
BUG: Fixed KDE plot to ignore missing values
closes #14821
* fixed kde plot to ignore the missing values
* added comment to elaborate the changes made
* added a release note in whatsnew/0.19.2
* added test to check for missing values and cleaned up whatsnew doc
* added comment to refer the issue
* modified to fit lint checks
* replaced ._xorig with .get_xdata()
(cherry picked from commit 033d34596f)
closes #10381
Author: Pietro Battiston <me@pietrobattiston.it>
Closes #14812 from toobaz/to_hdf_min_itemsize and squashes the following commits:
c07f1e4 [Pietro Battiston] Whatsnew
38b8fcc [Pietro Battiston] Tests for previous commit
c838afa [Pietro Battiston] BUG: set min_itemsize even when there is no need to validate (#10381)
(cherry picked from commit e833096244)
closes #14844
Author: Christopher C. Aycock <christopher.aycock@twosigma.com>
Closes #14845 from chrisaycock/GH14844 and squashes the following commits:
97b73a8 [Christopher C. Aycock] BUG: Allow TZ-aware DatetimeIndex in merge_asof() (#14844)
(cherry picked from commit e991141f3c)
closes #11847
Changed the way in which the original data frame is copied (dropped
use of .values, since it does not preserve dtypes).
Author: Pawel Kordek <pawel.kordek@gmail.com>
Closes #14053 from kordek/#11847 and squashes the following commits:
6a381ce [Pawel Kordek] BUG: GH11847 Unstack with mixed dtypes coerces everything to object
(cherry picked from commit d531718749)
* BUG: we don't like hash collisions in siphash
xref #14767
* This should be a 64-bit int, not an 8-bit int
* fix tests
(cherry picked from commit 51f725f7e8)
closes #7626
Subsets of tabular files with different "shapes" will now load when a
valid skiprows/nrows is given as an argument.
Conditions for error:
1) There are different "shapes" within a tabular data file, i.e.
different numbers of columns.
2) A "narrower" set of columns is followed by a "wider" (more columns)
one, and the narrower set is laid out such that the end of a
262144-byte block occurs within it.
Issue summary: The C engine for parsing files reads in 262144 bytes at
a time. Previously, the "start_lines" variable in
tokenizer.c/tokenize_bytes() was set incorrectly to the first line in
that chunk, rather than the overall first row requested. This led to
incorrect logic on when to stop reading when nrows is supplied by the
user. This always happened but only caused a crash when a wider set of
columns followed in the file. In other cases, extra rows were read in
but then harmlessly discarded. This pull request always uses the first
requested row for comparisons, so only nrows will be parsed when
supplied.
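The stopping logic described above can be sketched with a toy model in Python. This is not the C tokenizer: `read_rows`, `chunk_size`, and `use_chunk_start` are hypothetical names, and chunks are counted in lines rather than bytes to keep the example small.

```python
def read_rows(lines, nrows, chunk_size, use_chunk_start):
    """Toy model of chunked tokenizing; not the C tokenizer itself."""
    parsed = []
    first_row = 0  # overall first row requested
    for offset in range(0, len(lines), chunk_size):
        chunk = lines[offset:offset + chunk_size]
        # The old behaviour reset the baseline at every chunk boundary.
        start_lines = len(parsed) if use_chunk_start else first_row
        for line in chunk:
            if len(parsed) - start_lines >= nrows:
                return parsed
            parsed.append(line.split(","))
    return parsed


# Three narrow rows followed by two wider ones.
lines = ["1,2", "3,4", "5,6", "7,8,9", "10,11,12"]
buggy = read_rows(lines, nrows=3, chunk_size=2, use_chunk_start=True)
fixed = read_rows(lines, nrows=3, chunk_size=2, use_chunk_start=False)
```

In this model the per-chunk baseline lets parsing run past `nrows` into the wider rows, while comparing against the first requested row stops after exactly three rows.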
Author: Jeff Carey <jeff.carey@gmail.com>
Closes #14747 from jeffcarey/fix/7626 and squashes the following commits:
cac1bac [Jeff Carey] Removed duplicative test
6f1965a [Jeff Carey] BUG: Corrects stopping logic when nrows argument is supplied (Fixes #7626)
(cherry picked from commit 4378f82967)
Conflicts:
pandas/io/tests/parser/c_parser_only.py
closes #11412
Author: Pietro Battiston <me@pietrobattiston.it>
Closes #14728 from toobaz/minitemsizefix and squashes the following commits:
e25cd1f [Pietro Battiston] Whatsnew
b9bb88f [Pietro Battiston] Tests for previous commit
6406ee8 [Pietro Battiston] BUG: Ensure min_itemsize is always a list
(cherry picked from commit 53bf1b27c7)
closes #14776
Author: Jeff Reback <jeff@reback.net>
Closes #14777 from jreback/mi_sort and squashes the following commits:
cf31905 [Jeff Reback] BUG: Bug in a groupby of a non-lexsorted MultiIndex and multiple grouping levels
(cherry picked from commit f23010aa93)
closes #14435
Author: Chris <cbartak@gmail.com>
Closes #14791 from chris-b1/hdf-mi-datacolumns and squashes the following commits:
5d32610 [Chris] BUG: multi-index HDFStore data_columns=True
(cherry picked from commit 27fcd811f5)
When .replace is called with a `dict`, replacements are done per
value. The current implementation tries to soft-convert the dtype
after every replacement, but it is enough to do this once, after the
final replacement.
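The idea can be sketched in plain Python, with a list standing in for the underlying values. `soft_convert` and `replace_all` are hypothetical stand-ins for illustration only; they are not the pandas internals, and the counter exists only to show how often a conversion is attempted.

```python
def soft_convert(values, counter):
    """Stand-in for dtype inference: use ints if every value allows it."""
    counter[0] += 1
    try:
        return [int(v) for v in values]
    except (TypeError, ValueError):
        return values


def replace_all(values, mapping, convert_each_step):
    counter = [0]  # number of conversion attempts
    for old, new in mapping.items():
        values = [new if v == old else v for v in values]
        if convert_each_step:
            values = soft_convert(values, counter)
    if not convert_each_step:
        # Convert once, after the final replacement.
        values = soft_convert(values, counter)
    return values, counter[0]


mapping = {"a": 1, "b": 2, "c": 3}
eager = replace_all(["a", "b", "c"], mapping, convert_each_step=True)
lazy = replace_all(["a", "b", "c"], mapping, convert_each_step=False)
```

Both variants end with the same values here, but the second attempts the conversion once instead of once per key.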
Author: sinhrks <sinhrks@gmail.com>
Closes #12745 from sinhrks/replace_perf and squashes the following commits:
ffc59b0 [sinhrks] PERF: Improve replace perf
(cherry picked from commit e299560dff)
xref #14729
Author: Jeff Reback <jeff@reback.net>
Closes #14767 from jreback/hashing_object and squashes the following commits:
9a5a5d4 [Jeff Reback] ERR: raise on python in object hashing, only supporting strings, nulls
(cherry picked from commit de1132d878)
Python's native CSV library does not respect the
skipfooter parameter, so if one of those skipped
rows is malformed, it will still raise an error.
Closes gh-13879.
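One way to picture the problem is to drop the footer lines before the standard library's `csv.reader` ever sees them. The sketch below does that. `read_skipping_footer` is a hypothetical helper and is not necessarily how the fix is implemented; the footer row with an unbalanced quote is made-up sample data.

```python
import csv


def read_skipping_footer(lines, skipfooter):
    """Drop footer lines before csv.reader ever sees them."""
    body = lines[:len(lines) - skipfooter] if skipfooter else lines
    return list(csv.reader(body, strict=True))


# The last line is a footer with an unbalanced quote.
lines = ['a,b', '1,2', '3,4', 'total,"unterminated']
rows = read_skipping_footer(lines, skipfooter=1)
```

Since the malformed footer is removed up front, only the well-formed rows are handed to the reader.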
(cherry picked from commit dfeae396c8)
If there is a field counts mismatch, check whether
a multi-char sep was used in conjunction with quotes.
Currently, that setup is not respected and can result
in improper line breaks.
Closes gh-13374.
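A minimal sketch of why this setup misbehaves, assuming the multi-char separator is applied as a regular-expression split that knows nothing about quoting. `diagnose` and its message text are hypothetical and only illustrate the kind of check described above.

```python
import re


def diagnose(line, sep, expected_fields):
    """Split on a multi-char separator and flag a likely quoting problem."""
    fields = re.split(re.escape(sep), line)
    if len(fields) != expected_fields and '"' in line:
        # Hypothetical hint; the real error message may differ.
        return fields, "field count mismatch: multi-char sep ignores quotes"
    return fields, None


# The quoted field contains the separator, so the split breaks it in two.
fields, hint = diagnose('a::"b::c"::d', sep="::", expected_fields=3)
```

The quoted value `"b::c"` is split into two pieces, so four fields come back where three were expected.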
(cherry picked from commit d8e427bda0)