Python - pandas DataFrame very slow when initializing with a NumPy structured array
I have a NumPy structured array that contains integers and floats, and I use it to initialize a pandas DataFrame:
    In [497]: x = np.ones(100000000, dtype=[('f0', '<i8'), ('f1', '<f8'), ('f2', '<i8'), ('f3', '<f8'), ('f4', '<f8'), ('f5', '<f8'), ('f6', '<f8'), ('f7', '<f8')])

    In [498]: %timeit pd.DataFrame(x)
    The slowest run took 4.07 times longer than the fastest. This could mean that an intermediate result is being cached
    1 loops, best of 3: 2min 26s per loop

    In [499]: xx = x.view(np.float64).reshape(x.shape + (-1,))

    In [500]: %timeit pd.DataFrame(xx)
    1 loops, best of 3: 256 ms per loop
As the code above shows, initializing the DataFrame from the structured array is very slow (about 2 min 26 s per loop). However, if I change the data to a contiguous float NumPy array, it is fast (256 ms). But I still need the DataFrame to hold a mixture of floats and integers.
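A workaround that is often suggested for this is to pass pandas a dict of the individual fields instead of the structured array itself. It is sketched here as an assumption rather than a guaranteed fix: whether it is faster depends on the pandas version, so it is worth timing on the real data. The row count is reduced to 1,000 (also an assumption) so the sketch runs quickly; the int64 fields stay int64 and the float64 fields stay float64.

```python
import numpy as np
import pandas as pd

# Same layout as the question: int64 and float64 fields interleaved.
# Assumption: a much smaller row count than the question's 100,000,000.
dtype = [('f0', '<i8'), ('f1', '<f8'), ('f2', '<i8'), ('f3', '<f8'),
         ('f4', '<f8'), ('f5', '<f8'), ('f6', '<f8'), ('f7', '<f8')]
x = np.ones(1_000, dtype=dtype)

# Build the frame from a dict of per-field arrays instead of passing the
# structured array directly. Each x[name] is a (strided) view into x, and
# each resulting column keeps that field's own dtype.
# Assumption: this may be faster than pd.DataFrame(x) on some pandas versions; time it.
df = pd.DataFrame({name: x[name] for name in x.dtype.names})

print(df.dtypes)
```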
After more tests, I realized that the DataFrame constructor copies the whole structured array (this does not happen when the float view of the structured array is used for initialization). I found more information here: https://github.com/pydata/pandas/issues/9216
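Part of the difference can be seen at the NumPy level. The sketch below (row count reduced to 1,000 as an assumption) checks that the float view `xx` from the session above and a single field such as `x['f0']` are both views into the same buffer as `x`, but the field view is strided: successive values are one 64-byte record apart rather than 8 bytes. As I understand pandas' block-based internals (an assumption, not something stated in the linked issue), columns of the same dtype are stored together in 2-D blocks, so a homogeneous float array can be wrapped as one block, while interleaved int64 and float64 fields end up being copied into separate per-dtype blocks.

```python
import numpy as np

dtype = [('f0', '<i8'), ('f1', '<f8'), ('f2', '<i8'), ('f3', '<f8'),
         ('f4', '<f8'), ('f5', '<f8'), ('f6', '<f8'), ('f7', '<f8')]
x = np.ones(1_000, dtype=dtype)  # assumption: small size for illustration

# The float view from the question: one homogeneous 2-D array over x's buffer.
xx = x.view(np.float64).reshape(x.shape + (-1,))

# A single field is also a view, but a strided one: consecutive values are a
# whole record (8 fields * 8 bytes = 64 bytes) apart, so it is not contiguous.
f0 = x['f0']

print(np.shares_memory(xx, x), xx.shape, xx.strides)
print(np.shares_memory(f0, x), f0.strides, f0.flags['C_CONTIGUOUS'])
```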
Is there any way to speed up the initialization and avoid the copy? I am open to alternative methods, but the data comes from a structured array.
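One alternative that stays within the "data comes from a structured array" constraint is to split the fields by dtype, turn each group into an ordinary 2-D array with `numpy.lib.recfunctions.structured_to_unstructured` (available since NumPy 1.16), build one DataFrame per group, and concatenate them. The helper name `frame_from_structured` is hypothetical, and this does not avoid copying: there is one copy per dtype group, and possibly another when the columns are put back in order. The only assumption is that a few homogeneous 2-D inputs go through a faster constructor path than the structured array does, which should be checked with `%timeit` on the full 100,000,000-row array.

```python
import numpy as np
import pandas as pd
from numpy.lib import recfunctions as rfn


def frame_from_structured(arr):
    """Hypothetical helper: build a DataFrame from a structured array by
    grouping fields of the same dtype into one homogeneous 2-D array each.
    Assumption: may be faster than pd.DataFrame(arr); it still copies data."""
    groups = {}
    for name in arr.dtype.names:
        groups.setdefault(arr.dtype[name], []).append(name)

    parts = []
    for dt, names in groups.items():
        # Multi-field indexing selects the group's fields; structured_to_unstructured
        # turns them into a plain (n_rows, n_fields) array of dtype dt.
        block = rfn.structured_to_unstructured(arr[names], dtype=dt)
        parts.append(pd.DataFrame(block, columns=names))

    # Put the per-dtype frames side by side and restore the original field order.
    return pd.concat(parts, axis=1)[list(arr.dtype.names)]


# Usage with the question's layout (assumption: reduced row count).
dtype = [('f0', '<i8'), ('f1', '<f8'), ('f2', '<i8'), ('f3', '<f8'),
         ('f4', '<f8'), ('f5', '<f8'), ('f6', '<f8'), ('f7', '<f8')]
x = np.ones(1_000, dtype=dtype)
df = frame_from_structured(x)
print(df.dtypes)
```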