python - pandas DataFrame very slow when initializing with numpy structured array

- April 15, 2014

I have a numpy structured array that holds integers and floats, which I use to initialize a pandas DataFrame:

    In [497]: x = np.ones(100000000, dtype=[('f0', '<i8'), ('f1', '<f8'), ('f2', '<i8'), ('f3', '<f8'), ('f4', '<f8'), ('f5', '<f8'), ('f6', '<f8'), ('f7', '<f8')])

    In [498]: %timeit pd.DataFrame(x)
    The slowest run took 4.07 times longer than the fastest. This could mean that an intermediate result is being cached
    1 loops, best of 3: 2min 26s per loop

    In [499]: xx = x.view(np.float64).reshape(x.shape + (-1,))

    In [500]: %timeit pd.DataFrame(xx)
    1 loops, best of 3: 256 ms per loop

As can be seen from the code above, initializing a DataFrame from a structured array is very slow (2 min 26 s versus 256 ms). However, if I change the data to a contiguous float numpy array, it is fast. I still need the DataFrame to hold a mixture of floats and integers, though.
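To illustrate why the float view is not a drop-in replacement: `view` reinterprets the raw bytes of each field instead of converting values, so the integer fields come out as meaningless floats. The following is a minimal sketch, assuming a 5-row array and a shortened four-field dtype chosen only for illustration (the variable names mirror the session above).

```python
import numpy as np

# Small stand-in for the large array above (sizes and fields are illustrative).
dtype = [('f0', '<i8'), ('f1', '<f8'), ('f2', '<i8'), ('f3', '<f8')]
x = np.ones(5, dtype=dtype)

# Same trick as in the session: reinterpret every 8-byte field as float64.
xx = x.view(np.float64).reshape(x.shape + (-1,))

print(xx.shape)   # one column per 8-byte field
print(xx[0])      # float fields read as 1.0; integer fields do not

# The integer 1 stored in 'f0' has a bit pattern that, read as float64,
# is a tiny subnormal number rather than 1.0.
int_field_as_float = xx[0, 0]
float_field = xx[0, 1]
```

The view works at all only because every field here is 8 bytes wide; it is fast because no data is converted, which is also why the integer columns are lost.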

After some more tests, I realized that the DataFrame is copying the whole structured array (this does not occur when the float view of the structured array is used for initialization). I found some more info here: https://github.com/pydata/pandas/issues/9216

Is there any way to speed up the initialization and avoid the copying? I am open to alternate methods, but the data is coming from a structured array.
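One alternative that could be tried is to hand pandas one plain array per field instead of the structured array as a whole. This is only a sketch under assumptions: the array is shrunk to 1000 rows and four fields for illustration, it has not been timed against the case above, and whether pandas still copies each column depends on the pandas version.

```python
import numpy as np
import pandas as pd

# Small stand-in for the real data (size and fields are illustrative).
dtype = [('f0', '<i8'), ('f1', '<f8'), ('f2', '<i8'), ('f3', '<f8')]
x = np.ones(1000, dtype=dtype)

# x[name] is a strided view of a single field, so each column keeps
# its own dtype (int64 or float64) instead of being coerced.
columns = {name: x[name] for name in x.dtype.names}
df = pd.DataFrame(columns)

print(df.dtypes)
```

Each `x[name]` is a non-contiguous view into the original buffer, so pandas may still copy the data when it builds its internal blocks; the point of the sketch is only that the per-field dtypes survive.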
