python - cPickle performance mystery
I have Python code with a continuous loop that does a pickle load inside. The loop goes over 200 pickle files of 80 MB each, stored on an SSD drive.
When I ran the code, I found that the performance of the pickle load fluctuates continuously: at times it takes 0.2 s, at other times it "pauses" for 4-6 s, which degrades the overall benchmark of the process.
What could be the problem?
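    import cPickle
    from time import time

    def unpickle(filename):
        # Binary mode, because the files are written with a binary protocol.
        fo = open(filename, 'rb')
        contents = cPickle.load(fo)
        fo.close()
        return contents

    # Excerpt from the main loop; self.x holds the pickle file names.
    for xd in self.x:
        tt = time()
        xdf = unpickle(xd)
        tt = time() - tt
        print tt

Output: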
    1.87527704239
    4.30886101723
    0.259668111801
    0.234542131424
    0.228765964508
    0.214528799057
    0.213661909103
    0.215914011002
    0.217473983765
    0.225739002228

The way I created the pickle files: I have a pandas DataFrame with the columns 'name', 'source', 'level', 'image', 'path', 'is_train'. The main data in terms of size is 'image'. I pickle it with:
    def pickle(filename, data):
        # Binary mode, because HIGHEST_PROTOCOL is a binary format.
        with open(filename, 'wb') as fo:
            cPickle.dump(data, fo, protocol=cPickle.HIGHEST_PROTOCOL)
Your question is terribly unclear (in particular, you should be giving enough information for us to reproduce the test case ourselves), but it feels like GC pauses or memory defragmentation.
Pickle is a terribly inefficient format, and processing 16 gigabytes' worth of it (200 files of 80 MB each) is bound to cause serious memory thrashing.
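
One way to check the GC-pause guess is to switch off Python's cyclic garbage collector for the duration of each load and see whether the 4-6 s spikes go away. The following is only a minimal sketch under that assumption, not a confirmed fix: the names unpickle_without_gc and time_loads are made up for illustration, and it does nothing about memory thrashing if the machine is simply short of RAM.

```python
import gc
import time

try:
    import cPickle as pickle  # Python 2, as in the question
except ImportError:
    import pickle  # Python 3: the C implementation is used automatically


def unpickle_without_gc(filename):
    """Load one pickle file with the cyclic garbage collector paused.

    Hypothetical helper: it only tests the assumption that collector
    runs triggered by allocating many objects cause the long pauses.
    """
    was_enabled = gc.isenabled()
    gc.disable()
    try:
        with open(filename, 'rb') as fo:
            return pickle.load(fo)
    finally:
        # Restore the collector only if it was on before the call.
        if was_enabled:
            gc.enable()


def time_loads(filenames, loader):
    """Return the wall-clock seconds each load takes, in order."""
    timings = []
    for name in filenames:
        start = time.time()
        loader(name)
        timings.append(time.time() - start)
    return timings
```

Running time_loads over the same file list twice, once with the original unpickle and once with unpickle_without_gc, gives two lists of timings to compare. If the spikes stay, the collector is probably not the cause and memory pressure is the more likely suspect.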