python - cPickle performance mystery


I have Python code with a continuous loop that does a pickle load inside it. The loop goes over 200 pickle files, each about 80 MB, all on an SSD drive.

When I ran the code, I noticed that the performance of the pickle load fluctuates continuously: a load usually takes about 0.2 s, but at times it "pauses" for 4-6 s, which drags down the overall benchmark of the process.

What could be the problem?

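import cPickle
from time import time

def unpickle(filename):
    # Binary mode is required for pickles written with a binary protocol.
    fo = open(filename, 'rb')
    contents = cPickle.load(fo)
    fo.close()
    return contents

# Inside a method of the class that holds the list of file names in self.x:
for xd in self.x:
    tt = time()
    xdf = unpickle(xd)
    tt = time() - tt
    print tt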

Output (seconds per load):

1.87527704239
4.30886101723
0.259668111801
0.234542131424
0.228765964508
0.214528799057
0.213661909103
0.215914011002
0.217473983765
0.225739002228

This is how I created the pickle files: I have a pandas DataFrame with the columns 'name', 'source', 'level', 'image', 'path', 'is_train'. In terms of size, the main data is in 'image'. I pickle it with:

def pickle(filename, data):
    # HIGHEST_PROTOCOL is a binary protocol, so the file is opened in binary mode.
    with open(filename, 'wb') as fo:
        cPickle.dump(data, fo, protocol=cPickle.HIGHEST_PROTOCOL)

Your question is terribly unclear (in particular, you should be giving enough information for us to reproduce the test case ourselves), but this feels like GC pauses or memory defragmentation.

Pickle is a terribly inefficient format, and processing 16 gigabytes' worth of it (200 files of 80 MB each) is bound to cause serious memory thrashing.
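One way to check the GC-pause guess is to load the same files with the cyclic garbage collector switched off and compare the timings. The following is only a minimal sketch under that assumption: `unpickle_no_gc` and `time_load` are hypothetical helper names that do not appear in the question, and the import fallback is there so the snippet runs on both Python 2 (`cPickle`) and Python 3 (`pickle`).

```python
import gc
import time

try:
    import cPickle as pickle  # Python 2, as used in the question
except ImportError:
    import pickle  # Python 3: the C implementation is picked up automatically


def unpickle_no_gc(filename):
    """Load a pickle with the cyclic garbage collector paused (hypothetical helper)."""
    was_enabled = gc.isenabled()
    gc.disable()
    try:
        with open(filename, 'rb') as fo:
            return pickle.load(fo)
    finally:
        # Restore the collector only if it was running before the call.
        if was_enabled:
            gc.enable()


def time_load(loader, filename):
    """Return (elapsed_seconds, loaded_object) for one call of loader(filename)."""
    start = time.time()
    contents = loader(filename)
    return time.time() - start, contents
```

If the 4-6 s spikes disappear when the files are loaded through `unpickle_no_gc`, collector runs triggered by the many allocations during loading are a likely cause. If they remain, the memory pressure explanation becomes more plausible. Note that `gc.disable()` only pauses the cycle collector; reference counting still frees objects as usual.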

