Wednesday, August 18, 2010

Which one is better: ZODB, Pickle, or Shelve?

I have worked with all three for my requirement.
I have a very huge data file, minimum 10GB, containing user records with a payments field. User IDs are repeated: there might be 1k records with the same user ID but different payments in the data file, so I have to sum over this file so that each user ID ends up with the total of its payments.

I tried three approaches here to do my work:

ZODB Approach:
I thought ZODB might help me do this operation: I store the user ID as the key and the payment total as the value, and update it in a loop. But this only works for small files; for bigger files it hangs the system, forcing Python to consume all the memory and CPU. It seems impossible to do any other operation while the ZODB program runs.
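
Roughly, the ZODB version looked like this. This is a minimal sketch, assuming the input is a plain text file with one "user_id,payment" pair per line; the file names and batch size are placeholders, not my real ones:

    import transaction
    from ZODB import DB
    from ZODB.FileStorage import FileStorage
    from BTrees.OOBTree import OOBTree

    storage = FileStorage('sums.fs')
    db = DB(storage)
    conn = db.open()
    root = conn.root()
    if 'sums' not in root:
        root['sums'] = OOBTree()   # scalable persistent mapping
    sums = root['sums']

    with open('payments.txt') as f:
        for n, line in enumerate(f, 1):
            user_id, payment = line.strip().split(',')
            sums[user_id] = sums.get(user_id, 0.0) + float(payment)
            if n % 100000 == 0:
                # commit in batches: ZODB keeps every modified object
                # in memory until a commit
                transaction.commit()

    transaction.commit()
    db.close()

Committing in batches matters here, because until a commit ZODB has to hold all the dirty objects in memory, which matches the runaway memory and CPU use I saw.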

Pickle Approach: From what I googled about Pickle/cPickle, it is good for object serialization, so I thought I'd try it for my work. I implemented the same algorithm (the ZODB key-value approach), but Pickle ended with a MemoryError. My system has 40GB of RAM and 100GB of swap, and I found that pickling was taking 98% memory and 40% CPU on average to do my work.
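
For reference, the pickle version was essentially the following (cPickle on Python 2, pickle on Python 3). The whole dict lives in RAM and is only serialized at the end, which is where the MemoryError comes from. File names are placeholders again:

    try:
        import cPickle as pickle   # Python 2
    except ImportError:
        import pickle              # Python 3

    sums = {}
    with open('payments.txt') as f:
        for line in f:
            user_id, payment = line.strip().split(',')
            sums[user_id] = sums.get(user_id, 0.0) + float(payment)

    # nothing touches disk until this point
    with open('sums.pkl', 'wb') as out:
        pickle.dump(sums, out, pickle.HIGHEST_PROTOCOL)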

Shelve Approach: Shelve uses Pickle underneath, but it did my job well; I don't know exactly why Shelve works when Pickle does not. I followed the same algorithm (the ZODB key-value approach). The only drawback I found is that while I/O to the shelve file is going on, it slows down other I/O on the machine. It took 3-4% memory and 1% CPU on average.
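
The shelve version follows the same loop. A likely reason it works where plain pickle fails: shelve pickles each value separately into an on-disk dbm database, so only the record currently being updated has to be in memory. A minimal sketch, with placeholder file names:

    import shelve

    sums = shelve.open('sums.db')   # values are pickled individually to disk
    with open('payments.txt') as f:
        for line in f:
            user_id, payment = line.strip().split(',')
            sums[user_id] = sums.get(user_id, 0.0) + float(payment)
    sums.close()

That constant disk access on every update would also explain why it slowed down the other I/O on my machine.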

So, my choice is Shelve for bigger files, but ZODB/Pickle rock for data files smaller than 4GB.
