Tuesday, August 11, 2009

Python cPickle

Your Basic Pickle
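Serialization, also called pickling or flattening, converts a Python object into a stream of data that can be saved to a file or sent elsewhere, then later rebuilt into an equivalent object. Built-in structures such as lists, tuples, and dictionaries are written out element by element. Functions and classes, by contrast, are stored by reference (their module and name), so they must still be importable when the data is loaded. By default, pickle uses its original ASCII-based format (protocol 0). The pickle data format is standardized, so data serialized with pickle can be deserialized with cPickle and vice versa.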
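To see the shared format in action, here's a small sketch that pickles a dictionary with one module and unpickles it with the other. It then shows that a function comes back as the very same object, because only its module and name were stored. It assumes Python 2 for the cPickle import; on Python 3, where cPickle no longer exists, it falls back to the standard pickle module, so both names point at the same code there.

```python
import os.path
import pickle

try:
    import cPickle  # Python 2 only: the C implementation
except ImportError:
    cPickle = pickle  # Python 3: pickle already uses its C accelerator when available

fridge = {"condiments": ["ketchup", "mustard"], "shelf": (1, 2)}

# Pickle with one module, unpickle with the other: the format is shared
data = pickle.dumps(fridge, 0)  # protocol 0, the ASCII format
print(cPickle.loads(data) == fridge)  # True
print(pickle.loads(cPickle.dumps(fridge, 0)) == fridge)  # True

# A function is stored by reference (module + name), not by value,
# so unpickling returns the very same function object
restored_join = pickle.loads(pickle.dumps(os.path.join))
print(restored_join is os.path.join)  # True
```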

The main difference between cPickle and pickle is performance. The cPickle module is written in C, which makes it many times faster than the pure-Python pickle module. The trade-off is that cPickle's Pickler() and Unpickler() are functions rather than classes, so they can't be subclassed to extend or customize pickling, whereas the classes in pickle can.
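To illustrate the customization point, here's a minimal sketch that subclasses pickle.Pickler and pickle.Unpickler. This works with the pure-Python classes in Python 2 and with Python 3's pickle classes, which are C-backed but still subclassable. It overrides persistent_id and persistent_load so that objects of a hypothetical Jar class are stored by label and looked up again in a hypothetical PANTRY dictionary instead of being copied into the pickle. Jar, PANTRY, PantryPickler, and PantryUnpickler are illustration names only.

```python
import io
import pickle  # subclassable in Python 2 (pure Python) and Python 3


class Jar(object):
    """Hypothetical object kept outside the pickle and looked up by its label."""

    def __init__(self, label):
        self.label = label


PANTRY = {"pickles": Jar("pickles")}  # hypothetical external store


class PantryPickler(pickle.Pickler):
    def persistent_id(self, obj):
        # Write Jar instances as a reference (their label) instead of by value
        if isinstance(obj, Jar):
            return obj.label
        return None  # everything else is pickled normally


class PantryUnpickler(pickle.Unpickler):
    def persistent_load(self, pid):
        # Turn the stored label back into the existing Jar object
        return PANTRY[pid]


buf = io.BytesIO()
PantryPickler(buf, 0).dump(["ketchup", PANTRY["pickles"]])
buf.seek(0)
restored = PantryUnpickler(buf).load()
print(restored[1] is PANTRY["pickles"])  # True: the same object, not a copy
```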

Serving Up Condiments: A cPickle Example
First, the condiments.py script imports the cPickle module:
import cPickle
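One caveat if you're following along on a newer interpreter: cPickle only exists in Python 2. In Python 3 it was merged into the standard pickle module, which uses its C implementation automatically when available. A common import pattern that prefers cPickle but still runs on Python 3 looks like this; with it, the rest of a script would call pickle.dump and pickle.load rather than cPickle.dump and cPickle.load.

```python
try:
    import cPickle as pickle  # Python 2: the fast C implementation
except ImportError:
    import pickle  # Python 3: cPickle is gone; pickle uses C automatically

# From here on the script calls pickle.dump / pickle.load on either version
inFridge = ["ketchup", "mustard", "relish"]
print(pickle.loads(pickle.dumps(inFridge)))  # ['ketchup', 'mustard', 'relish']
```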

Next, I define the object I want to serialize and store. In this case, it's a list of condiments I've got in the fridge:
inFridge = ["ketchup", "mustard", "relish"]
print inFridge


That print statement's output will display the following:
['ketchup', 'mustard', 'relish']

I want to save my results in a file called fridge.txt, so the script opens the file for writing and keeps the resulting file object in FILE:
FILE = open("fridge.txt", 'w')

Now the magic happens. I call cPickle's dump function, which pickles my data and writes the result to the file:
cPickle.dump(inFridge, FILE)

I'm finished for now, so the script closes the file:
FILE.close()
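The open, dump, and close steps can also be written with a with block, which closes the file even if dump raises an exception. This sketch is an adaptation rather than the original script: it opens the file in binary mode ('wb'), which Python 3 requires for every pickle protocol and Python 2 requires for the binary ones, and it writes fridge.txt in the current directory.

```python
try:
    import cPickle as pickle  # Python 2
except ImportError:
    import pickle  # Python 3

inFridge = ["ketchup", "mustard", "relish"]

# "with" closes the file automatically, even if dump() raises an exception;
# "wb" writes the pickle bytes unchanged (required on Python 3)
with open("fridge.txt", "wb") as FILE:
    pickle.dump(inFridge, FILE)
```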

I now have a file, fridge.txt, that contains the following:
(lp1
S'ketchup'
p2
aS'mustard'
p3
aS'relish'
p4
a.
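Those cryptic lines are opcodes for the small stack machine that unpickling runs: ( (MARK) pushes a marker, l (LIST) builds a list from the items pushed since the marker (none yet, so an empty list), p1 (PUT) saves the object in memo slot 1 so later references can reuse it, S'...' (STRING) pushes a string, a (APPEND) adds the top item to the list, and . (STOP) ends the pickle. The standard pickletools module can annotate a pickle this way; here's a short sketch. On Python 3 the strings appear as V (UNICODE) opcodes and the memo numbering differs, so the exact listing may not match the one above.

```python
import pickletools

try:
    import cPickle as pickle  # Python 2
except ImportError:
    import pickle  # Python 3

data = pickle.dumps(["ketchup", "mustard", "relish"], 0)  # protocol 0 (ASCII)

# Print one annotated line per opcode: MARK, LIST, PUT, STRING/UNICODE, APPEND, ..., STOP
pickletools.dis(data)
```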

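The pickle and cPickle modules can also save the information in a binary format, selected by passing a protocol number of 1 or higher as the optional third argument to dump; however, I've used the default ASCII format (protocol 0) because it is human-readable.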
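Here's a sketch of choosing a protocol explicitly. It uses a try/except import so it runs on both Python 2 (cPickle) and Python 3 (pickle), and it writes to a hypothetical fridge.bin file. pickle.HIGHEST_PROTOCOL names the newest protocol the running interpreter supports (2 for Python 2's cPickle). Binary pickles aren't human-readable and must go through files opened in binary mode.

```python
try:
    import cPickle as pickle  # Python 2
except ImportError:
    import pickle  # Python 3

inFridge = ["ketchup", "mustard", "relish"]

ascii_form = pickle.dumps(inFridge, 0)  # protocol 0: the human-readable default
binary_form = pickle.dumps(inFridge, pickle.HIGHEST_PROTOCOL)  # newest binary protocol

# Protocols 2 and higher begin with the PROTO opcode, the byte 0x80
print(binary_form[:1] == b"\x80")  # True

# The protocol is the optional third argument to dump(); binary pickles
# need files opened in binary mode ("wb" to write, "rb" to read)
with open("fridge.bin", "wb") as f:
    pickle.dump(inFridge, f, pickle.HIGHEST_PROTOCOL)
with open("fridge.bin", "rb") as f:
    restored = pickle.load(f)  # load() detects the protocol on its own
print(restored == inFridge)  # True
```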

Now I've looked in my kitchen and realized I also have pickles. I can add them to my list and pickle the list again:

inFridge.append("pickles")
print inFridge

The output of my print command now displays:
['ketchup', 'mustard', 'relish', 'pickles']

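That looks right, so my script pickles the updated inFridge list again. Because the file is reopened in 'w' mode, which truncates it, the new pickle replaces the old one rather than being added after it: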
FILE = open("fridge.txt", 'w')
cPickle.dump(inFridge, FILE)
FILE.close()
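To check what 'w' mode really does, this sketch writes the three-item list, then the four-item list, and then tries to read two pickles back from fridge.txt. Only one is there, because opening the file for writing truncated it, so asking for a second pickle raises EOFError. It uses 'wb'/'rb' and a try/except import so it also runs on Python 3.

```python
try:
    import cPickle as pickle  # Python 2
except ImportError:
    import pickle  # Python 3

inFridge = ["ketchup", "mustard", "relish"]
with open("fridge.txt", "wb") as FILE:
    pickle.dump(inFridge, FILE)

inFridge.append("pickles")
with open("fridge.txt", "wb") as FILE:  # "w" truncates: the first pickle is erased
    pickle.dump(inFridge, FILE)

with open("fridge.txt", "rb") as FILE:
    contents = pickle.load(FILE)
    try:
        pickle.load(FILE)  # ask for a second pickle
        second_pickle_found = True
    except EOFError:
        second_pickle_found = False  # nothing left after the first one

print(contents)  # ['ketchup', 'mustard', 'relish', 'pickles']
print(second_pickle_found)  # False
```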

To get my information out of the file and back into a usable list, I simply open the file for reading and call the cPickle.load function to unpickle it. For the purposes of demonstration, I've used a new variable, inFridgeFile, to store the results:
FILE = open("fridge.txt", 'r')
inFridgeFile = cPickle.load(FILE)
FILE.close()
print inFridgeFile
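Each call to load reads exactly one pickle. If you do want to keep a history, one option is to open the file in append mode ('ab') so each dump lands after the previous one, then call load repeatedly until it raises EOFError. The load_all helper and the fridge_log.txt file below are hypothetical names for this sketch.

```python
try:
    import cPickle as pickle  # Python 2
except ImportError:
    import pickle  # Python 3


def load_all(path):
    """Yield every pickle stored back to back in one file (hypothetical helper)."""
    with open(path, "rb") as f:
        while True:
            try:
                yield pickle.load(f)
            except EOFError:  # no more pickles in the file
                return


# "wb" starts a fresh file; "ab" adds another pickle after the first
with open("fridge_log.txt", "wb") as f:
    pickle.dump(["ketchup", "mustard", "relish"], f)
with open("fridge_log.txt", "ab") as f:
    pickle.dump(["pickles"], f)

snapshots = list(load_all("fridge_log.txt"))
print(snapshots)  # [['ketchup', 'mustard', 'relish'], ['pickles']]
```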

The output of the print command displays:
['ketchup', 'mustard', 'relish', 'pickles']

My list appears only once because reopening fridge.txt in 'w' mode emptied the file before the second pickle was written; cPickle doesn't keep track of what was saved earlier. The inFridgeFile variable contains my information restored to its original list format.
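The restored list is equal to the original but is a separate object, so changing one doesn't affect the other. This sketch uses dumps and loads, the in-memory string versions of dump and load, to show that without a file; the try/except import lets it run on Python 3 as well.

```python
try:
    import cPickle as pickle  # Python 2
except ImportError:
    import pickle  # Python 3

inFridge = ["ketchup", "mustard", "relish", "pickles"]
inFridgeFile = pickle.loads(pickle.dumps(inFridge))

print(inFridgeFile == inFridge)  # True: same contents
print(inFridgeFile is inFridge)  # False: a brand-new list
inFridgeFile.append("olives")
print(inFridge)  # ['ketchup', 'mustard', 'relish', 'pickles'] (original untouched)
```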

To Put A Lid On It
You have various options for serializing your data in Python, including pickle and cPickle. My cPickle example showed how handy this implementation is: dumping an object leaves the original untouched, so you can keep modifying it and pickle it again, and loading gives you back an equivalent copy to work with. This functionality will keep you from being stuck in a pickle next time you're saving or transmitting objects.

