Python3 pickling

Recently I was converting some old Python 2 code to Python 3, and I ran into a problem with pickling and unpickling.

It wasn’t a major problem; I found the solution fairly quickly with a bit of googling.

Still, I think the problem and its solution are worth a quick note.  Others will stumble across this problem in the future, especially because there are code examples floating around (in printed books and online posts) that will lead new Python programmers to make this very same mistake.

So let’s talk about pickling.

Suppose you want to “pickle” an object — dump it to a pickle file for persistent storage.

When you pickle an object, you do two things.

  • You open the file that you want to use as the pickle file. The call to open(…) returns a file handle object.
  • You pass the object that you want to pickle, and the file handle object, to pickle.

Your code might look something like this. (Note that this code is wrong; see below.)

fileHandle = open(pickleFileName, "w")
pickle.dump(objectToBePickled, fileHandle)

When I wrote code like this, I got back this error message:

Pickler(file, protocol, fix_imports=fix_imports).dump(obj)
TypeError: must be str, not bytes

Talk about a crappy error message!!!

After banging my head against the wall for a while, I googled around and quickly found a very helpful answer on StackOverflow.

The bottom line is that a Python pickle file is (and always has been) a byte stream, which means that you should always open a pickle file in binary mode: “wb” to write it, and “rb” to read it. The Python docs contain correct example code.

My old code worked just fine running under Python 2 (on Windows). But with Python 3’s new strict separation of strings and bytes, it broke. Changing “w” to “wb”, and “r” to “rb”, fixed it.
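A minimal sketch of the corrected code, assuming a placeholder dictionary as the object to pickle and a scratch file in the system temp directory (both are illustrative choices, not from the original code). The with statements close the files automatically:

```python
import os
import pickle
import tempfile

data = {"name": "example", "values": [1, 2, 3]}  # placeholder object to pickle
# Hypothetical file location; any writable path works.
pickle_file_name = os.path.join(tempfile.gettempdir(), "example.pickle")

# Pickle files are byte streams, so open the file in binary write mode ("wb")...
with open(pickle_file_name, "wb") as file_handle:
    pickle.dump(data, file_handle)

# ...and in binary read mode ("rb") when unpickling.
with open(pickle_file_name, "rb") as file_handle:
    restored = pickle.load(file_handle)

print(restored == data)  # True
```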


One person who posted a question about this problem on the Python forum knew about the “wb” fix, but was confused by it because he was trying to pickle a string:

import pickle
a = "blah"
file = open('state', 'w')
pickle.dump(a,file)

I know of one easy way to solve this is to change the operation argument from ‘w’ to ‘wb’ but I AM using a string not bytes! And none of the examples use ‘wb’ (I figured that out separately) so I want to have an understanding of what is going on here.

Basically, regardless of the kind of object that you are pickling (even a string object), the object is converted to a bytes representation and written out as a byte stream. That means you always need “wb” to write a pickle file and “rb” to read one.
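A quick way to confirm this, sketched with in-memory streams (io.StringIO and io.BytesIO) standing in for files opened with “w” and “wb”:

```python
import io
import pickle

a = "blah"

# pickle.dumps() returns the pickled form directly: it is bytes, even for a str.
pickled = pickle.dumps(a)
print(type(pickled))  # <class 'bytes'>

# Dumping to a text stream fails, because the pickler writes bytes to it.
text_stream = io.StringIO()
text_stream_failed = False
try:
    pickle.dump(a, text_stream)
except TypeError as e:
    text_stream_failed = True
    print("TypeError:", e)

# A binary stream accepts the bytes, and loads() recovers the original str.
binary_stream = io.BytesIO()
pickle.dump(a, binary_stream)
print(pickle.loads(binary_stream.getvalue()))  # blah
```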

Newline conversion in Python 3

I use Python on both Windows and Unix. Occasionally, when running on Windows, I need to read in a file containing Windows newlines and write it out with Unix/Linux newlines. And sometimes, when running on Unix, I need to do the conversion in the other direction.

Prior to Python 3, the accepted way to do this was to read data from the file in binary mode, convert the newline characters in the data, and then write the data out again in binary mode. The Tools/Scripts directory contained two scripts (crlf.py and lfcr.py) with illustrative examples. Here, for instance, is the key code from crlf.py (Windows-to-Unix conversion):

        data = open(filename, "rb").read()
        newdata = data.replace("\r\n", "\n")
        if newdata != data:
            f = open(filename, "wb")
            f.write(newdata)
            f.close()

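But if you try to do that with Python 3, the code above won’t work: reading a file in binary mode now returns bytes rather than str, and bytes.replace() raises a TypeError when it is given str arguments such as "\r\n".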
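Here is a small sketch of the failure, using a made-up in-memory byte string in place of the file contents. It also shows that swapping the str literals for bytes literals (b"\r\n", b"\n") avoids the error; that is one possible workaround added here for illustration, separate from the text-mode approach described next.

```python
# Stand-in for open(filename, "rb").read(): in Python 3 this is bytes, not str.
data = b"line one\r\nline two\r\n"

# The Python 2 code passes str arguments to bytes.replace(), which Python 3 rejects.
str_args_failed = False
try:
    data.replace("\r\n", "\n")
except TypeError as e:
    str_args_failed = True
    print("TypeError:", e)

# With bytes literals for both arguments, the replacement works on the raw data.
newdata = data.replace(b"\r\n", b"\n")
print(newdata)  # b'line one\nline two\n'
```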

The key to what will work is the new “newline” argument of the built-in open() function, which is described in the Python documentation for open().

The key point from that documentation is this:

newline controls how universal newlines works (it only applies to text mode). It can be None, '', '\n', '\r', and '\r\n'. It works as follows:

  • On input, if newline is None, universal newlines mode is enabled. Lines in the input can end in '\n', '\r', or '\r\n', and these are translated into '\n' before being returned to the caller. If it is '', universal newline mode is enabled, but line endings are returned to the caller untranslated. If it has any of the other legal values, input lines are only terminated by the given string, and the line ending is returned to the caller untranslated.

  • On output, if newline is None, any '\n' characters written are translated to the system default line separator, os.linesep. If newline is '', no translation takes place. If newline is any of the other legal values, any '\n' characters written are translated to the given string.
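To see these rules in action, the sketch below writes the same text with different newline settings and reads the file back in binary mode to show the bytes that actually landed on disk, then reads a CRLF file back in text mode both ways. The scratch file name is made up for the example; the text is plain ASCII, so the platform’s default encoding doesn’t matter here.

```python
import os
import tempfile

text = "first\nsecond\n"
path = os.path.join(tempfile.gettempdir(), "newline_demo.txt")  # hypothetical scratch file

def bytes_written(newline):
    """Write `text` in text mode with the given newline setting; return the raw bytes on disk."""
    with open(path, "w", newline=newline) as f:
        f.write(text)
    with open(path, "rb") as f:  # binary mode: see exactly what was written
        return f.read()

print(bytes_written("\r\n"))  # b'first\r\nsecond\r\n'
print(bytes_written("\n"))    # b'first\nsecond\n'
print(bytes_written(""))      # b'first\nsecond\n'  (no translation)
print(bytes_written(None) == text.replace("\n", os.linesep).encode())  # True

# On input, newline=None translates "\r\n" to "\n"; newline="" leaves it alone.
with open(path, "wb") as f:
    f.write(b"first\r\nsecond\r\n")
with open(path, "r", newline=None) as f:
    translated = f.read()
with open(path, "r", newline="") as f:
    untranslated = f.read()
print(repr(translated))    # 'first\nsecond\n'
print(repr(untranslated))  # 'first\r\nsecond\r\n'
```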

So now when I want to convert a file from Windows-style newlines to Linux-style newlines, I do this:

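filename = "NameOfFileToBeConverted"
# Text mode with the default newline=None: "\r\n" line endings come back as "\n"
with open(filename, "r") as f:
    fileContents = f.read()
# newline="\n" writes each "\n" as-is, instead of translating it to os.linesep
with open(filename, "w", newline="\n") as f:
    f.write(fileContents)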
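The same idea can be wrapped in a small helper that handles both directions. This is a minimal sketch: convert_newlines is a hypothetical name, and it assumes the file fits in memory and can be decoded with the default (or a supplied) encoding. Because reading uses text mode with the default newline=None, "\r\n", "\r" and "\n" all come back as "\n" before the file is rewritten with the requested ending.

```python
import os
import tempfile

def convert_newlines(filename, newline, encoding=None):
    # Read in text mode: universal newlines turns every line ending into "\n".
    with open(filename, "r", encoding=encoding) as f:
        contents = f.read()
    # Rewrite with newline=...: each "\n" is written out as the requested ending.
    with open(filename, "w", encoding=encoding, newline=newline) as f:
        f.write(contents)

# Usage on a hypothetical scratch file.
path = os.path.join(tempfile.gettempdir(), "convert_demo.txt")
with open(path, "wb") as f:
    f.write(b"one\r\ntwo\r\n")

convert_newlines(path, "\n")      # Windows -> Unix
with open(path, "rb") as f:
    unix_bytes = f.read()
print(unix_bytes)     # b'one\ntwo\n'

convert_newlines(path, "\r\n")    # Unix -> Windows
with open(path, "rb") as f:
    windows_bytes = f.read()
print(windows_bytes)  # b'one\r\ntwo\r\n'
```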

How do I reverse a string in Python 3?

With the improved support for Unicode in Python3, more and more folks will be working with languages (Arabic, Hebrew, etc.) that read right-to-left rather than left-to-right. So more and more folks will have a need to reverse a string.

Unfortunately, Python doesn’t have a built-in function, nor do string objects have a built-in method, that simply returns a reversed copy of a string. The obvious techniques don’t work. This:

        try:
            print(1)
            s = "a b c"
            s = reverse(s)
            print(s)
        except Exception as e:
            print(e)

        try:
            print(2)
            s = "a b c"
            s = reversed(s)
            print(s)
        except Exception as e:
            print(e)

        try:
            print(3)
            s = "a b c"
            s.reverse()
            print(s)
        except Exception as e:
            print(e)

        try:
            print(4)
            s = "a b c"
            s.reversed()
            print(s)
        except Exception as e:
            print(e)

produces this output:

        
        1
        name 'reverse' is not defined
        2
        <reversed object at 0x00BAB5F0>
        3
        'str' object has no attribute 'reverse'
        4
        'str' object has no attribute 'reversed'
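Attempt 2 is the interesting one: reversed() does exist and accepts a string, but it returns a lazy iterator over the characters rather than a new string. A quick sketch of what that iterator holds:

```python
s = "a b c"
r = reversed(s)            # a reversed iterator, not a str
print(type(r).__name__)    # reversed

chars = list(r)            # drain the iterator into a list of characters
print(chars)               # ['c', ' ', 'b', ' ', 'a']
print(list(r))             # [] (the iterator is used up after one pass)
```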

Fortunately, the solution is not too difficult. A little one-line function will do the trick.

I call the function “rev” rather than “reverse” in case Python eventually acquires its own built-in function named “reverse”.

        def rev(s): return s[::-1]
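The s[::-1] trick is extended slicing: in s[start:stop:step], leaving start and stop empty covers the whole string, and a step of -1 walks it backwards. A short sketch (the sample values are arbitrary):

```python
def rev(s):
    # s[start:stop:step] with start and stop omitted and step -1
    # walks the whole sequence from the last element back to the first.
    return s[::-1]

print(rev("a b c"))        # c b a
print(repr(rev("")))       # ''
print(rev([1, 2, 3]))      # [3, 2, 1]  (slicing works on any sequence)
print("abcdef"[::2])       # ace  (step 2: every second character)
print("abcdef"[::-2])      # fdb  (step -2: every second character, backwards)
```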

In a comment, Michael Watkins has noted another possible implementation of the “rev” function.

        def rev(s): return ''.join(reversed(s))

Now this:

        try:
            print(5)
            s = "a b c"
            s = rev(s)
            print(s)
        except Exception as e:
            print(e)

produces

        5
        c b a