Python3 pickling

Recently I was converting some old Python2 code to Python3 and I ran across a problem pickling and unpickling.

It wasn’t a major problem: I found the solution fairly quickly with a bit of googling around.

Still, I think the problem and its solution are worth a quick note.  Others will stumble across this problem in the future, especially because there are code examples floating around (in printed books and online posts) that will lead new Python programmers to make this very same mistake.

So let’s talk about pickling.

Suppose you want to “pickle” an object — dump it to a pickle file for persistent storage.

When you pickle an object, you do two things.

  • You open the file that you want to use as the pickle file. The call to open(…) returns a file handle object.
  • You pass the object that you want to pickle, and the file handle object, to pickle.dump().

Your code might look something like this. Note that this code is wrong. See below.

fileHandle = open(pickleFileName, "w")
pickle.dump(objectToBePickled, fileHandle)

When I wrote code like this, I got back this error message:

Pickler(file, protocol, fix_imports=fix_imports).dump(obj)
TypeError: must be str, not bytes

Talk about a crappy error message!!!

After banging my head against the wall for a while, I googled around and quickly found a very helpful answer on StackOverflow.

The bottom line is that a Python pickle file is (and always has been) a byte stream. Which means that you should always open a pickle file in binary mode: “wb” to write it, and “rb” to read it. The Python docs contain correct example code.

My old code worked just fine running under Python2 (on Windows).  But with Python3’s new strict separation of strings and bytes, it broke. Changing “w” to “wb”, and “r” to “rb”, fixed it. 
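As a minimal sketch of the corrected pattern, here is a round trip that writes with “wb” and reads with “rb”. The object and the file name are made up for illustration, and the with statement is just my way of making sure the file gets closed.

```python
import os
import pickle
import tempfile

# Hypothetical object and file name, used only for illustration.
object_to_be_pickled = {"name": "easygui", "versions": ["2.6", "3.0"]}
pickle_file_name = os.path.join(tempfile.mkdtemp(), "example.pickle")

# Write the pickle in binary mode ("wb").
with open(pickle_file_name, "wb") as file_handle:
    pickle.dump(object_to_be_pickled, file_handle)

# Read it back in binary mode ("rb").
with open(pickle_file_name, "rb") as file_handle:
    restored = pickle.load(file_handle)

print(restored == object_to_be_pickled)
```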


One person who posted a question about this problem on the Python forum was aware of the issue, but confused because he was trying to pickle a string.

import pickle
a = "blah"
file = open('state', 'w')
pickle.dump(a,file)

I know of one easy way to solve this is to change the operation argument from ‘w’ to ‘wb’ but I AM using a string not bytes! And none of the examples use ‘wb’ (I figured that out separately) so I want to have an understanding of what is going on here.

Basically, regardless of the kind of object that you are pickling (even a string object), the object will be converted to a bytes representation and pickled as a byte stream. Which means that you always need to use “rb” and “wb”, regardless of the kind of object that you are pickling.
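A quick way to see this for yourself is pickle.dumps(), which returns the pickled form directly instead of writing it to a file. This small sketch reuses the string from the question above.

```python
import pickle

a = "blah"

# dumps() returns the pickled representation instead of writing it to a file.
pickled = pickle.dumps(a)

print(type(pickled))           # under Python 3 this is bytes, not str
print(pickle.loads(pickled))   # the original string comes back
```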

Newline conversion in Python 3

I use Python on both Windows and Unix.  Occasionally when running on Windows  I need to read in a file containing Windows newlines and write it out with Unix/Linux newlines.  And sometimes when running on Unix, I need to run the newline conversion in the other direction.

Prior to Python 3, the accepted way to do this was to read data from the file in binary mode, convert the newline characters in the data, and then write the data out again in binary mode. The Tools/Scripts directory contained two scripts (crlf.py and lfcr.py) with illustrative examples. Here, for instance, is the key code from crlf.py (Windows to Unix conversion):

        data = open(filename, "rb").read()
        newdata = data.replace("\r\n", "\n")
        if newdata != data:
            f = open(filename, "wb")
            f.write(newdata)
            f.close()

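But if you try to do that with Python 3+, it won’t work. A file opened in binary mode now returns a bytes object, and the replace() method of a bytes object will not accept str arguments, so the call raises a TypeError.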
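Here is a minimal sketch of the failure. The bytes literal is a stand-in for what open(filename, "rb").read() would return for a small Windows-style file, and the “failed” flag is there only to record what happened.

```python
# Stand-in for the result of open(filename, "rb").read() under Python 3.
data = b"line one\r\nline two\r\n"

failed = False
try:
    # The Python 2 idiom: str arguments applied to binary data.
    newdata = data.replace("\r\n", "\n")
except TypeError as e:
    failed = True
    print("TypeError:", e)

# With bytes literals the same call is accepted.
newdata = data.replace(b"\r\n", b"\n")
print(newdata)
```

So the binary approach can be kept alive with bytes literals, but the text-mode approach avoids handling bytes at all.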

The key to what will work is the new “newline” argument for the built-in file open() function. It is documented in the description of open() in the Python library reference.

The key point from that documentation is this:

newline controls how universal newlines works (it only applies to text mode). It can be None, '', '\n', '\r', and '\r\n'. It works as follows:

  • On input, if newline is None, universal newlines mode is enabled. Lines in the input can end in '\n', '\r', or '\r\n', and these are translated into '\n' before being returned to the caller. If it is '', universal newline mode is enabled, but line endings are returned to the caller untranslated. If it has any of the other legal values, input lines are only terminated by the given string, and the line ending is returned to the caller untranslated.

  • On output, if newline is None, any '\n' characters written are translated to the system default line separator, os.linesep. If newline is '', no translation takes place. If newline is any of the other legal values, any '\n' characters written are translated to the given string.

So now when I want to convert a file from Windows-style newlines to Linux-style newlines, I do this:

filename = "NameOfFileToBeConverted"
fileContents = open(filename,"r").read()
f = open(filename,"w", newline="\n")
f.write(fileContents)
f.close()
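The same idea works in the other direction. Here is a sketch of a small helper that handles both; the name convert_newlines and the sample file are my own inventions for illustration, and the helper assumes the file can be read with the platform’s default text encoding.

```python
import os
import tempfile

def convert_newlines(filename, newline):
    """Rewrite filename so that every line ends with the given newline string."""
    # Reading with the default newline=None turns '\r\n', '\r' and '\n' into '\n'.
    with open(filename, "r") as f:
        contents = f.read()
    # On output, each '\n' is translated to the requested newline string.
    with open(filename, "w", newline=newline) as f:
        f.write(contents)

# Hypothetical sample file with Unix newlines.
sample = os.path.join(tempfile.mkdtemp(), "sample.txt")
with open(sample, "wb") as f:
    f.write(b"one\ntwo\n")

convert_newlines(sample, "\r\n")            # Unix to Windows
with open(sample, "rb") as f:
    as_windows = f.read()

convert_newlines(sample, "\n")              # Windows to Unix
with open(sample, "rb") as f:
    as_unix = f.read()

print(as_windows, as_unix)
```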

How do I reverse a string in Python 3?

With the improved support for Unicode in Python3, more and more folks will be working with languages (Arabic, Hebrew, etc.) that read right-to-left rather than left-to-right. So more and more folks will have a need to reverse a string.

Unfortunately, Python doesn’t have a built-in function, nor do string objects have a built-in method, to do what they will want.  The obvious techniques don’t work. This:

        try:
            print(1)
            s = "a b c"
            s = reverse(s)
            print(s)
        except Exception as e:
            print(e)

        try:
            print(2)
            s = "a b c"
            s = reversed(s)
            print(s)
        except Exception as e:
            print(e)

        try:
            print(3)
            s = "a b c"
            s.reverse()
            print(s)
        except Exception as e:
            print(e)

        try:
            print(4)
            s = "a b c"
            s.reversed()
            print(s)
        except Exception as e:
            print(e)

produces this output

        
        1
        name 'reverse' is not defined
        2
        <reversed object at 0x00BAB5F0>
        3
        'str' object has no attribute 'reverse'
        4
        'str' object has no attribute 'reversed'

Fortunately, the solution is not too difficult. A little one-line function will do the trick.

I call the function “rev” rather than “reverse” on the chance that Python will eventually acquire its own builtin function named “reverse”.

        def rev(s): return s[::-1]

In a comment, Michael Watkins has noted another possible implementation of the “rev” function.

        def rev(s): return ''.join(reversed(s))

Now

        try:
            print(5)
            s = "a b c"
            s = rev(s)
            print(s)
        except Exception as e:
            print(e)

produces

        5
        c b a
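For comparison, here is a short sketch with both implementations side by side (the names rev_slice and rev_join are mine). It also shows why attempt 2 above printed a “reversed object”: reversed() returns an iterator over the characters, not a string.

```python
def rev_slice(s):
    # A slice with a step of -1 walks the string backwards.
    return s[::-1]

def rev_join(s):
    # reversed() yields the characters last to first; join() glues them together.
    return "".join(reversed(s))

s = "a b c"
print(rev_slice(s))
print(rev_join(s))

# The iterator returned by reversed() has to be consumed to see its contents.
print(list(reversed(s)))
```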

Moving to Python 3.0 (part3)


I would like for easygui to run under both Python 2.6 and under version 3.0. But in the move to 3.0, changes have been made (both to the language and to the standard library) that make it difficult for the same code to run under both versions.

The first thing that I did was to side-step the issues around the new print function.

  • I let 2to3 convert all of my old print statements to calls to the new print function.
  • I did a global text change of “print(” to “writeln(”.
  • I wrote a little writeln function.

import sys

def write(*args):
  # convert every argument to a string and join them with single spaces
  args = [str(arg) for arg in args]
  args = " ".join(args)
  sys.stdout.write(args)

def writeln(*args):
  # same as write, followed by a newline
  write(*args)
  sys.stdout.write("\n")

The result is code that uses neither the old print statement nor the new print function. Where I previously had

print a, b, c

I now have

writeln(a,b,c)

In the future, when the day comes when everybody is running version 3+, I will simply change “writeln” back to “print” and I will have completely standard Python code.
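As a quick check that writeln behaves like a simple print call, here is a sketch that captures what it writes. The capture uses contextlib.redirect_stdout, which exists only in later Python 3 releases (3.4 onward), so this is a test harness for a newer interpreter, not something that would run under 2.6 or 3.0.

```python
import contextlib
import io
import sys

def write(*args):
    # Convert every argument to a string and join them with single spaces.
    sys.stdout.write(" ".join(str(arg) for arg in args))

def writeln(*args):
    write(*args)
    sys.stdout.write("\n")

# Capture the output so it can be inspected.
captured = io.StringIO()
with contextlib.redirect_stdout(captured):
    writeln("a", 1, None)

print(repr(captured.getvalue()))
```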

A bit more tricky was the fact that in the standard library, “Tkinter” was renamed to “tkinter” and “tkFileDialog” was renamed to “tkinter.filedialog”.

To make code that will run under both 2.6 and 3.0, you have to find out which version of Python you’re running and then execute code that is appropriate to that version.

The python documentation for sys.hexversion says

sys.hexversion contains the version number encoded as a single integer. This is guaranteed to increase with each version, including proper support for non-production releases. For example, to test that the Python interpreter is at least version 1.5.2, use:

if sys.hexversion >= 0x010502F0:
    # use some advanced feature
    ...
else:
    # use an alternative implementation or warn the user
    ...

So here’s what I did.

import sys

# 0x030000F0 is the hexversion of the final release of Python 3.0.0
if sys.hexversion >= 0x030000F0:
    runningPython3 = True
else:
    runningPython3 = False

if runningPython3:
    from tkinter import *
    import tkinter.filedialog as tk_FileDialog
else:
    from Tkinter import *
    import tkFileDialog as tk_FileDialog

In my code, I changed all remaining occurrences of “tkFileDialog” to “tk_FileDialog”.

Now I have code that runs under both version 2.6 and under version 3.0.
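If the hexadecimal constant looks a bit opaque, here is a sketch of what is packed into sys.hexversion, along with an alternative test based on sys.version_info. The variable names are mine. The two tests agree on any released interpreter; they would differ only on a pre-release build of 3.0, which the tuple comparison counts as Python 3 and the hexversion comparison does not.

```python
import sys

# hexversion packs the version into one integer:
# major, minor and micro take one byte each, then release level and serial.
major = (sys.hexversion >> 24) & 0xFF
minor = (sys.hexversion >> 16) & 0xFF
micro = (sys.hexversion >> 8) & 0xFF
print(major, minor, micro)

# The test used above, and an alternative based on version_info.
running_python3_hex = sys.hexversion >= 0x030000F0
running_python3_tuple = sys.version_info >= (3, 0)
print(running_python3_hex, running_python3_tuple)
```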

I was pretty lucky; my situation wasn’t too complicated. I’m sure there are other folks for whom the transition will be much more difficult. But if you’re doing some fairly simple and basic stuff with Python, what worked for me might be enough to work for you.

Moving to Python 3.0 (part2)


Not too long ago, I posted this query on comp.lang.python:

I'd like to install both 2.6 and 3.0 together on the same Windows
(Vista) machine, so I can test programs under both versions.

Is it possible to install both versions on the same Windows machine in
such a way that they are both easily available and don't interfere with
one another?  I'm concerned, for example, that if I install both, the
second installation will over-write some Python entry in the registry.

I received four replies. They all were very helpful, and I recommend that you look at them for yourself: http://groups.google.com/group/comp.lang.python/browse_thread/thread/8dcb36c8d4c8e607#

Since what I want to do is pretty simple, for me they boil down to:

It's easy - the registry isn't used except to associate files.
The associations are made with the most-recently-installed version.

I currently have 2.4, 2.5, 2.6 and 3.0 on my Windows machine.

-- Steve Holden

and

Use batch scripts to setup your PATH and PYTHONPATH. This will not
solve the file association problem, but you can probably set up your
"SEND TO" folder to handle the different versions.

-- Dutch Masters

Setting up the batch file was easy. Here’s an example batch file called p30.bat. You can easily adapt it to run Python 2.5 or 2.6 or whatever.

@echo off
::---------------------------------------------------------
:: Name of this batch file is p30.bat
::---------------------------------------------------------

::---------------------------------------------------------
:: set up the constant PYDIR that controls what version
:: of Python we want this batch file to run.
::---------------------------------------------------------
:: To create p25.bat, p26.bat, p31.bat, etc., copy this file
:: to a file with the new name and change only this line.
::---------------------------------------------------------
SET PYDIR=c:\Python30

::---------------------------------------------------------
:: set PYTHONPATH so it includes the site-packages directory
:: of the right version of Python
::---------------------------------------------------------
:: For convenience, I include C:\pyapps, where I keep some
:: Python utilities, but you don't need to do that.
::---------------------------------------------------------
SET PYTHONPATH=C:\pyapps;%PYDIR%\Lib\site-packages

::---------------------------------------------------------
:: Save the PATH setting, so we can restore it later
::---------------------------------------------------------
SET SAVEPATH=%PATH%

::---------------------------------------------------------
:: reset PATH so it includes the PYDIR directory, where
:: the Python executable lives.
::---------------------------------------------------------
PATH=%PYDIR%;%PATH%

::---------------------------------------------------------
:: run the desired Python executable, passing it at least
:: the name of the Python script file that we want to run.
::---------------------------------------------------------
python.exe %1 %2 %3 %4 %5 %6 %7 %8 %9

::---------------------------------------------------------
:: restore the PATH setting
::---------------------------------------------------------
PATH=%SAVEPATH%

::---------------------------------------------------------
:: clean up the environment variables that we created
::---------------------------------------------------------
SET SAVEPATH=
SET PYTHONPATH=
SET PYDIR=

Now, to test easygui under 2.6 and 3.0, I can simply do this:

p26 easygui.py

or

p30 easygui.py

Moving to Python 3.0 (part1)


This week I am working as tech support in the blogger lounge at CES. So far our bloggers have proved to be pretty tech-savvy: support requests have been intermittent. Since my time between support requests is pretty open, I thought I’d take this opportunity to experiment with converting some of my Python apps to Python 3.0. I started with Easygui.

After installing Python3, I went to the \Python30\Tools\Scripts folder, where I found 2to3.py. (As you can see by the backslashes in the path, I’m running under Windows.)

To make life easier, I copied 2to3.py into the same folder as easygui.py. Then I simply ran 2to3.py like this.

C:\CES2009\ferg\pyapps>python 2to3.py
At least one file or directory argument required.
Use --help to show usage.

OK. I’ll use --help and see what I get.

C:\CES2009\ferg\pyapps>python 2to3.py --help
Usage: refactor.py [options] file|dir ...

Options:
  -h, --help            show this help message and exit
  -d, --doctests_only   Fix up doctests only
  -f FIX, --fix=FIX     Each FIX specifies a transformation; default: all
  -x NOFIX, --nofix=NOFIX
                        Prevent a fixer from being run.
  -l, --list-fixes      List available transformations (fixes/fix_*.py)
  -p, --print-function  Modify the grammar so that print() is a function
  -v, --verbose         More verbose logging
  -w, --write           Write back modified files
  -n, --nobackups       Don't write backups for modified files.

This is pretty straightforward. You supply a filename (or a directory name) and 2to3 converts your Python files. So I try

C:\CES2009\ferg\pyapps>python 2to3.py easygui.py

This spews out a long list of things that were converted, showing lines that were deleted and lines that were inserted. There are a lot of conversions of the print statement to the new print function.

-       print "Running Tk version:", TkVersion
+       print("Running Tk version:", TkVersion)

I notice a few changes in the names for using Tkinter.

-from Tkinter import *
+from tkinter import *
-import tkFileDialog
+import tkinter.filedialog

That’s about it.

From the help information, I note that running 2to3 this way doesn’t actually do anything. You need the -w option to write the modified file back to disk. And from the -n option it looks like 2to3 will automatically make a backup copy of the input file unless you explicitly tell it not to. Naturally, though, I make my own backup copy, and then run 2to3 with the -w option.

C:\CES2009\ferg\pyapps>python 2to3.py -w easygui.py

Things run smoothly. I check easygui.py. The changes were made. So then I see if I can run the changed file using Python3.

C:\CES2009\ferg\pyapps>python easygui.py

and I get:

Traceback (most recent call last):
  File "easygui.py", line 1612, in <module>
    _test()
  File "easygui.py", line 1320, in _test
    , choices=choices)
  File "easygui.py", line 703, in choicebox
    return __choicebox(msg,title,choices,buttons)
  File "easygui.py", line 824, in __choicebox
    choices.sort( lambda x,y: cmp(x.lower(), y.lower())) # case-insensitive sort
TypeError: must use keyword argument for key function

Ah. It looks like there have been changes to the sort method for lists: it no longer accepts a comparison function as a positional argument. So I check the docs and find that:

The sort() method takes optional arguments for controlling the comparisons. Each must be specified as a keyword argument.

key specifies a function of one argument that is used to extract a comparison key from each list element: key=str.lower. The default value is None.

So I change

choices.sort( lambda x,y: cmp(x.lower(), y.lower()))

to

choices.sort(key=str.lower)

and try again. Bingo! Success!
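Here is a small sketch of the key-based sort, with made-up sample data. For comparison functions that don’t reduce to a simple key, later versions of Python (2.7 and 3.2 onward, so not 3.0 itself) provide functools.cmp_to_key, which wraps an old-style comparison function so that it can be passed as a key. The function compare_ignoring_case below is a hypothetical stand-in for the old cmp-based lambda.

```python
import functools

# Case-insensitive sort using a key function.
choices = ["banana", "Apple", "cherry"]
choices.sort(key=str.lower)
print(choices)

def compare_ignoring_case(x, y):
    # Old-style comparison: negative, zero or positive, like cmp() used to return.
    a, b = x.lower(), y.lower()
    return (a > b) - (a < b)

# Wrap the comparison function so sort() can use it as a key.
others = ["pear", "Fig", "kiwi"]
others.sort(key=functools.cmp_to_key(compare_ignoring_case))
print(others)
```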

I don’t think we’re home free yet, but that’s a good beginning.