Gotcha — Mutable default arguments


Note: examples are coded in Python 2.x, but the basic point of the post applies to all versions of Python.

There’s a Python gotcha that bites everybody as they learn Python. In fact, I think it was Tim Peters who suggested that every programmer gets caught by it exactly two times. It is called the mutable defaults trap. Programmers are usually bitten by the mutable defaults trap when coding class methods, but I’d like to begin by explaining it in functions, and then move on to talk about class methods.

Mutable defaults for function arguments

The gotcha occurs when you are coding default values for the arguments to a function or a method. Here is an example for a function named foobar:

def foobar(arg_string = "abc", arg_list = []):
    ...

Here’s what most beginning Python programmers believe will happen when foobar is called without any arguments:

  • A new string object containing “abc” will be created and bound to the “arg_string” variable name.
  • A new, empty list object will be created and bound to the “arg_list” variable name.

In short, if the arguments are omitted by the caller, foobar will always get “abc” and [] in its arguments.

This, however, is not what will happen. Here’s why.

The objects that provide the default values are not created at the time that foobar is called. They are created at the time that the statement that defines the function is executed. (See the discussion at Default arguments in Python: two easy blunders: “Expressions in default arguments are calculated when the function is defined, not when it’s called.”)

If foobar, for example, is contained in a module named foo_module, then the statement that defines foobar will probably be executed at the time when foo_module is imported.

When the def statement that creates foobar is executed:

  • A new function object is created, bound to the name foobar, and stored in the namespace of foo_module.
  • Within the foobar function object, for each argument with a default value, an object is created to hold the default object. In the case of foobar, a string object containing “abc” is created as the default for the arg_string argument, and an empty list object is created as the default for the arg_list argument.

After that, whenever foobar is called without arguments, arg_string will be bound to the default string object, and arg_list will be bound to the default list object. In such a case, arg_string will always be “abc”, but arg_list may or may not be an empty list. Here’s why.

There is a crucial difference between a string object and a list object. A string object is immutable, whereas a list object is mutable. That means that the default for arg_string can never be changed, but the default for arg_list can be changed.
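The definition-time evaluation described above can be observed directly. The following is a minimal sketch, written in Python 3 syntax (an assumption, since the post's own examples use Python 2.x); make_default is a hypothetical helper introduced only to show when the default expression runs.

```python
def make_default():
    # Hypothetical helper: announces each time the default expression is evaluated.
    print("default list created")
    return []

# "default list created" is printed once, here, when the def statement executes.
def foobar(arg_string="abc", arg_list=make_default()):
    return arg_list

first = foobar()
second = foobar()

# Both calls received the very same list object, which is stored on the function.
print(first is second)
print(foobar.__defaults__[1] is first)
```

Nothing is printed by the two calls themselves, which suggests that no new list is being built at call time.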

Let’s see how the default for arg_list can be changed. Here is a program. It invokes foobar four times. Each time that foobar is invoked it displays the values of the arguments that it receives, then adds something to each of the arguments.

def foobar(arg_string="abc", arg_list = []): 
    print arg_string, arg_list 
    arg_string = arg_string + "xyz" 
    arg_list.append("F")

for i in range(4): 
    foobar()

The output of this program is:

abc [] 
abc ['F'] 
abc ['F', 'F'] 
abc ['F', 'F', 'F']

As you can see, the first time through, the arguments have exactly the defaults that we expect. On the second and all subsequent passes, the arg_string value remains unchanged, which is just what we would expect from an immutable object. The line

arg_string = arg_string + "xyz"

creates a new object — the string “abcxyz” — and binds the name “arg_string” to that new object, but it doesn’t change the default object for the arg_string argument.

But the case is quite different with arg_list, whose value is a list — a mutable object. On each pass, we append a member to the list, and the list grows. On the fourth invocation of foobar — that is, after three earlier invocations — arg_list contains three members.
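One way to watch the default itself change is to inspect the function's stored defaults before and after the calls. This is a sketch in Python 3 syntax (an assumption; the post uses Python 2.x), relying on the standard __defaults__ attribute of function objects.

```python
def foobar(arg_string="abc", arg_list=[]):
    arg_string = arg_string + "xyz"   # rebinds the local name; the default string is untouched
    arg_list.append("F")              # mutates the one shared default list

defaults_before = repr(foobar.__defaults__)
for i in range(3):
    foobar()
defaults_after = repr(foobar.__defaults__)

print(defaults_before)   # ('abc', [])
print(defaults_after)    # ('abc', ['F', 'F', 'F'])
```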

The Solution

This behavior is not a wart in the Python language. It really is a feature, not a bug. There are times when you really do want to use mutable default arguments. One thing they can do (for example) is retain a list of results from previous invocations, something that might be very handy.
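As an illustration of the deliberate use mentioned above, here is a small sketch in Python 3 syntax (an assumption; the post uses Python 2.x). The function name record and its history parameter are hypothetical, chosen only for this example.

```python
def record(value, history=[]):
    # Deliberate use of a mutable default: the same list persists across calls.
    history.append(value)
    return history

record(1)
record(2)
result = record(3)
print(result)   # [1, 2, 3]
```

Whether this is a good idea in real code is a separate question; the point is only that the persistence is predictable once you know when the default is created.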

But for most programmers — especially beginning Pythonistas — this behavior is a gotcha. So for most cases we adopt the following rules.

  1. Never use a mutable object — that is: a list, a dictionary, or a class instance — as the default value of an argument.
  2. Ignore rule 1 only if you really, really, REALLY know what you’re doing.

So… we plan always to follow rule #1. Now, the question is how to do it… how to code foobar in order to get the behavior that we want.

Fortunately, the solution is straightforward. The mutable objects used as defaults are replaced by None, and then the arguments are tested for None.

def foobar(arg_string="abc", arg_list = None): 
    if arg_list is None: arg_list = [] 
    ...
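To check that the None test gives the behavior most programmers expect, the function can be filled in and called several times. This is a sketch in Python 3 syntax (an assumption; the post uses Python 2.x), with a body invented for the demonstration.

```python
def foobar(arg_string="abc", arg_list=None):
    if arg_list is None:
        arg_list = []        # a fresh list on every call that omits arg_list
    arg_list.append("F")
    return arg_list

results = [foobar() for _ in range(4)]
print(results)   # [['F'], ['F'], ['F'], ['F']]
```

Each call now returns its own one-element list instead of a list that grows from call to call.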

Another solution that you will sometimes see is this:

def foobar(arg_string="abc", arg_list=None): 
    arg_list = arg_list or [] 
    ...

This solution, however, is not equivalent to the first, and should be avoided. See Learning Python p. 123 for a discussion of the differences. Thanks to Lloyd Kvam for pointing this out to me.
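One difference that is easy to demonstrate (without claiming it is the whole of the discussion in the book) is that "or" replaces any false value, including an empty list that the caller passed in on purpose. The sketch below uses Python 3 syntax (an assumption; the post uses Python 2.x), and the function names are hypothetical.

```python
def with_is_none(arg_list=None):
    if arg_list is None:
        arg_list = []
    arg_list.append("F")
    return arg_list

def with_or(arg_list=None):
    arg_list = arg_list or []   # an empty list from the caller is falsy, so it is replaced
    arg_list.append("F")
    return arg_list

list_a = []
with_is_none(list_a)
print(list_a)   # ['F']  (the caller's list was used)

list_b = []
with_or(list_b)
print(list_b)   # []     (the caller's list was silently ignored)
```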

And of course, in some situations the best solution is simply not to supply a default for the argument.

Mutable defaults for method arguments

Now let’s look at how the mutable arguments gotcha presents itself when a class method is given a mutable default for one of its arguments. Here is a complete program.

# (1) define a class for company employees 
class Employee:
    def __init__ (self, arg_name, arg_dependents=[]): 
        # an employee has two attributes: a name, and a list of his dependents 
        self.name = arg_name 
        self.dependents = arg_dependents
    
    def addDependent(self, arg_name): 
        # an employee can add a dependent by getting married or having a baby 
        self.dependents.append(arg_name)
    
    def show(self): 
        print
        print "My name is.......: ", self.name 
        print "My dependents are: ", str(self.dependents)
#--------------------------------------------------- 
#   main routine -- hire employees for the company 
#---------------------------------------------------

# (2) hire a married employee, with dependents 
joe = Employee("Joe Smith", ["Sarah Smith", "Suzy Smith"])

# (3) hire a couple of unmarried employees, without dependents 
mike = Employee("Michael Nesmith") 
barb = Employee("Barbara Bush")

# (4) mike gets married and acquires a dependent 
mike.addDependent("Nancy Nesmith")

# (5) now have our employees tell us about themselves 
joe.show() 
mike.show() 
barb.show()

Let’s look at what happens when this program is run.

  1. First, the code that defines the Employee class is run.
  2. Then we hire Joe. Joe has two dependents, so that fact is recorded at the time that the joe object is created.
  3. Next we hire Mike and Barb.
  4. Then Mike acquires a dependent.
  5. Finally, the last three statements of the program ask each employee to tell us about himself.

Here is the result.

My name is.......:  Joe Smith 
My dependents are:  ['Sarah Smith', 'Suzy Smith']

My name is.......:  Michael Nesmith 
My dependents are:  ['Nancy Nesmith']

My name is.......:  Barbara Bush 
My dependents are:  ['Nancy Nesmith']

Joe is just fine. But somehow, when Mike acquired Nancy as his dependent, Barb also acquired Nancy as a dependent. This of course is wrong. And we’re now in a position to understand what is causing the program to behave this way.

When the code that defines the Employee class is run, objects for the class definition, the method definitions, and the default values for each argument are created. The constructor has an argument arg_dependents whose default value is an empty list, so an empty list object is created and attached to the __init__ method as the default value for arg_dependents.

When we hire Joe, he already has a list of dependents, which is passed in to the Employee constructor, so the arg_dependents argument does not use the default empty list object.

Next we hire Mike and Barb. Since they have no dependents, the default value for arg_dependents is used. Remember: this is the empty list object that was created when the code that defined the Employee class was run. So in both cases, the empty list is bound to the arg_dependents argument, and then (again in both cases) it is bound to the self.dependents attribute. The result is that after Mike and Barb are hired, the self.dependents attributes of both Mike and Barb point to the same object: the default empty list object.

When Michael gets married, and Nancy Nesmith is added to his self.dependents list, Barb also acquires Nancy as a dependent, because Barb’s self.dependents variable name is bound to the same list object as Mike’s self.dependents variable name.

So this is what happens when mutable objects are used as defaults for arguments in class methods. If the defaults are used when the method is called, different class instances end up sharing references to the same object.

And that is why you should never, never, NEVER use a list or a dictionary as a default value for an argument to a class method. Unless, of course, you really, really, REALLY know what you’re doing.
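Applying the None test from the first half of this post to the Employee constructor might look like the following. This is a trimmed sketch in Python 3 syntax (an assumption; the post uses Python 2.x), keeping only the parts of the class needed to show the fix.

```python
class Employee:
    def __init__(self, arg_name, arg_dependents=None):
        if arg_dependents is None:
            arg_dependents = []      # a new list for each employee hired without dependents
        self.name = arg_name
        self.dependents = arg_dependents

    def addDependent(self, arg_name):
        self.dependents.append(arg_name)

mike = Employee("Michael Nesmith")
barb = Employee("Barbara Bush")
mike.addDependent("Nancy Nesmith")

print(mike.dependents)   # ['Nancy Nesmith']
print(barb.dependents)   # []
```

With this change Mike and Barb each get their own list, so Mike's marriage no longer affects Barb.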

Gotcha — forgetting parentheses


In Python, omitting the trailing parentheses from the end of a method call (one that takes no arguments) is not a syntax error. The place where this most frequently bites me is with the “close” method on file objects. Suppose you have an output file called “foo” and you want to close it. The correct way to do this is:

foo.close()

However, if you accidentally omit the trailing parentheses, and code this:

foo.close

Python will not report a syntax error, because this is not an error in Python. In Python, this is a perfectly legitimate statement that returns the method object “close”. (Remember that methods are first-class objects in Python.) If you do this in the Python interpreter, you will get a message like this:

<built-in method close of file object at 0x007E6AE0>

The nastiness about this gotcha is that if you fail to code the trailing parentheses on a “close” method for an output file, the output file will not be closed properly. The file’s output buffer will not be flushed out to disk, and the part of the output stream that was still left in the output buffer will be lost. After your program finishes, part of your output file will be missing, and you won’t know why.

The best way of dealing with this gotcha is just to be aware that it can be a problem, and to be alert. Be careful to code the parentheses on your method calls, and especially careful to code them on calls to the “close” method of file objects.

And if you find yourself with an output file that seems to be inexplicably truncated, your first thought should be to check for missing parentheses in the file.close() statement that closes the file.
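The difference between naming the method and calling it can be checked with the file object's closed attribute. The sketch below uses Python 3 syntax (an assumption; the post uses Python 2.x) and writes to a scratch file in a temporary directory, a hypothetical path chosen only for the demonstration.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "foo.txt")   # hypothetical scratch file
foo = open(path, "w")
foo.write("some output")

foo.close                          # evaluates to a method object; nothing is called
closed_without_parens = foo.closed

foo.close()                        # actually calls the method
closed_with_parens = foo.closed

print(closed_without_parens)       # False
print(closed_with_parens)          # True
```

In versions of Python that have the with statement, opening the file in a "with open(path, "w") as foo:" block closes it automatically when the block ends, which is one way to avoid typing the close call at all.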

Programs like PyChecker and PyLint may be able to detect this kind of error, which is one good reason to use them.

Gotcha — backslashes in Windows filenames


Once upon a time there was a beautiful Windows programmer named Red Ridinghood.

One day, Red’s supervisor told her that they were going to start building a new application called GrandmasHouse. The feature list for the application was so long that they would never have attempted to get to GrandmasHouse if they hadn’t learned about a shortcut through Python Woods that would make the journey much shorter.

So Red started working her way through Python, and indeed found the going quick and easy. She loved the woods, and was happy to be traveling in them.

There was only one problem. Her programs did a lot of file manipulation, and so she had to do a lot of coding of filenames. Windows filenames used a backslash as a separator, but within Python the backslash had the magic power of an escape character, so every time she wanted a backslash in a filename she had to code two backslashes, like this:

myServer = "\\\\aServer" # ==> \\aServer
myFilename = myServer + "\\aSharename\\aDirName\\aFilename"

This feature of Python got very old very quickly. Red started calling it The Wolf, and it was the one part of Python that she hated.

One day as she was walking through the forest, she came to a clearing. In the clearing was a charming little pub, and inside the pub she met a tall, dark, and handsome stranger named Rawstrings.

Rawstrings said he could save her from The Wolf. All she had to do, he said, was to put an “r” in front of her quoted string literals. This would change them from escaped strings into raw strings. The backslash would lose its powers as an escape character, and become just an ordinary character. For example, with raw strings, you could code

r"\t"

and you wouldn’t get a string containing a single tab character; you would get a string containing the backslash character followed by “t”.
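What Rawstrings was describing can be confirmed by comparing lengths. This is a small sketch in Python 3 syntax (an assumption; the post uses Python 2.x).

```python
escaped = "\t"    # one character: a tab
raw = r"\t"       # two characters: a backslash followed by "t"

print(len(escaped))     # 1
print(len(raw))         # 2
print(raw == "\\t")     # True
```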

So instead of coding

myServer = "\\\\aServer"

Red could just code

myServer = r"\\aServer"

Red was seduced by the things that Rawstrings was telling her, and she began to spend a lot of time in his company.

Then one day, she coded

myDirname = r"c:\aDirname\"

and her program blew up with the following message:

    myDirname = r"c:\aDirname\"
                              ^
SyntaxError: invalid token

After some experimenting, she discovered that — contrary to what Rawstrings had told her — the backslash seemingly hadn’t lost all of its magic powers after all. For example, she could code:

aString = r"abc\"xyz"
print aString

When she did this, it seemed perfectly legal. The double-quote just before “xyz” did not close the raw string at all. Somehow the backslash seemed to protect it — it wasn’t recognized as the closing delimiter of the raw string, but was included in the string. When she coded

print aString

she got

abc\"xyz

It was this protective character that the backslash had acquired that made

myDirname = r"c:\aDirname\"

blow up. The final backslash was protecting the closing double-quote, so it was not being recognized as a closing quote. And since there was nothing after the double-quote, the raw string was not closed, and she got an error. She tried coding the raw string with two backslashes at the end — as if the backslash was an escape character —

myDirname = r"c:\aDirname\\"

but that didn’t do it either. Instead of getting the single closing backslash that she wanted, she got two backslashes at the end:

c:\aDirname\\
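Red's experiments can be reproduced in a few lines. The sketch below uses Python 3 syntax (an assumption; the post uses Python 2.x, and the exact wording of the error message differs between versions). It uses the built-in compile() so that the invalid literal can be tested without stopping the program.

```python
# The backslash keeps the quote from ending the string, but stays in the string itself.
aString = r"abc\"xyz"
print(len(aString))                  # 8: a, b, c, backslash, quote, x, y, z

# Two trailing backslashes are legal, but both of them end up in the string.
doubled = r"c:\aDirname\\"
print(doubled.endswith("\\\\"))      # True

# A raw string ending in a single backslash does not compile.
source = 'myDirname = r"c:\\aDirname\\"'   # the text: myDirname = r"c:\aDirname\"
try:
    compile(source, "<example>", "exec")
    raised = False
except SyntaxError:
    raised = True
print(raised)                        # True
```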

She was in despair. She couldn’t figure out any way to use raw strings to put a single backslash at the end of a string, and she didn’t want to have to go back to fighting The Wolf.

Fortunately, at this point she confided her troubles to Paul Woodman, a co-worker who had started exploring Python a few months earlier. Here is what he told her.

In raw strings, backslashes do not have the magical power of escape characters that they do in regular strings. But they don’t lose all of their magical powers.

In raw strings — as you discovered — backslashes have the magical power of protection characters. Basically, this means that a backslash protects any character that follows it from being recognized as the closing delimiter of the raw string.

Coming from a Windows programming background, you assumed that support for raw strings was a feature whose purpose was to make the work of coding Windows filenames easier by removing the magical escape character powers from the backslash. And you were surprised to discover that raw strings aren’t truly raw in the way that you expected — raw in the sense that the backslash had no magical powers.

The reason for the special powers of backslashes in raw strings is that — contrary to what you assumed — raw strings were not developed to make it easier for Windows programmers to code filenames containing backslash characters. In fact, raw strings were originally developed to make the work of coding regular expressions easier. In raw strings, the backslash has the magical power of a protection character because that is just the kind of behavior it needs to have in order to make it easier to code regular expressions. The feature that you can’t end a raw string with a single backslash is not a bug. It is a feature, because it is not legal to end a regular expression with a single backslash (or an odd number of backslashes).

Unfortunately for you, this power makes it impossible to create a raw string that ends in a single backslash, or in an odd number of backslashes. So raw strings won’t do what you want them to, namely save you from The Wolf.

But don’t despair! There is a way…

In Python, there are a number of functions in the os.path module that change forward slashes in a string to the appropriate filename separator for the platform that you are on. One of these functions is os.path.normpath(). The trick is to enter all of your filename strings using forward slashes, and then let os.path.normpath() change them to backslashes for you, like this:

import os

myDirname = os.path.normpath("c:/aDirname/")

It takes a bit of practice to get into the habit of specifying filenames this way, but you’ll find that you adapt to it surprisingly easily, and you’ll find it a lot easier than struggling with The Wolf.
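To see what normpath does to Windows-style names on any machine, the sketch below imports ntpath, the Windows implementation of os.path from the standard library; on Windows itself, os.path behaves the same way. It is written in Python 3 syntax (an assumption; the post uses Python 2.x). One detail worth knowing is that normpath drops a trailing separator, so if the trailing backslash matters, it has to be added back; two possible ways are shown.

```python
import ntpath   # Windows flavour of os.path, importable on any platform

full_name = ntpath.normpath("c:/aDirname/aFilename")
print(full_name)            # c:\aDirname\aFilename

no_trailing = ntpath.normpath("c:/aDirname/")
print(no_trailing)          # c:\aDirname   (the trailing slash is removed)

# Joining with an empty last component appends a separator.
with_trailing = ntpath.join(no_trailing, "")
print(with_trailing)        # c:\aDirname\

# Alternative: a raw string plus an ordinary escaped backslash.
alternative = r"c:\aDirname" + "\\"
print(alternative == with_trailing)   # True
```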

Red was super happy to hear this. She transferred to Woodman’s project team, and they all coded happily ever after!

Gotcha — backslashes are escape characters


This is a language feature that is so common on Unix that Unix programmers never think twice about it. Certainly, a Unix programmer would never consider it to be a gotcha. But for someone coming from a Windows background, it may very well be unfamiliar.

The gotcha may occur when you try to code a Windows filename like this:

myFilename = "c:\newproject\typenames.txt"
myFile = open(myFilename, "r")

and — even though the input file exists — when you run your program, you get the error message

IOError: [Errno 2] No such file or directory:
'c:\newproject\typenames.txt'

To find out what’s going on, you put in some debugging code:

myFilename = "c:\newproject\typenames.txt"
print "(" + myFilename + ")"

And what you see printed on the console is:

(c:
ewproject       ypenames.txt)

What has happened is that you forgot that in Python (as in most languages that evolved in a Unix environment) in quoted string literals the backslash has the magical power of an escape character. This means that a backslash isn’t interpreted as a backslash, but as a signal that the next character is to be given a special interpretation. So when you coded

myFilename = "c:\newproject\typenames.txt"

the “\n” that begins “\newproject” was interpreted as the newline character, and the “\t” that begins “\typenames.txt” was interpreted as the tab character. That’s why, when you printed the filename, you got the result that you did. And it is why Python couldn’t find your file — because no file with the name

c:(newline)ewproject(tab)ypenames.txt

could be found.

To put a backslash into a string, you need to code two backslashes — that is, the escape character followed by a backslash. So to get the filename that you wanted, you needed to code

myFilename = "c:\\newproject\\typenames.txt"
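The difference between the two spellings can be checked without touching the file system. This is a sketch in Python 3 syntax (an assumption; the post uses Python 2.x); the variable names are chosen only for the comparison.

```python
wrong = "c:\newproject\typenames.txt"     # \n and \t are turned into newline and tab
right = "c:\\newproject\\typenames.txt"   # each \\ yields one real backslash
raw = r"c:\newproject\typenames.txt"      # raw string: backslashes are kept as typed

print("\n" in wrong, "\t" in wrong)   # True True
print("\\" in wrong)                  # False: no backslash survived
print(len(wrong), len(right))         # 25 27
print(raw == right)                   # True
```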

And under some circumstances, if Python prints information to the console, you will see the two backslashes rather than one. For example, this is part of the difference between the repr() function and the str() function.

myFilename = "c:\\newproject\\typenames.txt"
print repr(myFilename), str(myFilename)

produces

'c:\\newproject\\typenames.txt' c:\newproject\typenames.txt

Escape characters are documented in the Python Language Reference Manual. If they are new to you, you will find them disconcerting for a while, but you will gradually grow to appreciate their power.

Python Gotchas

What is a “gotcha”?

The word “gotcha” started out as the expression “Got you!” This is something that someone who speaks idiomatic American English might say when he succeeds in playing a trick or prank on someone else. “I really got you with that trick!”

The expression “Got you!” is pronounced “Got ya!” or “Got cha!”.

Among computer programmers, a “gotcha” has become a term for a feature of a programming language that is likely to play tricks on you, that is, to display behavior that is different from what you expect.

Just as a fly or a mosquito can “bite” you, we say that a gotcha can “bite” you.

About this Page

This is a page devoted to Python “gotchas”. Python is a very clean and intuitive language, so it hasn’t got many gotchas, but it still has a few that often bite beginning Python programmers. My hope is that if you are warned in advance about these gotchas, you won’t be bitten quite so hard!

Note that a gotcha isn’t necessarily a problem in the language itself. Rather, it is a situation in which there is a mismatch between the programmer’s expectations of how the language will work, and the way the language actually does work. Often, the source of a gotcha lies not in the language, but in the programmer. Part of what creates a programmer’s expectations is his own personal background. A programmer with a Windows or mainframe background, or a background in COBOL or the Algol-based family of languages (PL/1, Pascal, etc.), is especially prone to experiencing gotchas in Python, a language that evolved in a Unix environment and incorporates a number of conventions of the C family of programming languages (C, C++, Java).

If you’re such a programmer, don’t worry. There aren’t many Python gotchas. Keep learning Python. It is a great language, and you’ll soon come to love it.
