Gotcha — backslashes are escape characters


This is a language feature that is so common on Unix that Unix programmers never think twice about it. Certainly, a Unix programmer would never consider it to be a gotcha. But for someone coming from a Windows background, it may very well be unfamiliar.

The gotcha may occur when you try to code a Windows filename like this:

myFilename = "c:\newproject\typenames.txt"
myFile = open(myFilename, "r")

and — even though the input file exists — when you run your program, you get the error message

IOError: [Errno 2] No such file or directory:
'c:\newproject\typenames.txt'

To find out what’s going on, you put in some debugging code:

myFilename = "c:\newproject\typenames.txt"
print "(" + myFilename + ")"

And what you see printed on the console is:

(c:
ewproject       ypenames.txt)

What has happened is that you forgot that in Python (as in most languages that evolved in a Unix environment) the backslash acts as an escape character inside quoted string literals. A backslash is not read as a backslash, but as a signal that the next character is to be given a special interpretation. So when you coded

myFilename = "c:\newproject\typenames.txt"

the “\n” that begins “\newproject” was interpreted as the newline character, and the “\t” that begins “\typenames.txt” was interpreted as the tab character. That’s why, when you printed the filename, you got the result that you did. And it is why Python couldn’t find your file — because no file with the name

c:(newline)ewproject(tab)ypenames.txt

could be found.

To put a backslash into a string, you need to code two backslashes: the escape character followed by a backslash. So to get the filename that you wanted, you needed to code

myFilename = "c:\\newproject\\typenames.txt"
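If you want to see the difference for yourself, here is a minimal sketch that compares the two literals. The names broken and fixed are made up for illustration, and print() is written in function form so that the snippet should run under Python 3 as well as Python 2.

```python
# Sketch only: the variable names are illustrative, not from the original post.
broken = "c:\newproject\typenames.txt"    # \n and \t become a newline and a tab
fixed = "c:\\newproject\\typenames.txt"   # each \\ yields one literal backslash

# The broken string is two characters shorter, because each of the
# two-character sequences \n and \t collapsed into a single character.
print(len(fixed) - len(broken))   # 2

print("\n" in broken)             # True: the string really contains a newline
print("\\" in broken)             # False: no backslash survived
print(fixed.count("\\"))          # 2: the doubled backslashes are stored as single ones
```

Checking the length or the contents of a string like this is often quicker than staring at printed output, where a tab can look like ordinary spacing.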

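The string now contains single backslashes, but under some circumstances Python will still show you two. This is part of the difference between the repr() function and the str() function: repr() displays the string the way you would write it in source code, with each backslash doubled, while str() (like print) displays the characters that the string actually contains. For example, this code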

myFilename = "c:\\newproject\\typenames.txt"
print repr(myFilename), str(myFilename)

produces

'c:\\newproject\\typenames.txt' c:\newproject\typenames.txt
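To convince yourself that the doubled backslashes exist only in the display, you can measure the two results. The following is a small sketch rather than anything from the original post: the names as_repr and as_str are illustrative, and ast.literal_eval() is used here only as a convenient way to parse a string literal back into a string.

```python
import ast

my_filename = "c:\\newproject\\typenames.txt"

as_repr = repr(my_filename)   # the string written out as a Python literal
as_str = str(my_filename)     # the characters the string actually holds

print(as_repr)                # 'c:\\newproject\\typenames.txt'
print(as_str)                 # c:\newproject\typenames.txt

# repr() adds two quote characters and doubles each of the two backslashes.
print(len(as_repr) - len(as_str))   # 4

# Because repr() produces a valid literal, parsing it gives back the original.
print(ast.literal_eval(as_repr) == my_filename)   # True
```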

Escape characters are documented in the Python Language Reference Manual. If they are new to you, you will find them disconcerting for a while, but you will gradually grow to appreciate their power.
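Doubling the backslash is not the only way out. The comments below suggest raw strings and forward slashes, and the sketch here compares those spellings with the doubled form. It is only an illustration under a few assumptions: the variable names are invented, and the standard ntpath module (the Windows flavor of os.path) is used so that the snippet behaves the same on any platform.

```python
import ntpath  # Windows path rules, importable on any platform

doubled = "c:\\newproject\\typenames.txt"   # escaped backslashes
raw = r"c:\newproject\typenames.txt"        # raw string: backslashes taken literally
forward = "c:/newproject/typenames.txt"     # forward slashes: nothing to escape

print(raw == doubled)                       # True

# normpath() applies Windows rules and turns forward slashes into backslashes.
print(ntpath.normpath(forward) == doubled)  # True

# join() inserts the separators, so only the drive root needs escaping.
print(ntpath.join("c:\\", "newproject", "typenames.txt") == doubled)  # True
```

One caveat with raw strings: a raw string literal cannot end in a single backslash, so r"c:\newproject\" is a syntax error.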


4 thoughts on “Gotcha — backslashes are escape characters”

  1. I’m not really sure that I would pose this as a Windows versus Unix background. I know that several years ago on Windows a recommended practice was to always use forward slashes inside path names in source code. In fact this is required practice in standard C for #include directives, although many if not all compilers for Windows do allow backslash characters. So, I would really call this a C language background bias even though obviously C was developed for Unix. I’m not sure if all of the Unix shells and utilities of the time got backslash escapes from C or vice versa.

    For what it is worth, using forward slashes in path names apparently works in Python just as well.

    In either case, this is still an issue for new programmers with no prior programming experience, which I believe has always been a focus of concern for Python.

  2. Try using 'raw' strings, e.g., r'C:\somefile.txt' or r'([A-Z]+)\.\1'

    Raw strings treat backslashes as literal backslashes, making them well suited to Windows filenames and regular expressions.

  3. >>>s='\\'
    >>>s
    '\\'
    >>>len(s)
    1
    >>>print s
    \
    >>>ss='s="\\"'
    >>>exec(ss)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "<string>", line 1
        s="\"
            ^

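    The escape character has been correctly escaped, and string ss now contains s="\". However, that text is not valid Python source, because the backslash escapes the closing quote, which is why exec(ss) fails with a syntax error. To exec() it, we need to escape the escape character again, to get an escaped escaped escape character: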

    >>>exec('s="\\\\"')
    >>>len(s)
    1
    >>>print s
    \

    Generally speaking, a text string that you already have in your program will be in its correct representation; the exception is raw input characters, which you would normally parse and decode before feeding into exec() anyway. So in Python 2, one way to pass such a string to exec() is:

    >>>exec(mystring.encode("string_escape"))

    or, from my example:

    >>>exec('s="\\"'.encode("string_escape"))

    The encode() call doubles each backslash, which compensates for the second round of escape processing that happens when exec() parses the string as Python source.
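The double round of escaping described in the last comment can be sketched as follows. Note that this sketch swaps in a different technique from the commenter's: the "string_escape" codec exists only in Python 2, so repr() is used instead to build the source text. The names namespace, value and path are illustrative, and passing a dictionary to exec() is simply a way to keep the results out of the surrounding scope.

```python
# Sketch only: `namespace`, `value` and `path` are illustrative names.
namespace = {}

# The text handed to exec() is parsed as Python source, so any string
# literal inside it goes through escape processing a second time.
exec('s = "\\\\"', namespace)        # source text seen by exec: s = "\\"
print(len(namespace["s"]))           # 1: a single backslash

# Building the source with repr() avoids counting backslashes by hand,
# because repr() emits a literal that evaluates back to the same string.
value = "c:\\newproject\\typenames.txt"
exec("path = " + repr(value), namespace)
print(namespace["path"] == value)    # True
```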
