Gotcha — backslashes in Windows filenames

Once upon a time there was a beautiful Windows programmer named Red Ridinghood.

One day, Red’s supervisor told her that they were going to start building a new application called GrandmasHouse. The feature list for the application was so long that they would never have attempted to get to GrandmasHouse if they hadn’t learned about a shortcut through Python Woods that would make the journey much shorter.

So Red started working her way through Python, and indeed found the going quick and easy. She loved the woods, and was happy to be traveling in them.

There was only one problem. Her programs did a lot of file manipulation, and so she had to do a lot of coding of filenames. Windows filenames used a backslash as a separator, but within Python the backslash had the magic power of an escape character, so every time she wanted a backslash in a filename she had to code two backslashes, like this:

myServer = "\\\\aServer" # ==> \\aServer
myFilename = myServer + "\\aSharename\\aDirName\\aFilename"

This feature of Python got very old very quickly. Red started calling it The Wolf, and it was the one part of Python that she hated.

One day as she was walking through the forest, she came to a clearing. In the clearing was a charming little pub, and inside the pub she met a tall, dark, and handsome stranger named Rawstrings.

Rawstrings said he could save her from The Wolf. All she had to do, he said, was to put an “r” in front of her quoted string literals. This would change them from escaped strings into raw strings. The backslash would lose its powers as an escape character, and become just an ordinary character. For example, with raw strings, you could code

r"\t"

and you wouldn’t get a string containing a single tab character; you would get a string containing the backslash character followed by “t”.
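The difference Rawstrings describes can be checked directly. The following is a minimal sketch; the variable names `escaped` and `raw` are made up for illustration and are not part of the story.

```python
# Compare an ordinary (escaped) literal with a raw literal.
escaped = "\t"    # one character: a tab
raw = r"\t"       # two characters: a backslash and the letter t

print(len(escaped))   # 1
print(len(raw))       # 2
print(raw == "\\t")   # True: the raw form equals the doubled-backslash form
```

In other words, the raw literal is just another way of spelling the string that Red used to write with two backslashes.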

So instead of coding

myServer = "\\\\aServer"

Red could just code

myServer = r"\\aServer"

Red was seduced by the things that Rawstrings was telling her, and she began to spend a lot of time in his company.

Then one day, she coded

myDirname = r"c:\aDirname\"

and her program blew up with the following message:

myDirname = r"c:\aDirname\"
                          ^
SyntaxError: invalid token

After some experimenting, she discovered that — contrary to what Rawstrings had told her — the backslash seemingly hadn’t lost all of its magic powers after all. For example, she could code:

aString = r"abc\"xyz"
print(aString)

When she did this, it seemed perfectly legal. The double-quote just before “xyz” did not close the raw string at all. Somehow the backslash seemed to protect it — it wasn’t recognized as the closing delimiter of the raw string, but was included in the string. When she coded

print(aString)

she got

abc\"xyz
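A short sketch makes it easier to see exactly what ended up in the string. It reuses the name `aString` from above; the length and slice checks are added purely for illustration.

```python
aString = r"abc\"xyz"

print(aString)                  # abc\"xyz
print(len(aString))             # 8: the backslash stays in the string
print(aString[3:5] == '\\"')    # True: a backslash followed by a double-quote
```

So the backslash does two jobs at once: it keeps the quote from closing the string, and it remains in the result as an ordinary character.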

It was this protective character that the backslash had acquired that made

myDirname = r"c:\aDirname\"

blow up. The final backslash was protecting the closing double-quote, so it was not being recognized as a closing quote. And since there was nothing after the double-quote, the raw string was not closed, and she got an error. She tried coding the raw string with two backslashes at the end — as if the backslash was an escape character —

myDirname = r"c:\aDirname\\"

but that didn’t do it either. Instead of getting the single closing backslash that she wanted, she got two backslashes at the end:

c:\aDirname\\
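Both experiments can be reproduced in one small script. This is only a sketch: it uses the built-in `compile()` so that the syntax error can be caught instead of stopping the script, and the names `source` and `compiled` are illustrative.

```python
# compile() lets us try a line of source without crashing the script.
# The source text being compiled is:  myDirname = r"c:\aDirname\"
source = 'myDirname = r"c:\\aDirname\\"'
try:
    compile(source, "<example>", "exec")
    compiled = True
except SyntaxError:
    compiled = False
print(compiled)                     # False: the raw string is never closed

# Doubling the final backslash compiles, but both backslashes are kept.
myDirname = r"c:\aDirname\\"
print(myDirname)                    # c:\aDirname\\
print(myDirname.endswith("\\\\"))   # True: the string ends in two backslashes
```

The exact wording of the error message depends on the Python version, so the sketch only checks that a `SyntaxError` is raised.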

She was in despair. She couldn’t figure out any way to use raw strings to put a single backslash at the end of a string, and she didn’t want to have to go back to fighting The Wolf.

Fortunately, at this point she confided her troubles to Paul Woodman, a co-worker who had started exploring Python a few months earlier. Here is what he told her.

In raw strings, backslashes do not have the magical power of escape characters that they do in regular strings. But they don’t lose all of their magical powers.

In raw strings, as you discovered, backslashes have the magical power of protection characters. Basically, this means that a backslash protects a quote character that follows it from being recognized as the closing delimiter of the raw string, while the backslash itself stays in the string.

Coming from a Windows programming background, you assumed that support for raw strings was a feature whose purpose was to make the work of coding Windows filenames easier by removing the magical escape character powers from the backslash. And you were surprised to discover that raw strings aren’t truly raw in the way that you expected — raw in the sense that the backslash had no magical powers.

The reason for the special powers of backslashes in raw strings is that — contrary to what you assumed — raw strings were not developed to make it easier for Windows programmers to code filenames containing backslash characters. In fact, raw strings were originally developed to make the work of coding regular expressions easier. In raw strings, the backslash has the magical power of a protection character because that is just the kind of behavior it needs to have in order to make it easier to code regular expressions. The feature that you can’t end a raw string with a single backslash is not a bug. It is a feature, because it is not legal to end a regular expression with a single backslash (or an odd number of backslashes).
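A small sketch of the regular-expression case shows why raw strings are convenient there. It uses only the standard `re` module; the pattern and the sample text are made up for illustration.

```python
import re

# With a raw string the pattern reads the way the regex engine sees it.
raw_pattern = r"\d+\.\d+"
escaped_pattern = "\\d+\\.\\d+"
print(raw_pattern == escaped_pattern)                  # True
print(re.search(raw_pattern, "pi is 3.14").group())    # 3.14

# A pattern that ends in a single backslash is rejected by the re module.
try:
    re.compile("abc\\")
    accepted = True
except re.error:
    accepted = False
print(accepted)                                        # False
```

Since a pattern ending in a lone backslash is not a valid regular expression anyway, a raw string that cannot end that way costs the regex writer nothing.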

Unfortunately for you, this power makes it impossible to create a raw string that ends in a single backslash, or in an odd number of backslashes. So raw strings won’t do what you want them to, namely save you from The Wolf.

But don’t despair! There is a way…

In Python, there are a number of functions in the os.path module that change forward slashes in a string to the appropriate filename separator for the platform that you are on. One of these functions is os.path.normpath(). The trick is to enter all of your filename strings using forward slashes, and then let os.path.normpath() change them to backslashes for you, like this:

import os
myDirname = os.path.normpath("c:/aDirname/")

It takes a bit of practice to get into the habit of specifying filenames this way, but you’ll find that you adapt to it surprisingly easily, and you’ll find it a lot easier than struggling with The Wolf.
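Here is a hedged sketch of the forward-slash habit in action. It imports `ntpath`, the Windows implementation that `os.path` refers to when Python runs on Windows, so the example behaves the same on any platform. The name `withSlash` and the final `join()` step are additions for illustration, not part of Woodman’s advice.

```python
import ntpath  # the Windows flavour of os.path; importable on any platform

myDirname = ntpath.normpath("c:/aDirname/")
print(myDirname)    # c:\aDirname  (note that the trailing slash is dropped)

myFilename = ntpath.normpath("c:/aDirname/aFilename")
print(myFilename)   # c:\aDirname\aFilename

# If a trailing backslash is required, joining with an empty last part adds one.
withSlash = ntpath.join(myDirname, "")
print(withSlash)    # c:\aDirname\ (ends in a single backslash)
```

One thing to keep in mind is that normpath() removes a trailing separator, so code that needs the directory name to end in a backslash has to add it back, for example with the join() call shown at the end.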

Red was super happy to hear this. She transferred to Woodman’s project team, and they all coded happily ever after!

11 thoughts on “Gotcha — backslashes in Windows filenames”

  1. Just what I needed for some scripts that have to run on Windows and Linux.

    Except my scripts have to run from inside a jython implementation which doesn’t have visibility of either ‘os’ or ‘javaos’ when running.

    There’s one in the eye for compatibility.

  2. RE: “Except my scripts have to run from inside a jython implementation which doesn’t have visibility of either ‘os’ or ‘javaos’ when running.”

    Jython itself does support os.path. But some earlier implementations were broken or just crappy. See for example this post on Nabble.

    Maybe you can upgrade your version of jython.

  3. Re: “Jython itself does support os.path.”

    The os.class module in the jython.jar file included with the Grinder is inaccessible.

  4. Couldn’t you use r'''c:\path\''' instead? I don’t have a Windows machine, so I can’t test.

  5. Actually, you do not need to go through all of this at all.
    Slashes or backslashes, Python will use correct one for you based on
    the system it runs on:

    os.path.join("c:", "aDirname", "")

    will correctly produce c:\\aDirName\\ string

  6. Maybe I’m confused/lucky, but why use backslashes at all? Not only do all Windows shells seem to handle forward slashes just fine, but my Python shells like them, too.

  7. The file path is really stupid in Windows. I would replace every backslash with ‘/’, or something would go wrong. I guess that is not a good way.

  8. An entertaining and well-written post. However, on my system (Win7-64 using Python 2.7), I can code the following and it works fine:
    imgDir = "s:/imgTemp/"
    imgName = (imgDir + "testImage2.jpg")