Learning Subversion: the mystery of .svn

If you are googling for “Subversion command line tutorial introduction for beginners”, read this first! This is for all Subversion newbies.

After using PVCS for many years, our office recently started moving to Subversion. Which means that recently I started trying to learn Subversion.

I was pressed for time. I was in a hurry. I was looking for something that would get me up and running quickly.

First, I got a copy of the free online Subversion documentation Version Control with Subversion.

Second, I got a copy of Mike Mason’s excellent Pragmatic Version Control Using Subversion (2nd ed.).

Third, I googled the Web looking for the kinds of things that you’d expect: Subversion tutorial introduction beginning beginners commands. And I found some good stuff.

But even after reading many of the online Subversion tutorials, I still could not grok Subversion. Different commands seemed to be doing the same thing, and the tutorials used a lot of terms that were never defined or explained: “versioned”, “unversioned”, “under version control”, and so on.

Gradually, I realized the problem. Many of the online tutorials and introductions try to explain how to use Subversion without explaining how Subversion works. They tell you what commands to issue, and when, but they don’t tell you why you are issuing the command at this particular time, or what the command is doing under the covers.

So I had to dig deeper.

What I found was that there was one particular piece of information missing from most of the tutorials and introductions that I found. If you don’t have that piece, nothing about Subversion makes much sense. With it, all of the other pieces of the puzzle fall into place.

So the purpose of this post is to tell you — the Subversion newbie — about that piece.


How Subversion Works

The basic unit of work for Subversion is a project.

A project is basically a directory.

Technically, a project is a subtree: a directory, including all of its files and subdirectories, and all of those subdirectories’ files and subdirectories, etc. But in order to keep things simple, I will talk as if a project is just a directory.

When you are working on a Subversion project, there are actually two directories that you are working with.

  • There is the repository, which is a directory (controlled by Subversion and running on a server somewhere) that contains the master copy of the project directory.
  • There is your own personal workingCopy, which is a directory (controlled by you) that exists on the file system of your own machine (that is, on the hard drive of your own PC).

But (and this is the piece that was missing) a workingCopy directory is not an ordinary directory.

The use of the expression “working copy” is one of the most confusing things about Subversion tutorials and even the Subversion documentation itself. When you encounter the expression “working copy” you assume that you are dealing with an ordinary filesystem directory that is being used to hold a copy of the files in your project. Not so!

In the context of Subversion, “working copy” is a very specific term of art — a Subversion-specific technical term. That is why in this post I avoid the expression “working copy” and instead use workingCopy.

So what is a Subversion workingCopy directory?

A workingCopy directory is a directory that has a hidden subdirectory called “.svn”.

The hidden .svn directory is what Subversion calls an “administrative directory”.

Note the leading period in “.svn”. On Unix systems, a directory whose name begins with a dot is a “hidden” (or “dotfile”) directory.

On your PC, the project’s top-level workingCopy directory has a hidden .svn subdirectory. And each of the subdirectories of the workingCopy directory (if it has any), and each of their subdirectories (if they have any), and so on, has its own hidden .svn subdirectory.

Having a hidden .svn subdirectory is what makes an ordinary file system directory into a Subversion workingCopy directory, a directory that Subversion can recognize and manage.

So, for a project named “ProjectX” the workingCopy directory will be named “ProjectX”. It might look like this:

	ProjectX [DIRECTORY]
		projectx.py
		projectx_constants.py
		.svn [DIRECTORY]
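Because .svn is a hidden directory, a plain directory listing will not show it. On a Unix-like system you need ls -a to see it:

        ls ProjectX
        projectx.py  projectx_constants.py

        ls -a ProjectX
        .  ..  .svn  projectx.py  projectx_constants.py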

What is in a .svn subdirectory? What does a Subversion administrative directory contain?

The Subversion documentation says this about workingCopy directories:

A Subversion working copy is an ordinary directory tree on your local system, containing a collection of files. You can edit these files however you wish, and if they’re source code files, you can compile your program from them in the usual way. …

A working copy also contains some extra files, created and maintained by Subversion, to help it carry out these commands. In particular, each directory in your working copy contains a subdirectory named .svn, also known as the working copy’s administrative directory. The files in each administrative directory help Subversion recognize which files contain unpublished changes, and which files are out of date with respect to others’ work.

Here’s another clue: a passage from Pragmatic Version Control Using Subversion:

Subversion has a highly efficient network protocol and stores pristine copies of your working files locally, allowing a user to see what changes they’ve made without even contacting the server [where the central repository is stored].

So now we know what a Subversion administrative directory contains.

The .svn admin directory contains pristine (unchanged) copies of files that were downloaded from the repository. (It contains a few other things, too.)

Earlier, I said “When you are working on a Subversion project, there are actually TWO directories that you are working with… the repository and the working copy.” Now I want to change that. It would be more accurate to say that there are really THREE directories that you are working with:

  • the main ProjectX repository on the server
  • the ProjectX workingCopy directory on your PC, which contains editable (and possibly changed) copies of the files in the project …and also …
  • the hidden Subversion administrative directory, which contains (pristine, unchanged, and uneditable) copies of the files in the main ProjectX repository on the server.

That means that, on your PC, the ProjectX workingCopy directory looks like this:

	ProjectX [DIRECTORY]
		projectx.py
		projectx_constants.py
		.svn [DIRECTORY]
			projectx.py
			projectx_constants.py

Now things start to become clearer…

Subversion introductions and tutorials often say things that are rather cryptic to someone who is trying to learn Subversion. Even HELP questions and FAQs posted on the Web can be mystifying. Now let’s see how some of those things make sense in light of our knowledge of the .svn subdirectory.


Showing file changes

The reason that Subversion can allow “a user to see what changes they’ve made without even contacting the server” is that the Subversion diff works only on the workingCopy directory on your own PC.

When Subversion shows file changes (that is, shows diffs) it is actually showing diffs between

  • your edited files in the workingCopy directory, and
  • the pristine copies of those files that are held in the .svn subdirectory of the workingCopy directory.
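For example, after editing projectx.py you can ask Subversion what changed without any network access at all. A typical check looks roughly like this (a sketch, not a full transcript):

        cd ProjectX
        svn status             # lists locally changed files, e.g. "M   projectx.py"
        svn diff projectx.py   # shows a diff: your edited file vs. the pristine copy in .svn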

“unversioned” files vs. files “under version control”

Suppose that I make a change to one of my files: to ProjectX/projectx_constants.py.

When I make the changes, my editor automatically creates a backup file: ProjectX/projectx_constants.py.bak.

At this point, ProjectX/projectx_constants.py.bak is what is called an “unversioned” file. It exists in the ProjectX directory, but not in the ProjectX/.svn directory, so Subversion knows nothing about it. That makes sense: we don’t want projectx_constants.py.bak to be considered a project file anyway.

But suppose I want to add a new module to the project, called projectx_utils.py. If I simply create the file in the ProjectX folder, it will be an “unversioned” file in just the same way that projectx_constants.py.bak is an unversioned file: it will not exist in the ProjectX/.svn directory, so Subversion knows nothing about it.

So that is why Subversion has an “svn add” command. The command svn add projectx_utils.py will add the file to the project by recording ProjectX/projectx_utils.py in the ProjectX/.svn directory. At this point — after it has been added to the .svn subdirectory — the file is said to be “under version control”.

Note that — at this point — although projectx_utils.py has been “added” to the copy of the project in the workingCopy, the main repository still doesn’t know anything about it — projectx_utils.py hasn’t been added to the central repository on the server.

When I “commit” my changes, I send the files from my workingCopy to the main repository. Only after that happens does the new file truly become part of the project by becoming one of the files in the central repository.
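Put together, a plausible command sequence for the new module looks like this (the commit message is just an example):

        cd ProjectX
        svn add projectx_utils.py                  # put the new file under version control
        svn status                                 # shows "A   projectx_utils.py"
        svn commit -m "Add projectx_utils module"  # send the change to the central repository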


Help! I’ve lost my .svn directory and I can’t get up!

Because a Subversion workingCopy directory needs a .svn subdirectory in order to work properly, you can have problems with Subversion if you accidentally delete the .svn subdirectory.


What is a “clean copy”?

In various tutorials, and in the Subversion docs, you will run across the expression “clean copy”. A “clean copy” is a copy of only the source-code files, without the .svn directory.

An introduction to Subversion (which is also a nice introduction to the TortoiseSVN open-source Windows GUI client for Subversion) explains things nicely.

If you look closely in your working copy, you may see an .svn folder in each folder of your working copy. The folders are hidden folders, so depending on the Windows settings you may not see them, but they are there. Those folders contain the information that Subversion uses to link your working copy to the repository.

If ever you need to get a copy of what’s in the repository, but without all the .svn folders (say for example you’re ready to publish it or hand the files over to your client), you can do an “SVN Export” into a new folder to get a “clean” copy of what’s in your repository.

Having the concept of a “clean copy” makes it easier to understand the next question…


Checkout vs. Export

A Frequently Asked Question about Subversion is What’s the difference between a “checkout” and an “export” from the repository?

The CollabNet docs say this:

They are the same except that Export doesn’t include the .svn folders and Checkout does include them. Also note that an export cannot be updated.

When you do a Subversion checkout, every folder and subfolder contains an .svn folder. These .svn folders contain clean copies of all files checked out and .tmp directories that contain temporary files created during checkouts, commits, update and other operations.

An Export will be about half the size of a Checkout due to the absence of the .svn folders that duplicate all content.

Note that the reason an exported folder cannot be updated is that the update command updates the .svn directory of a workingCopy, but an export does not create an .svn directory.

Note also that you can export either from the main repository or from a workingCopy directory on your PC. See the Subversion docs for export.
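In command-line terms, the difference looks roughly like this (the repository URL is made up for the example):

        svn checkout http://svn.example.com/repos/ProjectX
                # creates a ProjectX workingCopy with .svn folders; it can be updated

        svn export http://svn.example.com/repos/ProjectX ProjectX-export
                # a clean copy straight from the repository; no .svn folders, cannot be updated

        svn export ProjectX ProjectX-clean
                # a clean copy taken from an existing workingCopy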


The (import, checkout) usage pattern for getting started with Subversion

Most “getting started with Subversion” tutorials start the same way. Assuming that you have some project files that you want to put into Subversion, you are told to:

  • do an import
  • do a checkout

in that order.

What you are not told is why you start with those two particular actions in that particular order.

But by now, knowing about the hidden .svn administrative directory and what it does, you can probably figure that out.

Import is the opposite of export. It takes a directory of files — a clean copy of the files, if you will — from your hard drive and copies them into the central Subversion repository on the server.

The next step is always to do a checkout. Basically, a checkout copies the project files from the central repository to a workingCopy directory on your PC. If the workingCopy directory does not exist on your PC, it is created.

The workingCopy directory contains everything you need in order to be able to work with Subversion, including an .svn administrative directory. As the CollabNet documentation (quoted earlier) says:

When you do a Subversion checkout, every folder and subfolder contains an .svn folder. These .svn folders contain clean copies of all files checked out and .tmp directories that contain temporary files created during checkouts, commits, update and other operations.

So the second step — the checkout command — is absolutely necessary in order to get started. It creates a workingCopy directory containing the project files. Only after that happens are your files properly “under version control”.
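So a minimal getting-started session looks something like this (the paths and repository URL are hypothetical):

        # 1. import a clean copy of your existing project files into the repository
        svn import /home/me/ProjectX http://svn.example.com/repos/ProjectX -m "Initial import"

        # 2. check out a workingCopy -- this is what creates the .svn administrative directories
        cd /home/me/work
        svn checkout http://svn.example.com/repos/ProjectX

Note that the import does not turn the original /home/me/ProjectX directory into a workingCopy; that is exactly why the checkout step is needed.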


checkin vs. commit

PVCS (like SourceSafe and many other version control systems) works on a locking model. “Checking out” a file from the repository means that you get a local working copy of the file and you lock the file in the repository. At that point, nobody can unlock it except you. Checking out a file gives you exclusive update privileges on it until you check it back in.

“Checking in” a file means that you copy your local working copy of the file back into the repository and you unlock the file in the repository.

It is possible to copy your local working copy of the file into the repository without unlocking the file in the repository. When you do this, you are in a sense “updating” the repository from the working copy.

Because of my familiarity with this kind of version control, I had a certain “mental model” of how a version control system works. And because of that mental model, many of the Subversion tutorials were quite confusing.

One source of confusion is the fact that (as we will see in the next section) the word “updating” in the context of Subversion means exactly the opposite of what it means in the context of PVCS.

One of the Subversion tutorials that I found said that you must check out your workingCopy from the main repository, because you can’t do a checkin back to the main repository if you haven’t checked it out. This was very confusing to an ex-PVCS user.

First, it suggested that Subversion works like PVCS: that there is a typical round-trip usage pattern consisting of

  • checking out (locking)
  • editing
  • checking in (unlocking)

But Subversion doesn’t work like this, at least not by default.

What the tutorial was trying to say, I think, was that in order to work with Subversion, you must create a workingCopy directory (that is, a directory that contains an .svn administrative subdirectory). And the way to create a workingCopy directory is to run a svn checkout command against the repository on the server.

Second, explaining things this way was confusing because Subversion doesn’t really have a checkin command. It does have a commit command, which some tutorials call a “checkin” command. But that command does not do the same thing as a PVCS checkin.

Ignore the fact that the short form of the commit command is ci (an abbreviation for “checkin” inherited from older version control tools). A Subversion “checkin” is the same thing as a “commit”, and has nothing to do with locking. It would really be helpful if all Subversion tutorials would stop using the term “checkin” and replace it with “commit”.

If you are used to working with a VCS that uses the “check out, edit, check in” paradigm, and you come to understand that Subversion’s commit is not the same as your old familiar check in, then your next question will almost certainly be:

Once you checkout a project into a working folder, how do you check it in a la SourceSafe? [Or PVCS, or other lock-based VCSs? — Steve Ferg]

I know there is “commit” which puts my changes into the repository, but I still have the files checked out under my working folder. What if I am done with a particular file and I don’t want to have it checked out? How do I check it back in?

You can read the answer here.


What does svn update do?

EXECUTIVE SUMMARY: svn update updates the workingCopy, not the repository.

The Subversion docs describe the update command this way:

When working on a project with a team, you’ll want to update your working copy to receive any changes other developers on the project have made since your last update. Use svn update to bring your working copy into sync with the latest revision in the repository:

Basically, what the update command does is copy changes from the central repository down into your workingCopy — both into the pristine copies in the .svn directory and into your own working files.

This is something you should do frequently, because you don’t want the files in your workingCopy/.svn directory to get too far out of sync with the files in the central repository. And you don’t want to try to commit files if your workingCopy/.svn is out of sync with the central repository.

That means that as a general rule, you should always run an svn update:

  • just before you start making a new round of changes to your workingCopy, and
  • just before doing a commit.
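In practice, then, a typical working session is bracketed by updates, roughly like this:

        cd ProjectX
        svn update                        # sync the workingCopy (including .svn) with the repository
        # ... edit files, run tests ...
        svn update                        # pick up anyone else's recent changes
        svn commit -m "Describe the change"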

Now, having mastered the concept of an .svn directory, we can Understand Many Things, even arcana such as why “serving websites from svn checkout” is considered harmful.

So that’s it.

This post contains information written by a Subversion newbie in the hopes that it will be useful to other Subversion newbies. But of course, having been written by a newb, there are all sorts of ways it could be wrong.

If you’re a Subversion expert (and it doesn’t take much to be more expert than I am) and you see something wrong, confused, or misleading here, please leave a comment. I, and future generations of Subversion newbies, will thank you for it.

Thanks to my co-workers Mark Thomas and Jason Herman for reviewing an earlier draft of this post.

How to fix a programmable Northgate keyboard

After my earlier post about Northgate keyboard repair it occurred to me that this information might be useful. I don’t think it can be found anywhere else on the Web.

Note that in the following slideshow (showing the repair of an Evolution keyboard) you can mouse-over the image. Controls will pop up that allow you to pause the show and to step forward and backward.


When programmable keyboards go bad

A while ago, one of my Northgate keyboards seemed spontaneously to sustain some kind of brain injury. A number of the keys seemed to have gone haywire. The left shift key didn’t work and several pairs of keys seemed to have exchanged places.

I talked with Bob Tibbetts of Northgate Keyboard repair (http://www.northgate-keyboard-repair.com/) and he explained the situation. Here is what I learned.

The Northgates are programmable keyboards — they contain a programmable chip. They were designed so that certain key combinations (e.g. pressing the left shift key four times) puts the keyboard (that is, the programmable chip) into programming mode.

Unfortunately the programmable chip had software that worked only with Windows 98 and earlier. If you are using a Northgate keyboard with any other system, the programmable chip is basically a bad chip and should be removed. (Bob noted that he removes the chip from any keyboards that he sells.)

Fixing the problem is a two-step process. First you “reboot” the keyboard into non-progamming mode, then you remove the chip.

You can just reboot the keyboard without removing the chip, of course, and that will fix the immediate problem. But as long as the programmable chip is still in the keyboard, similar problems can occur again at any time.

How to “reboot” the programmable keyboard

Shut the computer down. Don’t just log off or do a “soft” reboot. Power off.

Press the ESCAPE (ESC) key down and hold it down while you power up your PC. Do not release the ESC key until the computer beeps at you, or you have to do something like entering a password.

This should make the keyboard work normally. (If it doesn’t, then the problem was something other than the programmable chip.)

The anatomy of an Evolution keyboard

Working with Evolution keyboards is tricky because the Evolutions have the little GlidePoint touchpad in the middle of the top of the keyboard. There are short cables that go from the GlidePoint touchpad in the upper part of the keyboard to the “motherboard” in the bottom part of the keyboard.

Basically, the GlidePoint cables act as a sort of tether between the upper and lower halves of the keyboard. The cables are short, and virtually impossible to re-attach if you pull them loose. So you have to be careful not to pull them loose.

How to remove the programmable chip from an Evolution keyboard

First, make sure you have read “The anatomy of an Evolution keyboard” (above). Then …

“Reboot” the keyboard (see the instructions given above), then shut down (power off) your PC.

Turn the keyboard over, so that you are looking at the bottom of the keyboard.

Take the six screws (the ones holding the upper and lower parts of the keyboard together) out of the keyboard.

Turn the keyboard over, so that it is face up and you are looking at the keys.

DO NOT lift the top off of the keyboard.

Well, you can lift it a little. 

In the slideshow, you can see the top of the keyboard sitting on a little green box that lifts it about 2.75 inches (7 cm).  You can see the GlidePoint cables running from the touchpad in the top of the keyboard to the motherboard in the bottom of the keyboard. Those are the cables that you don’t want to disturb.

Lift the top half of the keyboard just enough to free it from the bottom half, then rotate the top clockwise about 4 or 5 inches, just enough to expose the programmable chip. Rotate the top using the location of the touchpad as the pivot point — that way you will disturb the touchpad cables as little as possible.

On the top right-hand side, locate the programmable chip. It is a small chip about 1/4″ x 3/8″ with 24C16 embossed on it.

Take a small screwdriver and pry the chip out. When you do this, you may break a few of the prongs that hold the chip to the motherboard. That’s OK. Bob Tibbetts suggested using a jeweler’s screwdriver. I used a small (but long) electrician’s screwdriver. I also found that once I had the chip lifted up, but not completely free of the motherboard, a needle-nose pliers was perfect for the final removal.

Around the edges of the chip socket, carefully cut off any remaining prongs. The goal is to leave no prongs sticking up that might touch each other or anything else. I think a “side cutter” pliers would be too big for this job. Something like a toenail clipper might be about right. I had only one prong left stuck in the motherboard, and I gently twisted it off with the needle-nose pliers.

Carefully lower the top of the keyboard back down onto the lower part.

Carefully turn the keyboard over, making sure to keep the two halves of the keyboard together.

Put the screws back in.

You’re done!

How to remove the programmable chip from a non-Evolution programmable keyboard

For other programmable Northgate keyboard models (models ending in a P for “programmable”) — 101P, 102P, Ultra TP and Ultra P — you can use basically the same procedure as described above for the Evolution.

The difference is that non-Evolution keyboards don’t have the GlidePoint touchpad embedded in the top of the keyboard. That means that you don’t need to worry about the GlidePoint cables, so you can lift the keyboard top completely off in order to access the programmable chip.

Northgate keyboard repair

The best computer keyboards ever made (even when compared to the original IBM model M keyboards) were the Northgate Omnikey keyboards.  They were heavy keyboards built like tanks, featuring stiff, clicky mechanical key switches notable for their distinctive sound as you typed.  These were real keyboards — no crappy “rubber dome” key switches allowed.

[Image: Omnikey Ultra keyboard]

I used only Northgate Omnikey Ultras for years, lugging them from job to job like an itinerant medieval carpenter carrying his tools with him from town to town, and using special keyboard plug adapters when keyboard plug design evolved first to PS/2 and then to USB.

But tools get worn and dirty and a few years ago my Ultras were terminally filthy and starting to fail.  That was when, thanks to the twin miracles of the Web and Google, I found Bob Tibbetts and his Northgate Keyboard Repair web site.  Bob belongs to the school of minimalist website design, but his keyboard expertise and repair skills are totally maximal, and he really saved my bacon — or rather, my keyboards.  He also, in a manner of speaking, saved my wrists.

After 25 years of coding, the joints in my hands and wrists were starting to protest.  I switched from using a mouse to using a trackball (I prefer a Logitech Cordless Optical Trackman), and that helped a lot.   Carpal tunnel syndrome forced a friend of mine to retire on disability and put The Fear into me.  A bout of online research convinced me that we really need more ergonomic keyboards, so I went shopping for one.

The major feature of an ergonomic keyboard is a split design in which the left and right halves of the keyboard  are split apart, separated by a few inches, and angled slightly so that you can type without bending your wrists.  The result is a keyboard that is shaped like a V rather than like a straight unbroken line. In a sense, the keyboard is bent so your wrists don’t have to be.

[Image: Northgate Evolution keyboard]

What I really wanted, of course, was an ergonomic version of the Omnikey Ultra. 

One day, in an email to Bob, I mentioned that although I loved my Ultras (one of which Bob was cleaning and repairing at the time), what I really wished for was an ergonomic V-shaped version of the Ultra. 

Well, I nearly fell off my chair when Bob told me that such a thing actually existed.  It was called the Omnikey Evolution keyboard.  Evolutions were very advanced for their time, and very few were made.  But a few — new in the box — still existed, and he had a few for sale.

I immediately ordered one, tried it out, and loved it.  It is my favorite keyboard ever.  So I followed my Mom’s tongue-in-cheek advice (“Get ’em before the hoarders do.”) and got more.  I now own 5 — one for work, one for my home Vista machine, one for my home Linux machine, and two backups.

As I type this, it is almost midnight on March 11, 2011, and Bob has only 3 Evolution keyboards left. 

The good news is that if you have a beloved old Northgate that is showing its age, Northgate Keyboard Repair is still in the business of cleaning and repairing Northgate keyboards.

Finally, if you’re looking to purchase a keyboard with buckling spring key switches, you might check out the Customizer line of keyboards at pckeyboards.com.  It is a reincarnation of the original IBM model M.

And keep on clicking…

Updated January 1, 2012

An alternative to string interpolation

I sort of like this.

# ugly
msg = "I found %s files in %s directories" % (filecount, foldercount)

# better
def Str(*args): return "".join(str(x) for x in args)

# ... later, elsewhere in the code ...
msg = Str("I found ", filecount, " files in ", foldercount, " directories")

You don’t have to call it “Str”, of course.

A Globals Module pattern

Two comments on my recent posts on a Globals Class pattern for Python and an Arguments Container pattern reminded me that there is one more container for globals that is worth noting: the module.

The idea is a simple one. You can use a module as a container.

Most introductions to Python tell you all about how to get stuff — that is, how to import stuff — *from* imported modules. They talk very little about writing stuff *to* imported modules. But it can be done.

Here is a simple example.

Let’s start with the intended container module, mem.py. I’d show you the contents of mem.py, except for the fact that there aren’t any. mem.py is empty.

Next let’s look at two modules that import and use mem.py.

The caller module is leader.py. Note that it imports mem and also imports the subordinate module, minion.  (Note the use of the print() function; we’re running Python 3 here.)

"leader.py"
import mem
import minion

mem.x = "foo"
print("leader says:",mem.x)
minion.main()
print("leader says:",mem.x)

print()

mem.x = "bar"
print("leader says:",mem.x)
minion.main()
print("leader says:",mem.x)

The subordinate module is minion.py.

"minion.py"
import mem

def main():
	print("minion says:",mem.x)
	mem.x = "value reset by minion from " + mem.x

If you run leader.py it imports minion and mem, and uses mem as a container for variable x.  It assigns a value to x in mem and calls minion, which reads mem.x and resets mem.x’s value, which leader then reads.

When you run leader.py, you see this output:

leader says: foo
minion says: foo
leader says: value reset by minion from foo

leader says: bar
minion says: bar
leader says: value reset by minion from bar

Note that leader.py passes no arguments to minion.main() and minion.main() doesn’t return anything (other than None, of course). Leader and minion communicate solely by means of the variables set in mem. And the communication is clearly two-way. Leader sets values that minion reads, and minion sets values that leader reads.

So what we have here, in mem, is a truly global container. It is not “module global” as in the Globals Class pattern. It is “application global” — it is global across the multiple modules that make up an application.  In order to gain access to this container, modules simply import it.

In keeping with the earlier posts’ grandiosity, I will call this use of an imported module the Globals Module pattern.

Every Python programmer is familiar with one special case of the Globals Module pattern. Just rename mem.py to config.py, stuff it with a bunch of constants or configuration variables, and you have a typical Python file for defining constants or setting configuration values. These values are “application global”, available to all modules in an application. All they have to do is import config.py.
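For example, a typical config.py is nothing more than a module full of assignments (the names here are invented for illustration), and any module that imports it can read — or even reassign — those values:

"config.py"
MAX_RETRIES = 3
LOG_LEVEL = "DEBUG"

"someothermodule.py"
import config
print("retries:", config.MAX_RETRIES)
config.LOG_LEVEL = "INFO"   # writing to the module works too; it is just an attribute assignment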

Doing a bit of arm-waving, and christening a Globals Module pattern, does one thing.  It reminds us that modules — used as containers for “application global” values — aren’t limited to supplying constants and pre-set values. Modules can also be written to.  The communication between “normal” modules and Globals Modules is a two-way street.

An Arguments Container pattern

In a comment on my earlier post A Globals Class pattern for Python, Mike Müller wrote
“No need for globals. Just explicitly pass your container. In my opinion this is much easier to understand.”

Mike’s comment led me to some further thoughts on the subject.

Suppose you have a number of things — x, y, and z — that you want to make available to many functions in a module.

There are four strategies that you could use. You could

1. pass x, y, and z as individual arguments
2. make x, y, and z globals

or you could create a container C of some sort and

3. pass container C as an argument
4. make container C a global

So you have two basic questions to answer. When you make the things — x, y, and z — available:

A. Do you make them available in global variables, or in arguments that you pass around?

B. Do you make them available individually, or do you put them in some kind of container and make the container available?

My original post assumed that in at least some situations you might answer question A with “use global variables” and then went on to propose that in those situations the best answer to B is “put them in a container”.

Since the point of that post was to point out the usefulness of a class as a container, I called the proposed pattern the Globals Class pattern. But in most cases some other kind of container would do as well as a class. I could almost as easily have called the pattern the Globals Container pattern.

So if you look at these two questions — A and B — I think it is interesting where Mike and I differ, and where we agree.

Question A: args or globals

Where we differ, if you could call it that, is in the answer to A.

Mike wrote “No need for globals. Just explicitly pass your container. In my opinion this is much easier to understand.”

In my post I wrote “Sometimes globals are the best practical solution to a particular programming problem.” But that wasn’t really what the post was about. It was about the answer to question B.

So I can’t really say that Mike and I disagree very much. He says “I like apples”. I say “Sometimes I like an orange.”  No big deal.

Question B — multiple things or a single container

What is much more interesting is that we both agree on the answer to question B: use a container object.

But since I was talking about globals, I was talking about a container for globals.  Since Mike was talking about arguments, he was talking about a container for arguments.

Which means that we have two different patterns. My earlier post was about strategy 4 — a Globals Container pattern. Mike is talking about strategy 3 — what we might call an Arguments Container pattern.

As it happens, I had stumbled onto the Arguments Container pattern myself, not in Python but in Java. The circumstances were very similar to the circumstances that led to the Python Globals Class pattern. I had a lot of variables that I needed to pass around. As the code evolved, the argument lists got longer and harder to manage. Finally I just bundled all of the variables into a single container object and passed the container around. As I needed to add new arguments, I was able to add them to just one place — the container.

At the time, I felt sort of stupid doing this. I hadn’t ever heard of this as a programming technique.  It smacked of sneaking global variables in through the back door, and of course everybody knows that globals are always bad. But it worked, and it made my life a lot easier.

So now Mike comes along and proposes doing exactly the same thing. I feel relieved. I’m not the only one doing this. It may even be a Good Thing.

So I’m happy to announce — not the discovery, certainly — the christening of the Arguments Container pattern, which says, basically:

Sometimes when you have a lot of individual variables that you need to pass around to a lot of different functions or methods, the best solution is to put them into a container object and just pass the container object around.
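In Python, a minimal sketch of the pattern might look like this (all the names are invented for illustration):

        class Args: pass

        def scan(args):
            # pretend we scanned a directory tree and recorded what we found
            args.filecount = 42
            args.foldercount = 7

        def report(args):
            print("I found", args.filecount, "files in", args.foldercount, "directories")

        args = Args()        # one container...
        scan(args)           # ...passed around instead of many separate arguments
        report(args)

Adding a new “argument” later means adding one attribute to the container, not editing every function signature along the call chain.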

This is not a specifically Python pattern. And in a way it is No Big Deal. But I’m doing a bit of shouting and arm-waving here because I think that somewhere there is probably at least one person for whom this post might be useful.

A Globals Class pattern for Python

I’ve gradually been evolving a technique of coding in which I put module globals in a class. Recently I stumbled across Norman Matloff’s Python tutorial in which he recommends doing exactly the same thing, and it dawned on me that this technique constitutes a truly idiomatic Python design pattern.

A pattern needs a short, catchy name. I don’t think anyone has yet given this pattern a name, so I will propose the name Globals Class pattern.

I’m sure that many experienced Python programmers are already quietly using the Globals Class pattern. They may not see much point in making a big deal about it, or in giving it a name and decking it out with the fancy title of “design pattern”. But I think a little bit of hoopla is in order. This is a useful technique, and one worth pointing out for the benefit of those who have not yet discovered it.  A bit of cheering and arm-waving is in order, simply to catch some attention.

The technique is extremely simple.

  • You define a class at the beginning of your module.  This makes the class global.
  • Then, all of the names that you would otherwise declare global, you specify as attributes of the class.

Really, there is virtually nothing class-like about this class; for instance, you probably will never instantiate it. Instead of functioning like a true class, it functions as a simple container object.

I like to use the name “mem” (in my mind, short for “GlobalMemory”) for this class, but of course you can use any name you prefer.

All you really need is a single line of code.

        class mem: pass

That is enough to create your mem container. Then you can use it wherever you like.

        def doSomething():
            mem.counter = 0
            ...
        def doMore():
            mem.counter += 1
            ...
        def doSomethingElse():
            if mem.counter > 0:
                ...

If you wish, you can initialize the global variables when you create the class. In our example, we could move the initialization of mem.counter out of the doSomething() function and put it in the definition of the mem class.

        class mem:
            counter = 0

In a more elaborate version of this technique, you can define a Mem class, complete with methods, and make mem an instance of the class. Sometimes this can be handy.

        class Mem:
            def __init__(self):
                self.stupidErrorsCount = 0
                self.sillyErrorsCount  = 0

            def getTotalErrorsCount(self):
                return self.stupidErrorsCount + self.sillyErrorsCount

        # instantiate the Mem class to create a global mem object
        mem = Mem()

What’s the point?

So, what does the Globals Class pattern buy you?

1. First of all, you don’t have to go putting “global” statements all over your code.  The beauty of using a globals class is that you don’t need any “global” statements in your code.

There was a time — in the past, when I still used “global” — when I might find myself in a situation where my code was evolving and I needed to create more and more global variables. In a really bad case I might have a dozen functions, each of which declared a dozen global variables. The code was as ugly as sin and a maintenance nightmare.  But the nightmare stopped when I started putting all of my formerly global variables into a global class like mem.  I simply stopped using “global” and got rid of all those “global” statements that were cluttering up my code. 

So the moral of my story is this.  Kids, don’t be like me.  I started out using “global” and had to change.  I’m a recovering “global” user. 

Don’t you even start.  Skip the section on the “global” keyword in your copy of Beginners Guide to Learning Python for Dummies.  Don’t use “global” at all.  Just use a globals class.

2. I like the fact that you can easily tell when a variable is global simply by noticing the mem. prefix.

3. The Globals Class pattern relieves us of the burden of having to worry about Python’s “global” statement and its quirks.

Python has the quirk that if X is a global, and a function only reads X, then within the function, X is global. But if the function assigns a value to X, X is treated as local.

So suppose that — as your code evolves — you add an assignment statement deep in the bowels of the function. The statement assigns a value to X. Then you have — as a side-effect of the addition of that statement — converted X (within the scope of the function) from a global to a local.

You might or might not want to have done that.  You might not even realize what you’ve done.   If you do realize what you’ve done, you probably need to add another statement to the function, specifying that X is global.  That is sort of a language wart. If you use the Globals Class pattern, you avoid that wart.
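Here is the quirk in miniature (a toy example):

        counter = 0

        def read_it():
            print(counter)    # reading only: counter refers to the module-level global

        def bump_it():
            # raises UnboundLocalError when called: the assignment below
            # makes counter local to this function
            counter += 1

        def bump_it_correctly():
            global counter    # the extra statement you have to remember to add
            counter += 1

With the Globals Class pattern there is nothing to remember: mem.counter += 1 behaves the same way inside any function.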

4. I think the use of the Globals Class pattern makes the work of static code analyzers (e.g. PyFlakes) easier.

5. The Globals Class pattern makes it possible to create multiple, distinct groups of globals.

This can be useful sometimes. I have had modules that processed nested kinds of things: A, B, and C. It was helpful to have different groups of globals for the different kinds of things.

        class memA: pass
        class memB: pass
        class memC: pass

6. Finally, the Globals Class pattern makes it possible to pass your globals as arguments.

I have had the situation where a module grew to the point where it needed to be split into two modules. But the modules still needed to share a common global memory. With the Globals Class pattern, a module’s globals are actually attributes of an object, a globals class.  In Python, classes are first-class objects.  That means that a globals class can be passed — as a parameter — from a function in one module to a function in another module.
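A tiny sketch of that last point (the module names are invented for illustration) — because the globals class is just an object, one module can simply hand it to a function in another module:

"alpha.py"
import beta

class mem: pass
mem.total = 0

beta.accumulate(mem, 5)   # pass the whole globals class as a parameter
beta.accumulate(mem, 7)
print("total:", mem.total)

"beta.py"
def accumulate(mem, amount):
    mem.total += amount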

Is this really A Good Thing?

At this point I can hear a few stomachs churning. Mine is one of them. Because, as we all know, Global Variables are Always a Bad Thing.

But that proposition is debatable.  In any event, it is an issue that I’m not going to explore here.  For now, I prefer to take a practical, pragmatic position:

  • Sometimes globals are the best practical solution to a particular programming problem.
  • For the occasions when Globals are A Good Thing, it is handy to have a way to Do Globals in A Good Way.

So the bottom line for me is that there are occasions when some kind of globals-like technique is the best tool for the job.  And on those occasions the Globals Class pattern is a better tool for the job than globals themselves.