Learning Subversion: the mystery of .svn

If you are googling for “Subversion command line tutorial introduction for beginners”, read this first! This is for all Subversion newbies.

After using PVCS for many years, our office recently started moving to Subversion. Which means that recently I started trying to learn Subversion.

I was pressed for time, and I was looking for something that would get me up and running quickly.

First, I got a copy of the free online Subversion documentation Version Control with Subversion.

Second, I got a copy of Mike Mason’s excellent Pragmatic Version Control Using Subversion (2nd ed.).

Third, I googled the Web looking for the kinds of things that you’d expect: Subversion tutorial introduction beginning beginners commands. And I found some good stuff.

But even after reading many of the online Subversion tutorials, I still could not grok Subversion. Different commands seemed to be doing the same thing, and the tutorials used a lot of terms that were never defined or explained: “versioned”, “unversioned”, “under version control”, and so on.

Gradually, I realized the problem. Many of the online tutorials and introductions try to explain how to use Subversion without explaining how Subversion works. They tell you what commands to issue, and when, but they don’t tell you why you are issuing the command at this particular time, or what the command is doing under the covers.

So I had to dig deeper.

What I found was that there was one particular piece of information missing from most of the tutorials and introductions that I found. If you don’t have that piece, nothing about Subversion makes much sense. With it, all of the other pieces of the puzzle fall into place.

So the purpose of this post is to tell you — the Subversion newbie — about that piece.


How Subversion Works

The basic unit of work for Subversion is a project.

A project is basically a directory.

Technically, a project is a subtree: a directory, including all of its files and subdirectories, and all of those subdirectories’ files and subdirectories, etc. But in order to keep things simple, I will talk as if a project is just a directory.

When you are working on a Subversion project, there are actually two directories that you are working with.

  • There is the repository, which is a directory (controlled by Subversion and running on a server somewhere) that contains the master copy of the project directory.
  • There is your own personal workingCopy, which is a directory (controlled by you) that exists on the file system of your own machine (that is, on the hard drive of your own PC).

But (and this is the piece that was missing) a workingCopy directory is not an ordinary directory.

The use of the expression “working copy” is one of the most confusing things about Subversion tutorials and even the Subversion documentation itself. When you encounter the expression “working copy” you assume that you are dealing with an ordinary filesystem directory that is being used to hold a copy of the files in your project. Not so!

In the context of Subversion, “working copy” is a very specific term of art — a Subversion-specific technical term. That is why in this post I avoid the expression “working copy” and instead use workingCopy.

So what is a Subversion workingCopy directory?

A workingCopy directory is a directory that has a hidden subdirectory called “.svn”.

The hidden .svn directory is what Subversion calls an “administrative directory”.

Note the leading period in “.svn”. On Unix systems, a directory whose name begins with a dot is a “hidden” (or “dotfile”) directory.

On your PC, the project’s top-level workingCopy directory has a hidden .svn subdirectory. And each of the subdirectories of the workingCopy directory (if it has any), and each of their subdirectories (if they have any), and so on, has its own hidden .svn subdirectory.

Having a hidden .svn subdirectory is what makes an ordinary file system directory into a Subversion workingCopy directory, a directory that Subversion can recognize and manage.
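That rule of thumb is easy to express in code. Here is a minimal Python sketch of the check; the function name is_working_copy is my own invention, not part of Subversion, and the sketch assumes the layout described in this post, where every workingCopy directory has its own .svn subdirectory. (Newer Subversion releases, from 1.7 on, keep a single .svn at the top of the workingCopy, so there the check would only succeed at the top level.)

```python
from pathlib import Path


def is_working_copy(directory):
    """Return True if `directory` contains a .svn subdirectory.

    Hypothetical helper: it only looks for the administrative directory
    and does not validate what is inside it.
    """
    return (Path(directory) / ".svn").is_dir()
```

Under these assumptions, is_working_copy("ProjectX") should be true for a checked-out project and false for an ordinary directory of files.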

So, for a project named “ProjectX” the workingCopy directory will be named “ProjectX”. It might look like this:

	ProjectX [DIRECTORY]
		projectx.py
		projectx_constants.py
		.svn [DIRECTORY]

What is in a .svn subdirectory? What does a Subversion administrative directory contain?

The Subversion documentation says this about workingCopy directories:

A Subversion working copy is an ordinary directory tree on your local system, containing a collection of files. You can edit these files however you wish, and if they’re source code files, you can compile your program from them in the usual way. …

A working copy also contains some extra files, created and maintained by Subversion, to help it carry out these commands. In particular, each directory in your working copy contains a subdirectory named .svn, also known as the working copy’s administrative directory. The files in each administrative directory help Subversion recognize which files contain unpublished changes, and which files are out of date with respect to others’ work.

Here’s another clue: a passage from Pragmatic Version Control Using Subversion:

Subversion has a highly efficient network protocol and stores pristine copies of your working files locally, allowing a user to see what changes they’ve made without even contacting the server [where the central repository is stored].

So now we know what a Subversion administrative directory contains.

The .svn admin directory contains pristine (unchanged) copies of files that were downloaded from the repository. (It contains a few other things, too.)

Earlier, I said “When you are working on a Subversion project, there are actually TWO directories that you are working with… the repository and the working copy.” Now I want to change that. It would be more accurate to say that there are really THREE directories that you are working with:

  • the main ProjectX repository on the server
  • the ProjectX workingCopy directory on your PC, which contains editable (and possibly changed) copies of the files in the project …and also …
  • the hidden Subversion administrative directory, which contains pristine (unchanged and uneditable) copies of the files in the main ProjectX repository on the server.

That means that, on your PC, the ProjectX workingCopy directory looks like this.

	ProjectX [DIRECTORY]
		projectx.py
		projectx_constants.py
		.svn [DIRECTORY]
			projectx.py
			projectx_constants.py

Now things start to become clearer…

Subversion introductions and tutorials often say things that are rather cryptic to someone who is trying to learn Subversion. Even HELP questions and FAQs posted on the Web can be mystifying. Now let’s see how some of those things make sense in light of our knowledge of the .svn subdirectory.


Showing file changes

The reason that Subversion can allow “a user to see what changes they’ve made without even contacting the server” is that the Subversion diff works only on the workingCopy directory on your own PC.

When Subversion shows file changes (that is, shows diffs) it is actually showing diffs between

  • your edited files in the workingCopy directory, and
  • the pristine copies of those files that are being held in the .svn subdirectory of the workingCopy directory.
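To make that concrete, here is a small Python sketch of a purely local diff. It is not how Subversion is implemented: the function name local_diff is hypothetical, and it assumes the simplified picture used in this post, where the pristine copy of ProjectX/projectx.py sits at ProjectX/.svn/projectx.py. (As far as I know, real Subversion stores pristine copies under different names inside .svn, so treat the paths here as illustrative only.)

```python
import difflib
from pathlib import Path


def local_diff(working_dir, filename):
    """Diff an edited file against its pristine copy, with no network access.

    Assumes the simplified layout <working_dir>/.svn/<filename> for the
    pristine copy; this layout is an assumption made for illustration.
    """
    working_dir = Path(working_dir)
    pristine = (working_dir / ".svn" / filename).read_text().splitlines(keepends=True)
    edited = (working_dir / filename).read_text().splitlines(keepends=True)
    return list(
        difflib.unified_diff(
            pristine,
            edited,
            fromfile=f"{filename} (pristine)",
            tofile=f"{filename} (working copy)",
        )
    )
```

The point of the sketch is that both inputs to the diff live on your own hard drive, which is why no server is needed.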

“unversioned” files vs. files “under version control”

Suppose that I make a change to one of my files: to ProjectX/projectx_constants.py.

When I make the changes, my editor automatically creates a backup file: ProjectX/projectx_constants.py.bak.

At this point, ProjectX/projectx_constants.py.bak is what is called an “unversioned” file. It exists in the ProjectX directory, but not in the ProjectX/.svn directory, so Subversion knows nothing about it. That makes sense: we don’t want projectx_constants.py.bak to be considered a project file anyway.

But suppose I want to add a new module to the project, called projectx_utils.py. If I simply create the file in the ProjectX folder, it will be an “unversioned” file in just the same way that projectx_constants.py.bak is an unversioned file: it will not exist in the ProjectX/.svn directory, so Subversion knows nothing about it.
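The distinction can be sketched as a toy version of a status check. This Python function is an illustration under the same simplified assumption as before (a pristine copy of each versioned file sits directly inside .svn); the name toy_status is mine. It borrows the '?' and 'M' codes that svn status prints for unversioned and modified files. Unlike the real command, which prints nothing for unchanged files, it reports them with a blank code so that every file shows up in the result.

```python
from pathlib import Path


def toy_status(working_dir):
    """Map each file name to '?' (unversioned), 'M' (modified) or ' ' (unchanged).

    Toy model: a file counts as versioned if a same-named pristine copy
    exists in <working_dir>/.svn. Subdirectories are ignored.
    """
    working_dir = Path(working_dir)
    admin = working_dir / ".svn"
    result = {}
    for path in sorted(working_dir.iterdir()):
        if not path.is_file():
            continue  # skips .svn itself and any other subdirectory
        pristine = admin / path.name
        if not pristine.exists():
            result[path.name] = "?"
        elif pristine.read_bytes() != path.read_bytes():
            result[path.name] = "M"
        else:
            result[path.name] = " "
    return result
```

In this model, an editor backup such as projectx_constants.py.bak and a brand-new module both come out as '?', because neither has a counterpart in .svn.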

So that is why Subversion has an “svn add” command. The command svn add projectx_utils.py adds the file to the project by recording ProjectX/projectx_utils.py in the ProjectX/.svn directory (strictly speaking, it schedules the file to be added at the next commit). At this point, after it has been recorded in the .svn subdirectory, the file is said to be “under version control”.

Note that at this point, although projectx_utils.py has been “added” to the copy of the project in the workingCopy, the main repository still doesn’t know anything about it: projectx_utils.py hasn’t been added to the central repository on the server.

When I “commit” my changes, I send the files from my workingCopy to the main repository. Only after that happens does the new file truly become part of the project by becoming one of the files in the central repository.
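The add-then-commit sequence can be modelled in a few lines of Python. This is a toy, in-memory sketch and not Subversion's implementation: the class name ToyWorkingCopy and its attributes are hypothetical, plain dicts stand in for the repository, the .svn pristine copies, and the editable files, and add() merely marks a name so that the next commit() will send it.

```python
class ToyWorkingCopy:
    """In-memory stand-in for a workingCopy (illustrative assumption only)."""

    def __init__(self, repository):
        self.repository = repository      # dict: name -> content ("the server")
        self.pristine = dict(repository)  # stands in for the copies in .svn
        self.files = dict(repository)     # the editable files
        self.scheduled = set()            # names marked by add()

    def add(self, name):
        """Put an existing local file under version control (local only)."""
        if name not in self.files:
            raise FileNotFoundError(name)
        self.scheduled.add(name)

    def commit(self):
        """Send versioned files to the repository and refresh the pristine copies."""
        for name, content in self.files.items():
            if name in self.pristine or name in self.scheduled:
                self.repository[name] = content
                self.pristine[name] = content
        self.scheduled.clear()
```

In this model a file that was never passed to add(), such as an editor's .bak file, is simply skipped by commit(), and a file that was added only reaches the repository dict when commit() runs.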


Help! I’ve lost my .svn directory and I can’t get up!

Because a Subversion workingCopy directory needs a .svn subdirectory in order to work properly, you can have problems with Subversion if you accidentally delete the .svn subdirectory.


What is a “clean copy”?

In various tutorials, and in the Subversion docs, you will run across the expression “clean copy”. A “clean copy” is a copy of only the source-code files, without the .svn directory.

An introduction to Subversion (which is also a nice introduction to the TortoiseSVN open-source Windows GUI client for Subversion) explains things nicely.

If you look closely in your working copy, you may see an .svn folder in each folder of your working copy. The folders are hidden folders, so depending on the Windows settings you may not see them, but they are there. Those folders contain the information that Subversion uses to link your working copy to the repository.

If ever you need to get a copy of what’s in the repository, but without all the .svn folders (say for example you’re ready to publish it or hand the files over to your client), you can do an “SVN Export” into a new folder to get a “clean” copy of what’s in your repository.

Having the concept of a “clean copy” makes it easier to understand the next question…


Checkout vs. Export

A Frequently Asked Question about Subversion is What’s the difference between a “checkout” and an “export” from the repository?

The CollabNet docs say this:

They are the same except that Export doesn’t include the .svn folders and Checkout does include them. Also note that an export cannot be updated.

When you do a Subversion checkout, every folder and subfolder contains an .svn folder. These .svn folders contain clean copies of all files checked out and .tmp directories that contain temporary files created during checkouts, commits, update and other operations.

An Export will be about half the size of a Checkout due to the absence of the .svn folders that duplicate all content.

Note that the reason an exported folder cannot be updated is that the update command updates the .svn directory of a workingCopy, but an export does not create an .svn directory.

Note also that you can export from either the main repository or from the workingCopy .svn directory. See Subversion docs for export.
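The idea of a “clean copy” can be approximated with the Python standard library. The sketch below is only an analogy for what an export gives you: the function name export_clean_copy is hypothetical, and it copies everything except the .svn folders, including unversioned files, which I believe the real svn export command leaves out.

```python
import shutil


def export_clean_copy(working_dir, destination):
    """Copy a directory tree while leaving out every .svn folder.

    `destination` must not exist yet. This is an illustration of the
    concept, not a replacement for svn export.
    """
    shutil.copytree(
        working_dir,
        destination,
        ignore=shutil.ignore_patterns(".svn"),
    )
```

Because the result has no .svn folders, there is nothing in it for an update to work with, which matches the behavior described above.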


The (import, checkout) usage pattern for getting started with Subversion

Most “getting started with Subversion” tutorials start the same way. Assuming that you have some project files that you want to put into Subversion, you are told to:

  • do an import
  • do a checkout

in that order.

What you are not told is why you start with those two particular actions in that particular order.

But by now, knowing about the hidden .svn administrative directory and what it does, you can probably figure that out.

Import is the opposite of export. It takes a directory of files — a clean copy of the files, if you will — from your hard drive and copies them into the central Subversion repository on the server.

Always the next step is to do a checkout. Basically a checkout copies the project files from the central repository to a workingCopy directory on your PC. If the workingCopy directory does not exist on your PC, it is created.

The workingCopy directory contains everything you need in order to be able to work with Subversion, including an .svn administrative directory. As the CollabNet documentation (quoted earlier) says:

When you do a Subversion checkout, every folder and subfolder contains an .svn folder. These .svn folders contain clean copies of all files checked out and .tmp directories that contain temporary files created during checkouts, commits, update and other operations.

So the second step — the checkout command — is absolutely necessary in order to get started. It creates a workingCopy directory containing the project files. Only after that happens are your files properly “under version control”.
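The import-then-checkout pattern can be mimicked on an ordinary file system. This Python sketch is a deliberately crude model: the names toy_import and toy_checkout are mine, the “repository” is just a plain directory (a real repository stores its history in its own format), and the pristine copies are placed directly inside .svn, following the simplified picture used in this post.

```python
import shutil
from pathlib import Path


def toy_import(clean_dir, repository_dir):
    """Copy a clean tree of files into the toy repository directory.

    `repository_dir` must not exist yet.
    """
    shutil.copytree(clean_dir, repository_dir)


def toy_checkout(repository_dir, working_dir):
    """Create a toy workingCopy: editable files plus pristine copies in .svn.

    `working_dir` must not exist yet; it is created by the copy.
    """
    working_dir = Path(working_dir)
    shutil.copytree(repository_dir, working_dir)           # editable files
    shutil.copytree(repository_dir, working_dir / ".svn")  # pristine copies
```

Note that toy_import leaves the original clean directory untouched. It is only the directory produced by toy_checkout that gains a .svn folder, which is why the checkout step cannot be skipped.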


checkin vs. commit

PVCS (and SourceSafe, and many other version control systems) work on a locking model. “Checking out” a file from the repository means that you get a local working copy of the file, and you lock the file in the repository. At that point, nobody can unlock it except you. Checking out a file gives you exclusive update privileges on it until you check it back in.

“Checking in” a file means that you copy your local working copy of the file back into the repository and you unlock the file in the repository.

It is possible to copy your local working copy of the file into the repository without unlocking the file in the repository. When you do this, you are in a sense “updating” the repository from the working copy.

Because of my familiarity with this kind of version control, I had a certain “mental model” of how a version control system works. And because of that mental model, many of the Subversion tutorials were quite confusing.

One source of confusion is the fact that (as we will see in the next section) the word “updating” in the context of Subversion means exactly the opposite of what it means in the context of PVCS.

One of the Subversion tutorials that I found said that you must checkout your workingCopy from the main repository, because you can’t do a checkin back to the main repository if you haven’t checked it out. This was very confusing to an ex-PVCS user.

First, it suggested that Subversion works like PVCS: that there is a typical round-trip usage pattern consisting of

  • checking out (locking)
  • editing
  • checking in (unlocking)

But Subversion doesn’t work like this, at least not by default.

What the tutorial was trying to say, I think, was that in order to work with Subversion, you must create a workingCopy directory (that is, a directory that contains an .svn administrative subdirectory). And the way to create a workingCopy directory is to run a svn checkout command against the repository on the server.

Second, explaining things this way was confusing because Subversion doesn’t really have a checkin command. It does have a commit command, which some tutorials call a “checkin” command. But that command does not do the same thing as a PVCS checkin.

Ignore the fact that the short form of the commit command is ci (which stood for “checkin” in an earlier incarnation of Subversion). A Subversion “checkin” is the same thing as a “commit”, and has nothing to do with locking. It would really be helpful if all Subversion tutorials would stop using the term “checkin” and replace it with “commit”.

If you are used to working with a VCS that uses the “check out, edit, check in” paradigm, and you come to understand that Subversion’s commit is not the same as your old familiar check in, then your next question will almost certainly be:

Once you checkout a project into a working folder, how do you check it in a la SourceSafe? [Or PVCS, or other lock-based VCSs? -- Steve Ferg]

I know there is “commit” which puts my changes into the repository, but I still have the files checked out under my working folder. What if I am done with a particular file and I don’t want to have it checked out? How do I check it back in?

You can read the answer here.


What does svn update do?

EXECUTIVE SUMMARY: svn update updates the workingCopy, not the repository.

The Subversion docs describe the update command this way:

When working on a project with a team, you’ll want to update your working copy to receive any changes other developers on the project have made since your last update. Use svn update to bring your working copy into sync with the latest revision in the repository:

Basically, what the update command does is to copy the latest project files from the central repository down to the .svn directory in your workingCopy, and to merge any changes into your editable files.

This is something you should do frequently, because you don’t want the files in your workingCopy/.svn directory to get too far out of sync with the files in the central repository. And you don’t want to try to commit files if your workingCopy/.svn is out of sync with the central repository.

That means that as a general rule, you should always run an svn update:

  • just before you start making a new round of changes to your workingCopy, and
  • just before doing a commit.
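Why the second rule matters can be shown with a toy model of two people sharing one repository. Everything here is an assumption made for illustration: the classes ToyRepository and ToyWorkingCopy are hypothetical, the out-of-date check is applied to the whole workingCopy rather than file by file, every local file is treated as versioned, and the model refuses to merge where real Subversion would attempt one.

```python
class ToyRepository:
    """A revision counter plus the latest content of each file."""

    def __init__(self, files=None):
        self.revision = 0
        self.files = dict(files or {})


class ToyWorkingCopy:
    """In-memory workingCopy; `pristine` stands in for the .svn copies."""

    def __init__(self, repo):
        self.repo = repo
        self.base_revision = repo.revision
        self.pristine = dict(repo.files)
        self.files = dict(repo.files)

    def update(self):
        """Pull other people's changes from the repository into this copy."""
        for name, incoming in self.repo.files.items():
            old = self.pristine.get(name)
            locally_modified = self.files.get(name) != old
            if locally_modified and incoming != old:
                # Real Subversion would try to merge here; the toy gives up.
                raise NotImplementedError(f"toy model cannot merge {name}")
            self.pristine[name] = incoming
            if not locally_modified:
                self.files[name] = incoming
        self.base_revision = self.repo.revision

    def commit(self):
        """Send local files to the repository, but only if up to date."""
        if self.base_revision != self.repo.revision:
            raise RuntimeError("out of date: run update() first")
        self.repo.files.update(self.files)
        self.repo.revision += 1
        self.pristine = dict(self.files)
        self.base_revision = self.repo.revision
```

If two ToyWorkingCopy objects share one ToyRepository and the first one commits, the second one's commit() raises until it has called update(), which is the toy equivalent of running svn update just before a commit.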

Now, having mastered the concept of an .svn directory, we can Understand Many Things, even arcana such as why “Serving websites from svn checkout considered harmful”.

So that’s it.

This post contains information written by a Subversion newbie in the hopes that it will be useful to other Subversion newbies. But of course, having been written by a newb, there are all sorts of ways it could be wrong.

If you’re a Subversion expert (and it doesn’t take much to be more expert than I am) and you see something wrong, confused, or misleading here, please leave a comment. I, and future generations of Subversion newbies, will thank you for it.

Thanks to my co-workers Mark Thomas and Jason Herman for reviewing an earlier draft of this post.

11 thoughts on “Learning Subversion: the mystery of .svn”

  1. Saw this on Planet Python. Nice summary. Would have helped me when I was learning svn.

    Small correction: svn update will also attempt to merge the changes from the repo into the working copy tree (not just record the changes in .svn)

    A nice opportunity for a follow up would address differences in how git, bazaar and mercurial admin directories work.

  2. I recently moved into a new company where I have to learn SVN. In my previous company we used Rational Clearcase for version control. Your article on SVN gave a very good introduction and helped me a lot in getting started.

    Thank you very much

    Regards

  3. This article helped me to understand how svn works. I have been using it for years, but I only learned how to use svn, but not how it worked, till today. Thank you.

  4. Thanks for your article, Steve.

    My office is currently using PVCS, but we are about to start a major project where the vendor is using Subversion. If we were on Subversion, it could make life a lot simpler. However, I need more justification than that to recommend migrating all our code to Subversion. Can you tell me what the justification was for your office?

    Many thanks!

    Cheers
    Dave

    • I wasn’t part of the study team, and my memory is hazy, so I can provide only an outline of what I remember about the perception of PVCS in our office. All I can tell you is that the decision to move to Subversion was made by some very competent people, so I trust it.

      The primary justification was a long list of features that we wanted that PVCS just didn’t provide.

      One particular thing that we were looking for was better programmability. After a long period of stagnation, our office started to improve its development practices and was moving toward increased build automation. The final straw came when developers in our office (people with more expertise in build automation than I will ever have) found that they just couldn’t fit PVCS into our new build automation plans. They just couldn’t automate it enough.

      In addition, there was a perception at the time (this was around 2010) that Serena — despite its official statements — was not committed to developing and improving PVCS. As you probably know, PVCS has a long history of changing ownership. There was a perception in our office that PVCS was a dead-end, that it had finally come to rest with a company that wanted to continue to derive income from its current installed base, but had no interest in developing the product any further. Certainly, no one felt that Serena had any interest in developing PVCS into a cutting-edge or state-of-the-art VCS. As I say, this may or may not be true, but that was the general perception in our office at the time.

      So, having decided to leave PVCS, we looked around for the best alternative. We chose Subversion as the best VCS for our particular set of needs. It fit our office’s set of development practices, it was open-source, it had the features we were looking for, and it was stable — it was a mature product, it had a big installed base, it was well-documented, and it was well-supported. (This was before Mercurial, Git, etc. really started to take off, and we didn’t really need a distributed VCS anyway, so they were not contenders.)

      I wish I could give you more details, but I’m working from what is by now a fading memory. So I guess I would recommend googling things like “PVCS vs subversion” and “PVCS subversion comparison”.

  5. It was a nice article Steve.
    I still have some more questions.
    Can some one please help me with those
    Here are the questions

    1.) Does svn commit change the contents of .svn directory also? Because when I edit a file in the working copy and type svn status it shows that the file is modified. But when I commit those changes using the svn commit command, and then type svn status, then it shows nothing. Does it mean that the svn commit command also changes the contents of .svn directory?

    2.) If the svn commit changes the contents of .svn directory, then why do we need the svn update command? If svn commit doesn’t change the contents of the .svn directory, then why doesn’t the svn status command give any output after the svn commit command?

    Thank you.

    • 1.) Yes. The commit command does change the contents of the .svn directory. The working copy of the file is copied to the main SVN repository (that’s the commit) and then back down to your .svn directory.

      2.) Why do we need the update command?

      Remember, you are (or may be) working in a multi-user environment… you are not the only one who may be committing changes to the main SVN repository. Other people may be committing changes also. So you need to keep your .svn directory in sync with changes that other people may have made to the repository.

      So now read again the description of what update does, and note the bolded text.

      When working on a project with a team, you’ll want to update your working copy to receive any changes that other developers on the project have made since your last update.
