Wednesday, March 15, 2006

Hard links!

I stumbled across the Unix-Haters Handbook again, for the umpteenth time. When I was in high school, I checked it out of the public library. It's a long rant, but it's cute. It has many stories of ancient relics, such as the LispM, and of early Unix frustration... (One can also peruse some archives of the Unix-Haters mailing list.)

In the last chapter, they bring up one of my favorite things: hard links. Most of you probably don't care, but that hasn't stopped me yet. Hard links are absolutely wonderful creations. Most Unix-like filesystems behave very similarly here. Information about a file (such as its owner, its permissions, and where its data is located on disk) is stored in a central place, usually called an inode. The filename is not part of this information. Instead, the filesystem views directories as lists of names and associated files (i.e. pointers to these blocks of information). This allows for a little legerdemain.

This means that the name and the location in the directory hierarchy are not inherent properties of a file. It is possible for a file on disk to be "hooked into" the directory hierarchy nowhere at all (this shouldn't happen) or at more than one place. Each of those hooks is a hard link, and a file with more than one is what people usually mean by a "hard linked" file. You then have the exact same file at two (or more) different locations in the filesystem, possibly in two different directories, possibly with different names. This is not just an exact copy--it's the exact same file, and it's only taking up space for one.
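Here is a minimal sketch of that claim in Python, assuming a Unix-like filesystem that supports hard links. The file names and the same_file helper are made up for illustration. It checks that both names point at the same block of per-file information (the inode), that a write through one name shows up through the other, and that removing one name leaves the data reachable through the other.

```python
import os
import tempfile


def same_file(path_a, path_b):
    """Return True if both names refer to the same inode on the same device."""
    a, b = os.stat(path_a), os.stat(path_b)
    return (a.st_dev, a.st_ino) == (b.st_dev, b.st_ino)


with tempfile.TemporaryDirectory() as tmp:
    original = os.path.join(tmp, "filename")
    link = os.path.join(tmp, "linkname")

    with open(original, "w") as f:
        f.write("hello\n")

    # Same effect as running: ln filename linkname
    os.link(original, link)

    is_same = same_file(original, link)
    # st_nlink counts how many directory entries point at this inode.
    link_count = os.stat(original).st_nlink

    # Append through one name, then read through the other.
    with open(link, "a") as f:
        f.write("world\n")
    with open(original) as f:
        seen_via_original = f.read()

    # Removing one name only drops that directory entry; the data stays.
    os.remove(original)
    with open(link) as f:
        survives = f.read()
```

The standard library also has os.path.samefile, which does the same device-and-inode comparison; the helper is spelled out here only to show what is being compared.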

Why is this useful, you ask? That's a very good question. Hard links do seem arcane at first glance. One use is to get a certain amount of safety. Say you have an important file. Hard link it to a safe place. It can still get clobbered by anything that writes to the file in place--it's the same file, after all--but if your usual editor breaks hard links (i.e. when it saves a hard linked file, it replaces that name with a fresh copy instead of writing into the shared file), you will be safe from accidentally messing it up. You have a known good copy in the "safe place."
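To make "breaking" a hard link concrete, here is a rough sketch of one way a save can do it: write the new contents to a temporary file in the same directory, then rename it over the original name. This is an illustration of the idea, not a description of how any particular editor implements it, and save_breaking_link and the file names are hypothetical.

```python
import os
import tempfile


def save_breaking_link(path, new_text):
    """Write new_text to a fresh file, then rename it over path.

    After the rename, path refers to a new inode, so any other hard
    links to the old inode keep the old contents. Note that this
    simple version does not carry over the old file's permissions.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        f.write(new_text)
    os.replace(tmp_path, path)


with tempfile.TemporaryDirectory() as tmp:
    working = os.path.join(tmp, "notes.txt")
    safe = os.path.join(tmp, "notes.safe")

    with open(working, "w") as f:
        f.write("known good\n")
    os.link(working, safe)  # stash a second name in the "safe place"

    save_breaking_link(working, "edited\n")

    with open(working) as f:
        working_text = f.read()
    with open(safe) as f:
        safe_text = f.read()
    still_linked = os.path.samefile(working, safe)
```

Had the save opened notes.txt and written into it in place, both names would show the edited text, which is exactly the case the safe copy does not protect against.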

An even better use is to create branches cheaply. When I program something (I've been known to do this ;)), and I want to work on some feature or another, I can hardlink the entire directory structure over. This is quicker than copying, and it uses much less disk space. Then my editor can break the links for any files I edit. And I now have two directory structures: one with the unmodified source and one with the modified source. And they get to share the disk space for all the unedited files! Yes, disk is cheap, but my laptop hard disk isn't that big. (There are even certain revision control systems that can work within this setting. Even cooler!)

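To create a hard link, just run ln filename linkname. Both names must be on the same filesystem. After this you'll have a file called linkname which is identical in every respect to filename, because it is the same file. If you want to do this on a whole directory structure, you can run cp -al directory/ newdirectory/ (with GNU cp, -a preserves the structure and attributes, and -l makes hard links instead of copying file contents). Now you'll have a new directory structure newdirectory/ with exactly the same files as directory/. The directories themselves are new; only the files inside them are shared.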
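For the curious, the tree-wide version can be approximated in Python too. This is a sketch under assumptions, not a drop-in replacement for cp -al: it leans on shutil.copytree with os.link as the copy function, it does not handle symlinks or special files the way cp -a does, and hardlink_tree and the sample paths are made up for the example.

```python
import os
import shutil
import tempfile


def hardlink_tree(src, dst):
    """Recreate the directories of src under dst and hard link every file."""
    # copytree creates real new directories; each regular file is passed
    # to os.link(src_file, dst_file) instead of being copied.
    shutil.copytree(src, dst, copy_function=os.link)


with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "directory")
    dst = os.path.join(tmp, "newdirectory")
    files = ["a.txt", os.path.join("sub", "b.txt")]

    # Build a tiny source tree to link over.
    os.makedirs(os.path.join(src, "sub"))
    for rel in files:
        with open(os.path.join(src, rel), "w") as f:
            f.write(rel + "\n")

    hardlink_tree(src, dst)

    # Every file should be the same inode under both trees...
    files_shared = [
        os.path.samefile(os.path.join(src, rel), os.path.join(dst, rel))
        for rel in files
    ]
    # ...while the directories themselves are separate.
    dirs_shared = os.path.samefile(src, dst)
```

Because the directories are separate, adding or deleting a file in one tree does not touch the other; only in-place writes to a shared file show up in both.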

Then you may want to teach your editor to break hard links when it saves. For Vim, you can do this by putting set backupcopy=auto,breakhardlink in your .vimrc file.