Adam @ Heroku
a tornado of razorblades

Changesets, Not Snapshots

February 18, 2008 at 01:37 AM

It seems that Git fever is going around.

An important concept to keep in mind with decentralized revision control is that the entire paradigm is one of changesets, not snapshots. Subversion and friends take a snapshot of your source tree and save it. Branching and merging is thus quite painful, because it's hard to tell the intentions of the code authors when all you have is a before snapshot and an after snapshot.

A changeset is a single commit - mostly the patch (diff) of the code, but also some metadata about the patch, such as the parent node in the commit tree. This is what allows git and friends to seem smarter (in fact, nearly omniscient) when it comes to merging. It isn't about a better merge algorithm; it's about well-formed data.

Developing in decentralized revision control should lead you to think about your commits in a different way. Each commit should be a single, atomic bundle. If you find yourself thinking "Huh, I should commit - it's been a while since my last one" - that's wrong. Your commit will most likely contain several changes, and thus will not be a discrete changeset.

Imagine that each commit were a patch that was going to be emailed to the maintainer of the project, or posted in Trac. If you sent a patch that had two unrelated changes in it, the maintainer would send it back and ask you to separate them into two patches. Think of your commit tree the same way. Each commit should stand on its own merit, isolated from other changes.
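To make the patch analogy concrete, here is a minimal sketch in a throwaway repository. The file names, commit messages, and author identity are all made up for illustration. Two unrelated edits sit in the working tree at once; staging them separately yields two commits, and git format-patch turns each commit into its own mailable patch file.

```shell
#!/bin/sh
set -e

# Throwaway repository; every file name and message here is hypothetical.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Author"
git config user.email "author@example.com"

printf 'helo world\n' > README
printf 'timeout = 30\n' > settings.conf
git add README settings.conf
git commit -q -m "initial import"

# Two unrelated edits are sitting in the working tree at the same time.
printf 'hello world\n' > README
printf 'timeout = 60\n' > settings.conf

# Stage and commit them separately so each commit is one changeset.
git add README
git commit -q -m "fix typo in README"
git add settings.conf
git commit -q -m "raise timeout to 60 seconds"

# Export the last two commits as patches, one file per commit.
patches=$(mktemp -d)
git format-patch -q -2 -o "$patches"
patch_count=$(ls "$patches" | wc -l | tr -d ' ')
echo "patches written: $patch_count"
```

When both edits land in the same file, git add -p lets you stage individual hunks instead of whole files.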

git provides tools to keep your commit history well-formed. git stash is one of my favorites. But the crown jewel is git rebase -i HEAD~10. If you haven't used this yet, try it at your next opportunity. It loads your last 10 commits into a text editor and lets you manipulate them in a freeform fashion.
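As a rough illustration of why git stash helps keep commits focused, the sketch below sets an unfinished edit aside, commits an unrelated fix on its own, and then restores the edit. It runs in a temporary repository, and the file names and messages are assumptions for the example.

```shell
#!/bin/sh
set -e

# Throwaway repository; file names and messages are hypothetical.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Author"
git config user.email "author@example.com"

printf 'first draft\n' > notes.txt
git add notes.txt
git commit -q -m "add notes"

# Start an experiment, then get interrupted by an unrelated fix.
printf 'half-finished idea\n' >> notes.txt
git stash -q

# The working tree is clean again, so the fix can be its own commit.
if git diff --quiet; then clean_after_stash=yes; else clean_after_stash=no; fi
printf 'urgent fix\n' > fix.txt
git add fix.txt
git commit -q -m "apply urgent fix"

# Bring the experiment back on top of the new commit.
git stash pop -q
if git diff --quiet; then restored=no; else restored=yes; fi
echo "clean after stash: $clean_after_stash, experiment restored: $restored"
```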

The "squash" option is particularly nice. If you've ever had a commit log that looks like this:

r50 | adam | amazing feature that works perfectly
r51 | adam | oops, it works now
r52 | adam | oops, one other thing
r53 | adam | ok really this time

...you can use squash to combine these all into a single commit.
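Here is a sketch of that squash in a scratch repository. Normally you would edit the rebase todo list by hand, changing "pick" to "squash" on every line after the first. To keep the example non-interactive, it sets GIT_SEQUENCE_EDITOR to a sed command that makes the same edit, and GIT_EDITOR=true to accept the combined commit message unchanged. The repository contents are invented for illustration.

```shell
#!/bin/sh
set -e

# Scratch repository; the file name and author identity are hypothetical.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Author"
git config user.email "author@example.com"

printf 'base\n' > README
git add README
git commit -q -m "initial commit"

# Recreate the messy history from the log above.
for msg in "amazing feature that works perfectly" \
           "oops, it works now" \
           "oops, one other thing" \
           "ok really this time"; do
  printf '%s\n' "$msg" >> feature.txt
  git add feature.txt
  git commit -q -m "$msg"
done
before=$(git rev-list --count HEAD)

# Stand-in for hand-editing the todo list: keep the first "pick" and
# turn every later "pick" into "squash".
GIT_SEQUENCE_EDITOR="sed -i.bak '2,\$ s/^pick /squash /'" \
GIT_EDITOR=true \
git rebase -q -i HEAD~4

after=$(git rev-list --count HEAD)
echo "commits before: $before, after: $after"
```

The four feature commits collapse into one, while the content of feature.txt stays exactly as the last commit left it.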

You'll understand better how this can be possible when you internalize the fact that commits and pushes are not the same thing. A commit indicates that you have completed a single, atomic change. A push publishes your changes out to others on your team, but is not a commit event in and of itself.
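The commit/push distinction is easy to see with a local bare repository standing in for the team's shared one. This is a sketch under assumptions: the paths, the remote name "origin", and the branch name "main" are choices made for the example. Commits pile up locally, and the shared repository sees nothing until the push.

```shell
#!/bin/sh
set -e

# A bare repository plays the role of the shared team repository.
work=$(mktemp -d)
remote="$work/shared.git"
git init -q --bare "$remote"

git init -q "$work/local"
cd "$work/local"
git config user.name "Example Author"
git config user.email "author@example.com"
git remote add origin "$remote"

# Two atomic commits, both purely local.
printf 'one\n' > a.txt
git add a.txt
git commit -q -m "first atomic change"
printf 'two\n' > b.txt
git add b.txt
git commit -q -m "second atomic change"
local_commits=$(git rev-list --count HEAD)

# Nothing has been published yet: the shared repository has no branches.
refs_before=$(git --git-dir="$remote" for-each-ref | wc -l | tr -d ' ')

# Publishing is a separate step from committing.
git push -q origin HEAD:refs/heads/main
refs_after=$(git --git-dir="$remote" for-each-ref | wc -l | tr -d ' ')
remote_commits=$(git --git-dir="$remote" rev-list --count main)
echo "local: $local_commits, remote refs: $refs_before -> $refs_after"
```

Because nothing is shared until the push, you are free to reorder or squash local commits first.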

I Git It

December 30, 2007 at 12:24 AM

I've been sold on the concept of distributed revision control systems ever since Simon Michael showed me Darcs a year ago. But I've been slow to adopt it for any real use. Part of the reason for this was that I wasn't super-happy with the implementations available, although there are quite a few. Darcs certainly seemed like the best of the bunch, but it never felt quite right to me. Besides being sluggish, it relies on tons of interactive prompting - which is not the sort of UX I look for in a command-line tool.

Then I discovered Git. Younger than most of the other choices, Git seems to have that extra bit of pop that Darcs et al are missing. It doesn't hurt that its author is an opinionated Scandinavian uber-hacker who we all love. (No, not that one.)

I'm especially pleased to see that there seem to be some recent rumblings that Git may be gaining traction in the Rails community. I thought I'd be a minority voice on this, but it seems like everyone else is seeing the light too - and with remarkable synchronization. An idea whose time has come, perhaps.

Tom Moertel has an intriguing description of porting his workflows from Darcs to Git. Although ostensibly about how Git is a preferred tool, along the way it shows off some of the crazy-awesome features of Darcs.

But that's because decentralized SCMs spank the pants off of centralized ones. So why isn't everyone using them already? Because there's a cost: the higher level of brainpower required. To use it at all, certainly, but also to take advantage of the really powerful features of the decentralized model. But since SCM is a major part of a developer's toolchain, the time investment in learning a harder but more powerful tool makes sense. Going from Subversion to Git is like going from pico to vi, or from tcsh to bash, or from flat files to SQL. Getting your head around it may hurt a bit, but the effort will pay off very nicely in the long run.