Adam @ Heroku
a tornado of razorblades

Rebasing is Editing Commits

June 30, 2008 at 05:19 PM

Rebase is one of Git's most alluring and yet most difficult-to-comprehend features. Rebasing is editing commits. When you rebase, you're rewriting history.

This is possible with Git because of the separation between a commit and a push. A commit is a changeset with some attached metadata like the commit message. A push publishes your commits to a remote repository. In Subversion, these steps are inseparable, both part of the commit.

Because of this separation, you can rewrite your commits before you push them. Just as you can edit your source files as many times as you want before you commit the results, so too can you edit your commits as many times as you want before you push the results.

Rebasing comes in two main forms. One is the interactive rebase. git rebase -i HEAD~5 pops you into an editor where you can change the order of the commits, delete entire commits, or squash commits together. You can also edit a commit, which will take you out of the editor and let you work on the commit in your working tree, and then commit with git commit --amend and resume with git rebase --continue.
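
Here's a rough sketch of the "edit" flow in a throwaway repository, in case it helps to see the steps in order. The repository, file names and commit messages are all made up for illustration. GIT_SEQUENCE_EDITOR stands in for the editor you'd normally use to change "pick" to "edit" by hand, and the sed -i call assumes GNU sed.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q . && git symbolic-ref HEAD refs/heads/master
git config user.name "Example" && git config user.email "dev@example.com"

# Three commits, each touching its own file so nothing conflicts.
for n in 1 2 3; do
  echo "content $n" > "file$n.txt"
  git add "file$n.txt"
  git commit -q -m "commit $n"
done

# Mark the first of the last two commits as "edit" instead of "pick".
# (By hand, you would make this change in the editor that pops up.)
GIT_SEQUENCE_EDITOR="sed -i -e '1s/^pick/edit/'" git rebase -i HEAD~2

# The rebase has stopped at "commit 2": change it, then amend.
echo "extra line" >> file2.txt
git add file2.txt
git commit -q --amend -m "commit 2 (edited)"

# Replay the remaining commit on top of the amended one.
# GIT_EDITOR=true accepts any message prompt unchanged.
GIT_EDITOR=true git rebase --continue

edited=$(git log -1 --format=%s HEAD~1)
total=$(git rev-list --count HEAD)
git log --format=%s
```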

The other type of rebase is rebasing your local commits on top of some changes you're pulling from a remote source, or against a local branch. (Hence the term "rebase": you're creating a new base for your patches.) The action takes some new commits from a branch and slips them in underneath yours, at the point the two branches diverge. Commits prior to the divergence point are unaffected.

I use this kind of rebasing instead of git pull. You'll notice that pull almost always creates a merge commit, which is one of these things:

commit c4110e1fb1aa50c4f876716bde07f6a982a1f31c
Merge: 296a0db... cb6050b...
Author: Joe <joe@example.com>
Date:   Tue Jun 24 14:46:41 2008 -0700

    Merge branch 'master' of example.com:repo.git

You might wonder why a merge commit is needed. Subversion doesn't have that, after all. But that's because the merge commit in svn is always implicit. Did you ever find yourself working on an active project, and then when you went to commit, you needed to svn up and got a whole boatload of changes, including some conflicts? Sure you did. And then you had to sit there and perform the merge on your source, finally making a big commit which included both your changes and the merge.

Git encourages discrete changesets, so it makes sense to break apart regular changes (new feature, bugfix, etc) from merges. But on an active project with lots of contributors, there's always merging going on. So you end up with lots of ugly merge commits cluttering up your logs.

Rebasing lets us have our cake and eat it too. Now you can make atomic commits as you're working, regardless of whether you are ready to share those commits with your team. But when you want to pull down the latest work from your team and merge it with your work, you can instead use rebase to reapply your patches on top of theirs. If you're not working on the same areas of the code, then this takes almost no work. Just type git fetch && git rebase origin/master and you're done. (Notice that the output of git log then shows your recent changes at the top, regardless of the timestamps.)
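
To make that concrete, here's a small sketch using a local bare repository as a stand-in for the shared remote. The directory names (seed, origin.git, teammate, mine) and the commits are hypothetical; only the fetch-and-rebase line is the part you'd type in real life.

```shell
set -e
work=$(mktemp -d) && cd "$work"

# Hypothetical setup: a seed repo published as a bare "origin", plus two clones.
git init -q seed
git -C seed symbolic-ref HEAD refs/heads/master
git -C seed -c user.name=Seed -c user.email=seed@example.com \
  commit -q --allow-empty -m "initial"
git clone -q --bare seed origin.git
git clone -q origin.git teammate
git clone -q origin.git mine

# A teammate commits and pushes first.
cd teammate
git config user.name "Teammate" && git config user.email "teammate@example.com"
echo "theirs" > theirs.txt && git add theirs.txt
git commit -q -m "teammate change"
git push -q origin master

# Meanwhile, I have an unpushed local commit in a different file.
cd ../mine
git config user.name "Me" && git config user.email "me@example.com"
echo "mine" > mine.txt && git add mine.txt
git commit -q -m "my change"

# Instead of git pull: fetch, then replay my commit on top of theirs.
git fetch -q && git rebase -q origin/master

merges=$(git rev-list --merges --count HEAD)   # merge commits in my history
top=$(git log -1 --format=%s)                  # newest commit subject
git log --format=%s
```

The history stays linear: no merge commit is created, and my commit sits on top of the teammate's.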

Occasionally there are merge conflicts during the rebase, and this will drop you out into a shell with some rather intimidating messages. Don't worry though, all you need to do is take a look at the conflicted files, choose the part that you want (just like resolving a regular merge conflict), and then git add them. When you're done, run git rebase --continue. It does this merge step separately for each commit, so if there are a lot of commits in the difference between you and the remote source, this could get time-consuming. In that case you may want to git rebase --abort and then run a regular git merge or git pull. Next time, resolve to rebase more frequently, to make the merge job less of a headache.
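
The conflict dance looks roughly like this. It's a sketch in a scratch repository with a deliberately conflicting file (notes.txt and the branch name topic are invented). The "resolution" here just overwrites the file with the version I want, where in practice you'd edit out the conflict markers by hand.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q . && git symbolic-ref HEAD refs/heads/master
git config user.name "Example" && git config user.email "dev@example.com"

echo "original" > notes.txt && git add notes.txt
git commit -q -m "base"

# My work, on a topic branch.
git checkout -q -b topic
echo "my version" > notes.txt
git commit -q -am "my edit"

# Someone else's work lands on master, touching the same line.
git checkout -q master
echo "their version" > notes.txt
git commit -q -am "their edit"

# Rebasing my branch now stops with a conflict (nonzero exit status).
git checkout -q topic
git rebase master || echo "rebase stopped on a conflict"

# Resolve the conflict, mark the file as resolved, and carry on.
echo "my version" > notes.txt
git add notes.txt
# Some Git versions open an editor for the commit message here;
# GIT_EDITOR=true accepts the message unchanged.
GIT_EDITOR=true git rebase --continue

# (To bail out instead of resolving, you would run: git rebase --abort)
result=$(cat notes.txt)
total=$(git rev-list --count HEAD)
```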

Since rebasing is editing of commits, it doesn't make much sense to rebase things that have already been pushed. You can do it, but as soon as you go to merge with another repo that had the unedited commit history, you'll bump into weirdness (and probably invalidate your whole reason for rebasing, which was to clean up the history). So as a general rule, I recommend never rebasing things that have already been pushed.

If you push something and then realize five seconds later that you shouldn't have, it is possible to rebase your local branch and then git push --force, which will obliterate the remote repo's history. This won't help if someone else has already pulled the commits, since the next time they push, the commits will come back. So only use it when you're certain that no one else has pulled.

Tags: git

Git Submodule

June 25, 2008 at 12:09 AM

Git submodules are pretty cool, except for kind of sucking. Things I don't recommend doing if you value your sanity:

  • Switching a submodule from one repository to another (i.e., editing .gitmodules and changing the repo URL)
  • Switching a directory from a submodule to regular content
  • Switching a directory with regular content to a submodule (though this might help you)

It's a shame, because submodules are pretty handy. But you'll probably end up wanting to do one of these things during the lifetime of your project, and then you're screwed.

Tags: git

More Git Techniques

May 15, 2008 at 05:28 PM

In the spirit of Graeme Mathieson's git techniques, here are a few of my favorites, with their svn equivalents for reference.

Restore a file to repository version
  svn: rm file; svn up file
  git: rm file; git checkout file

See what commits would be pulled on an update
  svn: svn stat -u
  git: git fetch; git log HEAD..origin/master

See what code changes would be pulled on an update
  svn: svn diff -rHEAD
  git: git fetch; git diff HEAD..origin/master

Grab one commit without any of the commits around it:
  svn: svn diff -rN1:N2 > my.patch; scp my.patch other.server; ssh other.server "patch < my.patch"
  git: git fetch; git cherry-pick [commit-hash]

Revert a commit
  svn: svn merge -rN2:N1; svn commit -m "reverted commit N2"
  git: git revert [commit-hash]

Set some changes aside to work on something else
  svn: cd ..; mv myproj myproj_stash; svn co svn+ssh://server/myproj
  git: git stash

Discard local changes
  svn: svn revert -R .
  git: git reset --hard HEAD

Edit recent commits
  svn: echo "Oops."
  git: git rebase -i HEAD~5

Tags: git

Loose Whitespace Annoys Me

April 04, 2008 at 03:46 PM

It tickles me to death that Git highlights loose whitespace in red.

I may be showing my obsessive-compulsive side here, but one thing I don't like about TextMate is that it indents blank lines. (Other editors, such as vim and emacs, remove the superfluous whitespace when you press enter twice on an indented block. That is, they never save a file which has a blank line containing whitespace; blank lines are always just a newline and nothing else.)

I'd be ok with it if it was consistent, but it isn't. Which is why you end up with blank lines with whitespace that doesn't match nearby indents. Hardly a showstopper, but it does mean that I end up wasting time chasing whitespace around when I should be coding. (Britt Selvitelle agrees, and offers some tricks to work around TextMate's inherent whitespace sloppiness.) Maybe now that Git is all the rage, the TextMate author will be encouraged to change its handling of whitespace.
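
If you'd rather catch this before committing than spot it in red afterwards, git diff --check reports whitespace errors in your uncommitted changes. A quick sketch in a scratch repository (the file name and contents are made up):

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q .
git config user.name "Example" && git config user.email "dev@example.com"

printf 'begin\n  one\nend\n' > demo.txt
git add demo.txt && git commit -q -m "clean file"

# Add a "blank" line that actually holds two spaces,
# the way an auto-indenting editor might leave it.
printf 'begin\n  one\n  \n  two\nend\n' > demo.txt

# --check lists whitespace problems and exits nonzero when it finds any.
if git diff --check; then
  status="clean"
else
  status="whitespace-errors"
fi
echo "$status"
```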

Update: You only get colorized diffs if you have this in your $HOME/.gitconfig:

[color]
    diff=auto
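
You can also write the same setting from the command line instead of editing the file. For your real config that would be git config --global color.diff auto. The sketch below points git config at a scratch file (via --file) so it doesn't touch anything in your home directory.

```shell
# Scratch file standing in for $HOME/.gitconfig (illustration only).
cfg=$(mktemp)

# Write the color.diff setting into it.
git config --file "$cfg" color.diff auto

# Show the resulting file, then read the value back.
cat "$cfg"
value=$(git config --file "$cfg" color.diff)
echo "$value"
```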

Tags: git, ux

git-wiki

March 09, 2008 at 02:28 PM

git-wiki is a wiki written in less than 200 lines of code using Sinatra as the web framework and Git as the database. Quite clever. Check out how they store the CSS at the end of the file using __END__ - a Ruby trick I was previously unaware of.

Tags: git, ruby

Changesets, Not Snapshots

February 18, 2008 at 01:37 AM

It seems that Git fever is going around.

An important concept to keep in mind with decentralized revision control is that the entire paradigm is one of changesets, not snapshots. Subversion and friends take a snapshot of your source tree and save it. Branching and merging is thus quite painful, because it's hard to tell the intentions of the code authors when all you have is a before snapshot and an after snapshot.

A changeset is a single commit - mostly the patch (diff) of the code, but also some metadata about the patch, such as the parent node in the commit tree. This is what allows git and friends to seem smarter (in fact, nearly omniscient) when it comes to merging. It isn't about a better merge algorithm; it's about well-formed data.

Developing in decentralized revision control should lead you to think about your commits in a different way. Each commit should be a single, atomic bundle. If you find yourself thinking "Huh, I should commit - it's been a while since my last one" - that's wrong. Your commit will most likely contain several changes, and thus not be a discrete changeset.

Imagine that each commit were a patch that was going to be emailed to the maintainer of the project, or posted in Trac. If you sent a patch that had two unrelated changes in it, the maintainer would send it back and ask you to separate them into two patches. Think of your commit tree the same way. Each commit should stand on its own merit, isolated from other changes.

Git provides tools to keep your commit history well-formed. git stash is one of my favorites. But the crown jewel is git rebase -i HEAD~10. If you haven't used this yet, try it at your next opportunity. It loads your last 10 commits into a text editor and lets you manipulate them in a freeform fashion.

The "squash" option is particularly nice. If you've ever had a commit log that looks like this:

r50 | adam | amazing feature that works perfectly
r51 | adam | oops, it works now
r52 | adam | oops, one other thing
r53 | adam | ok really this time

...you can use squash to combine these all into a single commit.
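
Here's a sketch of that squash in a scratch repository, reusing the four messages above. Normally you'd change "pick" to "squash" by hand in the editor. Here GIT_SEQUENCE_EDITOR does it with sed so the example runs unattended (the sed -i call assumes GNU sed), and GIT_EDITOR=true accepts the combined commit message as-is. The file names and the "initial import" commit are invented.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q . && git symbolic-ref HEAD refs/heads/master
git config user.name "adam" && git config user.email "adam@example.com"

echo "base" > README && git add README
git commit -q -m "initial import"

# Recreate the embarrassing commit log.
for msg in "amazing feature that works perfectly" "oops, it works now" \
           "oops, one other thing" "ok really this time"; do
  echo "$msg" >> feature.txt
  git add feature.txt
  git commit -q -m "$msg"
done

# Keep "pick" on the first todo line; turn the other three into "squash".
GIT_SEQUENCE_EDITOR="sed -i -e '2,\$s/^pick/squash/'" \
  GIT_EDITOR=true git rebase -i HEAD~4

count=$(git rev-list --count HEAD)   # commits left on the branch
lines=$(wc -l < feature.txt)         # all four changes are still present
git log --format=%s
```

The four commits collapse into one, but the contents of feature.txt are the same as before.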

You'll understand better how this can be possible when you internalize the fact that commits and pushes are not the same thing. A commit indicates that you have completed a single, atomic change. A push publishes your changes out to others on your team, but is not a commit event in and of itself.

Forking Rails

January 05, 2008 at 03:51 PM

For a while, everything that we did in Heroku to extend the functionality of Rails, or interoperate with it, has been done through extension mechanisms. Monkeypatching via plugins, use of the somewhat obscure Mongrel GemPlugin, and tweaking of the user's Rails app files directly. (We try to avoid that last one whenever possible. Early on we did a lot of it, but more recently we've managed to avoid it almost entirely, much to my relief.)

All of this is quite a testament to the extensibility of Rails, Mongrel, Nginx, etc. I think it's safe to say that we're bending these tools in ways that go well outside the common use cases. But even the most supple reed can only bend so far. Certain areas (for example, script/generate) can't be monkey patched through standard mechanisms. And so earlier this week I decided it was time to fork Rails.

Since I'm now a fan of Git, this turned out to be a good way to maintain our fork. Steve or Pablo will tell you how. Maintaining a parallel branch has, so far, proven to be quite easy - even fun.

Packaging up the modified version for use is just a matter of running "rake package" in each module's subdirectory (e.g. activerecord, activesupport, railties). The resulting gem is dropped into pkg/ under each module. The gem can then be copied to each user's app server as it boots and installed with gem install, which overwrites the standard gem if it already exists. (I'm still trying to figure out a way to run the package script without building the documentation, which takes ages. I ended up commenting out the body of the generate_rails_framework_doc task as a stopgap.)

One thing I still haven't decided on is how to maintain multiple versions. I wish that there was a syntax for environment.rb that looked something like:

RAILS_GEM_VERSION = '2.0.*'

Personally, I rarely care which minor rev I'm running - the latest version available on whatever box the app is running on is usually just fine. For Heroku apps, they should definitely use whatever the latest minor rev is. But note that setting one catch-all with an environment variable isn't adequate, because each app should be configured to use a particular major rev, and we need to respect that. Hopefully I'll come up with something better on this eventually.

Tags: git, rails

I Git It

December 30, 2007 at 12:24 AM

I've been sold on the concept of distributed revision control systems ever since Simon Michael showed me Darcs a year ago. But I've been slow to adopt it for any real use. Part of the reason for this was that I wasn't super-happy with the implementations available, although there are quite a few. Darcs certainly seemed like the best of the bunch, but it never felt quite right to me. Besides being sluggish, it relies on tons of interactive prompting - which is not the sort of UX I look for in a command-line tool.

Then I discovered Git. Younger than most of the other choices, Git seems to have that extra bit of pop that Darcs et al are missing. It doesn't hurt that its author is an opinionated Scandinavian uber-hacker who we all love. (No, not that one.)

I'm especially pleased to see that there seem to be some recent rumblings that Git may be gaining traction in the Rails community. I thought I'd be a minority voice on this, but it seems like everyone else is seeing the light too - and with remarkable synchronization. An idea whose time has come, perhaps.

Tom Moertel has an intriguing description of porting his workflows from Darcs to Git. Although ostensibly about how Git is a preferred tool, along the way it shows off some of the crazy-awesome features of Darcs.

But that's because decentralized SCMs spank the pants off of centralized ones. So why isn't everyone using them already? Because there's a cost: the higher level of brainpower required. To use it at all, certainly, but also to take advantage of the really powerful features of the decentralized model. But since SCM is a major part of a developer's toolchain, the time investment in learning a harder but more powerful tool makes sense. Going from Subversion to Git is like going from pico to vi, or from tcsh to bash, or from flat files to SQL. Getting your head around it may hurt a bit, but the effort will pay off very nicely in the long run.