Adam @ Heroku
a tornado of razorblades

Dude, That Is So Fringe

July 22, 2008 at 01:00 PM

Rubyfringe was great. Off-the-beaten-path topics, small size, tons of style, and countless small touches made it stand apart from the typical tech conference. (They included a Rubyfringe-branded condom in the swag, for gawdsakes.)

Quick summaries of some of the talks follow. Warning, these probably reflect my own disposition as much as that of each speaker.

  • Dan Grigsby - Don't work for the man, be a programmer/entrepreneur instead. Treat each venture as an experiment and don't be shy about terminating the ones that don't work. You'll strike out a lot but eventually hit a home run.
  • Yehuda Katz - Living on the edge is fun, but also dangerous. Finding a balance between the cutting edge and the bleeding edge is the trick.
  • Luke Francl - There are methods for testing other than code-oriented ones like unit testing. RCov doesn't mean much. Use QA, usability tests, and code reviews for well-rounded coverage.
  • Obie Fernandez - To be successful at consulting, watch the balance of power in the relationship with your clients. Most Rails freelancers are charging too little - he recommends no less than $100/hr.
  • Matt Todd - Don't be afraid to dive in with both feet. Make mistakes, learn from them. Pick good problems to solve.
  • Jeremy McAnally - Frameworks are getting fat. Frameworks should be specific. Don't use Rails (or any other framework) outside of its domain.
  • Hampton Catlin - Javascript is a good general purpose language, but terrible in the browser. Add-on libraries like Prototype and jQuery are just band-aids on this problem. The client-side language should be tightly coupled to the DOM, like CSS. He proposes Jabl.
  • Giles Bowkett - Fuck the man, fuck the mainstream. Programming is a tool to do your art. His art is music, his tool is Ruby, and the result is Archaeopteryx. (Giles stole the show with this talk. He ran like twice as long as his allotted time but we were all so into it that no one minded.)
  • Damien Katz - If you want to become the guy that gets paid to build cool things, take a risk: start by making something cool without any plan for how to get paid for it. (He did that, and now IBM pays him to work on CouchDB and contribute it to the Apache Foundation.) Also, Erlang is sweet.
  • Reginald Braithwaite - It's too bad Ruby isn't more like Lisp or Haskell.
  • Tom Preston-Werner - The scientific method rules, use it for coding (and life). Ruby 1.8 has weird memory leak bugs. Github now has a git-powered pastie site, gist.github.com. Also, Erlang is sweet.
  • Blake Mizerany - Sinatra is a framework for fast, small web services. Routers are unnecessary obfuscation; treat a web resource url like you would a function name, and address it directly.

The overarching theme of the conference seemed to be that Ruby's steady march into the mainstream means it's losing its luster for us, the early adopters. Ruby was once fringe, but now it's not. We're now all in search of the fringes of Ruby, and of software tech in general.

You'll note the use of "fringe" as an adjective. A unique culture seemed to emerge from the attendees over the weekend, and using "fringe" this way - as in, "Dude, that is so fringe" - was one trait of that culture.

It's a great term. The fringe refers to all the places where the weird, interesting, chaotic experimentation goes on. This is the spawning ground for tomorrow's new hotness, but the fringe never looks like much when looking at it from the mainstream. Why do I need a new database paradigm? SQL serves just fine, thank you. What about a new web framework, a new object relational mapper, a new transport protocol, a new language? Rails, ActiveRecord, HTTP, and Ruby also serve just fine. CouchDB, Sinatra, DataMapper, XMPP, and Erlang are on the fringe, along with countless other even less well-known projects. These things are not just weird in how they work, they're weird in that they solve problems that most people don't even know they have. That's the fringe.

"Fringe" is more descriptive than the more commonly used "cutting edge." Cutting edge implies a one-dimensional graph, as if tech is on a single well-charted path toward an ultimate destination. Put that way, who wouldn't want to be on the cutting edge, meaning you're further down that inevitable path?

But that's not how it is. The state of tech, charted over time, is an N-dimensional graph. The fringe is the ragged edges of that graph, the weird bits hanging off the edge. Weirdos with weird visions doing weird stuff that few besides them understand. 99% of this weird stuff never turns into anything. But some tiny fraction turns out to be a new direction for tech, the next big thing, the new hotness, a revolution. In this, it has more in common with biological evolution than with the design and planning we associate with the works of man.

Early adopters crave being in the fringe. We love the chaos, the freedom for wild experimentation, the cognitive challenge of trying to predict which bits of this massively heterogeneous mess may turn into something world-changing. It's not the best place to be from a practical standpoint: proven tech from the mainstream is what you want for getting "real" work done. But there's a satisfaction you can get in the fringe that can't be found anywhere else.

rush 0.4

July 16, 2008 at 11:55 AM

rush 0.4 released. Some changes:

  • Rush::Box#processes returns a ProcessSet, for syntax like this:

processes.filter(:cmdline => /mongrel_rails/).kill

  • Daemonize shell commands: bash 'some_daemon', :background => true

  • Pass args to rush on the command line to execute a one-off, like this:

$ rush 'processes.filter(:command => "ruby")'

Details and rdocs.

Tags: rush

Owning Up

July 14, 2008 at 02:35 AM

The healthcare industry is starting to see benefits from being honest when it makes mistakes. Denying that doctors and hospitals ever screw up has been the historical approach to avoiding malpractice suits; yet by being more honest, hospitals are seeing a decrease in such lawsuits.

If doctors can do it, why not software authors? Own up to your mistakes. People will trust you more in the long run.

The End of Bugs?

July 06, 2008 at 11:55 PM

Although I've been a believer in TDD/BDD for quite a while, rush was the first time I started a project and said, ok, THIS time I'm going to get really serious about it. My very first commit was a bunch of empty specs, and since then, I don't think I've committed any new feature or even a bugfix without an accompanying spec.

A few weeks into the project, I had made really substantial progress and had quite a lot of functionality. One day, while using it to access a remote machine to do some file operations, something very surprising happened: I typed a certain command and it didn't behave the way I expected it to.

I figured I must have a typo or something, but double-checking it, I realized that, no: my command was correct, it was the program which was behaving incorrectly. I had found, it seemed, a bug.

I felt a sudden sense of disorientation and panic. The program wasn't behaving as expected? What should I do? Log things? Run a debugger? Just squint at the code for a while and see if I could spot the problem? These methods seemed so...crude.

And yet, I realized, these techniques are the very ones I've been using my whole career, that I use every single day to get work done. I don't give them a second thought. Yet, after two weeks of doing pure BDD, the idea of spending time dealing with a bug seemed foreign and painful.

I ended up tracking down the problem and discovered that, despite my nearly 100% code coverage (as reported by RCov), I had a fork in a single-line conditional that was not speced. Upon realizing this, another feeling washed over me: a sense of mistrust in the project's codebase, because a fragment of one line of code did not have spec coverage. I wrote the spec, confirmed that it failed / exposed the bug, and then fixed the bug itself ten seconds later. Relief flooded over me: things were right with the world again.
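To make that gap concrete, here is a minimal sketch of a single-line conditional and the two specs it needs. The display_name helper is hypothetical (it is not rush code), and Minitest's spec syntax is an assumption chosen because it ships with Ruby. A line-based coverage tool marks the conditional as covered as soon as either fork runs, so only the second spec shows that the other fork behaves.

```ruby
require "minitest/autorun"

# Hypothetical helper, not taken from rush: one line, two forks.
def display_name(path)
  path.end_with?("/") ? "#{path} (dir)" : path
end

describe "display_name" do
  it "marks directories" do
    assert_equal "tmp/ (dir)", display_name("tmp/")
  end

  # Drop this spec and the line above still counts as covered,
  # even though the plain-file fork is never exercised.
  it "leaves plain files alone" do
    assert_equal "notes.txt", display_name("notes.txt")
  end
end
```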

Nearly all developers, myself included, spend most of our time in that state: not quite trusting that the code all works. I only had the chance to realize how stressful and unpleasant this state is because I lived outside it for a little while.

This experience also told me something else: after several weeks of intense development, this was the first time I had encountered a bug. I feel absolutely certain that that's never happened to me before.

You could argue that rush was not then (and is still not now) a very big or complicated program, but I don't think that's significant. First, by this time I was already using it to control a remote server (i.e., it was communicating commands over the network via rushd); that's the sort of thing that is highly prone to bugs. But more importantly, even very short and simple programs often have bugs lurking in them.

My whole life I've assumed that bugs are a given in software, and that trying to eliminate them completely is a waste of time. And I think I was correct, for the traditional, non-BDD approach. ("Code-driven development," I guess?)

But it may be that a very disciplined approach to BDD means the opportunity to have truly bug-free software. Just imagine what it would be like to have all the time you spend debugging code go into writing it instead. Granted, a lot of that goes into writing specs; but for me, writing specs is far more fun than debugging. It's programming, not sleuthing.

Bug-free software is a very bold claim, though it does require a somewhat narrow definition of the word "bug." A bug is a situation where the code has been specified to behave one way, and instead it behaves another. It does not include things that users or developers may like the software to do, but that it is not specified to do currently. It also doesn't include external dependencies (libraries, web services...) behaving in an unexpected manner - good software should try to recover gracefully from failures in external services, but doing so is a type of feature.

Look back at that definition of bug again: "...the code has been specified to behave one way..." BDD is the very act of writing executable specifications. In non-BDD, there is no specification; so by definition, software written that way can be said to either have an infinite number of bugs, or perhaps just no functionality that you can rely on.

Tags: bdd

Read-Only Source Trees

July 02, 2008 at 03:06 PM

Cloud computing is on everyone's minds, because it offers the promise of infinite horizontal scalability. But to achieve this, we have to change how we build applications.

One such change is how we use the filesystem. The filesystem is unix's database. "Everything is a file" has served us well for decades, and that concept will continue to be critical at the systems layer. But at the application layer, it's time to stop treating the filesystem as a catch-all dumping ground, and start treating the data we store there in a more structured way.

An app's main use of the filesystem is sourcefiles. What qualifies as a sourcefile? Your code, sure - Ruby, ERB, HTML, Javascript, CSS, specs/tests, rake tasks. But also, small static assets that are part of the application's interface, like public/images/top_left_gradient.png and public/robots.txt. If you check it into revision control, then it is probably a sourcefile.

Other than sourcefiles, what do we stick on the filesystem? PIDs and logfiles come to mind. Anything that is in tmp or log. This stuff is not source, which is probably why it's in your .gitignore. In my opinion it should not be in your application's directory structure at all.

How about user-uploaded assets, like profile pictures? attachment_fu offers a filesystem backend, which shoves files into your public/ dir. But these are not source - they're application data. They have more in common with the contents of the database: data specific to a particular installation of the app. Putting this data into your source tree is confusing.

More significantly, it greatly complicates the problem of scaling.

The correct solution, in my opinion, is to forbid write access to the source tree by the web app. Temporary files can be offered through Ruby's Tempfile interface, with the understanding that files thus created are not accessible beyond the lifetime of the request being served.
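A minimal sketch of request-scoped scratch space, assuming the block form of Tempfile.create is an acceptable stand-in for whatever temp-file interface a platform would actually offer. The count_lines method and its payload are hypothetical; the point is only that the file lives outside the source tree and is gone once the block exits.

```ruby
require "tempfile"

# Hypothetical request handler: counts the lines in an uploaded payload using
# scratch space that exists only while the block runs.
def count_lines(payload)
  scratch_path = nil
  lines = Tempfile.create("upload") do |file|
    scratch_path = file.path
    file.write(payload)
    file.rewind
    file.each_line.count
  end
  # Tempfile.create removes the file when the block exits.
  [lines, File.exist?(scratch_path)]
end

lines, still_there = count_lines("a\nb\nc\n")
puts "lines=#{lines} still_there=#{still_there}"
# prints: lines=3 still_there=false
```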

Logs are a whole other challenge. I'm not a big fan of logfiles; there are better solutions to the logging problem, which I'll write about some other time. In the meantime, logs should go outside the code tree, in some sort of /var-style location which can be cycled or thrown away as needed. This location could be write-only for the app; it pushes things in, but it can't read them back or otherwise access them once written. A one-way channel, à la syslog.
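One way to approximate that one-way channel inside the app, sketched under assumptions: APP_LOG_DIR is a made-up environment variable standing in for the /var-style location, and open_log is a hypothetical helper. The handle is opened write-only and append-only, so the app can push lines in but cannot read them back through it.

```ruby
require "tmpdir"

# Assumption: APP_LOG_DIR points at a /var-style directory outside the source
# tree; a throwaway temp directory stands in for it so the sketch runs anywhere.
LOG_DIR = ENV.fetch("APP_LOG_DIR") { Dir.mktmpdir("applog") }

# Open the log append-only and write-only: the app can push lines in, but this
# handle cannot read them back.
def open_log(dir)
  io = File.open(File.join(dir, "app.log"), File::WRONLY | File::APPEND | File::CREAT)
  io.sync = true
  io
end

log = open_log(LOG_DIR)
log.puts("#{Time.now.utc} request served")

begin
  log.read
rescue IOError => e
  puts "read refused: #{e.message}"
end
```

This only restricts the handle the app holds. Real enforcement would have to come from filesystem permissions or from handing the lines to a separate logging daemon.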

As for attachments, asset stores are the correct solution. attachment_fu's :storage => :s3 backend, for example. Storing in the database is reasonable, though I've always found a lot of frustration in trying to store large binary data in the database. Apps on Heroku can use the :storage => :heroku attachment_fu backend.

As we continue to explore the next generation of application deployment, I think we're going to bump into a number of ways to structure apps differently in order to make them scalable. There will be some transitionary pain with these changes, because structure implies restrictions. Many PHP developers coming to Rails have complained about not being able to access sessions from models, or write SQL in their views. MVC creates restrictions, yes, but those very restrictions are what provide the structure. Coming from an unstructured environment, those restrictions may seem cumbersome or arbitrary; but once you're in the habit, you come to appreciate the structure they create.

Rebasing is Editing Commits

June 30, 2008 at 05:19 PM

Rebase is one of Git's most alluring and yet most difficult-to-comprehend features. Rebasing is editing commits. When you rebase, you're rewriting history.

This is possible with Git because of the separation between a commit and a push. A commit is a changeset with some attached metadata like the commit message. A push publishes your commits to a remote repository. In Subversion, these steps are inseparable, both part of the commit.

Because of this separation, you can rewrite your commits before you push them. Just as you can edit your sourcefiles as many times as you want before you commit the results, so too can you edit your commits as many times as you want before you push the results.

Rebasing comes in two main forms. One is the interactive rebase. git rebase -i HEAD~5 pops you into an editor where you can change the order of the commits, delete entire commits, or squash commits together. You can also edit a commit, which will take you out of the editor and let you work on the commit in your working tree, and then commit with git commit --amend.

The other type of rebase is rebasing your local commits on top of some changes you're pulling from a remote source, or against a local branch. (Hence the term "rebase": you're creating a new base for your patches.) The action takes some new commits from a branch and slips them in underneath yours, at the point the two branches diverge. Commits prior to the divergence point are unaffected.

I use this kind of rebasing instead of git pull. You'll notice that pull almost always creates a merge commit, which is one of these things:

commit c4110e1fb1aa50c4f876716bde07f6a982a1f31c
Merge: 296a0db... cb6050b...
Author: Joe <joe@example.com>
Date:   Tue Jun 24 14:46:41 2008 -0700

    Merge branch 'master' of example.com:repo.git

You might wonder why a merge commit is needed. Subversion doesn't have that, after all. But that's because the merge commit in svn is always implicit. Did you ever find yourself working on an active project, and then when you went to commit, you needed to svn up and got a whole boatload of changes, including some conflicts? Sure you did. And then you had to sit there and perform the merge on your source, finally making a big commit which included both your changes and the merge.

Git encourages discrete changesets, so it makes sense to break apart regular changes (new feature, bugfix, etc) from merges. But on an active project with lots of contributors, there's always merging going on. So you end up with lots of ugly merge commits cluttering up your logs.

Rebasing lets us have our cake and eat it too. Now you can make atomic commits as you're working, regardless of whether you are ready to share those commits with your team. But when you want to pull down the latest work from your team and merge it with your work, you can instead use rebase to reapply your patches on top of theirs. If you're not working on the same areas of the code, then this takes almost no work. Just type git fetch && git rebase origin/master and you're done. (Notice that the output of git log then shows your recent changes at the top, regardless of the timestamps.)

Occasionally there are merge conflicts during the rebase, and this will drop you out into a shell with some rather intimidating messages. Don't worry though, all you need to do is take a look at the conflicted files, choose the part that you want (just like resolving a regular merge conflict), and then git add them. When you're done, run git rebase --continue. It does this merge step separately for each commit, so if there are a lot of commits in the difference between you and the remote source, this could get time-consuming. In that case you may want to git rebase --abort and then run a regular git merge or git pull. Next time, resolve to rebase more frequently, to make the merge job less of a headache.

Since rebasing is editing of commits, it doesn't make much sense to rebase things that have already been pushed. You can do it, but as soon as you go to merge with another repo that had the unedited commit history, you'll bump into weirdness (and probably invalidate your whole reason for rebasing, which was to clean up the history). So as a general rule, I recommend never rebasing things that have already been pushed.

If you push something and then realize five seconds later that you shouldn't have, it is possible to rebase your local branch and then git push --force, which will obliterate the remote repo's history. This won't help if someone else has already pulled the commits, since the next time they push, the commits will come back. So only use it when you're certain that no one else has pulled.

Tags: git

EVDO Rules

June 28, 2008 at 05:21 PM

EVDO cards rule. The speed and latency are good enough for working over ssh and other common development tasks like pulling up documentation on the web or installing a small gem or two. (Though I don't recommend doing a git clone of Rubinius over EVDO.) At Railsconf I used my EVDO card instead of the spotty wireless and was tapping away happily as everyone else struggled to load google.com.

Mine is from Verizon, $50/mo, and it worked on my Mac as soon as I plugged it in, with no software installation or configuration of any kind. So far I've been able to use it everywhere I've traveled (all within the US so far, though I'll be racking up some of those $0.002/kb roaming charges in Canada on my way to Rubyfringe next month).

During a road trip to Santa Barbara last month, the pager went off right in the middle of farm country. I was able to crack open my laptop and take care of the problem without even pulling over. sshing at 75 mph is a whole new experience. (No, I wasn't driving.)

EVDO has changed the way I work and travel. Cutting the tether means I'm no longer afeared of getting stuck in a waiting room for an hour - that's just enough time to crank out a cool new feature.

Office Aesthetics

June 27, 2008 at 03:26 PM

Slicehost's new offices are mouth-wateringly gorgeous.



Service-Oriented Architectures

June 26, 2008 at 01:56 AM

I've always liked to build systems with a bunch of small apps that talk to each other through various protocols. Orion and I built TrustCommerce in this manner, and that gave it some pretty impressive fault-tolerance and scalability.

I had heard the term SOA (service-oriented architecture), but had always dismissed it as enterprisey talk. (Bland-yet-pompous three-letter acronyms make my brain turn off.)

At some point, it dawned on me that what I like to do - build small apps communicating with each other over the network - is exactly what SOA means. In its modern incarnation, service architectures use REST calls, which follow the unix tradition of small sharp tools, loosely coupled by a simple but extremely flexible protocol.

Heroku is no one app - even aside from all the server software and configs, our code is currently split among around two dozen apps, each with their own repository, and most with their own database. (Some have no database.) Most of these are Rails apps, though some are bare Ruby on a Mongrel or bare Rack on a Thin, and some are Sinatra apps.

This is why my Railsconf talk was about HTTP routing. When you've got dozens of apps, some of which respond to complex domains (for example, edit.*.heroku.com), a powerful http router outside your application VM becomes damned near indispensable.
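As a rough illustration of host-based routing, here is a minimal sketch of a lookup table keyed by wildcard host patterns. Only the edit.*.heroku.com pattern comes from the text above; the backend names, the second pattern, and the rule that "*" matches exactly one DNS label are assumptions made for the sketch, not a description of Heroku's router.

```ruby
# Hypothetical routing table: host patterns mapped to backend names.
# "*" stands for exactly one DNS label; the first matching pattern wins.
ROUTES = {
  "edit.*.heroku.com" => "editor",
  "*.heroku.com"      => "app",
}.freeze

# Turn a wildcard host pattern into an anchored, case-insensitive regexp.
def pattern_to_regexp(pattern)
  body = Regexp.escape(pattern).gsub("\\*", "[^.]+")
  /\A#{body}\z/i
end

# Return the backend name for a host, or nil when nothing matches.
def backend_for(host)
  ROUTES.each do |pattern, backend|
    return backend if pattern_to_regexp(pattern).match?(host)
  end
  nil
end

puts backend_for("edit.myapp.heroku.com").inspect
puts backend_for("myapp.heroku.com").inspect
puts backend_for("example.com").inspect
```

A real router would also need to strip any port from the Host header and decide what to do with unmatched hosts.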

Service architectures are the solution to the longtime problem of apps growing to monstrous proportions. Once you exceed a couple dozen models and/or controllers, it starts to be very hard for new developers to grok everything that's going on, and the barrier to entry becomes very high.

With a service architecture, each app has a simplicity that's reminiscent of the "make your own blog" sample apps you see in tutorials. Rails seems to better retain its beauty in this state. Probably everything does.

But it introduces new challenges. You've got dependencies between repositories - any time you change the interface, you have to be sure to roll out the server and client apps together. The relationship between internal apps is thus very similar to external ones - you need versioning and dependency management. Heroku has an app we call our architecture atlas, which tracks all the components and the dependencies between them, and documents their APIs.

Managing authentication becomes a big job. We do a lot with custom HTTP headers on this (again, one of the main topics in my Railsconf talk), but I've got my eye out for even more sophisticated solutions. OAuth is one that has piqued my interest.
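A minimal sketch of what header-based authentication between internal apps could look like, written as plain Ruby against the Rack calling convention. The header name X-Internal-Token, the shared-secret scheme, and the InternalAuth class are all hypothetical; the text above does not say which headers or checks are actually used.

```ruby
# Hypothetical middleware: lets a request through only when it carries the
# expected shared secret in a custom header.
class InternalAuth
  def initialize(app, secret)
    @app = app
    @secret = secret
  end

  def call(env)
    # Rack exposes the X-Internal-Token request header under this CGI-style key.
    if env["HTTP_X_INTERNAL_TOKEN"] == @secret
      @app.call(env)
    else
      [401, { "content-type" => "text/plain" }, ["unauthorized\n"]]
    end
  end
end

inner = lambda { |env| [200, { "content-type" => "text/plain" }, ["ok\n"]] }
app = InternalAuth.new(inner, "s3cret")

puts app.call({ "HTTP_X_INTERNAL_TOKEN" => "s3cret" }).first
puts app.call({}).first
```

A production version would want a constant-time comparison and a way to rotate the secret; both are left out to keep the sketch short.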

Perhaps the hardest question is where to draw the dividing line between one app and the next. Does a given model go in app A or app B? And that requires a lot of hard thinking about your design. An app should do just one thing, and without having to touch other apps too much. This often means splitting an app apart, and occasionally fusing two apps back together. This is the same process as managing object classes within an app: as each item's responsibility within the architecture changes, code moves around.

I find it useful to think of each internal component as its own service that could potentially be spun off as its own company. If it has its own code repo, database, tests, API, and docs, then turning it into a standalone service would just be a matter of giving it a slick marketing name and putting up a website. It's not that you'd want to do this, mind you: but if your service architecture is designed well, it would be easy to.

Tags: heroku, rest, soa

Git Submodule

June 25, 2008 at 12:09 AM

Git submodules are pretty cool, except for kind of sucking. Things I don't recommend doing if you value your sanity:

  • Switching a submodule from one repository to another (i.e., editing .gitmodules and changing the repo url)
  • Switching a directory from a submodule to regular content
  • Switching a directory with regular content to a submodule (though this might help you)

It's a shame, because submodules are pretty handy. But you'll probably end up wanting to do one of these things during the lifetime of your project, and then you're screwed.

Tags: git

Recruiting

June 23, 2008 at 02:39 PM

Once or twice a week, I get an email from a recruiter looking to hire a Ruby developer. I can spot these within the first half a sentence, and delete them without reading the rest. Obviously they got my name someplace and didn't stop to notice that I'm the founder of a VC-backed startup and am by no means looking for employment. I suspect that most other Rubyists get the same sort of emails, and the better known you are, the more you get.

These emails are the waste product of an inefficient recruiting system. There are tons of Rubyists out there that very much want a day job using the language they love. There are also tons of great companies, big and small, who very much need to hire said Rubyists. But there's no good mechanism for making those matches efficiently.

The first generation of web-based recruiting technology (monster.com, dice.com) tried to solve this in a straightforward way. If it's just a search problem, then throwing a bunch of job postings and resumes online with keywords and parameters like years of experience should do the trick, right? Turns out - no, not in the slightest. I used these sites a few times in hiring for the first company I founded, and they were borderline useless. Turns out that making a hacker <-> company match is way harder than just "you need code, I need a paycheck, let's connect."

This is a nearly identical problem to the one faced by dating sites. Again, you've got millions of people out there who want to make some sort of romantic connection with another person. But the parameters aren't as straightforward as they seem. Posting an ad on a dating site which states "I am a heterosexual male, seeking a heterosexual female as a mate" probably won't get you a lot of response. Even though there are in fact millions of heterosexual females out there looking for a heterosexual male mate.

You might think that this is just a matter of needing more parameters. For dating, that's stuff like smoker or non, drinker or non, age, photos, hobbies, and favorite movies. For hiring, it's skills (languages and tools), years of experience, and keywords like "self-managing" or "enterprise" or "agile." This stuff certainly helps, but it's not enough - not enough by far. A dating site may make what seems like a perfect match, but more often than not, no sparks fly when the people meet face-to-face. A recruiting site can also fit based on quantitative parameters, but then the moment you sit down to start the interview, you discover that, say, the energy level of the candidate's personality is a total mismatch with that of the company's engineering team.

In dating, we call this bit of magic "chemistry." If you don't have chemistry with someone, it doesn't matter how many hobbies you share. In hiring, we call this magic bit "culture." If the hacker's culture doesn't match his employer's, it doesn't matter whether the hacker's years of experience with a certain technology and desired salary are a perfect match with the employer's open position.

So what is the next-generation solution? Recruiters, in my opinion, are nearly worthless. I've experienced using them on both sides and while they do occasionally make a good match, as near as I can tell that's usually blind luck. Yet they get paid a huge amount - $10k - $20k/yr for the length of the employment is common. I don't think that the value they provide is really on par with this price; and I think both employer and employee would be much happier if that money could go into the employee's salary instead of to the recruiter. This particular avenue seems like a dead-end to me.

So how about a next-generation technology solution? There's Catch the Best, which is a tool to help manage the screening process. And there are two startups in my Y Combinator session, Snaptalent and Joberator. Snaptalent seems to understand the importance of culture, and has a lot of features like embedded video to try to help convey culture - like in this job listing for Anybots. They are also not offering a generalized search solution, but instead only expose the ads on targeted sites - blogs related to the particular industry or area of interest of the available position.

This only addresses one side of the problem, though. The other problem is that top people are never actually out looking for new employment. (This insight comes by way of Joel Spolsky, in a rare recent moment of relative lucidity.) Using myself as a data point, this seems correct: I've never once been on the job market. (Although since most of my career has been as a founder, I may not be representative.) Back in my employee days, I never really went looking for a job. I'd just start to get disillusioned with, or bored of, my current job. Then a friend or former co-worker would get hired someplace else and talk me into coming with them. So it was a matter of being solicited at exactly the right time.

This is why recruiters have the traction they do: they go out and bug people who, 99% of the time, are annoyed at being bugged. But 1% of the time it spurs them into action, even though they may not have been quite at the point of wanting to go start scanning job ads yet. Surely we can come up with a recruiting process which reduces that wasteful 99%.

RubyGems 1.2

June 23, 2008 at 01:38 PM

The new RubyGems doesn't update the index every time, so gems install very quickly. Add the --no-rdoc --no-ri options and they install instantly. Excellent.

Tags: ruby

RestClient 0.5

June 21, 2008 at 02:56 PM

gem install rest-client for new features:

  • SSL support
  • User/password embedded in the url (e.g. https://joe:mypass@example.com)
  • Subresource nesting with [] syntax (e.g. site['posts/1/comments'].get) (more examples)
  • Better exception classes with access to the response object and more readable output in irb

RDocs.

Thanks to Pedro Belo and Ardekantur, who contributed most of the new code in this release.

Rack, and Why It Matters

June 19, 2008 at 05:37 PM

Rack is one of the most important developments in the Ruby web space in the past year. I suspect it's been slow to get attention because the benefits are a bit subtle. Witness the Rails core team being confused about Rack just a few months ago. So if you don't get what the deal is with Rack, don't feel bad - you're in good company.

James covered Rack in his Railsconf talk, partially at my insistence. (His talk was about Mongrel handlers, but Rack middleware is a newer and better way to achieve the same end.) It's worth noting that he asked the crowd - a couple hundred Rubyists - whether they had heard of Rack, and almost every single hand went up. But when he asked if they knew what it was for, not a single hand was raised.

So what's the deal with Rack? In short, Rack provides a standard interface between the web app server and the app framework. This is useful in light of the multiplying number of web app servers (Webrick, Mongrel, Thin, Ebb...) and frameworks (Rails, Merb, Sinatra, Ramaze...). A standard not only reduces the amount of code the framework authors have to write, but it makes the layers in the stack more pluggable. Pluggability encourages experimentation (which means more innovation over time), and generally makes the whole stack more robust.

One implication of Rack is that you can skip the app framework altogether. I've always liked using standalone Mongrels running tiny Ruby apps without a framework for internal daemons. These days, I generally use Sinatra for that purpose - but there's still something cool about skipping the use of any framework and just coding down to the metal.

Want to try it out yourself? Stick this code into hello.ru:

class HelloHandler
   def call(env)
      [ 200, { 'Content-Type' => 'text/plain' }, [ 'hello, world' ] ]
   end
end

run HelloHandler.new

Then at the shell:

rackup hello.ru &
curl http://localhost:9292/

Congratulations, you've just made a frameworkless Ruby web application in five lines of code.

A Rack handler is anything that can respond to the call method and returns an array with the status code, output headers, and output body. Handlers can be the end of the request chain, or do input and output filtering anywhere in the middle (hence "middleware"). Here's an example from Marc-André Cournoyer. Though his example is presented for Thin, you can run his code on any Rack-compatible server.
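To make the "middle of the chain" idea concrete, here is a minimal sketch of a middleware in plain Ruby. The class name Upcase and the hello lambda are made up for illustration; the only thing taken from Rack is the convention described above (an object that responds to call and returns status, headers, and body).

```ruby
# Hypothetical middleware: wraps another handler and uppercases its output.
class Upcase
  def initialize(app)
    @app = app                      # the next handler in the chain
  end

  def call(env)
    status, headers, body = @app.call(env)
    upcased = []
    body.each { |chunk| upcased << chunk.upcase }   # filter the output
    [status, headers, upcased]
  end
end

# An end-of-chain handler; a lambda works because it responds to call.
hello = lambda { |env| [200, { 'Content-Type' => 'text/plain' }, ['hello, world']] }

app = Upcase.new(hello)
status, headers, body = app.call({})
puts status
puts body.join
```

In a rackup file, the same wiring would be written as use Upcase followed by run hello.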

In the real world, what is Rack middleware useful for? We recently ported the Heroku toolbar to Rack middleware. The previous implementation was several hundred lines of very hard-to-follow monkeypatching of ActionController, combined with a rarely-used and poorly-maintained plugin framework for Mongrel called GemPlugins. (Which I nominate for Most Confusing Name Ever.) That code was hard to read and nearly impossible to spec, but it was the only way we could make it work with the traditional Mongrel/Rails setup. It was also very tightly coupled to a particular version of Rails and Mongrel.

Ricardo (one of the new Heroku devs) banged out the Rack middleware port in just a couple of days. It's a fraction of the number of lines of code, and can be specced normally. Plus, our toolbar is now compatible with any Ruby app server or framework.

Because Rack separates the layers of the stack more cleanly, it was way easier to hook the Heroku toolbar code into the right place. Take that lesson and generalize it, and you'll start to glimpse the significance of Rack.

The dreaded "wedged Mongrel" - your app server stuck on one request, with others piling up, waiting infinitely for it to come free - is a problem all production Rails apps face sooner or later. The solution most commonly used is to restart the app servers frequently, via something like Monit, or just on a cron job.

But such solutions are just band-aids which hide the real problem, which is that your code is getting stuck in an infinite loop, or waiting on an IO request which never returns. A better solution is to wrap all your actions in a timeout:

class ApplicationController < ActionController::Base
  around_filter :timeout

  def timeout
    require 'timeout'
    Timeout.timeout(30) do
      yield
    end
  end
end

This prevents the wedged app server. And combined with an exception notifier, you'll be able to see which requests are getting wedged, so that you can fix your code. (Periodic app server restarts are still needed to combat memory leakage - another problem entirely.)
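As a rough sketch of the "see which requests are getting wedged" part, here is the same idea outside of Rails. The helper name with_request_timeout and the request descriptions are invented for illustration; a real app would hand the error to its exception notifier rather than calling warn.

```ruby
require 'timeout'

# Hypothetical wrapper: run a block under a deadline, report what timed out,
# then re-raise so the caller (or an exception notifier) still sees the error.
def with_request_timeout(description, seconds)
  Timeout.timeout(seconds) { yield }
rescue Timeout::Error
  warn "timed out after #{seconds}s: #{description}"
  raise
end

# A fast "request" finishes normally and returns the block's value.
fast = with_request_timeout('GET /fast', 1) { :ok }

# A slow "request" is cut off and reported.
slow_timed_out = false
begin
  with_request_timeout('GET /slow', 0.1) { sleep 1 }
rescue Timeout::Error
  slow_timed_out = true
end
```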

I'm surprised that request timeouts aren't a standard part of web frameworks like Rails, application servers like Mongrel, or both. (If you've seen the "timeout" parameter for Thin or Mongrel, don't be fooled - it's not that kind of timeout.) After all, web requests aren't supposed to be long-lasting. Nginx or Apache will time out the request after 90 seconds or so anyway, but this doesn't stop your app server from grinding away infinitely on the request.

But there's a catch with Timeout. It uses Ruby threads, which only works as long as it's Ruby code that's getting stuck or taking too long. The second case - a system call that's getting stuck - is often the problem. So this will time out:

Timeout.timeout(3) do
  sleep 4
  puts 'done'
end

...but this will not:

Timeout.timeout(3) do
  system 'sleep 4'
  puts 'done'
end

Good unix jockeys know that SIGALRM is the correct solution here. Back in my MUD days I encountered this technique in the CircleMUD server: it would detect infinite loops and abort with a log message, allowing the game to continue running. "Wow," I said the first time I saw it in action. "How does it know?" That's the magic of SIGALRM.

Philippe Hanrigou and David Vollbracht have implemented a SIGALRM solution for Ruby in the form of SystemTimer. (They also give a great description of green threads and why they don't play well with the underlying OS.) This is a nearly drop-in replacement for Timeout. Try it:

SystemTimer.timeout(3) do
  system 'sleep 4'
  puts 'done'
end

Woot! So now, your final solution for preventing wedged app servers in production:

class ApplicationController < ActionController::Base
  around_filter :timeout

  def timeout
    require 'system_timer'
    SystemTimer.timeout(30) do
      yield
    end
  end
end
Tags: rails, unix

Cloud Computing Taxonomy

June 16, 2008 at 04:28 PM

Most agree that cloud computing is the Next Big Thing, but beyond that things get murky. Being such a new space means that there's not yet a consensus on what all the pieces are, and how they fit together.

Michael Crandell gives a good description of what he considers to be the three tiers of cloud computing: apps, platforms, and infrastructure. (He correctly puts Heroku on the middle tier.)

It gets a bit harder as you try to subdivide each tier. This diagram is one I draw a lot these days, but it's a little different each time, as we continue to discover new challenges about our slice of the cloud computing pie.

Quickstart to Hacking Rubinius

June 12, 2008 at 01:38 AM

I recently tried my hand at hacking on Rubinius. Here's a rough description of what I did - following this same pattern should let any skilled Ruby developer be contributing patches in no time.

First:

git clone git://git.rubini.us/code rubinius
cd rubinius
rake spec:update
rake build

This will take a while to finish. Once it's built, you can run Ruby scripts by using shotgun/rubinius [filename], instead of ruby [filename]. You can also run it with no arguments to get an interactive shell (i.e., irb). Put this alias into your bashrc:

alias rbx=/home/adam/rubinius/shotgun/rubinius

...and then you can run rbx instead of ruby from anywhere.

Install gems:

rbx gem install [gemname]

I kicked things off by running the specs on my own open source Ruby libraries, once I had installed the rspec and other dependency gems. RestClient passed fine, so I moved on to rush, which is much more complex. Running individual specs, e.g.:

rbx rush/spec/dir_spec.rb

...I soon found some that broke. Since I know these specs pass normally, this served as my launchpoint for how I could improve Rubinius.

Rubinius was throwing an exception in popen, which was only partially implemented. But before trying to fix it, the first step was to write a spec. Rubinius uses mspec, which is an RSpec-style spec library. I ran the popen spec like this:

bin/mspec spec/ruby/1.8/core/io/popen_spec.rb

This runs it on Rubinius. To run it on MRI (aka Ruby 1.8), use:

bin/mspec -t ruby spec/ruby/1.8/core/io/popen_spec.rb

Write the spec to pass on MRI first. Then check to see if it passes on Rubinius. If not, you'll need to tag it as failing. Here's my first committed spec, and notice that the second file is popen_tags.txt, which marks the read/write pipe as failing.
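For a sense of the behavior such a spec pins down, here is a plain-Ruby sketch of IO.popen in its read-only and read/write modes. It does not use mspec and is not the committed spec; the echo and cat commands are my own choice for illustration and assume a Unix-like system.

```ruby
# Read-only pipe: run a command and collect its output.
read_only = IO.popen('echo hello', 'r') { |io| io.read }

# Read/write pipe: send input to the child's stdin, then read its stdout.
read_write = IO.popen('cat', 'r+') do |io|
  io.write('ping')
  io.close_write   # signal end of input so cat can finish
  io.read
end

puts read_only.inspect
puts read_write.inspect
```

A spec is essentially these expectations written down, so that running it on MRI and then on Rubinius shows exactly where the two differ.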

All specs should pass on MRI at all times. But it's entirely reasonable to write a spec that passes MRI and fails on Rubinius - this makes evident a missing feature in Rubinius.

Once I had the spec which passed on MRI but failed on Rubinius, I could now turn my attention to making the spec pass. The code I tinkered with was in kernel/core/io.rb. It relies on a primitive create_pipe (via IO.pipe), which is actually a very simple C function that can be found in shotgun/lib/primitives.rb.

In the process of working on the main issue you will most likely discover small ones. Take the opportunity to write a spec, and fix Rubinius. But if you can only do one or the other (write the spec, or fix a failing spec), that's great too. This two-phase process (expose the problem, then fix it) works extremely well for mapping out the complex problem space of writing a language VM, particularly in trying to track an implementation that has no formal specification.

One thing that tripped me up was that you must run rake to rebuild after any code change - even if all the code you changed was Ruby. I'm not used to having a compile step on an interpreted language, but once I got into the habit it was easy to remember.

For submitting the patch, put it into a pastie and post it on Lighthouse. Most developer communication happens in the IRC channel, so you may want to start by posting your pastie there and soliciting feedback.

I was really impressed by the receptiveness of the Rubinius developers to new hackers. Even while critiquing my patches on IRC, they took every opportunity to let me know how much they appreciated my contribution. From what I've seen, these guys are setting a new bar on making a low barrier of entry for people to jump in and contribute. I hope to see other open source projects follow their good example.

Further reading:

yaml_db and heroku-client in Github

June 11, 2008 at 02:40 AM

Not big news, but a few folks have bugged me recently to make it easier to contribute patches to some of our gems and plugins. So here you are: the Heroku client gem and yaml_db.

Sinatra, My New Favorite Microframework

June 10, 2008 at 02:19 PM

A few months ago, I went in search of a way to build an extremely lightweight Ruby web app. Merb can be stripped down pretty far, but I wanted a true microframework. Ramaze and Camping were getting close, but didn't quite fit my taste. Then I discovered Sinatra.

Sinatra apps are typically written in a single file. It starts up and shuts down nearly instantaneously. It doesn't use much memory and it serves requests very quickly. But, it also offers nearly every major feature you expect from a full web framework: RESTful resources, templating (ERB, Haml/Sass, and Builder), mime types, file streaming, etags, development/production mode, exception rendering. It's fully testable with your choice of test or spec framework. It's multithreaded by default, though you can pass an option to wrap actions in a mutex. You can add in a database by requiring ActiveRecord or DataMapper. And it uses Rack, running on Mongrel by default.

One of the most important backend services for Heroku is written using Sinatra. We're now running several hundred instances of it in our cluster. It's performed like a champ - I haven't seen it die or leak memory, other than bugs in our app code.

Some interesting (though not necessarily meaningful) stats.

Lines of framework code (not counting tests or examples)

Rails        87,990
Merb-core    12,417
Ramaze       11,796
Camping       1,704
Sinatra       1,576

require 'sinatra' pulls in just one file: sinatra.rb. Now that's a commitment to small.

How about memory footprint? Camping takes the crown here, but Sinatra doesn't do too shabby:

Memory footprint of an empty application

Rails        52MB
Merb-core    25MB
Ramaze       18MB
Sinatra      16MB
Camping       7MB

Aside: I got these numbers using the Linux free command before and after starting the server. Gauging real memory usage is very difficult because of shared pages, but free is much better than the VSZ/RSZ silliness you see in ps, which doesn't tell you very much.

But my real joy in Sinatra is its minimalist simplicity. The direct mapping of URLs to code (routes? who needs 'em?), the incredible ease of writing tests, and even just the simple fact that the return value from an action is its output. For example, an action might look like:

get '/posts/:id.xml' do
  Post.find(params[:id]).to_xml
end

And a matching test:

should 'get a post in xml format' do
  Post.expects(:find).with('123')
  get_it '/posts/123.xml'
end

When I return to writing Rails apps after working on Sinatra for a while, I sometimes find myself thinking: "wait, what did I need all this other crap for again?"

Tags: sinatra

Railsconf Wrapup

June 08, 2008 at 02:54 PM

Whew, I think my brain has finally returned from its liquified state after Railsconf. Last year the conference felt like a vacation, since all I did was attend. This year, with all the speaking and booth-manning and meetings, it was pretty grueling.

Despite that, it was still plenty of fun. For example, due to some capacity issues at the convention hall, my talk got moved into the keynote room. So I got to feel like a big shot for an hour up on the big stage with all the banners and lights. :)

Tim Goh posted an excellent blow-by-blow of my talk. Apparently he manually keyed in all the code from my slides during the presentation, which is damn impressive because I was going through them pretty quickly!

We answered about a zillion questions about Heroku at our booth (not to mention just people stopping us in the hallway). This involved lots of waving our arms around and drawing on our plexiglass wipeboard.

(A few more photos.)

Talking to so many people about what we're doing brought a few things into focus for me. The main one is that not many people really get what we're doing. Questions I answered over and over: No, we're not a web-based IDE. No, we're not a reseller of EC2. No, we're not a competitor of Engine Yard.

My best answer to the "what is Heroku?" question is as follows: Heroku is an automated deployment platform for Rails. "Automated" means you don't think about server stuff at all: just load up your code and go. "Deployment platform" means a place to run your app - which could be while you're developing on it, could be a staging/prototype/demo-for-the-client deployment, or could be a full production deployment.

Going forward, the guys and I are going to be thinking about ways to make our message clearer. I guess that's one of the challenges of using a blue ocean business strategy: when you're pioneering a completely new space, explanations are hard.

Thanks to the conference organizers for another great event; to everyone who came to my talk; to everyone who approached us about partnership opportunities; and to all the excellent speakers. And to all the new friends I made, I hope to see you again soon (maybe at Rubyfringe?).

Railsconf Slides

June 03, 2008 at 10:19 PM

Here are my slides from Railsconf. More thoughts to come once I recover a bit more - it was pretty intense for my partners and me this year, what with doing three talks and manning a booth.

Someone in the session asked to see get_logged_in_user() (called from the code shown in slide 36). Here it is, in all its C string-manipulation glory:

void get_logged_in_user(ngx_http_request_t *r, u_char *user, int user_size)
{
   ngx_table_elt_t **cookies;
   ngx_table_elt_t *elt;
   char cookie[256] = "";
   int i;

   cookies = r->headers_in.cookies.elts;
   for (i = 0; i < (int)r->headers_in.cookies.nelts; i++)
   {
      elt = cookies[i];
      if (extract_and_overwrite_cookie((char *)elt->value.data, "heroku_session=", cookie, sizeof(cookie)))
         break;
   }

   if (cookie[0] != 0)
      find_user_by_cookie(cookie, (char *)user, user_size);
}

void find_user_by_cookie(const char *cookie, char *email, int size)
{
   char sql[256], scratch[128];
   snprintf(sql, sizeof(sql)-1,
      "SELECT username FROM sessions WHERE cookie='%s'",
      pg_escape(cookie, scratch, sizeof(scratch)));

   pg_select_one_string(sql, email, size);
}

Also, one correction: I incorrectly stated that redirect() was an Nginx function. It's actually a helper function I created; here's the code.

void redirect(ngx_http_request_t *r, char *url)
{
   size_t len = strlen(url);
   u_char *location;

   /* Allocate from the request pool so Nginx frees both when the request ends. */
   location = ngx_palloc(r->pool, len);
   r->headers_out.location = ngx_palloc(r->pool, sizeof(ngx_table_elt_t));

   ngx_copy(location, url, len);

   r->headers_out.location->value.data = location;
   r->headers_out.location->value.len = len;
   r->headers_out.content_length_n = 0;
   r->header_only = 1;
   r->keepalive = 0;
}

If you use this, make sure to return NGX_HTTP_MOVED_TEMPORARILY immediately after calling it, as shown in the slides.

Railsconf

May 27, 2008 at 02:06 PM

Places you'll find me at Railsconf this year:

  • Giving my talk about custom Nginx modules on Saturday afternoon. The talk has evolved quite a bit since I wrote the description, so expect some broader topics, like why I think HTTP is the critical enabling protocol in the era of Rails and cloud computing.
  • Attending the Heroku product talk, in which we propose why you may never need to think about servers or hosting again. This is inconveniently scheduled immediately before my session talk, so I'll have to duck out partway through.
  • Signing books along with the other recipe contributors to Mike Clark's Advanced Rails Recipes at the Powell's Books booth at the lunch break on Friday.
  • Hanging around our booth, where I intend to hack on Heroku and my open source projects, listen in on Geoffrey Grosenbach's podcast interviews, and meet everyone that stops by. So... stop by! :)

Advanced Rails Recipes

May 19, 2008 at 07:42 PM

Advanced Rails Recipes by Mike Clark is now shipping. I'm pleased to have contributed to the chapter on nested resources.

Most of the staple Rails books - like Agile Web Development and the original Rails Recipes - are a bit out of date now, so it's nice to see a new one. Advanced Rails Recipes is full of useful goodies, some of which are what I consider basics (authentication, restful resources, foreign keys) but quite a lot of which live up to the "advanced" label (dtrace, custom rspec matchers, process monitoring). Good stuff.

More Git Techniques

May 15, 2008 at 05:28 PM

In the spirit of Graeme Mathieson's git techniques, here are a few of my favorites, with their svn equivalents for reference.

Restore a file to repository version
  svn: rm file; svn up file
  git: rm file; git checkout file

See what commits would be pulled on an update
  svn: svn stat -u
  git: git fetch; git log HEAD..origin/master

See what code changes would be pulled on an update
  svn: svn diff -rHEAD
  git: git fetch; git diff HEAD..origin/master

Grab one commit without any of the commits around it:
  svn: svn diff -rN1:N2 > my.patch; scp my.patch other.server; ssh other.server "patch < my.patch"
  git: git fetch; git cherry-pick [commit-hash]

Revert a commit
  svn: svn merge -rN2:N1; svn commit -m "reverted commit N2"
  git: git revert [commit-hash]

Set some changes aside to work on something else
  svn: cd ..; mv myproj myproj_stash; svn co svn+ssh://server/myproj
  git: git stash

Discard local changes
  svn: svn revert -R .
  git: git reset --hard HEAD

Edit recent commits
  svn: echo "Oops."
  git: git rebase -i HEAD~5
Tags: git

Firefox REST Plugin

May 12, 2008 at 03:02 PM

While we wait for web browsers to become fully REST-capable, Poster is a handy Firefox plugin for sending any type of HTTP request, including all four verbs and different content types. I usually use rest-client at a rush shell for one-offs; but if you need your browser's cookies for a call that can't be authenticated with http basic auth, or you just want a dialog that shows all the options visually, Poster is quite handy.

Tags: rest

Don't Build the Super Nifty Node System

May 08, 2008 at 03:26 PM

Programmers tend to overdesign. Rather than building a quick solution that works right now for the specific case, we want to build one that will solve all problems of that type, both now and for all time. Dreaming In Code shows a particularly egregious case of this: the developers spend years building a framework for the app, rather than the app itself. This type of story is quite common. So frequent is this pitfall that the agile methodology mantra for avoiding it is often referred to by its abbreviation: YAGNI (You Aren't Gonna Need It).

On the opposite extreme, there's the quick-and-dirty hack. But this has its own problems: it's fragile and inflexible. (It's telling that Microsoft's first pass at an operating system was named QDOS, where the QD stood for Quick-and-Dirty.) Quick hacks don't lend themselves to being built upon. Contrived example: a method called sum_two_and_two that returns the constant four would not be nearly as useful as a method called sum that takes two integer arguments and returns their sum. Generalized solutions are important - hell, they're what software is about.

So where is the right balance? My partner Orion puts it well: "You want to build an architecture that will be somewhat flexible, but not infinitely so."

Here's an anti-example: at our first venture together, we needed some customer relationship management software which the whole company could access. (This was eight years ago, and there weren't any suitable web-based commercial or open source choices that we knew of.) We ended up writing something called the Super Nifty Node System. It didn't track companies, people, and leads, like you might expect. Instead it had a single object type - a Node - and the users could define node types and relationships between them. The idea was that we were building a system which was so flexible that we wouldn't need to do any programming when someone wanted to track something new. We thought we were solving both the original problem and many related problems, and would never need to touch the software again.

In practice, this worked out poorly. Our non-technical users didn't understand how to create new node types, and didn't really want to anyway. Things like getting the fields to go in a certain order were really difficult, leading to a lot of user frustration on entering addresses. Worse was that simple programmatic tasks we wanted to perform - like pulling out a list of email addresses for all our partner companies, or a list of all our customers who had been with us for a year or more - required SQL statements with fifteen complex joins that wrapped around the screen six times. In retrospect, we should have just made tables for companies, people, and leads - even though there would have been a fair bit of duplicated code, this being in the pre-Rails (and generally pre-framework) era.

Many people have asked us why we didn't build Heroku to support multiple frameworks and languages. Why not Heroku for Python/Django, my second favorite language/framework combo after Ruby/Rails? Why not for PHP and some of the excellent MVC frameworks that exist for it? Why limit ourselves, and our potential audience, by building to the more specific case?

Were I building Heroku earlier in my career, I might have designed it with this in mind. I might have created an abstract base class App, and from that inherited RailsApp. Or perhaps App has_one :framework and has_one :language, and then Framework and Language are abstract classes that can be inherited by Framework::Rails, Framework::Django, Language::Ruby, and Language::Python. The Django and Python classes would have sat empty for months or years as we put our energy into developing for Ruby and Rails.

And there's a cost to leaving that infrastructure lying around. Just because you're not actively developing on a particular part of the codebase doesn't mean it's maintenance-free: you've got more abstractions to keep in your head, more training time for new members on the team, more specs to keep running. Even just the extra files hanging around in the code tree add a tiny bit of overhead for your brain each time you do a directory listing or otherwise manage the code.

So don't start generalizing until you have a strong need for it. Generalizing too early is the death of many a project - almost as often as generalizing too late.

A Better Daemonize

May 07, 2008 at 02:16 AM

Mongrel, Thin, and every other web application server I've ever used all suffer from a similar deficiency: they daemonize too early. That is, they daemonize prior to trying to boot your app, which means any error - even a really obvious, immediate boot problem - will silently feed into the log as the process dies, without so much as a peep on the command line.

Try this:

$ mkdir nothing
$ cd nothing
$ thin start -d && echo success

(Substitute "mongrel_rails" for "thin" here if you want, the result is identical.)

Wait, what? There's not even anything there. Why does it return true? Early daemonization, that's why.

A better approach would be to boot the app, and once it's online and listening and ready to serve requests, then return.
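Here is a minimal sketch of that approach, assuming a Unix-like system with fork. The helper name boot_then_daemonize and the pipe-based readiness report are my own illustration, not how Thin or Mongrel work: the child boots the app and tells the parent how it went, and the parent returns only after hearing back.

```ruby
# Hypothetical helper, not part of Thin or Mongrel: fork a child, run the
# boot block there, and return only once the child reports how boot went.
def boot_then_daemonize
  reader, writer = IO.pipe

  pid = fork do
    reader.close
    begin
      yield                         # boot the app; raises if boot fails
      writer.write('ok')
    rescue Exception => e           # LoadError and friends are not StandardErrors
      writer.write("error: #{e.message}")
    ensure
      writer.close
    end
    # A real server would detach from the terminal and start serving here.
    exit!(0)
  end

  writer.close                      # parent keeps only the read end
  report = reader.read              # blocks until the child closes its end
  reader.close
  Process.detach(pid)               # reap the child without waiting on it
  report
end

good = boot_then_daemonize { :booted }
bad  = boot_then_daemonize { raise 'no app found in this directory' }
puts good
puts bad
```

A command-line wrapper built on this would exit non-zero unless the report is ok, so something like && echo success would only fire after a real boot.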

Tags: ruby, thin

Rocking the Mocking

May 02, 2008 at 02:12 PM

How do you write a spec for this method without touching the filesystem or the user's environment?

def authkey
  File.read("#{ENV['HOME']}/.ssh/id_rsa.pub")
end

Just repeat this mantra to yourself: It's Ruby. Everything Is An Object Or A Method. Objects And Methods Are Always Mutable.

Got your answer yet? Here's mine:

it "reads the ssh rsa key from the user's home directory" do
   ENV.should_receive(:[]).with('HOME').and_return('/home/joe')
   File.should_receive(:read).with('/home/joe/.ssh/id_rsa.pub').and_return('the key')
   @client.authkey.should == 'the key'
end
Tags: bdd

What Defines the Ruby Community?

April 29, 2008 at 10:18 PM

A friend of mine, who is a Ruby developer but a little less immersed in the Ruby culture than I, was recently boggling at the excitement around the new VMs (Rubinius, JRuby...) and new frameworks (Merb, Sinatra...). "Wait," he said. "They're replacing Ruby, and they're replacing Rails. So what the heck defines the Ruby on Rails community, if both Ruby and Rails are replaceable?"

Good question. Peter Cooper wrote "'Ruby' is starting to represent both a community and a language 'ideal' rather than just a single, well-defined programming language." I agree. But what are the specific traits that bind the Ruby community together?

I suspect that there will be surprising divergence of opinion on this subject. But here's my answer, in the form of two characteristics that I believe all Rubyists share.

First, Rubyists love elegance. We want to solve problems in a simple and elegant fashion. Most programming languages and software infrastructure feel like the inside of an industrial revolution-era factory: it gets the job done, but it sure ain't pretty. Rubyists create things that have the minimalist and pleasing aesthetic of a haiku or a Zen garden. We are so committed to elegance that given the choice between an inelegant solution and none at all, we typically choose the latter.

The second, and more subtle point: Rubyists are dynamists. We have a deep understanding of the infinite series of technological progress: each stage of advancement building on the last. There is no such thing as perfection: anything and everything can be improved upon. In this, we are not afraid to swap out any component with a superior replacement. Apache giving way to Nginx, Subversion giving way to Git, Prototype giving way to JQuery, Mongrel giving way to Thin, Test::Unit giving way to RSpec. Even our most fundamental foundation components - Ruby and Rails - are not safe, if someone can build better replacements.

Tags: ruby

Curators

April 24, 2008 at 04:45 PM

During a presentation I gave last month, someone asked what Heroku plans to do about the many different versions of Rails and Ruby, not to mention choices on components like database (MySQL, PostgreSQL, SQLite) or app server (Mongrel, Thin, Ebb).

I answered that this is part of the benefit you get by using Heroku. (In corporate-speak, part of our "value-add.") We select the best components, from the server infrastructure and operating system up to the version of Rails and preinstalled gems and plugins. We make sure it all works together in a complete package, so that you can just dive in and start coding, without worrying about making a bunch of choices before you even begin.

Another company that works this way is Apple. Apple doesn't make motherboards or video chipsets or drives. What they do is select all of the best OEM components and put them together into a beautiful package where all the parts are guaranteed to work seamlessly together. You never worry about your monitor not supporting the sync rate your video card wants to use, or whether you'll have drivers for your DVD writer, like you would with a build-it-yourself PC.

A similar metaphor, but in the software world, is the makers of Linux distributions. This is actually quite close to what we're doing: bringing together lots of open source components into a unified bundle, where all the pieces have been tested as playing nicely together. Given the fast pace of software development and the huge variety of different packages, this is quite a challenge - and an ongoing one.

Todd Hoff did a writeup on my presentation, and he mentions the term "curator" to describe companies or people that fill the sort of role I've just described. I think that's a pretty good term - Apple, Linux distributions, and Heroku all fit into this category.

My current favorite distros are Debian on the server and Ubuntu on the desktop, and they both do a great job at this collector role. But I think it's worth looking at another player, Red Hat, to see how this role can be done both right and wrong.

A few years ago, Red Hat was the distro to beat. Red Hat 7, 8, and 9 were all stellar releases. Then they split their distro into two forks: Red Hat Enterprise and Fedora. This seemed like a logical step: provide a slow-moving, highly stable version for production server environments; and a more cutting-edge version for desktops and personal or staging servers.

But this turned out to be a disaster for their market share: nearly overnight, Red Hat went from the most popular distro by far to a distant third. (The move may have been good for their revenue or profitability, I can't speak to that.)

Understanding why this happened helps us understand the role of a good curator. Having a single distro meant that Red Hat had to balance pushing the technology envelope against keeping things stable. In those days they did a great job at finding the sweet spot. Many people complained when they introduced major new infrastructure like glibc2 or UTF8, but in my opinion their timing was exactly right. Sure, it caused some transitional pain for users - mostly proprietary binaries like Adobe Acrobat and Yahoo Messenger breaking - but the longer-term benefits of the technology being introduced, such as being able to easily internationalize all applications, were huge.

But once they split the distro into RHEL and Fedora, they no longer had to maintain this balance. Now RHEL is constantly out of date - causing users to seek out third-party packages or compile things themselves, which defeats the whole purpose of stability. And Fedora is bleeding edge instead of cutting edge, making it painful and unpleasant to work with for anything but pure tinkering.

You saw similar user pain when Apple switched from OS 9 to OS X, or from PPC to x86. Users cried about this stuff - a LOT. (As with UTF8 on Linux, Adobe apps seemed to be one of the centerpoints of user pain.) But once the initial shock wore off - which was pretty quickly - everyone was a lot better off. Can you imagine what Apple's place in the technology industry would be today if they had stuck with OS 9 and PPC?

The curator role is surprisingly challenging to do right. I hope we can learn from the lessons of these companies, and others that curate successfully, as we continue to develop Heroku.

Video Quagmire

April 23, 2008 at 11:45 PM

2007 may have been the year of video on the internet, but support for video technology in software is still pretty poor. The emergence of Flash players enabled the video explosion, sure. But Flash is still proprietary and has a bunch of other problems as well, not the least of which is that it's not strictly a video format, but rather an entire programming environment which happens to support video.

Video is primarily such a mess because of codecs. I was first exposed to the codec madness circa 1999, when people in my office shared videos via a shared drive. I was using Linux exclusively (this was pre-OS X, remember), so this meant I could very rarely view them. I always assumed that OS 9 and Windows users didn't have these problems. I envisioned Mac and Windows users living in a utopia of video, where they just clicked on any file and it opened and started playing.

Then I had occasion to use a friend's Windows computer to try to view video. Nope! Same codec problems. Then later I got a Mac. Bzzt! Some videos would play, others wouldn't. Windows, Mac, and Linux all had the same problem: fail to play the video, with an extremely unfriendly and unhelpful error dialog. (Not one of them said "You don't have the right codec." They all read something like, "Error code: -29")

Things are a bit better today. Actually I have no idea about Windows, but on Mac you can get a WMV plugin for Quicktime, and Ubuntu and other Linux distros prompt you to download reverse-engineered codecs from outside the US when you click on a video format that isn't recognized. Still, in neither case does the video just play without any hassle. Linux is the easiest, but there's still a bunch of dialogs to click through. And this doesn't even begin to touch the problem of authoring: just look at the codec dropdown in iShowU:

How does one make an informed decision about this? I don't think most people do. I kept trying formats at random until I got one that 1. wasn't horribly grainy, 2. didn't produce massively large files, and 3. played on Quicktime and VLC without issues. This feels like the dark ages of digital video, not the revolution.

Ogg Theora could be the solution. It's a free and open standard, and could open up the creation and distribution of video in the same way that HTML did for documents.

As usual, though, the problem is default support. No one wants to install extra software to play video; but the two biggest operating system vendors (Microsoft and Apple) have their own proprietary formats (Windows Media and Quicktime) that they want to defend. So that's not going to happen. (Ubuntu plays Theora out of the box, I hardly need mention.)

At a recent Super Happy Dev House, I met someone who was working on Theora support for MediaWiki. He mentioned that Firefox 3 was going to support Ogg Theora out of the box, which I found very exciting. It would be as simple as a <video> tag in your html. Awesome.

Unfortunately, it looks like Ogg has been dropped from the latest HTML 5 specification, and from what I can tell FF3 is not going to have support for Theora (though it may offer the video tag, without any default codecs). Unawesome.

Still, all of this gives me some hope that there is a path out of this horrible quagmire that video technology is currently stuck in. We're not quite on the path yet, but at least it's there - waiting for when the world is ready for video to stop sucking.

Tags: desktop, ux

The Startup Curve

April 23, 2008 at 11:08 PM

PG drew this on the whiteboard at the last dinner of our Y Combinator session:

So, so true.

Why No Love For RSpec?

April 22, 2008 at 08:11 PM

It's no secret that I'm a big RSpec fan. Test::Unit feels pretty outdated these days, and none of the other frameworks can yet match the level of BDD goodness you get from RSpec. Throw in that it's now a mature and stable library, and it seems like a sure bet for all your Ruby specing (or testing, if you like) needs.

It's thus surprising to me that some Rubyists seem reluctant to use RSpec. Certainly, some of this comes from the fact that it's not a standard include with Ruby or Rails, unlike Test::Unit. In this way it faces the same battle as Haml, DataMapper, Thin, or any other add-on library that swaps out a substantial component of the framework stack: the user must actively make the choice. Defaults are the status quo, the incumbent; they win, well, by default.

But I sense that people's mistrust of RSpec extends further than what these other components face. My guess is that the reasons for this are: too big, too complicated, and too much magic.

The first concern is a reasonable one - the plugins are pretty large, and since it's common to install RSpec as a plugin rather than a gem, this can seem to bog down your source repo. Still, the too big issue is a bit of an illusion, as Rick DeNatale explains.

The too complicated and too much magic problems can somewhat be addressed by using a subset of the matcher library, as I suggested with minimalist RSpec matchers. It occurs to me that all of these problems could be solved with a lightweight implementation of RSpec which implements just the core syntax. That is:

describe MyClass do
  before do
    @my_obj = MyClass.new
  end

  it "sums two numbers" do
    MyClass.sum(1, 2).should == 3
  end

  it "raises an error when arguments are not integers" do
    lambda { MyClass.sum(1, 'x') }.should raise_error(ArgumentError)
  end
end

If someone wrote a leaner RSpec-alike library which ran the above spec correctly, I'd probably switch. (The example excludes RSpec's built-in stubs and mocks, but I'd be ok using Mocha instead, which is very similar.) Maybe mSpec is such a thing, though I'm still kind of confused as to what it is exactly, since the readme claims you should still run the specs using RSpec.
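To give a feel for how little code the core syntax really needs, here is a rough sketch of such a library. Everything in it is made up for illustration: TinySpec, PositiveProxy, RaiseError, and Context are hypothetical names, not a real gem, and this is not how RSpec itself is implemented. The MyClass at the bottom is also just an assumed stand-in so the spec above has something to run against.

```ruby
# Toy RSpec-alike (hypothetical): describe / before / it / should == / raise_error.
module TinySpec
  class ExpectationFailed < StandardError; end

  # Returned by a bare `should`, so that `actual.should == expected` works.
  class PositiveProxy
    def initialize(actual)
      @actual = actual
    end

    def ==(expected)
      return true if @actual == expected
      raise ExpectationFailed, "expected #{expected.inspect}, got #{@actual.inspect}"
    end
  end

  # Matcher behind `lambda { ... }.should raise_error(SomeError)`.
  class RaiseError
    def initialize(error_class)
      @error_class = error_class
    end

    def matches?(callable)
      callable.call
      false
    rescue StandardError => e
      e.is_a?(@error_class)
    end

    def failure_message
      "expected block to raise #{@error_class}"
    end
  end

  # Each example runs inside a fresh Context, so instance variables don't leak.
  class Context
    def raise_error(error_class = StandardError)
      RaiseError.new(error_class)
    end
  end

  class ExampleGroup
    def initialize(subject, &body)
      @subject = subject
      @befores = []
      @examples = []
      instance_eval(&body)
    end

    def before(&block)
      @befores << block
    end

    def it(description, &block)
      @examples << [description, block]
    end

    # Returns one [description, status, message] triple per example.
    def run
      @examples.map do |description, block|
        context = Context.new
        begin
          @befores.each { |b| context.instance_eval(&b) }
          context.instance_eval(&block)
          [description, :passed, nil]
        rescue ExpectationFailed => e
          [description, :failed, e.message]
        rescue StandardError => e
          [description, :error, "#{e.class}: #{e.message}"]
        end
      end
    end
  end
end

# The one bit of magic: every object gets `should`.
class Object
  def should(matcher = nil)
    return TinySpec::PositiveProxy.new(self) if matcher.nil?
    return true if matcher.matches?(self)
    raise TinySpec::ExpectationFailed, matcher.failure_message
  end
end

def describe(subject, &body)
  TinySpec::ExampleGroup.new(subject, &body).run
end

# Assumed stand-in for the class under test.
class MyClass
  def self.sum(a, b)
    raise ArgumentError, "integers only" unless a.is_a?(Integer) && b.is_a?(Integer)
    a + b
  end
end

results = describe MyClass do
  before do
    @my_obj = MyClass.new
  end

  it "sums two numbers" do
    MyClass.sum(1, 2).should == 3
  end

  it "raises an error when arguments are not integers" do
    lambda { MyClass.sum(1, 'x') }.should raise_error(ArgumentError)
  end
end

results.each { |description, status, _message| puts "#{status}: #{description}" }
```

The only monkeypatch is Object#should; everything else is plain objects and instance_eval. A real library would still need nested describes, after blocks, should_not, and decent failure output, but none of that requires much magic either.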

A final reason why many people don't understand the importance of RSpec is simply not fully drinking the BDD koolaid. If you find yourself thinking things like "Well, yeah, BDD is a good idea, I hope to find the time to do it more often...", then I count you in that category.

I was in that position not too long ago, so don't feel bad. But it was using RSpec that caused the lightbulb to turn on above my head. Pat Maddox put it well in a mailing list post:

"I would say that TDD is a tool to help you solve the problem of designing and implementing behavior. Test::Unit works fine in that regard, but RSpec reduces the semantic distance between the developer and the problem domain."

The good news is, despite the sense of reluctance many display toward RSpec, it actually is catching on - even becoming the standard. Merb uses it by default out of the box. A recent Rails Envy podcast mentioned that their informal poll showed RSpec as the most popular testing/specing framework with 62% of the vote. (Test::Unit got around 25%, Shoulda around 12%.) The hosts expressed surprise at this. I was surprised too - pleasantly so.

But perhaps most important is that the Rubinius project uses RSpec, and has spawned a spinoff project for an executable specification of the Ruby language, which is being adopted by most of the major Ruby VM implementors (MRI/Yarv, JRuby, IronRuby,