Proof-Driven Development
January 28, 2008 at 02:15 PM
In developing the early implementation of Heroku, I stumbled upon an interesting design/development technique: numerous throwaway proof-of-concepts for potential features. Heroku is an unusual product, and many of the features we envisioned were things we weren't even sure could be accomplished in a reasonable way with current browser technology. Writing these proofs let us explore the problem space without committing to a single product direction or internal architecture too early.
Each proof was something I threw together in a day or two, usually a single evening. I always made them as standalone projects, showing the feature being proofed in isolation. One example I posted publicly was the proof of the javascript code editor.
You might argue that this is nothing new - they're called prototypes or tracer rounds. Although it does have much in common with prototype-driven development, proof-driven development actually comes in from the opposite side. Prototypes are supposed to be skeletal demonstrations of the entire application - as Dave Thomas puts it, to "show people how it's going to hang together." The hard implementation details are left as stubs; it's about the big picture, not details.
Proof-driven development is about the details. You prove that a specific feature can be implemented well, and that it will deliver useful value. It's a vertical slice of the application, instead of a horizontal one.
There are a few interesting effects that emerge from working this way. First, since you're starting from scratch each time, it means you're not constrained by any existing code or design. This gives a lot more room for creativity and experimentation. Normally you decide on a big-picture architecture up front, and then force features to fit into that architecture as they are built. Proof-driven development allows you to try out a bunch of cool ideas without worrying how they fit together. Then, you can select your favorites from the list, and think about what kind of high-level architecture would tie them all together.
Another effect of this approach is that it prevents you from getting too attached to any one idea or design choice. When working in an agile environment, developers often throw in experimental features somewhat on a whim to see how they'll work. That's great - often your best features come from this sort of spontaneous inspiration. But other times - probably most of the time - it adds to feature and code clutter, without adding much value. The problem is, once they are in the codebase, they tend to stay.
But since proofs are developed as standalone apps, you have to make a conscious choice to bring that code into your production app. The manual work of cut-and-paste is enough to deter you from bringing in some code unless it's really valuable. That is a highly desirable effect, since it keeps your main codebase streamlined.
So now when our team is kicking around ideas for new components, the resolution on many points is "ok sounds cool, write a proof." The chief proponent for the idea can then go put together a standalone proof, spending at most a couple of hours, to demonstrate the value of the idea. When they're done it'll either be obvious that the new feature is really kick-ass and deserves to be included; or, perhaps it won't be too hot, in which case it can quietly be dropped without having dirtied our main code at all.
I do recommend keeping a git/svn/whatever repo for your proofs though. Even for ones you don't use, being able to come back and examine a failed technique can be useful in the future. In some cases, we've even written proofs that have not been deemed useful at the time, but later on serve as inspiration for something else.
Rake Task To Port Your Unit Tests to RSpec Specs
January 25, 2008 at 07:16 PM
Put test_port_spec.rake into lib/tasks, then run rake test:port:spec. It doesn't delete the tests it ports; this way you get the joy of typing rm -rf test yourself.
Using Ruby's Readline Library
January 22, 2008 at 08:48 PM
There appears to be no documentation for Ruby's readline support. What's worse, it's written in C, so you can't (easily) read the source to find out its interface.
By peering at IRB's source, however, I was able to construct the following:
require 'readline'

loop do
  line = Readline::readline('> ')
  break if line.nil?  # Ctrl-D (EOF) returns nil
  Readline::HISTORY.push(line)
  puts "You typed: #{line}"
end
Remote Filesystems
January 20, 2008 at 07:06 PM
The ability to mount a remote filesystem and read and write to it is something everyone has wanted to do for a long, long time. SMB was an ok solution in the 90s, at least for users with minimal needs. Another dinosaur that we all love to hate is NFS. Despite its synchronization and locking issues, NFS actually works pretty well. But it's a real headache when it comes to firewalls and certainly is not suitable for use over the open internet. Sysadmins will always tell you "don't use NFS, it sucks," but when you ask for a better alternative they just shrug and hastily excuse themselves from the conversation.
More recently, it seemed like WebDAV would be the answer. It uses a standard protocol (HTTP) and ties in nicely with the vision for a read/write web. It works perfectly with firewalls and can be encrypted with HTTPS. All major operating systems have out-of-the-box support for mounting WebDAV shares. Most publicly readable svn repositories for open source projects, such as Rails plugins, use WebDAV.
And yet, this technology has not come into its own. It's still a huge headache to configure. Apache requires several add-on modules, an obscure syntax that tends to break in mysterious ways, and user management which requires a combination of editing your Apache config and the curmudgeonly htpasswd command line tool. Nginx has a DAV module, but it doesn't actually work. (I checked the source - it's missing major portions of the protocol like PROPFIND and OPTIONS.) Apache is the only real option for serving DAV, and who wants to run Apache anymore?
And all of that isn't even counting all the security issues of having write access to your source repositories controlled by the webserver. For example, a simple injection vulnerability in some PHP script running on another site hosted on the same server could grant the attacker immediate, full write access to the source repositories hosted there. (Some solutions: run a separate Apache process as a different user to serve DAV, or use virtualization to make a separate server instance for your source repos. Still, none of this is what I would call a cakewalk.)
Hmmm. Maybe the fact that people have been trying to build remote filesystem protocols for the last two decades and have yet to make a good one means something. Perhaps mounting remote filesystems is a fork in the tree of technology that should be abandoned. Great, but then what? We still need to host source control repos, share files, and manage content; and all of these things basically require filesystem (or filesystem-like) operations. The fact that there is such a profusion of protocols supported by FUSE proves the deep user thirst for remote filesystem mounting.
If a remote filesystem protocol is ever going to succeed, it needs to be simple. WebDAV was a good try but it's not simple enough. Could we make another pass, again using HTTP, but taking some of the lessons learned from REST and further paring down the features and capabilities?
So let's forget about permissions, symbolic links, and resource forks. We just want to get the list of files in a folder, and to read, write, and delete those files. Sound familiar? It's CRUD.
How about:
GET http://example.com/repo/

<entries>
  <folder>app</folder>
  <file size="307">Rakefile</file>
  <file size="8819">README</file>
</entries>
Many things would go under the axe in this hypothetical protocol. Any kind of linking, for example. Hard and symbolic links, certainly, but also those old standbys, "." and "..". Both of these values can and should be determined from the path name.
There are two types of objects we're operating on: folders and files. Filesystem tools often blur the line between these. In the unix shell, for example, rm -rf <dir> will delete a directory and its contents, but rm <dir> will not delete an empty directory. For that you need rmdir. But rmdir won't delete the directory if it's not empty. Consistency is overrated anyway, right?
I'd be in favor of further paring down the operations by removing the ability to do any direct operations on folders, and instead create and remove them dynamically based on pathnames. So if an operation tries to write to a file in a folder that doesn't exist, it and all the parent folders will be created: the equivalent of mkdir -p. Once the last file is removed from a folder, it no longer exists. Renaming a folder would become an expensive operation on a large tree, since what you're really doing is renaming the paths of every contained file; but this may be a fair trade-off. Empty folders would not be possible. (I detect a hint of git's philosophy creeping into my reasoning on this part.)
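To make the implicit-folder rules concrete, here is a minimal in-memory sketch. It is not a protocol implementation: the PathStore class and its method names are hypothetical, and the only thing it models is that folders exist purely as a consequence of file paths.

```ruby
# Toy in-memory model of the folder rules described above.
# PathStore and its method names are hypothetical, for illustration only.
class PathStore
  def initialize
    @files = {}   # full path => contents; folders are never stored
  end

  # Writing a file implicitly "creates" every parent folder (like mkdir -p).
  def write(path, data)
    @files[path] = data
  end

  def read(path)
    @files.fetch(path)
  end

  # Deleting the last file in a folder makes the folder disappear too.
  def delete(path)
    @files.delete(path)
  end

  # List the immediate children of a folder, derived purely from path names.
  def list(folder)
    prefix = folder.end_with?("/") ? folder : folder + "/"
    names = @files.keys.select { |p| p.start_with?(prefix) }
                  .map { |p| p[prefix.size..-1].split("/").first }
    names.uniq.sort
  end

  # Renaming a folder rewrites the path of every contained file,
  # which is why it gets expensive on a large tree.
  def rename_folder(from, to)
    from = from.chomp("/") + "/"
    to = to.chomp("/") + "/"
    @files.keys.select { |p| p.start_with?(from) }.each do |p|
      @files[to + p[from.size..-1]] = @files.delete(p)
    end
  end
end

store = PathStore.new
store.write("/repo/app/models/user.rb", "class User; end")
store.write("/repo/Rakefile", "task :default")
p store.list("/repo")   # prints ["Rakefile", "app"]
store.delete("/repo/app/models/user.rb")
p store.list("/repo")   # prints ["Rakefile"]
```

Note that nothing ever creates or removes "app" directly; it shows up in the listing only while some file lives under it.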
There are a few other types of operations that filesystems are good at, but stateless web requests aren't. Two of these likely to affect developers are tailing a log and appending to a log. If you have a log which is gigabytes in size, rewriting it every time you want to append a line is impractical. So is fetching it continually as you look for recently appended lines.
In this case I'd argue that this approach to logging should be abandoned, in favor of one that treats each log as a discrete object. I've been experimenting with this on Heroku by storing system logs into a database table. There are some challenges here in managing the large datasets produced; but dealing with huge logfiles isn't fun either, it's just a problem that has more prior art (e.g. logrotate).
One final point: locks. This has been the bane of NFS and I think DAV has some unpleasant complexity in this department as well. I'm going to once again put this item up on the chopping block. Making remote filesystems is a hard problem, and locking is a hard problem: I don't see any reason to combine them and make things worse. Just because we're used to being able to get locking from the filesystem doesn't mean things should stay that way. Separation of concerns.
Maybe what I'm describing here is a document-oriented database. Since there's progress being made in that department, maybe we should all just sit tight with our crufty Apache WebDAV setups and wait for the day when filesystems are no longer relevant.
Or maybe hierarchical filesystems are soon to be outmoded, and what we really want is a document database with lookup by tag. So when you run your Rails app it doesn't look in app/models/*.rb for your models, but instead just queries the document database for all documents with the tags "ruby", "model", and a tag for the project name.
Enough daydreaming for one day.
Passion
January 19, 2008 at 08:12 PM
So far, all of my posts have been about programming. But in fact, I spend almost as much time thinking about entrepreneurship. Marketing (meaning: understanding your product audience), financing, and generally running an efficient business. So I'm going to start diving into those topics a bit more here.
I'll kick things off by mentioning a podcast that all entrepreneurs should listen to: Entrepreneurial Thought Leaders, which is a recording of a weekly presentation at Stanford. Each week they bring a different speaker, all of whom are successful entrepreneurs, and most of whom have extremely compelling stories to tell. These folks have some incredible wisdom that all of us still working to make it big should soak up in full.
Although most of the speakers are founders of software businesses, there are a few from biotech and other fields. Even though these are less relevant to me personally, I find them more interesting, because I know less about those fields. But there's plenty to be learned from those speakers all the same. Building a good product and running a business to achieve long-term success are the same no matter what field you're in.
After I had listened to a few of these, it struck me how the same themes recur again and again. Build a great team. Listen to your customers. Stand out from the crowd. Stick to it, even when times get tough (and they will).
Perhaps the most repeated point is: do what you love. You have to be passionate about what you're doing. You have to believe in it, more than anyone else.
This echoes Guy Kawasaki's phrase from The Art of the Start: "Make meaning, not money." Choose your business venture because you believe you can make the world a better place, not because you think it's where the most money can be made.
Depending on how you count, Heroku is about the fifth venture I've cofounded. Our vision - making it fast and easy to build, deploy, and scale web applications using the most advanced programming language in the world - is one that has deep personal significance for me. Whether or not this vision is meaningful to the wider world will be decided by the market. But I know that my own motivation will not be a constraint, because my belief in, passion for, and excitement toward the meaning we are trying to create is unbounded.
Overriding Rake Tasks
January 17, 2008 at 08:21 PM
I'm not sure why overriding rake tasks is so difficult. One gets the feeling that it is frowned upon... but then I've found it necessary on more than a few occasions.
Here's a quick and dirty way to get rid of a task (presumably, right before you redefine it):
Rake.application.instance_variable_get(:@tasks).delete('db:test:purge')
In some cases, you can get away with just clearing out the prerequisites - such as overriding the default task. If you're using RSpec, you probably don't want to run rake test prior to running your specs, so this will do the trick:
Rake::Task[:default].prerequisites.clear
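For the remove-then-redefine case, here is a small self-contained sketch. It assumes a Rake version that provides Rake::Task#clear, which empties a task's prerequisites and actions without reaching into Rake's internals. The task name greet and the CALLS constant are made up for the example.

```ruby
require 'rake'

# CALLS and the :greet task are hypothetical, used only to observe what runs.
CALLS = []

# The "original" task, standing in for one defined by a framework or plugin.
Rake::Task.define_task(:greet) { CALLS << "original" }

# Empty the existing task (prerequisites and actions), then redefine it.
# Without the clear, Rake would append the new action to the old one.
Rake::Task[:greet].clear
Rake::Task.define_task(:greet) { CALLS << "override" }

Rake::Task[:greet].invoke
p CALLS   # prints ["override"]
```

The task object itself stays registered under the same name; only its contents are replaced.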
Ruby Test Framework Roundup and Musings
January 16, 2008 at 01:44 PM
Last week's icanhasruby had a series of presentations themed around test setups. The main lesson I took away from this is that a single best-practice solution to test/behavior-driven development has not yet been found. But I get the sense that the community is zeroing in on some core concepts that may one day be as ubiquitous as MVC or the HTTP request/response cycle. Even more interesting is that this seems to be happening in a completely decentralized way. I'm not sure where the Rails core team stands, but, given that they are continuing to put work behind Test::Unit (which, as near as I can tell, has been unmaintained since 2003), they don't seem to be participating much in this quiet BDD revolution. But part of the beauty of Rails and Ruby is that they don't need to.
Some Frameworks
RSpec was the pioneer in reworking BDD in the land of Ruby, and remains both the most mature option and the one to beat. (That's why it's available by default for Heroku apps.) Most people like the plain-English descriptions of individual specs. But many of those same people dislike the magic-heavy syntax of the DSL. user.should have(1).apps seems nifty at first, but once the novelty wears off, you might find yourself pining for the days of assert_equal 1, user.apps.size.
I like the idea of a rich selection of matchers, but I find that I just can't seem to remember them. I'll say this for the assert / Test::Unit approach: once I had written two or three tests with it, I never looked at the docs again. I've been using RSpec on and off for close to a year now, and I still have to look up matcher syntax with surprising frequency.
There are some benefits to the matcher syntax beyond just a more English-like syntax, however: the specification of your desired results in this format gives the test framework more information about what went wrong, which means it can give clearer output. Generally, I find that when a spec breaks, I'm much more likely to be able to tell what went wrong from the error than from an assert failure. When an assert fails, I generally ignore the results and just go to the line number of the failure. From there I try to figure out what might have been wrong. RSpec's clearer messages mean that I'm more likely to make a diagnosis from the test output itself, which strikes me as a lot more agile.
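A toy sketch of why that is. This is not how RSpec or Test::Unit are actually implemented; bare_assert, HaveSize, and expect_that are made-up names. The point is only that a matcher object carries the expectation with it, while a bare assertion sees nothing but true or false.

```ruby
# Toy illustration only: these helpers are hypothetical,
# not RSpec or Test::Unit internals.

# A bare assertion only receives true or false, so it has little to report.
def bare_assert(condition)
  condition ? nil : "assertion failed"
end

# A matcher object knows the expectation itself, so it can describe the mismatch.
class HaveSize
  def initialize(expected)
    @expected = expected
  end

  def matches?(collection)
    @actual = collection.size
    @actual == @expected
  end

  def failure_message
    "expected #{@expected} items, got #{@actual}"
  end
end

# Returns nil on success, or the matcher's description of what went wrong.
def expect_that(value, matcher)
  matcher.matches?(value) ? nil : matcher.failure_message
end

apps = ["blog", "wiki"]
puts bare_assert(apps.size == 1)          # prints "assertion failed"
puts expect_that(apps, HaveSize.new(1))   # prints "expected 1 items, got 2"
```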
If you do prefer asserts, there's the relative newcomer Shoulda. It offers contexts and plain-english descriptions, but sticks with good ol' asserts for specifying expected results. It seems to be well-supported and gaining traction quickly.
There's also test-spec, which provides a compatibility layer between RSpec and Test::Unit. You can use this to mix together Test::Unit tests with context-wrapped, plain-English specs, as well as a simple should-style DSL. Personally I like to avoid mixing together different coding styles, but this might work well to transition a large and complex battery of tests over an extended period of time.
Browser-Side Testing
One of the most interesting presentations was JSpec, an RSpec-alike for Javascript. One can hardly even call this a framework, since it's just a single 100-line javascript file which sends its output to the Firebug console; but often, the best tools are the simplest ones. I liked what I saw here quite a bit:
jspec.describe("Math", function() {
  it("calculate square roots", function() {
    (Math.sqrt(4)).should("==", 2)
  })
})
How about full-stack integration testing? There's Selenium, which is about as full-stack as you can get: the tests run in Firefox, clicking links and checking rendered results based on recorded scripts. That's great, and you can even launch it from rake, but it's so heavy-weight that I tend to shy away from it.
An intermediary solution is Webrat. Using a Mechanize-style scripting language, you can specify a full user story, as played out in the browser. For example:
def test_sign_up
  visits "/"
  clicks_link "Sign up"
  fills_in "Email", :with => "good@example.com"
  select "Free account"
  clicks_button "Register"
end
The only thing this won't test is your javascript, which may be significant if your site is ajax-heavy.
Sample Data
Mocks and stubs have their own area of theoretical debate. There's the question of the best library - for example, RSpec's built in mocks versus Mocha. But there's also the question of when to use mocks and stubs versus building up real object trees and letting them behave normally. Too little mocking and stubbing means you end up with every single spec being an integration test. Too much, though, and you're not testing the real behavior of your code, and you're creating a lot of overhead in maintaining the mocks.
That brings us into the realm of fixtures, which have historically been a significant point of pain for Rails developers. I was in the midst of some serious fixture woes when I attended the fixture scenarios talk at RailsConf last year, and it convinced me that this was a good way to go. However, this component doesn't seem to have taken off in popularity as expected. I assume this is because fixtures are something that people seem to want to avoid in general. When to use fixtures vs mocks vs stubs vs just building the object manually in the spec setup is not well-defined in my brain at all, and I suspect I'm not the only one that has this problem.
And that highlights an important fact of this whole exploration of the BDD space that's currently taking place. The problem is not really a technical one; it's about methodology. Rails showed us how to encode a methodology into a framework. Now Rubyists are trying to do the same thing with BDD. We'll keep trying these frameworks on for size until we find one that feels right for the most common scenarios of application development.
Summary
Most of the points being debated here reflect the central question of BDD: rigidity. You want your app to have some rigidity, so that when a developer makes any sort of significant change to the implementation or the technical design without updating the specs, running the test battery fails loudly. This prevents things from changing unintentionally or through unintended side-effects.
On the other hand, too much rigidity is the very antithesis of agility. If doing something simple like renaming a field means a developer has to update not only the database schema and the code, but also the specs, the fixtures, the mock objects... well, that developer might be disinclined to make the change at all. Codebases need to be supple enough that developers are never demotivated from making worthwhile changes.
As I warned in the beginning, BDD/TDD in Rails is nowhere near a resolved question. I hate to be a two-handed professor, so let me summarize with some simplified recommendations by situation.
- If you're new to testing and/or just overwhelmed and confused by the amount of activity in this area, RSpec is probably your best bet. Install the two plugins, run the rspec generator, and then generate some specs with the rspec_model generator.
- If you're working on an existing project and/or on a large team and/or in a corporate environment, you'll probably need to stick to the standard vanilla Rails testing based on Test::Unit. In all honesty it works just fine, and is certainly far better than writing no tests at all. In other words, don't be afraid to write Test::Unit tests just because there's so much going on with the development of new test frameworks.
- If you're really bothered by the should syntax magic of RSpec, use Shoulda.
ThruDB
January 12, 2008 at 11:12 PM
I just love the incredible speed with which new technology paradigms appear and gain traction in these happy and hale days of bubbledom. Less than a month after I discovered document-oriented databases, there's a new one in the mix: ThruDB. Don't look at the Ruby sample code, it'll hurt your eyes - the creators of ThruDB aren't Rubyists. Instead, I recommend Sebastian Delmont's writeup of how it could be used with Rails. Rather than write an ActiveRecord adapter, he suggests an ActiveDocument which uses a DataMapper type pattern for the schema, and has similar methods and callback hooks for most operations. Finds would use ThruDB's native query language, which is the Lucene syntax.
Sebastian puts it quite nicely: "I'm willing to trade off some performance for the promise of infinite scalability." Yeah, me too.
Rails Shared Hosting
January 12, 2008 at 02:46 PM
After Dreamhost posted that shared hosting with Rails is too difficult to implement, and DHH stated that he (mostly) agrees, Peter Cooper of RubyInside suggested that we need a mod_ruby. Now I'm going to chime in with a few thoughts of my own.
First, shared hosting of Rails apps is entirely possible with the tools that exist today. The proof: Heroku is doing it. Of course we're not much like a traditional shared host, and in fact we try to avoid calling ourselves "hosting" at all. But hosting is certainly a substantial part of the service we've created, and in terms of implementation, our free accounts are set up in what is basically a shared hosting environment.
How have we managed to accomplish this, especially given that Heroku as a product is less than six months old? First, there's the matter of focus. We're hosting Ruby, and Ruby on Rails, only. Being focused allows us to tailor the infrastructure to that setup exactly.
Second, Heroku is opinionated software. Even just saying "Ruby on Rails" only tells you two components of the stack. RoR can run on different operating systems, different databases, and be fronted by different webservers. When you put your app on Heroku, you don't get a choice of operating system, webserver, or database. Instead, we make those choices to create a unified stack that we can verify works together flawlessly. The idea is that individual app developers need not waste time making these decisions, and instead can dive straight in to the part that matters: coding their app.
You could compare this approach to an Apple computer vs. a PC built from OEM parts. An Apple has all the parts preselected and verified to work together, from the network chipset right up to the operating system and all the software that comes with it. When building a PC from parts, one has many choices to make, and sometimes the choices will be incompatible with each other. For DIY-types, building your own system can be a lot of fun, or let you build unusual configurations to fit specific needs, like creating your own MythTV box. Heck, I've built dozens of PCs, mostly servers, over the years. But if you're not looking for DIY fun, and you just want a computer that works as quickly and easily as possible, the Apple approach is clearly the better one.
Finally, I'd like to say that mod_ruby is probably not the right path for the Ruby world. It did work extremely well for Perl and PHP, true; but Ruby frameworks are different. (Mainly because, well, they are frameworks, not just a language. mod_ruby would probably be fine if people wanted to write raw Ruby scripts to generate HTML directly.) Perhaps mod_sinatra or mod_camping would be more appropriate. Still, it seems to me that Apache is probably near the end of its lifecycle, so investing time and energy in developing modules for it does not seem terribly worthwhile to me.
Perhaps the message here is not that we actually need a mod_ruby, but rather that we need a component that solves the same problem that mod_perl or mod_php did: drawing together several components in the toolchain to make the infrastructure easier to deploy. My personal feeling here is that something that makes use of the growing availability of virtual hosting is a better choice than trying to pursue the shared hosting model so popular in the era of PHP. Smarticus suggests that Rails users favor VPS hosts for deployment, and I wholeheartedly agree with that. But further, perhaps the equivalent of mod_php for the Ruby world is going to be some sort of tool that makes it easy to stamp out prebuilt VPS instances. EC2onRails seems to be one possibility (I haven't used it myself). I'd suggest to others looking to solve this problem that VPS hosting is going to be the key to creating a mod_ruby-alike solution for the Ruby world.
Forking Rails
January 05, 2008 at 03:51 PM
For a while, everything that we did in Heroku to extend the functionality of Rails, or interoperate with it, has been done through extension mechanisms. Monkeypatching via plugins, use of the somewhat obscure Mongrel GemPlugin, and tweaking of the user's Rails app files directly. (We try to avoid that last one whenever possible. Early on we did a lot of it, but more recently we've managed to avoid it almost entirely, much to my relief.)
All of this is quite a testament to the extensibility of Rails, Mongrel, Nginx, etc. I think it's safe to say that we're bending these tools in ways that go well outside the common use cases. But even the most supple reed can only bend so far. Certain areas (for example, script/generate) can't be monkey patched through standard mechanisms. And so earlier this week I decided it was time to fork Rails.
Since I'm now a fan of Git, I used it to maintain our fork, and it turned out to be a good fit. Steve or Pablo will tell you how. Maintaining a parallel branch has, so far, proven to be quite easy - even fun.
Packaging up the modified version for use is just a matter of running "rake package" in each module's subdirectory (e.g. activerecord, activesupport, railties). The resulting gem is dropped into pkg/ under each module. The gem can then be copied to each user's app server as it boots and installed with gem install, which overwrites the standard gem if it already exists. (I'm still trying to figure out a way to run the package script without building the documentation, which takes ages. I ended up commenting out the body of the generate_rails_framework_doc task as a stopgap.)
One thing I still haven't decided on is how to maintain multiple versions. I wish that there was a syntax for environment.rb that looked something like:
RAILS_GEM_VERSION = '2.0.*'
Personally, I rarely care which minor rev I'm running - the latest version available on whatever box the app is running on is usually just fine. For Heroku apps, they should definitely use whatever the latest minor rev is. But note that setting one catch-all with an environment variable isn't adequate, because each app should be configured to use a particular major rev, and we need to respect that. Hopefully I'll come up with something better on this eventually.
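RubyGems itself has something close to that wildcard: the pessimistic operator. The sketch below shows only the matching logic, using Gem::Requirement and Gem::Version; the list of installed versions and the latest_matching helper are hypothetical, and whether environment.rb accepts such a string is not something this assumes.

```ruby
require 'rubygems'

# Hypothetical helper: newest version in the list that satisfies the requirement.
def latest_matching(versions, requirement_string)
  requirement = Gem::Requirement.new(requirement_string)
  versions.map { |v| Gem::Version.new(v) }
          .select { |v| requirement.satisfied_by?(v) }
          .max
end

# Hypothetical set of Rails versions installed on a box (illustration only).
installed = %w[1.2.6 2.0.1 2.0.2 2.1.0]

# "~> 2.0.0" means >= 2.0.0 and < 2.1, i.e. any 2.0.x release.
puts latest_matching(installed, "~> 2.0.0")   # prints 2.0.2
```

Each app could keep its own requirement string, so one app stays on 1.2.x while another floats along the 2.0.x line.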
Be Narrow In Your Rescues
January 01, 2008 at 04:47 PM
When trying to block certain kinds of exceptions, it's tempting to write catch-all cases. Like:
string.match(/inet addr: ([\d.]+)/)[1] rescue nil
or:
File.delete('file_that_might_not_exist') rescue nil
In the first case we want no match to return nil. In the second we just want the file gone, and aren't concerned if it wasn't there initially. The problem is, both of these mask all exceptions, so we may not find out about other kinds of errors.
So we want to differentiate between user-generated errors and programmatic errors. User-generated errors are a natural result of the imperfect data coming from a user or some other source outside our program. We can't force the world to always give us perfect data, so we handle these errors gracefully.
Programmatic errors are mistakes that we, as the developer, have made in crafting the business logic of the app. In this case, we want to hear about it as soon and as loudly as possible - so that we can find the flaw and fix it. (This is the Rule of Repair: "When you must fail, fail noisily and as soon as possible.")
So in our examples above, one option is to skip the rescue and test for the case that we're actually looking for. On the regexp match:
(m = string.match(/inet addr: ([\d.]+)/)) ? m[1] : nil
This will protect against only the error we care about (no match) and let through any others.
The other option is to capture the specific exception you are looking for only. This might be the right choice on the delete file example:
begin
  File.delete('file_that_might_not_exist')
rescue Errno::ENOENT
end
Alas, not all on one line, but correctness beats succinctness when the two come in conflict. (Although you could also try FileUtils.rm('filename', :force => true) here.)
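To see the masking in action, here is a small sketch with a deliberate typo (matc instead of match). The method names are made up for the example. The catch-all version quietly returns nil, which looks exactly like "no match"; without the rescue, the same bug fails noisily.

```ruby
# Hypothetical helpers for illustration; the method names are made up.

# Catch-all: the typo (`matc` instead of `match`) raises NoMethodError,
# which the modifier rescue silently swallows.
def ip_catch_all(string)
  string.matc(/inet addr: ([\d.]+)/)[1] rescue nil
end

# Narrow: test for the one case we expect (no match) and rescue nothing.
def ip_narrow(string)
  (m = string.match(/inet addr: ([\d.]+)/)) ? m[1] : nil
end

# Same typo as above, but with no rescue to hide it.
def ip_narrow_with_typo(string)
  (m = string.matc(/inet addr: ([\d.]+)/)) ? m[1] : nil
end

line = "inet addr: 10.0.0.5  Bcast: 10.0.0.255"
p ip_catch_all(line)   # prints nil, hiding the bug
p ip_narrow(line)      # prints "10.0.0.5"

begin
  ip_narrow_with_typo(line)
rescue NoMethodError => e
  puts "bug surfaced: #{e.class}"   # prints "bug surfaced: NoMethodError"
end
```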