Adam @ Heroku
a tornado of razorblades

The End of Bugs?

July 06, 2008 at 11:55 PM

Although I've been a believer in TDD/BDD for quite a while, rush was the first time I started a project and said, ok, THIS time I'm going to get really serious about it. My very first commit was a bunch of empty specs, and since then, I don't think I've committed any new feature or even a bugfix without an accompanying spec.

A few weeks into the project, I had made really substantial progress and had quite a lot of functionality. One day, while using it to access a remote machine to do some file operations, something very surprising happened: I typed a certain command and it didn't behave the way I expected it to.

I figured I must have a typo or something, but double-checking it, I realized that, no: my command was correct, it was the program which was behaving incorrectly. I had found, it seemed, a bug.

I felt a sudden sense of disorientation and panic. The program wasn't behaving as expected? What should I do? Log things? Run a debugger? Just squint at the code for a while and see if I could spot the problem? These methods seemed so...crude.

And yet, I realized, these techniques are the very ones I've been using my whole career, that I use every single day to get work done. I don't give them a second thought. Yet, after two weeks of doing pure BDD, the idea of spending time dealing with a bug seemed foreign and painful.

I ended up tracking down the problem and discovered that, despite my nearly 100% code coverage (as reported by RCov), I had a fork in a single-line conditional that was not speced. Upon realizing this, another feeling washed over me: a sense of mistrust in the project's codebase, because a fragment of one line of code did not have spec coverage. I wrote the spec, confirmed that it failed / exposed the bug, and then fixed the bug itself ten seconds later. Relief flooded over me: things were right with the world again.
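
To make that coverage gap concrete, here is a minimal sketch of how a line-based coverage report can look complete while one side of a single-line conditional is never exercised. The display_name helper is hypothetical (it is not taken from rush); it only illustrates the shape of the problem.

```ruby
# Hypothetical helper, not from rush: one line of code, two branches.
def display_name(path)
  path.end_with?("/") ? path.chomp("/") + " (dir)" : File.basename(path)
end

# This check alone executes the line, so a line-based coverage tool
# would count the whole method as covered...
raise "file branch broken" unless display_name("/tmp/notes.txt") == "notes.txt"

# ...but only a second check exercises the other side of the conditional.
raise "dir branch broken" unless display_name("/tmp/logs/") == "/tmp/logs (dir)"
```

With only the first check in place, a bug in the directory branch would go unnoticed until someone hit it by hand.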

Nearly all developers, myself included, spend most of our time in that state: not quite trusting that the code all works. I only had the chance to realize how stressful and unpleasant this state is because I lived outside it for a little while.

This experience also told me something else: after several weeks of intense development, this was the first time I had encountered a bug. I feel absolutely certain that that's never happened to me before.

You could argue that rush was not then (and is still not now) a very big or complicated program, but I don't think that's significant. First, by this time I was already using it to control a remote server (i.e., it was communicating commands over the network via rushd); that's the sort of thing that is highly prone to bugs. But more importantly, even very short and simple programs often have bugs lurking in them.

My whole life I've assumed that bugs are a given in software, and that trying to eliminate them completely is a waste of time. And I think I was correct, for the traditional, non-BDD approach. ("Code-driven development," I guess?)

But it may be that a very disciplined approach to BDD means the opportunity to have truly bug-free software. Just imagine what it would be like to have all the time you spend debugging code go into writing it instead. Granted, a lot of that goes into writing specs; but for me, writing specs is far more fun than debugging. It's programming, not sleuthing.

Bug-free software is a very bold claim, though it does require a somewhat narrow definition of the word "bug." A bug is a situation where the code has been specified to behave one way, and instead it behaves another. It does not include things that users or developers may like the software to do, but that it is not specified to do currently. It also doesn't include external dependencies (libraries, web services, ...) behaving in an unexpected manner - good software should try to recover gracefully from failures in external services, but doing so is a type of feature.

Look back at that definition of bug again: "...the code has been specified to behave one way..." BDD is the very act of writing executable specifications. In non-BDD, there is no specification; so by definition, software written that way can be said to either have an infinite number of bugs, or perhaps just no functionality that you can rely on.

Tags: bdd

Rocking the Mocking

May 02, 2008 at 02:12 PM

How do you write a spec for this method without touching the filesystem or the user's environment?

def authkey
  File.read("#{ENV['HOME']}/.ssh/id_rsa.pub")
end

Just repeat this mantra to yourself: It's Ruby. Everything Is An Object Or A Method. Objects And Methods Are Always Mutable.

Got your answer yet? Here's mine:

it "reads the ssh rsa key from the user's home directory" do
   ENV.should_receive(:[]).with('HOME').and_return('/home/joe')
   File.should_receive(:read).with('/home/joe/.ssh/id_rsa.pub').and_return('the key')
   @client.authkey.should == 'the key'
end
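
The mantra also works without a mocking library. The sketch below is an assumption-laden illustration, not the way RSpec implements should_receive: it temporarily sets ENV['HOME'] instead of mocking ENV, and it shadows File.read with a singleton method that records its argument. The Client class is a stand-in for whatever object owns authkey.

```ruby
# Stand-in class; only authkey comes from the post above.
class Client
  def authkey
    File.read("#{ENV['HOME']}/.ssh/id_rsa.pub")
  end
end

requested_path = nil
original_home = ENV['HOME']

begin
  ENV['HOME'] = '/home/joe'

  # Shadow IO.read with a singleton method on File that records the path
  # it was asked for and returns a canned value.
  File.define_singleton_method(:read) do |path, *|
    requested_path = path
    'the key'
  end

  key = Client.new.authkey
ensure
  # Remove the override so File.read falls back to the real IO.read,
  # and put the environment back the way it was.
  File.singleton_class.send(:remove_method, :read)
  ENV['HOME'] = original_home
end

puts requested_path  # prints "/home/joe/.ssh/id_rsa.pub"
puts key             # prints "the key"
```

The ensure block is the part a mocking library normally handles for you: without it, the fake File.read would leak into every later spec.
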

Tags: bdd

Why No Love For RSpec?

April 22, 2008 at 08:11 PM

It's no secret that I'm a big RSpec fan. Test::Unit feels pretty outdated these days, and none of the other frameworks can yet match the level of BDD goodness you get from RSpec. Throw in that it's now a mature and stable library, and it seems like a sure bet for all your Ruby specing (or testing, if you like) needs.

It's thus surprising to me that some Rubyists seem reluctant to use RSpec. Certainly, some of this comes from the fact that it's not a standard include with Ruby or Rails, unlike Test::Unit. In this way it faces the same battle that Haml, DataMapper, Thin, or any other add-on library that swaps out a substantial component of the framework stack: the user must actively make the choice. Defaults are the status quo, the incumbent; they win, well, by default.

But I sense that people's mistrust of RSpec extends further than what these other components face. My guess is that the reasons for this are: too big, too complicated, and too much magic.

The first concern is a reasonable one - the plugins are pretty large, and especially since it's common to install it as a plugin rather than a gem, this can seem to bog down your source repo. The too big issue is a bit of an illusion, as Rick DeNatale explains.

The too complicated and too much magic problems can somewhat be addressed by using a subset of the matcher library, as I suggested with minimalist RSpec matchers. It occurs to me that all of these problems could be solved with a lightweight implementation of RSpec which implements just the core syntax. That is:

describe MyClass do
  before do
    @my_obj = MyClass.new
  end

  it "sums two numbers" do
    MyClass.sum(1, 2).should == 3
  end

  it "raises an error when arguments are not integers" do
    lambda { MyClass.sum(1, 'x') }.should raise_error(ArgumentError)
  end
end

If someone wrote a leaner RSpec-alike library which ran the above spec correctly, I'd probably switch. (The example excludes RSpec's built-in specs and mocks, but I'd be ok using Mocha instead, which is very similar.) Maybe mSpec is such a thing, though I'm still kind of confused as to what it is exactly, since the readme claims you should still run the specs using RSpec.
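
To give a feel for how small such a core could be, here is a rough sketch that runs the spec above. Everything named TinySpec is hypothetical: this is not RSpec's implementation, nor mSpec's, and the MyClass.sum body is invented so the example has something to test. Examples run immediately as they are declared, and there is no reporting beyond a results hash.

```ruby
# Hypothetical TinySpec: a sketch, not any real library's implementation.
module TinySpec
  class Failure < StandardError; end

  # Returned by `value.should`; supports only `== expected`.
  class Expectation
    def initialize(actual)
      @actual = actual
    end

    def ==(expected)
      return true if @actual == expected
      raise Failure, "expected #{expected.inspect}, got #{@actual.inspect}"
    end
  end

  # Matcher behind `lambda { ... }.should raise_error(SomeError)`.
  class RaiseError
    def initialize(error_class)
      @error_class = error_class
    end

    def check(block)
      block.call
    rescue @error_class
      true
    else
      raise Failure, "expected #{@error_class} to be raised"
    end
  end

  # Helper methods visible inside each example.
  module Helpers
    def raise_error(error_class)
      RaiseError.new(error_class)
    end
  end

  class Group
    attr_reader :results

    def initialize
      @before = nil
      @results = {}
    end

    def before(&block)
      @before = block
    end

    # Runs the example right away in a fresh object, so instance
    # variables set in `before` never leak between examples.
    def it(description, &block)
      example = Object.new.extend(Helpers)
      example.instance_eval(&@before) if @before
      example.instance_eval(&block)
      @results[description] = :passed
    rescue Failure
      @results[description] = :failed
    end
  end
end

class Object
  # With a matcher, check it against self; without one, allow `should == x`.
  def should(matcher = nil)
    matcher ? matcher.check(self) : TinySpec::Expectation.new(self)
  end
end

def describe(_subject, &block)
  group = TinySpec::Group.new
  group.instance_eval(&block)
  group.results
end

# Invented implementation so the spec has something to exercise.
class MyClass
  def self.sum(a, b)
    raise ArgumentError, "integers only" unless a.is_a?(Integer) && b.is_a?(Integer)
    a + b
  end
end

results = describe MyClass do
  before { @my_obj = MyClass.new }

  it "sums two numbers" do
    MyClass.sum(1, 2).should == 3
  end

  it "raises an error when arguments are not integers" do
    lambda { MyClass.sum(1, 'x') }.should raise_error(ArgumentError)
  end
end
```

The monkeypatch on Object is the one piece of "magic" that cannot be avoided if the should syntax is to work on arbitrary values; everything else is ordinary blocks and instance_eval.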

A final reason why many people don't understand the importance of RSpec is simply not fully drinking the BDD koolaid. If you find yourself thinking things like "Well, yeah, BDD is a good idea, I hope to find the time to do it more often...", then I count you in that category.

I was in that position not too long ago, so don't feel bad. But it was using RSpec that caused the lightbulb to turn on above my head. Pat Maddox put it well in a mailing list post:

"I would say that TDD is a tool to help you solve the problem of designing and implementing behavior. Test::Unit works fine in that regard, but RSpec reduces the semantic distance between the developer and the problem domain."

The good news is, despite the sense of reluctance many display toward RSpec, it actually is catching on - even becoming the standard. Merb uses it by default out of the box. A recent Rails Envy podcast mentioned that their informal poll showed RSpec as the most popular testing/specing framework with 62% of the vote. (Test::Unit got around 25%, Shoulda around 12%.) The hosts expressed surprise at this. I was surprised too - pleasantly so.

But perhaps most important is that the Rubinius project uses RSpec, and has spawned a spinoff project for an executable specification of the Ruby language, which is being adopted by most of the major Ruby VM implementors (MRI/YARV, JRuby, IronRuby, Rubinius). This means that soon the most important spec in the Ruby world - the spec for the very language itself - will use RSpec.

One Expectation Per Spec

March 15, 2008 at 02:17 PM

Jay Fields posts about one expectation per spec, something that I generally agree with but often find hard to practice. I typically find myself with one mock and one assert per test, such as this example from the rush specs:

it "transmits file_contents" do
  @con.should_receive(:transmit).with(:action => 'file_contents', :full_path => 'file').and_return('contents')
  @con.file_contents('file').should == 'contents'
end

I want to test both the input and the output - that the file_contents method calls the right method with the right arguments, and that it returns the expected value. Breaking this into two specs would be:

it "transmits file_contents" do
  @con.should_receive(:transmit).with(:action => 'file_contents', :full_path => 'file')
  @con.file_contents('file')
end

it "gets the right return value from file_contents" do
  @con.stub!(:transmit).and_return('contents')
  @con.file_contents('').should == 'contents'
end

This is a lot more verbose but I don't feel it adds a whole lot of clarity. Checking both the input and the output in one place seems reasonable to me. But I'll keep this in the back of my head and see how it influences my spec-writing.

Another item I spotted in Jay's example specs is stub_everything. I wasn't previously aware of this. (His examples use Mocha, but the RSpec mocks have the same exact method.) Like this:

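class BankAccount
  attr_accessor :balance

  def transfer(other_account, amount)
    # self. is required: a bare "balance -= amount" would create a local variable
    self.balance -= amount
    other_account.balance += amount
  end
end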

it "transfers money out of this account" do
  @account.balance = 10
  @account.transfer(stub_everything, 1)
  @account.balance.should == 9
end

stub_everything returns an object that responds to every possible method, but does nothing on the calls. This allows you to effectively ignore any operations on that object, rather than having to stub every call explicitly.
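
For a sense of what such an object looks like underneath, here is a minimal hand-rolled sketch. NullStub is a hypothetical name, and this is not how Mocha or RSpec implement stub_everything. In particular, this version returns itself from every call (an assumption made so that chained operations such as balance += amount keep working), whereas a real library may return nil instead.

```ruby
# Hypothetical stand-in: accepts any message and returns itself.
class NullStub
  def method_missing(_name, *_args)
    self
  end

  def respond_to_missing?(_name, _include_private = false)
    true
  end
end

class BankAccount
  attr_accessor :balance

  def initialize(balance)
    @balance = balance
  end

  def transfer(other_account, amount)
    self.balance -= amount
    other_account.balance += amount
  end
end

account = BankAccount.new(10)
account.transfer(NullStub.new, 1)  # the receiving side is silently ignored
puts account.balance               # prints 9
```

The spec only cares about the sending account, and the null object lets it say so without stubbing balance and balance= one by one.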

Tags: bdd

Test Assumptions, Not Methods

February 23, 2008 at 04:27 PM

Specs should test assumptions, not methods.

How many specs should there be for each method? One per assumption. For example, this method:

def thumbnail_filename
  "images/thumbs/#{basename}.jpg"
end

...has one assumption, so it needs one spec:

it "stores thumbnail jpegs in the thumbs subdir" do
  @image.should_receive(:basename).and_return("test")
  @image.thumbnail_filename.should == "images/thumbs/test.jpg"
end

But this method:

def popular?
  !archived? and recent_views > 50
end

...has three assumptions:

it "is never considered popular when archived" do
  @item.should_receive(:archived?).and_return(true)
  # stubbed rather than expected: popular? short-circuits and never asks for recent_views
  @item.stub!(:recent_views).and_return(9999999)
  @item.should_not be_popular
end

it "is not considered popular when there are not many recent views" do
  @item.should_receive(:archived?).and_return(false)
  @item.should_receive(:recent_views).and_return(1)
  @item.should_not be_popular
end

it "is considered popular when there are lots of recent views" do
  @item.should_receive(:archived?).and_return(false)
  @item.should_receive(:recent_views).and_return(1000)
  @item.should be_popular
end
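
The same three assumptions can also be written down as a small table and checked without a mocking library. This is only a sketch: the Item struct below is hypothetical, and only its popular? method mirrors the one above.

```ruby
# Hypothetical Item; only popular? mirrors the method in the post.
Item = Struct.new(:archived, :recent_views) do
  def archived?
    archived
  end

  def popular?
    !archived? and recent_views > 50
  end
end

# One row per assumption: [archived, recent_views, expected popular?]
ASSUMPTIONS = {
  "never popular when archived"       => [true,  9_999_999, false],
  "not popular with few recent views" => [false, 1,         false],
  "popular with lots of recent views" => [false, 1000,      true],
}

ASSUMPTIONS.each do |description, (archived, views, expected)|
  actual = Item.new(archived, views).popular?
  raise "failed: #{description}" unless actual == expected
end
```

Three rows, three assumptions: if a fourth row suggests itself, that is a hint the method has grown a fourth assumption and needs a fourth spec.
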

Tags: bdd

Minimalist RSpec Matching

February 07, 2008 at 07:08 PM

I've been thinking about the "too much magic" problem of RSpec. There are two sides to this: too much magic in its internals, and too much magic in its matcher syntax. These are related. Shoulda and test-spec are popular because people like asserts. You never forget the syntax for assert_equal - at least, once you can remember the order the arguments go in (I always want to write the actual value first and the expected value second).

Lately I've made peace with this by limiting my use of RSpec's matchers to just a couple of syntaxes:

var.should == 'expected'
var.should match(/pattern/)
var.should be_true   # or be_false
lambda { ... }.should raise_error(SomeException)

These are easy to remember and they read nicely. var.should == 'expected' is nearly as good as assert_equal in its simplicity, with the further benefit that the expected value goes on the right, which I find more natural.

Very occasionally I will use var.should > 0, but that's about it. I also rarely find much use for should_not, except sometimes against a match. All of the other matchers I now avoid. Here's a conversion chart from the 'standard' RSpec way to Adam's minimalist approach:

RSpec Full                      Minimalist
item.should be_kind_of(Item)    item.class.should == Item
file.should be_exists           file.exists?.should be_true
cart.should have(3).items       cart.items.size.should == 3

The argument against my approach is that it gives RSpec slightly less information about what's failing. My experience so far has been that it's been more than offset by the peace of mind of having less to remember. Actually, the peace of mind is because I no longer have to mentally scold myself every time I forget the right matcher to use.

I also feel that val.should == 1 has a distinctive look that's easy to scan a page for. Not quite as good as assert, but close.

Tags: bdd

Theory vs Practice

February 02, 2008 at 11:47 PM

Specs (or tests) show that your code works in theory. A running app in production shows that your code works in practice.

Put this way, the obvious question becomes: what's the point of BDD? Working in practice is what matters.

Actually, both are equally important.

I've worked on (and created, in my less-enlightened past) lots of apps that are thrown together collections of PHP pages, ad-hoc daemons, and so on. These apps work in practice, but not in theory. They've been jury-rigged and duct-taped into working in practice, but when the first earthquake comes along they fall apart.

Another way to state this is:

  • Code that works in theory is code that works by design.
  • Code that works in practice only is code that works by accident.

Code that works in theory may not work in practice - only production deployment can tell you that. An app which has a large suite of running specs, but no users, is really no better than a non-speced, jury-rigged / it-works-if-you-just-don't-jiggle-it-too-hard app. Only when it has been proven in both theory and practice is your app truly sound.

Put test_port_spec.rake into lib/tasks, then run rake test:port:spec. It doesn't delete the tests it ports; this way you get the joy of typing rm -rf test yourself.

Tags: bdd

Ruby Test Framework Roundup and Musings

January 16, 2008 at 01:44 PM

Last week's icanhasruby had a series of presentations themed around test setups. The main lesson I took away from this is that a single best-practice solution to test/behavior-driven development has not yet been found. But I get the sense that the community is zeroing in on some core concepts that may one day be as ubiquitous as MVC or the HTTP request/response cycle. Even more interesting is that this seems to be happening in a completely decentralized way. I'm not sure where the Rails core team stands, but, given that they are continuing to put work behind Test::Unit (which, as near as I can tell, has been unmaintained since 2003), they don't seem to be participating much in this quiet BDD revolution. But part of the beauty of Rails and Ruby is that they don't need to.

Some Frameworks

RSpec pioneered BDD in the land of Ruby, and remains both the most mature option and the one to beat. (That's why it's available by default for Heroku apps.) Most people like the plain-English descriptions of individual specs. But many of those same people dislike the magic-heavy syntax of the DSL. user.should have(1).apps seems nifty at first, but once the novelty wears off, you might find yourself pining for the days of assert_equal 1, user.apps.size.

I like the idea of a rich selection of matchers, but I find that I just can't seem to remember them. I'll say this for the assert / Test::Unit approach: once I had written two or three tests with it, I never looked at the docs again. I've been using RSpec on and off for close to a year now, and I still have to look up matcher syntax with surprising frequency.

There are some benefits to the matcher syntax beyond just a more English-like syntax, however: the specification of your desired results in this format gives the test framework more information about what went wrong, which means it can give clearer output. Generally, I find that when a spec breaks, I'm much more likely to be able to tell what went wrong from the error than from an assert failure. When an assert fails, I generally ignore the results and just go to the line number of the failure. From there I try to figure out what might have been wrong. RSpec's clearer messages mean that I'm more likely to make a diagnosis from the test output itself, which strikes me as a lot more agile.
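
The difference is easy to see in a toy version. The sketch below is hand-rolled: assert, expect_equal and failure_message are hypothetical helpers, not Test::Unit's or RSpec's real methods. It contrasts a bare boolean assert, which only learns that a condition was false, with a check that is handed the actual and expected values separately.

```ruby
class SpecFailure < StandardError; end

# Assert-style: only knows whether the condition held.
def assert(condition)
  raise SpecFailure, "assertion failed" unless condition
end

# Matcher-style: receives both values, so it can report both.
def expect_equal(actual, expected)
  return if actual == expected
  raise SpecFailure, "expected #{expected.inspect}, got #{actual.inspect}"
end

# Runs a block and returns the failure message, or nil if nothing failed.
def failure_message
  yield
  nil
rescue SpecFailure => e
  e.message
end

apps = ["blog", "wiki"]
puts failure_message { assert(apps.size == 1) }      # prints "assertion failed"
puts failure_message { expect_equal(apps.size, 1) }  # prints "expected 1, got 2"
```

The second message is the kind that lets you diagnose a failure from the output alone, without jumping to the line number first.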

If you do prefer asserts, there's the relative newcomer Shoulda. It offers contexts and plain-english descriptions, but sticks with good ol' asserts for specifying expected results. It seems to be well-supported and gaining traction quickly.

There's also test-spec, which provides a compatibility layer between RSpec and Test::Unit. You can use this to mix together Test::Unit tests with context-wrapped, plain-English specs, as well as a simple should-style DSL. Personally I like to avoid mixing together different coding styles, but this might work well to transition a large and complex battery of tests over an extended period of time.

Browser-Side Testing

One of the most interesting presentations was JSpec, an RSpec-alike for Javascript. One can hardly even call this a framework, since it's just a single 100 line javascript file which sends its output to the Firebug console; but often, the best tools are the simplest ones. I liked what I saw here quite a bit:

jspec.describe("Math", function() {
  it("calculate square roots", function() {
    (Math.sqrt(4)).should("==", 2)
  })
})

How about full-stack integration testing? There's Selenium, which is about as full-stack as you can get: the tests run in Firefox, clicking links and checking rendered results based on recorded scripts. That's great, and you can even launch it from rake, but it's so heavy-weight that I tend to shy away from it.

An intermediary solution is Webrat. Using a Mechanize-style scripting language, you can specify a full user story, as played out in the browser. For example:

def test_sign_up
  visits "/"
  clicks_link "Sign up"
  fills_in "Email", :with => "good@example.com"
  select "Free account"
  clicks_button "Register"
end

The only thing this won't test is your javascript, which may be significant if your site is ajax-heavy.

Sample Data

Mocks and stubs have their own area of theoretical debate. There's the question of the best library - for example, RSpec's built in mocks versus Mocha. But there's also the question of when to use mocks and stubs versus building up real object trees and letting them behave normally. Too little mocking and stubbing means you end up with every single spec being an integration test. Too much, though, and you're not testing the real behavior of your code, and creating a lot of overhead on maintaining the mocks.

That brings us into the realm of fixtures, which have historically been a significant point of pain for Rails developers. I was in the midst of some serious fixture woes when I attended the fixture scenarios talk at RailsConf last year, and it convinced me that this was a good way to go. However, this component doesn't seem to have taken off in popularity as expected. I assume this is because fixtures are something that people seem to want to avoid in general. When to use fixtures vs mocks vs stubs vs just building the object manually in the spec setup is not well-defined in my brain at all, and I suspect I'm not the only one who has this problem.

And that highlights an important fact of this whole exploration of the BDD space that's currently taking place. The problem is not really a technical one; it's about methodology. Rails showed us how to encode a methodology into a framework. Now Rubyists are trying to do the same thing with BDD. We'll keep trying these frameworks on for size until we find one that feels right for the most common scenarios of application development.

Summary

Most of the points being debated here reflect the central question of BDD: rigidity. You want your app to have some rigidity, so that when a developer makes any sort of significant change to the implementation or the technical design without updating the specs, running the test battery fails loudly. This prevents things from changing unintentionally or through unintended side-effects.

On the other hand, too much rigidity is the very antithesis of agility. If doing something simple like renaming a field means a developer has to update not only the database schema and the code, but also the specs, the fixtures, the mock objects... well, that developer might be disinclined to make the change at all. Codebases need to be supple enough that developers are never demotivated from making worthwhile changes.

As I warned in the beginning, BDD/TDD in Rails is nowhere near a resolved question. I hate to be a two-handed professor, so let me summarize with some simplified recommendations by situation.

  • If you're new to testing and/or just overwhelmed and confused by the amount of activity in this area, RSpec is probably your best bet. Install the two plugins, run the rspec generator, and then generate some specs with the rspec_model generator.
  • If you're working on an existing project and/or on a large team and/or in a corporate environment, you'll probably need to stick to the standard vanilla Rails testing based on Test::Unit. In all honesty it works just fine, and is certainly far better than writing no tests at all. In other words, don't be afraid to write Test::Unit tests just because there's so much going on with the development of new test frameworks.
  • If you're really bothered by the should syntax magic of RSpec, use Shoulda.

Tags: bdd