I Git It
December 30, 2007 at 12:24 AM
I've been sold on the concept of distributed revision control systems ever since Simon Michael showed me Darcs a year ago. But I've been slow to adopt it for any real use. Part of the reason for this was that I wasn't super-happy with the implementations available, although there are quite a few. Darcs certainly seemed like the best of the bunch, but it never felt quite right to me. Besides being sluggish, it relies on tons of interactive prompting - which is not the sort of UX I look for in a command-line tool.
Then I discovered Git. Younger than most of the other choices, Git seems to have that extra bit of pop that Darcs et al. are missing. It doesn't hurt that its author is an opinionated Scandinavian uber-hacker who we all love. (No, not that one.)
I'm especially pleased to see that there seem to be some recent rumblings that Git may be gaining traction in the Rails community. I thought I'd be a minority voice on this, but it seems like everyone else is seeing the light too - and with remarkable synchronization. An idea whose time has come, perhaps.
Tom Moertel has an intriguing description of porting his workflows from Darcs to Git. Although it's ostensibly about why Git is the preferred tool, along the way it shows off some of the crazy-awesome features of Darcs.
But that's because decentralized SCMs spank the pants off of centralized ones. So why isn't everyone using them already? Because there's a cost: the higher level of brainpower required. You need it to use one at all, certainly, but also to take advantage of the really powerful features of the decentralized model. But since SCM is a major part of a developer's toolchain, the time investment in learning a harder but more powerful tool makes sense. Going from Subversion to Git is like going from pico to vi, or from tcsh to bash, or from flat files to SQL. Getting your head around it may hurt a bit, but the effort will pay off very nicely in the long run.
SSH Tunnels
December 28, 2007 at 10:39 PM
And now for one of my favorite bits of black magic from the unix poweruser's toolkit: ssh tunnels.
Like most good tricks, this one is simple. It lets you bounce TCP traffic through an ssh connection. This is handy in a variety of situations, but the one I've used it most often for is to access a website which is available only inside of a corporate LAN. If you've got external ssh access, you can set up a tunnel that will let you point your browser at a local port to access that site.
The syntax is:
ssh -L [local_port]:[site_you_want_to_reach]:[remote_port] [ssh_host] [command]
This can be a little confusing, but there are actually only two variables that matter. So first I'll fill in the defaults you'll probably always want to use:
ssh -L 9999:[site_you_want_to_reach]:80 [ssh_host] "sleep 9000"
Much better. Now we just need to know the site we want to go to, and the host we will tunnel it through. For example, let's say we want to view RubyInside, but tunneled through your remote server example.com:
ssh -L 9999:rubyinside.com:80 user@example.com "sleep 9000"
You can now browse to http://localhost:9999/ and see RubyInside. (This won't work if the remote site uses name-based virtual hosts. You can remedy this by adding the hostname (like rubyinside.com) to your /etc/hosts as an alias for localhost, and then pointing your browser to http://rubyinside.com:9999/.)
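If you'd rather not edit /etc/hosts, a script can sidestep the name-based virtual host problem by sending the Host header itself. Here's a minimal sketch using Ruby's standard Net::HTTP library, assuming the tunnel above is running on local port 9999. The tunnel_request and fetch_through_tunnel names are made up for illustration.

```ruby
require 'net/http'

# Hypothetical helper: build a GET request whose Host header names the real
# site, so a server using name-based virtual hosts serves the right site even
# though the TCP connection goes to localhost.
def tunnel_request(site_host, path = '/')
  request = Net::HTTP::Get.new(path)
  request['Host'] = site_host
  request
end

# Hypothetical helper: send that request to the local end of the tunnel.
# Assumes the tunnel listens on port 9999, as in the example above.
def fetch_through_tunnel(site_host, path = '/', local_port = 9999)
  Net::HTTP.start('localhost', local_port) do |http|
    http.request(tunnel_request(site_host, path))
  end
end

# Usage, with the tunnel running:
#   response = fetch_through_tunnel('rubyinside.com')
#   puts response.code
```

The connection goes to localhost, but the server at the far end of the tunnel sees Host: rubyinside.com, which is the same thing the /etc/hosts trick achieves for the browser.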
Since RubyInside is public, this example is not actually useful - but now that you see how it works, let's look at how it can be used to view a Rails app running inside a LAN on someone's workstation. Let's say the mongrel is running on port 3000 and that the workstation is devbox.localnet, but devbox.localnet is only accessible from your ssh bouncepoint, which we'll call bouncepoint.example.com.
ssh -L 9999:devbox.localnet:3000 user@bouncepoint.example.com "sleep 9000"
ssh tunnels can also be used to bypass those silly content filters that some companies insist on installing on their local LANs. This trick requires that you can send outgoing TCP traffic and that you have a remote ssh server listening outside the LAN. (If port 22 is blocked, you'll need to find one that isn't, and then get your remote ssh server to listen on that port.) By tunneling your web traffic through ssh, you can bypass any content filters. As a bonus, the leg between you and your ssh server is encrypted, so no one on the local LAN can eavesdrop on it.
For example, if you wanted to view the latest Penny Arcade comic, but found that the client site you're working on for the day blocks it, you might use your remote VPS host to set up a tunnel like this:
ssh -L 9999:penny-arcade.com:80 vpsuser@vpshost.com "sleep 9000"
One final note, in case you're wondering what the sleep 9000 is for: the tunnel only stays open as long as the ssh session does, and when the remote command finishes executing, ssh exits. So you have to give it a command that will stall. I use 9000 because it's plenty long (two and a half hours) and easy to type. (When you're done, just hit Ctrl-C in the terminal that's running the tunnel to close it.) If you wanted it to last indefinitely, you could use while [ 1 ]; do sleep 100; done.
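If you open the same tunnels often, it can help to wrap the command in a script. Below is a rough Ruby sketch that assembles the argument list following the syntax above, with the post's defaults (local port 9999, remote port 80, sleep 9000). ssh_tunnel_argv is a hypothetical helper name. As an aside, OpenSSH's -N option, which tells ssh not to run a remote command at all, is another way to keep a tunnel-only session open.

```ruby
require 'shellwords'

# Hypothetical helper: build the argv for
#   ssh -L [local_port]:[site_you_want_to_reach]:[remote_port] [ssh_host] [command]
# Defaults follow the post: local port 9999, remote port 80, "sleep 9000".
def ssh_tunnel_argv(site, ssh_host, local_port: 9999, remote_port: 80, command: 'sleep 9000')
  ['ssh', '-L', "#{local_port}:#{site}:#{remote_port}", ssh_host, command]
end

# The devbox example: mongrel on port 3000, reachable only from the bouncepoint.
argv = ssh_tunnel_argv('devbox.localnet', 'user@bouncepoint.example.com', remote_port: 3000)

# Print a shell-escaped version of the command you can copy and paste.
puts Shellwords.join(argv)

# To actually open the tunnel (Ctrl-C closes it), pass the array straight to
# system, which runs ssh directly without going through a local shell:
#   system(*argv)
```

Passing the command as an array means the local shell never re-parses it, so "sleep 9000" reaches ssh as a single argument.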
Nested Resources in Rails 2
December 20, 2007 at 02:44 AM
Nested resources were introduced in Rails 1.2 and are touted as the Right Way to do REST with parent-child model associations. If your app has a url that reads something like /employees?company_id=1, a switch to nested resources would cause it to read /companies/1/employees.
Rails 2 introduced a few subtle but important syntax changes. So far I haven't seen any comprehensive guide to the new syntax, so I'm writing one.
I'll use the example that seems to be popular, which is tickets belonging to events:
class Event < ActiveRecord::Base
  has_many :tickets
end

class Ticket < ActiveRecord::Base
  belongs_to :event
end
Full code for the example app is available for browsing or download via svn. Most of the relevant code is in routes.rb, tickets_controller.rb, and the tickets views.
Routes
The first step (and the easiest one) is setting up the named route with the :has_many syntax. (Note that this is a significant change from the Rails 1.2 syntax of passing a block to map.resources.)
map.resources :events, :has_many => :tickets
Should there be a separate line reading map.resources :tickets? That depends on whether you want the tickets to be accessible in a non-nested form (e.g. /tickets/1). Given that a ticket will always have an event, I think it's more consistent not to map tickets separately. It's true that the nesting information from the url isn't needed for some operations (show, edit, and destroy), since a ticket already knows its event. But if you map tickets both ways, you'd need fine-grained exceptions to the before_filter, deciding when to pull the event from the url and when not to. I don't think there's any consensus on this point yet, but for this post I'm going to use the always-nested approach.
Route Helpers
The next part I've found to be the hardest - or at least, rather time-consuming. It's somewhat difficult to remember the right syntax, since there are a dozen or so helpers to generate all the urls that are needed. Thankfully there's a rake task to show you all the named routes: rake routes. The output looks like this:
             events GET  /events             {:controller=>"events", :action=>"index"}
   formatted_events GET  /events.:format     {:controller=>"events", :action=>"index"}
                    POST /events             {:controller=>"events", :action=>"create"}
                    POST /events.:format     {:controller=>"events", :action=>"create"}
          new_event GET  /events/new         {:controller=>"events", :action=>"new"}
formatted_new_event GET  /events/new.:format {:controller=>"events", :action=>"new"}
...
Blood started squirting out of my eyes the first time I ran this task on an app with nested resources - it's not pretty if your terminal isn't wide enough to prevent the lines from wrapping. I suggest maximizing your window to prevent getting blood all over your keyboard.
The bit we're looking for is in the far lefthand column: the name of the route. For our nested resource, here are the interesting ones:
    event_tickets GET /events/:event_id/tickets
 new_event_ticket GET /events/:event_id/tickets/new
edit_event_ticket GET /events/:event_id/tickets/:id/edit
     event_ticket GET /events/:event_id/tickets/:id
The naming scheme is: parent resource (singular), then child resource (plural for the collection routes, singular for routes to a single ticket). So where you might have used tickets_path before, you now use event_tickets_path. new_ticket_path becomes new_event_ticket_path, and so on.
Seem simple? Not so fast. You also need to include the event as a parameter. So tickets_path becomes event_tickets_path(event). In cases where the child resource already knows its parent, such as an edit link, you can use edit_event_ticket_path(ticket.event, ticket). (This last bit wouldn't be necessary if you chose to map.resources :tickets, in which case edit_ticket_path(ticket) would still work. The downside is that then you have to remember when you need to use nested helpers and when you don't. As mentioned above, I prefer going the route of consistency: always nested.)
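To make the naming scheme concrete, here's a tiny plain-Ruby function that reproduces the four route names and paths from the rake routes output above. It's purely an illustration of the pattern, not how Rails actually generates routes, and nested_route is a made-up name. It also uses naive pluralization (just adding an s), which happens to work for events and tickets.

```ruby
# Hypothetical illustration of the nested route naming scheme; not Rails code.
# parent and child are singular resource names, e.g. 'event' and 'ticket'.
def nested_route(parent, child, action)
  parents  = "#{parent}s"   # naive pluralization, fine for this example
  children = "#{child}s"
  base = "/#{parents}/:#{parent}_id/#{children}"

  case action
  when :index then ["#{parent}_#{children}", base]              # collection: plural child
  when :new   then ["new_#{parent}_#{child}", "#{base}/new"]
  when :show  then ["#{parent}_#{child}", "#{base}/:id"]         # member: singular child
  when :edit  then ["edit_#{parent}_#{child}", "#{base}/:id/edit"]
  else raise ArgumentError, "unknown action: #{action}"
  end
end

nested_route('event', 'ticket', :edit)
# => ["edit_event_ticket", "/events/:event_id/tickets/:id/edit"]
```

Tack _path (or _url) onto the route name to get the helper, e.g. edit_event_ticket_path(ticket.event, ticket), and supply one argument per :symbol in the path, parent first.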
Forms
What else needs to change? form_for has this syntax with resources:
<% form_for(@event) do |f| %>
Which is wonderfully succinct compared to the way that non-resource form_fors usually look. But it gets a little funky with nested resources:
<% form_for([ @event, @ticket ]) do |f| %>
(Trevor Squires makes a good argument as to why this syntax isn't too spiffy, and Codafoo makes a slightly less compelling argument as to why it is.)
Redirects
Redirects and XML locations should use the same array syntax as form_for:
if @ticket.save
  flash[:notice] = 'Ticket was successfully created.'
  format.html { redirect_to([ @event, @ticket ]) }
  format.xml  { render :xml => @ticket, :status => :created, :location => [ @event, @ticket ] }
else
  # ... unchanged from the scaffold ...
end
In all cases, the parent resource goes first, in the same order as in the url helper (event_ticket_path).
Before Filter
Since the controller is never called without nesting, the before_filter is simple:
before_filter :get_event

def get_event
  @event = Event.find(params[:event_id])
end
Thus, every action and view can always count on @event being set. In some cases you can access @ticket.event, but in the always-nested approach, @event can be used everywhere.
Scoping
Any place the controller makes an ActiveRecord call like find or new should be scoped:
def index
  @tickets = @event.tickets.find(:all)
  # ...
end

def show
  @ticket = @event.tickets.find(params[:id])
  # ...
end

def new
  @ticket = @event.tickets.new
  # ...
end
One trick for making sure you've changed every reference is to search the file for the class name. The text "Ticket" should not appear in tickets_controller.rb, except on the first line as part of the controller name.
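That check is easy to automate. Here's a minimal Ruby sketch, assuming the controller lives at the usual app/controllers/tickets_controller.rb; stray_class_references is a hypothetical name. It reports every line after the first that mentions Ticket as a whole word, so lowercase tickets and TicketsController don't trip it.

```ruby
# Hypothetical helper: find lines (after the first) that mention the bare
# class name, which usually means a find/new call not scoped through @event.
# Returns [line_number, stripped_line] pairs, with 1-based line numbers.
def stray_class_references(source, class_name = 'Ticket')
  pattern = /\b#{Regexp.escape(class_name)}\b/
  source.lines.each_with_index.drop(1)
        .select { |line, _index| line =~ pattern }
        .map { |line, index| [index + 1, line.strip] }
end

# Usage:
#   src = File.read('app/controllers/tickets_controller.rb')
#   stray_class_references(src).each { |lineno, text| puts "#{lineno}: #{text}" }
```

An empty result means every lookup in the controller goes through @event.tickets.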
Conclusion
Getting this all set up is quite a bit of busywork if you start with two models generated with resource scaffolding. (Which reminds me: scaffold_resource from Rails 1.2 is gone, replaced by scaffold in Rails 2. The original scaffold is gone, which is good, because last I checked it had suffered some serious bitrot.)
It would be immensely convenient if there were a generator for this. Something like:
generate nested_resource Ticket belongs_to:event
However, this would be quite a bit more difficult to write than a typical generator, because it would need to modify existing code beyond just adding lines to a file. So although handy, don't count on seeing this anytime soon. Though if some enterprising soul wanted to put their mind to it, I'm sure the Rails community would be forever, or at least briefly, grateful.
A World Without SQL
December 17, 2007 at 01:05 AM
Amazon’s SimpleDB is bringing non-relational databases (such as CouchDB and RDDB) into the spotlight. I’ve been eagerly looking forward to a new paradigm in databases for some time now, having never been convinced that object databases were the next step. So this is exciting stuff.
One of the big questions I have when moving away from a SQL database is “what about joins?” Because ultimately SQL is about relations, and the JOIN clause is the heart of the power of a SQL database.
Assaf Arkin’s fascinating post on dumb databases argues that JOIN is actually a hack to fix an underlying design problem. In order to understand a world without them, you need to think in terms of weak and strong relations:
In modeling the entities, I realized that orders and products are distinct with weak ties between them. [...] But in modeling the order entity, the decision I made was to store line items inside the order itself. I realized I have no compelling use case to keep those separate. When I add or remove a line item, I’m changing the order, I expect the order to have a new version and updated timestamp. When I delete the order, I assume all the line items will go away. And when I query the order, I intend to find all the line items there, without resorting to Cartesian join and result-set gymnastics.
In a SQL database, there’s no explicit indication of the strength of a relationship. Product has_many :orders, but rarely if ever do you want to fetch every order associated with a product. On the other hand, Order has_many :items, and yes, any time you fetch or update an order you also want to fetch and update the items that go with it. The former is a weak link, the latter is a strong one. (You could probably identify an app’s weak and strong links by looking at which tables are typically :include’d from that model’s find.)
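Here's a rough plain-Ruby picture of that distinction, with a hash standing in for a stored document. The field names and ids are invented for illustration. Line items (strong link) live inside the order, while products (weak link) are only referenced by id.

```ruby
# A document-style order: line items are embedded, products are referenced.
order = {
  'id' => 'order-1001',
  'version' => 1,
  'line_items' => [
    { 'product_id' => 'prod-7', 'quantity' => 2, 'unit_price' => 15 },
    { 'product_id' => 'prod-9', 'quantity' => 1, 'unit_price' => 40 }
  ]
}

# Strong link: adding a line item *is* changing the order, so we return a new
# version of the whole document rather than touching a separate table.
def add_line_item(order, item)
  order.merge('version' => order['version'] + 1,
              'line_items' => order['line_items'] + [item])
end

# No join needed: everything required to total the order is in the document.
def order_total(order)
  order['line_items'].inject(0) { |sum, li| sum + li['quantity'] * li['unit_price'] }
end

order_total(order)  # => 70
```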
Other interesting stuff in Assaf’s post:
- Maintaining indexes is rightfully business logic that should be handled by the application – since the application knows how it wants to query the data later. This makes more sense when you realize that CouchDB calls these views instead of indexes. Also note that computed values (that is, values that are calculated and then cached in a column of the table when the record is updated) become one and the same with indexes.
- Schemas are useless constraints, putting business logic into the database, and requiring migrations and other headaches to maintain. (This might be a good time to review DataMapper, since that pattern fits well with a schemaless database.)
- The old example of using a database transaction to wrap the transfer of money from one bank account to another is total bull. The correct solution is to store a list of ledger events (transfers between accounts) and show the current balance as a sum of the ledger. If you’re programming in a functional language (or thinking that way), this is obvious.
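Here's a toy version of the ledger idea in Ruby. The account names and amounts are invented, and a real system would still need durable, append-only storage for the events. The point is only that a transfer becomes a single appended record, and a balance is computed rather than stored.

```ruby
# Each transfer is an immutable event; nothing ever updates a balance column.
Transfer = Struct.new(:from, :to, :amount)

# Balance = everything credited to the account minus everything debited from it.
def balance(ledger, account)
  ledger.inject(0) do |total, t|
    total += t.amount if t.to == account
    total -= t.amount if t.from == account
    total
  end
end

ledger = [
  Transfer.new(nil, 'alice', 100),  # nil source: an opening deposit
  Transfer.new('alice', 'bob', 30),
  Transfer.new('bob', 'alice', 5)
]

balance(ledger, 'alice')  # => 75
balance(ledger, 'bob')    # => 25
```

Because a transfer is one append, there's no intermediate state where one account has been debited and the other not yet credited, which is the problem the two-update transaction exists to paper over.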
Further – and this is my own conclusion here – the result of these flat document databases is that web protocols and database protocols may be converging. CouchDB already uses JSON and REST. One implication may be that ActiveRecord and ActiveResource may become the same thing in the future.
So, like – woah, dude. This is some heady stuff, yet it rings true to me. SQL, as well as it has served us for so many years, may be on its way out. And not a moment too soon.
In Defense of Yak Shaving
December 12, 2007 at 03:04 PM
You're probably familiar with yak shaving: that is, an infinite recursion of tasks theoretically in support of your original task, but each of which is progressively less related to the original task. You know, you run into a bug with a Ruby gem that's fixed in the latest dot release, so you try to upgrade, but then you realize that you need to build a support library from source to do so, and then you remember that you hadn't yet installed gcc after a from-scratch install of Leopard, but that reminds you that you left your OS disks at home, so you need to...
This, of course, can be a serious waste of time and loss of focus. Managers at big software companies are kept awake at night thinking about the fact that half their dev team might be off shaving yaks even though the deadline for their project is two weeks away. At small startups without any managers, it can simply lead to the death of the company. How many startups have spent years building some really impressive architecture or toolset with which they intend to build their product, but not the actual product? I can think of a few. In fact, my very first software job was at a company that did exactly this.
There's another side to yak-shaving, though. Sometimes you really do need to enhance underlying tools or architecture to enable your current task, and future related tasks. Whether it's rearranging your directory structure, streamlining the build process, or writing some diagnostic tools, a project can only stay agile if it has the proper infrastructure.
The counterargument to that (is that a counter-counterargument?) might be: sure, but work on the infrastructure some other time, when you're not focused on a particular task. This seems compelling, but experience tells us that it doesn't really work. You can't see what needs to be done to fix the problem unless you're in the thick of something that needs the fix.
For example: it's hard to do any kind of serious refactoring just 'cause. You can know that a codebase is due for some serious janitorial work, but without a specific goal in mind it'll be hard to focus your efforts.
Let's say you're trying to diagnose a bug. To understand what's going on you need to see it in isolation, so you need to write a test that triggers it. In trying to write a test, though, you realize that the way the object's methods are laid out can't be tested in isolation. Perhaps the methods need to be broken apart, and use parameters and return values instead of modifying object member variables directly. But to do that you may need to modify other places in the code which call the object. So here you are, changing code which has absolutely nothing to do with the bug you're trying to fix. Sound familiar? That's yak shaving, alright.
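For a concrete (and entirely hypothetical) picture of that refactoring: imagine an Invoice whose total used to be computed by a method that read @items and wrote intermediate results into instance variables. Pulling the calculation out into a method that takes parameters and returns a value makes it testable in isolation. The class, discount rule, and numbers below are made up.

```ruby
# Hypothetical example. The calculation now depends only on its arguments,
# instead of reading and writing instance variables on a fully built Invoice.
class Invoice
  BULK_THRESHOLD = 100
  BULK_DISCOUNT  = 0.9

  # items: array of [unit_price, quantity] pairs
  def self.total_for(items)
    subtotal = items.inject(0) { |sum, (price, qty)| sum + price * qty }
    subtotal > BULK_THRESHOLD ? (subtotal * BULK_DISCOUNT).round : subtotal
  end

  def initialize(items)
    @items = items
  end

  # The instance method becomes a thin wrapper, so existing callers still work.
  def total
    Invoice.total_for(@items)
  end
end

# The bug can now be pinned down with a one-line check:
#   Invoice.total_for([[60, 2]])  # => 108
```

Keeping the old instance method as a wrapper limits how many callers you have to touch, which keeps this particular yak's haircut short.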
But in this case, now is the time to write that test and make those changes to that object. You could just struggle by and fix the bug without it, and then make a note to come back and write the test later, but you probably won't. Because you won't need it then. But the next time something in that same code breaks, suddenly it will all come into focus again...
So sometimes the yak DOES need a haircut. Just be mindful as you do it. Am I shaving this yak because it will be very valuable to my current goal and the project as a whole? Or am I doing it because I followed a rabbit trail of dependencies, and no one will care one way or the other whether the yak is shaved tomorrow?
Rails 2 Upgrade: ActiveResource
December 10, 2007 at 05:25 PM
The Rails 2 upgrade has been surprisingly painless on the half dozen or so production apps I've upgraded, with two exceptions. One is that not all the mirrors are updated, so you get "no such gem" and have to repeat the command a few times until it finds a mirror that has it. The other is that, for some reason, the Rails 2.0.1 package doesn't find the newly added activeresource dependency. So the full upgrade becomes:
gem update --no-rdoc --no-ri -y update
gem install --no-rdoc --no-ri -y activeresource
Note 1: I like to skip rdoc and ri since they take a lot of time to generate, and I never find myself using local docs. Especially on a server, which is where most of my upgrades were being executed.
Note 2: You can thankfully skip the -y option now if you're using the latest RubyGems, 0.9.5.
The activeresource dependency problem seems to happen everywhere - my Macbook, my Ubuntu workstation, and a handful of Debian (etch) servers I've updated. I'm surprised no one else has mentioned it.
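Since the stale-mirror problem means rerunning the same command until it works, a small retry wrapper can save some typing. Here's a minimal Ruby sketch. with_retries is a hypothetical name, and it simply treats any failed exit status as a reason to try again, whatever the cause. (The --no-rdoc and --no-ri flags are the ones from this post; newer RubyGems releases replaced them with --no-document.)

```ruby
# Hypothetical helper: run the block up to `attempts` times, stopping at the
# first truthy result. Returns true on success, false if every attempt failed.
def with_retries(attempts = 3)
  attempts.times do |i|
    return true if yield
    warn "attempt #{i + 1} of #{attempts} failed"
  end
  false
end

# The activeresource step from the post, retried up to three times
# (system returns true only when the command exits with status 0):
#   with_retries { system('gem', 'install', '--no-rdoc', '--no-ri', '-y', 'activeresource') }
```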
Conditional Javascript Includes
December 03, 2007 at 04:46 PM
The Heroku toolbar floats on top of your app, providing access to some development functions.
There's quite a bit of magic that goes on to make this happen, some of which I might get into another time. But one thing we ran into recently was the matter of conditional javascript includes. We want to use Prototype and the effects libraries for the toolbar (and particularly for some of the upcoming features which will be visually complex). But since our html is being mixed in with the html from the user's app, and the user may or may not include Prototype and friends, we need a way to conditionally include these files.
There isn't a best practices sort of way to do this that I can see. But we did come up with a workable solution. Here's a static HTML proof of concept:
<html><body>

<!-- try commenting out the following line -->
<script src="prototype.js" type="text/javascript"></script>

<script type="text/javascript">
if (!window.Prototype)
{
  document.write('<script src="prototype.js" type="text/javascript"><\/script>')
}
</script>

<div id="write_me"></div>

<script type="text/javascript">
$('write_me').innerHTML = "prototype is loaded"
</script>

</body></html>
The key here is window.Prototype. Writing "if (Prototype)" will throw a ReferenceError and halt the script if Prototype is not defined. But since javascript globals are actually properties of the window object, we can check it there: javascript doesn't complain at all about accessing an undefined object property, it just returns undefined.
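Since the toolbar html is generated server-side, the check could be wrapped in a small Ruby helper that works for any library that defines a global. This is a hypothetical sketch (conditional_script_include isn't part of Rails or taken from the toolbar code), and it assumes the global name and src are trusted strings containing no quotes.

```ruby
# Hypothetical view helper: emit a script block that loads a library only if
# its global isn't already defined, using the same window.<Global> check.
def conditional_script_include(global_name, src)
  # Write the closing tag as <\/script> so the browser doesn't end the outer
  # script element early, just like the static proof of concept does.
  inner = %Q{<script src="#{src}" type="text/javascript"><\\/script>}
  <<~HTML
    <script type="text/javascript">
    if (!window.#{global_name}) {
      document.write('#{inner}');
    }
    </script>
  HTML
end

# Usage in a view:
#   <%= conditional_script_include('Prototype', 'prototype.js') %>
```

This doesn't address the version-mismatch problem below; it only saves repeating the boilerplate for each library.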
There are some problems with this, such as the possibility of differing versions of Prototype in what the user's app has included and what the toolbar code expects. My biggest complaint with it is that it's just not that elegant. If anyone has other suggestions I'd be interested to hear them.