Service-Oriented Architectures
June 26, 2008 at 01:56 AM
I've always liked to build systems with a bunch of small apps that talk to each other through various protocols. Orion and I built TrustCommerce in this manner, and that gave it some pretty impressive fault-tolerance and scalability.
I had heard the term SOA (service-oriented architecture), but had always dismissed it as enterprisey talk. (Bland-yet-pompous three-letter acronyms make my brain turn off.)
At some point, it dawned on me that what I like to do - build small apps communicating with each other over the network - is exactly what SOA means. In their modern incarnation, service architectures use REST calls, which follow the Unix tradition of small sharp tools, loosely coupled by a simple but extremely flexible protocol.
Heroku is no one app - even aside from all the server software and configs, our code is currently split among around two dozen apps, each with their own repository, and most with their own database. (Some have no database.) Most of these are Rails apps, though some are bare Ruby on a Mongrel or bare Rack on a Thin, and some are Sinatra apps.
This is why my RailsConf talk was about HTTP routing. When you've got dozens of apps, some of which respond to complex domains (for example, edit.*.heroku.com), a powerful HTTP router outside your application VM becomes damned near indispensable.
Service architectures are the solution to the longtime problem of apps growing to monstrous proportions. Once you exceed a couple dozen models and/or controllers, it starts to be very hard for new developers to grok everything that's going on, and the barrier to entry becomes very high.
With a service architecture, each app has a simplicity that's reminiscent of the "make your own blog" sample apps you see in tutorials. Rails seems to better retain its beauty in this state. Probably everything does.
But it introduces new challenges. You've got dependencies between repositories - any time you change the interface, you have to be sure to roll out the server and client apps together. The relationship between internal apps is thus very similar to external ones - you need versioning and dependency management. Heroku has an app we call our architecture atlas which tracks all the components, dependencies between them, and documents their APIs.
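To make the dependency-tracking idea concrete, here is a minimal sketch of the kind of data an atlas might hold. Everything in it is an assumption for illustration: the app names, the version numbers, and the helper methods are made up, and this is not how Heroku's atlas is actually built.

```ruby
# Hypothetical atlas data: each app records its own API version and the
# version it expects from every internal service it calls.
ATLAS = {
  'billing'   => { api_version: 2, depends_on: {} },
  'dashboard' => { api_version: 1, depends_on: { 'billing' => 2 } },
  'mailer'    => { api_version: 1, depends_on: { 'billing' => 1 } }
}.freeze

# Apps that call `service`, and so must be considered when its interface changes.
def clients_of(atlas, service)
  atlas.select { |_name, info| info[:depends_on].key?(service) }.keys.sort
end

# Clients that still expect a different version than the service now provides.
def stale_clients(atlas, service)
  current = atlas.fetch(service)[:api_version]
  clients_of(atlas, service).reject do |client|
    atlas[client][:depends_on][service] == current
  end
end

p clients_of(ATLAS, 'billing')     # every client to coordinate a rollout with
p stale_clients(ATLAS, 'billing')  # clients still written against an old interface
```

Even a table this small answers the rollout question: before changing an interface, look up who calls it and which of those callers lag behind.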
Managing authentication becomes a big job. We do a lot with custom HTTP headers on this (again, one of the main topics in my Railsconf talk), but I've got my eye out for even more sophisticated solutions. OAuth is one that has piqued my interest.
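As a rough sketch of header-based authentication between internal apps, here is a Rack-style middleware written in plain Ruby. The header name (X-Internal-Token), the shared secret, and the class name are all hypothetical; this only illustrates the general pattern, not the scheme Heroku uses.

```ruby
# Rack-style middleware: passes the request through only when the expected
# custom header is present. Header name and secret are hypothetical.
class InternalTokenAuth
  def initialize(app, secret)
    @app = app
    @secret = secret
  end

  def call(env)
    # Rack exposes the request header X-Internal-Token under this key.
    token = env['HTTP_X_INTERNAL_TOKEN']
    if token == @secret   # a production version would compare in constant time
      @app.call(env)
    else
      [401, { 'Content-Type' => 'text/plain' }, ["unauthorized\n"]]
    end
  end
end

# A trivial downstream app, standing in for a real internal service.
hello = ->(_env) { [200, { 'Content-Type' => 'text/plain' }, ["hello\n"]] }
protected_app = InternalTokenAuth.new(hello, 's3cret')

p protected_app.call({ 'HTTP_X_INTERNAL_TOKEN' => 's3cret' }).first
p protected_app.call({}).first
```

Because the check lives in middleware, each small app can share it without knowing anything about the others.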
Perhaps the hardest question is where to draw the dividing line between one app and the next. Does a given model go in app A or app B? And that requires a lot of hard thinking about your design. An app should do just one thing, and without having to touch other apps too much. This often means splitting an app apart, and occasionally fusing two apps back together. This is the same process as managing object classes within an app: as each item's responsibility within the architecture changes, code moves around.
I find it useful to think of each internal component as its own service that could potentially be spun off as its own company. If it has its own code repo, database, tests, API, and docs, then turning it into a standalone service would just be a matter of giving it a slick marketing name and putting up a website. It's not that you'd want to do this, mind you: but if your service architecture is designed well, it would be easy to.
Firefox REST Plugin
May 12, 2008 at 03:02 PM
While we wait for web browsers to become fully REST-capable, Poster is a handy Firefox plugin for sending any type of HTTP request, including all four verbs and different content types. I usually use rest-client at a rush shell for one-offs; but if you need your browser's cookies for a call that can't be authenticated with http basic auth, or you just want a dialog that shows all the options visually, Poster is quite handy.
Model + Controller = Unified Resource?
April 16, 2008 at 12:05 PM
Harry Fuecks thinks MVC might be overrated. In a world of web resources, why not combine models and controllers together? Imagine replacing ActiveRecord (or my new favorite, DataMapper) with a server-side version of ActiveResource. GET, POST, PUT, and DELETE are filled in automatically, and you can extend their functionality and add support methods.
He also argues that what we really want is RUD, not CRUD. The distinction between create and update is subtle, and hardly important if you think of resources as documents. (Document-oriented databases are poised to replace relational databases - we may soon be thinking in terms of documents instead of records even on the backend.) Think of your filesystem, the seminal document-oriented database: when you write a file, does it matter if one exists already? Not usually. File.open('test.txt', 'w') { |f| f.puts "hello" } behaves the same way regardless of whether a file by that name previously existed or not.
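The filesystem point is easy to check. This small sketch wraps the one-liner in a helper (put_document is a name made up here) and calls it twice against a temporary directory, once when the file is missing and once when it exists:

```ruby
require 'tmpdir'

# Write a document, replacing whatever was there before.
def put_document(path, body)
  File.open(path, 'w') { |f| f.puts body }
end

Dir.mktmpdir do |dir|
  path = File.join(dir, 'test.txt')

  put_document(path, 'hello')   # no file yet: effectively a "create"
  first = File.read(path)

  put_document(path, 'hello')   # file exists now: effectively an "update"
  second = File.read(path)

  puts first == second          # prints true: same call, same resulting document
end
```

The caller never has to ask which case applies, which is the whole argument for dropping the C from CRUD.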
This server-side ActiveResource might look something like:
    class Post < Resource
      has_one :author
      has_many :comments, :dependent => :destroy

      def get
        super
      end

      def put
        super
      end

      def delete
        super
      end
    end
It provides both the controller and the model. The superclass for each of the three public-facing methods handles writing it to the backend store; the relationships described by has_* give both the routing config and the database relationships.
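For the curious, a toy version of such a Resource superclass can be sketched in a few lines. This is only one possible reading of the thought experiment: the in-memory hash store, the id argument, and the method signatures are assumptions, the has_* relationships are left out, and none of it comes from ActiveResource or any real library.

```ruby
# A toy Resource superclass: one class acts as both model and controller.
class Resource
  def self.store
    @store ||= {}   # one hash per subclass, keyed by resource id
  end

  def initialize(id)
    @id = id
  end

  # GET: return the stored document, or nil if there is none.
  def get
    self.class.store[@id]
  end

  # PUT: write the document whether or not one already exists (RUD, not CRUD).
  def put(document)
    self.class.store[@id] = document
  end

  # DELETE: remove the document and return it.
  def delete
    self.class.store.delete(@id)
  end
end

class Post < Resource
  # Extend a verb by wrapping super, in the spirit of the class above.
  def put(document)
    super(document.merge(updated: true))
  end
end

post = Post.new(1)
post.put({ title: 'Hello' })
p post.get
post.delete
p post.get
```

Swapping the hash for a real backend store would be the superclass's job; the subclass would stay as short as it is here.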
This is more speculation than a real idea, but I found it to be an engaging thought experiment.
to_xml
March 16, 2008 at 07:57 PM
When serving up an ActiveRecord as a resource, you can pass options like :only and :except to to_xml. But in most cases, you want the same fields served every time - on create, update, and get. So put the options into the class itself:
    def to_xml(options={})
      options[:only] ||= [ :name, :width, :height, :created_at ]
      super
    end
REST Enlightenment
March 14, 2008 at 11:37 PM
REST appealed to me right from the get-go. But it's taken me a surprisingly long time to wrap my head around all of its implications. Some of my recent projects - building the fully RESTful Heroku API, working with Mike Clark on the Nested Resources recipe for his upcoming book, and writing rest-client - have allowed me to finally get a handle on how all the principles of REST fit together into a unified whole.
If you're a longtime member of the Church of REST, what I'm posting here may seem obvious. But if you're like I was six months ago - that is, a little confused by how you could make a useful application with only four verbs - then read on.
Raw HTTP
REST is hard to get precisely because it is so simple. It just seems like there isn't much to it. But it turns out that there are some subtle implications hidden within its seemingly straightforward principles.
First, let's get the easy one out of the way. REST is about URLs and HTTP. This part makes sense to most people right away. Hitting a URL is something that can easily be done from almost any language or programming environment. It's extremely transparent (and thereby discoverable and easy to debug) due to the simple nature of URLs and the fact that you can use a web browser to do simple testing. And using HTTP as the transport gives you benefits like virtual hosts and easy proxying, without requiring a separate daemon besides your webserver. In short, REST is the web, and this makes it vastly easier to use than the various RPC protocols whose corpses now litter the highway of software history.
But don't get tripped up on this - there's much more to REST than just HTTP. This part of it doesn't really have a name, but it should. "Raw HTTP," maybe? Raw HTTP does beat the heck out of SOAP, so it's a step in the right direction. But true REST devotees make fun of APIs which use only one HTTP verb (GET) and have their own custom verb embedded in the URL. Amazon is (in)famous for this, yielding SimpleDB calls that look like this:
GET https://sdb.amazonaws.com/?Action=PutAttributes&DomainName=MyDomain&ItemName=Item123&Attribute.1.Name=Color&Attribute.1.Value=Blue
A true REST API would look more like this:
PUT https://sdb.amazonaws.com/MyDomain/Item123
...with an input body containing an XML (or even CGI) representation of the data being updated. (Subbu Allamaraju explores what's wrong with the SimpleDB API in detail.)
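To see how that PUT breaks down, here is a sketch that builds (but does not send) such a request with Ruby's standard Net::HTTP. The build_put helper is a name invented here, and the XML body is a made-up representation, not SimpleDB's actual format.

```ruby
require 'net/http'
require 'uri'

# Build a PUT request without sending it, so its parts can be inspected.
def build_put(url, payload, content_type)
  request = Net::HTTP::Put.new(URI(url))
  request['Content-Type'] = content_type
  request.body = payload
  request
end

request = build_put('https://sdb.amazonaws.com/MyDomain/Item123',
                    '<item><color>Blue</color></item>',   # hypothetical XML
                    'application/xml')

puts request.method            # the verb
puts request.path              # the resource
puts request['Content-Type']   # a header describing the body
puts request.body              # the data being updated
```

Nothing about the action lives in the URL any more: the verb says what to do, the path says what to do it to, and the body carries the data.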
Headers, Verb, Resource, Payload
So true REST calls divide up the HTTP protocol space into four sections: the headers, the verb, the resource, and the body. (I like to call the input body the "payload," which makes its purpose much clearer to me.) When you're designing a REST API, you have to make sure that each piece of data goes into the correct section, or else things turn into a mess. Let's look closer at each.
The headers are meta information about the request. Most are used by the webserver and don't concern the web app or its API, but two (Content-type and Accept) are important, since they describe the nature of the input body and output body.
The verb is the HTTP method: GET, POST, PUT, or DELETE. These map exactly to the standard CRUD operations: GET = read, POST = create, PUT = update, DELETE = delete. These are the only verbs you can use, and that's all there is to it. This rigid uniformity is a plus, but it takes a while to learn to adapt to it. More on that later.
The resource is what is described by the URL. It's a noun, and by itself does not describe any action. You can do things to a resource, but a resource itself doesn't do anything. (Hence the clever term "RESTful resource" - the resource is at rest, not moving.) Amazon and others who write raw HTTP APIs get flak for calling their APIs REST when in fact everything goes into the URL, ignoring the other three sections.
The payload is the input body of a POST or a PUT. (The other two verbs don't have payloads.) This point turned out to be key for my understanding of how to use REST in the real world. The payload is data you are sending to the server, for use in creating or updating a resource. The simplest example is a form full of data, but its uses are actually much more diverse and flexible than just forms.
The flexibility of payloads comes from the Content-Type header. With a standard web form, your content type will usually be application/x-www-form-urlencoded. This basically makes the payload of the POST be a bunch of CGI parameters. For years I thought of POST and its ability to stuff the parameters into the payload as nothing more than a way to avoid cluttering the URL. But once you start using different content types with your payloads for POST (and PUT), a whole new world of possibilities opens up.
For example, to upload a photo into a photo collection, you might do:
POST http://myphotos.example.com/users/adam/photos
...with the content-type set to image/jpg and the payload containing the binary data of the image.
How about importing an archive of images? Use the exact same resource! Just set the content-type to application/x-gtar and pass a payload containing the binary data of a tarball.
Updating an existing image could be:
PUT http://myphotos.example.com/users/adam/photos/12345
...again with a content-type of image/jpg and the binary data of the image. But what about updating the metadata on the image, like a caption? Same URL, but switch the content-type, perhaps to application/xml (or perhaps trusty old application/x-www-form-urlencoded).
How about fetching different types of data for a resource? Just mirror the process: it's now a GET, and we'll use Accept to ask for a different type of content in the returned body. For example, you could download the image by setting the Accept header to image/jpg; or download a collection of images by setting Accept to application/x-gtar.
This distinction between the verb, the resource, and the input and output bodies (each with a content type) is the core of what makes REST both useful and usable.
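On the server side, the same idea can be sketched as a lookup keyed on the verb plus a media type. This is a simplified illustration for the photos collection only: the handler table, the return strings, and the choice of 415 and 406 for unsupported types are assumptions made here, not part of any particular framework.

```ruby
# Map (verb, media type) pairs to handlers for the single photos resource.
# Handlers just describe what they would do.
PHOTO_HANDLERS = {
  ['POST', 'image/jpg']          => ->(body) { "stored 1 photo (#{body.bytesize} bytes)" },
  ['POST', 'application/x-gtar'] => ->(body) { "imported archive (#{body.bytesize} bytes)" },
  ['GET',  'image/jpg']          => ->(_)    { 'sending one image' },
  ['GET',  'application/x-gtar'] => ->(_)    { 'sending tarball of images' }
}.freeze

# For POST the media type comes from Content-Type (describing the payload);
# for GET it comes from Accept (describing the desired response body).
def dispatch(verb, headers, body = '')
  media_type = verb == 'GET' ? headers['Accept'] : headers['Content-Type']
  handler = PHOTO_HANDLERS[[verb, media_type]]
  # 406 Not Acceptable for an Accept we can't satisfy, 415 Unsupported Media
  # Type for a payload we can't read.
  return [verb == 'GET' ? 406 : 415, 'unsupported media type'] unless handler
  [200, handler.call(body)]
end

p dispatch('POST', { 'Content-Type' => 'image/jpg' }, 'fake image bytes')
p dispatch('GET',  { 'Accept' => 'application/x-gtar' })
```

One URL, four behaviors, and no custom verbs anywhere in the path.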
Status Codes
There's one other piece of the HTTP puzzle here, which is the status code returned by a request. In practice I've found there are three codes you really care about in designing or accessing an API: 200, 401, and 422.
200 means success, everything went fine.
401 is unauthorized, meaning that you failed to supply the proper credentials. Typically this means you need to check your http basic auth username and password.
422 is unprocessable entity. In a Rails app, this usually means that the attempt to create or save the record failed, and the returned body contains XML with the error messages.
Status codes can be treated something like the return value on a shell command; if it's bad, you should treat the result differently than if it was normal. Check out the process_result method in rest_client.rb for an example.
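A minimal sketch of that return-value style of handling, covering just the three codes above. The exception classes and the check_response name are hypothetical and written for this example; this is not rest-client's actual code.

```ruby
# Hypothetical error classes, one per failure mode.
class Unauthorized < StandardError; end
class UnprocessableEntity < StandardError; end
class UnexpectedStatus < StandardError; end

# Return the body on success; raise on anything else.
def check_response(code, body)
  case code
  when 200 then body
  when 401 then raise Unauthorized, 'check your basic auth credentials'
  when 422 then raise UnprocessableEntity, body   # body carries the error messages
  else raise UnexpectedStatus, "HTTP #{code}"
  end
end

puts check_response(200, 'ok')

begin
  check_response(422, '<errors><error>Name is taken</error></errors>')
rescue UnprocessableEntity => e
  puts "save failed: #{e.message}"
end
```

Raising on failure means calling code can assume success by default, much as a shell script with `set -e` does.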
Putting It All Together
Now, the big question - how do you make do with only four verbs? The answer is: more resources. In particular, nested resources, but not necessarily full model resources. Rather, think about dividing the domain space of your individual objects into subresources - whether or not this is represented by a separate record in your database.
I'll give you a particularly gnarly example straight from my own real-world experience: Heroku apps. We have a model, App, which maps to the apps table in the main database. This is our central model: everything revolves around the app. As a result, our first pass at the internal API on our backend ended up with an AppsController that had 20+ actions.
These actions make perfect sense: they're the verbs that we need for the App object. Some examples: rename, import, export, purge_database, restart_server. How in the world can we represent these with just CRUD operations?
The answer lies in a proliferation of subresources. To quote Mike Clark, the question that REST is always asking is: What's the resource? And if you let yourself break from a one-to-one mapping between resources and models, things get a lot more flexible. So let's map those actions I mentioned to REST, keeping in mind that we also have payloads with different content types available.
| action | verb + resource | payload |
|---|---|---|
| rename | PUT /apps/myapp | content-type: application/xml |
| import | PUT /apps/myapp | content-type: application/x-gtar |
| export | GET /apps/myapp | accept: application/x-gtar |
| db_migrate | PUT /apps/myapp/rake | payload: db:migrate |
| purge_database | DELETE /apps/myapp/database | |
| restart_server | DELETE /apps/myapp/server | |
Check out the subresources on the bottom three. These aren't necessarily ActiveRecord models, though they could be. But conceptually, they represent a particular area of action. There may or may not be a Database object belonging to the App, but it makes sense to subdivide the problem space by implying a has-a relationship between app and database.
What's more, you may find that designing a clean URL scheme gives a great deal of insight into the design for your internal architecture. If you create a REST subresource that isn't a full object, perhaps you should stop and think about making it one.
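The table of actions above translates almost directly into a routing table. The sketch below is a plain-Ruby illustration, not Rails routing or Heroku's code: the ROUTES structure and the route helper are invented here, and the rake row is simplified to a single :rake action whose task name (db:migrate, say) would arrive in the payload.

```ruby
# Each route: verb, path pattern, optional media type, and the action it replaces.
# The media type stands for Content-Type on a PUT and Accept on a GET.
ROUTES = [
  ['PUT',    %r{\A/apps/([^/]+)\z},          'application/xml',    :rename],
  ['PUT',    %r{\A/apps/([^/]+)\z},          'application/x-gtar', :import],
  ['GET',    %r{\A/apps/([^/]+)\z},          'application/x-gtar', :export],
  ['PUT',    %r{\A/apps/([^/]+)/rake\z},     nil,                  :rake],
  ['DELETE', %r{\A/apps/([^/]+)/database\z}, nil,                  :purge_database],
  ['DELETE', %r{\A/apps/([^/]+)/server\z},   nil,                  :restart_server]
].freeze

# Return [action, app_name] for the first matching route, or nil.
def route(verb, path, media_type = nil)
  ROUTES.each do |route_verb, pattern, route_type, action|
    next unless verb == route_verb
    next unless route_type.nil? || route_type == media_type
    match = pattern.match(path)
    return [action, match[1]] if match
  end
  nil
end

p route('PUT', '/apps/myapp', 'application/xml')
p route('DELETE', '/apps/myapp/server')
```

Twenty-odd controller actions collapse into rows of data, and adding a new one means adding a row rather than inventing a verb.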
Rest Client 0.2
March 11, 2008 at 12:16 AM
Based on Dan Kubb's suggestion, I've implemented an ActiveResource-style accessor for rest-client. This also supports basic auth, so now:
    r = RestClient::Resource.new('http://example.com/photos/', user, pass)
    r.put File.read('pic.jpg'), :content_type => 'image/jpg'
The static methods for raw URL access are still available as well. See the updated RDocs for details.
Rest Client
March 09, 2008 at 08:03 PM
REST is part of the Ruby Way. Which is why I'm surprised that every time I go to access a RESTful resource, I find myself writing some sort of ad-hoc rest client. Net::HTTP is too low level - you've got to write at least three or four fairly dense lines of code even for a relatively simple GET or PUT.
I was banking on ActiveResource being the de facto solution starting with Rails 2, but I was a bit disappointed when it was finally released. Its purpose is fairly narrow - accessing resources that are database-recordish and which operate completely in a certain XML format. But further, it doesn't (as near as I can tell) support nested resources, which cuts out about 70% of what I might want to use it for.
The only other thing I can find is this, which monkey patches open-uri to handle other kinds of verbs. Fine, but still a bit too low level.
While I was toying with Sinatra the other day, I realized that what I wanted was just the client-side equivalent of its controller syntax. So I threw together rest-client.
    require 'rest_client'

    RestClient.get 'http://gemtacular.com/gems'
    RestClient.post 'http://myphotosite.com/users/adam/photos', File.read('pic.jpg'), :content_type => 'image/jpg'
    RestClient.delete 'http://heroku.com/apps/myapp'
The middle one - post (or put) with a payload and a non-XML content-type - is the one that interests me the most, and that I find hardest to do with other libraries. Particularly for a one-off on the command line. I'll usually cobble together a curl command with a bunch of obscure switches that I can never remember. But now, I've just added require 'rest_client' to my .rush/env.rb, so at the rush command line I can instantly access any REST resource on the web with an easy-to-remember one-liner.
I also threw together a test server at rest-test.heroku.com, to try out all the different verbs. It just echoes back the verb, the resource you requested (wildcard routing will match anything), and info about the payload. Here's a session from my rush shell:
    rush> RestClient.get "http://rest-test.heroku.com/some/resource"
    GET http://rest-test.heroku.com/some/resource
    rush> RestClient.put "http://rest-test.heroku.com/some/resource", home['pic.jpg'].contents, :content_type => 'image/jpg'
    PUT http://rest-test.heroku.com/some/resource with a 70335 byte payload, content type image/jpg
    rush> RestClient.delete "http://rest-test.heroku.com/some/resource"
    DELETE http://rest-test.heroku.com/some/resource
gem install rest-client if you'd like to give it a try. RDocs here.