Adam @ Heroku
a tornado of razorblades

Read-Only Source Trees

July 02, 2008 at 03:06 PM

Cloud computing is on everyone's mind because it offers the promise of infinite horizontal scalability. But to achieve this, we have to change how we build applications.

One such change is how we use the filesystem. The filesystem is Unix's database. "Everything is a file" has served us well for decades, and that concept will continue to be critical at the systems layer. But at the application layer, it's time to stop treating the filesystem as a catch-all dumping ground and start treating the data we store there in a more structured way.

An app's main use of the filesystem is storing sourcefiles. What qualifies as a sourcefile? Your code, sure - Ruby, ERB, HTML, JavaScript, CSS, specs/tests, rake tasks. But also small static assets that are part of the application's interface, like public/images/top_left_gradient.png and public/robots.txt. If you check it into revision control, it is probably a sourcefile.

Other than sourcefiles, what do we stick on the filesystem? PIDs and logfiles come to mind: anything that is in tmp/ or log/. This stuff is not source, which is probably why it's in your .gitignore. In my opinion it should not be in your application's directory structure at all.

How about user-uploaded assets, like profile pictures? attachment_fu offers a filesystem backend, which shoves files into your public/ dir. But these files are not source; they're application data. They have more in common with the contents of the database: data specific to a particular installation of the app. Putting this data into your source tree is confusing.

More significantly, it greatly complicates the problem of scaling.

The correct solution, in my opinion, is to forbid the web app from writing to its source tree. Temporary files can be offered through Ruby's Tempfile interface, with the understanding that files created this way are not accessible beyond the lifetime of the request being served.
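To make the Tempfile idea concrete, here is a minimal sketch of request-scoped scratch space. The helper name handle_upload is hypothetical (not a Rails or attachment_fu API); the point is that Ruby's Tempfile.create places the file under Dir.tmpdir, outside the app's source tree, and when given a block deletes the file as soon as the block returns.

```ruby
require "tempfile"

# Minimal sketch, assuming a request handler that needs scratch space.
# handle_upload is a hypothetical helper, not a Rails or attachment_fu API.
# Tempfile.create puts the file in Dir.tmpdir (outside the app's source tree)
# and, when given a block, deletes it as soon as the block returns.
def handle_upload(data)
  Tempfile.create(["upload", ".bin"]) do |f|
    f.binmode
    f.write(data)
    f.flush # make the bytes visible to anything that reopens f.path
    yield f
  end
end

scratch_path = nil
size = handle_upload("hello") do |f|
  scratch_path = f.path
  File.size(f.path) # e.g. hand the path to an image resizer here
end

puts size                      # => 5
puts File.exist?(scratch_path) # => false: gone once the "request" ends
```

Nothing written this way survives past the block, so no request can come to depend on a file left behind by an earlier one.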

Logs are a whole other challenge. I'm not a big fan of logfiles; there are better solutions to the logging problem, which I'll write about some other time. In the meantime, logs should go outside the code tree, in some sort of /var-style location that can be rotated or thrown away as needed. This location could be write-only for the app: it pushes entries in, but it can't read them back or otherwise access them once written. A one-way channel, much like syslog.
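A minimal sketch of that one-way channel using Ruby's standard Logger. LOG_DIR is a hypothetical stand-in for a /var-style directory (a temporary directory is used so the example runs anywhere). Opening the file with mode "a" gives a write-only, append-only handle: the app can push entries in, but reading through that handle raises IOError.

```ruby
require "logger"
require "tmpdir"

# Minimal sketch of a one-way log channel. LOG_DIR stands in for a
# /var-style directory outside the source tree (hypothetical; a temp dir
# is used here so the example runs anywhere).
LOG_DIR = Dir.mktmpdir("app-logs")
LOG_PATH = File.join(LOG_DIR, "production.log")

# Mode "a" opens the file append-only and write-only: the app can push
# entries in, but this handle cannot read anything back.
log_io = File.open(LOG_PATH, "a")
log_io.sync = true # flush each entry immediately
logger = Logger.new(log_io)

logger.info("request served")

can_read_back =
  begin
    log_io.read
    true
  rescue IOError # raised because the handle is not opened for reading
    false
  end

puts can_read_back # => false
```

This only restricts the app's own file handle; a real deployment would enforce the one-way property outside the app, for example with directory permissions or by handing entries to a separate logging daemon.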

As for attachments, asset stores are the correct solution: attachment_fu's :storage => :s3 backend, for example. Storing attachments in the database is reasonable, though I've always found trying to store large binary data there frustrating. Apps on Heroku can use the :storage => :heroku attachment_fu backend.
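The shape of the asset-store approach can be sketched without any particular backend. MemoryAssetStore and its put/get/url_for methods are hypothetical names for illustration only; an in-memory hash stands in for a remote bucket such as the one behind attachment_fu's :storage => :s3.

```ruby
# Minimal sketch of the asset-store idea: uploads go through a small
# put/get interface and come back as URLs, instead of being written under
# public/. MemoryAssetStore and its methods are hypothetical; in production
# an S3-backed store (such as attachment_fu's :storage => :s3) would sit
# behind the same kind of interface.
class MemoryAssetStore
  def initialize(base_url)
    @base_url = base_url
    @blobs = {} # key => bytes; stands in for the remote bucket
  end

  # Store the bytes under key and return the URL clients should fetch.
  def put(key, data)
    @blobs[key] = data.dup.freeze
    url_for(key)
  end

  # Fetch previously stored bytes; raises KeyError for unknown keys.
  def get(key)
    @blobs.fetch(key)
  end

  def url_for(key)
    "#{@base_url}/#{key}"
  end
end

store = MemoryAssetStore.new("https://assets.example.com")
url = store.put("users/42/avatar.png", "fake-png-bytes")
puts url # => https://assets.example.com/users/42/avatar.png
```

Because the app keeps only the key or URL (say, in a database column), any number of app instances can serve the same uploads, which is exactly what files under one machine's public/ directory can't offer.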

As we continue to explore the next generation of application deployment, I think we're going to bump into a number of ways to structure apps differently in order to make them scalable. There will be some transitional pain with these changes, because structure implies restrictions. Many PHP developers coming to Rails have complained about not being able to access sessions from models, or write SQL in their views. MVC creates restrictions, yes, but those very restrictions are what provide the structure. Coming from an unstructured environment, those restrictions may seem cumbersome or arbitrary; but once you're in the habit, you come to appreciate the structure they create.

Cloud Computing Taxonomy

June 16, 2008 at 04:28 PM

Most agree that cloud computing is the Next Big Thing, but beyond that things get murky. Because the space is so new, there's not yet a consensus on what all the pieces are and how they fit together.

Michael Crandell gives a good description of what he considers to be the three tiers of cloud computing: apps, platforms, and infrastructure. (He correctly puts Heroku in the middle tier.)

It gets a bit harder when you try to subdivide each tier. This diagram is one I draw a lot these days, but it's a little different each time, as we continue to discover new challenges in our slice of the cloud computing pie.