Battling Wedged Mongrels with a Request Timeout
June 17, 2008 at 12:06 AM
The dreaded "wedged Mongrel" - your app server stuck on one request, with others piling up, waiting infinitely for it to come free - is a problem all production Rails apps face sooner or later. The solution most commonly used is to restart the app servers frequently, via something like Monit, or just on a cron job.
But such solutions are just band-aids which hide the real problem, which is that your code is getting stuck in an infinite loop, or waiting on an IO request which never returns. A better solution is to wrap all your actions in a timeout:
class ApplicationController < ActionController::Base
  around_filter :timeout

  def timeout
    require 'timeout'
    Timeout.timeout(30) do
      yield
    end
  end
end
This prevents the wedged app server. And combined with an exception notifier, you'll be able to see which requests are getting wedged, so that you can fix your code. (Periodic app server restarts are still needed to combat memory leakage - another problem entirely.)
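To make the "timeout plus notification" idea concrete outside of Rails, here is a minimal sketch in plain Ruby. The helper name with_request_timeout and the labels are made up for illustration, and the warn call is only a stand-in for whatever exception notifier your app uses.

```ruby
require 'timeout'

# Hypothetical helper (not part of Rails): run a block under a time limit and
# report which piece of work was cut off.
def with_request_timeout(seconds, label)
  Timeout.timeout(seconds) { yield }
rescue Timeout::Error
  # A real app would hand this to its exception notifier instead of stderr.
  warn "timed out after #{seconds}s: #{label}"
  :timed_out
end

with_request_timeout(1, 'GET /fast') { :ok }         # returns the block's value
with_request_timeout(0.1, 'GET /slow') { sleep 1 }   # logs, returns :timed_out
```

The label is what tells you which request was wedged, which is the whole point of pairing the timeout with a notifier.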
I'm surprised that request timeouts aren't a standard part of web frameworks like Rails, application servers like Mongrel, or both. (If you've seen the "timeout" parameter for Thin or Mongrel, don't be fooled - it's not that kind of timeout.) After all, web requests aren't supposed to be long-lasting. Nginx or Apache will time out the request after 90 seconds or so anyway, but this doesn't stop your app server from grinding away infinitely on the request.
But there's a catch with Timeout. It uses Ruby threads, which only works as long as it's Ruby code that's getting stuck or taking too long. The other case mentioned earlier - a system call that's getting stuck - is often the problem. So this will time out:
Timeout.timeout(3) do
  sleep 4
  puts 'done'
end
...but this will not:
Timeout.timeout(3) do
  system 'sleep 4'
  puts 'done'
end
Good unix jockeys know that SIGALRM is the correct solution here. Back in my MUD days I encountered this technique in the CircleMUD server: it would detect infinite loops and abort with a log message, allowing the game to continue running. "Wow," I said the first time I saw it in action. "How does it know?" That's the magic of SIGALRM.
Philippe Hanrigou and David Vollbracht have implemented a SIGALRM solution for Ruby in the form of SystemTimer. (They also give a great description of green threads and why they don't play well with the underlying OS.) This is a nearly drop-in replacement for Timeout. Try it:
SystemTimer.timeout(3) do
  system 'sleep 4'
  puts 'done'
end
Woot! So now, your final solution for preventing wedged app servers in production:
class ApplicationController < ActionController::Base
  around_filter :timeout

  def timeout
    require 'system_timer'
    SystemTimer.timeout(30) do
      yield
    end
  end
end
rush, the Ruby Shell
February 19, 2008 at 01:07 PM
The unix shell (bash) and remote login (ssh) are centerpieces of the server and app deployment process. While building Heroku, however, Orion and I became aware that these tools are pretty far out of step with modern, agile development practices.
I've wanted a Ruby-syntax replacement for the unix shell from almost the moment I began using Ruby. Whenever I can, I write shell scripts as Ruby scripts with lots of backticks. But the "everything is text" mechanism starts to show its age when you end up with Ruby code like this:
my_ip = `ifconfig | grep inet | grep -v 127.0.0.1 | grep -v inet6`.match(/inet ([\d.]+)/)[1]
Yergh. (If you've never had occasion to write code like this in the wild, just check out god's process lookup methods.)
What we really want - the modern way - is to query the unix system (filesystem, processes, network, services) as if it were a database. This would avoid the fragility of text pipes and the complexities of firing up a complete new environment on each system call, and would allow unit tests of system-level code.
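As a small taste of the "query it like data" idea, here is a sketch that asks for the machine's IP address using Ruby's standard socket library instead of parsing ifconfig output. This is not rush code; the method name first_lan_ipv4 is made up for illustration, and it assumes a Ruby version whose stdlib provides Socket.ip_address_list.

```ruby
require 'socket'

# Return the first non-loopback IPv4 address as a string, or nil if the
# machine has none. Each address is an object we can ask questions of,
# rather than a line of text we have to pattern-match.
def first_lan_ipv4
  addr = Socket.ip_address_list.find { |a| a.ipv4? && !a.ipv4_loopback? }
  addr && addr.ip_address
end

puts first_lan_ipv4.inspect
```

There are no pipes and no regexp here, and nothing breaks if the output format of ifconfig changes.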
This is why I've created rush. It's a replacement for bash and ssh which uses Ruby syntax. More than that: it IS ruby. Imagine an irb shell in which you can do everything you can do at the unix command line, but without any backticks. That's the vision; what I've got so far is a good start in that direction.
I said it replaces ssh, so this isn't just a shell: you can use it to control an arbitrary number of remote boxes, using the exact same interface as you would locally. Copy a file, or grep through a logfile, or kill a process - whether the machine is remote or local, the interface is identical.
Unlike the character-based connection of ssh, the rush client connects to the rushd process on the remote server and passes discrete commands. This is very similar to connecting to a remote database. When you run a SQL query, it makes no difference to the programmer whether the connection is a remote box or a local one; the client handles this seamlessly. You can even connect to multiple databases from the same client. rush goes even a step further by allowing you to pass data seamlessly between any number of local and remote connections.
A quick example:
local = Rush::Box.new('localhost')
remote = Rush::Box.new('my.remote.server')
local['/etc/hosts'].copy_to remote['/etc/']
Check the rush website for more examples and to try it out.
One of the inspirations for rush as a shell was this preview of MSH, the Microsoft shell. I get the feeling that this is vaporware (though I don't really know, not being in the Microsoft world at all), but the concepts introduced in the preview really struck home. Treating data returned from shell commands - like file matches from grep or processes from ps - as discrete objects, rather than text which can be parsed, is the obvious next evolution for shells.
There are some other deficiencies in the bash+ssh model:
- Consistency. Bash is a full-fledged programming language; more specifically, it's a DSL for managing a unix system. But it could also be considered a collection of smaller languages. Standard tools like cp, mv, ps, grep, sed, and sort all have their own unique syntax. You may combine several of these in a single command, which is a bit like mixing several different programming languages on one line. I've been using unix shells on a daily basis for well over a decade, and still I sometimes forget the syntax for a particular command. Compare this to Ruby, or any other modern scripting language, where just a few months of working with the language is enough to teach you 90% of the language's syntax.
- Quoting. Bash commands often have many layers of quoting. Consider:
ssh remote "rm `grep '^class Thing' lib/* -l`"
This has four layers of quoting: the bash command line on the client, the bash command line on the server, the backticks, and the regexp. This leads to confusion (do I need one backslash or two to escape this quote character?) and is riddled with security holes.
- Quirks and limitations. Two that I frequently bump into are running out of command line buffer space with backticks, and filenames with spaces in them. The first shows up with a command such as:
grep some_method `find . -name \*.rb`
On a large project, you'll need to rewrite this with xargs:
find . -name \*.rb | xargs grep some_method
If the directory has filenames with spaces in them, you have to use the null separator option on both find and xargs:
find . -name \*.rb -print0 | xargs -0 grep some_method
Ick. In rush, this would be:
dir['**/*.rb'].search(/some_method/)
- Exceptions. Bash commands have three outputs: stdout, stderr, and the shell return value. Most of the time you're only interested in one and can ignore the others. But for more advanced uses, you need two, or perhaps all three. Explicitly checking for return values (or worse, pattern matching against stderr) is not a lot of fun. Exceptions are the modern way to handle errors.
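The last two points can be sketched in plain Ruby using only the standard library. This is not how rush implements them; the names CommandFailed, run! and files_matching are hypothetical, chosen for this example.

```ruby
require 'open3'

# Hypothetical error class for a command that exits non-zero.
class CommandFailed < StandardError; end

# Run a command and raise instead of handing back an exit status to check.
# Passing the program and each argument separately means no shell is
# involved, so there are no layers of quoting to get wrong.
def run!(*argv)
  stdout, stderr, status = Open3.capture3(*argv)
  unless status.success?
    raise CommandFailed, "#{argv.first} exited #{status.exitstatus}: #{stderr.strip}"
  end
  stdout
end

# Plain-Ruby stand-in for find | xargs grep. Filenames never pass through a
# shell, so spaces in them need no special handling.
def files_matching(dir, pattern)
  Dir.glob(File.join(dir, '**', '*.rb')).select { |path| File.read(path) =~ pattern }
end
```

A caller can then write run!('ls', some_dir) inside an ordinary begin/rescue block, and stderr shows up in the exception message rather than needing to be pattern-matched.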
Go give it a try, and then tell me what you think.
Remote Filesystems
January 20, 2008 at 07:06 PM
The ability to mount a remote filesystem and read and write to it is something everyone has wanted to do for a long, long time. SMB was an ok solution in the 90s, at least for users with minimal needs. Another dinosaur that we all love to hate is NFS. Despite its synchronization and locking issues, NFS actually works pretty well. But it's a real headache when it comes to firewalls and certainly is not suitable for use over the open internet. Sysadmins will always tell you "don't use NFS, it sucks," but when you ask for a better alternative they just shrug and hastily excuse themselves from the conversation.
More recently, it seemed like WebDAV would be the answer. It uses a standard protocol (HTTP) and ties in nicely with the vision for a read/write web. It works perfectly with firewalls and can be encrypted with HTTPS. All major operating systems have out-of-the-box support for mounting WebDAV shares. Most publicly readable svn repositories for open source projects, such as Rails plugins, use WebDAV.
And yet, this technology has not come into its own. It's still a huge headache to configure. Apache requires several add-on modules, an obscure syntax that tends to break in mysterious ways, and user management which requires a combination of editing your Apache config and the curmudgeonly htpasswd command line tool. Nginx has a DAV module, but it doesn't actually work. (I checked the source - it's missing major portions of the protocol like PROPFIND and OPTIONS.) Apache is the only real option for serving DAV, and who wants to run Apache anymore?
And all of that isn't even counting all the security issues of having write access to your source repositories controlled by the webserver. For example, a simple injection vulnerability in some PHP script running on another site hosted on the same server could grant the attacker immediate, full write access to the source repositories hosted there. (Some solutions: run a separate Apache process as a different user to serve DAV, or use virtualization to make a separate server instance for your source repos. Still, none of this is what I would call a cakewalk.)
Hmmm. Maybe the fact that people have been trying to build remote filesystem protocols for the last two decades and have yet to make a good one means something. Perhaps mounting remote filesystems is a fork in the tree of technology that should be abandoned. Great, but then what? We still need to host source control repos, share files, and manage content; and all of these things basically require filesystem (or filesystem-like) operations. The fact that there is such a profusion of protocols supported by FUSE proves the deep user thirst for remote filesystem mounting.
If a remote filesystem protocol is ever going to succeed, it needs to be simple. WebDAV was a good try but it's not simple enough. Could we make another pass, again using HTTP, but taking some of the lessons learned from REST and further paring down the features and capabilities?
So let's forget about permissions, symbolic links, and resource forks. We just want to get the list of files in a folder, and to read, write, and delete those files. Sound familiar? It's CRUD.
How about:
GET http://example.com/repo/

<entries>
  <folder>app</folder>
  <file size="307">Rakefile</file>
  <file size="8819">README</file>
</entries>
Many things would go under the axe in this hypothetical protocol. Any kind of linking, for example. Hard and symbolic links, certainly, but also those old standbys, "." and "..". Both of these values can and should be determined from the path name.
There are two types of objects we're operating on: folders and files. Filesystem tools often blur the line between these. In the unix shell, for example, rm -rf <dir> will delete a directory and its contents, but rm <dir> will not delete an empty directory. For that you need rmdir. But rmdir won't delete the directory if it's not empty. Consistency is overrated anyway, right?
I'd be in favor of further paring down the operations by removing the ability to do any direct operations on folders, and instead create and remove them dynamically based on pathnames. So if an operation tries to write to a file in a folder that doesn't exist, it and all the parent folders will be created: the equivalent of mkdir -p. Once the last file is removed from a folder, it no longer exists. Renaming a folder would become an expensive operation on a large tree, since what you're really doing is renaming the paths of every contained file; but this may be a fair trade-off. Empty folders would not be possible. (I detect a hint of git's philosophy creeping into my reasoning on this part.)
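To see how far "folders are just path prefixes" gets you, here is a small in-memory model of the idea. It is only a sketch of the semantics, not a proposal for the wire protocol; the class name PathStore and its method names are invented for this example.

```ruby
# Hypothetical in-memory model of a store with files only. Folders are never
# created or removed directly; they exist exactly as long as some file path
# runs through them.
class PathStore
  def initialize
    @files = {} # full path => contents
  end

  # Writing implies mkdir -p: parent folders "exist" once the path does.
  def write(path, data)
    @files[path] = data
  end

  def read(path)
    @files.fetch(path)
  end

  # Removing the last file in a folder makes the folder disappear.
  def delete(path)
    @files.delete(path)
  end

  # Immediate children of a folder, derived purely from file paths.
  def list(folder)
    prefix = folder.end_with?('/') ? folder : folder + '/'
    children = @files.keys.select { |p| p.start_with?(prefix) }.map do |p|
      name, deeper = p[prefix.length..-1].split('/', 2)
      deeper ? [:folder, name] : [:file, name]
    end
    children.uniq.sort_by { |_type, name| name }
  end

  # The expensive operation: every contained file's path is rewritten.
  def rename_folder(from, to)
    from_prefix = from.end_with?('/') ? from : from + '/'
    to_prefix = to.end_with?('/') ? to : to + '/'
    @files.keys.each do |path|
      next unless path.start_with?(from_prefix)
      @files[to_prefix + path[from_prefix.length..-1]] = @files.delete(path)
    end
  end
end
```

Note that list never consults a directory table, and that rename_folder touches one entry per contained file, which is the trade-off described above.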
There are a few other types of operations that filesystems are good at, but stateless web requests aren't. Two of these likely to affect developers are tailing a log and appending to a log. If you have a log which is gigabytes in size, rewriting it every time you want to append a line is impractical. As is fetching it continually as you look for recently appended lines.
In this case I'd argue that this approach to logging should be abandoned, in favor of one that treats each log as a discrete object. I've been experimenting with this on Heroku by storing system logs into a database table. There are some challenges here in managing the large datasets produced; but dealing with huge logfiles isn't fun either, it's just a problem that has more prior art (e.g. logrotate).
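Here is a rough sketch of what that could look like, assuming each log entry becomes one row with an increasing id. An in-memory array stands in for the database table, and the class name LogTable is made up; this is not the Heroku implementation.

```ruby
# Hypothetical in-memory stand-in for a log table: one row per entry.
class LogTable
  Entry = Struct.new(:id, :message)

  def initialize
    @entries = []
  end

  # Appending adds one row; nothing already stored is rewritten.
  def append(message)
    entry = Entry.new(@entries.length + 1, message)
    @entries << entry
    entry.id
  end

  # "Tailing" becomes a query for rows newer than the last id the reader saw.
  def since(last_id)
    @entries.select { |e| e.id > last_id }
  end
end
```

Both operations fit a stateless request: an append carries one row, and a tail carries only the last id the client has seen.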
One final point: locks. This has been the bane of NFS and I think DAV has some unpleasant complexity in this department as well. I'm going to once again put this item up on the chopping block. Making remote filesystems is a hard problem, and locking is a hard problem: I don't see any reason to combine them and make things worse. Just because we're used to being able to get locking from the filesystem doesn't mean things should stay that way. Separation of concerns.
Maybe what I'm describing here is a document-oriented database. Since there's progress being made in that department, maybe we should all just sit tight with our crufty Apache WebDAV setups and wait for the day when filesystems are no longer relevant.
Or maybe hierarchical filesystems are soon to be outmoded, and what we really want is a document database with lookup by tag. So when you run your Rails app it doesn't look in app/models/*.rb for your models, but instead just queries the document database for all documents with the tags "ruby", "model", and a tag for the project name.
Enough daydreaming for one day.
SSH Tunnels
December 28, 2007 at 10:39 PM
And now for one of my favorite bits of black magic from the unix poweruser's toolkit: ssh tunnels.
Like most good tricks, this one is simple. It lets you bounce TCP traffic through an ssh connection. This is handy in a variety of situations, but the one I've used it most often for is to access a website which is available only inside of a corporate LAN. If you've got external ssh access, you can set up a tunnel that will let you point your browser at a local port to access that site.
The syntax is:
ssh -L [local_port]:[site_you_want_to_reach]:[remote_port] [ssh_host] [command]
This can be a little confusing, but there are actually only two variables that matter. So first I'll fill in the defaults you'll probably always want to use:
ssh -L 9999:[site_you_want_to_reach]:80 [ssh_host] "sleep 9000"
Much better. Now we just need to know the site we want to go to, and the host we will tunnel it through. For example, let's say we want to view RubyInside, but tunneled through your remote server example.com:
ssh -L 9999:rubyinside.com:80 user@example.com "sleep 9000"
You can now browse to http://localhost:9999/ and see RubyInside. (This won't work if the remote site uses named virtual hosts. You can remedy this by adding the hostname (like rubyinside.com) to your /etc/hosts as an alias for localhost, and then point your browser to http://rubyinside.com:9999/.)
Since RubyInside is public, this example is not actually useful - but now that you see how it works, let's look at how it can be used to view a Rails app running inside a LAN on someone's workstation. Let's say the mongrel is running on port 3000 and that the workstation is devbox.localnet, but devbox.localnet is only accessible from your ssh bouncepoint, which we'll call bouncepoint.example.com.
ssh -L 9999:devbox.localnet:3000 user@bouncepoint.example.com "sleep 9000"
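Since only two of the values ever change, the command is easy to wrap in a helper. This is a sketch, with a made-up method name (tunnel_command) and keyword names, that fills in the same defaults used above and returns the argument list for ssh.

```ruby
# Hypothetical helper: build the argument list for an ssh tunnel.
# Defaults mirror the ones used in this post: local port 9999, remote
# port 80, and "sleep 9000" as the command that holds the tunnel open.
def tunnel_command(target_host:, ssh_host:, target_port: 80,
                   local_port: 9999, command: 'sleep 9000')
  ['ssh', '-L', "#{local_port}:#{target_host}:#{target_port}", ssh_host, command]
end

args = tunnel_command(target_host: 'devbox.localnet', target_port: 3000,
                      ssh_host: 'user@bouncepoint.example.com')
puts args.join(' ')
# To actually open the tunnel you could pass the list to system(*args).
```

Keeping the pieces as a list rather than one string means the command never needs shell quoting when it is handed to system.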
ssh tunnels can also be used to bypass those silly content filters that some companies insist on installing on their LANs. This trick requires that you be able to send outgoing TCP traffic, and that you have a remote ssh server listening outside the LAN. (If port 22 is blocked, you'll need to find one that isn't, and then get your remote ssh server to listen on that port.) By tunneling all your web traffic through ssh, you can bypass any content filters - and the traffic between your machine and the ssh server is encrypted, so nobody on the local network can eavesdrop on it.
For example, if you wanted to view the latest Penny Arcade comic, but found that the client site you're working on for the day blocks it, you might use your remote VPS host to set up a tunnel like this:
ssh -L 9999:penny-arcade.com:80 vpsuser@vpshost.com "sleep 9000"
One final note, in case you're wondering what the sleep 9000 is for: the tunnel stays open only as long as the remote command is running, and when the command is done executing, ssh exits. So you have to give a command that will stall. I use 9000 because it's plenty long and easy to type. (When you're done, just hitting Ctrl-C in the terminal that's running the tunnel will close it.) But if you wanted it to last indefinitely, you could use while [ 1 ]; do sleep 100; done.