Adam @ Heroku
a tornado of razorblades

Battling Wedged Mongrels with a Request Timeout

Posted by Adam Wiggins on June 17, 2008 at 12:06 AM

The dreaded "wedged Mongrel" - your app server stuck on one request, with others piling up, waiting infinitely for it to come free - is a problem all production Rails apps face sooner or later. The solution most commonly used is to restart the app servers frequently, via something like Monit, or just on a cron job.

But such solutions are just band-aids which hide the real problem: your code is getting stuck in an infinite loop, or waiting on an IO request which never returns. A better solution is to wrap all your actions in a timeout:

class ApplicationController < ActionController::Base
  around_filter :timeout

  def timeout
    require 'timeout'
    Timeout.timeout(30) do
      yield
    end
  end
end

This prevents the wedged app server. And combined with an exception notifier, you'll be able to see which requests are getting wedged, so that you can fix your code. (Periodic app server restarts are still needed to combat memory leakage - another problem entirely.)
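
To make the "see which requests are getting wedged" part concrete, here is a minimal sketch in plain Ruby, outside of Rails. The helper name with_request_timeout, its arguments, and the log message are all made up for illustration; in a real app an exception notifier would take the place of the logger.

```ruby
require 'timeout'
require 'logger'

# Hypothetical helper (not part of Rails): run the block under a time limit,
# log which request got stuck, then re-raise so normal error handling applies.
def with_request_timeout(label, seconds, logger)
  Timeout.timeout(seconds) { yield }
rescue Timeout::Error
  logger.error("request timed out after #{seconds}s: #{label}")
  raise
end
```

Calling something like with_request_timeout('GET /reports', 30, Logger.new($stderr)) { ... } would leave a log line naming the slow request before the error propagates to whatever handles exceptions in your app.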

I'm surprised that request timeouts aren't a standard part of web frameworks like Rails, application servers like Mongrel, or both. (If you've seen the "timeout" parameter for Thin or Mongrel, don't be fooled - it's not that kind of timeout.) After all, web requests aren't supposed to be long-lasting. Nginx or Apache will time out the request after 90 seconds or so anyway, but this doesn't stop your app server from grinding away infinitely on the request.

But there's a catch with Timeout. It relies on Ruby's green threads, which only get a chance to run while Ruby code is executing, so it works only as long as it's Ruby code that's getting stuck or taking too long. The other case - a system call that never returns - is often the real problem. So this will time out:

Timeout.timeout(3) do
  sleep 4
  puts 'done'
end

...but this will not:

Timeout.timeout(3) do
  system 'sleep 4'
  puts 'done'
end

Good unix jockeys know that SIGALRM is the correct solution here. Back in my MUD days I encountered this technique in the CircleMUD server: it would detect infinite loops and abort with a log message, allowing the game to continue running. "Wow," I said the first time I saw it in action. "How does it know?" That's the magic of SIGALRM.
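
For the curious, the signal-driven idea can be sketched in plain Ruby. This is only an illustration under some loud assumptions: as far as I know Ruby's core doesn't expose alarm(2) directly, so a forked child process stands in for the kernel timer and sends SIGALRM to the parent; the helper name with_sigalrm_timeout is made up; and it needs a platform with fork. It is not how any particular library does it.

```ruby
require 'timeout'

# Hypothetical helper: abort the block when SIGALRM arrives.
def with_sigalrm_timeout(seconds)
  parent = Process.pid
  # The handler runs in the main thread and raises into whatever it is doing.
  previous = Signal.trap('ALRM') { raise Timeout::Error, "timed out after #{seconds}s" }
  # Stand-in for alarm(2): a child process that signals the parent later.
  timer = fork do
    sleep seconds
    Process.kill('ALRM', parent)
    exit!(0) # skip any at_exit hooks inherited from the parent
  end
  yield
ensure
  if timer
    begin
      Process.kill('KILL', timer) # cancel the timer if it has not fired yet
    rescue Errno::ESRCH
      # timer child is already gone
    end
    Process.wait(timer)
  end
  # Restore the old handler only after the timer child can no longer signal us.
  Signal.trap('ALRM', previous || 'DEFAULT')
end
```

Wrapping a block such as with_sigalrm_timeout(3) { system 'sleep 4' } is the case this is aimed at: the signal is delivered by the OS, so it doesn't depend on the Ruby thread scheduler getting a turn.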

Philippe Hanrigou and David Vollbracht have implemented a SIGALRM solution for Ruby in the form of SystemTimer. (They also give a great description of green threads and why they don't play well with the underlying OS.) This is a nearly drop-in replacement for Timeout. Try it:

SystemTimer.timeout(3) do
  system 'sleep 4'
  puts 'done'
end

Woot! So now, your final solution for preventing wedged app servers in production:

class ApplicationController < ActionController::Base
  around_filter :timeout

  def timeout
    require 'system_timer'
    SystemTimer.timeout(30) do
      yield
    end
  end
end
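
If you're not on Rails, the same idea can be sketched as a piece of Rack middleware. Treat this as a sketch under assumptions rather than a tested recipe: the class name RequestTimeout and the 30-second default are invented here, and it uses the standard library's Timeout so that it runs without any gems, which means it carries the same caveat about stuck system calls described above. Swapping in SystemTimer.timeout should be a one-line change.

```ruby
require 'timeout'

# Hypothetical Rack middleware; the name and default are illustrative only.
class RequestTimeout
  def initialize(app, seconds = 30)
    @app = app
    @seconds = seconds
  end

  # Raises Timeout::Error if the downstream app takes longer than @seconds.
  def call(env)
    Timeout.timeout(@seconds) { @app.call(env) }
  end
end
```

In a rackup file this would be wired in with something like `use RequestTimeout, 30` ahead of the app itself.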

Tags: rails, unix

Comments

There are 7 comments on this post.

Do you also have the same memory leakage problem with Sinatra?

Ethan

"your app server stuck on one request, with others piling up, waiting infinitely for it to come free"

I would enjoy some clarification on that.

If you're running three mongrels and one is stuck on a query that's returning 10,000 records, doesn't the next request get sent to one of the other mongrels while the first one deals with the long running request?

If not, then what is the point of having more than one mongrel? If mongrel B can't handle a request until mongrel A finishes, then isn't that effectively the same as having only one mongrel?

Another thing that puzzles me about your timeout solution: are you saying that if a request takes too long you just drop it? Doesn't that kind of screw the user who initiated the request? Wouldn't it be better to let the request complete? If you're just dropping requests here and there that doesn't really sound like a "solution."

@s - I've not run into any memory leakage with my Sinatra apps, but to be fair I've used it far less (and for far less complicated apps) than Rails.

@Ethan - Doesn't matter if you have 3 mongrels, 30, or 100. At some point, if you have infinite loops or infinitely stalled IO requests in your code, all your mongrels will get tied up. Having more mongrels helps, but it also hides the root problem.

And yes, when a request takes too long I feel the correct action is to throw an exception. (i.e. The user gets the same "something went wrong" page that they get whenever they hit any other error.)

Requests are already timed out by 1. your frontend webserver (Apache etc) and 2. the user getting bored and hitting "stop". Putting a timeout straight into your Ruby code means you can be aware of these problems, instead of your users experiencing problems with your site but you being unaware of it.

I'm firmly of the belief that stuff should fail fast and loudly, because problems tend to go unnoticed or ignored longer than if things sort of limp along.

I think setting an upper limit on how long a request may take will only result in a better user experience. It'll also force you to think about really solving the problem, by either optimizing the slow actions or offloading some of the work to the background. I'd even argue that 30 seconds is too long to wait before timing out; I tend to give up after 10-20 seconds of no activity in most cases.

stephen

Using SystemTimer.timeout in an around_filter led to our mongrels randomly dying. Your mileage may vary.

stephen, can you give us more information? Did the mongrels die with an exception, or no message in the output log? Did you try with Thin? etc.

Anyone have any instructions for Merb and/or Rack handlers?

In my first attempts to use this library I tried to wrap the call in a begin/rescue block. I wanted to explicitly rescue Timeout::Error so I could return a 504 Gateway Timeout error to the client. However I seem to be unable to rescue the exception.
