Saturday, March 24, 2007

Updated Grails vs Rails Benchmark

So after a couple of people pointed out my naivety in configuring Rails, I decided to re-run the tests. I configured Rails with a cluster of 10 Mongrel processes behind the Pound load balancer, as per Jared's recommendation. However, to make things more equal, I reduced the Grails Tomcat server's thread pool to 10 by setting maxThreads=10 in Tomcat's server.xml.

The result was that Rails' performance degraded in all except the long-running query test, whilst Grails' performance significantly improved in all except the same test. Clearly, since my MacBook has only two cores, giving Rails or Grails more processes doesn't necessarily improve things for the shorter tasks. Check out the updated benchmarks.

Again, I'm no Rails performance-tuning wizard, so if any Rails expert can suggest improvements to the Rails configuration, please don't hesitate to shout.

7 comments:

Unknown said...

Check your memory when running under Mongrel. If your system is starting to swap because of the RAM requirements of so many Ruby processes, that will have hurt the benchmark.

In testing I've done in the past, I didn't see Mongrel slow down like that; rather, I saw minimal increases.

My typical Ruby process size was 30-50MB, which means that for 10 processes you'll need 300-500MB of RAM free so that virtual memory swapping doesn't hurt your benchmark.
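The arithmetic behind that estimate can be sketched in Java. This is only an illustration: the class and method names are hypothetical, the process count of 10 comes from the cluster size in the post, and the per-process sizes are the 30-50MB range quoted above. Nothing here measures real memory use.

```java
// Rough memory estimate for a cluster of identical worker processes.
// All names are illustrative; the figures are the ones quoted in the comment.
class MongrelMemoryEstimate {

    // Total memory in MB if every process uses mbPerProcess.
    static long requiredMb(int processes, int mbPerProcess) {
        return (long) processes * mbPerProcess;
    }

    // True if the estimated total fits in the RAM that is actually free.
    static boolean fitsInRam(int processes, int mbPerProcess, long freeMb) {
        return requiredMb(processes, mbPerProcess) <= freeMb;
    }

    public static void main(String[] args) {
        int processes = 10; // cluster size used in the post
        System.out.println("low estimate:  " + requiredMb(processes, 30) + " MB");
        System.out.println("high estimate: " + requiredMb(processes, 50) + " MB");
    }
}
```

If the free RAM is below the high estimate, swapping becomes a plausible explanation for a slowdown and is worth ruling out before reading anything into the numbers.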

Increasing threads doesn't increase RAM requirements the way increasing processes does. Not that more threads don't eat more RAM; it's just usually an insignificant proportion.

Anonymous said...

I believe Rails runs a significant amount of unnecessary code in 'development' mode. Try -e production.

http://www.alrond.com/en/2007/jan/25/performance-test-of-6-leading-frameworks/#comment-23

masukomi said...

I agree about running Rails in production mode for it to be an even comparison, but I'm also really curious as to why Grails seemed to perform better under Tomcat when you restricted it to 10 threads. I'm assuming the default is more than ten, so... that just strikes me as really odd.

Graeme Rocher said...

I did configure Rails in production mode; that was a copy-and-paste mistake from Jared's blog.

As for Grails & Tomcat, more threads don't necessarily mean better performance on a machine with only two cores.

Jochen "blackdrag" Theodorou said...

Maybe it would be really interesting to actually see the overhead produced by Groovy/Grails compared to JSP or something like that... maybe by doing a secret test ;)

I guess the increased performance with only 10 threads for Grails is related to a synchronization problem in Groovy. There are some parts here that need optimization, especially the MetaClassRegistry and probably the GroovyClassLoader too. I think I also remember that a multiprocess application gets a penalty over a threaded application when run on a multicore CPU, but I am not sure about this. I think it was something to do with the L2 cache; really not sure. A test on a machine with two real CPUs should show it, but then again future processors seem likely to be multicore CPUs. On the other hand, future Ruby will maybe change its threading model too.

Then there is another thing about using multiple parallel processes... the general rule here is that you get increased performance for each additional parallel process. But in real life these processes need to communicate with each other, and they need to be scheduled.

This communication overhead grows with the number of processes, and it might happen that, due to this overhead, the overall performance becomes less than the performance with just a single process. I tried to find a picture showing that... well, just imagine the graph of sqrt(x)-x/100. It grows fast, reaches a maximum of 25 (at x = 2500) and then goes down. Unlike this example, a real case should never reach 0; it is really just to imagine what it looks like. There are surely better people to explain this ;)
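For anyone who would rather compute the shape than imagine it, here is a minimal Java sketch that evaluates sqrt(x)-x/100 and searches for its peak. The class and method names are hypothetical, and the function is only the illustrative curve from the comment, not a model fitted to any benchmark.

```java
// Evaluates the illustrative curve sqrt(x) - x/100 and finds where it peaks.
// This is not a measured scaling model, only the example function from the text.
class ScalingCurveSketch {

    // Value of the curve at x.
    static double curve(double x) {
        return Math.sqrt(x) - x / 100.0;
    }

    // Integer x in [0, maxX] where the curve is largest (first one found on ties).
    static int peak(int maxX) {
        int best = 0;
        for (int x = 1; x <= maxX; x++) {
            if (curve(x) > curve(best)) {
                best = x;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        int p = peak(10000);
        System.out.println("peak at x = " + p + ", value = " + curve(p));
        System.out.println("value at x = 10000: " + curve(10000));
    }
}
```

Setting the derivative 1/(2*sqrt(x)) - 1/100 to zero gives sqrt(x) = 50, so the peak sits at x = 2500 with a value of 25, and the curve falls back to 0 at x = 10000.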

Anyway, I guess it means that Mongrel is using too many processes, while Java is not yet in that state. Maybe the saturated state is above one thread in Java and below one process in Mongrel. I really don't know; running the tests with different numbers of processes/threads would show that.

Then there is one thing about java.util.Random. I am not sure about the performance of rnd in Ruby, but in Java the nextInt method is very slow compared to the rnd function known from many other applications. http://java.sun.com/j2se/1.4.2/docs/api/java/util/Random.html#nextInt(int) explains the reason; rnd normally doesn't do such things. But hey, this is no scientific simulation, and I guess the test doesn't depend much on the performance of Random... which then tells me that you maybe should replace it with other code?
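The linked Javadoc describes nextInt(int) as a loop that rejects some raw values so that every result stays equally likely. A minimal sketch of that idea next to the cheaper modulo shortcut follows. The subclass and its method names are hypothetical and exist only so that the protected next(int) method is reachable; this mirrors the algorithm as documented and is not a copy of any particular JDK's source.

```java
import java.util.Random;

// Contrasts a plain modulo draw with the rejection loop described in the
// Javadoc for Random.nextInt(int). Names are illustrative only.
class BoundedRandomSketch extends Random {

    BoundedRandomSketch(long seed) {
        super(seed);
    }

    // One raw draw reduced with modulo: cheap, but slightly biased towards
    // small values whenever n does not divide 2^31 evenly.
    int naiveBounded(int n) {
        return next(31) % n;
    }

    // Rejection-based draw: retries when the raw value falls into the
    // incomplete final block of size n, which removes the modulo bias.
    int uniformBounded(int n) {
        if (n <= 0) {
            throw new IllegalArgumentException("n must be positive");
        }
        if ((n & -n) == n) { // n is a power of two: take the high bits
            return (int) ((n * (long) next(31)) >> 31);
        }
        int bits;
        int val;
        do {
            bits = next(31);
            val = bits % n;
        } while (bits - val + (n - 1) < 0); // int overflow marks a rejected draw
        return val;
    }

    public static void main(String[] args) {
        BoundedRandomSketch random = new BoundedRandomSketch(42L);
        System.out.println("naive:   " + random.naiveBounded(10));
        System.out.println("uniform: " + random.uniformBounded(10));
    }
}
```

The extra branch and occasional retry are the price of uniformity. Whether that cost is visible in a web benchmark is a separate question that only a measurement would settle.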

The next problem with Random is that, used like this, you won't get repeatable sequences. There is the additional problem that I don't know how to get this working in Rails: since the Ruby processes don't share the same random number sequence offset, a simple use of srand won't solve that for Ruby. Each process would use the same offset, repeating the same numbers. Letting them share it somehow would cost Ruby some performance. On the other hand, the Groovy solution uses a shared object here, but it is not thread safe. Isn't it funny how such a simple piece of code can lead to such problems?
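On the Java side, repeatability within a single JVM is a matter of seeding. A minimal sketch, assuming a hypothetical helper class and an arbitrary seed value of 42; it does not address the cross-process sharing problem described for Ruby, and it says nothing about how the benchmark code actually creates its Random.

```java
import java.util.Arrays;
import java.util.Random;

// Shows that a fixed seed makes java.util.Random produce the same sequence
// on every run. The class name and the seed value are illustrative only.
class RepeatableRandomSketch {

    // Draws `count` values in [0, bound) from a generator created with `seed`.
    static int[] draw(long seed, int count, int bound) {
        Random random = new Random(seed);
        int[] values = new int[count];
        for (int i = 0; i < count; i++) {
            values[i] = random.nextInt(bound);
        }
        return values;
    }

    public static void main(String[] args) {
        int[] first = draw(42L, 5, 100);
        int[] second = draw(42L, 5, 100);
        // Same seed, same sequence: the two arrays are element-wise equal.
        System.out.println("repeatable: " + Arrays.equals(first, second));
    }
}
```

A generator created with the no-argument constructor picks its own seed, which is why unseeded runs are not comparable draw for draw.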

Well, as I said, in the end I would replace that part with other code or keep it like it is. There is no real problem if the data does not really depend on the generated numbers.

oldmoe said...

I know I am too late to join the party, but man, have I been awed by the benchmark results!

Your machine was SWAPPING with the 10 Mongrels + Pound + whatever else you had running. This consumed your 1GB of RAM.

If you care to retry, run only as many Mongrels as your memory can handle (3 would be fine in your case).

And regarding steve's comment above mentioning minimal increases with increasing Mongrels:
Unless there is some sort of I/O congestion, your performance should increase almost linearly with Mongrel instances up to the number of processor cores you have. From that point forward you will start to see minimal increases (depending on your I/O usage patterns).
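The same rule of thumb applies to a thread pool on the Java side. The sketch below sizes a fixed pool to the number of cores the JVM reports and runs a few CPU-bound tasks through it. It is a rough illustration with hypothetical names, uses Java threads rather than Mongrel processes, and does not measure throughput.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Sizes a worker pool to the number of available cores. Names are illustrative.
class CoreSizedPoolSketch {

    // Number of workers to use for CPU-bound work: one per reported core.
    static int workerCount() {
        return Math.max(1, Runtime.getRuntime().availableProcessors());
    }

    // Runs `tasks` identical CPU-bound jobs on `workers` threads and sums their results.
    static long runCpuBoundTasks(int tasks, int workers) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            Callable<Long> job = () -> {
                long sum = 0;
                for (int k = 0; k < 1_000_000; k++) {
                    sum += k % 7;
                }
                return sum;
            };
            List<Future<Long>> results = new ArrayList<>();
            for (int i = 0; i < tasks; i++) {
                results.add(pool.submit(job));
            }
            long total = 0;
            for (Future<Long> result : results) {
                total += result.get();
            }
            return total;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("worker failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        int workers = workerCount();
        System.out.println("workers: " + workers);
        System.out.println("total:   " + runCpuBoundTasks(8, workers));
    }
}
```

Timing runCpuBoundTasks with 1, 2, 3 and more workers on the same machine would be one way to check where the near-linear gains stop.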

Anonymous said...

[...]The offline scripting tool might add some relief here, but it’s still sort of a pain. I’ve had to separate “build” from “configuration” steps.[...]