Cassandra, Thrift, and Fibers in EventMachine

I've been working with Cassandra and EventMachine lately, in an attempt to maximize write throughput for bulk loading situations (and I would prefer not to abandon the pretty Ruby classes I have fronting Cassandra, hence EventMachine rather than hopping over to Java or Scala).

The Thrift client transport for EventMachine requires the use of fibers. The documentation available for how fibers and EventMachine interact is not all that clear just yet, so perhaps documenting my adventures will be of use to somebody else.

A single fiber is traditionally imperative

EventMachine puts the I/O on background threads, but your code interacts with the I/O interface as if it were a traditional blocking operation.

#!/usr/bin/env ruby

require 'eventmachine'
require 'thrift_client'
require 'thrift_client/event_machine'
require 'cassandra'

def get_client
  # Server address is the gem's usual local default; adjust for your node.
  Cassandra.new('Keyspace1',
                '127.0.0.1:9160',
                :transport_wrapper => nil,
                :transport         => Thrift::EventMachineTransport)
end

def write(client, key, hash)
  puts "Writing #{key}."
  client.insert('Standard1', key, hash)
  puts "Wrote #{key}."
end

EM.run do
  # The Thrift calls must happen inside a fiber; from the fiber's point
  # of view they behave like ordinary blocking calls.
  Fiber.new do
    client = get_client
    write(client, 'foo', {'aard' => 'vark'})
    write(client, 'bar', {'platy' => 'pus'})
    EM.stop
  end.resume
end

The Thrift::EventMachine transport performs the actual Thrift network operations (connecting, sending data, receiving data) on a fiber in one of EventMachine's background threads. But it manages the callbacks and errbacks internally, so the client behaves in the usual blocking manner and does not expose the asynchronous delights going on behind the scenes.

Therefore, in the code snippet above, the "foo" row will be inserted first, and then the "bar" row. Every time. The output is always:

Writing foo.
Wrote foo.
Writing bar.
Wrote bar.

The above snippet is contrived, but it makes an important point: given two or more Thrift operations (like Cassandra inserts) that are logically independent of each other, such that their order does not matter, you're not necessarily gaining a lot if those operations happen in the same fiber.
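Incidentally, the blocking façade the transport presents is built on a common EventMachine/fiber idiom: start the non-blocking operation, park the current fiber, and resume it from the callback or errback. Here is a minimal sketch of the idea; the method name is illustrative and this is not the actual Thrift::EventMachineTransport source:

require 'fiber'   # Fiber.current needs this in Ruby 1.9

# Illustrative only: wrap an EM::Deferrable so the calling fiber
# sees a plain blocking call.
def blocking_style_call(deferrable)
  fiber = Fiber.current
  # Resume the parked fiber when the async operation succeeds or fails.
  deferrable.callback { |result| fiber.resume(result) }
  deferrable.errback  { |error|  fiber.resume(error)  }
  # Yield back to the reactor; execution continues here on resume.
  Fiber.yield
end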

For concurrency, use multiple fibers

Now let's replace the above code sample's EM.run block with this:

EM.run do
  @done = 0
  Fiber.new do
    write(get_client, 'foo', {'aard' => 'vark'})
    @done += 1
  end.resume
  Fiber.new do
    write(get_client, 'bar', {'platy' => 'pus'})
    @done += 1
  end.resume
  # Stop the reactor once both fibers have finished their writes.
  EM.add_periodic_timer(1) { EM.stop if @done == 2 }
end
You don't know exactly how this is going to play out, but the typical output demonstrates the concurrent operation of the two fibers involved:

Writing foo.
Writing bar.
Wrote foo.
Wrote bar.

If we were writing a larger number of rows out to Cassandra, we could expect to see a greater variety of interleaving between the respective fibers.

Note a critical difference between the two examples. In the single-fiber example, we issue the EM.stop as the final step of the fiber. Because the single fiber proceeds serially, this makes sense. In the multi-fiber example, things run asynchronously, so we have no way of knowing for sure which fiber will complete first. Consequently, it's necessary to have some means of signifying that the work is done and EM can stop; in this lame example, the @done instance variable acts as this flag. In a more rigorous example, you might use a queue and that queue's size to organize such things, as sketched below.
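For instance, here is one hedged sketch of that queue-based approach, reusing the get_client and write helpers from above; the pool size and row data are made up for illustration. A fixed pool of fibers drains a shared work queue, and the last fiber to finish stops the reactor:

EM.run do
  # Work queue: 100 made-up rows. Fibers on a single thread only swap
  # at I/O yield points, so a plain Array works as the queue here.
  queue   = (1..100).map { |i| ["key#{i}", {'aard' => 'vark'}] }
  pool    = 4
  running = pool

  pool.times do
    Fiber.new do
      client = get_client
      until queue.empty?
        key, hash = queue.shift
        write(client, key, hash)
      end
      # The last fiber to finish shuts the reactor down.
      running -= 1
      EM.stop if running.zero?
    end.resume
  end
end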


Robert said...

Thanks for the interesting article. However, when I run a test and measure execution time, I don't get what the advantage is:

rounds = 100

start = Time.now
EM.run do
  @done = 0
  rounds.times { |x|
    Fiber.new do
      write(get_client_em, 'foo' + x.to_s, {'aard' => 'vark'})
      @done += 1
    end.resume
  }
  EM.add_periodic_timer(1) { EM.stop if @done == rounds }
end
puts "Fiber: " + (Time.now - start).to_s + "s"

start = Time.now
client = get_client
rounds.times { |x|
  write(client, 'bar' + x.to_s, {'aard' => 'vark'})
}
puts "Linear: " + (Time.now - start).to_s + "s"

Fiber: 1.222737s
Linear: 0.066181s (<0.4s if creating a new connection for each request)

Empty Cassandra 0.6.3, 1 node, Ruby 1.9.3, Mac OS X

Am I missing something?

Robert said...

OK, if I have more write-heavy tasks, the fiber version seems to catch up. I can't get it faster than a serial implementation without running into connection errors (125 seems to be the maximum for me), but I get your point...

Ethan Rowe said...


I did some subsequent work with this and never was particularly happy with the performance I was getting in the Ruby space.

Another member of the project team wrote equivalent stuff in C++ and was, unsurprisingly, able to get a considerable improvement in write volume. But ultimately there were performance issues with the Thrift client itself, which I believe is true for Ruby as well.

In any event, given your deployment scenario, you ought to be able to get better overall throughput in the Ruby space through EventMachine, but for a real high-volume processing task, Ruby/Thrift is probably going to disappoint. But if you just need to make a handful of writes/request to Cassandra in a Ruby-based webapp, for instance, then I would think you ought to be able to get some advantage out of EventMachine. Is it going to be worth the trouble, though? I can't say.

So, as usual, it depends on your use case. I should note that this ought to have a bigger impact if your configuration involves writes to multiple nodes (a quorum write, for instance, with a replication factor of 3 or more). In this scenario, I would expect longer I/O waits per write, so having the wait state on a background thread while allowing the main thread to do more processing would be potentially helpful. If you're running Cassandra locally with no replication, it's not representative of a production deployment and the EventMachine stuff could well make no difference: the processor time you want for Ruby-space work while EventMachine parallelizes your I/O is competing with the processing needs of your local Cassandra to service the requests you just issued it.

It's early and I'm foggy-headed, so hopefully something in the above actually makes sense. :)

Thanks for the comments.

Ethan Rowe said...

Clarification: I said that, "given your deployment scenario," you ought to be able to get better throughput. But I don't know your deployment scenario; that should have been "given an appropriate use case and deployment scenario". :)

Robert said...

Hi Ethan,
thanks for the fast and clarifying answer! It does make sense to me.

Since Ruby is not really making use of my quad core, and under full load Ruby + Fauna + Thrift take 80% CPU vs. 20% for the Java process running Cassandra, there is only a chance of better-than-break-even compensation for the EM overhead if Cassandra spends more CPU or system time.

I'm just beginning with Ruby and want to use Cassandra for my database. With Cassandra 0.7 on the one end and Rails 3.0 on the other, it's hard enough to get anything working, so I'm concentrating on system design and backend stuff until, hopefully, things become more stable. So no deployment any time soon...

Any hints for _current_ Ruby/Rails/Cassandra stuff would be very welcome... ;-)

Greetings from Berlin,