Welcome to End Point’s blog

Ongoing observations by End Point people

NoSQL benchmark of Cassandra, HBase, MongoDB

We're excited to have recently worked on an interesting benchmarking project for DataStax, the key company supporting the Cassandra "NoSQL" database for large horizontally-scalable data stores. This was done over the course of about 2 months.

This benchmark compares the performance of MongoDB, HBase, and Cassandra on widely used Amazon Web Services (AWS) EC2 cloud instances with local storage in RAID, in configurations ranging from 1 to 32 database nodes. The software stack included 64-bit Ubuntu 12.04 LTS AMIs, Oracle Java 1.6, and YCSB (Yahoo! Cloud Serving Benchmark), which we used for its lowest-common-denominator NoSQL performance testing features. Seven different test workloads were used to get a good mix of read, write, modify, and combined scenarios.

Because cloud computing resources are subject to "noisy neighbor" situations of degraded CPU or I/O performance, the tests were run 3 times each on 3 different days, with different EC2 instances to minimize any AWS-related variance.
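To make the idea of repeated runs concrete, here is a minimal Python sketch of how throughput figures from several runs of one test configuration could be summarized. It is only an illustration, not the tooling used for the white paper: the function name `summarize_runs`, the choice of mean and standard deviation as summary statistics, and the sample numbers are all assumptions made for this example.

```python
from statistics import mean, stdev


def summarize_runs(throughputs):
    """Summarize ops/sec figures from repeated runs of one test configuration.

    Hypothetical helper: the statistics chosen here are an assumption,
    not the method described in the white paper.
    """
    if len(throughputs) < 2:
        raise ValueError("need at least two runs to estimate spread")
    avg = mean(throughputs)
    spread = stdev(throughputs)  # sample standard deviation across runs
    return {
        "runs": len(throughputs),
        "mean": avg,
        "stdev": spread,
        # Coefficient of variation: a high value hints at a noisy run.
        "cv": spread / avg,
    }


# Placeholder numbers for illustration only, not results from the benchmark.
example_runs = [1000.0, 1100.0, 900.0]
summary = summarize_runs(example_runs)
print(summary)
```

A large coefficient of variation for a configuration would be a hint that one of its runs landed on a degraded instance and may be worth repeating.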

The project involved some interesting automation challenges: repeatedly spinning up the correct numbers and types of nodes, configuring the node software, running tests, and gathering and collating results data. We kept the AWS costs more reasonable by using Spot Instances for most of the nodes.

You can read more at DataStax's white paper page and see all the details in the white paper itself.


Anonymous said...

Just curious - did you come up with methodology and configuration for each DB in this exercise or did Datastax? Presumably they advised you on the best way to configure Cassandra. Who advised you on the best way to configure the other DBs?

Jon Jensen said...

We determined the configuration ourselves; details are in the white paper. There's really nothing unusual in the configuration for any of the three databases.

YCSB is a lowest common denominator benchmark, so it doesn't show off the capabilities of any of the databases very well.

To see if we were missing anything, we sought input from the MongoDB mailing list, but nobody gave any suggestions.