VoltDB and Millions of Operations Per Second … So?

I’ve been looking at VoltDB recently, and thinking about comparisons with Coherence (and CloudTran of course, although what follows is purely about Coherence).

The first thing that hits you about VoltDB is the huge performance numbers. For example, this piece about Node.js and VoltDB quotes 675,000 transactions/sec on 8 clients and 12 servers. As each transaction did three reads and one write, this amounted to 2.8 million data operations per second. Each of the 12 servers had 8 cores for a total of 96 server cores, which means the test ran at just under 30,000 data operations per server core per second.

When you talk about databases, there is an assumption of durability – you assume that the data is getting saved to permanent storage. However, VoltDB is an in-memory database, and in fact in the above piece, the headline test was done with “k-factor=0” – which means no backups at all (and presumably no persistent storage, as disk or SSD wasn’t mentioned). In other words, this is a lot like a data cache – you send a request to the server, cleverly routed to the correct partition, do some calculations with the targeted value plus related data, then store the result. Which raises the question – if you do this sequence of operations in a data cache, do you get similar performance?

To test this out, we loaded up some reference data, then sent a request (in Coherence, an entry processor) which read the targeted value, read two items of reference data, and then stored a computed result. We only had 7 ropey old development boxes available, so we wrote the test so that both the client and server applications ran in each JVM, with the client load spread equally across the servers.
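For concreteness, here is the shape of that entry processor. This is a minimal sketch rather than the actual test code (the real thing is in the attached project below); the cache names, key types and the arithmetic are placeholders.

    import java.io.Serializable;

    import com.tangosol.net.CacheFactory;
    import com.tangosol.net.NamedCache;
    import com.tangosol.util.InvocableMap;
    import com.tangosol.util.processor.AbstractProcessor;

    public class ComputeProcessor extends AbstractProcessor implements Serializable {

        private final Object refKey1;
        private final Object refKey2;

        public ComputeProcessor(Object refKey1, Object refKey2) {
            this.refKey1 = refKey1;
            this.refKey2 = refKey2;
        }

        @Override
        public Object process(InvocableMap.Entry entry) {
            // Read 1: the targeted value, co-located with this processor
            Double value = (Double) entry.getValue();

            // Reads 2 and 3: two items of reference data from another cache.
            // In practice the reference data sits on a different cache service
            // (or a replicated cache) to avoid re-entrant calls into the same service.
            NamedCache reference = CacheFactory.getCache("reference");
            Double ref1 = (Double) reference.get(refKey1);
            Double ref2 = (Double) reference.get(refKey2);

            // Write: store a computed result back into the targeted entry
            double result = value * ref1 + ref2;
            entry.setValue(result);
            return result;
        }
    }

The client side then simply invoked a processor like this against keys spread evenly across the servers.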

The bottom line was that our Coherence rig was more than 1.5X faster than VoltDB *per core* – 44,000 data operations per second versus 29,000 – even though the Coherence test was running client and server on the same machine.

For good measure, we tried the Coherence equivalent of “k-factor=1” – i.e. one backup copy – and got 28,300 operations per second, which is very close to VoltDB with no backups.

You’ll have to forgive me for not publishing more detailed information – we’d need Oracle’s permission to publish a ‘benchmark’, and this wasn’t a proper benchmark, just a few hours running through a few scenarios with pretty much out-of-the-box configuration. However, if you do want more detail, you can run it yourself: I’ve attached the Eclipse project and the scripts for server-side deployment here… CoherencePerformance.zip

Now…

the point of this article is not the numbers themselves, but to point out that a data grid like Coherence and an in-memory database are likely to give very similar performance when they are doing relatively simple operations, because it’s all the rest of the system that costs the time. In fact, it is uncanny how similar these numbers are to tests we ran on another IMDG a few years ago on the same equipment.

In other words, VoltDB may be superfast compared to database systems that store information to disk, but not compared to data grids storing all information in memory.

Transactional Data Cache

Well, as promised, I am following up on my earlier post that talked about transactions in general. Today I want to discuss a transactional cache with persistence.

First, what does it mean? For me, it means transactions with ACID-like properties for in-memory data grids: they are committed in memory first and acknowledged back to the application, then simultaneously go to disk and databases for persistence, still meeting ACID qualities.
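To make that concrete, here is a deliberately simplified sketch of the “commit in memory first, persist asynchronously” idea. This is not CloudTran’s or Coherence’s actual implementation – the class and method names are invented for illustration – but it shows the shape of the pattern: the caller is acknowledged as soon as the in-memory copy (and, in a real grid, its backups) is updated, and the database write happens off the critical path.

    import java.util.Map;
    import java.util.concurrent.BlockingQueue;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.LinkedBlockingQueue;

    // Hypothetical sketch – not a real product API.
    public class WriteBehindStore<K, V> {

        private final Map<K, V> inMemory = new ConcurrentHashMap<>();
        private final BlockingQueue<Map.Entry<K, V>> persistQueue = new LinkedBlockingQueue<>();

        public WriteBehindStore() {
            // A background thread drains committed entries to the database/disk.
            Thread writer = new Thread(this::drainToDatabase, "persist-writer");
            writer.setDaemon(true);
            writer.start();
        }

        /** Commit: apply to memory, acknowledge the caller, queue the write for persistence. */
        public void commit(K key, V value) {
            inMemory.put(key, value);                // the grid (plus its backups) now holds the committed state
            persistQueue.add(Map.entry(key, value)); // persisted later, off the caller's critical path
        }

        private void drainToDatabase() {
            try {
                while (true) {
                    Map.Entry<K, V> e = persistQueue.take();
                    writeToDatabase(e.getKey(), e.getValue()); // e.g. a JDBC batch insert/update
                }
            } catch (InterruptedException ignored) {
                Thread.currentThread().interrupt();
            }
        }

        private void writeToDatabase(K key, V value) {
            // Placeholder: real code would batch, retry, and survive database failover here.
        }
    }

Everything difficult about the real thing – transactional grouping of multiple entries, surviving node and database failures without losing the queued writes – is hidden in that last placeholder, which is the point of the next paragraph.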

What is required? The ability to scale distributed transactions across two to hundreds of nodes, and then on to disk and databases. This is no trivial feat. In fact, the complexity of just making it work is challenging enough – and then, how will it perform? We all know 2PC and XA will not scale in this architecture; in fact performance degrades. Sure, you can throw lots of hardware at the problem, but that becomes very expensive to run and even more expensive to maintain. Which brings me to the next level of complexity. What happens when a node goes down, or a database goes down? How do you dynamically reallocate resources without losing transactions in the process? What if your whole site goes down due to some catastrophic event? How do you maintain uptime in worst-case scenarios? As they say, the devil is in the details.
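As a rough illustration of why classic two-phase commit gets expensive as you add nodes, here is a schematic coordinator. The interfaces are invented for illustration, but the shape is the standard protocol: two synchronous round trips to every participant for every transaction, with the participants holding locks until the second phase completes.

    import java.util.List;

    // Hypothetical interfaces – a sketch of the protocol, not any product's API.
    interface Participant {
        boolean prepare(long txId);   // round trip 1: vote yes/no and hold locks
        void commit(long txId);       // round trip 2: make the change permanent
        void rollback(long txId);
    }

    class TwoPhaseCoordinator {
        boolean commit(long txId, List<Participant> participants) {
            // Phase 1: every participant must vote yes; locks are held throughout.
            for (Participant p : participants) {
                if (!p.prepare(txId)) {
                    participants.forEach(q -> q.rollback(txId));
                    return false;
                }
            }
            // Phase 2: a second round trip to every participant.
            for (Participant p : participants) {
                p.commit(txId);
            }
            return true;
        }
    }

Per-transaction latency grows with the number of participants, and the coordinator and its locks sit on the critical path – which is why the per-node throughput of an XA-style approach tends to fall as the grid grows.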

Then the question is: who can address this complex problem and get performance and scale across, let’s say, 10 to hundreds of nodes? It is not a big community of developers – if it were, there would already be an open source initiative in place. Some developers will try to write it themselves, but is that really the best thing for the business, or would an out-of-the-box solution be better? Businesses have to be aware of the risk of a one-off solution that requires constant attention and maintenance. And putting too many developers on the problem may create a less than ideal environment for solving it. It takes the dedication of a few very knowledgeable developers with tenacity, resilience and time – a few years – to work through the use cases and edge cases. No trivial matter, but very important work.

Solving this problem for in-memory data grids allows any mission-critical application to move to higher performance and scale at a far lower cost. The cost of a business transaction could be a fraction of what it is today! With a few bright people it can and will happen. This single-minded focus on transaction management will bring costs down dramatically and give the business much more flexibility. It will also grow the in-memory market from several hundred million dollars today to billions in a few years. And with it, Big Data will be redefined as well.