CloudTran V2 – MVCC and Transactional Replication

Previously on Coherence we’ve supported TopLink Grid and a cache-level API with Pessimistic Locking.  For the 2.0 release, we’ve added two major new features to CloudTran – while continuing to support the previous features.

First,  MVCC (Multi-Version Concurrency Control) is added as an alternative to Pessimistic Locking on the cache-level API.  It supports:

-        consistent reads across multiple distributed cache entries or services.  This effectively gives a snapshot, with all the values being consistent with the time the reading transaction started.  For example, if you start a transaction at 9:01 and you want to read the price of an item which was updated at 9:00 and 9:02, you will get the 9:00 price – being the correct value as of 9:01.

-        multiple simultaneous updates.  If you want to calculate a new price and note it as “the current price” regardless of whether other threads are also calculating the price, then you can turn off update checking – so the price can be continuously updated, using a consistent snapshot of input values for each calculation.

-        non-conflicting updates.  You can specify write conflict checking – i.e. no intervening write can have occurred on any cache entry changed in a transaction.  You can also do the same, but for any entry read by the transaction as well; this is the MVCC equivalent of pessimistic locking and prevents a “write skew anomaly”.

Second, CloudTran 2.0 introduces transactional replication between data centers.  The key here is “transactional” across multiple cache entries (in different partitions); if you don’t need this, the Coherence push replicator or GoldenGate will make more sense depending on your system of reference.

By doing transactional replication, you know that the secondary grid will be consistent – updates are done atomically – so if the primary grid goes down, you can switch over to the secondary grid immediately the outage is detected.

One of the big issues in replication is the time to send to the other data center, which is typically 100ms or more.  It doesn’t make sense in a grid environment to wait that long, so what the CloudTran replicator does is to save the transactions to SSD at the sending data centre, and then relays them to the other grid while the transaction carries on locally.

Writing to SSDs only takes 13ms from our Java code, which is likely to be acceptable in most applications – to get the benefit of knowing the transaction will be replicated eventually (even if the replicator node or data centre goes down).  The replicator supports failover and dual-ported SSD drives, avoiding a single point of failure – it’s like a message appliance, but built on Coherence.

VoltDB and Millions of Operations Per Second … So?

I’ve been looking at VoltDB recently, and thinking about comparisons with Coherence (and CloudTran of course, although what follows is purely about Coherence).

The first thing that hits you about VoltDB is the huge performance numbers. For example, this piece about Node.js and VoltDB quotes 675,000 transactions/sec on 8 clients and 12 servers. As each transaction did three reads and one write, this amounted to 2.8 million data operations per second. Each of the 12 servers had 8 cores for a total of 96 server cores, which means the test ran at just under 30,000 data operations per server core per second.

When you talk about databases, there is an assumption of durability – you assume that the data is getting saved to permanent storage. However, VoltDB is an in-memory database, and in fact in the above piece, the headline test was done with “k-factor=0” – which means, no backups at all (and presumably no persistent storage, as disk or SSD wasn’t mentioned). In other words, this is a lot like a data cache – you send a request to the server, cleverly routed to the correct partition, do some calculations with the targetted value plus related data, then store the result. Which raises the question – if you do this sequence of operations in a data cache, do you get similar performance?

To test this out, we loaded up some reference data, then sent a request (in Coherence, an entry processor) which read the targetted value, read two items of reference data, and then stored a computed result. We only had 7 ropey old development boxes available, so we wrote the application so that both client and server applications ran in each JVM, with the client load spread equally across the servers.

The bottom line was that our Coherence rig was more that 1.5X faster than VoltDB *per core* – 44,000 data operations per second versus 29,000 – even though the Coherence test was running client and server on the same machine.

For good measure, we tried “k-factor=1” – i.e. one backup – and got 28,300 operations per second, which is very close to VoltDB with no backups.

You’ll have to forgive me for not publishing more detailed information, but we’d need to get Oracle’s permission to publish a ‘benchmark’ – and this wasn’t a proper benchmark, just a few hours of running through a few scenarios and pretty much straight out of the box config. However, if you do want more detail, you can run it yourself: I’ve attached the Eclipse project and scripts for server-side deployment here… CoherencePerformance.zip

Now…

the point of this article is not the numbers themselves, but to point out that a data grid like Coherence and an in-memory database are likely to give very similar performance when they are doing relatively simple operations because it’s all the rest of the systems that cost the time. In fact, it is uncanny how similar these numbers are to tests we did on another IMDG a few years ago on the same equipment.

In other words, VoltDB may be superfast compared to database systems that store information to disk, but not compared to data grids storing all information in memory.

Transactional Data Cache

Well, as promised I am following up on my blog that talked in general about transactions.  Today I want to discuss transactional Cache with persistence.

First what does it mean?  For me this is transaction that requires ACID like properties for in-memory data grids. They are committed in memory first and returned to the application then simultaneously go to disk and databases for persistence meeting ACID qualities.

What is required? The ability to scale distributed transactions across two to hundreds of nodes and then to disk and databases.   This is no trivial feat.  In fact, the complexity of first making it work is challenging enough.  How will performance be?  We all know 2PC and XA will not scale on performance in this architecture and in fact degrades.  Sure you can throw lots of hardware at the problem but it becomes very expensive to run and more expensive to maintain.  Which brings me to the next level of complexity.  What happens when a node goes down or a database goes down?  How do you dynamically reallocate resources and not lose transactions in the process?   What if your site goes down due to some catastrophic event? How does one handle uptime in worse case scenarios?  As they say the devil is in the details.

Then the question is who can address this complex problem getting performance and scale across let’s say 10 to 100’s of nodes?  It is not a big community of developers.  If it were there would have been an open source initiative already in place? Some developers will try to write it themselves yet is that really the best thing for the business or would an out of the box solution be better?   Businesses have to be aware of risk of a one off solution that requires constant attention and maintenance. And putting too many developers on this may create less than an ideal environment to solve the problem.  It takes dedication of a few very knowledgeable developers with tenacity, resilience, and time such as a few years to work through the use cases and edge cases.  No trivial matter here yet very important work.

Solving this problem for in-memory data grids allows any mission critical application to move to higher performance and scale at extreme affordability.  The cost of the business transaction could be a fraction of what it is today! With a few bright people it will and can happen.  This single-minded focus and effort on transaction management will bring costs down dramatically as well as flexibility for one business.   It will also open the in-memory market from several hundred million today to billions in a few years.   And with it Big Data will be redefined as well.

I hope I remember this now…

If Coherence is giving you an error message like:

    This cluster node failed to deserialize the config
    java.io.IOException: message contains more data then expected

Assuming you use the same cache-config etc., the answer is

    Member 1 has tangosol.pof.enabled=true.
    Member n has tangosol.pof.enabled=false.

Only the third time I’ve made this mistake.  This is not as stupid as it may seem – the new low-level CloudTran API will add Spring and AspectJ to Coherence, EclipseLink and TopLink Grid so it’s easy to overlook one line of config.

The root cause of this error is probably that you have asked Coherence for a cache before you have read the CloudTran config (e.g. when copy-and-pasting JUnit test suites … easy to end up skipping init code).  Try setting a breakpoint when you initialize the program and see if the CloudTran config properties trace (search for “loading config.properties”).  If not, reorganise so you initialize the CloudTran config properties first.  This will then set tangosol.pof.enabled=true, which is required by CloudTran.

The Transactional Supply Chain

Transaction management is much like FedEx or UPS managing movement of parcels from one location to another starting with payment, tagging with an ID, routing, monitoring/tracking and confirmation of delivery.  The package moves through multiple distribution centers and is confirmed at each location it travels through. When it is delivered, a final confirmation happens.

So using this analogy, let’s start with defining what is the Transactional Supply Chain for in-memory and cloud architectures.  First, it can include a variety of transactions, ACID Transactions for payment, commerce, financial commitment, etc., NO-ACID or what I will call “less than ACID” transactions for messaging, logging, updating catalogues.  Overall, there are several types of transactions, and there is not a one-fit-all scenario — a transaction is specific to the need based upon the business.  In addition, there are multiple data stores where persistence has to happen. For this example, let’s say 16 data stores/databases located across the world.

In the Transactional Supply Chain, let’s assume we have a Transactional Bus that manages transactions for purposes of placing an order, invoicing, updating inventory, shipping and messaging to the buyer.  The buyer may participate in social commerce where he or she brags about buying an item by sending a message (transaction) to others they know. If these others like it, they go and buy it, and we start the transactional bus moving down the transactional supply chain over and over again.  The information about these buyers can be aggregated in real time and sent to a big data analytics engine, a data warehouse, for example, so that analysis will help to understand both individual customers as well as demographic groups, creating more business.  Then once again a new buyer engages, which at some point triggers an ACID commerce transaction to confirm the order and payment, updates inventory, invoicing, shipping information, and tracks this transaction until confirmation of delivery.

The notion that one type of transaction can handle all of this has changed to what type of transaction is best used for in-memory grids and cloud architectures.  That said, whether it starts or ends with an ACID transaction, ACID transactions are an essential part of the transactional supply chain. Without this ACID transaction option we would be writing brittle transaction management code to maintain and monitor—very time consuming and very costly.

Well, if there were an out-of-the-box solution for distributed ACID transactions that could deliver high scalability and exception performance, much of the bally ho above would be unnecessary.  This would be a technology solution that business managers and developers obviously have a crying need for.  So for distributed transactions, if you could do them in a highly scalable way, with consistent performance for commit time, and quickly (e.g., milliseconds) in a scale out/grid environment, whether in-house or in a private, public or hybrid cloud… would this seem a much more reasonable choice?

So the world of transactions is NOT about one single transaction type. It is about what kind of transaction serves the best purpose for the business intent such as socializing, commerce, ordering, etc.  It is about a Transactional Supply Chain where a single database or data source is no longer enough!! Frankly, it is one of the biggest bottlenecks. The choking point for the web and worse for the grid, and even worse for the cloud is the database. Hence having a Transactional Bus to manage the transaction supply chain and the in-transit data movement to any and all data sources located around the world is essential.  Much like an application server layer there needs to be a data server layer.  A transactional bus independent of databases with transaction management and monitoring unbundled for flexibility, performance and scale to meet business needs seems so logical. Then why has someone not done it? Well, as you know, it is a very complex problem and some want to waddle in the “it can’t be done” camp.  Well, it can be done.

As we move to in-memory data grids and cloud architectures, the need for a Transactional Supply Chain will become a requirement and reality.  It will be a critically important business asset. Businesses will have much more flexibility to change and grow, the limits on transaction management will be lifted, and the transactional supply chain will include the full business life cycle of the transaction including such aspects as distributed ACID transactions uniting in-memory and disk for persistence, active to active transactional duplication, workflow management, aggregation for business intelligence, data policy management, active-state transaction tracking for alerts, customer tracking, and more.  These topics and more will be covered in future blogs.

It is an exciting time, and it all starts with the wonderful new way for businesses to be nimble transacting their business free of technology silos, free of limitations on transactions, and free to build your business with no limits!