
3.4 Manager Scalability and Repartitioning


   This section describes implementation details. You do not need to read it if you just want to use CloudTran.   
The CloudTran [Transaction] Manager is highly scalable. Tests have shown that it is practical to run 100 or more managers simultaneously, which means the total transaction throughput of a system can be very high - tens or hundreds of thousands of transactions a second are achievable.

The previous section gave an overview of the whole transaction process; this section gives more details of the manager subsystem.

 3.4.1  Scaling
 3.4.2  Repartitioning

3.4.1  Scaling
In this version of CloudTran, all transaction state information is placed into a transaction cache, called "ManagerControlCache". This is a standard Coherence NamedCache, and scales as a normal NamedCache does. Managers are storage-enabled for the ManagerControlCache, so each manager node added to the grid adds more transaction storage and processing capacity.
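Because ManagerControlCache is a standard NamedCache, it can be reached with the usual Coherence API. The following sketch is illustrative only - it looks the cache up by name and is not part of CloudTran's code:

    import com.tangosol.net.CacheFactory;
    import com.tangosol.net.NamedCache;

    // Illustrative sketch: ManagerControlCache is a standard Coherence
    // NamedCache, so it is obtained and partitioned like any other cache.
    public class ManagerControlCacheExample {
        public static void main(String[] args) {
            // Join the cluster and look up the transaction cache by name.
            NamedCache txCache = CacheFactory.getCache("ManagerControlCache");

            // Manager JVMs are storage-enabled for this cache, so each manager
            // added to the grid owns a share of its partitions, adding both
            // transaction storage and processing capacity.
            System.out.println("ManagerControlCache entries: " + txCache.size());

            CacheFactory.shutdown();
        }
    }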

There are two standard approaches to scaling the transaction manager: with the data nodes, or separate from the data nodes.

Scaling managers with data nodes is easy because you deal with just one type of deployment node. It gives the best levelling of storage and CPU loads and the fastest failover and recovery times.

Scaling managers separate from data nodes is appropriate for large grids. For example, if you have hundreds of nodes in the grid, it is easier to tune and operate the transaction components if they are in a number of separate JVMs.

If managers are separate from data nodes, we recommend running at least three managers, to ensure that the transaction components can continue operating with a data backup if one manager goes down.
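One way to check this recommendation at runtime is to count the storage-enabled members behind the ManagerControlCache service, using standard Coherence APIs. This is an illustrative sketch, not a CloudTran facility:

    import java.util.Set;

    import com.tangosol.net.CacheFactory;
    import com.tangosol.net.DistributedCacheService;
    import com.tangosol.net.NamedCache;

    // Illustrative check that at least three storage-enabled manager nodes
    // are backing the ManagerControlCache service.
    public class ManagerCountCheck {
        public static void main(String[] args) {
            NamedCache txCache = CacheFactory.getCache("ManagerControlCache");
            DistributedCacheService service =
                    (DistributedCacheService) txCache.getCacheService();

            Set managers = service.getStorageEnabledMembers();
            if (managers.size() < 3) {
                System.err.println("Only " + managers.size()
                        + " storage-enabled manager node(s); at least 3 are recommended");
            }
            CacheFactory.shutdown();
        }
    }
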
3.4.2  Repartitioning
Much of the work of repartitioning is done by Coherence - determining the new partition locations and moving the data to them. However, CloudTran also has work to do.

CloudTran guarantees to work correctly in either of the situations that lead to repartitioning:

  • when there are no more than 'n' simultaneous manager failures, where 'n' is the number of backups
  • when nodes are added to the cluster.

These are the same guarantees as for Coherence.

CloudTran saves two types of data in its ManagerControlCache service:

  • the transaction control information, such as the transaction ID, the timeout and various optimization flags
  • backups for the data from the various data nodes, with the old and new values for each changed object.
To ensure failover protection, the service must have a backup count >= 1.

An important part of performance optimization is to minimize the writes of this data to the cache. Changes to a backed-up cache cause a network write hop and are therefore expensive, particularly when the data requires more than one packet. There is an optimization that can avoid this: see the ct.manager.persistLingerTimeMillis config property.

The two main difficulties for CloudTran in handling repartitioning are:

  • double-bounce of Entry Processors
  • on-going persistence.
The double-bounce issue arises because all manager operations in CloudTran-Coherence are implemented by EntryProcessors and a repartitioning can cause these to be re-issued. Therefore, all phases of the transaction management process must be idempotent; in other words, CloudTran ensures that second and subsequent calls have no effect on previously completed work and can restart any work not noted as being finished.
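The idea can be sketched with a simplified, hypothetical EntryProcessor. The TxControl class and its completion flag are assumptions made for this example, not CloudTran's real classes; the point is that a completion marker stored in the cache entry makes a re-issued invocation a no-op:

    import com.tangosol.util.InvocableMap;
    import com.tangosol.util.processor.AbstractProcessor;

    // Simplified illustration of the idempotency requirement: the phase marker
    // lives in the cached transaction-control entry, so an invocation re-issued
    // after repartitioning sees that the work is done and returns immediately.
    public class CommitPhaseProcessor extends AbstractProcessor {

        public Object process(InvocableMap.Entry entry) {
            TxControl control = (TxControl) entry.getValue();

            // Idempotency check: if this phase already completed, repartitioning
            // has simply bounced the processor to a new owner - do nothing.
            if (control.isCommitPhaseDone()) {
                return Boolean.TRUE;
            }

            // ... perform the commit-phase work here ...

            // Record completion before returning, so any re-issue is a no-op.
            control.setCommitPhaseDone(true);
            entry.setValue(control);
            return Boolean.TRUE;
        }

        // Minimal hypothetical control object holding a per-phase completion flag.
        public static class TxControl implements java.io.Serializable {
            private boolean commitPhaseDone;
            public boolean isCommitPhaseDone() { return commitPhaseDone; }
            public void setCommitPhaseDone(boolean done) { commitPhaseDone = done; }
        }
    }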

The on-going persistence issue arises because the commit() call to the manager can complete before the persistence is complete. This improves the response time experienced by the client application, but it means there must be worker threads in the manager that control the persistence process.
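Conceptually, the arrangement resembles the sketch below; the class and method names are hypothetical and the real manager's threading is more involved:

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    // Hypothetical sketch of the persistence pipeline: commit() can complete
    // once the transaction is safely recorded in the (backed-up) grid, while
    // worker threads drive the database writes afterwards.
    public class PersistencePipeline {
        private final ExecutorService persistWorkers = Executors.newFixedThreadPool(4);

        public void onCommit(final String txId) {
            // At this point the transaction outcome is durable in the grid,
            // so the client's commit() call can return.
            persistWorkers.submit(new Runnable() {
                public void run() {
                    persistToDatabase(txId);  // may finish well after commit() returned
                }
            });
        }

        private void persistToDatabase(String txId) {
            // ... write the transaction's changes to the database(s) ...
        }
    }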

The problem for the manager receiving control of a transaction is that it must determine whether the original manager is still operational - in which case that manager might have queued the database transactions and will eventually action them. If it is still operational, M2 (the second manager to control the transaction) must watch over M1 (the first manager) to ensure it stays healthy and commits the transaction. If M1 subsequently fails before persisting the transaction, M2 takes over responsibility for persisting to the databases. This is the basic idea; there are additional failover sequences that are also catered for, such as a repartitioning (M2 takes over while M1 is still alive) followed by a failure (M2 fails while handling the consequences of the repartitioning).

If M1 has in fact failed, M2 must pick up where M1 left off and take control of the transaction.

All of these failure cases, and others, raise the possibility of a transaction being persisted twice at the database. There is nothing CloudTran can do to prevent this: a manager may fail after the persist has reached the database but before the successful outcome has been noted in the transaction manager's backup cache. To handle this, CloudTran first switches to using a single database transaction per CloudTran transaction - normally it aggregates many small transactions into one to improve database performance. Then, for each such transaction, a primary key constraint failure on the insert means that it has already been successfully persisted.
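In plain JDBC terms, the duplicate check looks roughly like the sketch below; the table, SQL and exception handling are illustrative assumptions, not CloudTran code:

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.SQLIntegrityConstraintViolationException;

    // Illustrative replay of one CloudTran transaction as a single database
    // transaction, treating a primary-key violation as "already persisted".
    public class ReplayPersist {

        public void replay(Connection conn, String txId, String payload) throws Exception {
            conn.setAutoCommit(false);
            PreparedStatement insert = null;
            try {
                insert = conn.prepareStatement(
                        "INSERT INTO TX_DATA (TX_ID, PAYLOAD) VALUES (?, ?)");
                insert.setString(1, txId);
                insert.setString(2, payload);
                insert.executeUpdate();
                conn.commit();
            } catch (SQLIntegrityConstraintViolationException e) {
                // Primary-key violation: the original manager already persisted
                // this insert before failing, so treat the replay as a success.
                // (Some drivers report this as a vendor-specific SQLException.)
                conn.rollback();
            } finally {
                if (insert != null) {
                    insert.close();
                }
            }
        }
    }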

While in a repartitioning state, CloudTran holds off all inbound start(), commit() and abort() operations; the CloudTran API at the client automatically retries after a suitable delay.
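For illustration, the retry behaviour is conceptually similar to the sketch below. Application code does not need to do this - the CloudTran client API handles it - and the interface and delay shown here are hypothetical:

    import java.util.concurrent.Callable;

    // Conceptual illustration of the automatic retry performed by the client
    // API when the manager rejects an operation during repartitioning.
    public class RepartitionRetry {

        public static <T> T withRetry(Callable<T> operation, int maxAttempts, long delayMillis)
                throws Exception {
            Exception last = null;
            for (int attempt = 0; attempt < maxAttempts; attempt++) {
                try {
                    return operation.call();
                } catch (Exception rejectedDuringRepartitioning) {
                    last = rejectedDuringRepartitioning;
                    Thread.sleep(delayMillis);  // wait for repartitioning to settle
                }
            }
            throw last;
        }
    }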

Copyright (c) 2008-2013 CloudTran Inc.