3.4 Manager Scalability and Repartitioning
The CloudTran Transaction Manager is highly scalable. Tests have shown it is practical to run 100 or more
managers simultaneously. This means that the total transaction throughput of a system
can be very high: tens or hundreds of thousands of transactions a second are easily achievable.
The previous section gave an overview of the whole transaction process; this section gives more details of the manager subsystem.
It describes implementation details, which you do not need to read if you just want to use CloudTran.
In this version of CloudTran, all transaction state information is placed into a transaction cache, called "ManagerControlCache".
This is a standard Coherence NamedCache, and scales as a normal NamedCache does.
Managers are storage-enabled for the ManagerControlCache, so each manager node added to the grid adds more transaction storage and processing capacity.
There are two standard approaches to scaling the transaction manager: co-located with data nodes, or separate from data nodes.
Scaling managers with data nodes is easy because you deal with only one type of deployment node.
It gives the best levelling of storage and CPU loads and the fastest failover and recovery times.
Scaling managers separately from data nodes is appropriate for large grids.
For example, if you have hundreds of nodes in the grid, it is easier to tune and operate the transaction components
if they run in a number of separate JVMs.
If managers are separate from data nodes, we recommend running at least three managers,
so that the transaction components can continue operating with backed-up data if one manager goes down.
Much of the work of repartitioning is done by Coherence: determining the new partition locations and
moving the data to its new locations. However, CloudTran also has work to do.
CloudTran guarantees to work correctly in either of the situations that lead to repartitioning:
- as long as there are no more than 'n' simultaneous Manager failures, where 'n' is the number of backups
- when nodes are added to the cluster.
These are the same guarantees as for Coherence.
CloudTran saves two types of data in the cache service for ManagerControlCache:
- the transaction control information, such as the transaction ID, the timeout and various optimization flags
- backups of the data from the various data nodes, with the old and new values for each changed object.
To ensure failover protection, the service must have a backup count >= 1.
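As an illustration of the backup-count requirement, a Coherence cache configuration for the service backing ManagerControlCache might look like the sketch below. The scheme and service names here are assumptions for illustration, not CloudTran's actual configuration:

```xml
<!-- Illustrative sketch only: scheme/service names are invented. -->
<cache-config>
  <caching-scheme-mapping>
    <cache-mapping>
      <cache-name>ManagerControlCache</cache-name>
      <scheme-name>manager-control-scheme</scheme-name>
    </cache-mapping>
  </caching-scheme-mapping>
  <caching-schemes>
    <distributed-scheme>
      <scheme-name>manager-control-scheme</scheme-name>
      <service-name>ManagerControlService</service-name>
      <!-- At least one backup, so transaction state survives a manager failure -->
      <backup-count>1</backup-count>
      <!-- Manager nodes are storage-enabled for this cache -->
      <local-storage>true</local-storage>
      <backing-map-scheme>
        <local-scheme/>
      </backing-map-scheme>
      <autostart>true</autostart>
    </distributed-scheme>
  </caching-schemes>
</cache-config>
```

With backup-count set to 1, the service tolerates one simultaneous manager failure, matching the guarantee described above.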
An important part of performance optimization is to minimize writes of this data to the cache.
Changes to a backed-up cache cause a network write hop and are therefore expensive, particularly when the data requires more than one packet.
There is an optimization that can avoid this: see the
ct.manager.persistLingerTimeMillis config property.
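As an illustration of how this property might be set - assuming, as with most Coherence-style properties, that it is supplied as a JVM system property (the value 50 is arbitrary):

```
java -Dct.manager.persistLingerTimeMillis=50 com.tangosol.net.DefaultCacheServer
```

The linger window presumably allows the manager to defer and coalesce several updates into one backed-up cache write instead of paying a network hop per update.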
The two main difficulties for CloudTran in handling repartitioning are:
- double-bounce of EntryProcessors
- on-going persistence.
The double-bounce issue arises because all manager operations in CloudTran-Coherence are implemented by EntryProcessors,
and a repartitioning can cause these to be re-issued. Therefore, all phases of the transaction management
process must be idempotent; in other words, CloudTran ensures that second and subsequent calls have no effect
on previously completed work, and it can restart any work not noted as finished.
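The idempotency pattern for the double-bounce case can be sketched in plain Java. This is an illustration of the pattern only, not CloudTran's implementation; the class and method names are invented:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch: a re-issued EntryProcessor invocation must not repeat work
// that has already completed for this transaction.
class IdempotentPhase {
    // Per-transaction record of phases already completed.
    private final Map<String, Boolean> completedPhases = new ConcurrentHashMap<>();
    private int commitCount = 0;   // side effect that must happen exactly once

    /** Runs the "commit" phase; safe to call any number of times. */
    boolean runCommitPhase(String txId) {
        // putIfAbsent is atomic: only the first caller sees null and does the work.
        if (completedPhases.putIfAbsent(txId + ":commit", Boolean.TRUE) != null) {
            return false;          // already done - a second bounce is a no-op
        }
        commitCount++;             // the real work happens exactly once
        return true;
    }

    int getCommitCount() { return commitCount; }
}
```

The essential point is that the completion marker and the work are tied together, so a re-issued call observes the marker and returns without repeating the side effect.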
The on-going persistence issue arises because the
commit() call to the manager
can complete before the persistence is complete. This improves the response time experienced by the client application,
but it means there must be worker threads in the manager that control the persistence process.
The problem for the manager receiving control of a transaction - call it M2, with M1 the original manager -
is that it must determine whether M1 is still operational, in which case M1 may have queued the
database transactions and will eventually action them. If so,
M2 must watch over M1
to ensure it stays healthy and can commit the transaction. If M1 subsequently fails
before persisting the transaction, M2 takes over responsibility for persisting to the databases.
This is the basic idea; additional failover sequences are also catered for,
such as repartitioning (M2 takes over while M1 is still alive) followed by failure
(M2 fails while handling the consequences of the repartitioning).
If M1 has in fact failed, M2 must pick up where M1 left off and take control of the transaction.
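The takeover decision described above can be sketched as a simple rule. The names and return values here are invented for illustration; CloudTran's actual logic also covers the compound failover sequences mentioned above:

```java
// Sketch of M2's decision on receiving control of a transaction (names invented).
enum ManagerState { ALIVE, FAILED }

class TakeoverDecision {
    /** What M2 should do, given M1's state and whether the transaction is persisted. */
    static String decide(ManagerState m1, boolean persisted) {
        if (m1 == ManagerState.FAILED) {
            // Pick up where M1 left off: persist if M1 did not finish.
            return persisted ? "NOTHING" : "PERSIST";
        }
        // M1 still alive: watch it until it persists, take over if it fails.
        return "WATCH";
    }
}
```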
All of these failure cases, and others, raise the possibility of a transaction being persisted twice at the database.
There is nothing that CloudTran can do to prevent this: a manager may fail after the persist has reached the
database but before the successful conclusion has been noted in the transaction manager's backup cache.
To handle this, CloudTran first of all switches to use a single database transaction per CloudTran transaction -
normally it aggregates many small transactions into one to improve database performance.
For each such single transaction, if the error is a primary key constraint failure, it means
that the insert has already been successfully persisted, so the error can safely be treated as a successful persist.
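The duplicate-persist check can be sketched as follows. This is an illustration of the technique, not CloudTran's code; the interface and method names are invented:

```java
import java.sql.SQLIntegrityConstraintViolationException;

// Sketch: when replaying a possibly-already-persisted transaction, a primary
// key constraint violation on the insert means the original persist succeeded,
// so the replay treats it as success. Any other failure propagates.
class ReplayPersist {
    interface Insert { void run() throws Exception; }

    /** Returns true if the row is persisted after this call. */
    static boolean persistOnce(Insert insert) throws Exception {
        try {
            insert.run();
            return true;            // first persist succeeded
        } catch (SQLIntegrityConstraintViolationException dup) {
            return true;            // already persisted by the original manager
        }
    }
}
```

This is why switching to one database transaction per CloudTran transaction matters: with aggregated batches, a single constraint violation could not be attributed to one transaction.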
While in a repartitioning state, CloudTran holds off all inbound transactions;
the CloudTran API at the client automatically retries after a suitable delay.
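The client-side retry can be sketched as below. The names are invented and the real CloudTran API performs this retry internally; an IllegalStateException stands in for whatever rejection the grid returns while repartitioning:

```java
// Sketch of retry-with-delay for calls rejected during repartitioning.
class RepartitionRetry {
    interface Call<T> { T run() throws Exception; }

    /** Retries a call that is rejected while the grid is repartitioning. */
    static <T> T withRetry(Call<T> call, int maxAttempts, long delayMillis)
            throws Exception {
        Exception last = null;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            try {
                return call.run();
            } catch (IllegalStateException rejected) { // stand-in for "repartitioning" rejection
                last = rejected;
                Thread.sleep(delayMillis);             // suitable delay before retrying
            }
        }
        throw last;                                    // still rejected after all attempts
    }
}
```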