
6.2 MVCC Transaction Handling

In addition to the basic Cache operations, the Low-Level API provides MVCC (Multi-Version Concurrency Control) support and additional options for client transaction handling. Note that these features are not available via the TopLink Grid API.

This section describes the operation of MVCC transactions. The next section describes distributed timestamps, which can optionally be used for MVCC transactions.

All of the statements in this section concerning the operation of reads and updates assume that the clients involved have started an MVCC transaction. Clients outside of a transaction can read the cache, but they only see the most recently committed value.

 6.2.1  MVCC Compared With Pessimistic Locking
 6.2.2  MVCC or PL Caches
 6.2.3  Isolation Levels
 6.2.4  Update Checks
 6.2.4.1      Write Skew Anomaly
 6.2.5  Client Caching
 6.2.6  Prepare From Manager
 6.2.7  Writes Per Updated Entry
 6.2.8  Writes Per Read Entry
 6.2.9  Scenarios
 6.2.10  Consistent Reads in MVCC

6.2.1  MVCC Compared With Pessimistic Locking
With Pessimistic Locking (PL), the cache only holds a single committed value and a single dirty value (i.e. one that is currently part of a transaction).

With MVCC, the cache holds multiple committed values, each with a distinct timestamp.

With PL, there are two types of read:

  • read outside of a transaction. This gets the last committed value. It does not fail if another transaction has it locked - it just reads the committed value.
  • read under a transaction. This locks the cache entry for read and write. It fails if another transaction has the entry locked.
"Read under a transaction" prevents
  • write-write conflicts (two threads trying to write to the same entry)
  • write skew (where a transaction bases a calculation on a value which has been updated while the transaction is in process)
  • catches deadlocks, because if any transactions have entries in common, one of them will win and proceed and the other will fail. The possible downside is that if many entries are involved in transactions, a high percentage of transactions can fail.
However, PL does not give a point-in-time snapshot across multiple entries - whereas MVCC does. With MVCC, CloudTran guarantees that all values read by a transaction are the values most recently committed before the transaction started.

The update checks that CloudTran performs on Pessimistic Locking transactions are built-in - there are no options to specify here:

  • any access to a cache entry from one transaction locks the entry against reads or writes from other transactions. Other transactions will get a TransactionExceptionLockConflict; the application will normally roll back and retry, as sketched below.
  • there are no guards against conflicting updates from the same transaction. This may be an issue in multi-client transactions; if so, the application must protect against it by other means.
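
The usual response to a lock conflict is a rollback-and-retry loop. Below is a minimal sketch of that pattern; TransactionExceptionLockConflict is the exception named above, but the transaction API shown (CTxSession and its begin/commit/rollback methods) is a hypothetical stand-in, not the documented CloudTran API.

    // Hypothetical sketch: roll back and retry on a PL lock conflict.
    // CTxSession and begin()/commit()/rollback() are illustrative stand-ins.
    for (int attempt = 0; attempt < 3; attempt++) {
        CTxSession tx = CTxSession.begin();       // start a PL transaction
        try {
            Object value = cache.get(key);        // read under the transaction: locks the entry
            cache.put(key, recalculate(value));   // update while holding the lock
            tx.commit();
            break;                                // success: stop retrying
        } catch (TransactionExceptionLockConflict e) {
            tx.rollback();                        // another transaction holds the lock: retry
        }
    }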

6.2.2  MVCC or PL Caches
CloudTran uses MVCC or PL depending on the configuration for the caches involved in a transaction - see the ct.coherence.mvcc configuration property for details.

A transaction in CloudTran can involve many caches. It is possible to have some caches configured for MVCC and others for PL, but these must not be mixed in one transaction: the caches involved in a transaction must be either all MVCC or all PL.


6.2.3  Isolation Levels
CloudTran uses implied isolation levels. This means there is no mechanism for the programmer to specify READ_COMMITTED or any of the other ANSI/ISO isolation levels. Instead, the following table sets out the actions for reads (i.e. cache.get()) and writes (cache.put/remove()), without and within transactions:
  Read
    • No Tx: The last committed value. In MVCC caches, this is the value set by the most recently committed transaction.
    • PL: The last value written by this transaction, or, if none, the committed value.
    • MVCC: The last value written by this transaction, or, if none, the last value committed before the transaction started.
  Write
    • No Tx: Removes all transactional values and replaces the complete cache entry.
    • PL: If there is an existing dirty value set by another transaction, gets a lock conflict error. Otherwise, adds a dirty value for this transaction.
    • MVCC: Writes the dirty value for this transaction, overwriting any dirty value previously set by this transaction. There are a number of additional conditions that affect the action on MVCC update, as described below.
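
The MVCC read rule in the table can be illustrated with a small self-contained model. This is not CloudTran code, just a sketch of the version-selection rule: each entry keeps its committed values keyed by commit timestamp, and a transactional read returns the latest value committed before the reading transaction started.

    // Minimal model of the MVCC read rule -- not CloudTran code.
    import java.util.Map;
    import java.util.TreeMap;

    public class MvccReadModel {
        // committed versions of one cache entry, keyed by commit timestamp
        static final TreeMap<Long, String> versions = new TreeMap<>();

        // read rule: the latest value committed strictly before the transaction started
        static String readAsOf(long txStartTime) {
            Map.Entry<Long, String> e = versions.lowerEntry(txStartTime);
            return e == null ? null : e.getValue();
        }

        public static void main(String[] args) {
            versions.put(900L, "v-9:00");
            versions.put(901L, "v-9:01");
            versions.put(903L, "v-9:03");
            // a transaction that started at 9:02 sees the 9:01 value,
            // even though a later (9:03) commit exists
            System.out.println(readAsOf(902L));   // prints v-9:01
        }
    }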

6.2.4  Update Checks
For MVCC, you specify the update checks to be applied via DefaultCTxDefinition.mvccUpdatePolicy; the UPDATECHECK_* constants are defined in CTxDefinition. When update checks are present, the manager performs the MVCC transaction as follows:
  1. CloudTran sends a 'prepare' to the cache entry. The prepare checks for read or write operations from other transactions on this cache entry that conflict with this transaction, and fails the transaction if any are present. For example, if Tother has previously sent a prepare for a write, a similar prepare from Tthis will fail. The prepare writes a 'pending update' decoration on the entry; this is durable in case of repartitioning.
  2. Once the checks have been done, the manager marks the transaction as committing, at the same time allocating the (commit) timestamp. Note that allocating the commit timestamp at this point guarantees it occurs after all the update checks and before the new values are committed in the cache. The commit timestamp is used when another transaction does a (consistent) read under a transaction: the start timestamp of the reading transaction is checked against the timestamps of the commits against a given entry.
  3. The manager then sends instructions to the caches, for each entry referenced in the transaction, to remove the pending update and, if successful, commit any updates.
This approach is equivalent to 2-Phase Commit (2PC) in distributed database transactions, and is necessary for the same reason: the manager cannot commit an entry until it is sure all updates in the transaction are acceptable.
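
The flow can be summarised in a compilable sketch. This is a conceptual model of the three steps, not the CloudTran internals; the TxEntry interface and all method names are illustrative.

    // Conceptual model of the manager-side commit -- not CloudTran internals.
    import java.util.List;
    import java.util.concurrent.atomic.AtomicLong;

    interface TxEntry {
        boolean prepare();             // step 1: conflict checks + durable 'pending update' marker
        void commit(long commitTime);  // step 3: remove the marker, commit the value at commitTime
        void rollback();               // undo: discard any pending update
    }

    final class ManagerCommitSketch {
        static boolean commit(List<TxEntry> entries, AtomicLong clock) {
            for (TxEntry e : entries) {                     // step 1: prepare every entry
                if (!e.prepare()) {
                    for (TxEntry r : entries) r.rollback();
                    return false;                           // any conflict fails the whole transaction
                }
            }
            long commitTime = clock.incrementAndGet();      // step 2: mark committing, allocate timestamp
            for (TxEntry e : entries) e.commit(commitTime); // step 3: commit each entry
            return true;
        }
    }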

CloudTran does the checks against conflicting updates as the transaction is preparing to update an entry in the cache; the manager sends a "prepare" request to all the cache entries involved in a transaction. If all the prepare operations are successful, the transaction is then committed; otherwise it is rolled back.

Note that the read and update processing uses different mechanisms. Reads choose between committed values based on the transaction commit timestamp; this is the time when the new value first becomes visible to other clients. Updates are prevented by the presence of a pending update from another transaction (because the commit has not happened yet so there is no commit timestamp for pending updates!).
The description below uses Tthis as the current transaction and Tother as another, potentially conflicting, transaction:

There are 3 options:

  1. UPDATECHECK_READWRITE  Updates are allowed if none of the cache entries read or written in Tthis have been updated after Tthis started. In other words, if Tother has updated any entry present in this transaction since the start time of Tthis, Tthis is rolled back.

    This option catches "write skew" conflicts that can occur with MVCC.

    Unlike some database algorithms, CloudTran does not keep the time for each individual read and write (because the time to take the timestamp is non-negligible). Instead, it just keeps the start time and the commit time for each transaction. See Write Skew Anomaly below for the theory behind this option.

    Below are a couple of examples to show the interaction of Tthis and Tother with the update check UPDATECHECK_READWRITE.

    Example 1.
    i. Tthis.start() Tthis is started with UPDATECHECK_READWRITE.
    ii. Tother.start() The update check setting for Tother is immaterial
    iii. Tthis.get( key1 ) Tthis will continue as Tother has not updated entry( key1 ).
    iv. Tother.put( key1, value1 ) Tother updates entry( key1 )
    v. Tthis.put( key1, value2 ) Tthis will be rolled back. Tother has already amended entry( key1 ), so Tthis will fail the consistency check on the write.

    Example 2.
    i. Tthis.start() Tthis is started with UPDATECHECK_READWRITE.
    ii. Tother.start() Tother is started; its update check setting is immaterial
    iii. Tthis.get( key1 ) Tthis will continue as Tother has not updated entry( key1 )
    iv. Tother.put( key1, value1 ) Tother updates entry( key1 )
    v. Tthis.get( key1 ) Tthis will be rolled back. Tother has already amended entry( key1 ), so Tthis will fail the consistency check on the read.

  2. UPDATECHECK_WRITE  This is the same idea as UPDATECHECK_READWRITE, but only entries updated (not just read) in Tthis are checked for conflicting updates by Tother.

    This option catches "write-write" conflicts.

    The test is based on the timestamp associated with the different versions of a value in MVCC. If the latest version of the entry was not written by Tthis and has a write timestamp greater than the timestamp taken at the start of this transaction, the update fails because of the write conflict.

    Below are a couple of examples to show the interaction of Tthis and Tother with the update check UPDATECHECK_WRITE.

    Example 1.
    i. Tthis.start() Tthis is started with UPDATECHECK_WRITE.
    ii. Tother.start() The update check setting for Tother is immaterial
    iii. Tthis.get( key1 ) Tthis will continue as Tother has not updated entry( key1 ).
    iv. Tother.put( key1, value1 ) Tother updates entry( key1 )
    v. Tthis.put( key1, value2 ) Tthis will be rolled back. Tother has already amended entry( key1 ), so Tthis will fail the consistency check on the write.

    Example 2.
    i. Tthis.start() Tthis is started with UPDATECHECK_WRITE.
    ii. Tother.start() Tother is started; its update check setting is immaterial
    iii. Tthis.get( key1 ) Tthis will continue as Tother has not updated entry( key1 )
    iv. Tother.put( key1, value1 ) Tother updates entry( key1 )
    v. Tthis.get( key1 ) Tthis will continue. Although Tother has already amended entry( key1 ), the consistency check for Tthis will only fail on a write and not on the read

  3. UPDATECHECK_NONE   There are no conflict checks when an entry in Tthis is updated. This may be appropriate in a pricing system that is based on historical data: the price of a financial instrument or a bet will fluctuate based on previous history, but is constantly being updated based on multiple ongoing calculations. For highly volatile prices, it is appropriate to let multiple clients change the price without reference to each other.

    Below is an example to show the interaction of Tthis and Tother with the update check UPDATECHECK_NONE.

    Example 1.
    i. Tthis.start() Tthis is started with UPDATECHECK_NONE.
    ii. Tother.start() The update check setting for Tother is immaterial
    iii. Tthis.get( key1 ) Tthis will continue as there are no consistency checks
    iv. Tother.put( key1, value1 ) Tother updates entry( key1 )
    v. Tthis.put( key1, value2 ) Tthis will continue as there are no consistency checks
    vi. Tother.commit() The latest committed value for entry( key1 ) is now value1.
    vii. Tthis.commit() The latest committed value for entry( key1 ) is now value2. With UPDATECHECK_NONE, the last transaction to commit determines the value seen by subsequent readers.
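
Pulling the three options together, below is a minimal sketch of selecting a policy. DefaultCTxDefinition, the mvccUpdatePolicy field and the UPDATECHECK_* constants are named in this section; the transaction-start call (beginTransaction) is a hypothetical stand-in.

    // Illustrative only -- the field and constants come from this manual;
    // beginTransaction() is a hypothetical stand-in.
    DefaultCTxDefinition txDef = new DefaultCTxDefinition();
    txDef.mvccUpdatePolicy = CTxDefinition.UPDATECHECK_READWRITE; // catch write skew and write-write conflicts
    // txDef.mvccUpdatePolicy = CTxDefinition.UPDATECHECK_WRITE;  // write-write conflicts only
    // txDef.mvccUpdatePolicy = CTxDefinition.UPDATECHECK_NONE;   // no conflict checks (e.g. volatile pricing)
    beginTransaction(txDef);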


6.2.4.1  Write Skew Anomaly
The MVCC update policy UPDATECHECK_READWRITE is designed to avoid the 'write skew anomaly'. Here is a description of the write skew anomaly from the Wikipedia article on Snapshot Isolation:
In a write skew anomaly, two transactions (T1 and T2) concurrently read an overlapping data set (e.g. values V1 and V2), concurrently make disjoint updates (e.g. T1 updates V1, T2 updates V2), and finally concurrently commit, neither having seen the update performed by the other.

Were the system serializable, such an anomaly would be impossible, as either T1 or T2 would have to occur "first", and be visible to the other. In contrast, snapshot isolation permits write skew anomalies.

As a concrete example, imagine V1 and V2 are two balances held by a single person, Phil. The bank will allow either V1 or V2 to run a deficit, provided the total held in both is never negative (i.e. V1 + V2 ≥ 0). Both balances are currently $100. Phil initiates two transactions concurrently, T1 withdrawing $200 from V1, and T2 withdrawing $200 from V2.

If the database guaranteed serializable transactions, the simplest way of coding T1 is to deduct $200 from V1, and then verify that V1 + V2 ≥ 0 still holds, aborting if not. T2 similarly deducts $200 from V2 and then verifies V1 + V2 ≥ 0. Since the transactions must serialize, either T1 happens first, leaving V1 = -$100, V2 = $100, and preventing T2 from succeeding (since V1 + (V2 - $200) is now -$200), or T2 happens first and similarly prevents T1 from committing.

Write skew has earlier been called a "read-write anomaly": the outcome of a calculation will depend on whether a value is read before or after the value was written. See Concurrency Control in Distributed Database Systems.

The UPDATECHECK_READWRITE policy ensures that transactions will fail if there is a possibility that the read (by Tthis) could have happened before the write (by Tother), and so prevents an undetected write skew anomaly.
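
To make this concrete, here is a small self-contained model (not CloudTran code) of the UPDATECHECK_READWRITE rule applied to Phil's two balances: each transaction records its start time and the set of keys it has read or written, and the commit fails if any of those keys was committed by another transaction after this one started.

    // Self-contained model of the UPDATECHECK_READWRITE check -- not CloudTran code.
    import java.util.HashMap;
    import java.util.HashSet;
    import java.util.Map;
    import java.util.Set;

    public class WriteSkewModel {
        static final Map<String, Integer> store = new HashMap<>(Map.of("V1", 100, "V2", 100));
        static final Map<String, Long> lastCommit = new HashMap<>(); // key -> commit timestamp
        static long clock = 0;

        static class Tx {
            final long start = ++clock;
            final Set<String> touched = new HashSet<>();          // read set + write set
            final Map<String, Integer> writes = new HashMap<>();

            int get(String k) { touched.add(k); return writes.getOrDefault(k, store.get(k)); }
            void put(String k, int v) { touched.add(k); writes.put(k, v); }

            boolean commit() {
                // READWRITE check: fail if anything read or written here was
                // committed by another transaction after this one started
                for (String k : touched)
                    if (lastCommit.getOrDefault(k, 0L) > start) return false;
                long t = ++clock;
                for (Map.Entry<String, Integer> e : writes.entrySet()) {
                    store.put(e.getKey(), e.getValue());
                    lastCommit.put(e.getKey(), t);
                }
                return true;
            }
        }

        public static void main(String[] args) {
            Tx t1 = new Tx(), t2 = new Tx();
            // each transaction checks V1 + V2 >= 0 on its own snapshot, then withdraws $200
            if (t1.get("V1") + t1.get("V2") - 200 >= 0) t1.put("V1", t1.get("V1") - 200);
            if (t2.get("V1") + t2.get("V2") - 200 >= 0) t2.put("V2", t2.get("V2") - 200);
            System.out.println("t1 commits: " + t1.commit()); // true  -- first to commit wins
            System.out.println("t2 commits: " + t2.commit()); // false -- t2 read V1, which t1 committed after t2 started
            System.out.println(store);                        // V1=-100, V2=100: the total stays non-negative
        }
    }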


6.2.5  Client Caching
With MVCC, it is possible to switch on client caching. When client caching is enabled
  • all reads (cache.get()) are remembered in the transaction's local cache. Reads try the local cache first and only go to the Coherence cache if the entry has not yet been read.
  • all writes (cache.put() etc.) are written into the local cache (in the current thread) if the value has already been read - in which case, they do not write to the Coherence cache until the transaction is committed.
The client cache is attached to a transaction, running on one thread at a particular node. When the transaction ends, the cache is cleared and discarded. In a multi-client transaction, each client has a different cache and there is no sharing between client caches.

The values in the cache are gathered to the client from the various cache storage nodes and are specific to that transaction. For example, if we looked at two different client caches at the same instant, we could well see different values for an entry because of the 'point-in-time' nature of reads in MVCC.

The restrictions on Client Caching are:

  1. It only applies to MVCC; in pessimistic locking, reads and writes always go to the Coherence cache
  2. EntryProcessors and MapTriggers that change a value should not be used when client caching is on, because they will not interact correctly with the client cache. Similarly, calling Invocables that then write into the (remote) cache should not be used. CloudTran catches direct calls to EntryProcessors when client caching is enabled and rolls back the transaction. CloudTran does not catch indirect EntryProcessor calls via Invocables and MapTriggers that change a value; this must be done by the design of the application, e.g. by having different nodes.
  3. In multi-client transactions, the possibility of conflicts arises: two clients can attempt to update the same cache entry. The values are written independently into the two client caches; there is no centralized timestamp on the updates, so the updates are not ordered.

    Because the updates are not ordered, this is considered to be a write conflict - even if UPDATECHECK_NONE is specified - and the transaction manager fails the commit.

    This means that, if you use client caching, you must ensure that the multiple clients do not write to the same cache entry.

    Note that when client caching is off, updates are made directly into the Coherence cache entry, and they are ordered by CloudTran. This means that, when the updates arrive at the transaction manager from different clients, it can work out the correct final state from the order. However, note that it is still the programmer's responsibility to ensure that the order of updates within a transaction is correct.

The reason for using client caching is that it can reduce the number of network calls made to process a transaction. In certain situations, this can improve performance significantly.

By default, there is no client caching - all updates are made to the Coherence cache. Client caching is enabled by the CTxDefinition.useClientCache field. The default for this field can be overridden by setting the ct.coherence.useClientCache configuration property.
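
For illustration (the useClientCache field and the ct.coherence.useClientCache property are named in this section; the assignment style and the use of a JVM system property are assumptions):

    // Illustrative only: enable client caching for transactions using this definition.
    DefaultCTxDefinition txDef = new DefaultCTxDefinition();
    txDef.useClientCache = true;   // reads and writes go to the per-transaction local cache first

    // ...or override the default for all transactions, assuming the configuration
    // property can be supplied as a JVM system property:
    //   -Dct.coherence.useClientCache=true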

Below is an example to show the interaction of Tthis and Tother with client caching enabled.

i. Tthis.start() Tthis is started with client caching enabled.
ii. Tother.start() Tother is started with client caching enabled.
iii. Tthis.get( key1 ) The entry( key1 ) does not exist in the client cache for Tthis. The Coherence cache is read and the entry( key1, value1 ) is added to the client cache for Tthis.
iv. Tother.get( key1 ) The entry( key1 ) does not exist in the client cache for Tother. The Coherence cache is read and the entry( key1, value1 ) is added to the client cache for Tother.
v. Tother.put( key1, value2 ) The entry( key1, value2 ) is added to the client cache for Tother, but not to the Coherence cache.
vi. Tthis.put( key1, value3 ) The entry( key1, value3 ) is added to the client cache for Tthis, but not to the Coherence cache.
vii. Tthis.get( key1 ) The entry( key1 ) exists in the client cache for Tthis; value3 is returned to the client.
viii. Tother.get( key1 ) The entry( key1 ) exists in the client cache for Tother; value2 is returned to the client. So the two transactions hold different values for the same key in their client caches.
ix. Tthis.commit() The entry( key1, value3 ) is written to the Coherence cache. From this point on, the transaction follows the consistency checks (as described above) and could potentially fail, depending on the update check settings for the transaction.
x. Tother.commit() The entry( key1, value2 ) is written to the Coherence cache. Again, the commit is subject to the consistency checks and could potentially fail, depending on the update check settings for the transaction.


6.2.6  Prepare From Manager
When client caching is disabled, updates (cache.put() or cache.remove()) are written into the distributed cache. There are two ways this can be done.

The default - and the simplest to understand - is where the client sends in a 'Prepare'. The Prepare instruction is the first part of the Two-phase commit protocol. So an update call from the client (i.e. a put or remove call, or EntryProcessor invoke) is actually doing the first step of the commit process - before the commit is called. The benefit of preparing from the client is that it saves an extra call into the grid because

  • If the client prepares, updates into the grid also mark the entry as prepared if successful. Any consistency failures are reported immediately.
  • If the client doesn't send in the prepare, the updates purely use the cache entries to hold the values, without any consistency checks. The transaction manager must then send another message to each cache node to prepare the updates, and catch any inconsistencies.

Prepares from the manager for a given transaction are enabled by the CTxDefinition.prepareFromManager property. The default is normally false; this can be overridden by setting the ct.coherence.prepareFromManager configuration property.
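
For illustration (the prepareFromManager field and the ct.coherence.prepareFromManager property are named in this section; the assignment style is an assumption):

    // Illustrative only: have the manager, rather than the client, send the prepare.
    DefaultCTxDefinition txDef = new DefaultCTxDefinition();
    txDef.prepareFromManager = true;   // client puts become plain writes; the manager prepares at commit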


6.2.7  Writes Per Updated Entry
The update check and client caching options have implications for the number of writes to each entry updated in a transaction.
  1. With client caching enabled and no update checks (e.g. UPDATECHECK_NONE), there is 1 write per update. The client sends all the changes to the manager at commit time. In a multi-client transaction, the manager then checks for conflicting writes to the same entry. If there are none, it simply commits the entries into the cache immediately.
  2. With client caching enabled but some update checks on a cache entry, there need to be two writes. The first is the Prepare, which does the update check and leaves the value in the cache as a placeholder. After all the Prepare messages have finished, the second write either commits or discards the entries.

    The same occurs if client caching is disabled, even if there are no update checks. This is because, when the write to the cache is made, the transaction is not committed and may roll back in the future. Therefore, the client cannot write a committed entry - it must be a Prepare.

  3. Finally, with client caching disabled and update checks being made on the entries, there are three writes.
    1. First, the client, before its 'commit', puts or removes temporary values in the grid. These temporary values are not transactional entries - they are neither "Prepare" nor "Commit" records, because the cache is being used as a simple scratchpad. This is a "scratchpad" because it does not affect transactionality until the client(s) call 'commit'. In particular, these entries do not result in the reading transaction getting a ReadConsistencyException.

      The reason for having these temporary put/remove records is to coordinate between the distributed nodes in the grid. In this context, they operate like a standard shared cache (e.g. the JPA Level 2 cache).

    2. Second, once the last client has called 'commit', the transaction manager sends in the 'prepare' request, which is noted in the transaction entry. This is the point where update checks are done. For example, transactions with update policy UPDATECHECK_WRITE will now check other transactions' prepared or committed updates against the proposed write, and fail it if the commit time of any of those updates is after the start time of the requesting transaction.
    3. Finally, the commit is recorded.
Each write to the cache in general takes n+1 network messages, where n is the number of backups: 1 message to update the original entry, then n messages to update the backup nodes. Therefore, the total system cost of entry updates at the data nodes is
  • the number of entries updated, times
  • the number of writes per entry given above (1, 2 or 3), times
  • the number of network messages per write (n+1).
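
As a worked example: a transaction updating 10 entries with client caching disabled and update checks in force (3 writes per entry, case 3 above), on a cache with one backup (n = 1, so 2 network messages per write), costs 10 × 3 × 2 = 60 network messages at the data nodes.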
The complete list of conditions that count as 'update checks' when calculating the number of writes given above is:
  • an UPDATECHECK other than UPDATECHECK_NONE
  • a non-null filter being present on the put
  • the update being made by an entry processor, which can potentially roll back the transaction
  • a MapTrigger being added on the cache holding an entry, which can also roll back the transaction.

6.2.8  Writes Per Read Entry
It may seem odd to talk about the number of writes per read entry, but one update policy does cause writes on reads.

With an MVCC update policy of UPDATECHECK_READWRITE, an update must fail if there is an on-going transaction with a (committed or prepared) read on the cache entry. This is because the read may contribute to the calculation of the value in another entry. CloudTran keeps track of these reads by recording, on the entry that was read, a note that its value has been read.

This mode therefore involves two writes per read entry: one to add the 'note'; another to remove it.


6.2.9  Scenarios
Here are some scenarios for using the isolation level features:
  • Multiple Bank Balance Updates

    This must avoid write skew. It is naturally a single-client scenario because the calculation must be done in one place.

    This can be done with PL, by first reading within the transaction (and therefore locking) all values to be used in the calculation; then doing the calculations; then doing the updates.

    It can also be done with MVCC, with the UPDATECHECK_READWRITE update policy; if there are any conflicting updates, the transaction will fail and have to be retried. MVCC removes the need to lock all the necessary values first, which is expensive (it is a backed-up cache write). Instead, the values read will be point-in-time consistent because of using MVCC. Client caching can be switched on.

  • Reporting

    Use MVCC to give a consistent read set, as of a point in time. For reporting on large datasets, it makes sense to use multiple clients to collect and analyse the data.

  • eCommerce

    eCommerce is characterized by user 'think time', potentially taking many minutes. It is typically single-client, because it collects the information for the transaction in a single thread, and is therefore suitable for client caching.

    eCommerce is best handled by optimistic locking, so that if an important value has changed (like the stock on hand), the transaction can be abandoned, or retried with different actions. Optimistic locking is handled in TopLink Grid, which uses PL, by sending EntryProcessors into the cache that implement the checks to detect conflicts.

    In MVCC, optimistic locking can be implemented by using the appropriate Filter, which then does the check for conflicts at commit time.

  • Volatile Pricing

    "Volatile pricing" refers to pricing of financial instruments or calculating betting odds. The price is continuously adjusted based on changing positions, so it is likely that multiple pricing calculations for a given instrument or bet happen simultaneously.

    The data for the calculation is read using MVCC, so reflects a consistent point in time. The calculation is typically done in a single client (but based on distributed data) using MVCC with the UPDATECHECK_NONE update policy to support overlapping pricing calculations.

  • Web Services/SOA

    Large-scale web services can be efficiently implemented on a grid by having large 'fact' data aggregated in distributed caches and sending the service requests, as EntryProcessors, to the nodes holding the data. Because many service requests need to be made to implement a single client request, the requests are often processed asynchronously to improve response times.

    The point at which the threads split out would be where the transaction becomes multi-client: each thread separately enrols in the transaction.

    This type of transaction could use MVCC, but, because of the use of EntryProcessors, it should not use client caching: all updates are naturally written into the grid, and would be committed without further checking.


6.2.10  Consistent Reads in MVCC
For most reads of cache entries where there are multiple committed values, CloudTran returns the value committed by the most recent transaction to commit a value before the requesting transaction started. For example, if the reading transaction started at 9:02 and there are committed values for transactions committed at 9:00, 9:01 and 9:03, then CloudTran returns the value from the 9:01 transaction. This is similar to the operation in databases, where the read of a particular value at the database in an MVCC transaction gets the appropriate value for the reading transaction's start time.

For databases, the read can be strictly ordered with respect to committed values: an updated value is committed either before or after a read.

In CloudTran, life is more complicated because the data caches and the transaction managers are distributed. We have to take prepare requests into account (from Tother). They can interact with read requests in two ways.

First, as noted in previous sections, the read request will fail if the reading transaction is in UPDATECHECK_READWRITE mode.

Second, if the reading transaction is not in UPDATECHECK_READWRITE mode and there is a pending update in the data cache (from Tother), the code doing the consistent read (on behalf of Tthis) cannot know for sure whether Tother has already been committed - and if so, whether the commit occurred before the reading transaction started. Almost always it won't ... but very occasionally, the pending value would be the correct one to return.

This situation arises because of the three distinct actions involved in commit, which happen on two different nodes:

  1. The prepare request arriving at the cache node. This may come during the client transaction, if client caching is switched off, otherwise when the client commits.
  2. The commit arriving at the transaction manager node (or, in the case of many clients in a transaction, the last client commit arriving at the manager). This is the point where the commit timestamp is allocated.
  3. The cache commit request arriving at the cache node. Note that the commit at the transaction manager occurs before the commit of the updates at the data nodes.
When the read encounters a prepare request, CloudTran does not know whether to include pending updates in the read or not. (Of course, this only matters if Tthis does not itself have any pending updates; if it does, that update is returned.) The read has happened somewhere between the first and third steps above, so Tother may, or may not, have committed.

In this case, the server processing returns an "In Doubt" indication to the client. The CloudTran Client API tries to resolve the In Doubt condition by retrying: if the cache entries reflect the new value when the read is executed after the retry, the read can be sure of the commit time of the new entry and choose the correct value to return.

By default, CloudTran retries a read 10 times at 5ms intervals. If the read is not resolved in that time, CloudTran throws a ReadConsistencyException - which of course the caller can catch and retry again if appropriate. However, there are controls to change the number of retries and time intervals - see the config properties ct.coherence.mvcc.readRetryCount and ct.coherence.mvcc.readRetryDelayMillis.
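
For illustration (the two property names and ReadConsistencyException come from this section; supplying the properties as JVM system properties, and the surrounding cache API, are assumptions):

    // Illustrative only: widen the in-doubt read retry window, e.g.
    //   -Dct.coherence.mvcc.readRetryCount=20
    //   -Dct.coherence.mvcc.readRetryDelayMillis=10

    // Caller-side handling if the retries are exhausted:
    try {
        Object value = cache.get(key);
    } catch (ReadConsistencyException e) {
        // the read stayed in doubt; back off and retry, or fail the operation
    }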

Copyright (c) 2008-2013 CloudTran Inc.