Data Management

The cloud approach to getting more capacity is to add extra nodes rather than buy a bigger box. This means designing for data distributed across nodes: a typical business transaction will need to scatter and gather data, transactionally, across these nodes.
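As a rough illustration of scatter and gather, the sketch below splits one business transaction's changes across nodes by hashing each key, then merges them back. The node count, the key names and the `node_for`, `scatter` and `gather` helpers are all assumptions made for this example; they are not part of any CloudTran API.

```python
import zlib
from collections import defaultdict

NODE_COUNT = 4  # assumed cluster size, for illustration only


def node_for(key):
    # A stable hash, so the same key always maps to the same node.
    return zlib.crc32(key.encode("utf-8")) % NODE_COUNT


def scatter(changes):
    # Group one transaction's changes by the node that owns each key.
    per_node = defaultdict(dict)
    for key, value in changes.items():
        per_node[node_for(key)][key] = value
    return dict(per_node)


def gather(per_node):
    # Merge the per-node parts back into a single view.
    merged = {}
    for part in per_node.values():
        merged.update(part)
    return merged


transaction = {"order:1": "paid", "stock:42": 9, "customer:7": "Ada"}
parts = scatter(transaction)
```

Every node that appears in `parts` has to take part in the same transaction, which is the coordination problem the rest of this section addresses.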

  • Something old - Distributed Transactions

    Unfortunately, distributed transactions don't work well in cloud-scale systems, so architects of transactional applications have had to devise their own workarounds. The most common approach is to use siloed transactions, each confined to a few closely related entities. When related data spans silos and a failure occurs, the silos can get out of step and need to be fixed up manually.

  • Something new - Cloud Transactions

    CloudTran provides Cloud Transactions - a new approach to transactionality that does work well in the cloud.

    The key difference from distributed transactions is that the transaction commits in the cloud: the application must control all changes to the data, and databases (or other persistent stores) are secondary. This means the application can respond to the user as soon as the transaction is committed to a coordinator node. It doesn't have to wait for disk writes at the storage service or for the cross-checking (two-phase commit) that distributed transactions perform.
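    The commit path described above can be sketched as follows. This is a minimal, single-process illustration, not CloudTran's implementation: the `Coordinator` class, its method names and the dict standing in for a database are all hypothetical, and hot backups and isolation are left out.

```python
from collections import deque


class Coordinator:
    """Hypothetical coordinator: commits in memory, persists later."""

    def __init__(self, store):
        self.memory = {}        # in-memory data, the primary copy
        self.pending = deque()  # committed transactions not yet persisted
        self.store = store      # stand-in for a database (a plain dict)

    def commit(self, changes):
        # The transaction counts as committed once memory is updated and
        # the change set is queued; no disk write happens on this path.
        self.memory.update(changes)
        self.pending.append(dict(changes))
        return "committed"

    def flush(self):
        # Runs later, off the user's request path: write the queued
        # transactions to the persistent store in commit order.
        while self.pending:
            self.store.update(self.pending.popleft())


database = {}
coordinator = Coordinator(database)
status = coordinator.commit({"order:1": "paid", "stock:42": 9})
persisted_at_commit = dict(database)  # snapshot taken before any flush
coordinator.flush()
```

    The application can reply to the user as soon as `commit` returns; the database only catches up when `flush` runs.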

    Cloud transactions are

    • fast - a coordinator node can handle thousands of transactions per second
    • reliable - all nodes can have one or more hot backups
    • scalable - there are no limits to the number of nodes or records involved in a transaction
    • fully ACID - across the complete range of entities (i.e. they are not siloed)
    • isolated across nodes, so users see consistent information in the cloud
    • universal - spanning SQL or NoSQL persistent stores and messaging.

  • In-Memory or Caching

    How does the transaction commit in the cloud? There are two approaches:

    • for high-performance systems, the simplest approach is to keep the complete database in memory. This provides the fastest possible read and write performance. CloudTran provides automatic generation for this approach.
    • the other approach is to use a local cache and bring existing data into the nodes' memory when needed. Reads will be slower on average, because searches that miss the cache are limited by the database's disk performance, but create/update performance will be the same as with the in-memory method. To support the caching approach, CloudTran will provide lookup/load operations to integrate into a caching mechanism.
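    The caching approach can be pictured as a read-through cache. The sketch below is an assumption-laden illustration: `ReadThroughCache` and its `load` callback are hypothetical stand-ins for the lookup/load operations mentioned above, and a plain dict plays the part of the database.

```python
class ReadThroughCache:
    """Hypothetical read-through cache in front of a slower store."""

    def __init__(self, load):
        self.load = load    # assumed lookup/load callback into the database
        self.entries = {}   # data currently held in the node's memory
        self.misses = 0     # number of reads that had to go to the database

    def get(self, key):
        if key not in self.entries:
            # Cache miss: this read pays the database's lookup cost.
            self.misses += 1
            self.entries[key] = self.load(key)
        return self.entries[key]

    def put(self, key, value):
        # Creates and updates go straight to memory, as in the in-memory
        # approach; persistence is assumed to be handled elsewhere.
        self.entries[key] = value


database = {"customer:7": "Ada"}
cache = ReadThroughCache(database.get)
first = cache.get("customer:7")   # loaded from the database
second = cache.get("customer:7")  # served from memory
cache.put("customer:8", "Grace")
```

    Only the first read of a key reaches the database. With the complete database in memory, there would be no miss path at all.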