1.1 Background
CloudTran is a product for Java developers who need to build high-performance applications using large datasets with strong ACID guarantees.
In essence, CloudTran provides
- a new style of transactionality for high performance commercial applications, combined with
- a high-productivity development and deployment environment.
The rest of this page expands on these points.
1.1.1 Transactionality
1.1.1.1 Less time, higher throughput, more data and more business logic
There is a growing requirement for applications to respond more quickly, process more transactions per second, handle larger datasets and do more computation per transaction.
These requirements are driven by the growth in mobile applications, globalisation, growth in micropayment volumes
and the constant demand for more data and more business intelligence by consumers and regulators.
The standard architecture for web applications, with scalable web and processing tiers backed by one or more databases, cannot handle these requirements past a certain point.
The first fundamental problem is the load on the database during read and search operations; this can be alleviated by caching, up to a point.
But eventually, these requirements make the database-oriented architecture unviable simply because of the update/write processing.
Quite simply, to meet requirements, the live data must be nearer to the user. This requires a new architectural approach.
One approach is to use a NoSQL solution - where relational databases are not used at all for storing information, transactional integrity is not guaranteed
and the programming paradigm is quite different.
However, NoSQL's lack of transactional integrity and the way it severs links to business intelligence applications
make it unattractive for business-critical commercial applications.
1.1.1.2 The IMDG Tier
The new part of the architecture is the in-memory data grid (IMDG) - a number of nodes that cooperate to provide a unified cache layer.
In other words, the IMDG's primary purpose is to provide data as a shared resource to service operations in the application server tier.
CloudTran supports an architectural style where all the data involved in a transaction is resident in the IMDG - so the IMDG is the system of record.
This does not mean that all the data in an application needs to be in the IMDG.
Non-transactional data, like catalog data, can be in other data stores.
The reason for using this style of architecture is that it is scalable.
It prevents the database from becoming a bottleneck.
Although it is possible to "scale" databases by sharding, this tends to create problems with transactionality and manageability.
Most IMDGs provide
- transactional updates into the grid
- write-behind persistence - fast streaming to a database.
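As a rough illustration of the write-behind idea in the list above, the following sketch keeps the live data in memory, queues each change, and writes the queued changes to a backing store as one batch. It is a minimal sketch under stated assumptions, not CloudTran's or any IMDG vendor's API: the class `WriteBehindCache`, the `BackingStore` interface and the `flush` method are hypothetical names introduced only for this example.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;

/** Hypothetical write-behind cache, for illustration only. */
final class WriteBehindCache {

    /** Stand-in for a database; a real store would issue one batched write here. */
    interface BackingStore {
        void writeBatch(List<Map.Entry<String, String>> batch);
    }

    private final Map<String, String> memory = new ConcurrentHashMap<>();
    private final BlockingQueue<Map.Entry<String, String>> pending = new LinkedBlockingQueue<>();
    private final BackingStore store;

    WriteBehindCache(BackingStore store) {
        this.store = store;
    }

    /** Updates memory and queues the change; the caller never waits on the store. */
    void put(String key, String value) {
        memory.put(key, value);
        pending.add(new AbstractMap.SimpleImmutableEntry<>(key, value));
    }

    /** Reads are served from memory only. */
    String get(String key) {
        return memory.get(key);
    }

    /** Drains all queued changes and writes them as one batch; returns the batch size. */
    int flush() {
        List<Map.Entry<String, String>> batch = new ArrayList<>();
        pending.drainTo(batch);
        if (!batch.isEmpty()) {
            store.writeBatch(batch);
        }
        return batch.size();
    }
}
```

In a real deployment `flush` would typically run on a background thread or timer. The point of the sketch is only that `put` returns before anything reaches the database, and that several changes travel to the store together.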
What IMDGs typically do not provide is scalable, ACID transactionality that can be used in real-life deployments.
The normal fallback is distributed transactions using XA and 2PC (two-phase commit), but this approach is slow and unreliable in scalable environments.
Most architects are extremely reluctant to use it.
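For context, here is a minimal sketch of the generic two-phase commit protocol referred to above. It is not XA itself and not CloudTran code; `TwoPhaseCommit` and `Participant` are hypothetical names. It shows why the approach costs time: every transaction needs two rounds of calls to every participant, and the outcome waits on the slowest one.

```java
import java.util.List;

/** Hypothetical, simplified two-phase commit coordinator (no timeouts or recovery). */
final class TwoPhaseCommit {

    /** One resource taking part in the transaction, for example a database. */
    interface Participant {
        boolean prepare();  // phase 1: vote yes (true) or no (false)
        void commit();      // phase 2 when every participant voted yes
        void rollback();    // phase 2 when any participant voted no
    }

    /** Runs both phases in order and returns true if the transaction committed. */
    static boolean run(List<Participant> participants) {
        // Phase 1: ask each participant to prepare; stop at the first "no" vote.
        boolean allPrepared = true;
        for (Participant p : participants) {
            if (!p.prepare()) {
                allPrepared = false;
                break;
            }
        }
        // Phase 2: tell every participant the outcome.
        for (Participant p : participants) {
            if (allPrepared) {
                p.commit();
            } else {
                p.rollback();
            }
        }
        return allPrepared;
    }
}
```

A production coordinator must also log its decision durably and handle participants that fail between the two phases, which is where much of the unreliability in scalable environments comes from.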
1.1.1.3 CloudTran Transactionality
CloudTran provides a new style of transaction specifically to support in-memory data grids, in private data centers or clouds.
Like distributed transactions, CloudTran transactions provide strong ACID guarantees.
However, CloudTran transactions are much faster and more reliable than distributed transactions.
They impose very little overhead on IMDG operation.
CloudTran transactions also scale. They can involve any number of cache nodes, any amount of data and any number of databases/datastores.
The overhead of a transaction seen by the client is determined by the number of data items and their aggregate size - as you would expect - but is hardly affected by
- the scope of the data in the transaction (the number of physical nodes involved)
- the number of databases
- the distance (in miles, or milliseconds delay) to the databases.
The current version of CloudTran also provides integration with messaging.
A future version will expose CloudTran as a general purpose transaction manager for integration with other data sources, such as NoSQL.
1.1.2 High Productivity Development and Deployment
Distributed, scalable application programming is very difficult and requires new architectural features -
multi-machine communications, failover handling and performance optimisation for physical and virtual networking environments.
CloudTran provides an out-of-the-box architecture to address scalable applications and
a distributed ORM (Object-Relational Mapper) to integrate with IMDGs.
This handles the mapping from the application view - a connected object tree - to distributed data storage,
including failure handling.
Part of the architecture covers integration with the IMDG, of course, but it also covers
- messaging (Mule)
- REST/JSON (Jersey)
- security (Spring), and
- multi-tenancy support with two different data mapping styles.
CloudTran also allows developers to specify many deployments (more or fewer machines, backups etc.) and automatically deploys to standard execution platforms.
The current list of execution platforms is Eclipse (for desktop debugging), Linux, Windows and clouds (using jclouds and Whirr) - Amazon EC2, Rackspace and GoGrid.
Deployment automation eliminates finger trouble in deployments, and so avoids wasting time during development and catastrophic errors in production deployments.
1.1.3 Benefits
The benefits to application developers of using CloudTran are:
- Performance and scalability. CloudTran applications are fast because they do not need to wait on disk I/O, for read or write.
In addition, CloudTran makes it easy to cluster related data entities in the same virtual machine, further improving performance.
- Cost per transaction. CloudTran executes thousands of transactions per second on low-cost commodity hardware. It achieves this by a combination of
buffering transactions, so applications can proceed without waiting for database transactions, and aggregating transactions en route to databases, which is more efficient
than issuing single transactions.
- General-purpose data design. It is possible to design a simple application without a viable distributed transaction mechanism.
However, as an application's data domain grows, the resulting design restrictions make maintenance costs rise sharply.
With CloudTran, it is easy to add more data domains and service operations to an application.
- Links with business intelligence. Because CloudTran transactions can be sent to databases, they can plug into an enterprise's business intelligence infrastructure.
- Reduced time to market. Application programmers will find that most of CloudTran involves familiar concepts and programming tools;
the new concepts are simple to use and are needed to achieve optimal performance.
This reduces the learning time to build a first application. For example, a realistic Proof of Concept can take only a few days.
- Risk reduction. There is a big risk with any new architecture - the mechanics of programming are hard, and achieving acceptable performance is even harder.
CloudTran represents over 2 years of architecture refinement and performance tuning.
By re-using the out-of-the-box architecture, you can dramatically reduce the project risk.
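The first benefit in the list above mentions clustering related data entities in the same virtual machine. One common way to get that effect in a partitioned grid is to route each entity by a shared affinity key rather than by its own identifier. The sketch below illustrates that general technique only. `AffinityRouter` and `nodeFor` are hypothetical names, and the source does not say that CloudTran places data this way.

```java
/** Hypothetical router that maps an affinity key to one of a fixed number of nodes. */
final class AffinityRouter {

    private final int nodeCount;

    AffinityRouter(int nodeCount) {
        if (nodeCount <= 0) {
            throw new IllegalArgumentException("nodeCount must be positive");
        }
        this.nodeCount = nodeCount;
    }

    /**
     * Returns a node index in the range [0, nodeCount).
     * The index depends only on the affinity key, so every entity stored
     * under the same key lands on the same node.
     */
    int nodeFor(String affinityKey) {
        return Math.floorMod(affinityKey.hashCode(), nodeCount);
    }
}
```

For example, if a customer record and all of that customer's orders are routed with the customer's identifier as the affinity key, a transaction touching them can run on one node without cross-node calls.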
1.1.4 Applications
The types of applications that are likely to need the large datasets, intense processing and high transaction rates provided by CloudTran are:
- Web Sites To Scale. Web sites that need to scale to handle many users may find themselves locked in:
the platform or language currently implemented does not scale easily.
Developers looking to move platforms will find CloudTran solves the scalability problem and reduces the cost, time and risk of migration.
- Cloud Front, Data Centre Back. Because CloudTran can use remote databases for persistence
without reducing application performance, it is possible to deploy applications in the cloud, using the IMDG to hold the live application data
and saving data back to the data center. This reduces security risks and concerns around storing data in public clouds.
- Application Integration.
Web applications and SOA deployments that have millions of users can effectively use CloudTran for its
scalability plus strong transactions, high performance and large data domains.