If you're evaluating running ACID transactions in an In-Memory Data Grid (IMDG), you probably have a few questions about scalability, performance, and coding. Take a look at these FAQs for using CloudTran. If your question isn't below, let us know so we can get you an answer and add it to our list!
CloudTran integrates with TopLink Grid, which is an adaptation of TopLink/EclipseLink for Coherence. EclipseLink itself is the reference implementation for JPA2. TopLink Grid provides different ways of integrating with the cache; in some of these, there are restrictions on the Java Persistence Queries you can use. See this link for more information.
The requirements are performance, persistence (data outside the grid is crucial), failover handling (data in the grid is crucial), scalability, and no unbounded delays for clients.
In J2EE or J2EE lite, you had two backend tiers: the app servers (running servlets or sessions) and the database. Survival of the database was crucial, so backups and similar safeguards were needed.
In a grid environment, you have three tiers, with cache machines sitting between the app servers and the database. Now it's not just the database that has to survive: the data in the grid is also critical. The extra tier also introduces a need to keep the cache and data tiers consistent.
Here are the basics of the CloudTran algorithm:
This algorithm looks quite different from the Prepare/Commit of XA distributed transactions, but the write of the transactional entry into the grid serves the same purpose as the Prepare of XA.
We don't know of any, but it's an obvious evolution.
Yes. They're not “XA distributed transactions”, but they are definitely distributed transactions! Small transactions may only involve 5 or 10 nodes, but there is no limit. In building CloudTran, we had in mind "big transactions", like an app we built for a customer: it committed 20,000 to 30,000 rows in one transaction.
There are some today, but not many. However, proving this level of performance is important for two reasons. First, the outlook is for growth in size and performance of applications, so this shows that CloudTran can handle big increases without re-architecting. Second, it shows CloudTran can handle spikes, where instantaneous transaction rates (per commit) can rise to hundreds of times the average rate.
The short answer is No, it is not a requirement. However, if your application requires lights-out persistence and linking to other applications, you will need some sort of data feed out of Coherence. The most common feed is to a database, and RAC is suitable because it is highly scalable. The choice of database is up to you.
We typically run with 4GB, although it now makes sense to use more because of the off-heap and elastic data capabilities. For JVM tuning, the most important thing is garbage collection. We use the Concurrent Mark Sweep (CMS) collector in incremental mode. Through experimentation, we found that setting NewSize and MaxNewSize to 80M and MaxTenuringThreshold to 1 keeps GCs at around the 20ms mark.
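As a rough way to check whether GC times on your own nodes are in the same range, the sketch below reads the JVM's standard garbage collector MXBeans and prints an average time per collection. The class name `GcPauseReport` and its helper method are hypothetical and are not part of CloudTran or Coherence. The launch flags in the comment are our reading of the settings described above (treating "4GB" as the heap size) for a HotSpot JVM of that era; CMS and its incremental mode have been removed from recent JDKs, so treat them as an assumption to verify against your JVM version.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.List;

// Hypothetical helper, not part of CloudTran or Coherence.
// Assumed launch flags matching the settings described above (older HotSpot JVMs only):
//   java -Xms4g -Xmx4g -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode \
//        -XX:NewSize=80m -XX:MaxNewSize=80m -XX:MaxTenuringThreshold=1 GcPauseReport
public class GcPauseReport {

    /**
     * Average time per collection in milliseconds.
     * Returns 0.0 when the collector has not run or the JVM reports the values as undefined (-1).
     */
    public static double averagePauseMillis(long collectionCount, long collectionTimeMillis) {
        if (collectionCount <= 0 || collectionTimeMillis < 0) {
            return 0.0;
        }
        return (double) collectionTimeMillis / collectionCount;
    }

    public static void main(String[] args) {
        List<GarbageCollectorMXBean> collectors = ManagementFactory.getGarbageCollectorMXBeans();
        for (GarbageCollectorMXBean gc : collectors) {
            long count = gc.getCollectionCount();  // total collections so far, -1 if undefined
            long timeMs = gc.getCollectionTime();  // approximate accumulated time in ms, -1 if undefined
            System.out.printf("%s: %d collections, %.1f ms average%n",
                    gc.getName(), count, averagePauseMillis(count, timeMs));
        }
    }
}
```

The accumulated time reported by the MXBean is approximate, so this is only a coarse indicator; GC logs give exact per-pause figures if you need to confirm the 20ms target.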
We don’t know of anything you can do at run-time to detect and tune “slow nodes”. We do run the Coherence datagram tester, and if one node is constantly reporting high error rates, we don’t use it. But at run time, you have to live with it. The dreaded “communication delays” message is quite common, particularly if you’re on a busy public cloud with other users hammering the network. We let Coherence handle this level of detail - and eventually remove a node from the cluster if it’s too slow.
RAC is a clustered database; Coherence is an in-memory data grid. We don’t know of any specific integration between them. You have the same integration as you would with any database.
© Copyright 2012 CloudTran, Inc.| All Rights Reserved.