
7.2 Local Store

An important part of the Replicator is the LocalStore. This is effectively a back-up service which records the transactional information to a data store before the information is sent across the wire. The LocalStore's purpose is to provide high-performance backed-up storage for the Replicator.

It stores the outbound transactional information, keeping a record of the transactions that have been sent. It also records which transactions have been acknowledged.

 7.2.1  The Local Store
 7.2.2  LocalStore API
 7.2.3  LocalStore Implementations
 7.2.4  SSDLocalStore Configuration

7.2.1  The Local Store
The Replicator calls the LocalStore at various points in the replication cycle, through the LocalStore API described in more detail below. A typical sequence of calls from the Replicator to the LocalStore is shown below.
  1. Start of day - LocalStore.resync()
     Resets the LocalStore ready for a new set of transactions.
  2. Store transactional information - LocalStore.storeOutbound()
     Stores the transactional information on the LocalStore's back-up medium.
  3. Release transactional information - LocalStore.releaseOutbound()
     Notifies the LocalStore that the transaction has been committed in the remote grid. The LocalStore can now release any locks on the transaction.
     Assuming there is no problem, the Replicator loops through steps 2 and 3, storing and releasing transactions.
     Note: steps 2 and 3 happen in order only for a given transaction; the two calls run on different threads.
  4. Link fails
     Some connection issue causes the link to go down. This is unexpected and needs to be resolved. The local system will still be committing transactions, so the LocalStore must continue to store them.
  5. The link is restored
     The problem with the link is resolved, but the remote system is now out of sync with the local system.
  6. The local Replicator resyncs with the LocalStore - LocalStore.resync()
     The resync determines which transactions have not made it across the wire. The LocalStore returns the first transaction sequence number to be resent and the number of transactions that need to be resent.
  7. Retrieve transactional information - LocalStore.retrieveOutbound()
     For each transaction that has not been replicated, the Replicator retrieves the transactional information from the LocalStore and resends it.
  8. Continue
     Replication continues as normal, storing and releasing transactional information as before.
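Steps 6 and 7 can be sketched in Java as follows. This is a minimal illustration, not CloudTran's implementation: RecoveryStore, ResendSketch and demoStore are hypothetical names, a String stands in for the real ReplicatorPacket, and the array returned by resync() is assumed to hold the first sequence number at index 0 and the count of transactions to resend (including the first) at index 1.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

/** Hypothetical, cut-down view of the two LocalStore calls used during recovery. */
interface RecoveryStore {
    Long[] resync();                         // assumed layout: {first sequence number, count}
    String retrieveOutbound(long txSeqNum);  // String stands in for ReplicatorPacket
}

final class ResendSketch {
    /** Steps 6 and 7: ask the store what is outstanding, then fetch each packet in order. */
    static List<String> packetsToResend(RecoveryStore store) {
        Long[] range = store.resync();
        long first = range[0];
        long count = range[1];
        List<String> packets = new ArrayList<>();
        for (long seq = first; seq < first + count; seq++) {
            packets.add(store.retrieveOutbound(seq));
        }
        return packets;
    }

    /** Toy store: transactions 1..5 were stored, 1..2 were acknowledged before the link failed. */
    static RecoveryStore demoStore() {
        final TreeMap<Long, String> stored = new TreeMap<>();
        for (long i = 1; i <= 5; i++) {
            stored.put(i, "tx-" + i);
        }
        final long highestAcked = 2;
        return new RecoveryStore() {
            public Long[] resync() {
                return new Long[] { highestAcked + 1, stored.lastKey() - highestAcked };
            }
            public String retrieveOutbound(long txSeqNum) {
                return stored.get(txSeqNum);
            }
        };
    }

    public static void main(String[] args) {
        // Prints the packets that would be resent after the link is restored.
        System.out.println(packetsToResend(demoStore()));
    }
}
```

With the toy store above, the packets for transactions 3, 4 and 5 are returned in sequence order, ready to be resent.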

7.2.2  LocalStore API
The LocalStore is accessed through the LocalStore API, which has the following methods:

  • Long[] resync();
    The resync method is called when the data link connection is lost and then restored. Its purpose is to determine which packets need to be resent for replication, so it must return the first transaction sequence number to be resent as well as the number of transactions to be resent.

    The resync method is also called at the start of day. In this case the transaction sequence number is the next sequence to use in the replication process.

  • ReplicatorPacket retrieveOutbound( long txSeqNum );
    The retrieveOutbound method returns the transactional information for the given transaction sequence number from the data store. It is usually called after the resync method.

  • String storeOutbound( AtomicLong txSeqNumAL, ReplicatorOutboundRequest ror );
    The storeOutbound method is called to store the given ReplicatorOutboundRequest in the back-up data store. The ReplicatorOutboundRequest contains an array of transactions. The AtomicLong holds the transaction sequence number of the first transaction. The returned String is one of:
        RED_ZONE - indicates the LocalStore service has failed and the caller needs to retry.
        YELLOW_ZONE - indicates the LocalStore service has been successful, but is getting hot.
        GREEN_ZONE - indicates the LocalStore service has been successful.

  • boolean releaseOutbound( long txSeqNum );
    This method is called to indicate that the given transaction sequence number has been replicated. The LocalStore implementation can then release the given transaction sequence number from the data store. This method is not necessarily called for every transaction sequence number: the given number is the highest transaction sequence number that has been replicated.
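To make the contract of the four methods concrete, here is a minimal in-memory sketch. It is illustrative only and is not one of CloudTran's implementations. The assumptions are: Strings stand in for ReplicatorPacket and ReplicatorOutboundRequest; the class name and the capacity and yellowAt thresholds are hypothetical; the zones are derived from the number of unreleased transactions; and releaseOutbound returns true when it released at least one transaction (the source does not define the meaning of its boolean result).

```java
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;
import java.util.concurrent.atomic.AtomicLong;

/** Illustrative in-memory store; Strings stand in for the real packet types (assumption). */
final class InMemoryLocalStoreSketch {
    static final String RED_ZONE = "RED_ZONE";
    static final String YELLOW_ZONE = "YELLOW_ZONE";
    static final String GREEN_ZONE = "GREEN_ZONE";

    private final TreeMap<Long, String> unreleased = new TreeMap<>();
    private final int capacity;   // hypothetical hard limit on unreleased transactions
    private final int yellowAt;   // hypothetical "getting hot" threshold
    private long nextSeq = 1;     // next sequence number to use when nothing is outstanding

    InMemoryLocalStoreSketch(int capacity, int yellowAt) {
        this.capacity = capacity;
        this.yellowAt = yellowAt;
    }

    /** Stores the transactions, numbering them from the value held in txSeqNumAL. */
    synchronized String storeOutbound(AtomicLong txSeqNumAL, List<String> transactions) {
        if (unreleased.size() + transactions.size() > capacity) {
            return RED_ZONE;                      // nothing stored; the caller retries
        }
        long seq = txSeqNumAL.get();
        for (String tx : transactions) {
            unreleased.put(seq++, tx);
        }
        nextSeq = Math.max(nextSeq, seq);
        return unreleased.size() >= yellowAt ? YELLOW_ZONE : GREEN_ZONE;
    }

    /** Releases every transaction up to and including txSeqNum. */
    synchronized boolean releaseOutbound(long txSeqNum) {
        NavigableMap<Long, String> released = unreleased.headMap(txSeqNum, true);
        boolean releasedAny = !released.isEmpty();
        released.clear();                         // the view writes through to the backing map
        return releasedAny;
    }

    /** Returns the stored transaction, or null if it was never stored or already released. */
    synchronized String retrieveOutbound(long txSeqNum) {
        return unreleased.get(txSeqNum);
    }

    /** Returns {first sequence number to resend, number of transactions to resend}. */
    synchronized Long[] resync() {
        if (unreleased.isEmpty()) {
            return new Long[] { nextSeq, 0L };    // nothing outstanding: next number to use
        }
        return new Long[] { unreleased.firstKey(), (long) unreleased.size() };
    }

    public static void main(String[] args) {
        InMemoryLocalStoreSketch store = new InMemoryLocalStoreSketch(4, 3);
        System.out.println(store.storeOutbound(new AtomicLong(1), List.of("a", "b")));
        store.releaseOutbound(1);
        Long[] range = store.resync();
        System.out.println(range[0] + ", " + range[1]);
    }
}
```

The methods are synchronized because, as noted in 7.2.1, storing and releasing run on different threads. A single release call clears every earlier sequence number, matching the "highest replicated" semantics described for releaseOutbound.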

7.2.3  LocalStore Implementations
Within CloudTran there are two implementations of the LocalStore service:
  • The CacheLocalStoreImpl.
  • The SSDLocalStoreImpl.

The CacheLocalStoreImpl backs up the transactions to a cache. In the absence of an Elastic Cache, this is really only useful for testing purposes.

The SSDLocalStoreImpl uses Solid State Devices (SSDs) as the back-up mechanism. The SSD local store is the recommended implementation of the LocalStore.

A further recommendation, which the SSDLocalStoreImpl supports, is to use two SSDs per Replicator. This means there is no single point of failure for the LocalStore, and it provides the best combination of resilience and speed.

In essence the SSD LocalStore serialises the transactional information to a set number of files. Once serialised the data can be transmitted across the wire and replicated on the remote grid.

As the data is replicated, an acknowledgement of the replicated transactions is returned. This acknowledgement information is also recorded on the SSD, so should the link fail, the local replication service can interrogate the SSD to determine which transactions need to be re-sent to the remote grid.

If the write of the transaction information to the SSD fails then a TransactionExceptionManagerTooBusy exception is thrown.

The LocalStore implementation is selected by the configuration property ct.replicator.localStore.class. For the CacheLocalStoreImpl this would be:
ct.replicator.localStore.class=com.cloudtran.replicator.localStore.CacheLocalStoreImpl

and for the SSDLocalStoreImpl:
ct.replicator.localStore.class=com.cloudtran.replicator.ssdStore.SSDLocalStoreImpl

A bespoke implementation can be written, provided it implements the LocalStore API and the ct.replicator.localStore.class property is set to its class name.


7.2.4  SSDLocalStore Configuration

There are a number of configuration properties that can be used for the SSD LocalStore. These are shown below, along with their defaults.

The block size that writes are rounded up to. Every write to the transaction log files starts on a boundary of this size; any extra bytes between the end of a written list of transactions and the next boundary are ignored. The default is 4096, which is good for most SSDs and hard disks. Some SSDs may have a larger native block size.
ct.replicator.ssd.blockSize = 4096
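The rounding described above is plain integer arithmetic. The sketch below shows one way to compute where the next write would start; the class and method names are hypothetical, and sizes are assumed to be in bytes.

```java
/** Illustrative arithmetic only; names are hypothetical and sizes are assumed to be in bytes. */
final class BlockRounding {
    /** Rounds length up to the next multiple of blockSize. */
    static long roundUpToBlock(long length, long blockSize) {
        if (length < 0 || blockSize <= 0) {
            throw new IllegalArgumentException("length must be >= 0 and blockSize > 0");
        }
        return ((length + blockSize - 1) / blockSize) * blockSize;
    }

    /** Offset at which the next write starts, given where the previous write began and its length. */
    static long nextWriteOffset(long writeStart, long writeLength, long blockSize) {
        return writeStart + roundUpToBlock(writeLength, blockSize);
    }

    public static void main(String[] args) {
        // A 5000-byte write with 4096-byte blocks occupies two blocks.
        System.out.println(roundUpToBlock(5000, 4096)); // 8192
        System.out.println(nextWriteOffset(8192, 100, 4096)); // 12288
    }
}
```

In the first example, the 3192 bytes between the end of the data and the next boundary are the "extra bytes" that are ignored on reading.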

This gives the directory location of the files that the replicator uses. By default, this is 'replicator/directory1', a path below the current directory of the CloudTran isolator application. The default is appropriate for development; in production, the directory should be specified explicitly. If replication is enabled, you must specify a valid directory name for this property. Directory names can optionally end in '/' or '\'. On Windows, drives such as "C:" are allowed. For initialization (ct.replicator.ssd.init=true), the directory will be created if it doesn't already exist. Although this is an "SSD" property, there is no check that the directory is actually on an SSD, so it can be a normal directory. In production, it should ideally be on an SSD for maximum performance. Performance is severely degraded on Linux if this directory is on the system drive.
ct.replicator.ssd.directory1 = replicator/directory1

It is highly recommended that you use a second directory on a separate disk from 'directory1' in production and performance testing; this is specified in 'ct.replicator.ssd.directory2'. In development, CloudTran will work with a single directory. See 'ct.replicator.ssd.directory1' for more information.
ct.replicator.ssd.directory2 = replicator/directory2

This property is the base filename of all the store files. '_' plus a number (1, 2 etc.) plus the file extension is added to this base to give the full file name.
ct.replicator.ssd.fileBasename = transactionData

This property sets the file extension for the store files. The default is 'rep', for replicator. Avoid standard names like 'log', which may get deleted by mistake.
ct.replicator.ssd.fileExtension = rep
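As an illustration of how the two naming properties combine, the sketch below builds a store file name from the base name, a file number and the extension. The class and method names are hypothetical, and the '.' separator before the extension is an assumption; the source only states that '_', a number and the extension are appended to the base name.

```java
/** Illustrative only: builds a store file name from the naming properties. */
final class StoreFileNames {
    /** Assumes a '.' separates the file number from the extension. */
    static String fileName(String baseName, int fileNumber, String extension) {
        return baseName + "_" + fileNumber + "." + extension;
    }

    public static void main(String[] args) {
        // With the default property values, the first two files would be named:
        System.out.println(fileName("transactionData", 1, "rep"));
        System.out.println(fileName("transactionData", 2, "rep"));
    }
}
```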

This sets the amount of storage, in MB, allocated for the replicated packets. This is per directory (i.e. normally per drive). The production default is 100,000 (100GB); the debug default is 40MB.
ct.replicator.ssd.totalSizeMB = 100000/40

This is the size in MB of each file on the disk. It is used in conjunction with 'ct.replicator.ssd.totalSizeMB' to determine the number of files. The production default is 50MB; the debug default is 4MB. When combined with the default 'totalSizeMB' value, this gives 2,000 files in production and 10 files in debug.
ct.replicator.ssd.fileSizeMB = 50/4
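The file counts quoted above follow from dividing the total size by the file size. The sketch below reproduces that arithmetic; the class and method names are hypothetical, and the use of integer division (discarding any remainder) is an assumption about how a total that is not an exact multiple would be treated.

```java
/** Illustrative arithmetic only; names are hypothetical. */
final class StoreFileCount {
    /** Number of files per directory, assuming integer division of total size by file size. */
    static long fileCount(long totalSizeMB, long fileSizeMB) {
        if (totalSizeMB <= 0 || fileSizeMB <= 0) {
            throw new IllegalArgumentException("sizes must be positive");
        }
        return totalSizeMB / fileSizeMB;
    }

    public static void main(String[] args) {
        System.out.println(fileCount(100_000, 50)); // production defaults: 2000
        System.out.println(fileCount(40, 4));       // debug defaults: 10
    }
}
```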

If the 'init' property is set to 'true', the SSDs are initialised for use by CloudTran, which then stops. To run normally, and to make use of the replicator information after a cluster reboot, this property must be false. To prevent accidental deletion of the replicator information, when this property is true there must be no files in the directories specified in the ct.replicator.ssd.directory1/2 properties. In the preferred configuration, with two nodes providing the replicator service, the 'init' run must be done on both nodes. Best practice is to omit this property from the config.properties file in production and specify it as a system property ('-Dct.replicator.ssd.init=true' on the command line) for an initialization run.
ct.replicator.ssd.init = false
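The safety rule for an initialization run (create the directory if needed, but refuse to proceed if it already contains files) can be sketched as below. This is not CloudTran's code: the class and method names are hypothetical, and taking the directory from the first command-line argument is purely for illustration. Only standard java.nio.file calls are used, and the flag is read with Boolean.getBoolean, which checks a system property such as one set with -Dct.replicator.ssd.init=true.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

/** Illustrative guard for an initialization run; not CloudTran's implementation. */
final class InitGuardSketch {
    /** Creates the directory if it is missing and refuses to continue if it holds any entry. */
    static void prepareForInit(Path directory) throws IOException {
        Files.createDirectories(directory);
        try (Stream<Path> entries = Files.list(directory)) {
            if (entries.findAny().isPresent()) {
                throw new IllegalStateException(
                        "Refusing to initialise: " + directory + " is not empty");
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // True only when the JVM was started with -Dct.replicator.ssd.init=true.
        if (Boolean.getBoolean("ct.replicator.ssd.init")) {
            Path directory = Paths.get(args.length > 0 ? args[0] : "replicator/directory1");
            prepareForInit(directory);
            System.out.println("Ready to initialise " + directory);
        } else {
            System.out.println("Normal run: existing replicator files are left untouched");
        }
    }
}
```

Because the flag is read from a system property, leaving it out of config.properties means a normal restart can never trigger the initialization path by accident.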

This is the number of threads used for each 'SSD' drive to write the replicator's back-up data.
ct.replicator.ssd.threadCount = 2

Copyright (c) 2008-2013 CloudTran Inc.