There is a hook in the CloudTran logger that allows developers to customise the action of disposing of log files.
This section gives the background first. Then it describes the class and method for disposing and how to configure it.
10.2.1 Log files
The transaction logs are written to a series of log files.
As the application commits transactions, the TxB writes the transaction data and a "persisting" status into the log file.
When a transaction is persisted to all databases, the TxB writes a "persisted" status for the transaction to the current log file.
Log files roll over, as described in the configuration section.
For example, the first log file may have the 'persisting' entries for transactions 1-1000,
and then roll over to another log file when it gets full.
The second log file has the 'persisting' entries for transactions 1001+, until it too gets full, and so on.
Normally a log file will have some transactions with a 'persisting' entry but no matching 'persisted' entry,
because the log file rolled over before the 'persisted' entries could be written.
In other words, the log entries for transactions can span log file boundaries.
In our example, we could have written the 'persisted' entries for transactions 1-995 in the first log file,
and for 996+ to the second logfile.
When all the 'persisting' entries in one log file have had matching 'persisted' entries written,
it means that the complete file is no longer needed - all the data we noted in case of a crash is now
redundant because the data store has secured it. In our example, when we have written the
'persisted' entries for transactions 996-1000 in the second log file, the first file becomes redundant.
At this point, we can dispose of the file.
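The bookkeeping described above can be sketched as follows. This is an illustrative model only, not CloudTran's actual implementation; the class and method names are assumptions:

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of redundancy tracking: a log file can be disposed of
// once every transaction whose 'persisting' entry it holds has had a
// 'persisted' entry written (possibly to a later file).
class RedundancyTracker {
    // For each log file, the ids of transactions whose 'persisting' entry
    // lives in that file but which are not yet persisted.
    private final Map<String, Set<Long>> pending = new LinkedHashMap<>();
    // Reverse index: transaction id -> log file holding its 'persisting' entry.
    private final Map<Long, String> txToFile = new HashMap<>();

    void recordPersisting(String logFile, long txId) {
        pending.computeIfAbsent(logFile, f -> new HashSet<>()).add(txId);
        txToFile.put(txId, logFile);
    }

    // Returns the name of a log file that has just become redundant, or null.
    String recordPersisted(long txId) {
        String logFile = txToFile.remove(txId);
        if (logFile == null) return null;
        Set<Long> open = pending.get(logFile);
        open.remove(txId);
        if (open.isEmpty()) {      // every transaction in this file is persisted
            pending.remove(logFile);
            return logFile;        // safe to dispose of this file
        }
        return null;
    }
}
```

In the example above, the first file is reported redundant only after the 'persisted' entries for transactions 996-1000 arrive, even though they are written to the second file.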
10.2.2 The Assumption Behind CloudTran Logging
The main assumption of CloudTran logging is that, once a transaction (or part of one)
has been persisted to a database or other persistent store, the database is responsible for
keeping the data safe.
This means that CloudTran can delete a log file when it becomes redundant - that is, when all
of its 'persisting' entries have matching 'persisted' entries.
10.2.3 Disposing of log files
There is a class and method for disposing of the log files.
It is important to dispose of redundant log files so they do not remain in the log directory.
If this is not done:
- the amount of replay processing after a crash can become too lengthy to complete in a realistic time,
and the value of having logs at all is lost
- the log files will eventually fill up the disk.
To dispose of a redundant file, the transaction logger calls an instance that implements the "com.cloudtran.log.ILogDisposer" interface.
The method that this must implement has the signature
void dispose( File logFile );
The default class for this is com.cloudtran.log.LogDefaultDisposer, which simply deletes the file.
If you wish to implement your own disposer, you must name the class in the "ct.logger.disposer" configuration property, e.g.
ct.logger.disposer = com.myco.myapp.log.LogDisposer
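As a sketch, a custom disposer might archive redundant files rather than delete them. The archive-directory idea and the class names below are illustrative assumptions; a real implementation would implement com.cloudtran.log.ILogDisposer directly (a local stand-in interface is declared here so the sketch is self-contained):

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Local stand-in for com.cloudtran.log.ILogDisposer, declared here only so
// this sketch compiles on its own.
interface ILogDisposer {
    void dispose(File logFile);
}

// A hypothetical disposer that moves redundant log files into an archive
// directory instead of deleting them, e.g. for audit purposes.
class ArchivingLogDisposer implements ILogDisposer {
    private final Path archiveDir;

    ArchivingLogDisposer(Path archiveDir) {
        this.archiveDir = archiveDir;
    }

    @Override
    public void dispose(File logFile) {
        try {
            Files.createDirectories(archiveDir);
            Files.move(logFile.toPath(),
                       archiveDir.resolve(logFile.getName()),
                       StandardCopyOption.REPLACE_EXISTING);
        } catch (IOException e) {
            // Leave the file in place on failure; the logger can retry
            // disposal later rather than lose the log data.
            throw new RuntimeException("could not archive " + logFile, e);
        }
    }
}
```

The implementing class would then be named in the ct.logger.disposer property as shown above.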
10.2.4 Transaction Logging Speed and Configuration
The performance of the transaction logger is affected by a number of configuration parameters:
- the number of logger threads, as configured in ct.logger.threadCount.
The default of 3 threads is adequate for most situations.
- the maximum size of an individual log file,
as configured in ct.logger.fileSizeMB.
20MB is the default; a larger file size may help in high-volume systems.
- the maximum write size,
as configured in ct.logger.maxWriteSize.
5MB is the default. Changing this may improve performance for large-data transactions.
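Collecting the parameters above, a configuration fragment using the stated defaults might look like the following. The value format of ct.logger.maxWriteSize (whether it is given in MB or in bytes) is an assumption here:

```properties
# Transaction logger tuning - the values shown are the documented defaults.
ct.logger.threadCount = 3
ct.logger.fileSizeMB = 20
# Assumed to be specified in MB.
ct.logger.maxWriteSize = 5
```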
10.2.5 Structure of the Log File
The log file is written as one or more groups of transactions. The structure of each write is as follows:
The header is written twice, at the start and at the end of the block.
Between them is the serialised data of the transactions, followed by padding bytes to round up to a disk block boundary for the overall write operation.
Two copies of the header are written so that CloudTran can detect single-bit errors in the header.
If there is a problem in the first header, the signature bytes allow the replayer to search for the second header and, if found, use it in preference to the first.
The header consists of 5 parts:
- H1 - 64 bytes, signature
- H2 - 4 bytes, the offset of the first byte in the file, lowest byte first
- H3 - 4 bytes, the count of transactions that we're writing here
- H4 - 4 bytes, the size of serialized data in this section
- H5 - 8 bytes, the CRC-32 of the serialized data (from the GZIP format, see http://www.ietf.org/rfc/rfc1952.txt)
The CRC of the serialized data allows us to double-check the validity of the data.
Data read from the disk is error-checked by the hardware ...
but there are enough war stories of undetected hardware errors causing havoc to make an extra check useful.
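The CRC check and the header layout can be sketched in Java using java.util.zip.CRC32, which implements the CRC-32 used by the GZIP format (RFC 1952). Treating every multi-byte header field as 'lowest byte first' is an assumption of this sketch - the text states it only for H2:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.zip.CRC32;

// Sketch of checking a block's serialized data against the CRC-32 stored in
// header field H5, and of laying out the 84-byte header (H1-H5) described above.
class LogBlockCheck {
    // CRC-32 of the serialized data, as an unsigned 32-bit value in a long.
    static long crc32(byte[] data) {
        CRC32 crc = new CRC32();
        crc.update(data, 0, data.length);
        return crc.getValue();
    }

    // True if the data matches the CRC recorded in the header.
    static boolean verify(byte[] serializedData, long headerCrc) {
        return crc32(serializedData) == headerCrc;
    }

    // Builds the 84-byte header: H1 signature (must be 64 bytes), then H2-H4
    // as little-endian ints ('lowest byte first'), then H5 as 8 bytes.
    static byte[] header(byte[] signature, int fileOffset, int txCount,
                         int dataSize, long crc) {
        ByteBuffer buf = ByteBuffer.allocate(84).order(ByteOrder.LITTLE_ENDIAN);
        buf.put(signature);      // H1: 64-byte signature
        buf.putInt(fileOffset);  // H2: offset of the first byte in the file
        buf.putInt(txCount);     // H3: count of transactions in this write
        buf.putInt(dataSize);    // H4: size of the serialized data
        buf.putLong(crc);        // H5: CRC-32 of the serialized data
        return buf.array();
    }
}
```

On replay, the same verify step run against each block catches the rare corruption that slips past the hardware checks.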