Package com.tibco.patterns.deduplication
Class Deduplicator
- java.lang.Object
-
- com.tibco.patterns.deduplication.Deduplicator
-
public class Deduplicator extends java.lang.Object
Finds duplicates between a KeyedQuerySource and data held by a TIBCO Patterns - Search engine.
-
-
Constructor Summary
Constructor - Description
Deduplicator(int batchSize, KeyedQuerySource querySource, Logger logger, ErrorHandler error_handler, PairStore pairStore) - Creates a new Deduplicator.
-
Method Summary
Modifier and Type, Method - Description
boolean finishedOk() - Check if the deduplication finished with no errors.
int getBatchSize() - Get the size of query batches.
static int getDefaultBatchSize()
boolean getFailed()
java.util.List<Host> getHosts() - Gets a read-only list of the TIBCO Patterns - Search hosts.
boolean isRunning() - Check if the deduplicator is running.
boolean isWorkComplete() - Check if the deduplication is done and ready for shutdown.
void restartFailedSearchers() - Bring searcher counts up to desired levels for all TIBCO Patterns - Search hosts.
void setHostNetworkEncryption(java.lang.String hostName, int hostPort, boolean netEncryption, javax.net.ssl.SSLSocketFactory factory) - Set the network encryption parameters used when communicating with a host.
void setHostWorkerCount(java.lang.String hostName, int hostPort, int searcherCount) - Sets the number of searchers attached to a TIBCO Patterns - Search host engine.
void setIgnoreEmptyQueries(boolean on) - Set ignoring of empty queries.
void shutdown() - Shut down all processing.
boolean start() - Start the deduplication.
void stopBatching() - Prevent the Deduplicator from queuing new batches of queries.
void stopHostSearchers(java.lang.String hostName, int hostPort) - Stop all searchers that operate on the specified TIBCO Patterns - Search host.
void stopHostSearchers(java.lang.String hostName, int hostPort, boolean immediate) - Stop all searchers that operate on the specified TIBCO Patterns - Search host.
void waitWorkComplete() - Waits indefinitely for the deduplicator to finish its work.
boolean waitWorkComplete(long time, java.util.concurrent.TimeUnit unit) - Wait for deduplication work to finish.
-
-
-
Constructor Detail
-
Deduplicator
public Deduplicator(int batchSize, KeyedQuerySource querySource, Logger logger, ErrorHandler error_handler, PairStore pairStore)
Creates a new Deduplicator.
- Parameters:
batchSize - Number of records in each record batch.
querySource - The QueryBuilder used to build queries from records.
logger - The log to be used for writing output messages.
error_handler - The handler for errors encountered during deduplication. If null is passed, the Deduplicator creates a new instance of ErrorHandler.
pairStore - Stores batches of pairs found by the Deduplicator.
-
-
Method Detail
-
getDefaultBatchSize
public static int getDefaultBatchSize()
- Returns:
- the default batch size
-
getHosts
public java.util.List<Host> getHosts()
Gets a read-only list of the TIBCO Patterns - Search hosts.
- Returns:
- a read-only list of the TIBCO Patterns - Search hosts.
-
start
public boolean start() throws java.lang.InterruptedException
Start the deduplication.
- Returns:
- true if the deduplicator must be waited on to complete its work; false if the deduplicator's work is already done.
- Throws:
java.lang.InterruptedException - if this thread is interrupted while the deduplication is starting up.
java.lang.IllegalStateException - if the deduplication is already started, or if no hosts have been defined.
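The return value of start() decides whether a wait is needed at all. The following is a minimal sketch of one possible call order, not the library's prescribed usage. `DedupControl` and `RecordingFake` are hypothetical stand-ins that copy only the documented signatures, so the example compiles without the TIBCO jar; `Exception` stands in for `DedupeException`, and calling shutdown() in a `finally` block is an assumption about sensible cleanup.

```java
import java.util.ArrayList;
import java.util.List;

public class DedupLifecycleSketch {

    /** Hypothetical stand-in: only the documented signatures this sketch calls. */
    interface DedupControl {
        boolean start() throws InterruptedException;
        // The real method also declares DedupeException; Exception stands in for it here.
        void waitWorkComplete() throws Exception;
        boolean finishedOk();
        void shutdown();
    }

    /** Start, wait only if start() reports pending work, read the result, then shut down. */
    static boolean runToCompletion(DedupControl dedup) throws Exception {
        boolean mustWait = dedup.start();
        try {
            if (mustWait) {
                dedup.waitWorkComplete();
            }
            return dedup.finishedOk();
        } finally {
            // Reached only after start() returned normally, so shutdown() is
            // called on an instance that has been started.
            dedup.shutdown();
        }
    }

    /** In-memory fake that records call order; not part of any real API. */
    static class RecordingFake implements DedupControl {
        final List<String> calls = new ArrayList<>();
        private final boolean workPending;

        RecordingFake(boolean workPending) {
            this.workPending = workPending;
        }

        @Override public boolean start() { calls.add("start"); return workPending; }
        @Override public void waitWorkComplete() { calls.add("waitWorkComplete"); }
        @Override public boolean finishedOk() { calls.add("finishedOk"); return true; }
        @Override public void shutdown() { calls.add("shutdown"); }
    }

    public static void main(String[] args) throws Exception {
        RecordingFake fake = new RecordingFake(true);
        boolean ok = runToCompletion(fake);
        System.out.println(ok + " " + fake.calls);
    }
}
```

With a real Deduplicator, hosts would have to be defined (for example through setHostWorkerCount) before start(), since start() is documented to throw IllegalStateException when no hosts exist.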
-
stopBatching
public void stopBatching()
Prevent the Deduplicator from queuing new batches of queries.
-
shutdown
public void shutdown()
Shut down all processing. Processing ends after currently in-process batches complete. How long this takes depends on batch size and query complexity.
- Throws:
java.lang.IllegalStateException - if the Deduplicator is not running.
-
stopHostSearchers
public void stopHostSearchers(java.lang.String hostName, int hostPort) throws java.lang.IllegalStateException, java.lang.IllegalArgumentException
Stop all searchers that operate on the specified TIBCO Patterns - Search host. Work on any in-process batches is allowed to complete. How long this takes depends on batch size and query complexity.
- Parameters:
hostName - Address of TIBCO Patterns - Search host to stop processing on.
hostPort - Port of TIBCO Patterns - Search host to stop processing on.
- Throws:
java.lang.IllegalArgumentException - if the specified host is not in the Deduplicator's host list.
java.lang.IllegalStateException - if the Deduplicator is not running.
-
stopHostSearchers
public void stopHostSearchers(java.lang.String hostName, int hostPort, boolean immediate) throws java.lang.IllegalStateException, java.lang.IllegalArgumentException
Stop all searchers that operate on the specified TIBCO Patterns - Search host.
- Parameters:
hostName - Address of TIBCO Patterns - Search host to stop processing on.
hostPort - Port of TIBCO Patterns - Search host to stop processing on.
immediate - Controls whether work on in-process batches is allowed to complete, or if the batch search must exit immediately. If false, how long it takes a batch to finish depends on batch size and query complexity.
- Throws:
java.lang.IllegalArgumentException - if the specified host is not in the Deduplicator's host list.
java.lang.IllegalStateException - if the Deduplicator is not running.
-
setHostWorkerCount
public void setHostWorkerCount(java.lang.String hostName, int hostPort, int searcherCount)
Sets the number of searchers attached to a TIBCO Patterns - Search host engine. If the Deduplicator is already running, this increases or decreases the number of active search workers. If the number of search workers decreases, work on any in-process batches will be allowed to complete. How long this takes depends on batch size and query complexity.
Memory consumed by Deduplicator depends on the number of search workers times the batch size, times the record size. Deduplications with a large number of searchers and/or a large record size may need a reduced batch size and/or an increased JVM memory limit.
If a host already exists, the netEncryption and factory parameters are ignored.
- Parameters:
hostName - Address (DNS name or IP address) of the engine.
hostPort - TCP Port of the engine.
searcherCount - Number of searcher workers.
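The memory note for setHostWorkerCount can be turned into a rough back-of-the-envelope calculation. The sketch below treats the product of searcher count, batch size, and average record size as a proportional estimate only; the documentation says memory "depends on" this product and gives no exact formula, so JVM and library overhead are ignored. The class name, method names, and sample figures are all illustrative assumptions.

```java
public class DedupMemoryEstimate {

    /** Rough proportional estimate: searchers x batch size x average bytes per record. */
    static long estimateBytes(int searcherCount, int batchSize, long avgRecordBytes) {
        requirePositive(searcherCount, batchSize, avgRecordBytes);
        long recordsInFlight = Math.multiplyExact((long) searcherCount, (long) batchSize);
        return Math.multiplyExact(recordsInFlight, avgRecordBytes);
    }

    /** Largest batch size whose estimate stays within the given byte budget. */
    static int maxBatchSize(long budgetBytes, int searcherCount, long avgRecordBytes) {
        requirePositive(searcherCount, 1, avgRecordBytes);
        long bytesPerBatchSlot = Math.multiplyExact((long) searcherCount, avgRecordBytes);
        long max = budgetBytes / bytesPerBatchSlot;
        return (int) Math.min(max, Integer.MAX_VALUE);
    }

    private static void requirePositive(int searcherCount, int batchSize, long avgRecordBytes) {
        if (searcherCount <= 0 || batchSize <= 0 || avgRecordBytes <= 0) {
            throw new IllegalArgumentException("all inputs must be positive");
        }
    }

    public static void main(String[] args) {
        // Illustrative figures: 8 searchers, batches of 1000 records, 2048 bytes per record.
        long bytes = estimateBytes(8, 1000, 2048);
        System.out.println("estimated bytes in flight: " + bytes);

        // How large could a batch be under a 64 MiB budget with the same searchers and records?
        int batch = maxBatchSize(64L * 1024 * 1024, 8, 2048);
        System.out.println("max batch size for 64 MiB: " + batch);
    }
}
```

With these sample figures the estimate is 8 x 1000 x 2048 = 16,384,000 bytes, and a 64 MiB budget allows a batch size of 4096. Real consumption is likely higher, so the result is better read as a floor when sizing the JVM heap.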
-
setHostNetworkEncryption
public void setHostNetworkEncryption(java.lang.String hostName, int hostPort, boolean netEncryption, javax.net.ssl.SSLSocketFactory factory)
Set the network encryption parameters used when communicating with a host.
- Parameters:
hostName - Address (DNS name or IP address) of the engine.
hostPort - TCP Port of the engine.
netEncryption - true to enable network encryption, false to disable.
factory - An SSL socket factory. Pass null to use the default factory.
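When an explicit factory is wanted instead of null, one way to obtain a javax.net.ssl.SSLSocketFactory is from the JVM's default SSLContext, which is standard JDK API. The sketch below only builds the factory; the commented call shows where it would be handed to the Deduplicator. The variable `dedup`, the host name, and the port are hypothetical, and whether the library's "default factory" matches the JVM default is not stated in this reference.

```java
import java.security.NoSuchAlgorithmException;
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLSocketFactory;

public class EncryptionFactorySketch {

    /** Returns the socket factory of the JVM-wide default SSLContext. */
    static SSLSocketFactory defaultJvmFactory() throws NoSuchAlgorithmException {
        return SSLContext.getDefault().getSocketFactory();
    }

    public static void main(String[] args) throws Exception {
        SSLSocketFactory factory = defaultJvmFactory();
        System.out.println("supported cipher suites: " + factory.getSupportedCipherSuites().length);

        // Hypothetical usage with a Deduplicator instance named dedup
        // (host name and port are placeholders):
        // dedup.setHostNetworkEncryption("patterns-host.example", 5051, true, factory);
    }
}
```

A custom trust store or client certificate would instead go through SSLContext.getInstance and SSLContext.init before calling getSocketFactory().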
-
restartFailedSearchers
public void restartFailedSearchers()
Bring searcher counts up to desired levels for all TIBCO Patterns - Search hosts.
- Throws:
java.lang.IllegalStateException - if the Deduplicator is not running.
-
isRunning
public boolean isRunning()
Check if the deduplicator is running.
- Returns:
- true if start() has been called but shutdown() has not been called.
-
isWorkComplete
public boolean isWorkComplete()
Check if the deduplication is done and ready for shutdown.
- Returns:
- false if the Deduplicator is still performing work, true otherwise.
-
waitWorkComplete
public boolean waitWorkComplete(long time, java.util.concurrent.TimeUnit unit) throws java.lang.InterruptedException, DedupeException
Wait for deduplication work to finish.
- Parameters:
time - the maximum time to wait for the work to complete
unit - the time unit of the time argument
- Returns:
- true if the work was completed within the requested time period.
- Throws:
java.lang.InterruptedException - if the deduplicator was interrupted before all work was finished.
DedupeException - if the deduplicator encountered an error that prevented it from completing.
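Because the timed waitWorkComplete returns false on timeout rather than throwing, it can drive a bounded polling loop. The sketch below waits in fixed slices and calls restartFailedSearchers() between slices. That pairing is a design assumption for illustration, not documented guidance. `WaitableDedup` and `CountingFake` are hypothetical stand-ins so the example compiles without the TIBCO jar, and `Exception` stands in for `DedupeException`.

```java
import java.util.concurrent.TimeUnit;

public class BoundedWaitSketch {

    /** Hypothetical stand-in: only the documented signatures this sketch calls. */
    interface WaitableDedup {
        // The real method declares InterruptedException and DedupeException.
        boolean waitWorkComplete(long time, TimeUnit unit) throws Exception;
        void restartFailedSearchers();
    }

    /**
     * Waits up to maxSlices slices of sliceSeconds each.
     * Returns true as soon as one slice reports completion, false if every slice times out.
     */
    static boolean waitInSlices(WaitableDedup dedup, long sliceSeconds, int maxSlices) throws Exception {
        for (int slice = 0; slice < maxSlices; slice++) {
            if (dedup.waitWorkComplete(sliceSeconds, TimeUnit.SECONDS)) {
                return true;
            }
            // Timed out: top searcher counts back up before waiting again.
            dedup.restartFailedSearchers();
        }
        return false;
    }

    /** Fake that reports completion on the N-th wait; not part of any real API. */
    static class CountingFake implements WaitableDedup {
        private final int slicesNeeded;
        int waits = 0;
        int restarts = 0;

        CountingFake(int slicesNeeded) {
            this.slicesNeeded = slicesNeeded;
        }

        @Override public boolean waitWorkComplete(long time, TimeUnit unit) {
            waits++;
            return waits >= slicesNeeded;
        }

        @Override public void restartFailedSearchers() {
            restarts++;
        }
    }

    public static void main(String[] args) throws Exception {
        CountingFake fake = new CountingFake(3);
        boolean done = waitInSlices(fake, 30, 5);
        System.out.println("done=" + done + " waits=" + fake.waits + " restarts=" + fake.restarts);
    }
}
```

A caller that gets false back can still decide between shutdown(), which lets in-process batches complete, and a further round of waiting.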
-
waitWorkComplete
public void waitWorkComplete() throws java.lang.InterruptedException, DedupeException
Waits indefinitely for the deduplicator to finish its work.
- Throws:
java.lang.InterruptedException - if the current thread is interrupted while waiting for work to complete, or if the deduplicator was interrupted before all work was finished.
DedupeException - if the deduplicator encountered an error.
-
finishedOk
public boolean finishedOk()
Check if the deduplication finished with no errors.
- Returns:
- false if the deduplicator encountered any errors, true otherwise.
- Throws:
java.lang.IllegalStateException - if the Deduplicator has not completed its work, or has never been started.
-
setIgnoreEmptyQueries
public void setIgnoreEmptyQueries(boolean on)
Set ignoring of empty queries. If on, NetricsException.NOQUERY errors will be silently ignored. See the TIBCO Patterns - Search Java API Reference for more details. Default: off
- Parameters:
on - pass true to ignore empty queries, false to allow them to error out.
-
getBatchSize
public int getBatchSize()
Get the size of query batches.
- Returns:
- the size of query batches.
-
getFailed
public boolean getFailed()
- Returns:
- the value of ErrorHandler.getFailed() from the associated error handler.
-
-