Class Deduplicator


  • public class Deduplicator
    extends java.lang.Object
    Finds duplicates between a KeyedQuerySource and data held by a TIBCO Patterns - Search engine.
    • Method Summary

      Modifier and Type Method Description
      boolean finishedOk()
      Check if the deduplication finished with no errors.
      int getBatchSize()
      Get the size of query batches.
      static int getDefaultBatchSize()  
      boolean getFailed()  
      java.util.List<Host> getHosts()
      Gets a read-only list of the TIBCO Patterns - Search hosts.
      boolean isRunning()
      Check if the deduplicator is running.
      boolean isWorkComplete()
      Check if the deduplication is done and ready for shutdown.
      void restartFailedSearchers()
      Bring searcher counts up to desired levels for all TIBCO Patterns - Search hosts.
      void setHostNetworkEncryption(java.lang.String hostName, int hostPort, boolean netEncryption, javax.net.ssl.SSLSocketFactory factory)
      Set the network encryption parameters used when communicating with a host.
      void setHostWorkerCount(java.lang.String hostName, int hostPort, int searcherCount)
      Sets the number of searchers attached to a TIBCO Patterns - Search host engine.
      void setIgnoreEmptyQueries(boolean on)
      Set ignoring of empty queries.
      void shutdown()
      Shut down all processing.
      boolean start()
      Start the deduplication.
      void stopBatching()
      Prevent the Deduplicator from queuing new batches of queries.
      void stopHostSearchers(java.lang.String hostName, int hostPort)
      Stop all searchers that operate on the specified TIBCO Patterns - Search host.
      void stopHostSearchers(java.lang.String hostName, int hostPort, boolean immediate)
      Stop all searchers that operate on the specified TIBCO Patterns - Search host.
      void waitWorkComplete()
      Waits indefinitely for the deduplicator to finish its work.
      boolean waitWorkComplete(long time, java.util.concurrent.TimeUnit unit)
      Wait for deduplication work to finish.
      • Methods inherited from class java.lang.Object

        equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Constructor Detail

      • Deduplicator

        public Deduplicator(int batchSize,
                            KeyedQuerySource querySource,
                            Logger logger,
                            ErrorHandler error_handler,
                            PairStore pairStore)
        Creates a new Deduplicator.
        Parameters:
        batchSize - Number of records in each record batch.
        querySource - The KeyedQuerySource used to build queries from records.
        logger - The log to be used for writing output messages.
        error_handler - The handler for errors encountered during deduplication. If null is passed, the Deduplicator creates a new instance of ErrorHandler.
        pairStore - Stores batches of pairs found by the Deduplicator.
    • Method Detail

      • getDefaultBatchSize

        public static int getDefaultBatchSize()
        Returns:
        the default batch size
      • getHosts

        public java.util.List<Host> getHosts()
        Gets a read-only list of the TIBCO Patterns - Search hosts.
        Returns:
        a read-only list of the TIBCO Patterns - Search hosts.
      • start

        public boolean start()
                      throws java.lang.InterruptedException
        Start the deduplication.
        Returns:
        true if the caller must wait for the deduplicator to complete its work; false if the deduplicator's work is already done.
        Throws:
        java.lang.InterruptedException - if this thread is interrupted while the deduplication is starting up.
        java.lang.IllegalStateException - if the deduplication is already started, or if no hosts have been defined.
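        The start, wait, check, and shutdown calls documented on this page can be combined into one pass. The following is a minimal sketch, not the library's prescribed usage: the nested Dedup interface is a hypothetical stand-in that copies only the method signatures listed in this reference, and the call ordering is an assumption derived from the documented exceptions.

```java
public class DedupLifecycleSketch {

    /** Hypothetical stand-in for the subset of Deduplicator methods used below. */
    interface Dedup {
        void setHostWorkerCount(String hostName, int hostPort, int searcherCount);
        boolean start() throws InterruptedException;
        // The documented method throws InterruptedException and DedupeException;
        // Exception is used here only to keep the sketch self-contained.
        void waitWorkComplete() throws Exception;
        boolean finishedOk();
        void shutdown();
    }

    /** Runs one deduplication pass and reports whether it finished without errors. */
    static boolean runOnce(Dedup dedup, String hostName, int hostPort, int searcherCount)
            throws Exception {
        // start() is documented to throw IllegalStateException if no hosts are defined,
        // so a host is registered first.
        dedup.setHostWorkerCount(hostName, hostPort, searcherCount);

        // start() returns false when the work is already done and no wait is needed.
        boolean mustWait = dedup.start();
        try {
            if (mustWait) {
                dedup.waitWorkComplete();
            }
            // finishedOk() is only valid once the work has completed.
            return dedup.finishedOk();
        } finally {
            // Reached only after a successful start(), so the deduplicator is running.
            dedup.shutdown();
        }
    }
}
```

        Placing shutdown() in a finally block is a design choice of this sketch, so that processing is shut down even if waiting throws.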
      • stopBatching

        public void stopBatching()
        Prevent the Deduplicator from queuing new batches of queries.
      • shutdown

        public void shutdown()
        Shut down all processing. Processing ends after currently in-process batches complete. How long this takes depends on batch size and query complexity.
        Throws:
        java.lang.IllegalStateException - if the Deduplicator is not running.
      • stopHostSearchers

        public void stopHostSearchers(java.lang.String hostName,
                                      int hostPort)
                               throws java.lang.IllegalStateException,
                                      java.lang.IllegalArgumentException
        Stop all searchers that operate on the specified TIBCO Patterns - Search host. Work on any in-process batches is allowed to complete. How long this takes depends on batch size and query complexity.
        Parameters:
        hostName - Address of TIBCO Patterns - Search host to stop processing on.
        hostPort - Port of TIBCO Patterns - Search host to stop processing on.
        Throws:
        java.lang.IllegalArgumentException - if the specified host is not in the Deduplicator's host list.
        java.lang.IllegalStateException - if the Deduplicator is not running.
      • stopHostSearchers

        public void stopHostSearchers(java.lang.String hostName,
                                      int hostPort,
                                      boolean immediate)
                               throws java.lang.IllegalStateException,
                                      java.lang.IllegalArgumentException
        Stop all searchers that operate on the specified TIBCO Patterns - Search host.
        Parameters:
        hostName - Address of TIBCO Patterns - Search host to stop processing on.
        hostPort - Port of TIBCO Patterns - Search host to stop processing on.
        immediate - Controls whether work on in-process batches is allowed to complete, or if the batch search must exit immediately. If false, how long it takes a batch to finish depends on batch size and query complexity.
        Throws:
        java.lang.IllegalArgumentException - if the specified host is not in the Deduplicator's host list.
        java.lang.IllegalStateException - if the Deduplicator is not running.
      • setHostWorkerCount

        public void setHostWorkerCount(java.lang.String hostName,
                                       int hostPort,
                                       int searcherCount)
        Sets the number of searchers attached to a TIBCO Patterns - Search host engine. If the Deduplicator is already running, this increases or decreases the number of active search workers. If the number of search workers decreases, work on any in-process batches will be allowed to complete. How long this takes depends on batch size and query complexity.

        Memory consumed by Deduplicator depends on the number of search workers times the batch size, times the record size. Deduplications with a large number of searchers and/or a large record size may need a reduced batch size and/or an increased JVM memory limit.

        If a host already exists, the netEncryption and factory parameters are ignored.

        Parameters:
        hostName - Address (DNS name or IP address) of the engine.
        hostPort - TCP Port of the engine.
        searcherCount - Number of searcher workers.
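        The memory note above (search workers times batch size times record size) can be turned into a rough sizing calculation. This is an illustrative sketch only: the class and method names are hypothetical, the average record size is a figure the caller must measure, and the product is a proportional estimate that ignores JVM and library overhead.

```java
public class DedupMemoryEstimate {

    /** Rough estimate: searcher workers x batch size x average record size in bytes. */
    static long estimateBytes(int searcherCount, int batchSize, long avgRecordBytes) {
        // multiplyExact throws ArithmeticException instead of silently overflowing.
        long recordsInFlight = Math.multiplyExact((long) searcherCount, (long) batchSize);
        return Math.multiplyExact(recordsInFlight, avgRecordBytes);
    }

    /** Largest batch size whose estimate stays within the given byte budget. */
    static int maxBatchSize(int searcherCount, long avgRecordBytes, long budgetBytes) {
        if (searcherCount <= 0 || avgRecordBytes <= 0 || budgetBytes < 0) {
            throw new IllegalArgumentException("counts and sizes must be positive");
        }
        long bytesPerBatchSlot = Math.multiplyExact((long) searcherCount, avgRecordBytes);
        return (int) Math.min(Integer.MAX_VALUE, budgetBytes / bytesPerBatchSlot);
    }

    public static void main(String[] args) {
        // 8 searchers, 1,000 records per batch, 2,048 bytes per record.
        System.out.println(estimateBytes(8, 1_000, 2_048));     // prints 16384000
        // Batch size that keeps the same setup under an 8,000,000-byte budget.
        System.out.println(maxBatchSize(8, 2_048, 8_000_000L)); // prints 488
    }
}
```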
      • setHostNetworkEncryption

        public void setHostNetworkEncryption(java.lang.String hostName,
                                             int hostPort,
                                             boolean netEncryption,
                                             javax.net.ssl.SSLSocketFactory factory)
        Set the network encryption parameters used when communicating with a host.
        Parameters:
        hostName - Address (DNS name or IP address) of the engine.
        hostPort - TCP Port of the engine.
        netEncryption - true to enable network encryption, false to disable.
        factory - An SSL socket factory. Pass null to use the default factory.
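        As an illustration of the parameters above, the sketch below enables encryption for one host with the default socket factory and then sets its searcher count. The nested Dedup interface is a hypothetical stand-in that mirrors the two documented signatures, and the order of the two calls is an assumption; this reference does not state which must come first.

```java
import javax.net.ssl.SSLSocketFactory;

public class DedupEncryptedHostSketch {

    /** Hypothetical stand-in for the two Deduplicator methods used below. */
    interface Dedup {
        void setHostNetworkEncryption(String hostName, int hostPort,
                                      boolean netEncryption, SSLSocketFactory factory);
        void setHostWorkerCount(String hostName, int hostPort, int searcherCount);
    }

    /** Registers one host with network encryption enabled. */
    static void addEncryptedHost(Dedup dedup, String hostName, int hostPort, int searcherCount) {
        // Passing null requests the default SSL socket factory, per the parameter description.
        dedup.setHostNetworkEncryption(hostName, hostPort, true, null);
        // Assumed ordering: encryption settings first, then the searcher count.
        dedup.setHostWorkerCount(hostName, hostPort, searcherCount);
    }
}
```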
      • restartFailedSearchers

        public void restartFailedSearchers()
        Bring searcher counts up to desired levels for all TIBCO Patterns - Search hosts.
        Throws:
        java.lang.IllegalStateException - if the Deduplicator is not running.
      • isRunning

        public boolean isRunning()
        Check if the deduplicator is running.
        Returns:
        true if start() has been called but shutdown() has not been called.
      • isWorkComplete

        public boolean isWorkComplete()
        Check if the deduplication is done and ready for shutdown.
        Returns:
        false if the Deduplicator is still performing work, true otherwise.
      • waitWorkComplete

        public boolean waitWorkComplete(long time,
                                        java.util.concurrent.TimeUnit unit)
                                 throws java.lang.InterruptedException,
                                        DedupeException
        Wait for deduplication work to finish.
        Parameters:
        time - the maximum time to wait for the work to complete
        unit - the time unit of the time argument
        Returns:
        true if the work was completed within the requested time period.
        Throws:
        java.lang.InterruptedException - if the deduplicator was interrupted before all work was finished.
        DedupeException - if the deduplicator encountered an error that prevented it from completing.
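        Because the timed variant returns false on timeout, it can drive a bounded wait loop. The sketch below is one possible pattern under stated assumptions: the nested Dedup interface is a hypothetical stand-in for the documented signatures, and calling restartFailedSearchers() between slices and stopBatching() once the budget is exhausted are choices of this example, not behavior required by the library.

```java
import java.util.concurrent.TimeUnit;

public class DedupTimedWaitSketch {

    /** Hypothetical stand-in for the subset of Deduplicator methods used below. */
    interface Dedup {
        // The documented method throws InterruptedException and DedupeException;
        // Exception is used here only to keep the sketch self-contained.
        boolean waitWorkComplete(long time, TimeUnit unit) throws Exception;
        void restartFailedSearchers();
        void stopBatching();
    }

    /**
     * Waits in fixed slices, up to maxSlices times.
     * Returns true if the work completed, false if the budget ran out.
     */
    static boolean waitInSlices(Dedup dedup, long sliceMillis, int maxSlices) throws Exception {
        for (int i = 0; i < maxSlices; i++) {
            if (dedup.waitWorkComplete(sliceMillis, TimeUnit.MILLISECONDS)) {
                return true;
            }
            // Assumed recovery step: bring searcher counts back to their desired levels.
            dedup.restartFailedSearchers();
        }
        // Out of time: stop queuing new batches; the caller decides when to shut down.
        dedup.stopBatching();
        return false;
    }
}
```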
      • waitWorkComplete

        public void waitWorkComplete()
                              throws java.lang.InterruptedException,
                                     DedupeException
        Waits indefinitely for the deduplicator to finish its work.
        Throws:
        java.lang.InterruptedException - if the current thread is interrupted while waiting for work to complete, or if the deduplicator was interrupted before all work was finished.
        DedupeException - if the deduplicator encountered an error.
      • finishedOk

        public boolean finishedOk()
        Check if the deduplication finished with no errors.
        Returns:
        false if the deduplicator encountered any errors, true otherwise.
        Throws:
        java.lang.IllegalStateException - if the Deduplicator has not completed its work, or has never been started.
      • setIgnoreEmptyQueries

        public void setIgnoreEmptyQueries(boolean on)
        Set ignoring of empty queries. If on, NetricsException.NOQUERY errors are silently ignored. See the TIBCO Patterns - Search Java API Reference for more details. Default: off.
        Parameters:
        on - pass true to ignore empty queries, false to allow them to error out.
      • getBatchSize

        public int getBatchSize()
        Get the size of query batches.
        Returns:
        the size of query batches.
      • getFailed

        public boolean getFailed()
        Returns:
        the value of ErrorHandler.getFailed() from the associated error handler.