Class CsvGroupOutput

  • All Implemented Interfaces:
    IGroupOutput

    public class CsvGroupOutput
    extends java.lang.Object
    implements IGroupOutput
    Writes grouping output to a set of three files, one each for pairs, records, and groups.

    The group file will contain one row per group: group id, number of records in the group, number of subgroups, maximum score.

    The records file will have one row per record: group id, record id, position in group, subgroup, linked record key, score to linked record.

    The pairs file will have one row per pair: group id, pair key, first record key, second record key, first subgroup, second subgroup, pair score.
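
    The three row layouts above can be illustrated with a minimal sketch. The class RowLayoutSketch, its row helper, and all sample values (keys such as "r1" and "p1", the comma delimiter, the scores) are hypothetical and are not part of this library. The sketch also assumes fields are joined with a single delimiter character and no quoting or escaping, which may differ from what CsvGroupOutput actually writes.

```java
class RowLayoutSketch {
    /** Joins the given fields into one row using the delimiter (no quoting; an assumption). */
    static String row(char delimiter, Object... fields) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < fields.length; i++) {
            if (i > 0) {
                sb.append(delimiter);
            }
            sb.append(fields[i]);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        char d = ','; // illustrative delimiter

        // Group file: group id, number of records, number of subgroups, maximum score
        System.out.println(row(d, 7, 3, 2, 0.9f));

        // Records file: group id, record id, position in group, subgroup,
        // linked record key, score to linked record
        System.out.println(row(d, 7, "r1", 0, 0, "r2", 0.9f));

        // Pairs file: group id, pair key, first record key, second record key,
        // first subgroup, second subgroup, pair score
        System.out.println(row(d, 7, "p1", "r1", "r2", 0, 1, 0.9f));
    }
}
```

    The column order in each call follows the descriptions above; only the order is taken from this page, not the value formats.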

    A group of N records will have exactly N-1 pairs -- just enough to link the records together.
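
    The N-1 property means the pairs written for a group form a spanning tree over its records. A minimal sketch of checking that property on data read back from the output is given below. The class SpanningPairsSketch and its methods are hypothetical, and records are assumed to have been renumbered 0..N-1 by the caller; nothing here is part of this library.

```java
import java.util.List;

class SpanningPairsSketch {
    /** Finds the representative of x's set, shortening the path as it goes. */
    static int find(int[] parent, int x) {
        while (parent[x] != x) {
            parent[x] = parent[parent[x]];
            x = parent[x];
        }
        return x;
    }

    /**
     * Returns true when the pairs link all n records (numbered 0..n-1)
     * together using exactly n-1 pairs, i.e. they form a spanning tree.
     */
    static boolean isSpanning(int n, List<int[]> pairs) {
        if (n < 1 || pairs.size() != n - 1) {
            return false;
        }
        int[] parent = new int[n];
        for (int i = 0; i < n; i++) {
            parent[i] = i;
        }
        for (int[] p : pairs) {
            int a = find(parent, p[0]);
            int b = find(parent, p[1]);
            if (a == b) {
                return false; // redundant pair: both records are already linked
            }
            parent[a] = b;
        }
        return true; // n-1 merges without a redundant pair leave one component
    }

    public static void main(String[] args) {
        // Four records linked by three pairs: 0-1, 1-2, 1-3
        List<int[]> pairs = List.of(new int[]{0, 1}, new int[]{1, 2}, new int[]{1, 3});
        System.out.println(isSpanning(4, pairs)); // prints true
    }
}
```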
    • Field Summary

      Fields 
      Modifier and Type Field Description
      char delimiter  
      boolean extended  
    • Constructor Summary

      Constructors 
      Constructor Description
      CsvGroupOutput​(CsvPairInput input, java.lang.String group_filename, java.lang.String pair_filename, java.lang.String record_filename)
      Builds a CsvGroupOutput object that infers key mapping from the CsvPairInput object provided.
      CsvGroupOutput​(IStringKeyMapping record_key_mapping, IStringKeyMapping pair_key_mapping, java.lang.String group_filename, java.lang.String pair_filename, java.lang.String record_filename)
      Builds a CsvGroupOutput object with the specified key mapping.
      CsvGroupOutput​(java.lang.String group_filename, java.lang.String pair_filename, java.lang.String record_filename)
      Builds a CsvGroupOutput object that assumes there is no key mapping.
    • Method Summary

      Methods 
      Modifier and Type Method Description
      void done()
      Implements IGroupOutput.done().
      void outputGroup​(int id, int num_subgroups, float max_score, java.lang.Iterable<GroupedRecord> records, java.lang.Iterable<GroupedPair> pairs)
      Implements IGroupOutput.outputGroup.
      void startFail()
      Implements IGroupOutput.startFail by writing "UNCLOSED GROUPS:" to each of the output files.
      • Methods inherited from class java.lang.Object

        equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Field Detail

      • delimiter

        public char delimiter
      • extended

        public boolean extended
    • Constructor Detail

      • CsvGroupOutput

        public CsvGroupOutput​(java.lang.String group_filename,
                              java.lang.String pair_filename,
                              java.lang.String record_filename)
                       throws java.io.FileNotFoundException
        Builds a CsvGroupOutput object that assumes there is no key mapping.
        Parameters:
        group_filename - Name of file to write groups to. Passing null suppresses the group file.
        pair_filename - Name of file to write spanning pairs to. Passing null suppresses the spanning pairs file.
        record_filename - Name of file to write records to. Passing null suppresses the records file.
        Throws:
        java.io.FileNotFoundException - if any of the specified files could not be created.
      • CsvGroupOutput

        public CsvGroupOutput​(CsvPairInput input,
                              java.lang.String group_filename,
                              java.lang.String pair_filename,
                              java.lang.String record_filename)
                       throws java.io.FileNotFoundException
        Builds a CsvGroupOutput object that infers key mapping from the CsvPairInput object provided.
        Parameters:
        input - A pair source to take record- and pair-key mappings from.
        group_filename - Name of file to write groups to. Passing null suppresses the group file.
        pair_filename - Name of file to write spanning pairs to. Passing null suppresses the spanning pairs file.
        record_filename - Name of file to write records to. Passing null suppresses the records file.
        Throws:
        java.io.FileNotFoundException - if any of the specified files could not be created.
      • CsvGroupOutput

        public CsvGroupOutput​(IStringKeyMapping record_key_mapping,
                              IStringKeyMapping pair_key_mapping,
                              java.lang.String group_filename,
                              java.lang.String pair_filename,
                              java.lang.String record_filename)
                       throws java.io.FileNotFoundException
        Builds a CsvGroupOutput object with the specified key mapping.
        Parameters:
        record_key_mapping - The record key mapping to use. If null, record keys are used un-mapped.
        pair_key_mapping - The pair key mapping to use. If null, pair keys are used un-mapped.
        group_filename - Name of file to write groups to. Passing null suppresses the group file.
        pair_filename - Name of file to write spanning pairs to. Passing null suppresses the spanning pairs file.
        record_filename - Name of file to write records to. Passing null suppresses the records file.
        Throws:
        java.io.FileNotFoundException - if any of the specified files could not be created.
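
        All three constructors treat a null filename as a request to skip that file. One way such behaviour could be implemented is sketched below. The class OptionalOutputSketch and its methods openOrNull and writeRow are hypothetical illustrations and do not describe the actual internals of CsvGroupOutput.

```java
import java.io.FileNotFoundException;
import java.io.PrintWriter;

class OptionalOutputSketch {
    /** Opens a writer for the file, or returns null when the filename is null. */
    static PrintWriter openOrNull(String filename) throws FileNotFoundException {
        return filename == null ? null : new PrintWriter(filename);
    }

    /** Writes the row if the output is enabled; returns whether anything was written. */
    static boolean writeRow(PrintWriter out, String row) {
        if (out == null) {
            return false; // this output was suppressed by a null filename
        }
        out.println(row);
        return true;
    }

    public static void main(String[] args) throws FileNotFoundException {
        PrintWriter groups = openOrNull(null); // suppressed: no file is created
        System.out.println(writeRow(groups, "7,3,2,0.9")); // prints false
    }
}
```

        With this pattern, every write site stays the same whether or not a given file was requested.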
    • Method Detail

      • done

        public void done()
                  throws java.io.IOException
        Implements IGroupOutput.done(). Called by the grouping engine when it completes.
        Specified by:
        done in interface IGroupOutput
        Throws:
        java.io.IOException - if an open file could not be closed.
      • outputGroup

        public void outputGroup​(int id,
                                int num_subgroups,
                                float max_score,
                                java.lang.Iterable<GroupedRecord> records,
                                java.lang.Iterable<GroupedPair> pairs)
        Implements IGroupOutput.outputGroup.
        Specified by:
        outputGroup in interface IGroupOutput
        Parameters:
        id - The group id.
        num_subgroups - Number of subgroups within the group.
        max_score - Highest score of any pair in the group.
        records - A list of the records in the group, ordered by position.
        pairs - A list of the highest-scoring pairs that span the group.
        See Also:
        IGroupOutput
      • startFail

        public void startFail()
        Implements IGroupOutput.startFail by writing "UNCLOSED GROUPS:" to each of the output files.
        See Also:
        IGroupOutput