How Big Data Import Works

You can initiate big data import either from FileWatcher or from the TIBCO MDM UI. After the metadata is in place, you specify the file to be imported and initiate the import. The file is uploaded to HDFS, and the import is then started through the Apache SparkContext. The Apache Spark worker nodes process the entire import and return the results. Big data import does not use a workflow; therefore, you must provide input either through metadata or through Configurator.

To initiate big data import from the UI, see Importing Records. To initiate it through FileWatcher, see Importing Input Maps Metadata.

Big data import has the following characteristics:

  • Processes records row by row, in parallel across the worker nodes. As a result, you cannot control which row is processed or when it is processed, and you cannot order the duplicates. The first record is saved, and the subsequent records are marked as duplicates. The records are confirmed immediately.
  • Supports a single data source with only CSV and TEXT formats

    Supports data source files that use different delimiters, such as pipe, tab, space, colon, comma, semicolon, and others.

  • Maps constants enclosed in single quotes
  • Maps only to data source attributes. Manual editing is disabled.
  • Supports simple one-to-one mapping within the data source
  • Supports rulebases. You can define a rulebase on each input map in the Rulebase Designer and use it to assign values, add AssignIdentity, and check constraints. Rulebases are applied to individual records, not to the bundle.

    For more information, see the "Advanced Tab for Big Data Import" section in TIBCO MDM Studio Repository Designer User's Guide.

    When you export metadata with the Big Data Import input map and a rulebase, the exported DataServiceQuery XML file contains the rulebase details in the <CatalogInputMaps> section.

  • Supports the ValidateOnly mode, in which the records are not saved but you can view the errors along with possible fixes. You can download the errors from the UI.
  • Supports import sequencing based on the repositories to prevent the parallel processing of records in two different events.
  • Supports the Load Import Records web service
  • Supports merging data: if the Merge Data check box is selected, the imported data is merged with the most recent existing version.
  • If the Incremental option is not selected, all records are treated as new and a new version is created. The existing records are not deleted during import.
  • Supports multivalue attributes only with a specific data source format and mapping.

    You must create a cloned mapping for every multivalue attribute.

  • Supports import initiation by using the Load Import Records web service. For more details, see the "Load Import Records Service" section in TIBCO MDM WebServices Guide.
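
The record-level behavior described in the list above can be illustrated with a minimal Python sketch. It is not TIBCO MDM or Apache Spark code. The names INPUT_MAP, map_row, and import_rows, the attribute names, and the sample data are all hypothetical. The sketch reads a delimited file, applies a simple one-to-one mapping in which a value enclosed in single quotes is treated as a constant, saves the first record seen for each key, and marks later records with the same key as duplicates. The sketch runs sequentially, so "first" means file order here; in an actual parallel import, which row is processed first cannot be controlled.

```python
import csv
import io

# Hypothetical input map: target attribute -> source column name,
# or a constant enclosed in single quotes.
INPUT_MAP = {
    "PRODUCTID": "id",
    "NAME": "name",
    "STATUS": "'ACTIVE'",
}


def map_row(row, input_map):
    """Apply a one-to-one mapping; single-quoted values are constants."""
    record = {}
    for target, source in input_map.items():
        if len(source) >= 2 and source.startswith("'") and source.endswith("'"):
            record[target] = source[1:-1]   # constant: strip the quotes
        else:
            record[target] = row[source]    # copy the data source column
    return record


def import_rows(text, delimiter, key="PRODUCTID"):
    """Return (saved, duplicates) for delimited text with a header row."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    saved, duplicates, seen = [], [], set()
    for row in reader:
        record = map_row(row, INPUT_MAP)
        if record[key] in seen:
            duplicates.append(record)       # later record with the same key
        else:
            seen.add(record[key])
            saved.append(record)            # first record for this key
    return saved, duplicates


# Pipe-delimited sample; the third row repeats the key of the first row.
SAMPLE = "id|name\n1|Widget\n2|Gadget\n1|Widget v2\n"
saved, duplicates = import_rows(SAMPLE, "|")
```

With the sample above, saved holds the records for keys 1 and 2, and duplicates holds the second record for key 1. Passing "\t", ":", or ";" as the delimiter handles the other delimiters mentioned in the list in the same way.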