Cloud Software Group, Inc. EBX®
Documentation > User Guide > EBX® Metadata Management > Advanced data services

Metadata harvesting

Overview

Metadata harvesting is the process of gathering metadata from a data source that exists in your system and is supported by TDV (TIBCO Data Virtualization), and storing it as assets in the EBX® Metadata Management application.

You can run harvesting for data sources that are already configured in TDV or for new data sources defined in the EBX® Metadata Management application.

If you start the Harvesting service from an asset or a selection of assets, make sure that all of these assets have previously been harvested by the Metadata agent.

If you start the Harvesting service from an instance or a selection of instances, you can create a data source in TDV for any instance whose data source does not yet exist there. See Create data source from instance for more information.

The Harvesting service cannot be run on disabled assets or instances.
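The rules above can be summarized as a pre-check run before harvesting starts: assets must already have been harvested by the Metadata agent, and disabled assets or instances are excluded. The sketch below is a hypothetical model and not the application's API. The `Item` class, its fields, and `harvesting_blockers` are illustrative names. Instances without a TDV data source are not treated as blockers, because a data source can be created for them.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Item:
    # Hypothetical model of an asset or instance selected for harvesting.
    name: str
    kind: str                         # "asset" or "instance"
    disabled: bool = False
    harvested_by_agent: bool = False  # assets only: already harvested by the Metadata agent


def harvesting_blockers(selection: List[Item]) -> List[str]:
    """Return the reasons the Harvesting service cannot start (empty list if it can)."""
    reasons = []
    for item in selection:
        if item.disabled:
            # Disabled assets and instances can never be harvested.
            reasons.append(f"{item.name}: disabled {item.kind}")
        elif item.kind == "asset" and not item.harvested_by_agent:
            # Assets must have been harvested by the Metadata agent beforehand.
            reasons.append(f"{item.name}: asset not yet harvested by the Metadata agent")
    return reasons
```

For example, `harvesting_blockers([Item("orders", "asset", harvested_by_agent=True), Item("legacy_db", "instance", disabled=True)])` returns `['legacy_db: disabled instance']`.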

Start harvesting service

In the asset hierarchical view or the instance view, select one or more assets or instances whose metadata you want to harvest. Click the cloud icon to start harvesting. The harvesting main view displays.

/harvestAsset_selection.png

Harvesting main view

The harvesting main view displays your current selection and existing configurations.

An existing configuration is a harvesting configuration that has already been executed for one or more elements related to your current selection. If no existing configuration is available, the main view looks similar to the screenshot below:

/harvestAsset_main.png

If existing configurations are available, the main view looks similar to the screenshot below:

/harvestAsset_mainWithConf.png

From an existing configuration, you can run it, schedule it, configure it, extend it with your current selection, duplicate it, or view its progress table.

See Use an existing configuration for more information.

Create a new configuration

From the main view, click the New Configuration button at the bottom right. The new configuration screen displays.

/harvestAsset_newConf.png

On this screen, start by entering a configuration name. The other actions only display once a name has been entered.

/harvestAsset_newConf2.png

At the top of this screen, you can see the following options:

In the hierarchical view, you can perform the following actions:

/harvestAsset_hierarchyActions.png

1) Unselect all children while keeping the parent node selected. This action is only available if the node has children.

2) Select or unselect the node for harvesting. If the node has children, this action also selects or unselects all of them.

3) Expand or collapse the node's children. Clicking this icon displays the children if they exist; otherwise, a No data message displays. If there are connection issues with TDV, an error may display here.

/harvestIcon.png

This is not an action but the name of the instance or asset, shown with its logo. The subtitle contains the following storage information:
- New: the instance or asset is new.
- Stored: the instance or asset is already stored in the EBX® Metadata Management application.
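The selection rules for actions 1) and 2) can be modeled as operations on a tree. This is a minimal sketch, not the application's implementation. `Node`, `toggle`, `unselect_children`, and `selected_names` are hypothetical names. The sketch also assumes that "all children" means all descendants, not only the direct children.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    # Hypothetical model of one row in the harvesting hierarchy.
    name: str
    children: List["Node"] = field(default_factory=list)
    selected: bool = False


def toggle(node: Node, selected: bool) -> None:
    """Action 2: select or unselect a node and, recursively, all of its children."""
    node.selected = selected
    for child in node.children:
        toggle(child, selected)


def unselect_children(node: Node) -> None:
    """Action 1: unselect all children but keep the parent selected.

    Only available when the node has children.
    """
    if not node.children:
        raise ValueError(f"{node.name} has no children")
    for child in node.children:
        toggle(child, False)
    node.selected = True


def selected_names(node: Node) -> List[str]:
    """Depth-first list of the node names currently selected for harvesting."""
    names = [node.name] if node.selected else []
    for child in node.children:
        names.extend(selected_names(child))
    return names
```

For example, selecting `Node("sales_db", [Node("public", [Node("orders"), Node("customers")])])` with `toggle(db, True)` marks all four nodes. Calling `unselect_children(db)` afterwards leaves only `sales_db` selected.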

Once you have named your configuration and selected at least one element for harvesting, the Save and Save and Run buttons display at the bottom of the screen:

Data profiling

Data profiling is available for the following asset types:

If one of these assets is in the scope of your harvesting (as a root asset or as a descendant), you can activate data profiling for it.

Note that gathering profiling data from the physical data source can take time. We recommend activating this option only for scheduled configurations and not for configurations that you run directly.

The following information can be retrieved from the data source:

/dataProfiling.png

Run configuration

Selecting Save and Run in the configuration view starts the harvesting. The application first needs to communicate with TDV, so the progress view displays.

After that, the progress table displays.

/harvestAsset_progressTable.png

This table updates automatically to display the status of the harvesting. The following actions are available:

Once you validate the results, the configuration status is updated in the main view.

Use an existing configuration

From the main harvesting view, if you have existing configurations, the following actions are available:

/run.png

1) Run the configuration. The progress view displays, followed by the progress table.

2) Schedule the configuration.

3) Configure the selected configuration. The configuration screen displays.

4) Extend the configuration with the current selection. The configuration screen displays, and the harvesting scope is merged with your current selection.

5) Duplicate the configuration. A copy of it displays in the configuration list.

/progress.png

View the progress table. This option is only available when the configuration status is Pending for approval.

Selecting Configure displays the configuration screen. This screen is similar to the one described in the Create a new configuration section, except that the instance/asset hierarchy, configuration name, and configuration options are pre-populated. You can change them and save the configuration.

Schedule a harvesting

You can schedule one or several executions of an existing harvesting configuration. There are two ways to do so:

Clicking the schedule button (the calendar icon next to the configuration) displays a dialog box where you can configure the scheduling.

/scheduleDialogBox.png

To save the scheduling, you must enter a date. The dialog box contains the following fields:
- Date: mandatory. Enter the date and time here. You can use the arrow on the right to display the calendar.
- Interval: the interval between executions when Repeat is set.
- Unit: the unit of the interval. The default is minutes, but you can change it to hours or days.
- Repeat: the number of repeats for the scheduling. To repeat the harvesting indefinitely, select the Infinite checkbox.

Example: to schedule the harvesting approximately every month (every 28 days), set the date to 01/01 of the current year, the interval to 28 days, and Repeat to 12.
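The sketch below computes the execution dates produced by a Date, Interval, Unit, and Repeat combination. It is a minimal illustration under stated assumptions and not the application's scheduler. The function name `execution_times` is hypothetical. The sketch assumes that Repeat counts the total number of executions, starting at Date, and uses `repeat=None` to model the Infinite checkbox.

```python
from datetime import datetime, timedelta
from itertools import islice
from typing import Iterator, Optional


def execution_times(start: datetime, interval: int = 0, unit: str = "minutes",
                    repeat: Optional[int] = 1) -> Iterator[datetime]:
    """Yield scheduled execution times.

    Assumptions (not confirmed by the product documentation):
    - `repeat` is the total number of executions, the first one at `start`;
    - `repeat=None` models the Infinite checkbox (the generator never ends).
    """
    if unit not in ("minutes", "hours", "days"):
        raise ValueError(f"unsupported unit: {unit}")
    step = timedelta(**{unit: interval})  # e.g. timedelta(days=28)
    current, count = start, 0
    while repeat is None or count < repeat:
        yield current
        current += step
        count += 1


# The example above: 01/01, every 28 days, 12 executions.
schedule = list(execution_times(datetime(2025, 1, 1), 28, "days", 12))
# With Infinite, take only the first few executions.
first_three = list(islice(execution_times(datetime(2025, 1, 1), 2, "hours", None), 3))
```

Under this assumption, the last of the 12 executions falls on 5 November, because 28-day intervals are shorter than calendar months. If Repeat instead counts the executions after the first one, there would be 13 executions.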

You can only cancel a harvesting that is already scheduled. You can check this in the main view: a scheduled configuration shows the status Scheduled and its next execution date.

/scheduledConfiguration.png

Select the calendar icon next to the configuration. The schedule dialog box displays a Remove schedule button; click it to cancel the scheduling.

/cancelScheduleConfiguration.png

Create data source from instance

If you wish to harvest metadata for a data source that is not yet configured in TDV, you can configure it directly from the EBX® Metadata Management application.

Prerequisites

Before you start, check that all necessary information is provided in the instance's attributes and that the EBX® Metadata Management application supports data source creation for the instance type.

Currently, you can create:

The complete list of supported adapters is available in the Administration - Datasource Types table: supported types have the attribute Creatable? set to Yes.

An instance used to create a data source in TDV must provide the properties TDV needs to connect to the physical data source. Below is the list of mandatory attributes for each data source type.

Create and harvest

Start creating a new harvesting configuration for your instance, as described in the Create a new configuration section.

Expand the children of the instance. The Create datasource button displays.

/createDatasource.png

Click it to display the list of available Metadata agents, then click the agent you want to use to create the data source. This launches the process.

/selectAgentForCreation.png

Wait for the process to complete. If the creation succeeds, a green Done message displays. Select Finish to finalize the process.

/finalizeCreation.png

The harvesting screen reloads and displays the direct children of the instance.

/creationResults.png

You can now continue with harvesting configuration as described in previous sections.

Specific Cases

Harvesting REST Resources

The EBX® Metadata Management application supports harvesting REST resources that are documented manually in TDV. Harvesting these resources behaves differently because of how they are documented in TDV: the EBX® Metadata Management application represents REST resources according to the RESTful architecture, whereas TDV does not.

REST resources are displayed as follows:

/restArchitecture.png

You cannot create harvesting configurations on assets below Operation in the hierarchy, because these resources are created by the agent and do not exist in TDV.

When you configure a harvesting from a REST data source, you see all the operations registered in TDV but no REST Resource assets, because these assets are created by the EBX® Metadata Management application.
