Column Matching


When you combine data from different data tables in one visualization, you need to consider how the columns in the data tables match. A rule of thumb is that all the columns you are planning to group the visualization by, should exist in all the data tables. For example, columns that should define what a marker is in a scatter plot, columns you wish to set on the category axis of a bar chart, or columns you wish to trellis or color the visualization by. If your data is structured that way, setting up the visualization and matching columns will be easy. However, there are cases when it is in fact OK that some columns in the main data table do not have matches in all the data tables. And, even if the data is set up the recommended way, you may sometimes need to make a few manual adjustments. This will be explained further down in this topic.

Automatic Column Matching

When adding multiple data tables to an analysis, Spotfire will automatically match columns with the same name and data type. For example, if you load the data tables below into Spotfire, a match will automatically be added between the columns named 'Product':

md_example_column_matches_automatic.png

Considering the data available in those two data tables, this match is likely to be the one you want. To compare sales for 2011 and 2012, you probably want to group the visualization by the column named 'Product', like in the bar chart below, where 'Product' is used on the category axis:

md_example_column_matches_automatic_bar_chart.png

Manual Column Matching

If you load the data tables below into Spotfire, matches will automatically be added only between the columns named 'Region' because that is the only column with identical name in all of the data tables.

md_example_underlying_data_tables_2.png

But if you want to compare the performance of the four salespersons rather than sales per region, you need to use a column containing the names of the persons on an axis that groups the visualization in some way; for instance on the category axis of a bar chart. In this case, those columns are named 'Sales Person', 'Salesperson', and 'Sales Rep' respectively, which means they are not identical and have not been matched automatically when loaded. But the columns do match, because they contain the same values, so they can indeed be used to group an axis in the visualization. To do so, the matches must be added manually. To use the data table named 'Sales Current Year' as the main data table, you need to add two matches: 'Salesperson' – 'Sales Person' and 'Salesperson' – 'Sales Rep'. As long as you will use the 'Sales Current Year' as the main data table, it is not necessary to add a match between 'Sales Person' and 'Sales Rep', but it is no harm done adding a match between them as well.

  1. Load the data tables into Spotfire.

  2. Go to Edit > Data Table Properties and open the Column Matches tab.

  3. In the Data tables list, select the data table you are going to use as the main data table in the visualization: 'Sales Current Year'.

  4. Click on the New... button to the right of the list showing the Current matches for the selected data table.

  5. Response: The New Match dialog is opened.

  6. In the Right data table drop-down list, select the 'Sales Previous Year' data table.

  7. Comment: The data table in the Left data table drop-down list is already set to 'Sales Current Year' because this data table was selected in the Data Table Properties dialog.

  8. In the Left column selector, choose 'Salesperson'.

  9. In the Right column selector, choose 'Sales Person'.

  10. Click OK to close the dialog.

  11. Repeat steps 4-8 for the second column match, but select the 'Targets' as the Right Data Table, and 'Sales Rep' as the Right Column instead.

  12. Response: The matches have been added to the list of Current matches for the selected data table, see below.

  13. Click OK to close the Data Table Properties dialog.

  14. Response: The matches have now been added to the selected data tables.

In the example above, the columns had to be matched manually because the column names were not the same, but it could also be the case that the values in the columns are not in the same format.

Perhaps you have two data tables which both contain a string column named ‘Type’. In one data table, the values are written in lowercase (‘apples’), while in the other data table they are written in uppercase (‘APPLES’). Since the columns have the same name and data type they will be matched automatically. However, to be able to combine data from these columns in one visualization, the values in one column must be converted so that they are written in the same way in both columns.

  1. Go to Edit > Data Table Properties and open the Column Matches tab.

  2. In the Data tables list, select the data table of interest.

  3. Select the match between the two columns named ‘Type’ in the Current matches for the selected data table.

  4. Click Edit....

  5. Response: The Edit Match dialog is opened.

  6. Make sure the columns named ‘Type’ are selected in the Left column and Right column selectors.

  7. Open the column selector for the column containing values in uppercase, and select Methods to open the pop-up menu.

  8. Select Lower.

  9. Response: In the Sample value fields you can now see that the values are written as ‘apples’ in both columns.

    md_example_different_format_1.png

  10. Click OK to close the dialog.

  11. Click OK to close the Data Table Properties dialog.

  12. Response: The values in the columns are now in the same format and can be combined in on visualization.

A similar situation might be if you have two columns where date or time values are in different formats. For example, in one data table there might be an integer column named 'Year', with values such as 2011, 2012, etc. And in another data table there might be a column named 'Transaction Date', which contains values in the format datetime. To add a match between these columns, you need to use the method Year on the 'Transaction Date' column. Then the values in the two columns will be in the same format; 2011, 2012, etc.

md_example_different_format_2.png

Viewing the Current Column Matches

When the matches mentioned above have been added, they are shown in the Column Matches tab:

md_example_view_column_matches.png

Note that column matches for the column named 'Region' had already been added automatically.

With the added matches you can now create a visualization where 'Sales Current Year' is the main data table, and the column 'Salesperson' is used on a grouping axis. For example in a bar chart, as seen below:

md_example_column_matches_manual_bar_chart.png

Missing Column Matches

As mentioned above, a rule of thumb is to make sure that all the columns you want to group the visualization by exist in all the data tables that will be used in the visualization. But there are cases when the visualization will be valid even if columns that are used on a grouping axis do not have matches to columns in the other data tables. The examples below illustrate this.

The Visualization is Valid

If you know that a category is unique, you can use it on a grouping axis even if the column only exists in the main data table. The left-most data table below contains information about employees in a sales organization: their names, gender, and office location. The right-most data table contains the total sales for each person.

md_example_missing_match_ok_dt.png

The only possible column match for the data tables is 'Name', because that is the only column that exists in both data tables. The 'Employees' data table contains only categories, while the 'Sales' data table contains the sales figures you probably want to compare. Combining these two data tables would make it possible to look at the data from location and gender perspective as well as just comparing the sales figures per person. And because each person belongs to only one gender and one office location in the 'Employees' data table, you can actually group by all the columns in that data table, provided that you select 'Employees' as the main data table, as seen in the bar chart below:

md_example_missing_match_ok_bar_chart.png

'Gender' and 'Name' define the category axis, while 'Location' defines the colors of the bars.

If you are going to group a visualization by unmatched columns, it is important to make sure the categories are unique. If the categories are not unique, you will end up with a visualization that shows incorrect data, as seen in the next example.

The Visualization is NOT Valid:

The data tables below are similar to the ones in the previous example, but the 'Employees' data table has been updated to reflect that Anna is actually working out of both offices. The 'Sales' data table does not have to be updated due to this change, because the amount Anna sold for last year has not changed, and has nothing to do with which offices she works out of.

md_example_missing_match_not_ok_dt.png

However, this change in the main data table will affect the visualization. Because the bars are colored by 'Location', and Anna appears in both 'East' and 'West', she will now appear twice; once for the East office, and once for the West office. This means that it now looks like Anna has sold for twice the amount she actually has, which is of course incorrect.

md_example_missing_match_not_ok_bar_chart.png

With Anna working out of two offices, 'Location' can no longer be used on a grouping axis in the visualization. However, by removing 'Location' from the color axis, you can turn the bar chart into a valid visualization again:

md_example_missing_match_ok_bar_chart_2.png

Now each person appears only once in the visualization, and while you cannot group by location, you can still group by gender.

Warnings About Missing Matches

As the example above illustrates, it is important to make sure no categories are repeated if you group the visualization by a column that is only located in the main data table. Therefore, Spotfire will display a warning when you group by columns without matches in other data tables:

md_example_missing_match_warning.png

You can click on the icon to see the whole message, which includes a list of all the columns without matches. The message also provides a link to the Visualization Properties, where you can review the current matches used in the visualization, and if necessary, you can add new matches there as well. When you have made sure the visualization does not show repeated categories, you can if you like, turn off future notifications about missing matches in the visualization. But if you do so, it is important to make sure no future changes of the visualization configuration introduce repeated values, because from then on, you will not be notified about this unless you turn on notifications again.

  1. Open the Visualization Properties dialog, and go to the Data page.

  2. Scroll down to Data table matching.

  3. Clear the check box Show notifications about mismatches.

  4. Response: Notifications will no longer be shown in the visualization.

    Comment: If a warning is already shown in the title bar of the visualization, you can open the message and click on the link Click here to review matches to come directly to the Data page in the Visualization Properties dialog, as mentioned above.

See also:

Multiple Data Tables in One Visualization

Column Matching

More Examples