About the Relationship Probability Score (RPS)
Discovery uses its internal RPS algorithms to determine which relationships are discovered and how they are scored. Each discovered relationship is assigned an RPS based on these factors:
Column Name Comparison Factor
Index Key Factor
Match Percentage Factor
Number of Matches Factor
Schema Locality Factor
The factors are multiplied by their weights (which total 100%) and then added together to arrive at the total score for the relationship using this formula:
Score = columnNameComparisonFactor * 40% + indexKeyFactor * 30% + matchPercentageFactor * 10% + numberOfMatchesFactor * 10% + schemaLocalityFactor * 10%
 
The factor weights are configurable for non-string data types. See Adjusting the Weights of the RPS Factors for information about changing the weights of these factors.
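The weighted sum can be illustrated with a short Python sketch. This is not Discovery's implementation; the function name, the dictionary keys, and the sample factor values are hypothetical and are used only to show the arithmetic with the default weights.

```python
import math

# Default weights taken from the formula above; they must total 100%.
DEFAULT_WEIGHTS = {
    "column_name_comparison": 0.40,
    "index_key": 0.30,
    "match_percentage": 0.10,
    "number_of_matches": 0.10,
    "schema_locality": 0.10,
}


def relationship_score(factors, weights=DEFAULT_WEIGHTS):
    """Return the weighted sum of the five factors (each between 0 and 1)."""
    if not math.isclose(sum(weights.values()), 1.0):
        raise ValueError("weights must total 100%")
    return sum(factors[name] * weight for name, weight in weights.items())


# Hypothetical factor values for one candidate relationship.
example = {
    "column_name_comparison": 0.9,
    "index_key": 1.0,
    "match_percentage": 0.8,
    "number_of_matches": 1.0,
    "schema_locality": 1.0,
}
print(f"{relationship_score(example):.2f}")  # prints 0.94
```

With these sample values the score is 0.9 * 40% + 1.0 * 30% + 0.8 * 10% + 1.0 * 10% + 1.0 * 10% = 0.94.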
The following table describes these factors.
Factor
Description
Column Name Comparison
This factor is multiplied by its weight to get the name component of RPS. It ranges from 0 to 1, with 1 being an exact match and 0 being no match.
1.0—The column name of c1 and c2 match exactly.
0.9—The column names match exactly with non-alphanumeric characters removed.
   Example: users.user_id is given a 0.9 when compared with users.userid.
0.9—One column name ends with the other column name.
   Example: users.user_id is given a 0.9 when compared with sales.sold_to_user_id.
   Example: term.term_id is given a 0.9 when compared with pymts.pymt_term_id.
0.9—The table name of one column is part of the other column's name.
   Example: issue.id is given a 0.9 when compared with status.issue.
0.5-0.8—The column names are similar (to handle misspelled names).
   Example: cust.user_id is given a factor of 0.5-0.8 when compared with cust.usee_id.
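The name rules above can be sketched in Python as follows. This is an illustration only, not Discovery's algorithm. The function name is hypothetical, and three choices are assumptions that the source does not confirm: the case-insensitive comparison, the use of `difflib.SequenceMatcher` as the similarity measure, and the mapping of its ratio onto the 0.5-0.8 band.

```python
import re
from difflib import SequenceMatcher


def _alnum_only(name):
    """Remove every character that is not a letter or digit."""
    return re.sub(r"[^a-z0-9]", "", name)


def column_name_factor(table1, column1, table2, column2):
    """Sketch of the Column Name Comparison factor for table1.column1 vs table2.column2."""
    # Assumption: names are compared case-insensitively.
    t1, c1 = table1.lower(), column1.lower()
    t2, c2 = table2.lower(), column2.lower()

    if c1 == c2:
        return 1.0  # exact match
    if _alnum_only(c1) == _alnum_only(c2):
        return 0.9  # match once non-alphanumeric characters are removed
    if c1.endswith(c2) or c2.endswith(c1):
        return 0.9  # one column name ends with the other
    if t1 in c2 or t2 in c1:
        return 0.9  # table name of one column is part of the other column name

    # Assumption: a generic string-similarity ratio stands in for the
    # misspelling check, and is capped so the result stays within 0.5-0.8.
    ratio = SequenceMatcher(None, c1, c2).ratio()
    if ratio >= 0.5:
        return min(0.8, ratio)
    return 0.0  # no match


print(column_name_factor("users", "user_id", "sales", "sold_to_user_id"))  # prints 0.9
```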
Index Key
This factor is multiplied by its weight to get the index key component of RPS. It ranges from 0 to 1, based on the likelihood that one of the columns in the relationship is a key column:
1.0—The relationship cardinality is one-to-one, many-to-one, or one-to-many; and either column has more than 90% unique values.
0.5—The relationship cardinality is many-to-many with less than 90% unique values in both columns.
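The two cases above can be expressed as a small Python sketch. The function name and parameters are hypothetical, and the uniqueness values are assumed to be fractions between 0 and 1. Combinations that the two rules do not cover return `None` here, because this section does not say how they are scored.

```python
def index_key_factor(cardinality, unique_fraction_1, unique_fraction_2, threshold=0.90):
    """Sketch of the Index Key factor.

    cardinality: "one-to-one", "many-to-one", "one-to-many", or "many-to-many".
    unique_fraction_1, unique_fraction_2: share of unique values in each column (0 to 1).
    """
    many_to_many = cardinality == "many-to-many"
    highest = max(unique_fraction_1, unique_fraction_2)

    if not many_to_many and highest > threshold:
        return 1.0  # at least one column looks like a key
    if many_to_many and highest < threshold:
        return 0.5  # neither column looks like a key
    return None  # not covered by the two documented rules


print(index_key_factor("many-to-one", 0.55, 0.98))  # prints 1.0
```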
Match Percentage Factor
This factor is multiplied by its weight to get the match percentage component of RPS. It is calculated using this formula:
[# matches]/ MIN ([# unique values in c1], [# unique values in c2])
[# matches] is the number of unique values that appear in both c1 and c2.
See Adjusting the Minimum Unique Percentage for information about adjusting the threshold of value uniqueness.
   Example: If c1 has 100 unique values, c2 has 50 unique values, and 40 unique values appear in both c1 and c2, the factor is 40/MIN(100,50) = 40/50 = 0.8.
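The same calculation can be sketched in Python with sets. The function name is hypothetical, and returning 0.0 when either column has no values is an assumption made only to avoid division by zero.

```python
def match_percentage_factor(values_c1, values_c2):
    """Sketch of the Match Percentage factor for two columns of values."""
    unique_c1, unique_c2 = set(values_c1), set(values_c2)
    smaller = min(len(unique_c1), len(unique_c2))
    if smaller == 0:
        return 0.0  # assumption: an empty column yields no match
    matches = len(unique_c1 & unique_c2)  # unique values present in both columns
    return matches / smaller


# c1 has 100 unique values, c2 has 50, and 40 values (60..99) appear in both.
c1 = range(100)
c2 = range(60, 110)
print(match_percentage_factor(c1, c2))  # prints 0.8
```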
Number of Matches Factor
This factor is multiplied by its weight to get the number of matches component of RPS:
1.0—The number of matches is 10 or more.
[number of matches]/10—The number of matches is less than 10.
By default, if the number of matches is less than 3, the relationship is not discovered.
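A minimal Python sketch of this rule follows. The function name and the `minimum_matches` parameter are hypothetical; returning `None` is simply this sketch's way of marking a relationship that is not discovered.

```python
def number_of_matches_factor(matches, minimum_matches=3):
    """Sketch of the Number of Matches factor.

    Returns None when the relationship has too few matches to be discovered.
    """
    if matches < minimum_matches:
        return None  # below the default minimum: not discovered
    return min(1.0, matches / 10)  # 1.0 for 10 or more matches, otherwise matches/10


print(number_of_matches_factor(5))   # prints 0.5
print(number_of_matches_factor(25))  # prints 1.0
```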
Schema Locality Factor
This factor is multiplied by its weight to get the schema locality component of RPS:
1.0—Two columns are from the same data source.
0—The columns are not from the same data source.