Working with Scoring Flows and Pipelines

This page explains how you can create Scoring Pipelines and Scoring Flows.

Overview

  • Scoring Pipeline: A scoring pipeline is a design-time artifact that defines a data source, a data sink, one or more scoring flows, and zero or more models used in those scoring flows.
  • Scoring Flow: A scoring flow is an ordered sequence of processing steps that operate on data received from a data source and sent to a data sink. The data flowing through a scoring flow can be transformed and augmented by processing steps.
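
Conceptually, a scoring flow is an ordered chain of transformations applied to every record that travels from the data source to the data sink. The minimal Python sketch below illustrates that idea only; the step names and record fields are invented for the example, and this is not how ModelOps executes flows internally.

    # Illustrative sketch only: a scoring flow as an ordered chain of steps.
    # The step logic and the "Income"/"Risk_Score" fields are invented.

    def data_prep(record):
        # A processing step can transform the record, e.g. convert a field type.
        record["Income"] = float(record["Income"])
        return record

    def score(record):
        # A scoring step can augment the record with model output.
        record["Risk_Score"] = 0.8 if record["Income"] < 20000.0 else 0.2
        return record

    def run_flow(source, steps):
        # Apply the ordered steps to each record read from the data source
        # and yield the result toward the data sink.
        for record in source:
            for step in steps:
                record = step(record)
            yield record

    source = [{"Income": "15000"}, {"Income": "52000"}]  # stand-in data source
    sink = list(run_flow(source, [data_prep, score]))    # stand-in data sink
    print(sink)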

Working with Scoring Flows and Scoring Pipelines

Creating a Scoring Flow

  1. In the Project Explorer pane, under the Overview section, click Scoring Flows.
  2. Click Create a scoring flow. If no scoring flows exist yet, you can click Add one instead.
  3. Select the project name from the list.
  4. Add the flow name. The extension is added automatically.

  5. Click Create.

Authoring a Scoring Flow

  • In the Project Explorer pane, click Scoring Flows.
  • Select the scoring flow.
  • Select a scoring flow template by clicking the Load template flows option under the Edit section.

  • The Score template can be configured by following these steps:

    1. Configure the Input block.
      1. Click the Input block to open the Input processing step’s configuration form.
      2. Click Load Schema From > Model Schema and select the schema that matches the model.

      3. Doing this automatically populates the input data source schema fields.

      4. Click Save.
    2. Configure the Score block.
      1. Click the Score block to open the Score processing step’s configuration form.
      2. Under the Model section, select the audit.pmml model from the drop-down list.
      3. Under the Scoring Field Values section, click the Add Input Schema of Selected Model option.
      4. Next, from the Populate unset values with entries list, select “INPUT_DATA” of step “Input”.

      5. Doing so populates all the scoring request fields.

      6. Click Save.
    3. Configure the Output block.
      1. Click the Output block to open the Output processing step’s configuration form.
      2. Click the drop-down menu for the Load Schema From option.
      3. Select Model Schema > audit.pmml > audit-output.avsc schema.

      4. Next, from the Populate unset values with entries list, select the input from the Score step.

      5. Click Save.

    Once every block is configured and saved, you can publish the scoring flow to the Published Space.
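
    The Input and Output blocks above are driven by Avro schemas (.avsc files) associated with the model. As a point of reference, the Python snippet below parses a hypothetical schema in the shape of audit-input.avsc; the field names are invented for the example and will differ from the schemas bundled with your audit.pmml model.

      import json

      # Hypothetical contents of an input schema such as audit-input.avsc.
      # The real field names come from the schema bundled with audit.pmml.
      schema_text = """
      {
        "type": "record",
        "name": "audit_input",
        "fields": [
          {"name": "Age",        "type": ["null", "int"],    "default": null},
          {"name": "Employment", "type": ["null", "string"], "default": null},
          {"name": "Income",     "type": ["null", "double"], "default": null}
        ]
      }
      """

      schema = json.loads(schema_text)
      # These are the fields that Load Schema From > Model Schema populates
      # in the Input block's configuration form.
      print([field["name"] for field in schema["fields"]])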

  • The Python Data Prep, Score, Decisions template can be configured by following these steps:
    1. The CreditRisk project works well for this template. Make sure to select this project while creating the scoring flow.
    2. Configure the Input block.
      1. Click the Input processing step to open its configuration form.
      2. Click Load Schema From > Data Source and select the data source whose schema matches the model.

      3. Click Save.

    3. Configure the Data Prep block.
      1. Click the Data Prep processing step to open its configuration form.
      2. Under Script, select encoding-categorical-inputs.py.
      3. Under Input Variable Names, check INPUT_DATA (from “Input”).
      4. Under Requirements, select requirements.txt.

      5. Under Output Fields, click Load Schema From > Model Schema > credit-risk-model.pmml > credit-risk-model-input.avsc.

      6. Click Save.

    4. Configure the Score block.
      1. Click the Score processing step to open its configuration form.
      2. Select the credit-risk-model.pmml model from the drop-down list.
      3. Click Add Input Schema of Selected Model and observe that 0 of 46 Fields are set.
      4. From the Populate unset values with entries list, populate the values in the appropriate fields.

      5. Click Save.

    5. Configure the Decisions block.
      1. Click the Decisions processing step to open its configuration form.
      2. Under Script, select post-scoring-decisioning.py.
      3. Under Input Variable Names, check SCORE (from “Score”).
      4. Under Requirements, select requirements.txt.

      5. Under Parameters, go to 0 of 0 Output Parameters are set and add the following two parameters:

        • Name: PARAM_DEFAULT, Value: 0.2
        • Name: PARAM_POLICY_OVERRIDE, Value: 0.1

      6. Click Schemas has 0 Entries and click the plus sign (+) to add a new field.

      7. Add the following two fields:
        • Final_Credit_Approval_Decision (string)
        • Policy_Followed (string)

      8. Click Save.

    6. Configure the Output block.
      1. Click the Output processing step to open its configuration form.
      2. Under Output Record, click the plus sign (+) next to 0 of 0 Fields are set and add these 5 fields:
        • Name: Predicted_CreditStanding_Bad
        • Name: Probability_0
        • Name: Probability_1
        • Name: Final_Credit_Approval_Decision
        • Name: Policy_Followed

      3. To populate the values for these fields, click Populate unset values with entries and select “SCORE” of step “Score”.

      4. Next, click Populate unset values with entries again and select “Final_Credit_Approval_Decision” of step “Decisions” and “Policy_Followed” of step “Decisions”.

      5. Click Save.
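
    In this template, the Data Prep and Decisions blocks run user-supplied Python scripts: encoding-categorical-inputs.py typically re-encodes categorical inputs so that records match credit-risk-model-input.avsc, and post-scoring-decisioning.py turns the score into a business decision. The sketch below shows one plausible shape for the decisioning logic, using the SCORE fields and the PARAM_DEFAULT and PARAM_POLICY_OVERRIDE parameters configured above. How ModelOps binds input variables, parameters, and output fields to Python names is release-specific, so treat the function signature and the decision thresholds as assumptions rather than the shipped sample.

      # Minimal sketch of post-scoring-decisioning.py logic (not the shipped sample).
      # Assumes the Decisions step supplies the SCORE record plus the two parameters
      # and expects Final_Credit_Approval_Decision and Policy_Followed back.

      def decide(score_record, param_default, param_policy_override):
          # Probability_1 is taken from the Output block's field list above and is
          # assumed here to be the probability of a bad credit standing.
          probability_bad = score_record["Probability_1"]
          if probability_bad <= param_policy_override:
              return {"Final_Credit_Approval_Decision": "Approved",
                      "Policy_Followed": "Override policy"}
          if probability_bad <= param_default:
              return {"Final_Credit_Approval_Decision": "Approved",
                      "Policy_Followed": "Default policy"}
          return {"Final_Credit_Approval_Decision": "Rejected",
                  "Policy_Followed": "Default policy"}

      # Stand-in for the SCORE output of the Score step.
      SCORE = {"Predicted_CreditStanding_Bad": "Bad",
               "Probability_0": 0.35,
               "Probability_1": 0.65}
      print(decide(SCORE, param_default=0.2, param_policy_override=0.1))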

  • The Score, Compute Metrics, Publish Metrics template can be configured by following these steps:
    1. The Audit project works well for this template. Make sure to select this project while creating the scoring flow.
    2. Configure the Input block.
      1. Click the Input block to open the Input processing step’s configuration form.
      2. Click Load Schema From > Model Schema and select the schema that matches the model.

      3. Click Save.

    3. Configure the Score block.
      1. Click the Score block to open the Score processing step’s configuration form.
      2. Under the Model section, select the audit.pmml model from the drop-down list.
      3. Under the Scoring Field Values section, click the Add Input Schema of Selected Model option.
      4. Next, from the Populate unset values with entries list, select “INPUT_DATA” of step “Input”.

      5. Click Save.

    4. Configure the Create Matrix block.
      1. Click the Create Matrix block to open the configuration form.
      2. Click 0 of 0 Fields are set under the Column Values section and add the following two fields:
        • Name: Observed_Value, Column Value: INPUT_DATA.TARGET_Adjusted
        • Name: Predicted_Value, Column Value: SCORE.scoreOutput.TARGET_Adjusted
      3. Set the Output Variable Name to MATRIX.
      4. Under Output Type, select one of the following options:
        • Row added: A matrix is emitted for each record that is added, as long as the minimum number of rows has been reached.
        • Any selected fields: A matrix is emitted when the value of any field marked as selected in Output Fields changes compared to the previous row of the matrix. If there is no previous row, a matrix is not emitted.
        • All selected fields: A matrix is emitted when the values of all fields marked as selected in Output Fields change compared to the previous row of the matrix. If there is no previous row, a matrix is not emitted.
        • Interval: A record is emitted at a set interval. Each field of the output record is replaced by any non-null result of evaluating the input record against the expression given in the Output Tab.
      5. Select a value from the drop-down list for the Time Unit field.
      6. Select the sorting order under the Sort Field section.
      7. Under the Interval section, add the time between matrix emissions.
      8. Specify the minimum and maximum number of rows in the respective fields.
      9. Selecting the Clear Matrix On Emit check box clears the matrix when it is emitted.

      10. Click Save.

    5. Configure the Compute Metrics block.
      1. Click the Compute Metrics block to open the configuration form.
      2. For Input Variable Field Name, select MATRIX (from “Create Matrix”) from the drop-down list.
      3. In the Metrics to Compute section, select Misclassification rate, Chi square, G square, and F1 score from the drop-down list.
      4. Select Observed_Value for Observed Column Name.
      5. Select Predicted_Value for Predicted Field Name.
      6. Under Output Variable Name, add COMPUTED_METRICS.

      7. Click Save.

    6. Configure the Publish Metrics block.
      1. Click the Publish Metrics block to open the configuration form.
      2. For Metrics to Publish, select Misclassification rate, Chi square, G square, and F1 score.

      3. Click Save.

    7. Configure the Output block.
      1. Click the Output block to open the Output processing step’s configuration form.
      2. Click 0 of 0 Fields are set under the Output Record section and add the following four fields:
        • Name: Chi_Square, Sink Field Value: COMPUTED_METRICS.modelops_model_quality_classification_chi_square
        • Name: F1_Score, Sink Field Value: COMPUTED_METRICS.modelops_model_quality_classification_f1_score
        • Name: G_Square, Sink Field Value: COMPUTED_METRICS.modelops_model_quality_classification_g_square
        • Name: Misclassification_Rate, Sink Field Value: COMPUTED_METRICS.modelops_model_quality_classification_misclassification_rate

      3. Click Save.
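
    The Compute Metrics block derives its four metrics from the matrix of Observed_Value and Predicted_Value pairs produced by the Create Matrix block. The standalone Python sketch below reproduces those calculations on a tiny invented matrix so you can see what the block computes; the ModelOps implementation may differ in details such as how empty cells are handled.

      import math
      from collections import Counter

      # Tiny invented matrix of (Observed_Value, Predicted_Value) pairs, standing
      # in for the MATRIX variable produced by the Create Matrix block.
      pairs = [("1", "1"), ("1", "0"), ("0", "0"), ("0", "0"), ("1", "1"), ("0", "1")]
      n = len(pairs)

      # Misclassification rate: share of rows where prediction differs from observation.
      misclassification_rate = sum(o != p for o, p in pairs) / n

      # F1 score for the positive class "1".
      tp = sum(1 for o, p in pairs if o == "1" and p == "1")
      fp = sum(1 for o, p in pairs if o == "0" and p == "1")
      fn = sum(1 for o, p in pairs if o == "1" and p == "0")
      precision = tp / (tp + fp)
      recall = tp / (tp + fn)
      f1 = 2 * precision * recall / (precision + recall)

      # Pearson chi-square and likelihood-ratio G-square over the observed/predicted
      # contingency table, with expected counts taken from the marginal totals.
      cells = Counter(pairs)
      row_totals = Counter(o for o, _ in pairs)
      col_totals = Counter(p for _, p in pairs)
      chi_square = 0.0
      g_square = 0.0
      for o_val in row_totals:
          for p_val in col_totals:
              observed = cells.get((o_val, p_val), 0)
              expected = row_totals[o_val] * col_totals[p_val] / n
              chi_square += (observed - expected) ** 2 / expected
              if observed:
                  g_square += 2 * observed * math.log(observed / expected)

      print(misclassification_rate, f1, chi_square, g_square)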

Creating a Scoring Pipeline

  1. In the Project Explorer pane, under the Overview section, click Scoring Pipelines.
  2. Click Create a scoring pipeline. If no scoring pipelines exist yet, you can click Add one instead.
  3. If ModelOps contains only one project, the project name is selected automatically. Otherwise, select the project name from the list.
  4. Add the pipeline name and description.

  5. Click Create.

Authoring a Scoring Pipeline

  1. In the Project Explorer pane, click Scoring Pipelines.
  2. Select the scoring pipeline.
  3. Select one of the scoring flows from the drop-down list under the Scoring Flows section.
  4. Select Connect to deployed Data Channels for input and output or Expose as a Service (REST).
  5. Configure the Data Channels or REST service depending on the selection above.
  6. Click Save to save the changes.

Note:

  • Only one scoring flow is allowed when exposing the pipeline as a REST service.
  • Scoring pipelines can be defined to use data channels or be exposed directly as a REST service.
  • When using data channels, the data channels must already be deployed.
  • When using a REST service, a REST API is automatically made available for the pipeline.

Using Data Channels

When a pipeline uses data channels, a Data Source provides the input data and a Data Sink receives the output data after processing by a pipeline.

  1. Select a deployed data source from the Select a deployed Data Source drop-down.
  2. Select a deployed data sink from the Select a deployed Data Sink drop-down.

Using a REST Service

When a pipeline is exposed as a REST service, this configuration controls how the REST API is exposed.

For more details on how this configuration is used, see the Using the REST APIs chapter.

The following settings are available; default values are shown in parentheses:

  • Enable Request Tracing (default: Unchecked): Enable request tracing.
  • Endpoint Path Prefix (default: none; required): A unique URL path prefix for the REST API. For more details, see Addresses.
  • Login Session Timeout (Seconds) (default: 0): Idle login session expiration. The login session is terminated if it is idle for longer than this timeout value. A value of zero disables expiration.
  • Public (default: Checked): Expose the endpoint outside of the ModelOps Kubernetes cluster and update DNS with the REST API URL.
  • Subdomains (default: none; required): A DNS subdomain for this REST API. Must conform to DNS label restrictions. For more details, see Addresses.
  • Unique Identifier Field (default: none; required): Request field used to correlate responses. For more details, see REST Pipelines.
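
Once a pipeline is deployed with the REST option, it can be scored over HTTP. The snippet below is only a sketch: the host is assumed to be built from the Subdomains value and your cluster's domain, the path from the Endpoint Path Prefix, and the payload fields from the scoring flow's input schema, all of which are placeholders here. See the Using the REST APIs chapter for the exact URL layout, authentication, and request format.

    import requests  # third-party HTTP client

    # Placeholder values: substitute your Subdomains value, cluster domain,
    # Endpoint Path Prefix, and the fields of your flow's input schema.
    url = "https://my-subdomain.example.com/my-endpoint-prefix"
    record = {"Age": 38, "Employment": "Private", "Income": 81838.0}

    response = requests.post(url, json=record, timeout=30)
    response.raise_for_status()
    print(response.json())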

Promoting a Scoring Pipeline

  1. In the Project Explorer pane, click Scoring Pipelines.
  2. Select the pipeline that needs to be promoted.
  3. Click Approve in the right panel.

  4. Turn on the toggle for Development environment and close the pop-up window.

Pipeline Deployment

  1. In the Project Explorer pane, click Deployments.
  2. Click Deploy new and select Scoring Pipeline from the drop-down list.
  3. Add the name and description in the respective fields.
  4. Select the scoring pipeline from the drop-down list.
  5. Select a scoring environment from the drop-down list.
  6. Select when to schedule the job from the given options (Immediate, Future, or Periodic).
  7. Select the duration for which the job should run. You can run the job forever or specify a duration as per your needs.

  8. Click Deploy. You can see the deployed pipeline in the list on the main screen.

Note: The Scoring Pipeline Deployments section allows you to select the Run forever option. However, after a single file is processed by a File Data Source, the flow is stopped, marked Complete, and no longer accepts new input.