Working with Scoring Flows and Pipelines

This page explains how to create, author, promote, and deploy Scoring Flows and Scoring Pipelines.

Overview

  • Scoring Pipeline: A scoring pipeline is a design-time artifact that defines a data source, a data sink, one or more scoring flows, and zero or more models used in those flows.
  • Scoring Flow: A scoring flow is an ordered sequence of processing steps that operate on data received from a data source and write the results to a data sink. Each processing step can transform or augment the data as it flows through.
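
Conceptually, a scoring flow behaves like an ordered chain of functions applied to each record read from a data source, with the result written to a data sink. The following minimal Python sketch illustrates that idea only; it is not the product's API, and the step and field names are hypothetical.

```python
# Conceptual illustration only: a scoring flow as an ordered sequence of
# processing steps between a data source and a data sink.

def data_prep(record):
    # a processing step may augment the record with derived fields
    record["income_band"] = "high" if record["income"] > 50000 else "low"
    return record

def score(record):
    # a later step may apply a model to the prepared record (stubbed here)
    record["score"] = 0.87
    return record

steps = [data_prep, score]                 # the scoring flow's ordered steps
source = [{"id": 1, "income": 62000}]      # stand-in for a data source
sink = []                                  # stand-in for a data sink

for rec in source:
    for step in steps:
        rec = step(rec)
    sink.append(rec)

print(sink)
```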

Working with Scoring Flows and Scoring Pipelines

Creating a Scoring Flow

  1. In the Project Explorer pane, under the Overview section, click Scoring Flows.
  2. Click ADD A SCORING FLOW to create a new scoring flow. If no scoring flows exist yet, you can click Add one instead.
  3. Select the project name from the list.
  4. Add the flow name. The extension is added automatically.

  5. Click FINISH.

Authoring a Scoring Flow

  • In the Project Explorer pane, under the Overview section, click Scoring Flows.
  • Select the scoring flow.
  • Select a scoring flow template by clicking the Load template flows option under the Edit section.

  • The Score template can be configured by following these steps:

    1. Configure the Input block.
      1. Click the Input block to open the Input processing step’s configuration form.
      2. Click the drop-down menu for the Load Schema From option.
      3. Next, click Data Source and select the appropriate data source.

      4. Doing this automatically populates the input data source schema fields.

      5. Click SAVE.
    2. Configure the Score block.
      1. Click the Score block to open the Score processing step’s configuration form.
      2. Under the Model section, select the audit.pmml model from the drop-down list.
      3. Under the Scoring Request Fields section, click the Add Input Schema of Selected Model option.
      4. Next, from the Populate unset values with entries list, select “INPUT_DATA” of step “Input”.

      5. Doing so populates all the scoring request fields.

      6. Click SAVE.
    3. Configure the Output block.
      1. Click the Output block to open the Output processing step’s configuration form.
      2. Click the drop-down menu for the Load Schema From option.
      3. Select the Model Schema > audit.pmml > audit-output.avsc schema.

      4. Next, from the Populate unset values with entries list, select the output of the Score step.

      5. Click SAVE.

    Once every block is configured and saved, you can publish the scoring flow to the Published Space. A sketch of scoring a record against the same PMML model outside the UI follows these steps.
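
    Outside the UI, the effect of the Score step can be approximated with an open-source PMML evaluator. The sketch below is illustrative only: it assumes the audit.pmml file is available locally, uses the third-party pypmml package rather than the product’s scoring runtime, and the input field names shown are hypothetical (the real ones come from the model’s input schema).

    ```python
    from pypmml import Model  # illustrative third-party PMML evaluator (pip install pypmml)

    # Load the same PMML file that the Score block references.
    model = Model.fromFile("audit.pmml")

    # Hypothetical scoring request; field names must match the model's input schema,
    # which is what "Add Input Schema of Selected Model" populates in the UI.
    record = {"Age": 38, "Employment": "Private", "Income": 81838.0, "Hours": 40}

    result = model.predict(record)  # returns the model's output fields as a dict
    print(result)
    ```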

  • The Python Data Prep, Score, Decisions template can be configured by following these steps:
    1. The CreditRisk project works well for this template. Make sure to select this project while creating the scoring flow.
    2. Configure the Input block.
      1. Click the Input processing step to open its configuration form.
      2. Click Load Schema From > Data Source and select the data channel whose schema matches credit-risk-input.csv.

      3. Click SAVE.

    3. Configure the Data Prep block.
      1. Click the Data Prep processing step to open its configuration form.
      2. Under Script, select encoding-categorical-inputs.py (a sketch of what such a script typically does appears after this template’s steps).
      3. Under Input Field Names, check INPUT_DATA (from “Input”).
      4. Under Package Dependencies, select requirements.txt.

      5. Under Output Field Names, click the 0 of 0 Output Variables are set caret, and then click the plus sign (+).

      6. Add the DATA_PREP_OUT variable as the Schema name.
      7. Set the schema for the DATA_PREP_OUT variable by clicking Load Schema From > Model Schema > credit-risk-model.pmml > credit-risk-model-input.avsc.

      8. Click SAVE.

    4. Configure the Score block.
      1. Click the Score processing step to open its configuration form.
      2. Select the credit-risk-model.pmml model from the drop-down list.
      3. Click Add Input Schema of Selected Model and observe that 0 of 46 Fields are set.
      4. From the Populate unset values with entries list, select “DATA_PREP_OUT” of step “Data Prep”.

      5. Click SAVE.

    5. Configure the Decisions block.
      1. Click the Decisions processing step to open its configuration form.
      2. Under Script, select post-scoring-decisioning.py (a sketch of one possible decision rule appears after this template’s steps).
      3. Under Input Field Names, check SCORE (from “Score”).
      4. Under Package Dependencies, select requirements.txt.

      5. Under Properties, add the following two parameters:

        • Name: PARAM_DEFAULT, Value: 0.2
        • Name: PARAM_POLICY_OVERRIDE, Value: 0.1

      6. Under Output Field Names, click 0 of 0 Output Variables are set and click the plus sign (+) to add a new field.

      7. Add POST_SCORING_DECISION_OUT for the Name.
      8. Click Schemas not set and add the following two fields:
        • Final_Credit_Approval_Decision (string)
        • Policy_Followed (string)

      9. Click SAVE.

    6. Configure the Output block.
      1. Click the Output processing step to open its configuration form.
      2. Under Output Record, click the plus sign (+) next to 0 of 0 Fields are set and add these 5 fields:
        • Name: Predicted_CreditStanding_Bad
        • Name: Probability_0
        • Name: Probability_1
        • Name: Final_Credit_Approval_Decision
        • Name: Policy_Followed

      3. To populate the values for these fields, click Populate unset values with entries and select “SCORE” of step “Score”.

      4. Next, click Populate unset values with entries again and select “POST_SCORING_DECISION_OUT” of step “Decisions”.

      5. Click SAVE.
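
    The exact contract between a scoring flow and its Python scripts (how INPUT_DATA is passed in and how DATA_PREP_OUT is returned) is defined by the product, so the following sketch only illustrates the kind of transformation a script such as encoding-categorical-inputs.py typically performs: turning categorical inputs into the numeric fields the model expects. The column names and the use of pandas are assumptions.

    ```python
    import pandas as pd

    def encode_categorical_inputs(input_data: pd.DataFrame) -> pd.DataFrame:
        """Illustrative data-prep logic: one-hot encode categorical columns.

        The real script must produce fields matching credit-risk-model-input.avsc;
        the columns used here are hypothetical.
        """
        categorical = [c for c in input_data.columns if input_data[c].dtype == "object"]
        return pd.get_dummies(input_data, columns=categorical)

    # Example with made-up columns:
    df = pd.DataFrame({"Duration": [24, 12], "Purpose": ["car", "education"]})
    print(encode_categorical_inputs(df))
    ```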
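
    Likewise, the decision logic in post-scoring-decisioning.py belongs to the CreditRisk project; the sketch below shows only one plausible way the PARAM_DEFAULT (0.2) and PARAM_POLICY_OVERRIDE (0.1) parameters could be applied to a score to produce the Final_Credit_Approval_Decision and Policy_Followed fields. The threshold semantics shown are an assumption, not the project’s actual rules.

    ```python
    def post_scoring_decision(probability_bad: float,
                              param_default: float = 0.2,
                              param_policy_override: float = 0.1) -> dict:
        """Hypothetical decision rule combining a default threshold with a
        stricter policy-override threshold."""
        if probability_bad <= param_policy_override:
            return {"Final_Credit_Approval_Decision": "Approved",
                    "Policy_Followed": "Policy override"}
        if probability_bad <= param_default:
            return {"Final_Credit_Approval_Decision": "Approved",
                    "Policy_Followed": "Default policy"}
        return {"Final_Credit_Approval_Decision": "Rejected",
                "Policy_Followed": "Default policy"}

    print(post_scoring_decision(0.05))
    print(post_scoring_decision(0.35))
    ```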

  • The Score, Compute Metrics, Publish Metrics template can be configured by following these steps:
    1. The Audit project works well for this template. Make sure to select this project while creating the scoring flow.
    2. Configure the Input block.
      1. Click the Input block to open the Input processing step’s configuration form.
      2. Click the Load Schema From drop-down menu, click Data Channel, and select the schema that matches the model.

      3. Click SAVE.

    3. Configure the Score block.
      1. Click the Score block to open the Score processing step’s configuration form.
      2. Under the Model section, select the audit.pmml model from the drop-down list.
      3. Under the Scoring Request Fields section, click the Add Input Schema of Selected Model option.
      4. Next, from the Populate unset values with entries list, select “INPUT_DATA” of step “Input”.

      5. Click SAVE.

    4. Configure the Create Matrix block.
      1. Click the Create Matrix block to open its configuration form.
      2. Under Matrix Fields, click 0 of 0 Fields are set and add the following fields:
        • Name: ObservedValue; Column Value: INPUT_DATA.TARGET_Adjusted
        • Name: PredictedValue; Column Value: SCORE.scoreOutput.TARGET_Adjusted

      3. Click SAVE.

    5. Configure the Compute Metrics block.
      1. Click the Compute Metrics block to open its configuration form.
      2. For Input Variable Field Name, select MATRIX (from “Create Matrix”) from the drop-down list.
      3. In the Metrics to Compute section, select Misclassification rate, Chi square, G square, and F1 score from the drop-down list (a sketch of how these metrics are commonly computed appears after this template’s steps).
      4. Select ObservedValue as the Observed Column Name.
      5. Select PredictedValue as the Predicted Column Name.

      6. Click SAVE.

    6. Configure the Publish Metrics block.
      1. Click the Publish Metrics block to open its configuration form.
      2. For Metrics to Publish, select Misclassification rate, Chi square, G square, and F1 score.

      3. Click SAVE.

    7. Configure the Output block.
      1. Click the Output block to open the Output processing step’s configuration form.
      2. Click 0 of 0 Fields are set under the Output Record section and add the following four fields:
        • Name: Chi_Square, Sink Field Value: COMPUTED_METRICS.modelops_model_quality_classification_chi_square
        • Name: F1_Score, Sink Field Value: COMPUTED_METRICS.modelops_model_quality_classification_f1_score
        • Name: G_Square, Sink Field Value: COMPUTED_METRICS.modelops_model_quality_classification_g_square
        • Name: Misclassification_Rate, Sink Field Value: COMPUTED_METRICS.modelops_model_quality_classification_misclassification_rate

      3. Click SAVE.
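
    The four metrics selected in the Compute Metrics step are standard classification statistics derived from the observed/predicted matrix built in the Create Matrix step. The sketch below shows how they are commonly defined for a binary target such as TARGET_Adjusted; the product’s exact formulas and field handling may differ.

    ```python
    import math
    from collections import Counter

    def classification_metrics(observed, predicted, positive="1"):
        """Misclassification rate, chi-square, G-square, and F1 score computed
        from observed vs. predicted class labels (binary classification)."""
        n = len(observed)
        pairs = Counter(zip(observed, predicted))
        labels = sorted(set(observed) | set(predicted))

        misclassification = sum(c for (o, p), c in pairs.items() if o != p) / n

        tp = pairs[(positive, positive)]
        fp = sum(c for (o, p), c in pairs.items() if p == positive and o != positive)
        fn = sum(c for (o, p), c in pairs.items() if o == positive and p != positive)
        f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0

        # Pearson chi-square and likelihood-ratio G-square on the contingency table
        row, col = Counter(observed), Counter(predicted)
        chi_square = g_square = 0.0
        for o in labels:
            for p in labels:
                expected = row[o] * col[p] / n
                count = pairs[(o, p)]
                if expected > 0:
                    chi_square += (count - expected) ** 2 / expected
                if count > 0 and expected > 0:
                    g_square += 2 * count * math.log(count / expected)

        return {"misclassification_rate": misclassification, "chi_square": chi_square,
                "g_square": g_square, "f1_score": f1}

    observed  = ["1", "0", "1", "1", "0", "0"]
    predicted = ["1", "0", "0", "1", "0", "1"]
    print(classification_metrics(observed, predicted))
    ```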

Creating a Scoring Pipeline

  1. In the Project Explorer pane, under the Overview section, click Pipelines.
  2. Click ADD A SCORING PIPELINE to create a new scoring pipeline. If no scoring pipelines exist yet, you can click Add one instead.
  3. Select the project name from the list.
  4. Add the pipeline name. The extension is added automatically.

  5. Click FINISH.

Authoring a Scoring Pipeline

  1. In the Project Explorer pane, under Overview section, click Pipelines.
  2. Select the scoring pipeline.
  3. If you are using a Data Source and a Data Sink as the input and output, select the Connect to deployed Data Channels for input and output option.
    1. Select the scoring flow from the drop-down list under the Scoring Flows section.
    2. Add the Data Source and Data Sink from the drop-down lists under the Data Channels section.

    A data source supplies the data that a flow processes; a data sink receives the data that the flow produces.

  4. If you want to use REST Request-Response, select the Expose via REST Request-Response option.

    1. Select the scoring flow from the drop-down list under the Scoring Flows section.
    2. Add the channel endpoint path prefix, session timeout, unique identifier, and external host name in their respective fields (an example request appears after these steps).

  5. Click SAVE to save the changes.
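
When a pipeline is exposed via REST Request-Response, clients send requests to the channel endpoint path prefix on the external host name configured above. The exact URL and payload format depend on your configuration and on the scoring flow's input schema, so the following request is purely illustrative; the host, path, and field names are assumptions.

```python
import requests  # illustrative client; any HTTP client works

# Hypothetical endpoint built from the external host name and the channel
# endpoint path prefix; the payload fields must match the flow's input schema.
url = "https://scoring.example.com/creditrisk/score"
payload = {"Duration": 24, "Purpose": "car", "CreditAmount": 3500}

response = requests.post(url, json=payload, timeout=30)
print(response.status_code, response.json())
```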

Promoting a Scoring Pipeline

  1. In the Project Explorer pane, click Pipelines.
  2. Select the pipeline that needs to be promoted by clicking the check box.
  3. Click Approve at the bottom of the pipeline list.

  4. Turn on the toggle for the Development environment, and then click CLOSE.

Pipeline Deployment

  1. In the Project Explorer pane, click Scoring Pipeline Deployments.
  2. Select the DEPLOY A SCORING PIPELINE option.
  3. Add a name and description in the respective fields.
  4. Select the scoring pipeline from the drop-down list.
  5. Select a scoring environment from the drop-down list.
  6. Select when to schedule the job from the given options (Immediate, Future, or Periodic).
  7. Select the duration for which the job should run. You can run the job forever or specify a duration as per your needs.

  8. Click DEPLOY.

Note: The Scoring Pipeline Deployments section allows you to select the Run forever option. However, after a File Data Source processes a single file, the flow stops, is marked Complete, and no longer accepts new input.