Working with Scoring Flows and Pipelines

This page explains how you can create Scoring Pipelines and Scoring Flows.

Contents

Overview

  • Scoring Pipeline: A scoring pipeline is a design-time artifact that defines a scoring pipeline. A scoring pipeline defines a data source, a data sink, one or more scoring flows, and zero or more models used in the scoring flow.
  • Scoring Flow: A scoring flow is an ordered sequence of processing steps that operate on data received from a data source and sent to a data sink. The data flowing through a scoring flow can be transformed and augmented by processing steps.

Working with Scoring Flows and Scoring Pipelines

Creating a Scoring Flow

  1. On the main screen, click the drop down menu on Artifacts tab and select Scoring Flow.
  2. Click ADD A SCORING FLOW to create a new scoring flow. You can also click Add one to add a new scoring flow if there are none present.
  3. Select the project name from the list.
  4. Add the flow name. The extension is added automatically.

  5. Click FINISH.

Authoring a Scoring Flow

  • Select the scoring flow.
  • Select a scoring flow template by clicking on the load template flows option under Edit section.

  • The Score template can be configured by following these steps:

    1. Configure the Input block.
      • Click Input block to open the Input processing step’s configuration form.
      • Click the drop down menu on the Load Schema From option.
      • Next, click Data Channel and select the appropriate data channel.

      • Doing this automatically populates the input data source schema fields.

      • Click SAVE.
    2. Configure the Score block.
      • Click Score block to open the Score processing step’s configuration form.
      • Under the Model section, select audit.pmml model from the drop down list.
      • Under the Scoring Request Fields section, click the Add Input Schema of Selected Model option.
      • Next, click the drop down menu for Populate unset values with entries and select “INPUT_DATA” of step “Input”.

      • Doing so populates all the scoring request fields.

      • Click SAVE.
    3. Configure the Output block.
      • Click Output block to open the Output processing step’s configuration form.
      • Click the drop down menu for the Load Schema from option
      • Select Model Schema > audit.pmml > audit-output.avsc schema.

      • Next, click the drop down menu for the Populate unset values with entries option and select the input from the Score step.

      • Click SAVE.

    4. Once every block is configured and saved, you can publish the scoring flow to Published Space.
  • The Python Data Prep, Score, Decisions template can be configured by following these steps:
    1. The CreditRisk project works well for this template. Make sure to select this project while creating the scoring flow.
    2. Configure the Input block.
      • Click the Input processing step to open its configuration form.
      • Click Load Schema From > Data Channel > select the data channel whose schema matches with credit-risk-input.csv.

      • Click SAVE.

    3. Configure the Data Prep block.
      • Click the Data Prep processing step to open its configuration form.
      • Under Script, select encoding-categorical-inputs.py.
      • Under Input Field Names, check INPUT_DATA (from “Input”).
      • Under Package Dependencies, select requirements.txt.

      • Under Output Field Names, click 0 of 0 Output Variables are set caret and click the plus sign (+).

      • Add DATA_PREP_OUT variable as Schema name.
      • Set the schema for DATA_PREP_OUT variable by clicking Load Schema From > Model Schema > credit-risk-model.pmml > credit-risk-model-input.avsc.

      • Click SAVE.

    4. Configure the Score block.
      • Click the Score processing step to open its configuration form.
      • Select the credit-risk-model.pmml model from the drop down list.
      • Click Add Input Schema of Selected Model and observe 0 of 46 Fields are set.
      • Click Populate unset values with entries and select “DATA_PREP_OUT” of step “Data Prep”.

      • Click SAVE.

    5. Configure the Decisions block.
      • Click the Decisions processing step to open its configuration form.
      • Under Script, select post-scoring-decisioning.py.
      • Under Input Field Names, check SCORE (from “Score”).
      • Under Package Dependencies, select requirements.txt.

      • Under Properties, go to 0 of 0 Output Variables are set and add the following two parameters:

        • Name: PARAM_DEFAULT, Value: 0.2
        • Name: PARAM_POLICY_OVERRIDE, Value: 0.1

      • Under Output Field Names, click 0 of 0 Output Variables are set and click the plus sign (+) to add a new field.

      • Add POST_SCORING_DECISION_OUT for the Name.
      • Click Schemas not set and add the following two fields:
        • Final_Credit_Approval_Decision (string)
        • Policy_Followed (string)

      • Click SAVE.

    6. Configure the Output block.
      • Click the Output processing step to open its configuration form.
      • Under Output Record, click the plus sign (+) next to 0 of 0 Fields are set and add these 5 fields:
        • Name: Predicted_CreditStanding_Bad
        • Name: Probability_0
        • Name: Probability_1
        • Name: Final_Credit_Approval_Decision
        • Name: Policy_Followed

      • To populate the values for these fields, click Populate unset values with entries and select “SCORE” of step “Score”.

      • Next, click Populate unset values with entries again and select “POST_SCORING_DECISION_OUT” of step “Decisions”.

      • Click SAVE.

  • The Score, Compute Metrics, Publish Metrics template can be configured by following these steps:
    1. The Audit project works well for this template. Make sure to select this project while creating the scoring flow.
    2. Configure the Input block.
      • Click Input block to open the Input processing step’s configuration form.
      • Click the drop down menu on the Load Schema From > Data Channel > and select the schema that matches with the model.

      • Click SAVE.

    3. Configure the Score block.
      • Click Score block to open the Score processing step’s configuration form.
      • Under Model section, select audit.pmml model from the drop down list.
      • Under Scoring Request Fields section, click the Add Input Schema of Selected Model option.
      • Next, click the drop down menu for Populate unset values with entries and select “INPUT_DATA” of step “Input”.

      • Click SAVE.

    4. Configure the Create Matrix block.
      • Click Create Matrix block to open the configuration form.
      • Under Matrix Fields, click 0 of 0 Fields are set and add the following 2 fields:
        • Name: ObservedValue; Column Value: INPUT_DATA.TARGET_Adjusted
        • Name: PredictedValue; Column Value: SCORE.scoreOutput.TARGET_Adjusted

      • Click SAVE.

    5. Configure the Compute Matrix block.
      • Click Compute Matrix block to open the configuration form.
      • For Input Variable Field Name, select MATRIX (from “Create Matrix”) from the drop down list.
      • For Metrics to Compute section, select Misclassification rate, Chi square, G square, and F1 score from the drop down list.
      • Select Observed Column Name as ObservedValue
      • Select Predicted Column Name as PredictedValue

      • Click SAVE.

    6. Configure the Publish Matrix block.
      • Click Publish Matrix block to open the configuration form.
      • For the Metrics to Publish, select Misclassification rate, Chi square, G square, and F1 score.

      • Click SAVE.

    7. Configure the Output block.
      • Click Output block to open the Output processing step’s configuration form.
      • Click 0 of 0 Fields are set under Output Record section and add the following 4 fields:
        • Name: Chi_Square, Sink Field Value: COMPUTED_METRICS.modelops_model_quality_classification_chi_square
        • Name: F1_Score, Sink Field Value: COMPUTED_METRICS.modelops_model_quality_classification_f1_score
        • Name: G_Square, Sink Field Value: COMPUTED_METRICS.modelops_model_quality_classification_g_square
        • Name: Misclassification_Rate, Sink Field Value: COMPUTED_METRICS.modelops_model_quality_classification_misclassification_rate

      • Click SAVE.

Creating a Scoring Pipeline

  1. On the main screen, click the drop down menu on Artifacts tab and select Scoring Pipeline.
  2. Click ADD A SCORING PIPELINE to create a new scoring pipeline. You can also click Add one to add a new scoring pipeline if there are none present.
  3. Select the project name from the list.
  4. Add pipeline name. The extension is added automatically.

  5. Click FINISH.

Authoring a Scoring Pipeline

  1. On the main screen, click the drop down menu on Artifacts tab and select Scoring Pipeline.
  2. Select the scoring pipeline.
  3. Add Data Source, Scoring Flow, and Data Sink.
  4. A data source provides input data to a scoring flow for processing.
  5. A data sink collects output data from a scoring flow.

  6. Click SAVE to save the changes.

Promoting a Scoring Pipeline

  1. On the main screen, click the drop down menu on Artifacts tab and select Scoring Pipeline.
  2. Select the pipeline by clicking on the check box next to the project name.
  3. Click Approve present on the page.
  4. Select the Development environment and click APPLY.

Pipeline Deployment

  1. On the main screen, click the drop down menu on Deployments tab and select Scoring Pipeline.
  2. Click the DEPLOY A SCORING PIPELINE option on the main menu.
  3. Enter the deployment name.
  4. Add a description.
  5. Select a scoring pipeline from the drop-down list.
  6. Select a scoring environment from the drop-down list. For example:
    • Development
    • Testing
    • Production
  7. Schedule the job.
  8. Click DEPLOY.

Note: Scoring Flow deployment page allows you to select the Run forever option. However, after a single file is processed by a File Data Source, the flow would be stopped and marked Complete and no longer accept new input.