Auto-scaling

Scoring flows and scoring services running in a scoring pipeline are configured to scale elastically using the Kubernetes Horizontal Pod Autoscaler (HPA). By default, elastic scaling is configured as follows:

  • both start with a single Pod
  • scale to a maximum of two Pods
  • scale when CPU utilization exceeds 50%
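
To make these defaults concrete, the sketch below applies the replica calculation documented for the Kubernetes HPA, desired = ceil(current replicas × current utilization / target utilization), clamped to the configured minimum and maximum. The helper name `desired_replicas` and its argument order are illustrative assumptions and not part of the product. The real controller also applies a tolerance and stabilization behavior that this sketch ignores.

```shell
# Hypothetical helper: estimate the replica count the HPA would request.
# Arguments: current replicas, current CPU %, target CPU %, min Pods, max Pods.
desired_replicas() {
  current=$1; cpu=$2; target=$3; min=$4; max=$5
  # Integer ceiling of (current * cpu / target)
  desired=$(( (current * cpu + target - 1) / target ))
  # Clamp to the configured bounds
  if [ "$desired" -lt "$min" ]; then desired=$min; fi
  if [ "$desired" -gt "$max" ]; then desired=$max; fi
  echo "$desired"
}

# Defaults described above: minimum 1 Pod, maximum 2 Pods, 50% CPU target
desired_replicas 1 80 50 1 2   # one Pod at 80% CPU   -> prints 2
desired_replicas 2 20 50 1 2   # two Pods at 20% CPU  -> prints 1
```

With a maximum of two Pods, sustained load above the target cannot add a third Pod; the estimate stays at 2.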

The HPA configurations are automatically created in the same namespace in which the scoring pipeline or service is deployed. The HPA configurations use this naming convention:

  • scoringflow-* - scoring flow HPA configuration
  • scoringservice-* - scoring service HPA configuration
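
Because of this naming convention, the HPA configurations that belong to scoring flows and scoring services can be picked out of a namespace by prefix. A minimal sketch, assuming a shell variable `NAMESPACE` holds the target namespace; the helper name `filter_scoring_hpas` is hypothetical:

```shell
# Hypothetical helper: keep only names that follow the
# scoringflow-* / scoringservice-* convention.
# Reads resource names on stdin, one per line, either bare or in the
# "<kind>/<name>" form printed by `kubectl get ... -o name`.
filter_scoring_hpas() {
  grep -E '(^|/)(scoringflow|scoringservice)-'
}

# Example (requires cluster access; NAMESPACE is an assumed variable):
#   kubectl get hpa --namespace "$NAMESPACE" -o name | filter_scoring_hpas
```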

To display the current autoscaling status, use this command:

#
#  Display current autoscaling status - replace <namespace> with the actual namespace
#
kubectl get hpa --namespace <namespace>
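
For more detail than the summary listing, the standard `kubectl describe` command and the `--watch` option of `kubectl get` can be used. The sketch below wraps them in two small functions; the function names and their argument order are illustrative assumptions.

```shell
# Hypothetical wrappers around standard kubectl commands.

# Show conditions, events, and current metrics for one HPA.
# $1 = namespace, $2 = HPA name (for example, one matching scoringflow-*)
describe_hpa() {
  kubectl describe hpa "$2" --namespace "$1"
}

# Stream changes to all HPAs in a namespace until interrupted (Ctrl-C).
# $1 = namespace
watch_hpas() {
  kubectl get hpa --namespace "$1" --watch
}
```

Watching the listing while load is applied shows the replica count changing as the HPA reacts.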