Auto-scaling
Scoring flows and services running in a scoring pipeline are configured to scale elastically using the Kubernetes Horizontal Pod Autoscaler (HPA). By default, elastic scaling is configured as follows:
- scoring flows and scoring services each start with a single Pod
- each scales to a maximum of two Pods
- scaling is triggered when CPU utilization exceeds 50%
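As a rough illustration of how these defaults interact, the sketch below applies the replica formula documented for the Kubernetes HPA, desired = ceil(current replicas × current CPU / target CPU), and clamps the result to the default bounds listed above. The function name desired_replicas is hypothetical and exists only for this example. The sketch also ignores the HPA's tolerance band and scale-down stabilization window, so real scaling decisions may lag or differ slightly.

```shell
# Hypothetical helper, not part of kubectl or the product.
# Usage: desired_replicas <current_replicas> <current_cpu_percent>
desired_replicas() {
  current_replicas=$1   # Pods currently running
  current_cpu=$2        # observed CPU utilization, in percent

  # Defaults described in this section
  target_cpu=50
  min_pods=1
  max_pods=2

  # Integer ceiling of current_replicas * current_cpu / target_cpu
  desired=$(( (current_replicas * current_cpu + target_cpu - 1) / target_cpu ))

  # Clamp to the configured minimum and maximum
  if [ "$desired" -lt "$min_pods" ]; then desired=$min_pods; fi
  if [ "$desired" -gt "$max_pods" ]; then desired=$max_pods; fi

  echo "$desired"
}

desired_replicas 1 80   # one Pod at 80% CPU: prints 2
desired_replicas 2 20   # two Pods at 20% CPU: prints 1
```

Because the maximum is two Pods, any load above the 50% target on a single Pod results in at most one additional Pod.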
The HPA configurations are automatically created in the same namespace in which the scoring pipeline or service is deployed. The HPA configurations use this naming convention:
- scoringflow-* - scoring flow HPA configuration
- scoringservice-* - scoring service HPA configuration
Details on the current status of autoscaling can be displayed using this command:
//
// Display current autoscaling status - replace <namespace> with actual namespace
//
kubectl get hpa --namespace <namespace>
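For closer inspection, the sketch below wraps two standard kubectl subcommands in small shell functions. The function names list_scoring_hpas and describe_scoring_hpa are hypothetical, and the filter assumes the HPA names follow the scoringflow-* and scoringservice-* convention described above.

```shell
# Hypothetical helpers around standard kubectl subcommands.

# List only the HPA configurations for scoring flows and scoring services.
# Usage: list_scoring_hpas <namespace>
list_scoring_hpas() {
  # --output name prints entries of the form <resource>/<name>
  kubectl get hpa --namespace "$1" --output name \
    | grep -E '/(scoringflow|scoringservice)-'
}

# Show targets, current metrics, conditions, and recent scaling events
# for a single HPA configuration.
# Usage: describe_scoring_hpa <namespace> <hpa-name>
describe_scoring_hpa() {
  kubectl describe hpa "$2" --namespace "$1"
}
```

Calling list_scoring_hpas with a namespace and then passing one of the returned names (without the resource prefix) to describe_scoring_hpa is usually enough to see why a scoring flow or service did or did not scale.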