External Storage & Multi-Attach Troubleshooting

This page provides detailed guidance, examples, and migration steps for handling storage issues when running multiple Model Management Server (MMS) pods across nodes.
When running Model Management Server (artifact-management) in a Kubernetes cluster with autoscaling enabled, pods may be rescheduled or rebalanced across different nodes (for example, after a crash, node drain, or scaling event). If the underlying storage class does not support multi-node access (that is, it lacks the `ReadWriteMany` access mode), you may encounter `Multi-Attach` errors when pods are scheduled on different nodes. This is common with the default disk-based storage classes in AKS and EKS.
When to use external storage:
If you see `Multi-Attach` errors during autoscaling, you must use a network file system that supports multi-node access (such as Azure Files or AWS EFS).
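To confirm that the failure is a volume attachment conflict rather than an application error, you can inspect the stuck pod's events. A minimal sketch, assuming `<POD_NAME>` is the pending MMS pod (the event text shown in the comment is typical, but wording varies by Kubernetes version):

```shell
# Describe the pending pod; a Multi-Attach conflict shows up as an event like:
#   Warning  FailedAttachVolume  ...  Multi-Attach error for volume "pvc-..."
#   Volume is already exclusively attached to one node and can't be attached to another
kubectl -n <NAMESPACE> describe pod <POD_NAME>

# Or list all warning events in the namespace:
kubectl -n <NAMESPACE> get events --field-selector type=Warning
```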
General Steps:

- Provision External Storage
  - For AKS: Create an Azure File Share.
  - For EKS: Create an EFS file system.
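As a sketch of the provisioning step (the account, resource group, share, and token names are placeholders; adjust SKU, region, and sizing to your environment, and note that EFS additionally needs mount targets in each subnet used by the cluster):

```shell
# AKS: create a storage account and an Azure File Share in it
az storage account create \
  --name <STORAGE_ACCOUNT_NAME> \
  --resource-group <RESOURCE_GROUP> \
  --sku Standard_LRS
az storage share create \
  --name <SHARE_NAME> \
  --account-name <STORAGE_ACCOUNT_NAME>

# EKS: create an EFS file system
aws efs create-file-system \
  --creation-token mms-storage \
  --tags Key=Name,Value=mms-storage
```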
- Create Kubernetes Secret (if required)
  - Store storage account credentials as a Kubernetes Secret.

For example, to create a secret for Azure Files:

```shell
kubectl create secret generic azure-files-secret \
  --from-literal=azurestorageaccountname=<STORAGE_ACCOUNT_NAME> \
  --from-literal=azurestorageaccountkey=<STORAGE_KEY> \
  --namespace <NAMESPACE>
```
- Create PersistentVolume (PV) and PersistentVolumeClaim (PVC)
  - Define a static PV that points to your external storage.
  - Create a PVC that uses the PV, or a StorageClass that supports `ReadWriteMany`.
  - Note: You need to create a PV and PVC for both the `artifact-management` and `deploy-storage` volumes to ensure all MMS data is accessible across nodes.

For example, to define a static PersistentVolume and PersistentVolumeClaim for Azure Files:

```yaml
# artifact-management-pv.yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: artifact-management-pv
spec:
  capacity:
    storage: 20Gi
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  storageClassName: azurefile-csi-static
  mountOptions:
    - dir_mode=0777
    - file_mode=0777
    - uid=1000
    - gid=1000
    - mfsymlinks
    - cache=strict
    - nosharesock
    - nobrl
  csi:
    driver: file.csi.azure.com
    volumeHandle: <VOLUME_HANDLE>  # must be unique for every identical share in the cluster
    readOnly: false
    volumeAttributes:
      secretName: azure-files-secret
      secretNamespace: <NAMESPACE>
      shareName: <SHARE_NAME>
---
# artifact-management-pvc.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: artifact-management-rwx
  namespace: <NAMESPACE>
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: azurefile-csi-static
  resources:
    requests:
      storage: 20Gi
---
# deploy-storage-pv.yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: deploy-storage-pv
spec:
  capacity:
    storage: 20Gi
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  storageClassName: azurefile-csi-static
  mountOptions:
    - dir_mode=0777
    - file_mode=0777
    - uid=1000
    - gid=1000
    - mfsymlinks
    - cache=strict
    - nosharesock
    - nobrl
  csi:
    driver: file.csi.azure.com
    volumeHandle: <VOLUME_HANDLE>  # must be unique for every identical share in the cluster
    readOnly: false
    volumeAttributes:
      secretName: azure-files-secret
      secretNamespace: <NAMESPACE>
      shareName: <SHARE_NAME>
---
# deploy-storage-pvc.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: deploy-storage-rwx
  namespace: <NAMESPACE>
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: azurefile-csi-static
  resources:
    requests:
      storage: 20Gi
```
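After applying the manifests, you can check that each claim has bound to its volume before touching the deployment (file names follow the comments in the example above):

```shell
kubectl apply -f artifact-management-pv.yaml
kubectl apply -f deploy-storage-pv.yaml
# Both PVCs should report STATUS 'Bound'
kubectl -n <NAMESPACE> get pv,pvc
```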
- Update MMS Deployment
  - Ensure the MMS (artifact-management) deployment uses the new PVCs for its data volumes.
  - In your deployment spec, update the `volumes` section to reference the new PVCs:

```yaml
# Example snippet from your deployment spec:
spec:
  template:
    spec:
      volumes:
        - name: artifact-management
          persistentVolumeClaim:
            claimName: artifact-management-rwx  # new PVC name
        - name: deploy-storage
          persistentVolumeClaim:
            claimName: deploy-storage-rwx  # new PVC name
```
- After making these changes, restart the deployment to pick up the new storage.
- Alternatively, you can patch the deployment using kubectl:

```shell
kubectl -n <NAMESPACE> patch deployment artifact-management \
  --type='strategic' \
  -p='
spec:
  template:
    spec:
      volumes:
        - name: artifact-management
          persistentVolumeClaim:
            claimName: artifact-management-rwx
        - name: deploy-storage
          persistentVolumeClaim:
            claimName: deploy-storage-rwx
'
```

- This ensures both `artifact-management` and `deploy-storage` volumes use the new external PVCs.
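One way to restart the deployment and wait until the new pods are up, using the standard rollout commands:

```shell
kubectl -n <NAMESPACE> rollout restart deployment artifact-management
kubectl -n <NAMESPACE> rollout status deployment artifact-management --timeout=5m
```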
- Migrate Existing Data
  - To migrate data from the old PVCs to the new external storage, follow these steps:
Migration of data:

- Scale down the deployment to 0:

```shell
kubectl -n <NAMESPACE> scale deployment artifact-management --replicas=0
```
- Create a temporary Pod for the data copy:

```yaml
# artifact-migration.yaml
apiVersion: v1
kind: Pod
metadata:
  name: artifact-move-data
  namespace: <NAMESPACE>
spec:
  restartPolicy: Never
  containers:
    - name: mover
      image: busybox
      command: ["/bin/sh", "-c", "cp -av /old-data/. /var/opt/spotfire/streaming-web/data/"]
      volumeMounts:
        - name: new
          mountPath: /var/opt/spotfire/streaming-web/data
        - name: old
          mountPath: /old-data
  volumes:
    - name: new
      persistentVolumeClaim:
        claimName: artifact-management-rwx  # new PVC
    - name: old
      persistentVolumeClaim:
        claimName: artifact-management  # old PVC
```
Apply and monitor the pod:

```shell
kubectl apply -f artifact-migration.yaml
kubectl -n <NAMESPACE> logs artifact-move-data
# Wait until the pod status is 'Succeeded' before deleting it.
kubectl -n <NAMESPACE> delete pod artifact-move-data
```
Similarly, create a temporary Pod to copy the `deploy-storage` data:

```yaml
# deploy-storage-migration.yaml
apiVersion: v1
kind: Pod
metadata:
  name: deploy-storage-move-data
  namespace: <NAMESPACE>
spec:
  restartPolicy: Never
  containers:
    - name: mover
      image: busybox
      command: ["/bin/sh", "-c", "cp -av /old-data/. /var/opt/spotfire/streaming-web/deploy-storage/"]
      volumeMounts:
        - name: new
          mountPath: /var/opt/spotfire/streaming-web/deploy-storage
        - name: old
          mountPath: /old-data
  volumes:
    - name: new
      persistentVolumeClaim:
        claimName: deploy-storage-rwx  # new PVC
    - name: old
      persistentVolumeClaim:
        claimName: deploy-storage  # old PVC
```

Apply and monitor the pod:

```shell
kubectl apply -f deploy-storage-migration.yaml
kubectl -n <NAMESPACE> logs deploy-storage-move-data
# Wait until the pod status is 'Succeeded' before deleting it.
kubectl -n <NAMESPACE> delete pod deploy-storage-move-data
```
Note:
Ensure that the migration/copy pods for both `artifact-management` and `deploy-storage` complete successfully (pod status is `Succeeded`) before deleting them. Deleting a pod before completion may result in an incomplete data transfer.

- Scale up the deployment:

```shell
kubectl -n <NAMESPACE> scale deployment artifact-management --replicas=2
```
This ensures your data is safely migrated to the new external storage before restarting the application.
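To verify that replicas on different nodes now share the external storage, you can check that all pods reach Running and that the deployment references the new claims. The `app=artifact-management` label selector below is an assumption; substitute whatever labels your deployment actually uses:

```shell
# Pods should be Running even when scheduled on different nodes
# (label selector is an assumption; check your deployment's labels)
kubectl -n <NAMESPACE> get pods -l app=artifact-management -o wide

# Confirm the pod template references the new RWX claims
kubectl -n <NAMESPACE> get deployment artifact-management \
  -o jsonpath='{.spec.template.spec.volumes}'
```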
Note:
This approach is only required if you encounter `Multi-Attach` errors due to autoscaling and pod distribution across nodes. For single-node or non-autoscaled clusters, the default storage class may be sufficient.