Investigation of Kubernetes petset and PVC

  • Update 2016-08-01: Re-tested petset redeployment; a previously claimed PVC could be re-claimed by the re-deployed petset in the default namespace.

Abstract

petset, introduced in Kubernetes 1.3, provides a strong capability to scale a persistent service out and in through the PersistentVolumeClaim abstraction and namespace management. This makes petset well suited to persistent service deployments, e.g. database deployments.

In this article, I address the extended usage of petset with both shared and isolated storage. In addition, I expose a couple of limitations, and solutions, for database deployment using petset (alpha, Kubernetes v1.3.2):

    1. The namespace of PersistentVolumeClaim is not well supported in v1.3.2. When a user deploys a petset to a namespace other than default, the PersistentVolumeClaim fails to bind, which fails the overall deployment.
    2. Zombie PV resources

Deploy Petset with shared storage

Petset deployment leverages volumeClaimTemplates to claim storage dynamically during scale out/in. Here I attach shared storage to each pod of the petset, for the specific case where a database deployment requires shared metadata across the cluster.
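For reference, a minimal petset spec with a volumeClaimTemplate might look like the following. This is a sketch against the v1.3 alpha API; the image and the storage-class annotation value are illustrative, while the names match the pods and PVCs used throughout this article.

```yaml
# Sketch of a petset spec with a volumeClaimTemplate (apps/v1alpha1, k8s v1.3).
apiVersion: apps/v1alpha1
kind: PetSet
metadata:
  name: mysql
spec:
  serviceName: mysql
  replicas: 2
  template:
    metadata:
      labels:
        app: mysql
    spec:
      containers:
      - name: mysql
        image: mysql:5.6          # illustrative image
        volumeMounts:
        - name: datadir
          mountPath: /var/lib/mysql
  volumeClaimTemplates:
  - metadata:
      name: datadir               # yields PVCs datadir-mysql-0, datadir-mysql-1, ...
      annotations:
        volume.alpha.kubernetes.io/storage-class: anything
    spec:
      accessModes: ["ReadWriteMany"]
      resources:
        requests:
          storage: 1Gi
```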

Advantage: Scale out / scale in

Patch a petset to scale a mysql deployment out with kubectl patch petset mysql -p '{"spec": {"replicas": <number>}}'; a newly created mysql pod becomes available in about 35s.

kubectl get pod
NAME      READY     STATUS    RESTARTS   AGE
mysql-0   1/1       Running   0          1h
mysql-1   1/1       Running   0          1h
mysql-2   1/1       Running   0          19m
mysql-3   1/1       Running   0          18m
root@# kubectl patch petset mysql -p '{"spec": {"replicas": 2}}'
"mysql" patched
root@# kubectl get pod
NAME      READY     STATUS    RESTARTS   AGE
mysql-0   1/1       Running   0          1h
mysql-1   1/1       Running   0          1h
root@# kubectl patch petset mysql -p '{"spec": {"replicas": 4}}'
"mysql" patched
kubectl get pod
NAME      READY     STATUS    RESTARTS   AGE
mysql-0   1/1       Running   0          2h
mysql-1   1/1       Running   0          2h
mysql-2   1/1       Running   0          38s
mysql-3   0/1       Running   0          7s

Advantage: Database Cluster Failover with petset

  • Recover a database cluster using existing data volumes after the petset is deleted from k8s. Once a petset is re-created (delete/create), all PVCs are kept, and the previously claimed PVCs are re-claimed by the re-created petset.

  • So petset gives much flexibility: users can recover a petset with the existing claimed volumes. At the same time, they lose full control over the recovery operation; with a pod/rc, a database could be recovered explicitly from an existing data PVC, PV, or volume.

  • Petset claim re-use follows the controller's naming policy:

// GetClaimsForPet returns the pvcs for the given pet.
func (v *VolumeIdentityMapper) GetClaimsForPet(pet *api.Pod) []api.PersistentVolumeClaim {
	// Strip out the "-(index)" from the pet name and use it to generate
	// claim names.
	id := strings.Split(pet.Name, "-")
	petID := id[len(id)-1]
	pvcs := []api.PersistentVolumeClaim{}
	for _, pvc := range v.GetClaims(petID) {
		pvcs = append(pvcs, pvc)
	}
	return pvcs
}
// GetClaims returns the volume claims associated with the given id.
// The claims belong to the petset. The id should be unique within a petset.
func (v *VolumeIdentityMapper) GetClaims(id string) map[string]api.PersistentVolumeClaim {
	petClaims := map[string]api.PersistentVolumeClaim{}
	for _, pvc := range v.ps.Spec.VolumeClaimTemplates {
		claim := pvc
		// TODO: Name length checking in validation.
		claim.Name = fmt.Sprintf("%v-%v-%v", claim.Name, v.ps.Name, id)
		claim.Namespace = v.ps.Namespace
		claim.Labels = v.ps.Spec.Selector.MatchLabels
		// TODO: We're assuming that the claim template has a volume QoS key, eg:
		// volume.alpha.kubernetes.io/storage-class: anything
		petClaims[pvc.Name] = claim
	}
	return petClaims
}
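The two functions above imply the PVC naming scheme <claimTemplateName>-<petsetName>-<ordinal>, where the ordinal is the suffix stripped from the pod name. A quick shell sketch of the same derivation (the variable names here are illustrative):

```shell
# Mirror the controller's naming logic: GetClaimsForPet strips the trailing
# "-<ordinal>" from the pod name, and GetClaims builds
# <claimTemplateName>-<petsetName>-<ordinal>.
pet_pod="mysql-2"           # pod name: <petsetName>-<ordinal>
template="datadir"          # volumeClaimTemplates[].metadata.name
petset="${pet_pod%-*}"      # strip the ordinal -> "mysql"
ordinal="${pet_pod##*-}"    # keep only the ordinal -> "2"
echo "${template}-${petset}-${ordinal}"
```

This is why, after a delete/create of the petset, the pods find their old claims again: the names are fully deterministic.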

Limitation 1: namespace of PersistentVolumeClaim

The namespace of PersistentVolumeClaim seems not well supported in v1.3.2.

The scheduler reports the claims as not bound even though the PVCs show as Bound. This looks like a bug in today's petset implementation.

 6m            6m              1       {default-scheduler }                                    Warning         FailedScheduling        [PersistentVolumeClaim is not bound: "datadir-mysql-0", PersistentVolumeClaim is not bound: "datadir-mysql-0", PersistentVolumeClaim is not bound: "datadir-mysql-0", PersistentVolumeClaim is not bound: "datadir-mysql-0"]

 kubectl get pvc --namespace=petset-sharedfs
NAME                STATUS    VOLUME    CAPACITY   ACCESSMODES   AGE
datadir-mysql-0     Bound     spv21     0                        33m
datadir-mysql-1     Bound     spv22     0                        33m
datadir-mysql-2     Bound     spv23     0                        33m
shared-head-claim   Bound     spv20     0                        2h

Solution:

  1. Follow up on the upstream issue fix and watch for source changes.

Limitation 2: Zombie PV resource

Once a user deletes the PVCs of a petset deployment, the PVs move to 'Released' status. These PVs become zombie resources, since they are never allocated again by the k8s controller.

NAME    CAPACITY   ACCESSMODES   STATUS     CLAIM                               AGE
spv1    1Gi        RWX           Released   default/datadir-mysql-1             18h
spv12   1Gi        RWX           Released   petset-sharedfs/shared-head-claim   1h
spv13   1Gi        RWX           Bound      default/datadir-mysql-0             4m
spv14   1Gi        RWX           Released   petset-sharedfs/shared-head-claim   4m
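Released PVs can be spotted mechanically by filtering on the STATUS column. Against a live cluster this would be `kubectl get pv --no-headers | awk '$4 == "Released" {print $1}'`; the sketch below replays the sample output from this article through the same filter:

```shell
# Print the names of PVs whose STATUS (4th column) is Released; the
# here-document stands in for live `kubectl get pv --no-headers` output.
awk '$4 == "Released" {print $1}' <<'EOF'
spv1  1Gi RWX Released default/datadir-mysql-1 18h
spv12 1Gi RWX Released petset-sharedfs/shared-head-claim 1h
spv13 1Gi RWX Bound default/datadir-mysql-0 4m
spv14 1Gi RWX Released petset-sharedfs/shared-head-claim 4m
EOF
```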
  • Pending claim when the PV is in Released status
root@# kubectl get pvc --namespace=test
NAME                STATUS    VOLUME    CAPACITY   ACCESSMODES   AGE
shared-head-claim   Pending                                      45s
root# kubectl describe pvc --namespace=test
Name: shared-head-claim
Namespace: test
Status: Pending
Volume:
Labels: <none>
Capacity:
Access Modes:
No events.
  • The controller fails to re-bind a PV that is in Released status

E0728 02:18:02.307295 10626 factory.go:517] Error scheduling default mysql-0: [PersistentVolumeClaim is not bound: "datadir-mysql-0", PersistentVolumeClaim is not bound: "datadir-mysql-0", PersistentVolumeClaim is not bound: "datadir-mysql-0", PersistentVolumeClaim is not bound: "datadir-mysql-0"]; retrying

Solution to recycle/recover zombie PV resource

Set the PV status from 'Released' to 'Available' through etcdctl. Then the petset pods can be recovered using the existing data.

  • Retrieve the metadata of a PV in Released status
root@# etcdctl get /registry/persistentvolumes/spv14
{"kind":"PersistentVolume","apiVersion":"v1","metadata":{"name":"spv14","selfLink":"/api/v1/persistentvolumes/spv14","uid":"aca33faa-5493-11e6-a022-0cc47a662568","creationTimestamp":"2016-07-28T07:19:51Z","annotations":{"pv.kubernetes.io/bound-by-controller":"yes"}},"spec":{"capacity":{"storage":"1Gi"},"nfs":{"server":"169.55.11.79","path":"/gpfs/fs01/shared/prod/spv14"},"accessModes":["ReadWriteMany"],"claimRef":{"kind":"PersistentVolumeClaim","namespace":"petset-sharedfs","name":"shared-head-claim","uid":"0ca05fed-5494-11e6-a022-0cc47a662568","apiVersion":"v1","resourceVersion":"46240"},"persistentVolumeReclaimPolicy":"Retain"},"status":{"phase":"Released"}}
  • Update the PV metadata by removing the claimRef

"claimRef":{"kind":"PersistentVolumeClaim","namespace":"petset-sharedfs","name":"shared-head-claim","uid":"0ca05fed-5494-11e6-a022-0cc47a662568","apiVersion":"v1","resourceVersion":"46240"}"

root@# etcdctl set /registry/persistentvolumes/spv14 '{"kind":"PersistentVolume","apiVersion":"v1","metadata":{"name":"spv14","selfLink":"/api/v1/persistentvolumes/spv14","uid":"aca33faa-5493-11e6-a022-0cc47a662568","creationTimestamp":"2016-07-28T07:19:51Z","annotations":{"pv.kubernetes.io/bound-by-controller":"yes"}},"spec":{"capacity":{"storage":"1Gi"},"nfs":{"server":"169.55.11.79","path":"/gpfs/fs01/shared/prod/spv14"},"accessModes":["ReadWriteMany"],"persistentVolumeReclaimPolicy":"Retain"},"status":{"phase":"Available"}}'
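The edited JSON that `etcdctl set` writes back can be produced mechanically rather than by hand. A hedged sed sketch: it works only because claimRef holds scalar fields with no nested braces, so `{[^}]*}` matches the whole object, and the JSON below is an abbreviated copy of the spv14 record above.

```shell
# Strip the claimRef object and flip the phase on a saved PV record; the
# variable stands in for `etcdctl get /registry/persistentvolumes/spv14`.
pv='{"kind":"PersistentVolume","metadata":{"name":"spv14"},"spec":{"accessModes":["ReadWriteMany"],"claimRef":{"kind":"PersistentVolumeClaim","namespace":"petset-sharedfs","name":"shared-head-claim"},"persistentVolumeReclaimPolicy":"Retain"},"status":{"phase":"Released"}}'
echo "$pv" | sed -e 's/,"claimRef":{[^}]*}//' -e 's/"Released"/"Available"/'
```

The result is the value to feed back to `etcdctl set`; a proper JSON tool would be safer on records with nested claimRef fields.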
  • Verify that the PV becomes Available.

root@:/nfs/petset# etcdctl get /registry/persistentvolumes/spv14

kubectl get pv

spv14     1Gi        RWX           Available                                                 1h