Kubernetes: Distributing Pods of a Deployment across nodes
May 17, 2022 9:14 am
Sometimes you need to ensure that no two pods of a deployment run on the same node. To achieve this, you can use pod anti-affinity and configure it so that a pod is never scheduled onto a node that already runs a pod of the same deployment:
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: game
  name: game
  namespace: arcade
spec:
  progressDeadlineSeconds: 600
  replicas: 2
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: game
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: game
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - game
            topologyKey: kubernetes.io/hostname
      containers:
      - image: quay.io/mdewald/s3e
        name: s3e
With this pod anti-affinity definition, the scheduler never places two pods of the deployment on the same node.
During the roll-out, additional pods are created before old pods are removed. If you have the same number of nodes as replicas, that means the roll-out won’t happen: no node satisfies the criteria for an additional pod, so the new pod stays pending. Ideally, you should have more nodes available than the deployment has replicas.
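To see why the roll-out gets stuck, it can help to work through the rolling update settings of the manifest above. The fragment below only repeats values from that manifest; the comments are a sketch of the arithmetic, assuming the documented rounding behaviour (maxSurge rounds up, maxUnavailable rounds down) and a cluster with exactly two schedulable nodes.

```yaml
# Fragment of the Deployment spec above; the comments spell out the rounding.
spec:
  replicas: 2
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 25%        # 25% of 2 = 0.5, rounded up to 1 extra pod
      maxUnavailable: 25%  # 25% of 2 = 0.5, rounded down to 0 missing pods
```

With these values the controller has to start a third pod before it may remove an old one. On two nodes that third pod has nowhere to go, because each node already runs a pod labelled app: game.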
You can work around this problem by changing from requiredDuringSchedulingIgnoredDuringExecution to preferredDuringSchedulingIgnoredDuringExecution:
podAntiAffinity:
  preferredDuringSchedulingIgnoredDuringExecution:
  - podAffinityTerm:
      labelSelector:
        matchExpressions:
        - key: app
          operator: In
          values:
          - game
      topologyKey: kubernetes.io/hostname
    weight: 100
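However, this allows some of the pods of the deployment to land on the same node during a roll-out. After the roll-out, the pods will typically be distributed one per node again, although a preferred rule does not guarantee it: the scheduler only weighs the preference when placing a pod and never moves pods that are already running.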
If you absolutely never want 2 pods of the same deployment to run on the same node but don’t have more nodes than replicas, one option is to migrate from a Deployment to a StatefulSet, which terminates each pod before creating its replacement:
apiVersion: apps/v1
kind: StatefulSet
metadata:
  labels:
    app: game
  name: game
  namespace: arcade
spec:
  replicas: 2
  selector:
    matchLabels:
      app: game
  serviceName: ""
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: game
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - game
            topologyKey: kubernetes.io/hostname
      containers:
      - image: quay.io/mdewald/s3e
        name: s3e
This ensures that no two pods of the StatefulSet are scheduled to the same node. If you have the same number of nodes as replicas in the StatefulSet, the rollout does the following: one by one, each pod is removed and its replacement is scheduled to the same node before the next pod is removed.
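For comparison, a similar remove-first behaviour can be approximated without leaving the Deployment kind. The fragment below is a minimal sketch, not part of the original setup: it assumes the required anti-affinity rule from the first manifest stays in place and only changes the rolling update strategy so that no surge pod is created.

```yaml
# Sketch: fragment of a Deployment spec, not a complete manifest.
spec:
  replicas: 2
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 0         # never create a pod beyond the desired replica count
      maxUnavailable: 1   # allow one pod to be removed before its replacement is ready
```

The trade-off is reduced capacity: with 2 replicas, only one pod serves traffic while the other is being replaced, so this is only reasonable if the application tolerates that.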