Kubernetes Pod Keeps Restarting for no Apparent Reason

Today I was busy deploying changes to an environment when I noticed that a pod had been restarted many times, was in a CrashLoopBackOff state, and was not ready (0/1). I looked at the pod's previous logs to debug this further:

kubectl logs your-app-873240921-b87cg -p | less
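(The -p flag is shorthand for --previous, which prints the logs from the container's previous run.) For reference, the restart count and CrashLoopBackOff status came from listing the pods; the output below is illustrative:

kubectl get pods -n your-namespace

NAME                       READY   STATUS             RESTARTS   AGE
your-app-873240921-b87cg   0/1     CrashLoopBackOff   6          11m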

I could not see any errors in the logs, but I did see that the app had logged that it was shutting down. All the other pods were fine, so this did not seem to be a cluster-wide problem. To dig further, I described the pod:

kubectl describe pod your-app-873240921-b87cg

This produced output similar to the following (most of it has been omitted for brevity; the important part is the Events section at the bottom):

Name:           your-app-873240921-b87cg
Namespace:      your-namespace
...
Conditions:
  Type           Status
  Initialized    True
  Ready          False
  PodScheduled   True
...
Events:
  Type     Reason                 Age               From                      Message
  ----     ------                 ----              ----                      -------
  Normal   Scheduled              11m               default-scheduler         Successfully assigned your-app-873240921-b87cg to your-company2
  Normal   SuccessfulMountVolume  11m               kubelet, your-company2  MountVolume.SetUp succeeded for volume "default-token-nkpqp"
  Normal   Pulling                11m               kubelet, your-company2  pulling image "your.company/your.company/your-app:b512288"
  Normal   Pulled                 10m               kubelet, your-company2  Successfully pulled image "your.company/your.company/your-app:b512288"
  Normal   Created                8m (x2 over 10m)  kubelet, your-company2  Created container
  Normal   Started                8m (x2 over 10m)  kubelet, your-company2  Started container
  Warning  Unhealthy              7m (x5 over 9m)   kubelet, your-company2  Liveness probe failed: Get http://1.2.3.4:8000/management/health: dial tcp 1.2.3.4:8000: getsockopt: connection refused
  Normal   Killing                7m (x2 over 8m)   kubelet, your-company2  Killing container with id docker://your-app-app:Container failed liveness probe.. Container will be killed and recreated.
  Normal   Pulled                 7m (x2 over 8m)   kubelet, your-company2  Container image "your.company/your.company/your-app:b512288" already present on machine
  Warning  Unhealthy              5m (x14 over 9m)  kubelet, your-company2  Readiness probe failed: Get http://1.2.3.4:8000/management/health: dial tcp 1.2.3.4:8000: getsockopt: connection refused
  Warning  FailedSync             1m (x7 over 2m)   kubelet, your-company2  Error syncing pod

Looking at the Events section, we can clearly see that the liveness probe failed, which caused the container to be killed so Kubernetes could try to recover it:

Liveness probe failed: Get http://1.2.3.4:8000/management/health: dial tcp 1.2.3.4:8000: getsockopt: connection refused
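For context, this probe lives on the container spec of the Deployment. A minimal sketch of what it typically looks like is shown below; the path and port match the events above, but the timing values are illustrative rather than our actual settings:

livenessProbe:
  httpGet:
    path: /management/health
    port: 8000
  initialDelaySeconds: 30
  periodSeconds: 10
  failureThreshold: 3

The kubelet starts probing after initialDelaySeconds and, once failureThreshold consecutive probes fail, kills the container and recreates it, which is exactly what the events show.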

There is also clearly something wrong with the app itself, as the readiness probe fails too. That is what keeps the pod reported as not ready (0/1) and out of service, while the failing liveness probe is what causes the continual restarts:

Readiness probe failed: Get http://1.2.3.4:8000/management/health: dial tcp 1.2.3.4:8000: getsockopt: connection refused
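Since both probes hit the same health endpoint, one quick sanity check (assuming the container stays up long enough) is to port-forward to the pod and call the endpoint by hand:

kubectl port-forward -n your-namespace your-app-873240921-b87cg 8000:8000
curl http://localhost:8000/management/health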

This led us in the right direction and got us looking at what was causing the app's probes to fail. In our case, increasing the pod's CPU/memory resources and relaxing the liveness probe timing a bit did the trick: the app appeared to be taking longer to start up because it did not have enough CPU.
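The change amounted to something along these lines on the container spec: give the app more CPU and give the liveness probe more time before it starts failing the container. The values below are illustrative; the right numbers depend on how long your app takes to start:

resources:
  requests:
    cpu: 500m                # more CPU so the app starts faster
    memory: 512Mi
livenessProbe:
  httpGet:
    path: /management/health
    port: 8000
  initialDelaySeconds: 60    # give the app more time to start before the first probe
  periodSeconds: 10
  failureThreshold: 3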