What Is A Health Check In DevOps?
A health check is a small test that reports whether a service is alive and ready to handle traffic. Load balancers, container runtimes, Kubernetes, monitoring tools, and deployment pipelines all rely on it.
In DevOps, health checks are important because deployment success is not the same as service health. A process can be running but still unable to serve users correctly.
The deeper idea is that systems need automatic signals to decide whether to send traffic, restart, alert, or roll back.
DevOps Production Playbook
Use this section to understand where the concept fits in a real software delivery system: pipeline stage, production risk, detection signals, rollback, security, and big-company standard.
Teams need a simple and reliable way to know whether a service is alive, ready, and safe to receive traffic.
A health check is a contract between the service and the platform. The service reports whether it can safely serve requests.
After deployment, the platform calls the health endpoint. If the endpoint fails, traffic is not sent to the new version and the deployment is stopped.
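The gate described above can be sketched in Python. This is a minimal illustration, not how any particular platform implements it: the function names gate_deployment and http_check, the retry count, and the delays are all assumptions made for the example.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_check(url: str, timeout_seconds: float = 2.0) -> bool:
    """Treat a 2xx response as healthy; HTTP errors, refusals, and timeouts as unhealthy."""
    try:
        with urllib.request.urlopen(url, timeout=timeout_seconds) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, OSError):
        # HTTPError (4xx/5xx) is a subclass of URLError, so it lands here too.
        return False


def gate_deployment(
    check: Callable[[], bool],
    attempts: int = 5,           # assumed retry budget
    delay_seconds: float = 2.0,  # assumed pause between attempts
) -> bool:
    """Return True if the new version passes a health check within the allowed attempts."""
    for attempt in range(1, attempts + 1):
        try:
            if check():
                return True
        except Exception:
            # A check that raises counts as a failed attempt, not a crash of the gate.
            pass
        if attempt < attempts:
            time.sleep(delay_seconds)
    return False
```

A pipeline step could call gate_deployment(lambda: http_check("https://example.com/health")) and stop the rollout when the result is False. Retrying a few times gives a slow-starting service a chance to become ready before the deployment is declared failed.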
Health checks take several forms: an HTTP health endpoint, readiness and liveness checks, load balancer checks, the Docker HEALTHCHECK instruction, Kubernetes probes, and uptime monitors.
curl -f https://example.com/health
docker inspect container_id
kubectl describe pod app
kubectl get pods
systemctl status app
Health model: a running process is not enough. Readiness means the service can serve traffic right now. Liveness means the process is still functioning and does not need to be restarted.
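The difference between the two checks can be shown with a small Python sketch. The ServiceState fields are hypothetical stand-ins for whatever a real service would measure; the point is only that readiness asks a stricter question than liveness.

```python
from dataclasses import dataclass


@dataclass
class ServiceState:
    """Hypothetical in-process state that both checks read."""
    event_loop_responsive: bool = True  # False if the process is deadlocked or wedged
    startup_complete: bool = False      # True once config and caches are loaded
    database_reachable: bool = False    # True while the required dependency answers


def is_live(state: ServiceState) -> bool:
    # Liveness: only asks whether the process itself still works.
    # Failing this tells the platform that a restart is the right fix.
    return state.event_loop_responsive


def is_ready(state: ServiceState) -> bool:
    # Readiness: asks whether requests would succeed right now.
    # Failing this should take the instance out of traffic, not restart it.
    return is_live(state) and state.startup_complete and state.database_reachable
```

Keeping dependency checks out of is_live is a deliberate choice in this sketch: restarting a healthy process does not repair an unreachable database, so a dependency outage should affect readiness only.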
Common failure modes: the health check always returns OK, the check is too shallow, the check depends on a slow external service, there is no startup delay before readiness is checked, or the timeout is wrong.
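One way to keep a slow external service from stalling the health check is to give each dependency probe a strict time budget. The sketch below uses Python's standard thread pool; check_with_timeout and the probe argument are names invented for this example, and the timeout value is something each team would need to tune.

```python
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout
from typing import Callable


def check_with_timeout(probe: Callable[[], bool], timeout_seconds: float) -> bool:
    """Run a dependency probe, treating a slow or failing probe as unhealthy."""
    executor = ThreadPoolExecutor(max_workers=1)
    future = executor.submit(probe)
    try:
        return bool(future.result(timeout=timeout_seconds))
    except FutureTimeout:
        # The dependency did not answer within the budget.
        return False
    except Exception:
        # The probe raised, for example because the connection was refused.
        return False
    finally:
        # Do not block the health response while a stuck probe finishes.
        executor.shutdown(wait=False)
```

The health endpoint then answers within roughly timeout_seconds even when the dependency hangs, so the platform's own probe timeout is less likely to fire first and produce a misleading result.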
Health endpoint fails, pod restarts, load balancer marks target unhealthy, deployment rollout pauses, uptime monitor alerts.
Good health checks limit the damage of a failed deployment and shorten recovery time by detecting unhealthy versions before many users are affected.
Stop traffic to unhealthy version, roll back deployment, inspect health endpoint logs, fix dependency or config issue, retest readiness.
Do not expose sensitive internal data in health output. Protect detailed diagnostics. Keep public health checks minimal.
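A simple way to follow this rule is to build two views of the same result: a minimal public body and a more detailed one for trusted callers. The Python sketch below assumes a hypothetical render_health function and leaves the question of how a caller is authenticated to the surrounding service.

```python
from typing import Dict


def render_health(checks: Dict[str, bool], include_details: bool) -> dict:
    """Build the response body for a health request."""
    status = "ok" if all(checks.values()) else "unhealthy"
    body = {"status": status}
    if include_details:
        # Intended for trusted callers only. Even here the output is limited to
        # pass/fail per check: no connection strings, hostnames, or stack traces.
        body["checks"] = {
            name: ("ok" if passed else "fail") for name, passed in checks.items()
        }
    return body
```

For example, render_health({"db": True, "cache": False}, include_details=False) returns {"status": "unhealthy"}, while the same call with include_details=True also lists which check failed.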
A big company expects liveness checks, readiness checks, deployment gates, load balancer integration, and alerting connected to health status.
Create a simple /health endpoint that returns OK. Then break a required config variable and make the health check fail correctly.
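A possible starting point for this exercise, using only the Python standard library, is sketched below. The variable name DATABASE_URL, the port, and the function names are assumptions chosen for illustration; any required setting would work the same way.

```python
import os
from http.server import BaseHTTPRequestHandler, HTTPServer

REQUIRED_VAR = "DATABASE_URL"  # hypothetical name of the required config variable


def health_status() -> tuple:
    """Return (HTTP status code, body) depending on whether required config is set."""
    if os.environ.get(REQUIRED_VAR):
        return 200, "OK"
    # 503 tells the platform the service exists but cannot serve requests.
    return 503, "missing required configuration"


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/health":
            self.send_error(404)
            return
        code, body = health_status()
        payload = body.encode("utf-8")
        self.send_response(code)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

    def log_message(self, format, *args):
        pass  # keep the example quiet


def serve(port: int = 8080) -> None:
    """Serve the health endpoint on localhost until interrupted."""
    HTTPServer(("127.0.0.1", port), HealthHandler).serve_forever()
```

Calling serve() with DATABASE_URL set and then running curl -f http://127.0.0.1:8080/health should succeed. Restarting the server without the variable should make the same curl command exit with a non-zero status, which is the "fail correctly" behavior the exercise asks for.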
Why is a running process not enough to prove a service is healthy? What is the difference between readiness and liveness?
Faking health checks so they always pass, ignoring dependency failures, returning sensitive information, relying only on manual browser checks.
Machines need trustworthy signals. This principle applies to web services, workers, databases, pipelines, AI agents, and automation systems.