Linux Systemd Failed Unit Recovery Checklist
A failed systemd unit means a service, mount, timer or socket did not start, exited with an error, or timed out. Do not ignore failed units: they often explain production problems that are otherwise hard to see.
Core principle
A failed unit is a structured error state. systemd records which unit failed, when it failed, which command exited and with what status, and what logs were produced.
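As a rough illustration of that structured state, the sketch below wraps systemctl show to print a handful of failure-related properties. The function name unit_failure_summary is hypothetical, and the property selection is an assumption: the ExecMain* fields are only reported for service units, and systemctl show prints only the properties that exist for the given unit type.

```shell
# Hypothetical helper: print the failure-related properties systemd keeps for a unit.
# Assumption: the chosen properties are a useful starting set, not a complete list.
unit_failure_summary() {
    # Result explains why the unit failed (for example exit-code or timeout);
    # ExecMainStatus is the exit status of the main process of a service.
    systemctl show "$1" \
        --property=ActiveState,SubState,Result,ExecMainCode,ExecMainStatus,InactiveEnterTimestamp
}
```

Calling unit_failure_summary nginx.service would print one Key=Value line per property, which is easier to compare across hosts than the free-form status output.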
Checklist
- List failed units.
- Identify whether the unit is service, mount, socket or timer.
- Check unit status.
- Read journal logs for the unit.
- Check the ExecStart command and its exit status.
- Check config files used by the unit.
- Check dependency failures.
- Fix the confirmed cause.
- Reset failed state after fixing.
- Verify the unit stays active.
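The first two steps of the checklist can be sketched as a small set of shell functions. This is a minimal sketch, not a definitive tool: the function names list_failed_units, unit_type and triage_failed_units are hypothetical, and the classification simply assumes the unit type is the suffix after the last dot in the unit name.

```shell
# List failed unit names, one per line.
# --no-legend drops the header and footer, --plain drops the leading status bullet.
list_failed_units() {
    systemctl list-units --state=failed --no-legend --plain | awk '{print $1}'
}

# Classify a unit by its suffix, e.g. "nginx.service" -> "service".
unit_type() {
    echo "${1##*.}"
}

# Print "type<TAB>unit" for every failed unit.
triage_failed_units() {
    list_failed_units | while read -r unit; do
        printf '%s\t%s\n' "$(unit_type "$unit")" "$unit"
    done
}
```

Running triage_failed_units gives one line per failed unit, so mounts and sockets can be handled before the services that depend on them.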
Reusable lesson
This applies to Nginx, PHP-FPM, MySQL, Docker, custom workers, mount points, timers, backup jobs and monitoring agents.
When to Use This Checklist
Use this checklist when systemctl --failed lists one or more units, or when a Linux service does not start correctly.
Required Tools
SSH access with sudo privileges, systemctl, journalctl, the unit file, the service's config files, and the unit's dependency information.
Before You Start
Do not clear the failed state before reading the logs. The failure state is useful evidence.
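One way to preserve that evidence is to save the status, recent logs and unit definition to files before changing anything. The sketch below assumes a hypothetical helper name (capture_unit_evidence), a hypothetical default output directory, and the same 30 minute log window used elsewhere in this checklist.

```shell
# Hypothetical helper: save status, recent logs and the unit definition to a directory.
capture_unit_evidence() {
    unit="$1"
    outdir="${2:-./unit-evidence}"   # assumed default location
    mkdir -p "$outdir" || return 1
    # systemctl status exits non-zero for failed units; that is expected here.
    systemctl status "$unit" --no-pager > "$outdir/status.txt" 2>&1 || true
    journalctl -u "$unit" --since "30 minutes ago" --no-pager > "$outdir/journal.txt" 2>&1 || true
    systemctl cat "$unit" > "$outdir/unit.txt" 2>&1 || true
    echo "$outdir"
}
```

For example, capture_unit_evidence nginx.service /tmp/nginx-evidence writes three files that remain available after the failed state has been reset.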
Structured Checklist Steps
- List failed units.
- Classify unit type.
- Check status.
- Read logs.
- Check ExecStart.
- Check config.
- Check dependencies.
- Fix confirmed cause.
- Reset failed state.
- Verify active state.
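The dependency step in the list above can be sketched by asking systemd which units the failed unit requires or wants, then testing each one. The function name failed_dependencies is hypothetical, and the sketch assumes a systemd version that supports the --value option of systemctl show. It only inspects direct Requires= and Wants= dependencies, not the full dependency tree.

```shell
# Hypothetical helper: print the direct dependencies of a unit that are themselves failed.
failed_dependencies() {
    # --value prints only the property values (space-separated unit names).
    deps="$(systemctl show "$1" --property=Requires,Wants --value)"
    for dep in $deps; do
        # is-failed exits 0 when the unit is in the failed state.
        if systemctl is-failed --quiet "$dep"; then
            echo "$dep"
        fi
    done
}
```

If failed_dependencies myapp.service prints a mount or socket, fix that unit first; restarting the service alone would only treat the symptom.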
Verification Steps
- Failed unit cause is known.
- Config or dependency issue is corrected.
- Failed state is reset after fix.
- Unit becomes active.
- No repeated failure appears.
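The last two checks can be approximated by testing the unit twice with a pause in between. This is a minimal sketch: the function name verify_unit_stays_active and the 30 second default window are assumptions, and the approach does not fit oneshot services started by timers, which are normally inactive after a successful run.

```shell
# Hypothetical helper: confirm a unit is active now and still active after a delay.
verify_unit_stays_active() {
    unit="$1"
    wait_seconds="${2:-30}"   # assumed observation window
    # is-active exits 0 only when the unit is currently active.
    systemctl is-active --quiet "$unit" || return 1
    sleep "$wait_seconds"
    systemctl is-active --quiet "$unit" || return 1
}
```

A call such as verify_unit_stays_active nginx.service 60 returns non-zero if the unit is not active at either check, which makes it usable in a deployment script.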
Rollback Plan
If a unit fix causes new failures, restore the previous unit or config file, run daemon-reload, restart the unit and inspect the logs again.
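A rollback is only possible if the previous file was kept. The sketch below shows one simple way to do that, assuming hypothetical helper names (backup_file, rollback_file), a ".bak" suffix convention, and that the functions run with enough privileges to write the file and reload systemd.

```shell
# Copy a file to "<file>.bak" before editing it, preserving mode and timestamps.
backup_file() {
    cp -p "$1" "$1.bak"
}

# Restore "<file>.bak" over the file, then make systemd re-read its unit files.
# Assumption: run as root (or wrap the calls with sudo).
rollback_file() {
    cp -p "$1.bak" "$1" || return 1
    systemctl daemon-reload
}
```

Typical use is backup_file on the unit file before the edit and rollback_file on the same path if the change makes things worse, followed by a restart of the unit and another look at its logs.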
Common Mistakes
- Ignoring systemctl --failed.
- Resetting failed state too early.
- Not reading unit-specific logs.
- Forgetting daemon-reload after unit file changes.
- Fixing symptoms but not dependencies.
Related Commands
systemctl --failed
systemctl status unit_name
journalctl -u unit_name --since "30 minutes ago"
systemctl cat unit_name
systemctl list-dependencies unit_name
sudo systemctl daemon-reload
sudo systemctl reset-failed unit_name
sudo systemctl restart unit_name