
Debian Server Troubleshooting Checklist

March 1, 2026

Troubleshooting a Debian server (Debian 12 “bookworm” or Debian 13 “trixie” in 2026) requires a structured, layered approach that starts with observation (what symptom? when did it start? what changed?), moves to global system state (resource saturation, logs), then drills down to specific subsystems (services, network, storage, kernel). The goal is to identify the contention domain — CPU scheduler, memory reclaim, I/O queues, softirq overload, or application-level blocking — rather than guessing fixes.

This checklist is ordered by speed-to-insight and common failure patterns on production Debian servers. Run commands as root or with sudo. Install helpful tools first if missing: apt update && apt install htop atop sysstat netdata ncdu lsof fail2ban. Later checks also use iotop, smartctl (package smartmontools), sensors (package lm-sensors), and perf (package linux-perf).
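
Before starting, it can help to see which of these commands are actually present. The following is a minimal sketch, not part of the original checklist: the helper name missing_tools is hypothetical, and it checks command names (iostat and vmstat rather than the sysstat and procps package names).

```shell
#!/bin/sh
# Print each given command name that is not found in PATH, one per line.
missing_tools() {
    for cmd in "$@"; do
        command -v "$cmd" >/dev/null 2>&1 || printf '%s\n' "$cmd"
    done
}

# Commands used throughout this checklist (command names, not package names).
missing_tools htop atop iostat vmstat sar ncdu lsof iotop smartctl
```

An empty result means every listed command is available; anything printed still needs its package installed.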

Phase 1: Immediate Safety & Global Snapshot (1–3 minutes)

  • Confirm the symptom clearly: SSH lag? Commands hang? Web pages time out? High load but low CPU? Services down? Reboot loop? Note exact error messages, timestamps, and recent changes (updates, config edits, traffic spikes).
  • Check uptime and load: run uptime and compare the load average against the core count (sustained load above 2–3× cores = pressure). High load with low CPU usage means tasks are waiting (I/O, memory reclaim, locks).
  • Resource overview
    • htop or top (press 1 for per-core view; look for tasks in ‘D’ state = uninterruptible sleep, usually waiting on I/O)
    • free -h and vmstat 1 5 (high si/so = swap thrashing; high wa = I/O wait)
    • df -hT and df -i (full filesystem or inode exhaustion)
    • iostat -xmdz 1 5 (%util near 100% plus high await = storage bottleneck)
  • Quick log scan
    • journalctl -b -p err..emerg (errors since boot)
    • journalctl -xe (recent context)
    • dmesg | tail -50 (kernel issues, OOM killer)
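
The load-versus-cores comparison above can be sketched as a small helper. This is an illustration under stated assumptions: the function name load_status is hypothetical, the "pressure" cutoff uses the lower end of the 2–3× range from the checklist, and the intermediate "busy" band (above 1× cores) is an added assumption.

```shell
#!/bin/sh
# Classify a load average against the core count.
# Usage: load_status LOAD CORES   (CORES must be greater than 0)
load_status() {
    awk -v avg="$1" -v cores="$2" 'BEGIN {
        ratio = avg / cores
        if (ratio > 2)      print "pressure"   # checklist threshold: above 2x cores
        else if (ratio > 1) print "busy"       # assumption: more runnable tasks than cores
        else                print "ok"
    }'
}

# Use the 5-minute average (second field) as a rough proxy for "sustained".
if [ -r /proc/loadavg ] && command -v nproc >/dev/null 2>&1; then
    load_status "$(cut -d' ' -f2 /proc/loadavg)" "$(nproc)"
fi
```

A "pressure" result with low CPU usage in top points toward I/O or memory stalls rather than compute saturation.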

Phase 2: Common High-Impact Checks (Most Frequent Culprits)

  • Disk / Filesystem Full
    • ncdu / (interactive navigator; spot large directories fast)
    • journalctl --disk-usage → vacuum if huge: journalctl --vacuum-time=2weeks
    • lsof +L1 | grep deleted (deleted-but-open files still holding space)
    • Fix: apt clean, truncate logs, restart the offending process.
  • Memory Pressure / Reclaim Stalls
    • cat /proc/pressure/memory (non-zero avg10 on the “some” or “full” line = tasks are really stalling on memory)
    • sysctl vm.swappiness (default is 60; on a low-RAM server that swaps too eagerly, try lowering it to 10)
    • Enable zram if RAM < 8–16 GB.
    • High kswapd0 CPU or direct reclaim → application memory leak or misbehaving service.
  • High I/O Wait / Storage Latency
    • iotop (which process reads/writes most)
    • atop (press ‘d’ for disk activity per process)
    • Typical causes: journal flood, verbose logging, database WAL, Docker overlay2 growth.
    • Fix: limit journal size, prune containers, tune DB checkpoints.
  • Network / Connectivity Issues
    • ip addr, ip route (interface up? default gateway?)
    • ping 8.8.8.8 then ping google.com (IP works but the name fails = DNS failure)
    • ss -s (huge TIME_WAIT count or socket backlog)
    • sar -n DEV 1 5 (per-interface throughput) and sar -n EDEV 1 5 (drops, errors)
    • Typical causes: firewall blocks, bad DNS, MTU mismatch, high softirq.
  • Service / systemd Failures
    • systemctl --failed (list failed units)
    • systemctl status <service> (e.g., nginx, postgresql)
    • journalctl -u <service> -b -xe
    • Typical causes: dependency loops, config errors, port conflicts.
  • SSH / Access Problems
    • journalctl -u ssh -b | grep -i fail (brute-force attempts, key issues)
    • Also check: fail2ban bans? Wrong keys? Port changed? Firewall rule?
  • Boot / Early Hang
    • systemd-analyze blame and systemd-analyze critical-chain (slow units)
    • Remove quiet from the GRUB kernel command line → verbose boot messages
    • Initramfs drops to a shell → wrong UUID, LUKS timeout, missing module.
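
The memory-pressure check above reads the kernel's PSI files, whose lines look like "some avg10=0.00 avg60=0.00 avg300=0.00 total=0". As a minimal sketch (the helper name psi_avg10 is hypothetical), the 10-second average can be pulled out for a quick side-by-side view of memory and I/O stalls; it assumes a kernel with PSI enabled, and the loop prints nothing where the files are unavailable.

```shell
#!/bin/sh
# Read PSI-formatted lines on stdin and print the avg10 value
# for the requested line kind ("some" or "full").
psi_avg10() {
    awk -v kind="$1" '$1 == kind { sub(/^avg10=/, "", $2); print $2 }'
}

# Summarize memory and I/O pressure where the PSI files are readable.
for res in memory io; do
    f="/proc/pressure/$res"
    if [ -r "$f" ]; then
        printf '%s some=%s full=%s\n' "$res" \
            "$(psi_avg10 some <"$f")" "$(psi_avg10 full <"$f")"
    fi
done
```

Values are percentages of wall-clock time: "some" means at least one task was stalled, "full" means all non-idle tasks were stalled at once.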

Phase 3: Deeper / Workload-Specific Checks

  • Web / Application Stack
    • Nginx/Apache slow logs, PHP-FPM pool exhaustion, bad bots → fail2ban on access logs
    • Slow queries → enable DB slow log, EXPLAIN ANALYZE
  • Database
    • PostgreSQL/MySQL → connection limits, autovacuum, WAL size, query planner stats
  • Containers / Virtualization
    • Docker/Podman → docker system df, prune unused images/volumes
    • High steal time → VM overcommit
  • Kernel / Hardware
    • perf top (kernel hotspots)
    • Thermal throttling → sensors
    • Failing disk → SMART: smartctl -a /dev/sdX
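
Steal time is also visible without extra tools: the eighth counter on the aggregate cpu line of /proc/stat is steal. The sketch below (the helper name steal_pct is hypothetical) takes two samples of that line and prints steal as a percentage of elapsed CPU time; the one-second sampling interval is an arbitrary choice for illustration.

```shell
#!/bin/sh
# Read two aggregate "cpu" lines from /proc/stat on stdin and print the
# share of CPU time spent in steal between them, as a percentage.
steal_pct() {
    awk '$1 == "cpu" {
        total = 0
        for (i = 2; i <= 9; i++) total += $i   # user .. steal (guest is counted in user)
        if (n++ == 0) { t0 = total; s0 = $9 }  # first sample: remember the baseline
        else {
            dt = total - t0
            if (dt > 0) printf "%.1f\n", 100 * ($9 - s0) / dt
            else        printf "0.0\n"
        }
    }'
}

# Sample the aggregate line twice, one second apart.
if [ -r /proc/stat ]; then
    { grep '^cpu ' /proc/stat; sleep 1; grep '^cpu ' /proc/stat; } | steal_pct
fi
```

A persistently high value suggests the hypervisor is handing CPU time to other guests, which is not something tuning inside the VM can fix.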

Phase 4: Prevention & Monitoring Setup

  • Install Netdata: real-time dashboard with PSI, I/O, softirq visuals
  • Prometheus + node_exporter + Grafana (long-term trends, alerts on >85% disk, PSI stalls >100 ms/s)
  • Schedule weekly: apt update && apt upgrade, a journal vacuum (journalctl --vacuum-time=2weeks), and a forced log rotation (logrotate -f /etc/logrotate.conf)
  • Backup configs (/etc), data, and test restores
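
Until a full monitoring stack is in place, the 85% disk threshold mentioned above can be approximated with a cron-friendly check. This is a rough sketch: the helper name over_threshold is hypothetical, and the simple column parsing assumes mount points without spaces.

```shell
#!/bin/sh
# Read "df -P" style output on stdin and print every mount point whose
# usage percentage (fifth column) is above the given limit.
over_threshold() {
    awk -v limit="$1" 'NR > 1 {
        use = $5
        sub(/%/, "", use)                 # strip the percent sign
        if (use + 0 > limit) print $6, $5
    }'
}

# Block usage and inode usage, both against the 85% threshold.
df -P  | over_threshold 85
df -Pi | over_threshold 85
```

No output means nothing is above the limit, so a cron job running this only produces mail when a filesystem needs attention.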

This checklist covers the most common real-world Debian server incidents: I/O saturation, memory pressure, log floods, bad traffic, and misconfigured services. Always change one thing at a time, measure before and after (e.g., with stress-ng or production load), and document findings.

© 2026 Server.HK | Hosting Limited, Hong Kong | Company Registration No. 77008912