The Complete Server Monitoring Guide: Know When Things Break

Quick Answer: For quick uptime monitoring, deploy Uptime Kuma: docker run -d -p 3001:3001 -v uptime-data:/app/data louislam/uptime-kuma. For full metrics, deploy Prometheus + Grafana + node-exporter. For logs, use journalctl -f or set up Loki.

Your server is running. But how do you know it's healthy? How do you find out when something breaks — before your users do? Monitoring answers these questions.

This guide covers everything from simple uptime checks to full observability stacks.

What to Monitor

Category	What to Watch	Why
Uptime	Is the service responding?	Know immediately when something goes down
CPU	Usage percentage	High CPU = performance issues or crypto miners
Memory	Used vs available RAM	Memory leaks crash services
Disk	Space remaining	Full disk = everything breaks
Network	Bandwidth, latency, errors	Detect DDoS, bandwidth limits
Services	Nginx, Docker, databases running?	Restart failed services automatically
Logs	Errors, warnings, unusual patterns	Find root cause of problems
SSL	Certificate expiry	Don't let HTTPS break
Response time	How fast are your endpoints?	Catch slowdowns before users notice

Part 1: Quick Health Checks (No Tools Needed)

You can monitor the basics with commands you already have:

One-Liner Health Check

#!/bin/bash
echo "=== Server Health ==="
echo "Uptime: $(uptime -p)"
echo "Load: $(awk '{print $1, $2, $3}' /proc/loadavg)"
echo "CPU (user): $(top -bn1 | grep "Cpu(s)" | awk '{print $2}')%"
echo "RAM: $(free -m | awk 'NR==2{printf "%dMB/%dMB (%.1f%%)", $3, $2, $3/$2*100}')"
echo "Disk: $(df -h / | awk 'NR==2{print $3 "/" $2 " (" $5 ")"}')"
echo "Docker: $(docker ps -q 2>/dev/null | wc -l) containers running"
echo "Failed services: $(systemctl --failed --no-legend | wc -l)"
echo "Nginx: $(systemctl is-active nginx)"
echo "SSH: $(systemctl is-active ssh)"

Watch Resources Live

# CPU + memory + processes
htop

# Disk I/O
iostat -x 1

# Network bandwidth
iftop
# Or: vnstat -l

# Docker resource usage
docker stats

Cron-Based Alerts

Simple monitoring without extra tools: a cron job runs the checks, logs the results, and can send alerts by email, Telegram, or webhook:

#!/bin/bash
# /opt/monitor.sh — run every 5 minutes via cron

# Check disk
DISK_USAGE=$(df / | awk 'NR==2{print $5}' | tr -d '%')
if [ "$DISK_USAGE" -gt 90 ]; then
    echo "ALERT: Disk usage at ${DISK_USAGE}%"
    # Send alert (Telegram, email, webhook)
fi

# Check memory
MEM_USAGE=$(free | awk 'NR==2{printf "%.0f", $3/$2*100}')
if [ "$MEM_USAGE" -gt 90 ]; then
    echo "ALERT: Memory usage at ${MEM_USAGE}%"
fi

# Check if nginx is running
if ! systemctl is-active --quiet nginx; then
    echo "ALERT: Nginx is down! Attempting restart..."
    systemctl restart nginx
fi

# Check if a URL responds
HTTP_CODE=$(curl -so /dev/null -w '%{http_code}' --max-time 10 https://yourdomain.com)
if [ "$HTTP_CODE" != "200" ]; then
    echo "ALERT: Website returned $HTTP_CODE"
fi

# Make the script executable, then schedule it every 5 minutes
chmod +x /opt/monitor.sh
crontab -e
# Add: */5 * * * * /opt/monitor.sh >> /var/log/monitor.log 2>&1
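
The "# Send alert" placeholder in the script above could be filled with a small helper like the one below. This is a minimal sketch, not part of the original script: it assumes a Telegram bot already exists and that `TELEGRAM_BOT_TOKEN` and `TELEGRAM_CHAT_ID` are set in the environment. The function name `send_alert` and the `CURL_BIN` override are illustrative.

```shell
#!/bin/bash
# Hypothetical helper for the "# Send alert" placeholders in monitor.sh.
# TELEGRAM_BOT_TOKEN and TELEGRAM_CHAT_ID are assumed environment variables.
send_alert() {
    local message="$1"
    # CURL_BIN is an illustrative override so the call can be dry-run with echo.
    local curl_bin="${CURL_BIN:-curl}"
    "$curl_bin" -fsS -o /dev/null --max-time 10 \
        --data-urlencode "chat_id=${TELEGRAM_CHAT_ID}" \
        --data-urlencode "text=${message}" \
        "https://api.telegram.org/bot${TELEGRAM_BOT_TOKEN}/sendMessage"
}
```

In monitor.sh, a line such as `send_alert "ALERT: Disk usage at ${DISK_USAGE}%"` would then sit next to each `echo`. Running with `CURL_BIN=echo` prints the request arguments instead of sending them.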

Part 2: Uptime Monitoring (Uptime Kuma)

Uptime Kuma is a popular open-source, self-hosted uptime monitor. It has a clean web UI, is easy to set up, and supports HTTP, TCP, ping, DNS, Docker, and more.

Deploy with Docker

# compose.yml
services:
  uptime-kuma:
    image: louislam/uptime-kuma
    restart: unless-stopped
    ports:
      - "3001:3001"
    volumes:
      - uptime-data:/app/data

volumes:
  uptime-data:

docker compose up -d
# Visit http://YOUR_SERVER:3001

What to Monitor

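Add these monitors. Uptime Kuma runs in a container, so `localhost` in a target refers to the container itself; for services on the host, use an address the container can reach, such as the host's IP or a service name on a shared Docker network: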

Type	Target	Interval
HTTP	`https://yourdomain.com`	60s
HTTP	`https://yourdomain.com/api/health`	30s
TCP	`localhost:5432` (PostgreSQL)	60s
TCP	`localhost:6379` (Redis)	60s
Ping	Your other servers	60s
Docker	Container names	60s
SSL	Your domains (checks expiry)	86400s (daily)
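
To see roughly what an HTTP monitor checks on each interval, the probe can be approximated by hand with curl. The sketch below is illustrative only: `classify_response` and `probe_url` are hypothetical names, and the 2-second "slow" threshold is an assumed default, not an Uptime Kuma setting.

```shell
#!/bin/bash
# Classify one probe result: HTTP status code and total time in seconds.
# The 2-second slow threshold is an assumed default.
classify_response() {
    local code="$1" seconds="$2" slow_after="${3:-2}"
    if [ "$code" != "200" ]; then
        echo "down"
    elif awk -v t="$seconds" -v max="$slow_after" 'BEGIN { exit !(t > max) }'; then
        echo "slow"
    else
        echo "ok"
    fi
}

# Probe a URL with curl and classify it (performs a network request).
probe_url() {
    local result
    result=$(curl -so /dev/null -w '%{http_code} %{time_total}' --max-time 10 "$1")
    # Intentional word splitting: $result holds "<code> <seconds>".
    classify_response $result
}
```

For example, `probe_url https://yourdomain.com/api/health` prints `ok`, `slow`, or `down`.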

Alerting

Uptime Kuma supports 90+ notification channels:

Telegram — most popular
Discord — webhook
Slack — webhook
Email — SMTP
Webhook — any custom URL
PagerDuty, Opsgenie — enterprise

Set up a Telegram bot notification so you get an instant alert on your phone when anything goes down.

Part 3: Metrics Stack (Prometheus + Grafana)

Use this stack when you need detailed server metrics (CPU, RAM, disk, network) over time, with graphs and dashboards.

Architecture

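node-exporter (exposes metrics) → Prometheus (scrapes and stores metrics) → Grafana (queries and visualizes)

Prometheus pulls metrics: it scrapes node-exporter's /metrics endpoint on a schedule, and Grafana queries Prometheus for the stored data.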

Deploy with Docker Compose

# compose.yml
services:
  prometheus:
    image: prom/prometheus
    restart: unless-stopped
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
      - prom_data:/prometheus
    ports:
      - "9090:9090"

  grafana:
    image: grafana/grafana
    restart: unless-stopped
    environment:
      GF_SECURITY_ADMIN_PASSWORD: admin  # placeholder; set a strong password before exposing port 3000
    ports:
      - "3000:3000"
    volumes:
      - grafana_data:/var/lib/grafana
    depends_on:
      - prometheus

  node-exporter:
    image: prom/node-exporter
    restart: unless-stopped
    ports:
      - "9100:9100"  # optional: Prometheus reaches node-exporter over the Compose network
    volumes:
      - /proc:/host/proc:ro
      - /sys:/host/sys:ro
      - /:/rootfs:ro
    command:
      - '--path.procfs=/host/proc'
      - '--path.sysfs=/host/sys'
      - '--path.rootfs=/rootfs'

volumes:
  prom_data:
  grafana_data:

Prometheus Config

Create prometheus.yml:

global:
  scrape_interval: 15s

scrape_configs:
  - job_name: 'node'
    static_configs:
      - targets: ['node-exporter:9100']

  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']

Start and Configure

docker compose up -d

# Prometheus: http://YOUR_SERVER:9090
# Grafana: http://YOUR_SERVER:3000 (admin/admin)
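
Before opening Grafana, it can help to confirm that node-exporter is actually serving metrics. The sketch below reads the Prometheus text format from stdin and prints one value. The function name `metric_value` is illustrative, and it only handles metrics without labels, such as `node_load1`.

```shell
#!/bin/bash
# Print the value of an unlabeled metric from Prometheus text-format input on stdin.
# Comment lines (# HELP, # TYPE) never match because their first field is "#".
metric_value() {
    awk -v name="$1" '$1 == name { print $2; exit }'
}
```

On the server, `curl -s http://localhost:9100/metrics | metric_value node_load1` should print the current 1-minute load average, assuming port 9100 is published as in the compose file above.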

Grafana Setup

Log in to Grafana (admin/admin) and change the password when prompted
Add data source → Prometheus → URL: http://prometheus:9090
Import dashboard → ID: 1860 (Node Exporter Full)
You now have CPU, RAM, disk, network, and more in beautiful graphs

What You Get

CPU usage over time with per-core breakdown
Memory used/cached/free trends
Disk I/O read/write speeds
Network bandwidth in/out
System load averages
Disk space trends (predict when you'll run out)
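
The idea behind predicting when a disk will fill up is plain linear extrapolation from past growth. The sketch below shows the arithmetic under simple assumptions: two usage samples in GB, a fixed interval in days greater than zero, and steady growth. The function name `days_until_full` is illustrative; in Prometheus itself, the `predict_linear` function serves a similar purpose.

```shell
#!/bin/bash
# Linear extrapolation: given two disk-usage samples (in GB) taken
# interval_days apart, estimate the days until usage reaches total_gb.
# Usage: days_until_full <used_then> <used_now> <total_gb> <interval_days>
days_until_full() {
    awk -v u1="$1" -v u2="$2" -v total="$3" -v days="$4" 'BEGIN {
        rate = (u2 - u1) / days          # growth in GB per day
        if (rate <= 0) { print "never"; exit }
        printf "%.1f\n", (total - u2) / rate
    }'
}
```

For example, a disk that grew from 40 GB to 47 GB over 7 days on an 80 GB volume is growing 1 GB per day, so `days_until_full 40 47 80 7` prints `33.0`.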

Part 4: Log Monitoring

journalctl (Built-in)

# All logs
journalctl -f                       # Follow live

# Specific service
journalctl -u nginx -f
journalctl -u docker -f

# Errors only
journalctl -p err --since "1 hour ago"

# Failed services
systemctl --failed

Simple Log Monitoring Script

#!/bin/bash
# Watch for errors in key logs (-F keeps following across log rotation)
tail -F /var/log/nginx/error.log \
        /var/log/auth.log \
        /var/log/fail2ban.log | \
  grep --line-buffered -iE "error|fail|denied|ban" | \
  while IFS= read -r line; do
    echo "[ALERT] $line"
    # Send to Telegram/Discord here
  done
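
A raw stream of matching lines gets noisy quickly, so summarizing can be more useful than alerting on every line. As one example, the sketch below counts failed SSH password attempts per source IP. It assumes the common sshd message format ("Failed password for ... from IP port ..."), and the function name `failed_ssh_by_ip` is illustrative.

```shell
#!/bin/bash
# Count "Failed password" lines per source IP from sshd log lines on stdin.
# Prints "count ip" pairs, highest count first.
failed_ssh_by_ip() {
    awk '/Failed password/ {
        # Scan from the end so a username of "from" cannot confuse the match.
        for (i = NF - 1; i >= 1; i--) {
            if ($i == "from") { count[$(i + 1)]++; break }
        }
    }
    END { for (ip in count) print count[ip], ip }' | sort -rn
}
```

On Debian or Ubuntu, `failed_ssh_by_ip < /var/log/auth.log | head` would list the most active sources.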

Loki + Grafana (Advanced Log Aggregation)

For searchable logs aggregated across multiple servers (Loki indexes labels such as job and filename, not the full log text):

# Add to your monitoring compose.yml
services:
  loki:
    image: grafana/loki
    restart: unless-stopped
    ports:
      - "3100:3100"
    volumes:
      - loki_data:/loki

  promtail:
    image: grafana/promtail
    restart: unless-stopped
    volumes:
      - /var/log:/var/log:ro
      - ./promtail.yml:/etc/promtail/config.yml
    depends_on:
      - loki

volumes:
  loki_data:

In Grafana: Add data source → Loki → URL: http://loki:3100

Now you can search all your logs from Grafana's Explore panel.
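
The compose file above mounts ./promtail.yml, so that file has to exist before the stack starts. The sketch below writes a minimal one. It follows the commonly used Promtail layout, but the label `job: varlogs`, the positions path, and the helper name `write_promtail_config` are assumptions to adjust for your setup.

```shell
#!/bin/bash
# Write a minimal Promtail config to the given path (default: ./promtail.yml).
write_promtail_config() {
    local path="${1:-./promtail.yml}"
    cat > "$path" <<'EOF'
server:
  http_listen_port: 9080
  grpc_listen_port: 0

positions:
  filename: /tmp/positions.yaml   # read offsets; lost when the container is recreated

clients:
  - url: http://loki:3100/loki/api/v1/push   # "loki" is the Compose service name

scrape_configs:
  - job_name: system
    static_configs:
      - targets: [localhost]
        labels:
          job: varlogs              # assumed label; query with {job="varlogs"}
          __path__: /var/log/*.log  # matches the /var/log mount in compose.yml
EOF
}
```

Run `write_promtail_config` in the same directory as compose.yml, then start the stack with `docker compose up -d`.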

Full reference: Linux Log Files Explained

Part 5: Docker Monitoring

Check Container Health

# Running containers
docker ps

# Resource usage per container
docker stats

# Container logs
docker logs -f --tail 50 container-name

# Inspect health check status (only works for containers that define a HEALTHCHECK)
docker inspect --format='{{.State.Health.Status}}' container-name
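
Containers that keep restarting are easy to miss in `docker ps`. The sketch below filters a list of "name restart_count" pairs and prints the ones above a threshold. The function name `flapping_containers` and the default threshold of 3 are assumptions chosen for illustration.

```shell
#!/bin/bash
# Read "name restart_count" lines on stdin; print containers whose
# restart count exceeds the threshold (default 3, an assumed value).
flapping_containers() {
    local threshold="${1:-3}"
    awk -v max="$threshold" '$2 + 0 > max { print $1 " restarted " $2 " times" }'
}
```

The input can come from Docker itself, for example `docker ps -q | xargs -r docker inspect --format '{{.Name}} {{.RestartCount}}' | flapping_containers 5`. Note that `.Name` includes a leading slash.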

Monitor Docker with Prometheus

Add cAdvisor to your compose stack:

services:
  cadvisor:
    image: gcr.io/cadvisor/cadvisor
    restart: unless-stopped
    ports:
      - "8080:8080"
    volumes:
      - /:/rootfs:ro
      - /var/run:/var/run:ro
      - /sys:/sys:ro
      - /var/lib/docker/:/var/lib/docker:ro

Add this job under scrape_configs in prometheus.yml, then restart Prometheus:

  - job_name: 'cadvisor'
    static_configs:
      - targets: ['cadvisor:8080']

Import Grafana dashboard ID 14282 for Docker container metrics.

Part 6: SSL Certificate Monitoring

Don't let your HTTPS certificates expire.

With Uptime Kuma

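Add an HTTP(s) monitor for each https:// URL; the monitor also reports the certificate's expiry date. Enable the certificate expiry notification on the monitor and check that 14 days is among the expiry thresholds in the notification settings.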

Manual Check

# Check certificate expiry (-servername sends SNI so the right certificate is returned)
echo | openssl s_client -servername yourdomain.com -connect yourdomain.com:443 2>/dev/null | openssl x509 -noout -enddate

# Script to check and alert (date -d requires GNU date)
EXPIRY=$(echo | openssl s_client -servername yourdomain.com -connect yourdomain.com:443 2>/dev/null | openssl x509 -noout -enddate | cut -d= -f2)
DAYS_LEFT=$(( ($(date -d "$EXPIRY" +%s) - $(date +%s)) / 86400 ))
if [ "$DAYS_LEFT" -lt 14 ]; then
    echo "ALERT: SSL expires in $DAYS_LEFT days!"
fi
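
With more than one domain, the same check is easier to reuse as functions. The sketch below splits it into a network call and a pure date calculation. It assumes GNU date, and the names `days_left` and `cert_enddate` are illustrative.

```shell
#!/bin/bash
# Days between now and an openssl "notAfter" date string (GNU date required).
# The second argument (epoch seconds) is optional and defaults to the current time.
days_left() {
    local enddate="$1"
    local now_epoch="${2:-$(date +%s)}"
    echo $(( ($(date -u -d "$enddate" +%s) - now_epoch) / 86400 ))
}

# Fetch the notAfter date for a domain (performs a network request).
cert_enddate() {
    echo | openssl s_client -servername "$1" -connect "$1:443" 2>/dev/null \
        | openssl x509 -noout -enddate | cut -d= -f2
}
```

A loop such as `for d in yourdomain.com api.yourdomain.com; do echo "$d: $(days_left "$(cert_enddate "$d")") days"; done` then covers several domains, where the second hostname is only a placeholder.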

Online Check

Use our SSL Certificate Checker to check any domain's certificate status.

Part 7: Alerting Best Practices

What to Alert On (Wake You Up)

Service down (HTTP check returns non-200)
Disk > 90% full
SSL certificate expires within 7 days
Server unreachable (ping fails)
Multiple failed SSH attempts (potential attack)

What to Warn About (Check Tomorrow)

CPU > 80% for 15+ minutes
Memory > 85%
Disk > 75%
Response time > 2 seconds
Docker container restarting frequently

What to Just Log

Individual request errors (404s, 500s)
Normal SSH logins
Cron job completions
Package updates available
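
The three tiers above translate directly into code. As a minimal sketch, the disk thresholds from the lists (over 90% alerts, over 75% warns) could be encoded like this. The function name `disk_severity` and the tier labels are illustrative.

```shell
#!/bin/bash
# Map a disk-usage percentage to a severity tier using the thresholds
# listed above: over 90 is an alert, over 75 is a next-day warning.
disk_severity() {
    local pct="$1"
    if [ "$pct" -gt 90 ]; then
        echo "alert"
    elif [ "$pct" -gt 75 ]; then
        echo "warn"
    else
        echo "ok"
    fi
}
```

A cron script can then route on the result, sending a notification for `alert` and only writing a log line for `warn`.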

Alert Fatigue

The #1 monitoring mistake: too many alerts. If you get 50 alerts a day, you stop reading them. Keep alerts to critical issues only.
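
One practical way to cut alert volume in a cron-based setup is a cooldown: a check that runs every 5 minutes should not send the same alert every 5 minutes. The sketch below keeps a timestamp file per alert key. The function name `should_alert`, the `STATE_DIR` variable, and the one-hour default are assumptions for illustration.

```shell
#!/bin/bash
# Return 0 (send) if no alert with this key was sent within the cooldown window,
# and record the send time; return 1 (suppress) otherwise.
# STATE_DIR is an assumed location for per-alert timestamp files.
should_alert() {
    local key="$1"
    local cooldown="${2:-3600}"
    local state_dir="${STATE_DIR:-/var/tmp/monitor-state}"
    local stamp="$state_dir/$key"
    local now last
    mkdir -p "$state_dir"
    now=$(date +%s)
    last=$(cat "$stamp" 2>/dev/null || echo 0)
    last="${last:-0}"                     # treat an empty file as "never sent"
    if [ $(( now - last )) -ge "$cooldown" ]; then
        echo "$now" > "$stamp"
        return 0
    fi
    return 1
}
```

In a check script this reads as `if should_alert disk 3600; then echo "ALERT: Disk usage high"; fi`, so a full disk produces one alert per hour instead of twelve.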

Part 8: Monitoring Checklist

Minimum Setup (Any Server)

# 1. Install Uptime Kuma
docker run -d -p 3001:3001 -v uptime-data:/app/data --restart unless-stopped louislam/uptime-kuma

# 2. Set up monitors for your services
# 3. Configure Telegram alerts
# 4. Add cron health check script

# Done. Most of the monitoring value for about 10 minutes of work.

Full Stack (Production)

Uptime Kuma — external uptime checks + SSL monitoring
Prometheus + node-exporter — server metrics
Grafana — dashboards and visualization
cAdvisor — Docker container metrics
Loki + Promtail — log aggregation
Fail2ban — automatic intrusion response

Resource Requirements

Stack	RAM Needed	Best For
Uptime Kuma only	128MB	Small setups, 1-5 servers
Prometheus + Grafana	512MB-1GB	Medium setups
Full stack (all above)	2GB+	Production, multiple servers

Related Tools

SSL Certificate Checker — check cert expiry
Port Scanner — verify ports are open
Speed Test — test connection speed
Uptime Calculator — SLA downtime calculator
Global Latency Test — ping worldwide
