The Complete Server Monitoring Guide: Know When Things Break

Prerequisites

  • A Linux server to monitor
  • Docker installed (for monitoring tools)

Quick Answer: For quick uptime monitoring, deploy Uptime Kuma: docker run -d -p 3001:3001 -v uptime-data:/app/data louislam/uptime-kuma. For full metrics, deploy Prometheus + Grafana + node-exporter. For logs, use journalctl -f or set up Loki.

Your server is running. But how do you know it's healthy? How do you find out when something breaks — before your users do? Monitoring answers these questions.

This guide covers everything from simple uptime checks to full observability stacks.


What to Monitor

Category | What to Watch | Why
Uptime | Is the service responding? | Know immediately when something goes down
CPU | Usage percentage | High CPU = performance issues or crypto miners
Memory | Used vs available RAM | Memory leaks crash services
Disk | Space remaining | Full disk = everything breaks
Network | Bandwidth, latency, errors | Detect DDoS, bandwidth limits
Services | Nginx, Docker, databases running? | Restart failed services automatically
Logs | Errors, warnings, unusual patterns | Find root cause of problems
SSL | Certificate expiry | Don't let HTTPS break
Response time | How fast are your endpoints? | Catch slowdowns before users notice

Part 1: Quick Health Checks (No Tools Needed)

You can monitor the basics with commands you already have:

Quick Health Check Script

#!/bin/bash
echo "=== Server Health ==="
echo "Uptime: $(uptime -p)"
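echo "Load: $(awk '{print $1, $2, $3}' /proc/loadavg)"
# top's second field is user-space CPU only, so derive total usage from the idle value
echo "CPU: $(top -bn1 | grep "Cpu(s)" | sed 's/.*, *\([0-9.]*\)%* id.*/\1/' | awk '{printf "%.1f", 100 - $1}')% used"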
echo "RAM: $(free -m | awk 'NR==2{printf "%dMB/%dMB (%.1f%%)", $3, $2, $3/$2*100}')"
echo "Disk: $(df -h / | awk 'NR==2{print $3 "/" $2 " (" $5 ")"}')"
echo "Docker: $(docker ps -q 2>/dev/null | wc -l) containers running"
echo "Failed services: $(systemctl --failed --no-legend | wc -l)"
echo "Nginx: $(systemctl is-active nginx)"
echo "SSH: $(systemctl is-active ssh)"   # the unit is named sshd on RHEL-family systems

Watch Resources Live

# CPU + memory + processes
htop

# Disk I/O
iostat -x 1

# Network bandwidth
iftop
# Or: vnstat -l

# Docker resource usage
docker stats

Cron-Based Alerts

You can get basic alerting without installing anything: a cron job runs a check script every few minutes, and the script sends an email or Telegram message when a check fails.

#!/bin/bash
# /opt/monitor.sh — run every 5 minutes via cron

# Check disk
DISK_USAGE=$(df / | awk 'NR==2{print $5}' | tr -d '%')
if [ "$DISK_USAGE" -gt 90 ]; then
    echo "ALERT: Disk usage at ${DISK_USAGE}%"
    # Send alert (Telegram, email, webhook)
fi

# Check memory
MEM_USAGE=$(free | awk 'NR==2{printf "%.0f", $3/$2*100}')
if [ "$MEM_USAGE" -gt 90 ]; then
    echo "ALERT: Memory usage at ${MEM_USAGE}%"
fi

# Check if nginx is running
if ! systemctl is-active --quiet nginx; then
    echo "ALERT: Nginx is down! Attempting restart..."
    systemctl restart nginx
fi

# Check if a URL responds
HTTP_CODE=$(curl -so /dev/null -w '%{http_code}' --max-time 10 https://yourdomain.com)
if [ "$HTTP_CODE" != "200" ]; then
    echo "ALERT: Website returned $HTTP_CODE"
fi
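
Make the script executable and schedule it. Use root's crontab so the script is allowed to restart nginx:

chmod +x /opt/monitor.sh
sudo crontab -e
# Add: */5 * * * * /opt/monitor.sh >> /var/log/monitor.log 2>&1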
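
The "Send alert" placeholder in the script can be filled in several ways. As one hedged sketch, the function below posts a message through the Telegram Bot API's sendMessage method using curl. TELEGRAM_BOT_TOKEN and TELEGRAM_CHAT_ID are assumed environment variables, and send_alert and telegram_url are hypothetical helper names that are not part of the original script. Without credentials the function only prints the message, so the cron log still records it.

```shell
#!/bin/bash
# Assumed environment variables; a bot token comes from Telegram's @BotFather.
TELEGRAM_BOT_TOKEN="${TELEGRAM_BOT_TOKEN:-}"
TELEGRAM_CHAT_ID="${TELEGRAM_CHAT_ID:-}"

# Build the Bot API endpoint for the sendMessage method.
telegram_url() {
    printf 'https://api.telegram.org/bot%s/sendMessage' "$1"
}

# Send an alert to Telegram, or print it when no credentials are configured.
send_alert() {
    msg="$1"
    if [ -z "$TELEGRAM_BOT_TOKEN" ] || [ -z "$TELEGRAM_CHAT_ID" ]; then
        echo "$msg"
        return 0
    fi
    curl -s --max-time 10 -o /dev/null \
        --data-urlencode "chat_id=${TELEGRAM_CHAT_ID}" \
        --data-urlencode "text=${msg}" \
        "$(telegram_url "$TELEGRAM_BOT_TOKEN")"
}
```

In /opt/monitor.sh, each echo "ALERT: ..." line could then become send_alert "ALERT: ...".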

Part 2: Uptime Monitoring (Uptime Kuma)

Uptime Kuma is a popular open-source, self-hosted uptime monitor. It has a clean web UI, is easy to set up, and supports HTTP, TCP, ping, DNS, Docker, and more.

Deploy with Docker

# compose.yml
services:
  uptime-kuma:
    image: louislam/uptime-kuma
    restart: unless-stopped
    ports:
      - "3001:3001"
    volumes:
      - uptime-data:/app/data

volumes:
  uptime-data:

Start it:

docker compose up -d
# Visit http://YOUR_SERVER:3001

What to Monitor

Add these monitors:

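Type | Target | Interval
HTTP | https://yourdomain.com | 60s
HTTP | https://yourdomain.com/api/health | 30s
TCP | YOUR_SERVER_IP:5432 (PostgreSQL) | 60s
TCP | YOUR_SERVER_IP:6379 (Redis) | 60s
Ping | Your other servers | 60s
Docker | Container names | 60s
SSL | Your domains (checks expiry) | 86400s (daily)

Uptime Kuma runs inside a container, so localhost in a TCP monitor points at the container itself, not the host. Use the server's IP address or a hostname the container can resolve. Docker monitors also need access to the Docker socket, for example by mounting /var/run/docker.sock into the Uptime Kuma container.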
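
Before adding HTTP monitors, it can help to spot-check the same endpoints from the command line. The sketch below is one way to do that with curl's status-code and timing output; http_probe and probe_verdict are hypothetical helper names, and the 2-second default limit is an assumption borrowed from the response-time warning threshold later in this guide.

```shell
#!/bin/bash
# Print "<status code> <total seconds>" for a URL (needs network access).
http_probe() {
    curl -so /dev/null -w '%{http_code} %{time_total}' --max-time 10 "$1"
}

# Classify a probe result: "down" for non-200, "slow" above the limit, else "ok".
probe_verdict() {
    code="$1"; seconds="$2"; limit="${3:-2}"
    if [ "$code" != "200" ]; then
        echo "down"
    elif awk -v t="$seconds" -v l="$limit" 'BEGIN { exit !(t + 0 > l + 0) }'; then
        echo "slow"
    else
        echo "ok"
    fi
}

# Example (commented out because it needs a reachable URL):
# set -- $(http_probe https://yourdomain.com/api/health)
# probe_verdict "$1" "$2"
```

The float comparison goes through awk because the shell's own test operators only handle integers.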

Alerting

Uptime Kuma supports 90+ notification channels:

  • Telegram — most popular
  • Discord — webhook
  • Slack — webhook
  • Email — SMTP
  • Webhook — any custom URL
  • PagerDuty, Opsgenie — enterprise

Set up a Telegram bot notification so you get an instant alert on your phone when anything goes down.


Part 3: Metrics Stack (Prometheus + Grafana)

Prometheus and Grafana give you detailed server metrics (CPU, RAM, disk, and network) over time, with graphs and dashboards.

Architecture

node-exporter (exposes metrics) → Prometheus (scrapes and stores them) → Grafana (queries and visualizes them)

Deploy with Docker Compose

# compose.yml
services:
  prometheus:
    image: prom/prometheus
    restart: unless-stopped
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
      - prom_data:/prometheus
    ports:
      - "9090:9090"

  grafana:
    image: grafana/grafana
    restart: unless-stopped
    environment:
      # Placeholder only: set a strong password before exposing port 3000
      GF_SECURITY_ADMIN_PASSWORD: admin
    ports:
      - "3000:3000"
    volumes:
      - grafana_data:/var/lib/grafana
    depends_on:
      - prometheus

  node-exporter:
    image: prom/node-exporter
    restart: unless-stopped
    ports:
      - "9100:9100"
    volumes:
      - /proc:/host/proc:ro
      - /sys:/host/sys:ro
      - /:/rootfs:ro
    command:
      - '--path.procfs=/host/proc'
      - '--path.sysfs=/host/sys'
      - '--path.rootfs=/rootfs'

volumes:
  prom_data:
  grafana_data:

Prometheus Config

Create prometheus.yml:

global:
  scrape_interval: 15s

scrape_configs:
  - job_name: 'node'
    static_configs:
      - targets: ['node-exporter:9100']

  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']

Start and Configure

docker compose up -d

# Prometheus: http://YOUR_SERVER:9090
# Grafana: http://YOUR_SERVER:3000 (admin/admin)

Grafana Setup

  1. Log in to Grafana (admin/admin)
  2. Add data source → Prometheus → URL: http://prometheus:9090
  3. Import dashboard → ID: 1860 (Node Exporter Full)
  4. You now have CPU, RAM, disk, network, and more in beautiful graphs

What You Get

  • CPU usage over time with per-core breakdown
  • Memory used/cached/free trends
  • Disk I/O read/write speeds
  • Network bandwidth in/out
  • System load averages
  • Disk space trends (predict when you'll run out)
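
The disk-space prediction in the last bullet can also be queried directly. As a rough sketch, the helper below builds a PromQL expression with predict_linear and sends it to the Prometheus HTTP API. PROM_URL, the function names, the 6-hour lookback window, and the mountpoint="/" label are all assumptions to adjust for your setup.

```shell
#!/bin/bash
# Assumed Prometheus address; matches the port published in the compose file.
PROM_URL="${PROM_URL:-http://localhost:9090}"

# Build a query that matches filesystems predicted to run out of space
# within the given number of hours, based on the last 6 hours of data.
disk_forecast_query() {
    hours="${1:-24}"
    printf 'predict_linear(node_filesystem_avail_bytes{mountpoint="/"}[6h], %s * 3600) < 0' "$hours"
}

# Run an instant query against the Prometheus HTTP API (needs a running Prometheus).
query_prometheus() {
    curl -sG --max-time 10 "$PROM_URL/api/v1/query" --data-urlencode "query=$1"
}

# Example:
# query_prometheus "$(disk_forecast_query 24)"
```

An empty result list in the JSON response means no matching filesystem is on track to fill up within the window.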

Part 4: Log Monitoring

journalctl (Built-in)

# All logs
journalctl -f                       # Follow live

# Specific service
journalctl -u nginx -f
journalctl -u docker -f

# Errors only
journalctl -p err --since "1 hour ago"

# Failed services
systemctl --failed

Simple Log Monitoring Script

#!/bin/bash
# Watch for errors in key logs
# -F keeps following across log rotation; -q drops the "==> file <==" headers
tail -Fq /var/log/nginx/error.log \
         /var/log/auth.log \
         /var/log/fail2ban.log | \
  grep --line-buffered -iE "error|fail|denied|ban" | \
  while IFS= read -r line; do
    echo "[ALERT] $line"
    # Send to Telegram/Discord here
  done
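
Streaming every matching line gets noisy, so a summary is often more useful for things like failed SSH logins. The sketch below counts "Failed password" lines per source IP. It assumes the usual OpenSSH log format, where the address is the fourth field from the end; failed_ssh_by_ip is a hypothetical helper name.

```shell
#!/bin/bash
# Count "Failed password" lines per source IP, busiest first.
# Reads sshd log lines on stdin and prints "<count> <ip>".
failed_ssh_by_ip() {
    awk '/Failed password/ { hits[$(NF-3)]++ }
         END { for (ip in hits) print hits[ip], ip }' | sort -rn
}

# Example (the log path is the Debian/Ubuntu default; adjust for other distros):
# failed_ssh_by_ip < /var/log/auth.log | head
```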

Loki + Grafana (Advanced Log Aggregation)

For searchable, indexed logs across multiple servers:

# Add to your monitoring compose.yml
services:
  loki:
    image: grafana/loki
    restart: unless-stopped
    ports:
      - "3100:3100"
    volumes:
      - loki_data:/loki

  promtail:
    image: grafana/promtail
    restart: unless-stopped
    volumes:
      - /var/log:/var/log:ro
      - ./promtail.yml:/etc/promtail/config.yml
    depends_on:
      - loki

volumes:
  loki_data:
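
The compose file mounts ./promtail.yml, which has to exist before the stack starts. The script below writes a minimal version as a starting point: it tails /var/log/*.log and pushes to the loki service. The job and label names are illustrative assumptions, write_promtail_config is a hypothetical helper name, and the positions file under /tmp is lost when the container is recreated.

```shell
#!/bin/bash
# Write a minimal Promtail config that tails /var/log/*.log and pushes to Loki.
write_promtail_config() {
    cat > "$1" <<'EOF'
server:
  http_listen_port: 9080
  grpc_listen_port: 0

positions:
  filename: /tmp/positions.yaml

clients:
  - url: http://loki:3100/loki/api/v1/push

scrape_configs:
  - job_name: system
    static_configs:
      - targets: [localhost]
        labels:
          job: varlogs
          __path__: /var/log/*.log
EOF
}

# Write the file only when a target path is given as the first argument.
if [ -n "${1:-}" ]; then
    write_promtail_config "$1"
fi
```

Saved under a name of your choice, it can be run as bash make-promtail-config.sh ./promtail.yml before docker compose up -d.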

In Grafana: Add data source → Loki → URL: http://loki:3100

Now you can search all your logs from Grafana's Explore panel.

Full reference: Linux Log Files Explained


Part 5: Docker Monitoring

Check Container Health

# Running containers
docker ps

# Resource usage per container
docker stats

# Container logs (last 50 lines, then follow)
docker logs -f --tail 50 container-name

# Inspect health check status (only works if the container defines a HEALTHCHECK)
docker inspect --format='{{.State.Health.Status}}' container-name
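
Checking containers one at a time does not scale, so here is a small sketch that filters docker ps output for containers that look unhealthy or stuck in a restart loop. It assumes the usual status strings ("Up 2 hours (unhealthy)", "Restarting (1) 5 seconds ago"); problem_containers is a hypothetical helper name.

```shell
#!/bin/bash
# Read "name|status" lines (as printed by the docker ps command below)
# and print the names of containers that are unhealthy or stuck restarting.
problem_containers() {
    awk -F'|' '$2 ~ /\(unhealthy\)/ || $2 ~ /^Restarting/ { print $1 }'
}

# Example (needs Docker):
# docker ps --format '{{.Names}}|{{.Status}}' | problem_containers
```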

Monitor Docker with Prometheus

Add cAdvisor to your compose stack:

services:
  cadvisor:
    image: gcr.io/cadvisor/cadvisor
    restart: unless-stopped
    ports:
      - "8080:8080"
    volumes:
      - /:/rootfs:ro
      - /var/run:/var/run:ro
      - /sys:/sys:ro
      - /var/lib/docker/:/var/lib/docker:ro

Add this job under scrape_configs in prometheus.yml, then restart Prometheus:

  - job_name: 'cadvisor'
    static_configs:
      - targets: ['cadvisor:8080']

Import Grafana dashboard ID 14282 for Docker container metrics.


Part 6: SSL Certificate Monitoring

Don't let your HTTPS certificates expire.

With Uptime Kuma

Any HTTP(s) monitor pointed at an https:// URL can also watch the certificate. Enable certificate expiry notifications for the monitor and set the alert threshold to 14 days before expiry.

Manual Check

# Check certificate expiry
echo | openssl s_client -connect yourdomain.com:443 2>/dev/null | openssl x509 -noout -enddate

# Script to check and alert
EXPIRY=$(echo | openssl s_client -connect yourdomain.com:443 2>/dev/null | openssl x509 -noout -enddate | cut -d= -f2)
DAYS_LEFT=$(( ($(date -d "$EXPIRY" +%s) - $(date +%s)) / 86400 ))
if [ "$DAYS_LEFT" -lt 14 ]; then
    echo "ALERT: SSL expires in $DAYS_LEFT days!"
fi
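
The same check extends to several domains. The sketch below assumes GNU date and takes the domain list as script arguments; cert_expiry and cert_days_left are hypothetical helper names. It also reports a domain whose certificate could not be read, instead of silently computing a wrong number of days.

```shell
#!/bin/bash
# Days from "now" until an expiry string as printed by `openssl x509 -enddate`.
# The optional second argument (epoch seconds) overrides "now", mainly for testing.
cert_days_left() {
    expiry_str="$1"; now="${2:-$(date +%s)}"
    echo $(( ($(date -d "$expiry_str" +%s) - now) / 86400 ))
}

# Fetch the notAfter date for a host (needs network access and openssl).
cert_expiry() {
    echo | openssl s_client -servername "$1" -connect "$1:443" 2>/dev/null \
        | openssl x509 -noout -enddate 2>/dev/null | cut -d= -f2
}

# Check every domain passed on the command line.
for domain in "$@"; do
    expiry=$(cert_expiry "$domain")
    if [ -z "$expiry" ]; then
        echo "ALERT: could not read certificate for $domain"
        continue
    fi
    days=$(cert_days_left "$expiry")
    if [ "$days" -lt 14 ]; then
        echo "ALERT: $domain certificate expires in $days days"
    fi
done
```

Saved as, say, check-certs.sh, it could run daily from cron with your domains as arguments.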

Online Check

Use our SSL Certificate Checker to check any domain's certificate status.


Part 7: Alerting Best Practices

What to Alert On (Wake You Up)

  • Service down (HTTP check returns non-200)
  • Disk > 90% full
  • SSL certificate expires within 7 days
  • Server unreachable (ping fails)
  • Multiple failed SSH attempts (potential attack)

What to Warn About (Check Tomorrow)

  • CPU > 80% for 15+ minutes
  • Memory > 85%
  • Disk > 75%
  • Response time > 2 seconds
  • Docker container restarting frequently

What to Just Log

  • Individual request errors (404s, 500s)
  • Normal SSH logins
  • Cron job completions
  • Package updates available
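
The three tiers above map naturally onto a small helper that a check script can call. This is only a sketch: severity is a hypothetical function name, and the warn and alert thresholds are passed in by the caller rather than hard-coded.

```shell
#!/bin/bash
# Map a numeric reading to a tier: "alert" above crit, "warn" above warn, else "log".
# Works for percentages and for fractional values such as response times.
severity() {
    awk -v v="$1" -v w="$2" -v c="$3" 'BEGIN {
        if (v + 0 > c + 0)      print "alert"
        else if (v + 0 > w + 0) print "warn"
        else                    print "log"
    }'
}

# Example with the disk thresholds from the lists above (75% warn, 90% alert):
# severity "$(df / | awk 'NR==2 {print $5}' | tr -d '%')" 75 90
```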

Alert Fatigue

The #1 monitoring mistake: too many alerts. If you get 50 alerts a day, you stop reading them. Keep alerts to critical issues only.
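
A cron check that runs every 5 minutes will repeat the same alert until the problem is fixed, which is an easy way to create alert fatigue. One hedged approach is to alert only when a check changes state. The sketch below keeps one small state file per check; STATE_DIR and state_changed are assumed names, and a check with no previous state is treated as "ok".

```shell
#!/bin/bash
# Directory for per-check state files (an assumed location; any writable path works).
STATE_DIR="${STATE_DIR:-/tmp/monitor-state}"

# Record the current state ("ok" or "fail") for a check and print "changed"
# when it differs from the previous run, otherwise "same".
state_changed() {
    check="$1"; current="$2"
    mkdir -p "$STATE_DIR"
    previous=$(cat "$STATE_DIR/$check" 2>/dev/null || echo "ok")
    echo "$current" > "$STATE_DIR/$check"
    if [ "$previous" = "$current" ]; then echo "same"; else echo "changed"; fi
}

# Example: only notify on the transition into or out of a failure.
# if [ "$(state_changed disk fail)" = "changed" ]; then echo "ALERT: disk"; fi
```

The same transition check also gives you a "recovered" message for free when the state goes from fail back to ok.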


Part 8: Monitoring Checklist

Minimum Setup (Any Server)

# 1. Install Uptime Kuma
docker run -d -p 3001:3001 -v uptime-data:/app/data --restart unless-stopped louislam/uptime-kuma

# 2. Set up monitors for your services
# 3. Configure Telegram alerts
# 4. Add cron health check script

# Done. Most of the monitoring value for about 10 minutes of work.

Full Stack (Production)

  1. Uptime Kuma — external uptime checks + SSL monitoring
  2. Prometheus + node-exporter — server metrics
  3. Grafana — dashboards and visualization
  4. cAdvisor — Docker container metrics
  5. Loki + Promtail — log aggregation
  6. Fail2ban — automatic intrusion response

Resource Requirements

Stack | RAM Needed | Best For
Uptime Kuma only | 128MB | Small setups, 1-5 servers
Prometheus + Grafana | 512MB-1GB | Medium setups
Full stack (all above) | 2GB+ | Production, multiple servers
