Your A100 runs at 3AM. You're asleep.
gpu-monitor is watching.
Monitors your GPU fleet around the clock and alerts you to crashes, OOM errors, overheating, and hardware failures through 20 notification channels. One Python file. Zero dependencies.
pip install gpuwatch
gpu-monitor
Yuxuan Zhang · reacher-z
MIT License · Open source
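To make the monitoring idea concrete, here is a minimal sketch of how a poller could read GPU state from nvidia-smi and flag overheating or near-full memory. It is not gpu-monitor's actual implementation: the function names and the thresholds (85 C, 95% memory) are assumptions chosen for illustration. The `--query-gpu` and `--format=csv,noheader,nounits` flags are standard nvidia-smi options.

```python
import csv
import io
import subprocess

# Properties requested from nvidia-smi via --query-gpu.
QUERY_FIELDS = ["index", "name", "temperature.gpu", "memory.used", "memory.total"]


def read_nvidia_smi():
    """Return raw CSV text from nvidia-smi (needs an NVIDIA driver installed)."""
    cmd = [
        "nvidia-smi",
        "--query-gpu=" + ",".join(QUERY_FIELDS),
        "--format=csv,noheader,nounits",
    ]
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout


def parse_gpus(csv_text):
    """Parse nvidia-smi CSV rows into one dict per GPU."""
    gpus = []
    for row in csv.reader(io.StringIO(csv_text), skipinitialspace=True):
        if len(row) != len(QUERY_FIELDS):
            continue  # skip blank or malformed lines
        index, name, temp, used, total = (cell.strip() for cell in row)
        gpus.append({
            "index": int(index),
            "name": name,
            "temp_c": int(temp),
            "mem_used_mib": int(used),
            "mem_total_mib": int(total),
        })
    return gpus


def find_alerts(gpus, max_temp_c=85, max_mem_fraction=0.95):
    """Return alert strings for GPUs over the thresholds (hypothetical defaults)."""
    alerts = []
    for gpu in gpus:
        label = f"GPU {gpu['index']} ({gpu['name']})"
        if gpu["temp_c"] >= max_temp_c:
            alerts.append(f"{label}: {gpu['temp_c']} C, overheating")
        if gpu["mem_used_mib"] / gpu["mem_total_mib"] >= max_mem_fraction:
            alerts.append(f"{label}: memory nearly full")
    return alerts
```

On a machine with a GPU, `find_alerts(parse_gpus(read_nvidia_smi()))` would return the current alerts. Keeping parsing separate from the subprocess call means the logic can be tested without hardware.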
Works wherever nvidia-smi is available: no environment setup, no Docker, no daemon.
Also includes a /metrics endpoint, a ready-made Grafana dashboard, and a drop-in docker-compose monitoring stack.
pip install gpuwatch
export SLACK_WEBHOOK_URL=https://hooks.slack.com/services/...
# or Discord
export DISCORD_WEBHOOK_URL=https://discord.com/api/webhooks/...
# or ntfy (zero signup — subscribe on your phone)
export NTFY_URL=https://ntfy.sh/my-gpu-alerts-abc123
Run gpu-monitor --channels to see all 20 options.
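For readers curious what these environment variables feed into, here is a rough sketch of dispatching one alert to whichever of the three channels above is configured. It is an illustration, not gpu-monitor's code: `build_request`, `notify_all`, and the `User-Agent` string are hypothetical names. The payload shapes follow each service's public webhook format: Slack takes JSON with a `text` field, Discord takes JSON with a `content` field, and ntfy takes the message as the plain request body.

```python
import json
import os
import urllib.request

# Environment variable for each channel, as named in the setup steps above.
ENV_VARS = {
    "slack": "SLACK_WEBHOOK_URL",
    "discord": "DISCORD_WEBHOOK_URL",
    "ntfy": "NTFY_URL",
}


def build_request(channel, url, message):
    """Build (but do not send) an HTTP POST for one notification channel."""
    headers = {"User-Agent": "gpu-alert-sketch/0.1"}  # hypothetical identifier
    if channel == "slack":
        body = json.dumps({"text": message}).encode("utf-8")
        headers["Content-Type"] = "application/json"
    elif channel == "discord":
        body = json.dumps({"content": message}).encode("utf-8")
        headers["Content-Type"] = "application/json"
    elif channel == "ntfy":
        body = message.encode("utf-8")  # ntfy uses the raw body as the message
        headers["Title"] = "GPU alert"
    else:
        raise ValueError(f"unknown channel: {channel}")
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def notify_all(message, environ=os.environ, send=urllib.request.urlopen):
    """Send the message to every channel whose URL is set; return their names."""
    sent = []
    for channel, var in ENV_VARS.items():
        url = environ.get(var)
        if url:
            send(build_request(channel, url, message), timeout=10)
            sent.append(channel)
    return sent
```

Calling `notify_all("GPU 0: 91 C, overheating")` posts to each configured channel. The `send` parameter exists so the dispatch logic can be exercised without network access.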
gpu-monitor
Set GITHUB_PAGES_REPO=your-username/gpu-monitor and GITHUB_PAGES_TOKEN=ghp_.... Stats push automatically every 60 seconds.
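How gpu-monitor performs this push is not described here. As one plausible sketch, a stats file could be written to the repository through GitHub's REST "create or update file contents" endpoint. The function name `build_stats_push`, the `stats.json` path, and the commit message are assumptions for illustration only.

```python
import base64
import json
import urllib.request


def build_stats_push(repo, token, stats, path="stats.json", sha=None):
    """Build (but do not send) a PUT request that writes stats to a repo file.

    repo  -- "owner/name", e.g. the value of GITHUB_PAGES_REPO
    token -- a GitHub token, e.g. the value of GITHUB_PAGES_TOKEN
    sha   -- blob SHA of the existing file; GitHub requires it when updating
    """
    encoded = base64.b64encode(json.dumps(stats).encode("utf-8")).decode("ascii")
    payload = {"message": "Update GPU stats", "content": encoded}
    if sha:
        payload["sha"] = sha
    return urllib.request.Request(
        f"https://api.github.com/repos/{repo}/contents/{path}",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/vnd.github+json",
            "Content-Type": "application/json",
        },
        method="PUT",
    )
```

A loop that calls `urllib.request.urlopen` on this request and then sleeps for 60 seconds would match the stated push interval. When updating an existing file, the current blob SHA has to be fetched first and passed as `sha`.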