Features
Heartbeat Monitoring
The agent pushes a heartbeat to the hub every 30 seconds (configurable). The hub records the last heartbeat time and calculates an offline_deadline_at.
| Parameter | Default | Description |
|---|---|---|
| `heartbeat_interval_sec` | 30 | How often the agent pushes |
| `grace_multiplier` | 3 | Tolerance multiplier |
| Effective grace period | 90s | 30s × 3 = 90 seconds before marked offline |
Offline Detection
EZMON uses a deadline-based approach, far more accurate than simple polling:
`offline_deadline_at = last_seen_at + (heartbeat_interval_sec × grace_multiplier)`

The Cloudflare Worker Cron checks every minute for agents past their deadline. A single global evaluator handles all agents, rather than one scheduler per agent.
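The deadline formula and the single-pass evaluation can be sketched as follows. This is a minimal illustration, not EZMON's actual code: the `AgentRecord` shape, the function names, and the use of epoch milliseconds are assumptions made for the example.

```typescript
// Hypothetical agent record; field names mirror the documented parameters.
interface AgentRecord {
  id: string;
  lastSeenAt: number;           // epoch milliseconds of the last heartbeat (assumed unit)
  heartbeatIntervalSec: number; // documented default: 30
  graceMultiplier: number;      // documented default: 3
}

// offline_deadline_at = last_seen_at + (heartbeat_interval_sec × grace_multiplier)
function offlineDeadlineAt(agent: AgentRecord): number {
  return agent.lastSeenAt + agent.heartbeatIntervalSec * agent.graceMultiplier * 1000;
}

// One global pass over all agents, as a cron-triggered evaluator might do.
// Returns the ids of agents whose deadline has already passed.
function findOverdueAgents(agents: AgentRecord[], now: number): string[] {
  return agents.filter((a) => now > offlineDeadlineAt(a)).map((a) => a.id);
}
```

With the defaults, an agent last seen at time `t` has a deadline of `t + 90s`, so an evaluator run 91 seconds after the last heartbeat would report it as overdue.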
Host Metrics
| Metric | Library | Data Collected |
|---|---|---|
| CPU | cpu.Percent() | Total usage % |
| Memory | mem.VirtualMemory() | Used, total, % |
| Disk | disk.Usage("/") | Used, total, % (root partition) |
| Load | load.Avg() | 1m, 5m, 15m averages |
| Network | net.IOCounters() | Bytes sent/received |
| Docker | docker ps -q | Running container count |
Metrics are stored as:
- Latest snapshot (`agent_state`): for real-time display
- 5-minute buckets (`metric_buckets`): for historical charts, retained 7 days
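The bucketing and retention rules above can be illustrated with a short sketch. The `Sample` and `Bucket` shapes, the choice to average values within a bucket, and the function names are all assumptions; the source only states the 5-minute bucket size and the 7-day retention.

```typescript
const BUCKET_MS = 5 * 60 * 1000;              // 5-minute buckets
const RETENTION_MS = 7 * 24 * 60 * 60 * 1000; // 7-day retention

// Hypothetical shapes for a raw sample and an aggregated bucket.
interface Sample { timestampMs: number; cpuPercent: number; }
interface Bucket { startMs: number; count: number; avgCpuPercent: number; }

// Start of the 5-minute bucket that contains the given timestamp.
function bucketStart(timestampMs: number): number {
  return Math.floor(timestampMs / BUCKET_MS) * BUCKET_MS;
}

// Group samples by bucket and average them (averaging is an assumption).
function rollUp(samples: Sample[]): Bucket[] {
  const sums: Record<number, { total: number; count: number }> = {};
  for (const s of samples) {
    const key = bucketStart(s.timestampMs);
    if (!sums[key]) sums[key] = { total: 0, count: 0 };
    sums[key].total += s.cpuPercent;
    sums[key].count += 1;
  }
  return Object.keys(sums)
    .map(Number)
    .sort((a, b) => a - b)
    .map((startMs) => ({
      startMs,
      count: sums[startMs].count,
      avgCpuPercent: sums[startMs].total / sums[startMs].count,
    }));
}

// Keep only buckets that start within the retention window.
function prune(buckets: Bucket[], nowMs: number): Bucket[] {
  return buckets.filter((b) => b.startMs >= nowMs - RETENTION_MS);
}
```

Flooring the timestamp to a multiple of the bucket size means every sample maps to exactly one bucket without any per-bucket bookkeeping.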
Incidents
Every time an agent transitions from online to offline, EZMON automatically creates an incident. It resolves when the agent recovers.
Only one open incident per agent is allowed at a time, so repeated evaluator runs while an agent stays offline do not create duplicate incidents.
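The one-open-incident rule amounts to an idempotent state update. The sketch below uses an in-memory list as a stand-in for whatever storage EZMON actually uses; the `IncidentLog` class, its method names, and the `Incident` fields are hypothetical.

```typescript
type AgentStatus = "online" | "offline";

// Hypothetical incident record; resolvedAt stays null while the incident is open.
interface Incident {
  agentId: string;
  openedAt: number;
  resolvedAt: number | null;
}

// In-memory stand-in for the incident store (an assumption for illustration).
class IncidentLog {
  readonly incidents: Incident[] = [];

  private openFor(agentId: string): Incident | undefined {
    for (const incident of this.incidents) {
      if (incident.agentId === agentId && incident.resolvedAt === null) return incident;
    }
    return undefined;
  }

  // Safe to call on every evaluator run with the agent's current status.
  observe(agentId: string, status: AgentStatus, now: number): void {
    const open = this.openFor(agentId);
    if (status === "offline" && !open) {
      // Transition to offline: open a new incident.
      this.incidents.push({ agentId, openedAt: now, resolvedAt: null });
    } else if (status === "online" && open) {
      // Recovery: resolve the open incident.
      open.resolvedAt = now;
    }
    // Otherwise nothing changes: still offline with an open incident, or still online.
  }
}
```

Because `observe` checks for an existing open incident before creating one, calling it once per minute during a long outage still yields a single incident.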
Cloud Monitors
Monitor external endpoints without running an agent:
| Type | How It Works | Use Case |
|---|---|---|
| HTTP | HEAD request, verify status code | Check if a website/API is up |
| TLS | HEAD + crt.sh TLS expiry lookup | Monitor SSL certificate expiry |
| Keyword | GET request, search body for keyword | Verify page content |
- Maximum 20 monitors per project
- Checks run in parallel on Cloudflare Workers
- Results retained for 30 days
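A rough sketch of how HTTP and keyword checks might run in parallel is shown below. It is an illustration under assumptions: the `Monitor` shape, the `expectedStatus` field (defaulting to 200), the injected `Fetcher` type, and enforcing the 20-monitor limit at check time are all hypothetical. The TLS check is left out because the source does not describe the crt.sh lookup in enough detail to reproduce.

```typescript
// Hypothetical monitor definition covering two of the documented types.
interface Monitor {
  id: string;
  type: "http" | "keyword";
  url: string;
  expectedStatus?: number; // assumption: defaults to 200 when omitted
  keyword?: string;        // used only by keyword monitors
}

interface CheckResult { id: string; up: boolean; detail: string; }

// Minimal subset of the fetch API, so a stub can be injected for testing.
type Fetcher = (
  url: string,
  init: { method: string }
) => Promise<{ status: number; text(): Promise<string> }>;

const MAX_MONITORS_PER_PROJECT = 20;

async function checkOne(m: Monitor, fetcher: Fetcher): Promise<CheckResult> {
  try {
    if (m.type === "http") {
      // HTTP monitor: HEAD request, then compare the status code.
      const res = await fetcher(m.url, { method: "HEAD" });
      const expected = m.expectedStatus ?? 200;
      return { id: m.id, up: res.status === expected, detail: `status ${res.status}` };
    }
    // Keyword monitor: GET request, then search the body.
    const res = await fetcher(m.url, { method: "GET" });
    const body = await res.text();
    const found = m.keyword !== undefined && body.includes(m.keyword);
    return { id: m.id, up: found, detail: found ? "keyword found" : "keyword missing" };
  } catch (err) {
    // A network failure counts as down instead of aborting the whole batch.
    return { id: m.id, up: false, detail: `request failed: ${String(err)}` };
  }
}

// Run every check concurrently and collect the results in input order.
async function runChecks(monitors: Monitor[], fetcher: Fetcher): Promise<CheckResult[]> {
  if (monitors.length > MAX_MONITORS_PER_PROJECT) {
    throw new Error(`at most ${MAX_MONITORS_PER_PROJECT} monitors per project`);
  }
  return Promise.all(monitors.map((m) => checkOne(m, fetcher)));
}
```

Catching errors inside `checkOne` keeps one unreachable endpoint from rejecting the whole `Promise.all`. In a Worker, the global `fetch` could be passed as the fetcher.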
Notification Channels
Notifications are sent only on state transitions (online → offline or offline → online):
| Channel | Configuration | Notes |
|---|---|---|
| Telegram | Bot token + Chat ID | Via Telegram Bot API |
| Discord | Webhook URL | Embed messages |
| Webhook | Custom URL | POST JSON payload |
Each channel has a `targetType` of `agent`, `monitor`, or `all`, which restricts the alerts it receives by source.
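The transition rule and the `targetType` filter combine into a small selection step. The sketch below is illustrative only: the `Channel` shape and the function names are assumptions, and the actual delivery to Telegram, Discord, or a webhook is not shown.

```typescript
type State = "online" | "offline";
type Source = "agent" | "monitor";
type TargetType = Source | "all";

// Hypothetical channel configuration; only targetType comes from the docs.
interface Channel {
  name: string;
  kind: "telegram" | "discord" | "webhook";
  targetType: TargetType;
}

// A notification is due only when the state actually changes.
function isTransition(previous: State, current: State): boolean {
  return previous !== current;
}

// Select the channels that should receive an alert for this event.
function channelsToNotify(
  channels: Channel[],
  source: Source,
  previous: State,
  current: State
): Channel[] {
  if (!isTransition(previous, current)) return [];
  return channels.filter((c) => c.targetType === "all" || c.targetType === source);
}
```

For example, a channel with `targetType: "monitor"` would be skipped when an agent goes offline, while a channel set to `all` would receive alerts from both sources.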
Public Status Page
Each project can enable a public status page accessible without login:
- Real-time status of selected agents
- Active incidents
- Uptime summary
URL format: https://your-hub.vercel.app/status/YOUR_PROJECT_SLUG