NetMon: The Ultimate Network Monitoring Toolkit

NetMon Essentials: Setup, Dashboards, and Troubleshooting

Overview

NetMon is a lightweight network monitoring solution designed for real-time visibility, alerting, and traffic analysis across routers, switches, servers, and cloud endpoints. This guide covers quick setup, dashboard configuration, and common troubleshooting steps.

Quick Setup (assumes Linux server)

  1. Prerequisites
    • Linux server (Ubuntu 20.04+ or CentOS 8+)
    • 2+ CPU cores, 4+ GB RAM, 50+ GB disk
    • Open ports: TCP 80 (HTTP), TCP 443 (HTTPS), UDP 162 (SNMP traps), TCP 8086 (optional metrics)
  2. Install
    • Add NetMon repo and install package:

      Code

      sudo apt update
      sudo apt install netmon
    • Start and enable service:

      Code

      sudo systemctl enable --now netmon
  3. Initial web setup
    • Visit https:// and follow the web installer to create an admin user, set the time zone, and enable HTTPS (use the self-signed certificate or provide your own).
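
Before installing, it can help to confirm that the host meets the prerequisites listed in step 1. The following is a minimal sketch, not part of NetMon: the function names are made up for illustration, the thresholds are copied from the list above, and it assumes the disk requirement refers to total capacity of the root filesystem.

```python
import os
import shutil

# Minimums taken from the prerequisites list above.
MIN_CORES = 2
MIN_RAM_GB = 4
MIN_DISK_GB = 50


def meets_requirements(cores, ram_gb, disk_gb):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if cores < MIN_CORES:
        problems.append(f"CPU cores: {cores} < {MIN_CORES}")
    if ram_gb < MIN_RAM_GB:
        problems.append(f"RAM: {ram_gb:.1f} GB < {MIN_RAM_GB} GB")
    if disk_gb < MIN_DISK_GB:
        problems.append(f"Disk: {disk_gb:.1f} GB < {MIN_DISK_GB} GB")
    return problems


def probe_host(path="/"):
    """Read core count, total RAM, and total disk size for `path` (Linux)."""
    cores = os.cpu_count() or 0
    # SC_PAGE_SIZE and SC_PHYS_PAGES are available on Linux.
    ram_bytes = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
    disk_bytes = shutil.disk_usage(path).total
    return cores, ram_bytes / 1024**3, disk_bytes / 1024**3


if __name__ == "__main__":
    for problem in meets_requirements(*probe_host()):
        print("WARNING:", problem)
```

Note that a machine sold as "4 GB" usually reports slightly less usable memory, so a warning on the RAM line may be a rounding effect rather than a real shortfall.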

Device Discovery & Data Collection

  • SNMP: Enable SNMP v2c or v3 on switches and routers, then add the credentials under Settings → Integrations → SNMP.
  • Agent: Install NetMon agent on servers for metrics:

    Code

  • NetFlow/sFlow: Point exporters to NetMon collector (port ⁄6343) and enable in Integrations.
  • Cloud: Connect AWS/GCP via read-only IAM role or service account to ingest VPC flow logs and instance metrics.

Dashboards (recommended layout)

  • Overview (default): cluster health, active alerts, top talkers, total throughput — use for on-call.
  • Traffic by Protocol: stacked line chart of HTTP/HTTPS/DNS/Other.
  • Device Health: table with CPU, memory, interface errors, up/down status; color-coded thresholds.
  • Top Flows: top 10 source/destination pairs, volumes, ports.
  • Latency & Packet Loss: heatmap by site and time.

Tips:

  • Use templating variables (site, device group, interface) for reusable dashboards.
  • Combine metrics and logs panels for contextual troubleshooting.
  • Set panel refresh to 10s for real-time; 1m for broader views.
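
To make the Top Flows panel concrete, here is a rough sketch of the aggregation behind a "top 10 source/destination pairs" view. It does not reflect how NetMon computes the panel internally; the record fields (`src`, `dst`, `bytes`) and the sample data are hypothetical.

```python
from collections import Counter


def top_flows(records, n=10):
    """Sum bytes per (source, destination) pair and return the n largest."""
    totals = Counter()
    for rec in records:
        totals[(rec["src"], rec["dst"])] += rec["bytes"]
    return totals.most_common(n)


# Hypothetical flow records; the field names are illustrative only.
sample = [
    {"src": "10.0.0.5", "dst": "10.0.1.9", "bytes": 1200},
    {"src": "10.0.0.7", "dst": "10.0.1.9", "bytes": 300},
    {"src": "10.0.0.5", "dst": "10.0.1.9", "bytes": 800},
]

for (src, dst), total in top_flows(sample):
    print(f"{src} -> {dst}: {total} bytes")
```

Grouping by port as well would only require adding the port to the tuple used as the counter key.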

Alerting Best Practices

  • Alert types: availability (device down), performance (CPU/memory), traffic anomalies (sudden spike), threshold breaches (packet errors).
  • Severity tiers: Critical (page on-call), High (notify Slack/email), Medium (ticket), Low (log only).
  • Suppression & deduplication: enable 5-minute suppression for flapping interfaces; group alerts by device ID.
  • Alert testing: use synthetic probes and the Test Alert feature before enabling production alerts.
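
The suppression and deduplication rule above can be sketched as a small time-window filter keyed by device ID. This is an illustration of the idea only, not NetMon's implementation; the class name and method are assumptions made for the example.

```python
import time

SUPPRESSION_SECONDS = 5 * 60  # the 5-minute window suggested above


class AlertSuppressor:
    """Drop repeat alerts for the same device inside a fixed time window."""

    def __init__(self, window=SUPPRESSION_SECONDS):
        self.window = window
        self._last_sent = {}  # device_id -> timestamp of last forwarded alert

    def should_send(self, device_id, now=None):
        """Return True if an alert for device_id should be forwarded."""
        now = time.time() if now is None else now
        last = self._last_sent.get(device_id)
        if last is not None and now - last < self.window:
            return False  # still inside the suppression window
        self._last_sent[device_id] = now
        return True
```

With this design a flapping interface produces at most one forwarded alert per device every five minutes, while alerts from other devices pass through unaffected. The window restarts only when an alert is actually forwarded, so a continuously flapping device still resurfaces periodically.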

Common Issues & Fixes

  1. No data from device
    • Verify network reachability (ping, traceroute).
    • Check the SNMP community string or credentials, and confirm the SNMP version configured in NetMon matches the device.
    • Confirm firewall allows SNMP/NetFlow and NetMon ports.
  2. Incorrect metrics (wrong units or missing tags)
    • Ensure agents and collectors use the same metric naming and unit conventions.
    • Check exporter versions and update them to match NetMon’s schema.
  3. Dashboards slow to load
    • Reduce panel query range; enable downsampling/aggregation.
    • Increase server resources or shard metrics storage.
  4. Excessive false alerts
    • Raise thresholds, enable flapping suppression, or add anomaly-detection baselines.
  5. SSL certificate warnings
    • Install a trusted certificate (for example, from Let’s Encrypt) or add the CA chain to the NetMon trust store.
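
For the reachability and firewall checks in issue 1, a quick TCP probe can supplement ping and traceroute. The sketch below is a generic helper, not a NetMon tool, and its function names are invented for the example. It only covers TCP services such as the web ports 80 and 443; SNMP, NetFlow, and sFlow normally run over UDP, which a TCP connect cannot verify.

```python
import socket


def tcp_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def check_ports(host, ports=(80, 443)):
    """Map each TCP port to its reachability from this machine."""
    return {port: tcp_port_open(host, port) for port in ports}
```

Running `check_ports` from the machine that cannot reach NetMon, rather than from the NetMon server itself, gives the more useful answer, since firewalls often filter by direction.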

Maintenance & Scaling

  • Schedule nightly snapshot backups of the configuration and the time-series database.
  • Retention: keep high-resolution metrics for 7–14 days, downsampled for long term (90–365 days).
  • For scale, add read replicas for dashboard queries and separate collectors into regional ingesters.
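
The retention guidance above relies on downsampling: replacing many high-resolution samples with one aggregate per larger interval. A minimal sketch of that idea follows, assuming samples arrive as `(timestamp_seconds, value)` pairs and that averaging is an acceptable aggregate; the function name and 5-minute default are choices made for the example, not NetMon settings.

```python
from collections import defaultdict
from statistics import mean


def downsample(samples, bucket_seconds=300):
    """Average (timestamp, value) samples into fixed-width time buckets.

    Returns a list of (bucket_start, mean_value) tuples sorted by time.
    """
    buckets = defaultdict(list)
    for ts, value in samples:
        bucket_start = ts - (ts % bucket_seconds)  # align to bucket boundary
        buckets[bucket_start].append(value)
    return [(start, mean(vals)) for start, vals in sorted(buckets.items())]
```

Averaging smooths out short spikes, so for metrics such as interface errors or peak throughput it may be worth keeping a maximum alongside the mean.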

Quick Commands

  • Check service status:

    Code

    sudo systemctl status netmon
  • Restart collectors:

    Code

    sudo systemctl restart netmon-collector
  • Tail logs:

    Code

    sudo journalctl -u netmon -f

Further Reading

  • Use built-in Docs → Tutorials for device-specific SNMP/NetFlow templates and community-contributed dashboards.
