
Technical Track Record

Case 02: Scapy DHCP Sentinel - Rogue DHCP Detection for NetDevOps

Note: Source code is private due to enterprise compliance and non-disclosure agreements. Detailed system architecture and DevOps documentation are available for review.
Python 3.11 · Scapy · FastAPI · SQLite · Docker Compose

Briefing

Rogue DHCP servers can cause outages, bad addressing, and traffic diversion within minutes, yet most teams still rely on reactive troubleshooting. I designed Scapy DHCP Sentinel as a proactive local detector: a host-side sensor sends real DHCPDISCOVER probes, captures DHCPOFFER replies, compares responders against a corporate whitelist, and surfaces suspicious servers in an operational UI before they escalate into incidents.

Technical Deep Dive

The architecture cleanly separates the privileged sensor runtime from the web core: Scapy and BPF run on the host for real network-interface access, while FastAPI, SQLite, and the local dashboard stay isolated in a containerized core. I implemented local HTTP ingest, `trusted/suspect/anomaly` classification, auditable persistence for runs and events, a recurring watchdog, and disk spool buffering so temporary core downtime does not erase operational evidence.

NetDevOps Sensor

Rogue DHCP Detection Loop

  • Detection: DHCPDISCOVER
  • Classification: 3 states
  • Persistence: SQLite
  • Execution: Watchdog

Documentation

Local application for proactive detection of rogue DHCP servers in corporate networks, with a host-side sensor based on Scapy, a FastAPI web core, local SQLite persistence, and a locally served operational dashboard.

Overview

The project was designed to monitor DHCP replies on the local network and quickly identify unauthorized servers before they cause outages, addressing errors, traffic diversion, or audit risk. The solution adopts a hybrid architecture: the sensor runs natively on the host to gain real access to Scapy and BPF on macOS, while the application core runs in Docker to simplify bootstrap, persistence, and distribution.

The main business flow is straightforward:

  • the sensor sends a DHCPDISCOVER broadcast;
  • active DHCP servers reply with DHCPOFFER;
  • the sensor extracts technical metadata and sends the result to the core;
  • the core compares responders against a whitelist;
  • replies are classified as trusted, suspect, or anomaly;
  • runs, replies, and events are persisted locally and exposed in the UI.

Architecture

code
[Local Browser / UI]
   |
   | GET /api/v1/dashboard
   | POST http://127.0.0.1:8999/scan
   v
[detector-core - FastAPI + SQLite + UI]
   ^
   | POST /api/v1/ingest
   |
[dhcp-sensor-host - FastAPI + Scapy + Watchdog + Spool]
   |
   | DHCPDISCOVER / capture DHCPOFFER
   v
[Local network]

Technology Stack

Component | Technology | Version | Notes
Language | Python | 3.11 in the container | Main runtime for the core
Core API | FastAPI | 0.115.12 | Ingest, dashboard, and health endpoints
Sensor API | FastAPI | 0.115.12 | Local scan and watchdog control
ASGI server | Uvicorn | 0.34.0 | Runtime for core and sensor
Modeling and validation | Pydantic | 2.10.6 | Configuration and payload schemas
Configuration | PyYAML | 6.0.2 | Reads detector.yml
Network sensor | Scapy | 2.5.0 | DHCPDISCOVER, sniffing, and DHCP parsing
HTTP transport | Requests | 2.32.3 | Local sensor-to-core delivery
Persistence | SQLite | standard module | File-based local database
UI | HTML + CSS + vanilla JavaScript | n/a | Static dashboard served by the core
Containerization | Docker / Docker Compose | n/a | Only the core is containerized
Tests | Pytest + httpx | 8.3.5 / 0.28.1 | Classification, ingest, and watchdog tests

Main Modules

Module | Responsibility | Main Files
Core API | State initialization, HTTP endpoints, dashboard, and static files | core/app/main.py
Configuration | Configuration schema, validation, and local snapshot | core/app/config.py
Classification | Compares replies against the whitelist | core/app/classifier.py
Persistence | Stores runs, replies, events, and metrics | core/app/repository.py, core/app/database.py
Models | Pydantic contracts for ingest, dashboard, and domain objects | core/app/models.py
Logging | Structured JSON-line logging | core/app/logging_utils.py
Sensor API | Sensor endpoints and HTTP error translation | sensor/app/service.py
Sensor runtime | Locks, watchdog, lifecycle, and spool flush | sensor/app/runtime.py
DHCP scanner | Builds DHCPDISCOVER, Scapy sniffer, and DHCPOFFER extraction | sensor/app/scanner.py
Transport and spool | POST to the core, temporary storage, and replay | sensor/app/transport.py
Dashboard | Local interface, i18n, metrics, and operational controls | core/static/index.html, core/static/app.js, core/static/base.css, core/static/dashboard.css
Local operations | Bootstrap, scan trigger, and sensor stop flows | scripts/bootstrap.sh, scripts/run-scan.sh, scripts/stop-sensor.sh

Technical Flows

  1. Live Scan: the UI or run-scan.sh live calls POST /scan, the sensor performs a real scan with Scapy, forwards the payload to the core, and the core persists the result in SQLite.
  2. Watchdog: the UI or POST /watchdog/start starts a daemon thread in the sensor that repeats live scans at the interval configured in sensor.watchdog_interval_seconds.
  3. Classification: each received reply is compared against trusted_dhcp_servers; in strict mode the reply must match every identifier of a trusted server, while in lenient mode any single matching identifier is enough.
  4. Event generation: replies classified as suspect generate a rogue_dhcp_detected event with severity and payload persisted in the events table.
  5. Degraded mode with spool: if the core is unavailable, the sensor stores the raw scan in data/state/spool; on the next successful delivery or startup, it attempts to resend pending files.
  6. Dashboard: the UI queries GET /api/v1/dashboard to display metrics, whitelist entries, run history, recent events, and the configuration summary.
  7. Local operation: bootstrap.sh creates the sensor virtualenv, starts the core in Docker, starts the sensor in the background, and validates both through health checks.
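The strict/lenient rule from step 3 can be sketched in a few lines. This is a hypothetical illustration, not the actual code in core/app/classifier.py; the field names (`ip`, `mac`, `server_identifier`) are borrowed from the configuration example, and the real matching logic may differ.

```python
# Hypothetical sketch of the strict/lenient classification rule.
# Assumes replies and trusted entries are plain dicts with the same keys
# used in detector.yml; the real classifier lives in core/app/classifier.py.

def classify(reply: dict, trusted: list[dict], mode: str = "strict") -> str:
    """Return 'trusted', 'suspect', or 'anomaly' for one DHCP reply."""
    if reply.get("message_type") != "offer":
        return "anomaly"  # unexpected message type
    fields = ("ip", "mac", "server_identifier")
    for server in trusted:
        matches = [reply.get(f) == server.get(f) for f in fields]
        if mode == "strict" and all(matches):
            return "trusted"  # every identifier must match
        if mode == "lenient" and any(matches):
            return "trusted"  # any matching identifier is enough
    return "suspect"  # an offer from an unlisted server
```

A reply matching only one identifier would be trusted in lenient mode but flagged as suspect in strict mode, which is why strict is the safer default for stable whitelists.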

Operation, Permissions, and Controls

  • V1 was designed for local use on localhost.
  • There is no authentication between UI, sensor, and core.
  • The core uses open CORS, acceptable for local scope but not hardened for remote exposure.
  • On macOS, the sensor may require sudo to open /dev/bpf* and perform real Live Scan or Watchdog operations.
  • The process that requires elevated privileges is the sensor, not the core or the UI.
  • The sensor endpoint runs on 127.0.0.1:8999 and the core runs on 127.0.0.1:8000.

Database

Entity | Key fields | Notes
runs | id, started_at, completed_at, status, interface_name, transaction_id, trusted_count, suspect_count, anomaly_count, duration_ms | Summarizes each scan execution
responses | run_id, source_ip, source_mac, server_identifier, message_type, offered_ip, classification, reason, matched_trusted_server | Stores each observed and classified DHCPOFFER
events | run_id, response_id, event_type, severity, interface_name, payload_json | Records rogue_dhcp_detected incidents

Rules and automations

  • Repository.store_run() stores the execution, each classified reply, and derived events.
  • fetch_metrics() aggregates total runs, suspects, trusted replies, anomalies, and average duration.
  • Referential integrity is preserved with FOREIGN KEY and ON DELETE CASCADE.
  • The schema is created automatically via init_db() during core startup.
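A minimal schema in the spirit of the entities and rules above can be sketched with the standard sqlite3 module. Column lists are abbreviated and the DDL is illustrative; the real schema is created by init_db() in core/app/database.py.

```python
# Illustrative sketch of the runs/responses/events schema with
# FOREIGN KEY + ON DELETE CASCADE, as described in the tables above.
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS runs (
    id INTEGER PRIMARY KEY,
    started_at TEXT, completed_at TEXT, status TEXT,
    interface_name TEXT, transaction_id TEXT,
    trusted_count INTEGER, suspect_count INTEGER, anomaly_count INTEGER,
    duration_ms INTEGER
);
CREATE TABLE IF NOT EXISTS responses (
    id INTEGER PRIMARY KEY,
    run_id INTEGER REFERENCES runs(id) ON DELETE CASCADE,
    source_ip TEXT, source_mac TEXT, server_identifier TEXT,
    message_type TEXT, offered_ip TEXT,
    classification TEXT, reason TEXT, matched_trusted_server TEXT
);
CREATE TABLE IF NOT EXISTS events (
    id INTEGER PRIMARY KEY,
    run_id INTEGER REFERENCES runs(id) ON DELETE CASCADE,
    response_id INTEGER REFERENCES responses(id) ON DELETE CASCADE,
    event_type TEXT, severity TEXT, interface_name TEXT, payload_json TEXT
);
"""

def init_db(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    # SQLite enforces foreign keys only when this pragma is on per connection.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn
```

Deleting a run then removes its replies and events automatically, which keeps the audit trail internally consistent without application-level cleanup.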

Local persistence

  • data/state/detector.db: primary SQLite database.
  • data/state/config-snapshot.json: serialized snapshot of the loaded configuration.
  • data/logs/dhcp-detector.log: structured core log.
  • data/logs/sensor-service.log: sensor stdout/stderr in background mode.
  • data/state/spool/: scans not delivered to the core due to temporary unavailability.
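The spool directory's store-and-replay behaviour can be sketched as follows. Paths and function names here are illustrative assumptions; the real logic lives in sensor/app/transport.py.

```python
# Hypothetical sketch of degraded-mode spooling: undelivered scans are
# written to disk and replayed on the next successful delivery or startup.
import json
import time
from pathlib import Path

def spool_scan(spool_dir: Path, scan: dict) -> Path:
    """Persist an undelivered scan so core downtime does not lose evidence."""
    spool_dir.mkdir(parents=True, exist_ok=True)
    path = spool_dir / f"scan-{time.time_ns()}.json"
    path.write_text(json.dumps(scan))
    return path

def flush_spool(spool_dir: Path, deliver) -> int:
    """Replay pending scans oldest-first; keep files whose delivery fails."""
    sent = 0
    for path in sorted(spool_dir.glob("scan-*.json")):
        try:
            deliver(json.loads(path.read_text()))
        except Exception:
            continue  # core still unavailable; retry on the next flush
        path.unlink()  # delivered: drop the spooled copy
        sent += 1
    return sent
```

Because files are only deleted after a successful delivery, a crash mid-flush can at worst cause a duplicate ingest, never a lost scan.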

API Response format

  • The core and the sensor return raw JSON, without a single global envelope for every route.
  • The dashboard endpoint returns a consolidated envelope with metrics, runs, events, trusted servers, and config summary.
  • On errors, FastAPI returns a JSON body with a detail field and an HTTP status aligned to the failure.

Core

Method | Route | Description
GET | /api/v1/health | Core health check, validation mode, and total trusted servers
GET | /api/v1/config | Summary of the loaded configuration
POST | /api/v1/ingest | Receives a scan from the sensor, classifies it, and persists it
GET | /api/v1/dashboard | Consolidated envelope for the UI
GET | /api/v1/events | Lists recent events
GET | /api/v1/runs | Lists recent runs
GET | / | Serves the HTML UI with cache-control disabled

Sensor

Method | Route | Description
GET | /health | Sensor health check with watchdog state
GET | /watchdog/status | Snapshot of the recurring scheduler
POST | /watchdog/start | Starts the watchdog
POST | /watchdog/stop | Stops the watchdog
POST | /scan | Executes live, fixture, or starts the watchdog depending on the mode
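The watchdog behind /watchdog/start can be sketched as a daemon thread with an interruptible sleep. This is a simplified illustration; the real lifecycle handling, including locks, lives in sensor/app/runtime.py.

```python
# Minimal sketch of a recurring watchdog: a daemon thread that repeats
# scans at a fixed interval until stopped, as described above.
import threading

class Watchdog:
    def __init__(self, scan_fn, interval_seconds: float):
        self._scan = scan_fn
        self._interval = interval_seconds
        self._stop = threading.Event()
        self._thread: threading.Thread | None = None

    def start(self) -> None:
        self._stop.clear()
        self._thread = threading.Thread(target=self._loop, daemon=True)
        self._thread.start()

    def _loop(self) -> None:
        while not self._stop.is_set():
            self._scan()
            self._stop.wait(self._interval)  # interruptible sleep

    def stop(self) -> None:
        self._stop.set()
        if self._thread:
            self._thread.join()

    @property
    def running(self) -> bool:
        return self._thread is not None and self._thread.is_alive()
```

Using threading.Event.wait() instead of time.sleep() lets /watchdog/stop interrupt the interval immediately rather than waiting out a full cycle.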

Relevant status codes and behaviors

  • 409: scan_in_progress when a scan is already running.
  • 403: fixture_mode_disabled when fixture mode is disabled in config.
  • 503: sensor_not_ready or core_not_ready while components are still initializing.
  • 500: unhandled failures, including Scapy/BPF permission errors.

DHCP Scanner

  • The scanner generates a random locally administered client_mac.
  • The transaction_id is also random to correlate only offers related to that discover.
  • The sniffer uses the BPF filter `udp and (port 67 or port 68)`.
  • Only packets with DHCP OFFER and matching xid enter the final payload.
  • fixture mode reads sensor/fixtures/sample-scan.json for demos and tests.
  • The retries field already exists in the payload and configuration, but it does not yet produce actual repeated scans in the current implementation.
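Two of the rules above lend themselves to a short sketch: generating a random locally administered client MAC and keeping only DHCPOFFER replies whose xid matches the discover. The code below is a hypothetical, Scapy-free illustration working on already-parsed dicts; the real sniffer in sensor/app/scanner.py operates on raw Scapy packets.

```python
# Illustrative sketch: random locally administered MAC and xid-based
# DHCPOFFER correlation, as described in the bullet points above.
import secrets

def random_client_mac() -> str:
    """Random unicast MAC with the locally administered bit set."""
    octets = bytearray(secrets.token_bytes(6))
    octets[0] = (octets[0] | 0x02) & 0xFE  # set local bit, clear multicast bit
    return ":".join(f"{b:02x}" for b in octets)

def select_offers(replies: list[dict], xid: int) -> list[dict]:
    """Keep only DHCPOFFER (message type 2) replies matching our xid."""
    return [r for r in replies if r.get("message_type") == 2 and r.get("xid") == xid]
```

Correlating on a random xid is what lets the sensor attribute every captured offer to its own probe rather than to unrelated DHCP traffic on the segment.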

Dashboard and Interface

  • The UI is served directly by the core, with no separate frontend build pipeline.
  • The dashboard displays aggregate metrics, recent events, an execution ledger, the whitelist, the operational console, and watchdog state.
  • The main button supports hover, click microinteraction, and execution states.
  • The interface includes a PT-BR / EN language selector persisted in localStorage.
  • The Powered by Eric Barros credit points to https://eric.epico.gold.
  • The frontend consumes the sensor at http://127.0.0.1:8999 and the core at /api/v1.

Configuration

Main file:

  • data/config/detector.yml

Example

yaml
interface: en0
timeout_seconds: 5
retries: 1
validation_mode: strict
trusted_list_version: "2026-03-10"
trusted_dhcp_servers:
  - name: dhcp-core-01
    ip: 192.168.1.10
    mac: "00:11:22:33:44:55"
    server_identifier: 192.168.1.10
logging:
  level: info
  format: json
  path: ./data/logs/dhcp-detector.log
transport:
  mode: http
  endpoint: http://127.0.0.1:8000/api/v1/ingest
sensor:
  host: 127.0.0.1
  port: 8999
  allow_fixture_mode: true
  default_mode: live
  watchdog_interval_seconds: 60
  watchdog_autostart: false
storage:
  database_path: ./data/state/detector.db
  snapshot_path: ./data/state/config-snapshot.json

Field summary

Field | Function
interface | Network interface used by the sensor
timeout_seconds | Capture window after the discover
retries | Reserved field for future retry evolution
validation_mode | strict or lenient
trusted_list_version | Logical whitelist version
trusted_dhcp_servers | List of official DHCP servers
transport.endpoint | Local core URL for ingest
sensor.default_mode | Default mode for the /scan endpoint
sensor.watchdog_interval_seconds | Interval between recurring cycles
sensor.watchdog_autostart | Starts the watchdog automatically
storage.database_path | SQLite database path
storage.snapshot_path | Configuration snapshot path

Behavior after editing the file

  • Rebuilding the Docker image is not required.
  • There is also no hot reload in the current runtime.
  • You must restart the core, the sensor, or both depending on the field changed.

Scripts

  • ./scripts/bootstrap.sh: prepares the sensor virtualenv, starts the core, starts the sensor, and validates health checks.
  • ./scripts/run-scan.sh live: runs a manual live scan through the sensor API.
  • ./scripts/run-scan.sh watchdog: starts watchdog mode through the /scan endpoint.
  • ./scripts/stop-sensor.sh: stops the background sensor process through the PID file.

Structure

code
core/
  app/        API, configuration, persistence, and classification
  static/     local web interface
sensor/
  app/        sensor API, runtime, scanner, and transport
  fixtures/   demo payload
config/       configuration example
data/         active config, logs, database, and spool
scripts/      bootstrap and operational utilities
tests/        automated tests

Observability

  • append_json_log() writes structured events to the core log.
  • GET /api/v1/health and GET /health act as minimal health checks.
  • The UI dashboard acts as the local operational panel.
  • The SQLite database works as an auditable trail of executions and incidents.
  • Disk spool enables troubleshooting of delivery failures between sensor and core.
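A structured JSON-lines logger in the spirit of append_json_log() can be sketched as below. The function name matches the one mentioned above, but the signature and field set here are assumptions; the real implementation is in core/app/logging_utils.py.

```python
# Hypothetical sketch of JSON-lines logging: one structured event per line,
# easy to tail, grep, and parse during incident review.
import json
from datetime import datetime, timezone

def append_json_log(path: str, event: str, **fields) -> dict:
    """Append one structured event as a single JSON line and return it."""
    record = {
        "ts": datetime.now(timezone.utc).isoformat(),
        "event": event,
        **fields,
    }
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")
    return record
```

One JSON object per line keeps data/logs/dhcp-detector.log both human-readable and machine-parseable without a log-shipping stack.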

Security

  • There is no authentication or authorization between components in V1.
  • The exposure surface assumes local operation on localhost.
  • The whitelist is a critical asset: incorrect configuration directly affects classification.
  • Using Scapy requires operational care because the sensor may run with elevated privileges.
  • The project correctly separates the privileged sensor runtime from the containerized core runtime.

Deploy

  1. Ensure Docker, python3, and curl are installed on the host.
  2. Adjust data/config/detector.yml according to the interface and trusted servers.
  3. Run ./scripts/bootstrap.sh.
  4. Open http://127.0.0.1:8000 to access the UI.
  5. If you get a permission error on /dev/bpf0, restart only the sensor with sudo.

Common Problems

  • Permission denied: could not open /dev/bpf0: the sensor does not have enough permission to use Scapy on macOS.
  • Cannot connect to the Docker daemon: Docker Desktop or the daemon is not running.
  • UI language does not switch: the browser may be loading stale assets; restart the core and perform a hard refresh.
  • scan_in_progress: a manual scan or watchdog is already running in the sensor.
  • No events in the dashboard: the scan may have returned no_response, the whitelist may be correct, or the sensor may not be capturing on the right interface.
  • Changes in detector.yml are not reflected: restart is required because the configuration is not reloaded automatically.

Current V1 Limits

  • initial focus on macOS;
  • no automatic blocking of rogue servers;
  • no authentication between components;
  • no formal database schema migrations;
  • no real use of retries in the scanner;
  • no native integration with SIEM, NAC, switch, or firewall;
  • no dynamic configuration reload.