Multi-threading Port Scanner: Techniques to Speed Up Network Scans

Designing a Scalable Multi-threading Port Scanner for Large Networks

Goals

  • Scalability: handle millions of port checks across thousands of hosts with predictable resource usage.
  • Throughput: maximize sockets/second while respecting network limits.
  • Accuracy: detect open/closed/filtered reliably.
  • Safety: avoid overwhelming targets or your network; include rate limiting and polite defaults.

Architecture overview

  1. Producer-consumer model: a task producer enqueues (host, port) probes; multiple worker threads consume tasks and perform network I/O.
  2. Work partitioning: shard targets by IP ranges or CIDR blocks to balance load and improve locality.
  3. Rate-control layer: global and per-target throttles to cap probes/sec and concurrent connections per host.
  4. Result aggregation: thread-safe collector buffers results, performs deduping, and writes to storage asynchronously.
  5. I/O model: use non-blocking sockets or an asynchronous event loop where possible; threads drive readiness/timeout handling rather than blocking per-probe.
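The producer-consumer skeleton from step 1 can be sketched as follows. This is a minimal illustration, not a full scanner: `probe` is a caller-supplied stub standing in for real network I/O, and the bounded queue provides the backpressure discussed later.

```python
import queue
import threading

def run_scan(targets, ports, probe, num_workers=8, queue_size=1000):
    """Producer-consumer skeleton: the producer enqueues (host, port)
    tasks; worker threads consume them and record results thread-safely."""
    tasks = queue.Queue(maxsize=queue_size)   # bounded queue => backpressure
    results = []
    results_lock = threading.Lock()

    def worker():
        while True:
            task = tasks.get()
            if task is None:              # sentinel: shut this worker down
                tasks.task_done()
                return
            host, port = task
            status = probe(host, port)    # real code would do socket I/O here
            with results_lock:
                results.append((host, port, status))
            tasks.task_done()

    workers = [threading.Thread(target=worker) for _ in range(num_workers)]
    for w in workers:
        w.start()
    for host in targets:                  # producer blocks when queue is full
        for port in ports:
            tasks.put((host, port))
    for _ in workers:                     # one sentinel per worker
        tasks.put(None)
    for w in workers:
        w.join()
    return results
```

The bounded queue is what keeps memory predictable: when workers fall behind, the producer blocks instead of buffering millions of pending probes.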

Concurrency model choices

  • Thread pool + non-blocking sockets (recommended): threads run event loops (epoll/kqueue/IOCP) to manage thousands of sockets per thread. Balanced CPU and memory usage.
  • Async single-threaded event loop: highest scalability for I/O-bound scans but more complex when mixing CPU-bound tasks.
  • Pure thread-per-connection: simple but poor scalability and high resource use — avoid for large networks.
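The recommended model (non-blocking sockets driven by a readiness API) can be sketched with Python's `selectors` module, which picks epoll/kqueue/poll as available. This single loop is what each worker thread would run in the full design; behavior on connect failure is as on Linux/BSD, where a failed non-blocking connect is reported as writable with `SO_ERROR` set.

```python
import errno
import selectors
import socket
import time

def scan_ports(host, ports, timeout=0.5):
    """Drive many non-blocking TCP connects from one event loop.
    Returns {port: "open" | "closed" | "filtered"}."""
    sel = selectors.DefaultSelector()     # epoll/kqueue/poll as available
    deadlines = {}
    results = {}
    for port in ports:
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        s.setblocking(False)
        err = s.connect_ex((host, port))
        if err == 0:                      # connected immediately (loopback)
            results[port] = "open"
            s.close()
            continue
        if err not in (errno.EINPROGRESS, errno.EWOULDBLOCK):
            results[port] = "closed"      # immediate hard error
            s.close()
            continue
        sel.register(s, selectors.EVENT_WRITE, port)
        deadlines[s] = time.monotonic() + timeout
    while deadlines:
        for key, _ in sel.select(timeout=0.05):
            s, port = key.fileobj, key.data
            err = s.getsockopt(socket.SOL_SOCKET, socket.SO_ERROR)
            results[port] = "open" if err == 0 else "closed"
            sel.unregister(s)
            s.close()
            del deadlines[s]
        now = time.monotonic()            # sweep probes that never answered
        for s in [s for s, d in deadlines.items() if d < now]:
            results[sel.get_key(s).data] = "filtered"
            sel.unregister(s)
            s.close()
            del deadlines[s]
    sel.close()
    return results
```

Note that hundreds of connects are in flight at once while only one thread is blocked, which is exactly why this model scales where thread-per-connection does not.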

Key components & implementation notes

  • Task queue: bounded, lock-free or with backpressure to avoid unbounded memory growth.
  • Socket handling: set non-blocking, use poll/epoll/kqueue/IOCP; reuse sockets where possible. Ensure proper close-on-exec and linger settings.
  • Timeouts: per-probe connect/read timeouts; use adaptive timeouts based on RTT estimates to reduce wasted wait.
  • Retries & probe types: default to TCP SYN or TCP connect; support UDP with application-layer probes. Retries only for ambiguous timeouts with exponential backoff.
  • Port ordering: scan common ports first or use heuristic ordering to surface important findings earlier.
  • Backoff and politeness: enforce per-host concurrent connection caps and global rate limits; optionally randomize probe timing to avoid traffic bursts.
  • Resource limits: monitor and cap file descriptors, memory, and thread count. Use async I/O so a single thread can manage many descriptors, rather than dedicating a thread (and its stack) to each connection.
  • Error handling: classify errors (refused, timeout, unreachable) and avoid flooding logs with repeats.
  • Logging & metrics: expose metrics (probes/sec, concurrent sockets, errors, queue depth) and structured logs for analysis.
  • Persistence: write incremental results to disk or DB; batch writes to avoid I/O bottlenecks.
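The rate-control layer from the list above can be sketched as a thread-safe token bucket for the global probes/sec cap combined with lazily created per-host semaphores for the concurrency cap. Class and method names here are illustrative, not a fixed API.

```python
import threading
import time

class TokenBucket:
    """Thread-safe token bucket: acquire() blocks until a probe is
    allowed, capping the global probe rate at `rate` per second."""
    def __init__(self, rate, burst):
        self.rate = float(rate)
        self.capacity = float(burst)
        self.tokens = float(burst)
        self.last = time.monotonic()
        self.lock = threading.Lock()

    def acquire(self):
        while True:
            with self.lock:
                now = time.monotonic()
                self.tokens = min(self.capacity,
                                  self.tokens + (now - self.last) * self.rate)
                self.last = now
                if self.tokens >= 1:
                    self.tokens -= 1
                    return
                wait = (1 - self.tokens) / self.rate
            time.sleep(wait)              # sleep outside the lock

class RateControl:
    """Combine the global token bucket with a per-host concurrency cap."""
    def __init__(self, rate, burst, per_host_cap):
        self.bucket = TokenBucket(rate, burst)
        self.per_host_cap = per_host_cap
        self.host_slots = {}
        self.map_lock = threading.Lock()

    def _slot(self, host):
        with self.map_lock:               # one semaphore per host, made lazily
            if host not in self.host_slots:
                self.host_slots[host] = threading.Semaphore(self.per_host_cap)
            return self.host_slots[host]

    def begin_probe(self, host):
        self._slot(host).acquire()        # blocks past the per-host cap
        self.bucket.acquire()             # blocks past the global rate cap

    def end_probe(self, host):
        self._slot(host).release()
```

Workers call `begin_probe(host)` before each socket operation and `end_probe(host)` when it completes, so both caps are enforced without any central dispatcher.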

Performance tuning

  • Tune thread count: match the CPU core count for processing work, plus a few extra threads for I/O management; oversubscribing causes excessive context switching.
  • Socket buffer sizes: increase send/recv buffers for high-throughput environments.
  • Batching & pipelining: group tasks per-host to reuse connections where protocol allows.
  • Affinity: pin threads to CPU cores and schedule tasks to threads that own certain IP shards for cache locality.
  • Measure and iterate: benchmark with realistic target sets; watch TCP stack parameters (TIME_WAIT) and tune kernel (e.g., ephemeral port range, tcp_tw_reuse).
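For the buffer-size bullet above, a small helper shows the important subtlety: the kernel treats the requested size as a hint (Linux doubles it and clamps it to sysctl limits), so the effective size should be read back rather than assumed. The function name and defaults here are illustrative.

```python
import socket

def make_tuned_socket(rcvbuf=262144, sndbuf=262144):
    """Create a TCP socket with enlarged kernel buffers and report
    the sizes the kernel actually granted."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, rcvbuf)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, sndbuf)
    effective = (s.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF),
                 s.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF))
    return s, effective
```

Logging the effective sizes alongside throughput metrics makes it easy to tell whether a tuning change actually took effect.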

Security and legal considerations

  • Obtain authorization before scanning third-party networks.
  • Provide configurable identity (scan source, contact info) and safe defaults to avoid abuse.
  • Avoid traffic patterns that can be mistaken for attacks (e.g., aggressive SYN bursts that resemble a SYN flood).

Operational features

  • Pause/resume and checkpointing: persist progress to resume large scans.
  • Distributed mode: coordinator assigns shards to worker nodes; use consistent hashing to rebalance.
  • Discovery integration: feed live host discovery (ARP/ICMP) to reduce unnecessary port probes.
  • Filtering & templating: allow port sets, service fingerprints, and scheduling windows.
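The pause/resume bullet reduces to persisting a small progress record atomically so a killed scan can pick up where it left off. A minimal sketch, assuming a JSON checkpoint file and a linear task index per shard (both hypothetical choices):

```python
import json
import os

def save_checkpoint(path, shard_id, next_index):
    """Atomically persist scan progress: write to a temp file, then
    rename, so a crash never leaves a half-written checkpoint."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump({"shard": shard_id, "next_index": next_index}, f)
    os.replace(tmp, path)               # atomic on POSIX and Windows

def load_checkpoint(path):
    """Return saved progress, or a fresh start if no checkpoint exists."""
    if not os.path.exists(path):
        return {"shard": None, "next_index": 0}
    with open(path) as f:
        return json.load(f)
```

Checkpointing every N completed tasks (rather than every task) keeps the disk I/O cost negligible while bounding rework after a crash to N probes.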

Example minimal strategy (practical defaults)

  • Use N worker threads (N ~ CPU cores), each running an epoll loop managing up to M sockets (M tuned by FD limits).
  • Global cap: 10,000 concurrent sockets; per-host cap: 50.
  • Default connect timeout: 500 ms, adaptive based on median RTT.
  • Scan the 1,000 most common ports first, then optionally the full 1-65,535 range.
  • Write results in JSONL batches of 1,000.
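The defaults above can be collected into a single config object so presets are easy to define and override. The values come from the list above; the dataclass structure and field names are one possible arrangement, not a fixed schema.

```python
import os
from dataclasses import dataclass

@dataclass
class ScanConfig:
    """Practical defaults from the strategy above; every value is tunable."""
    worker_threads: int = 0               # 0 => derive from os.cpu_count()
    global_socket_cap: int = 10_000       # max concurrent sockets overall
    per_host_cap: int = 50                # max concurrent sockets per target
    connect_timeout_ms: int = 500         # adaptive RTT logic may lower this
    top_ports_first: int = 1_000          # scan the common ports before 65k
    jsonl_batch_size: int = 1_000         # results flushed per JSONL batch

    def effective_workers(self):
        return self.worker_threads or (os.cpu_count() or 1)
```

Small/medium/large presets then become one-liners, e.g. `ScanConfig(global_socket_cap=1_000, per_host_cap=10)` for a polite small scan.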

Natural next steps:

  • a sample Python implementation sketch using selectors/asyncio, or
  • configuration presets for small, medium, and large scans.
