Designing a Scalable Multithreaded Port Scanner for Large Networks
Goals
- Scalability: handle millions of port checks across thousands of hosts with predictable resource usage.
- Throughput: maximize sockets/second while respecting network limits.
- Accuracy: detect open/closed/filtered reliably.
- Safety: avoid overwhelming targets or your network; include rate limiting and polite defaults.
Architecture overview
- Producer-consumer model: a task producer enqueues (host, port) probes; multiple worker threads consume tasks and perform network I/O.
- Work partitioning: shard targets by IP ranges or CIDR blocks to balance load and improve locality.
- Rate-control layer: global and per-target throttles to cap probes/sec and concurrent connections per host.
- Result aggregation: thread-safe collector buffers results, performs deduping, and writes to storage asynchronously.
- I/O model: use non-blocking sockets or an asynchronous event loop where possible; threads drive readiness/timeout handling rather than blocking per-probe.
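The producer-consumer core above can be sketched as follows. This is a minimal illustration, assuming a hypothetical `probe(host, port)` callable that returns a status string; in a real scanner the probe is non-blocking network I/O rather than a direct call.

```python
import queue
import threading

def run_scan(targets, probe, num_workers=4, queue_size=1000):
    """Producer-consumer core: enqueue (host, port) tasks, workers consume.

    `probe` is a hypothetical callable returning "open" / "closed" /
    "filtered". The bounded queue gives backpressure: the producer
    blocks instead of growing memory without limit.
    """
    tasks = queue.Queue(maxsize=queue_size)  # bounded => backpressure
    results = []
    results_lock = threading.Lock()
    SENTINEL = object()

    def worker():
        while True:
            task = tasks.get()
            if task is SENTINEL:
                break
            host, port = task
            status = probe(host, port)
            with results_lock:               # thread-safe result collection
                results.append((host, port, status))

    workers = [threading.Thread(target=worker) for _ in range(num_workers)]
    for w in workers:
        w.start()

    for host, port in targets:               # producer blocks when queue is full
        tasks.put((host, port))
    for _ in workers:                         # one sentinel per worker shuts it down
        tasks.put(SENTINEL)
    for w in workers:
        w.join()
    return results
```

The sentinel-per-worker shutdown avoids polling a "done" flag and lets workers drain the queue before exiting.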
Concurrency model choices
- Thread pool + non-blocking sockets (recommended): threads run event loops (epoll/kqueue/IOCP) to manage thousands of sockets per thread. Balanced CPU and memory usage.
- Async single-threaded event loop: highest scalability for I/O-bound scans but more complex when mixing CPU-bound tasks.
- Pure thread-per-connection: simple but poor scalability and high resource use — avoid for large networks.
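The recommended model (one thread driving many non-blocking sockets through a readiness loop) can be sketched with the portable `selectors` module; a production scanner would run one such loop per worker thread, cap concurrent sockets, and interleave many hosts. This is a sketch, not a hardened implementation:

```python
import errno
import selectors
import socket
import time

def connect_scan(host, ports, timeout=0.5):
    """Non-blocking TCP connect scan driven by a readiness loop.

    Returns {port: "open" | "closed" | "filtered"}, where a timeout
    with no answer is classified as filtered.
    """
    sel = selectors.DefaultSelector()
    results = {}
    deadlines = {}
    for port in ports:
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        s.setblocking(False)
        err = s.connect_ex((host, port))
        if err == 0:                      # connected immediately (e.g. loopback)
            results[port] = "open"
            s.close()
        elif err in (errno.EINPROGRESS, errno.EWOULDBLOCK):
            sel.register(s, selectors.EVENT_WRITE, port)
            deadlines[s] = time.monotonic() + timeout
        else:                             # e.g. ECONNREFUSED right away
            results[port] = "closed"
            s.close()

    while deadlines:
        events = sel.select(timeout=0.05)
        now = time.monotonic()
        for key, _ in events:             # socket became writable: connect resolved
            s, port = key.fileobj, key.data
            err = s.getsockopt(socket.SOL_SOCKET, socket.SO_ERROR)
            results[port] = "open" if err == 0 else "closed"
            sel.unregister(s)
            s.close()
            del deadlines[s]
        for s in [s for s, dl in deadlines.items() if dl < now]:
            results[sel.get_key(s).data] = "filtered"   # no answer before deadline
            sel.unregister(s)
            s.close()
            del deadlines[s]
    sel.close()
    return results
```

`selectors.DefaultSelector` picks epoll/kqueue where available, so the same code scales to thousands of in-flight probes per thread.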
Key components & implementation notes
- Task queue: bounded, lock-free or with backpressure to avoid unbounded memory growth.
- Socket handling: set non-blocking, use poll/epoll/kqueue/IOCP; reuse sockets where possible. Ensure proper close-on-exec and linger settings.
- Timeouts: per-probe connect/read timeouts; use adaptive timeouts based on RTT estimates to reduce wasted wait.
- Retries & probe types: default to TCP SYN or TCP connect; support UDP with application-layer probes. Retries only for ambiguous timeouts with exponential backoff.
- Port ordering: scan common ports first or use heuristic ordering to surface important findings earlier.
- Backoff and politeness: enforce per-host concurrent connection caps and global rate limits; optionally randomize probe timing to avoid traffic bursts.
- Resource limits: monitor and cap file descriptors, memory, and thread count; event-driven I/O keeps per-thread file-descriptor usage from exploding.
- Error handling: classify errors (refused, timeout, unreachable) and avoid flooding logs with repeats.
- Logging & metrics: expose metrics (probes/sec, concurrent sockets, errors, queue depth) and structured logs for analysis.
- Persistence: write incremental results to disk or DB; batch writes to avoid I/O bottlenecks.
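The global rate limit and politeness caps above are commonly implemented as a token bucket: probes consume tokens, tokens refill at a fixed rate, and a small burst is allowed. A thread-safe sketch (rates and burst sizes are illustrative, not tuned values):

```python
import threading
import time

class TokenBucket:
    """Token-bucket rate limiter: at most `rate` probes/sec on average,
    with bursts up to `burst`. acquire() blocks until a token is free,
    so worker threads self-throttle."""

    def __init__(self, rate, burst):
        self.rate = float(rate)
        self.capacity = float(burst)
        self.tokens = float(burst)
        self.last = time.monotonic()
        self.lock = threading.Lock()

    def acquire(self, n=1):
        while True:
            with self.lock:
                now = time.monotonic()
                # Refill proportionally to elapsed time, capped at capacity.
                self.tokens = min(self.capacity,
                                  self.tokens + (now - self.last) * self.rate)
                self.last = now
                if self.tokens >= n:
                    self.tokens -= n
                    return
                wait = (n - self.tokens) / self.rate
            time.sleep(wait)              # sleep outside the lock
```

One global bucket caps total probes/sec; a small per-host bucket (or a plain per-host semaphore) enforces the per-target concurrency cap.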
Performance tuning
- Tune thread count: align to the number of CPU cores for processing, plus a few additional threads for I/O management; avoid excessive context switching.
- Socket buffer sizes: increase send/recv buffers for high-throughput environments.
- Batching & pipelining: group tasks per-host to reuse connections where protocol allows.
- Affinity: pin threads to CPU cores and schedule tasks to threads that own certain IP shards for cache locality.
- Measure and iterate: benchmark with realistic target sets; watch TCP stack parameters (TIME_WAIT) and tune kernel (e.g., ephemeral port range, tcp_tw_reuse).
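The adaptive timeout mentioned earlier (scaling the per-probe timeout to observed RTTs instead of a fixed wait) can be computed simply. The multiplier and clamp bounds below are illustrative assumptions, not benchmarked values:

```python
import statistics

def adaptive_timeout(rtt_samples, floor=0.05, ceil=2.0, multiplier=3.0):
    """Derive a connect timeout (seconds) from observed RTT samples.

    median(RTT) * multiplier, clamped to [floor, ceil]; falls back to
    a conservative 500 ms default when no samples exist yet.
    """
    if not rtt_samples:
        return 0.5  # cold-start default
    return max(floor, min(ceil, statistics.median(rtt_samples) * multiplier))
```

Using the median rather than the mean keeps one slow outlier from inflating the timeout for every subsequent probe.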
Security and legal considerations
- Obtain authorization before scanning third-party networks.
- Provide configurable identity (scan source, contact info) and safe defaults to avoid abuse.
- Avoid techniques that can be easily mistaken for attacks (e.g., aggressive SYN floods).
Operational features
- Pause/resume and checkpointing: persist progress to resume large scans.
- Distributed mode: coordinator assigns shards to worker nodes; use consistent hashing to rebalance.
- Discovery integration: feed live host discovery (ARP/ICMP) to reduce unnecessary port probes.
- Filtering & templating: allow port sets, service fingerprints, and scheduling windows.
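The consistent-hashing shard assignment in distributed mode can be sketched as a hash ring with virtual nodes, so adding or removing a worker only remaps the shards nearest its points. The string shard key (e.g. a CIDR block) is a hypothetical convention for this sketch:

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Assign shard keys (e.g. CIDR blocks) to worker nodes.

    Each node contributes `replicas` virtual points on the ring;
    a shard maps to the first point at or after its own hash.
    """

    def __init__(self, nodes, replicas=64):
        self._ring = []                          # sorted (hash, node) points
        for node in nodes:
            for i in range(replicas):
                point = self._hash(f"{node}#{i}")
                bisect.insort(self._ring, (point, node))

    @staticmethod
    def _hash(key):
        # First 8 bytes of SHA-256: stable across processes and machines.
        return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")

    def node_for(self, shard_key):
        h = self._hash(shard_key)
        idx = bisect.bisect_right(self._ring, (h, "")) % len(self._ring)
        return self._ring[idx][1]
```

Because the mapping depends only on hashes, any node (or the coordinator) can compute shard ownership without shared state.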
Example minimal strategy (practical defaults)
- Use N worker threads (N ~ CPU cores), each running an epoll loop managing up to M sockets (M tuned by FD limits).
- Global cap: 10,000 concurrent sockets; per-host cap: 50.
- Default connect timeout: 500 ms, adaptive based on median RTT.
- Scan the 1,000 most common ports first, then optionally the full 1-65,535 range.
- Write results in JSONL batches of 1,000.
Natural extensions from here:
- a fuller Python implementation built on selectors/asyncio, and
- configuration presets for small, medium, and large scans.