Designing a Scalable Multithreaded Port Scanner for Large Networks
Goals
- Scalability: handle millions of port checks across thousands of hosts with predictable resource usage.
- Throughput: maximize sockets/second while respecting network limits.
- Accuracy: detect open/closed/filtered reliably.
- Safety: avoid overwhelming targets or your network; include rate limiting and polite defaults.
Architecture overview
- Producer-consumer model: a task producer enqueues (host, port) probes; multiple worker threads consume tasks and perform network I/O.
- Work partitioning: shard targets by IP ranges or CIDR blocks to balance load and improve locality.
- Rate-control layer: global and per-target throttles to cap probes/sec and concurrent connections per host.
- Result aggregation: thread-safe collector buffers results, performs deduping, and writes to storage asynchronously.
- I/O model: use non-blocking sockets or an asynchronous event loop where possible; threads drive readiness/timeout handling rather than blocking per-probe.
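The producer-consumer core above can be sketched as follows. This is a minimal illustration, assuming a hypothetical `probe(host, port)` callable that returns a status string; in a real scanner the probe is non-blocking network I/O rather than a direct call.

```python
import queue
import threading

def run_scan(targets, probe, num_workers=4, queue_size=1000):
    """Producer-consumer core: enqueue (host, port) tasks, workers consume.

    `probe` is a hypothetical callable returning "open" / "closed" /
    "filtered". The bounded queue gives backpressure: the producer
    blocks instead of growing memory without limit.
    """
    tasks = queue.Queue(maxsize=queue_size)  # bounded => backpressure
    results = []
    results_lock = threading.Lock()
    SENTINEL = object()

    def worker():
        while True:
            task = tasks.get()
            if task is SENTINEL:
                break
            host, port = task
            status = probe(host, port)
            with results_lock:               # thread-safe result collection
                results.append((host, port, status))

    workers = [threading.Thread(target=worker) for _ in range(num_workers)]
    for w in workers:
        w.start()

    for host, port in targets:               # producer blocks when queue is full
        tasks.put((host, port))
    for _ in workers:                         # one sentinel per worker shuts it down
        tasks.put(SENTINEL)
    for w in workers:
        w.join()
    return results
```

The sentinel-per-worker shutdown avoids polling a "done" flag and lets workers drain the queue before exiting.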
Concurrency model choices
- Thread pool + non-blocking sockets (recommended): threads run event loops (epoll/kqueue/IOCP) to manage thousands of sockets per thread. Balanced CPU and memory usage.
- Async single-threaded event loop: highest scalability for I/O-bound scans but more complex when mixing CPU-bound tasks.
- Pure thread-per-connection: simple but poor scalability and high resource use — avoid for large networks.
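The recommended model (one thread driving many non-blocking sockets through a readiness loop) can be sketched with the portable `selectors` module; a production scanner would run one such loop per worker thread, cap concurrent sockets, and interleave many hosts. This is a sketch, not a hardened implementation:

```python
import errno
import selectors
import socket
import time

def connect_scan(host, ports, timeout=0.5):
    """Non-blocking TCP connect scan driven by a readiness loop.

    Returns {port: "open" | "closed" | "filtered"}, where a timeout
    with no answer is classified as filtered.
    """
    sel = selectors.DefaultSelector()
    results = {}
    deadlines = {}
    for port in ports:
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        s.setblocking(False)
        err = s.connect_ex((host, port))
        if err == 0:                      # connected immediately (e.g. loopback)
            results[port] = "open"
            s.close()
        elif err in (errno.EINPROGRESS, errno.EWOULDBLOCK):
            sel.register(s, selectors.EVENT_WRITE, port)
            deadlines[s] = time.monotonic() + timeout
        else:                             # e.g. ECONNREFUSED right away
            results[port] = "closed"
            s.close()

    while deadlines:
        events = sel.select(timeout=0.05)
        now = time.monotonic()
        for key, _ in events:             # socket became writable: connect resolved
            s, port = key.fileobj, key.data
            err = s.getsockopt(socket.SOL_SOCKET, socket.SO_ERROR)
            results[port] = "open" if err == 0 else "closed"
            sel.unregister(s)
            s.close()
            del deadlines[s]
        for s in [s for s, dl in deadlines.items() if dl < now]:
            results[sel.get_key(s).data] = "filtered"   # no answer before deadline
            sel.unregister(s)
            s.close()
            del deadlines[s]
    sel.close()
    return results
```

`selectors.DefaultSelector` picks epoll/kqueue where available, so the same code scales to thousands of in-flight probes per thread.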
Key components & implementation notes
- Task queue: bounded, lock-free or with backpressure to avoid unbounded memory growth.
- Socket handling: set non-blocking, use poll/epoll/kqueue/IOCP; reuse sockets where possible. Ensure proper close-on-exec and linger settings.
- Timeouts: per-probe connect/read timeouts; use adaptive timeouts based on RTT estimates to reduce wasted wait.
- Retries & probe types: default to TCP SYN or TCP connect; support UDP with application-layer probes. Retries only for ambiguous timeouts with exponential backoff.
- Port ordering: scan common ports first or use heuristic ordering to surface important findings earlier.
- Backoff and politeness: enforce per-host concurrent connection caps and global rate limits; optionally randomize probe timing to avoid traffic bursts.
- Resource limits: monitor and cap file descriptors, memory, and thread count; event-driven I/O keeps per-thread file-descriptor usage from exploding.
- Error handling: classify errors (refused, timeout, unreachable) and avoid flooding logs with repeats.
- Logging & metrics: expose metrics (probes/sec, concurrent sockets, errors, queue depth) and structured logs for analysis.
- Persistence: write incremental results to disk or DB; batch writes to avoid I/O bottlenecks.
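The global rate limit and politeness caps above are commonly implemented as a token bucket: probes consume tokens, tokens refill at a fixed rate, and a small burst is allowed. A thread-safe sketch (rates and burst sizes are illustrative, not tuned values):

```python
import threading
import time

class TokenBucket:
    """Token-bucket rate limiter: at most `rate` probes/sec on average,
    with bursts up to `burst`. acquire() blocks until a token is free,
    so worker threads self-throttle."""

    def __init__(self, rate, burst):
        self.rate = float(rate)
        self.capacity = float(burst)
        self.tokens = float(burst)
        self.last = time.monotonic()
        self.lock = threading.Lock()

    def acquire(self, n=1):
        while True:
            with self.lock:
                now = time.monotonic()
                # Refill proportionally to elapsed time, capped at capacity.
                self.tokens = min(self.capacity,
                                  self.tokens + (now - self.last) * self.rate)
                self.last = now
                if self.tokens >= n:
                    self.tokens -= n
                    return
                wait = (n - self.tokens) / self.rate
            time.sleep(wait)              # sleep outside the lock
```

One global bucket caps total probes/sec; a small per-host bucket (or a plain per-host semaphore) enforces the per-target concurrency cap.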
Performance tuning
- Tune thread count: align to the number of CPU cores for processing, plus a few additional threads for I/O management; avoid excessive context switching.
- Socket buffer sizes: increase send/recv buffers for high-throughput environments.
- Batching & pipelining: group tasks per-host to reuse connections where protocol allows.
- Affinity: pin threads to CPU cores and schedule tasks to threads that own certain IP shards for cache locality.
- Measure and iterate: benchmark with realistic target sets; watch TCP stack parameters (TIME_WAIT) and tune kernel (e.g., ephemeral port range, tcp_tw_reuse).
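The adaptive timeout mentioned earlier (scaling the per-probe timeout to observed RTTs instead of a fixed wait) can be computed simply. The multiplier and clamp bounds below are illustrative assumptions, not benchmarked values:

```python
import statistics

def adaptive_timeout(rtt_samples, floor=0.05, ceil=2.0, multiplier=3.0):
    """Derive a connect timeout (seconds) from observed RTT samples.

    median(RTT) * multiplier, clamped to [floor, ceil]; falls back to
    a conservative 500 ms default when no samples exist yet.
    """
    if not rtt_samples:
        return 0.5  # cold-start default
    return max(floor, min(ceil, statistics.median(rtt_samples) * multiplier))
```

Using the median rather than the mean keeps one slow outlier from inflating the timeout for every subsequent probe.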
Security and legal considerations
- Obtain authorization before scanning third-party networks.
- Provide configurable identity (scan source, contact info) and safe defaults to avoid abuse.
- Avoid techniques that can be easily mistaken for attacks (e.g., aggressive SYN floods).
Operational features
- Pause/resume and checkpointing: persist progress to resume large scans.
- Distributed mode: coordinator assigns shards to worker nodes; use consistent hashing to rebalance.
- Discovery integration: feed live host discovery (ARP/ICMP) to reduce unnecessary port probes.
- Filtering & templating: allow port sets, service fingerprints, and scheduling windows.
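The consistent-hashing shard assignment in distributed mode can be sketched as a hash ring with virtual nodes, so adding or removing a worker only remaps the shards nearest its points. The string shard key (e.g. a CIDR block) is a hypothetical convention for this sketch:

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Assign shard keys (e.g. CIDR blocks) to worker nodes.

    Each node contributes `replicas` virtual points on the ring;
    a shard maps to the first point at or after its own hash.
    """

    def __init__(self, nodes, replicas=64):
        self._ring = []                          # sorted (hash, node) points
        for node in nodes:
            for i in range(replicas):
                point = self._hash(f"{node}#{i}")
                bisect.insort(self._ring, (point, node))

    @staticmethod
    def _hash(key):
        # First 8 bytes of SHA-256: stable across processes and machines.
        return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")

    def node_for(self, shard_key):
        h = self._hash(shard_key)
        idx = bisect.bisect_right(self._ring, (h, "")) % len(self._ring)
        return self._ring[idx][1]
```

Because the mapping depends only on hashes, any node (or the coordinator) can compute shard ownership without shared state.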
Example minimal strategy (practical defaults)
- Use N worker threads (N ~ CPU cores), each running an epoll loop managing up to M sockets (M tuned by FD limits).
- Global cap: 10,000 concurrent sockets; per-host cap: 50.
- Default connect timeout: 500 ms, adaptive based on median RTT.
- Scan the 1,000 most common ports first, then optionally the full 1-65,535 range.
- Write results in JSONL batches of 1,000.
Natural extensions from here:
- a fuller Python implementation built on selectors/asyncio, and
- configuration presets for small, medium, and large scans.