feat(scan): add web crawler and passive subdomain/url discovery

-crawl spiders same-host links, scripts and forms through the shared httpx client so proxy/headers/rate-limit settings and robots.txt are honored, and is bounded by -crawl-depth.

-passive pulls subdomains from keyless ct feeds (crt.sh, certspotter) and historical urls from wayback. each source is isolated so one feed being down doesn't sink the rest, and because every lookup goes to a third-party feed the target itself sees no traffic.
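
for illustration only, the per-source isolation described above could look roughly like the sketch below. it is written in python for brevity and is not the code in this commit: `gather_passive`, the `Source` type and the two fake feeds are hypothetical names, and the fakes stand in for real network calls so the sketch runs offline.

```python
from typing import Callable, Dict, List, Set, Tuple

# hypothetical type: a source takes a domain and returns the hostnames it found
Source = Callable[[str], List[str]]


def gather_passive(domain: str, sources: Dict[str, Source]) -> Tuple[Set[str], Dict[str, str]]:
    """Query each source independently; return found hosts and per-source errors."""
    found: Set[str] = set()
    errors: Dict[str, str] = {}
    for name, fetch in sources.items():
        try:
            hosts = fetch(domain)
        except Exception as exc:
            # one failing feed is recorded and skipped; the others still run
            errors[name] = str(exc)
            continue
        for host in hosts:
            host = host.strip().lower().rstrip(".")
            # certificate entries are often wildcards such as *.example.com
            if host.startswith("*."):
                host = host[2:]
            # keep only the domain itself and its subdomains
            if host == domain or host.endswith("." + domain):
                found.add(host)
    return found, errors


# stand-ins for real feeds (assumed behavior, no network access)
def fake_crtsh(domain: str) -> List[str]:
    return ["api." + domain, "WWW." + domain, "unrelated.test"]


def fake_certspotter(domain: str) -> List[str]:
    raise TimeoutError("feed unavailable")


if __name__ == "__main__":
    hosts, errs = gather_passive(
        "example.com",
        {"crt.sh": fake_crtsh, "certspotter": fake_certspotter},
    )
    print(sorted(hosts))
    print(errs)
```

catching the error per source, rather than around the whole loop, is what keeps a single dead feed from discarding results the other feeds already returned.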
2026-06-12 19:11:25 -07:00 · 2026-06-09 17:57:42 -07:00
parent 9401aa669e
commit dbe79c495e
10 changed files with 787 additions and 1 deletions
@@ -186,6 +186,26 @@ export SHODAN_API_KEY=your-api-key
 ./sif -u https://example.com -framework
 ```

+### web crawler
+
+`-crawl` - spider the target, following same-host links, scripts and forms
+
+`-crawl-depth` - max crawl depth (default 2). the crawler respects robots.txt and stays on the target host.
+
+```bash
+./sif -u https://example.com -crawl -crawl-depth 3
+```
+
+### passive discovery
+
+`-passive` - gather subdomains from certificate transparency (crt.sh, certspotter) and historical urls from the wayback machine
+
+no api keys needed and no traffic sent to the target itself - all lookups hit third-party feeds.
+
+```bash
+./sif -u https://example.com -passive
+```
+
 ### whois lookup

 `-whois` - perform whois lookups