Commit Graph

6 Commits

Author SHA1 Message Date
Justin Bollinger
467976fe57 update to ollama_benchmark.py 2026-02-14 13:27:23 -05:00
Justin Bollinger
b6347d5012 feat: default to mistral model only in benchmark
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-14 10:00:21 -05:00
Justin Bollinger
b2eb19c130 feat: update LLM attack prompt to CTF-style framing
Replace the denylist-oriented prompt with a capture-the-flag scenario
prompt that focuses on recovering passwords using industry terms and
company name permutations.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-14 09:36:45 -05:00
Justin Bollinger
9652c26986 feat: add --stdout flag to toggle printing model responses
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-14 09:27:05 -05:00
Justin Bollinger
30a6e8c6a7 feat: include candidates and response in benchmark JSON output
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-14 09:25:48 -05:00
Justin Bollinger
56cf40ba29 feat: add Ollama model benchmark script with context window tuning
Standalone tool to compare multiple Ollama models for password candidate
generation. Tests each model at multiple num_ctx values (default: 2048,
8192, 32768) to find the speed/quality sweet spot.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-13 21:01:19 -05:00
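
The commits above describe a benchmark that runs each model at several num_ctx values, records the candidates and raw response in JSON-ready output, and can optionally print model responses. The following is a minimal sketch of that loop, not the actual contents of ollama_benchmark.py. It assumes a local Ollama server on its default port and uses the /api/generate endpoint with the num_ctx option; the function names (ollama_generate, parse_candidates, benchmark), the result field names, and the one-candidate-per-line parsing are hypothetical choices made for illustration.

```python
import json
import time
import urllib.request

# Assumption: Ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def ollama_generate(model, prompt, num_ctx):
    """Send one non-streaming generate request and return the response text."""
    payload = json.dumps({
        "model": model,
        "prompt": prompt,
        "stream": False,
        "options": {"num_ctx": num_ctx},  # context window size under test
    }).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]


def parse_candidates(response):
    """Hypothetical parsing: one candidate per non-empty line, duplicates dropped in order."""
    lines = (line.strip() for line in response.splitlines())
    return list(dict.fromkeys(line for line in lines if line))


def benchmark(models, prompt, ctx_sizes=(2048, 8192, 32768),
              generate=ollama_generate, show_response=False):
    """Run every model at every context size and collect JSON-serializable results."""
    results = []
    for model in models:
        for num_ctx in ctx_sizes:
            start = time.perf_counter()
            response = generate(model, prompt, num_ctx)
            elapsed = time.perf_counter() - start
            if show_response:  # mirrors the idea of a --stdout toggle
                print(response)
            results.append({
                "model": model,
                "num_ctx": num_ctx,
                "seconds": round(elapsed, 3),
                "candidates": parse_candidates(response),
                "response": response,
            })
    return results
```

With a server running, a call such as json.dumps(benchmark(["mistral"], prompt), indent=2) would produce one record per model and context size. The generate parameter is injectable so the loop can be exercised with a stub instead of a live model.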