Files
hate_crack/tests
Justin Bollinger 4eacf9d9ee fix: force LC_ALL=C for sort -u subprocesses to handle non-UTF-8 bytes
macOS `sort` is locale-strict: under LC_COLLATE=en_US.UTF-8 (the default
on most macOS shells) it errors out with "sort: Illegal byte sequence"
when stdin contains bytes that are not valid UTF-8. Cracked-password
streams routinely contain such bytes - hex-encoded fields, mixed
encodings, binary garbage from poorly-encoded source hashes - so this
fires in real fingerprint runs whenever the pot already has any non-
ASCII output.

Symptom in the fingerprint attack: the expander -> sort pipeline emits
"sort: Illegal byte sequence" and produces an empty .expanded file. The
empty-.expanded guard added in the previous patch then triggers the
"no candidates to expand" skip message - which is misleading, because
the user does have cracks; they just got dropped on the sort step.

Pass env={**os.environ, "LC_ALL": "C"} to all three subprocess.Popen
calls that invoke `sort -u`:
  - _write_field_sorted_unique  (main.py:1163)
  - hcatFingerprint expander    (main.py:1544)
  - hcatLMtoNT combinator dedupe (main.py:2995)

LC_ALL=C makes sort byte-collation only. Dedup correctness is
unaffected (byte equality is locale-independent), and hashcat doesn't
care about wordlist order.

Also adds an AST-level test that fails if any future `sort` Popen lacks
an env kwarg, so the locale fix can't silently regress.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-26 15:04:42 -04:00
..