- test_binexport_accessors.py: type: ignore on .expression accesses guarded by test assertions
- test_freeze_dynamic.py: assert isinstance DynamicFeatureExtractor before compare_extractors
- test_binja_features.py: type: ignore on binaryninja guarded by skipif decorator
- elf.py: fix bug where vdso_guess except handler set symtab_guess=None
- result_document.py: add assert_never after StaticAnalysis/DynamicAnalysis
- binexport2/helpers.py: guard empty operand_expressions with early return
- tests/fixtures.py: restructure kernel32-64.dll_ workaround to single if/else
- remove bytes_rules from _RuleFeatureIndex; bytes_prefix_index is the
only structure needed for candidate selection
- build bytes_prefix_index directly in _index_rules_by_feature() instead
of building bytes_rules then converting, removing one full pass
- add an `if -1 in bytes_prefix_index` guard to avoid temporary object
creation for the short-pattern fallback (almost never taken)
- remove assert isinstance(feature.value, bytes) checks in _match();
add Bytes.value: bytes class-level annotation so mypy narrows the
type without the runtime check
- remove cache structure compatibility block from cache.py per reviewer
request to handle in a separate PR
- update test assertions from bytes_rules to bytes_prefix_index
- Change _match() guard from bytes_rules to bytes_prefix_index
so the guard references the field actually used for candidate selection.
- Update stale comment to describe the prefix-bucket strategy.
- Clarify bytes_rules dataclass comment (retained for logging only).
- Add test_bytes_prefix_index_mixed_short_and_long_patterns covering
rules with both short (<4B) and long (>=4B) patterns exercised together.
Instead of iterating all extracted Bytes features for every bytes-based rule,
build a prefix index keyed by fixed bucket sizes (4, 8, 16, 32, 64, 128, 256)
once per scope evaluation. Each bytes pattern is looked up in the largest
bucket that fits its length, and only candidates sharing that prefix are
compared. This replaces the previous O(n) scan over all extracted features
with an expected O(1) hash lookup plus a comparison against the candidates
in that bucket. Patterns shorter than the minimum bucket still fall back to
the full scan.
Adds a test to verify correctness for exact match, startswith match, mismatch,
and short-bytes cases.
Closes: https://github.com/mandiant/capa/issues/2128
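
The prefix-bucket lookup described above can be sketched roughly as follows.
This is a minimal illustration, not the code in this change: only the bucket
sizes come from the description, while build_prefix_index and match_pattern
are hypothetical names and the (size, prefix) key layout is an assumption.

```python
from collections import defaultdict

# Bucket sizes as listed in the commit message; the rest is illustrative.
BUCKET_SIZES = (4, 8, 16, 32, 64, 128, 256)


def build_prefix_index(extracted):
    """Index each extracted byte string under its prefix at every bucket size it fills."""
    index = defaultdict(set)
    for value in extracted:
        for size in BUCKET_SIZES:
            if len(value) >= size:
                index[(size, value[:size])].add(value)
    return index


def match_pattern(pattern, extracted, index):
    """Return the extracted values that start with `pattern`, sorted."""
    fitting = [size for size in BUCKET_SIZES if size <= len(pattern)]
    if not fitting:
        # Pattern is shorter than the smallest bucket: fall back to the full scan.
        candidates = extracted
    else:
        # Largest bucket that fits the pattern; one hash lookup selects candidates.
        size = fitting[-1]
        candidates = index.get((size, pattern[:size]), ())
    return sorted(v for v in candidates if v.startswith(pattern))
```

The index is built once and reused for every pattern, so the per-pattern cost
depends on the size of one bucket rather than on the number of extracted
features.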
* rules: handle empty or invalid YAML documents in Rule.from_yaml
Empty or whitespace-only .yml files caused a cryptic TypeError in
Rule.from_dict (NoneType not subscriptable) when yaml.load returned None.
This made lint.py abort with a stack trace instead of a clear message.
Add an early guard in Rule.from_yaml that raises InvalidRule with a
descriptive message when the parsed document is None or structurally
invalid. get_rules() now logs a warning and skips such files so that
scripts/lint.py completes cleanly even when placeholder .yml files
exist in the rules/ or rules/nursery/ directories.
Fixes #2900.
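
A rough sketch of the guard-and-skip behaviour described above. It is a
simplification under stated assumptions: InvalidRule here is a local stand-in,
rule_from_parsed_document and load_rules are hypothetical names, the message
wording is invented, and the input is an already-parsed document so that no
YAML library is needed.

```python
class InvalidRule(ValueError):
    """Local stand-in for the InvalidRule exception named in the text."""


def rule_from_parsed_document(doc):
    # `doc` is whatever the YAML loader returned; an empty file yields None.
    if doc is None:
        raise InvalidRule("empty or invalid YAML document")
    # Assumption: a valid document is a mapping with a top-level "rule" key.
    if not isinstance(doc, dict) or "rule" not in doc:
        raise InvalidRule("document is not a mapping with a top-level 'rule' key")
    return doc["rule"]


def load_rules(parsed_documents, log=print):
    """Load every document in a {path: parsed document} mapping, skipping invalid ones."""
    rules = {}
    for path, doc in parsed_documents.items():
        try:
            rules[path] = rule_from_parsed_document(doc)
        except InvalidRule as e:
            # Warn and continue so one placeholder file does not abort the run.
            log(f"skipping {path}: {e}")
    return rules
```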
* changelog: add entry for #2900 empty YAML handling
* rules: fix exception check and add get_rules skip test
- Use e.args[0] instead of str(e) to check the error message.
InvalidRule.__str__ prepends "invalid rule: " so str(e) never
matched the bare message, causing every InvalidRule to be re-raised.
- Add test_get_rules_skips_empty_yaml to cover the get_rules skip path,
confirming that an empty file is warned-and-skipped while a valid
sibling rule is still loaded.
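
The str(e) versus e.args[0] difference can be reproduced in isolation. The
class below is a stand-in that only mimics the prefixing behaviour described
above, and EMPTY_MESSAGE is a hypothetical message string.

```python
class InvalidRule(ValueError):
    """Stand-in exception whose __str__ prepends a prefix, as described above."""

    def __init__(self, msg):
        super().__init__(msg)
        self.msg = msg

    def __str__(self):
        return f"invalid rule: {self.msg}"


EMPTY_MESSAGE = "empty or invalid YAML document"  # hypothetical wording

err = InvalidRule(EMPTY_MESSAGE)

# str(err) carries the prefix, so comparing it to the bare message fails.
matches_via_str = str(err) == EMPTY_MESSAGE

# args[0] holds the message exactly as it was passed to the constructor.
matches_via_args = err.args[0] == EMPTY_MESSAGE
```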
* fix: correct isort import ordering in tests/test_rules.py
Move capa.engine import before capa.rules.cache to satisfy
isort --length-sort ordering.
* loader: skip PE files with unrealistically large section virtual sizes
Some malformed PE samples declare section virtual sizes orders of
magnitude larger than the file itself (e.g. a ~400 KB file with a
900 MB section). vivisect attempts to map these regions, causing
unbounded CPU and memory consumption (see #1989).
Add _is_probably_corrupt_pe() which uses pefile (fast_load=True) to
check whether any section's Misc_VirtualSize exceeds
max(file_size * 128, 512 MB). If the check fires, get_workspace()
raises CorruptFile before vivisect is invoked, keeping the existing
exception handling path consistent.
Thresholds are intentionally conservative to avoid false positives on
large but legitimate binaries. When pefile is unavailable the helper
returns False and behaviour is unchanged.
Fixes #1989.
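
The threshold arithmetic can be sketched without pefile. This is only an
illustration of the max(file_size * 128, 512 MB) rule: has_unrealistic_section
is a hypothetical helper that takes the section sizes as plain integers, and
512 MB is assumed to mean 512 * 1024 * 1024 bytes.

```python
_VSIZE_FILE_RATIO = 128
_MAX_REASONABLE_VSIZE = 512 * 1024 * 1024


def has_unrealistic_section(file_size, section_virtual_sizes):
    """Return True if any section's virtual size exceeds the heuristic limit."""
    limit = max(file_size * _VSIZE_FILE_RATIO, _MAX_REASONABLE_VSIZE)
    # `or 0` tolerates a missing (None) size from a damaged header.
    return any((vsize or 0) > limit for vsize in section_virtual_sizes)
```

For a 400 KB file the ratio term is about 50 MB, so the 512 MB floor decides
the limit and a 900 MB section is flagged. For files larger than 4 MB the
ratio term takes over and the limit grows with the file.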
* changelog: add entry for #1989 corrupt PE large sections
* loader: apply Gemini review improvements
- Extend corrupt-PE check to FORMAT_AUTO so malformed PE files
cannot bypass the guard when format is auto-detected (the helper
returns False for non-PE files so there is no false-positive risk).
- Replace magic literals 128 and 512*1024*1024 with named constants
_VSIZE_FILE_RATIO and _MAX_REASONABLE_VSIZE for clarity.
- Remove redundant int() cast around getattr(Misc_VirtualSize); keep
the `or 0` guard for corrupt files where pefile may return None.
- Extend test to cover FORMAT_AUTO path alongside FORMAT_PE.
* tests: remove mock-only corrupt PE test per maintainer request
williballenthin noted the test doesn't add real value since it only
exercises the mock, not the actual heuristic. Removing it per feedback.
* fix: resolve flake8 NIC002 implicit string concat and add missing test
Fix the implicit string concatenation across multiple lines that caused
code_style CI to fail. Also add the
test_corrupt_pe_with_unrealistic_section_size_short_circuits test that was
described in the PR body but not committed.
Catch envi.exc.SegmentationViolation raised by vivisect when processing
malformed ELF files with invalid relocations and convert it to a
CorruptFile exception with a descriptive message.
Closes #2794
Co-authored-by: Mike Hunhoff <mike.hunhoff@gmail.com>
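
The exception conversion for malformed ELF files follows a common
catch-and-re-raise pattern, sketched here with stand-in classes.
SegmentationViolation and CorruptFile below are local placeholders for the
real exceptions, and load_workspace, analyze and the message wording are
hypothetical.

```python
class SegmentationViolation(Exception):
    """Local placeholder for envi.exc.SegmentationViolation."""


class CorruptFile(ValueError):
    """Local placeholder for the CorruptFile exception named in the text."""


def load_workspace(path, analyze):
    """Run `analyze(path)` and report low-level faults as a corrupt file."""
    try:
        return analyze(path)
    except SegmentationViolation as e:
        # Chain the original error so the low-level cause stays visible.
        raise CorruptFile(f"analysis of {path} failed, the file may be malformed: {e}") from e
```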