* rules: handle empty or invalid YAML documents in Rule.from_yaml
Empty or whitespace-only .yml files caused a cryptic TypeError in
Rule.from_dict (NoneType not subscriptable) when yaml.load returned None.
This made lint.py abort with a stack trace instead of a clear message.
Add an early guard in Rule.from_yaml that raises InvalidRule with a
descriptive message when the parsed document is None or structurally
invalid. get_rules() now logs a warning and skips such files so that
scripts/lint.py completes cleanly even when placeholder .yml files
exist in the rules/ or rules/nursery/ directories.
Fixes #2900.
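A minimal sketch of the early guard, for illustration only. The helper
name ensure_rule_document and the exact messages are hypothetical, the
InvalidRule class here is a stand-in, and "structurally invalid" is
assumed to mean "not a mapping with a top-level rule key". The function
takes the value the YAML loader already returned, so it runs without a
YAML library.

```python
class InvalidRule(ValueError):
    """Stand-in for the project's InvalidRule exception."""


def ensure_rule_document(doc):
    """Reject parsed YAML that cannot be a rule; return it unchanged otherwise."""
    # An empty or whitespace-only file parses to None.
    if doc is None:
        raise InvalidRule("empty YAML document")
    # Assumption: a rule is a mapping with a top-level "rule" key.
    if not isinstance(doc, dict) or "rule" not in doc:
        raise InvalidRule("YAML document is not a rule mapping")
    return doc
```

Failing here, before any field is indexed, is what replaces the
NoneType TypeError with a message that names the actual problem.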
* changelog: add entry for #2900 empty YAML handling
* rules: fix exception check and add get_rules skip test
- Use e.args[0] instead of str(e) to check the error message.
InvalidRule.__str__ prepends "invalid rule: " so str(e) never
matched the bare message, causing every InvalidRule to be re-raised.
- Add test_get_rules_skips_empty_yaml to cover the get_rules skip path,
confirming that an empty file is warned-and-skipped while a valid
sibling rule is still loaded.
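The str(e) mismatch and the skip path can be reproduced with a small
stand-in. EMPTY_DOC_MESSAGE, parse_rule and load_rules are hypothetical
names, and load_rules takes already-parsed documents instead of reading
files; only the "invalid rule: " prefix is taken from the description
above.

```python
import logging

logger = logging.getLogger(__name__)

EMPTY_DOC_MESSAGE = "empty YAML document"  # hypothetical message text


class InvalidRule(ValueError):
    """Stand-in whose __str__ prepends a prefix, as described above."""

    def __str__(self):
        return f"invalid rule: {self.args[0]}"


def parse_rule(doc):
    # None is what a YAML loader returns for an empty file.
    if doc is None:
        raise InvalidRule(EMPTY_DOC_MESSAGE)
    if not isinstance(doc, dict):
        raise InvalidRule("not a mapping")
    return doc


def load_rules(parsed_docs):
    """Return parsed rules, skipping empty documents with a warning."""
    rules = []
    for path, doc in parsed_docs.items():
        try:
            rules.append(parse_rule(doc))
        except InvalidRule as e:
            # Compare the bare message; str(e) carries the prefix.
            if e.args[0] != EMPTY_DOC_MESSAGE:
                raise
            logger.warning("skipping %s: %s", path, e)
    return rules
```

Comparing str(e) against EMPTY_DOC_MESSAGE in the except branch would
never match, so the empty file would be re-raised instead of skipped.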
* fix: correct isort import ordering in tests/test_rules.py
Move capa.engine import before capa.rules.cache to satisfy
isort --length-sort ordering.
* loader: skip PE files with unrealistically large section virtual sizes
Some malformed PE samples declare section virtual sizes orders of
magnitude larger than the file itself (e.g. a ~400 KB file with a
900 MB section). vivisect attempts to map these regions, causing
unbounded CPU and memory consumption (see #1989).
Add _is_probably_corrupt_pe() which uses pefile (fast_load=True) to
check whether any section's Misc_VirtualSize exceeds
max(file_size * 128, 512 MB). If the check fires, get_workspace()
raises CorruptFile before vivisect is invoked, keeping the existing
exception handling path consistent.
Thresholds are intentionally conservative to avoid false positives on
large but legitimate binaries. When pefile is unavailable the helper
returns False and behaviour is unchanged.
Fixes #1989.
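The threshold arithmetic can be sketched without pefile. The function
name has_unrealistic_section is hypothetical and it takes plain integers
rather than parsed section headers; the two constants use the names
introduced later in this series.

```python
_VSIZE_FILE_RATIO = 128
_MAX_REASONABLE_VSIZE = 512 * 1024 * 1024  # 512 MB


def has_unrealistic_section(file_size, virtual_sizes):
    """Return True if any section virtual size exceeds the threshold."""
    # The limit scales with the file but never drops below 512 MB.
    limit = max(file_size * _VSIZE_FILE_RATIO, _MAX_REASONABLE_VSIZE)
    # `or 0` treats a missing (None) virtual size as zero.
    return any((vsize or 0) > limit for vsize in virtual_sizes)
```

For the ~400 KB example, 400 KB * 128 is 50 MB, so the 512 MB floor
applies and a 900 MB section trips the check. A 16 MB file raises the
limit to 2 GB, so the same 900 MB section would pass.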
* changelog: add entry for #1989 corrupt PE large sections
* loader: apply Gemini review improvements
- Extend corrupt-PE check to FORMAT_AUTO so malformed PE files
cannot bypass the guard when format is auto-detected (the helper
returns False for non-PE files so there is no false-positive risk).
- Replace magic literals 128 and 512*1024*1024 with named constants
_VSIZE_FILE_RATIO and _MAX_REASONABLE_VSIZE for clarity.
- Remove redundant int() cast around getattr(Misc_VirtualSize); keep
the `or 0` guard for corrupt files where pefile may return None.
- Extend test to cover FORMAT_AUTO path alongside FORMAT_PE.
* tests: remove mock-only corrupt PE test per maintainer request
williballenthin noted the test doesn't add real value since it only
exercises the mock, not the actual heuristic. Removing it per feedback.
* fix: resolve flake8 NIC002 implicit string concat and add missing test
Fix the implicit string concatenation across multiple lines that caused
code_style CI to fail. Also add the test_corrupt_pe_with_unrealistic_section_size_short_circuits
test that was described in the PR body but not committed.
Catch envi.exc.SegmentationViolation raised by vivisect when processing
malformed ELF files with invalid relocations and convert it to a
CorruptFile exception with a descriptive message.
Closes #2794
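The SegmentationViolation conversion follows the usual exception
translation pattern. In this sketch both exception classes are
stand-ins (the real ones live in envi.exc and in the project), and
analyze_or_raise_corrupt and its message text are hypothetical.

```python
class SegmentationViolation(Exception):
    """Stand-in for envi.exc.SegmentationViolation."""


class CorruptFile(ValueError):
    """Stand-in for the project's CorruptFile exception."""


def analyze_or_raise_corrupt(analyze, path):
    """Run analyze(path), translating a low-level fault into CorruptFile."""
    try:
        return analyze(path)
    except SegmentationViolation as e:
        # Chain with `from e` so the original fault stays in the traceback.
        raise CorruptFile(f"{path}: invalid memory access during analysis") from e
```

Callers then handle a single CorruptFile type regardless of which
backend error triggered it.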
Co-authored-by: Mike Hunhoff <mike.hunhoff@gmail.com>