Introduces data-driven snapshot tests that regenerate capa freeze files
for a curated set of samples in the tests/data submodule and compare the
bytes against committed fixtures under tests/fixtures/freezes/. Any
change that perturbs feature extraction surfaces as a test failure with
a feature-count delta and a truncated unified diff.
json.JSONDecodeError is a subclass of ValueError, so the broader except ValueError
was shadowing the more specific handler, making it unreachable. Keep only the
specific except json.JSONDecodeError handler.
The entire main() body was indented inside `if argv is None:`, causing
main() to silently return None when called with an explicit argv list.
Closes SURF-90.
When rule.meta lacks a "scopes" key, rule.meta.get("scopes") returns None
and "static"/"dynamic" not in None raises TypeError, crashing lint_rule.
Add isinstance(scopes, dict) guard so both checks return False (no violation)
when scopes is absent, letting MissingScopes report the real problem.
The check was reading rule.meta.get("scope") which no longer exists in the
current schema (replaced by scopes.static/scopes.dynamic), causing the lint
to never fire for function/basic-block rules missing example offsets.
The triple-quoted string at lines 230-239 was never assigned and contained
an incomplete sentence ("The scripts"). Deleted entirely as the surrounding
code is self-explanatory.
Remove capa.helpers, capa.features.basicblock (never referenced), and
redundant bare capa.features.extractors.base_extractor (covered by the
from-import on the next line).
Closes SURF-78
Removes the bare `assert isinstance(meta.analysis, rd.StaticAnalysis)` that
blocked dynamic ResultDocument from being validated, removes the incorrect
direct list comparison in assert_dynamic_analyis, and adds dynamic_a0000a6_rd
to test_doc_to_pb2 so the dynamic proto serialization path is exercised.
RangeStatement.max was compared against itself, so a protobuf round-trip
that corrupted the max field would pass undetected. Fix to compare against
the protobuf counterpart sb.max.
`get_printable_len` returned a float for UTF-16 LE operands due to `/`
instead of `//`, violating the `-> int` annotation and silently
propagating a float into `_bb_has_stackstring`'s accumulator. Aligns
with the IDA extractor equivalent.
Closes SURF-58
The stub always raised NotImplementedError and was not registered in
FUNCTION_HANDLERS, so loop detection was silently skipped for all .NET
samples. Detects backward branches (target offset < instruction offset)
as loops, matching the approach used by other extractors.
Five extractors (ghidra, dnfile, viv, binja, ida) stored Format in
global_features during __init__ and also included extract_file_format
in FILE_HANDLERS. This caused find_file_capabilities to emit the Format
feature twice, inflating feature counts. Removing extract_file_format
from FILE_HANDLERS in all five extractors ensures Format is emitted
once via global_features only.
ARM Thumb-2 has legal 2-operand forms like `add sp, #0x10` and `add r0, #1`.
The previous code asserted exactly 3 operands before checking if operand[1]
was a stack register, causing AssertionError on any 2-operand encoding.
The fix converts the assert to a guard condition so 2-operand instructions
fall through to the for-loop and are processed normally.
DrakvufExtractor.get_call_name always appended ' -> ' even for SystemCall
objects that have no return_value attribute, because f" -> {''}" still
produces the literal ' -> ' string. Conditionally build the suffix only
when a non-empty return value exists.
ConciseModel called ConfigDict(extra="ignore") as a bare expression, discarding the result.
model_config was never set, so the extra="ignore" config was never in effect for any subclass.
Assign the result to model_config as intended.