json.JSONDecodeError is a subclass of ValueError, so the broader except ValueError
was shadowing the more specific handler, making it unreachable. Keep only the
specific except json.JSONDecodeError handler.
The entire main() body was indented inside `if argv is None:`, causing
main() to silently return None when called with an explicit argv list.
Closes SURF-90.
When rule.meta lacks a "scopes" key, rule.meta.get("scopes") returns None
and "static"/"dynamic" not in None raises TypeError, crashing lint_rule.
Add isinstance(scopes, dict) guard so both checks return False (no violation)
when scopes is absent, letting MissingScopes report the real problem.
The check was reading rule.meta.get("scope") which no longer exists in the
current schema (replaced by scopes.static/scopes.dynamic), causing the lint
to never fire for function/basic-block rules missing example offsets.
The triple-quoted string at lines 230-239 was never assigned and contained
an incomplete sentence ("The scripts"). Deleted entirely as the surrounding
code is self-explanatory.
Remove capa.helpers, capa.features.basicblock (never referenced), and
redundant bare capa.features.extractors.base_extractor (covered by the
from-import on the next line).
Closes SURF-78
Removes the bare `assert isinstance(meta.analysis, rd.StaticAnalysis)` that
blocked dynamic ResultDocument from being validated, removes the incorrect
direct list comparison in assert_dynamic_analyis, and adds dynamic_a0000a6_rd
to test_doc_to_pb2 so the dynamic proto serialization path is exercised.
RangeStatement.max was compared against itself, so a protobuf round-trip
that corrupted the max field would pass undetected. Fix to compare against
the protobuf counterpart sb.max.
`get_printable_len` returned a float for UTF-16 LE operands due to `/`
instead of `//`, violating the `-> int` annotation and silently
propagating a float into `_bb_has_stackstring`'s accumulator. Aligns
with the IDA extractor equivalent.
Closes SURF-58
The stub always raised NotImplementedError and was not registered in
FUNCTION_HANDLERS, so loop detection was silently skipped for all .NET
samples. Detects backward branches (target offset < instruction offset)
as loops, matching the approach used by other extractors.
Five extractors (ghidra, dnfile, viv, binja, ida) stored Format in
global_features during __init__ and also included extract_file_format
in FILE_HANDLERS. This caused find_file_capabilities to emit the Format
feature twice, inflating feature counts. Removing extract_file_format
from FILE_HANDLERS in all five extractors ensures Format is emitted
once via global_features only.
ARM Thumb-2 has legal 2-operand forms like `add sp, #0x10` and `add r0, #1`.
The previous code asserted exactly 3 operands before checking if operand[1]
was a stack register, causing AssertionError on any 2-operand encoding.
The fix converts the assert to a guard condition so 2-operand instructions
fall through to the for-loop and are processed normally.
DrakvufExtractor.get_call_name always appended ' -> ' even for SystemCall
objects that have no return_value attribute, because f" -> {''}" still
produces the literal ' -> ' string. Conditionally build the suffix only
when a non-empty return value exists.
ConciseModel called ConfigDict(extra="ignore") as a bare expression, discarding the result.
model_config was never set, so the extra="ignore" config was never in effect for any subclass.
Assign the result to model_config as intended.
When resolve_dotnet_token returns an InvalidToken (e.g. malformed or
out-of-range MethodSpec table/row index), the assert on line 51 raised
AssertionError instead of gracefully returning None. Replaced the assert
with the isinstance guard pattern already used elsewhere in the same file.
`get_dotnet_table_row` used `if row_index - 1 <= 0` to guard against invalid
indices. Because .NET metadata tables are 1-indexed, row_index=1 is the first
valid row, but the condition is equivalent to `row_index <= 1`, silently
rejecting it and making the first row of every table unreachable.
Changed to `if row_index <= 0`, which correctly rejects only the zero/null
token and leaves all valid rows accessible. Added four unit tests against the
real dd9098ff91717f4906afe9dafdfa2f52.exe_ sample to verify the guard
boundary: row_index=1 returns the first row, row_index=0 returns None, all
row indices 1..N succeed, and an out-of-bounds index returns None.
`get_calls` iterated `generate_symbols` and overwrote `call.api` with
each generated symbol name, then yielded a `CallHandle` wrapping the
same `call` object. Because the Pydantic model is shared by reference,
every previously-yielded handle ended up with `api` equal to the last
symbol generated in the final iteration.
The correct pattern (used in `call.py:61`) is to leave the model
untouched and let the call extractor expand symbol variants via
`generate_symbols`. `get_calls` now yields exactly one `CallHandle`
per call with the original `api` value preserved.
`is_security_cookie` computes the last address in a terminal basic block by
iterating `bb.instruction_index` and indexing `ir.end_index - 1`. The BinExport2
protobuf spec omits `end_index` for single-element ranges, so protobuf returns 0
as the default. `0 - 1 = -1`, and -1 is not a key in `insn_address_by_index`,
raising `KeyError`.
Use `BinExport2Index.instruction_indices` to enumerate instruction indices, which
already handles single-instruction ranges via `HasField("end_index")`.
`_build_expression_tree` already returns `[]` for the Ghidra bug where
an operand has no expressions (see
https://github.com/NationalSecurityAgency/ghidra/issues/6817), but
`get_operand_expressions` then called the recursive walker
unconditionally with `tree_index=0`, which indexed into the empty list
and raised `IndexError`. Add an early-return guard so callers receive
`[]` instead.
`extract_insn_offset_features` in the x86/x64 BinExport2 extractor handled
zero-offset patterns (e.g. `mov [reg], reg`) in a nested branch but was
missing a `return` after yielding `Offset(0)` and `OperandOffset(0)`.
Execution then fell through to the general `mask_immediate` path, which read
`immediate` from the last-matched expression node (a register, not an
integer). Since that field defaults to 0, the function emitted duplicate
`Offset(0)` and `OperandOffset(0)` features for every such instruction.
Fix: add `return` after the two yields in the zero-pattern branch.
Tests: add `FEATURE_COUNT_TESTS_BE2_INTEL` covering `MOV [EDI], CX` at
0x401125 in mimikatz, asserting each of `Offset(0)` and `OperandOffset(1,0)`
is emitted exactly once.
In the 5-expression branch of get_operand_phrase_info, two cases used
expression3 (the OPERATOR node) where expression4 (the IMMEDIATE_INT
value) was intended:
- Base + (Index * Scale): scale was expression3 ('*' operator), should be expression4 (the numeric scale value)
- (Index * Scale) + Displacement: displacement was expression3 ('+' operator), should be expression4 (the numeric displacement value)
Tests added using mimikatz.exe_.ghidra.BinExport:
- 0x40194d: MOVZX DI, [EAX + ECX * 1] — verifies scale=1 as IMMEDIATE_INT
- 0x401fd4: JMP [EAX * 4 + switchdataD_00402017] — verifies displacement=4202519 as IMMEDIATE_INT
`_index_vertex_edges` used truthiness checks (`if not edge.source_vertex_index`)
to skip edges with absent fields, but this also silently drops any edge whose
source or target is vertex index 0 — a valid vertex. Both fields are protobuf
optional integers, so the correct absent-field check is `HasField()`, consistent
with `_index_flow_graph_edges` and `_index_call_graph_vertices` in the same file.
In mimikatz.exe_.ida.BinExport, vertex 0 at 0x401000 has 2 callees and 1 caller
that were all being silently discarded.
`get_file_taste` opened a file handle with `sample_path.open("rb").read(8)`,
discarding the file object without explicitly closing it. CPython reference-
counting closes it promptly in practice, but other implementations (PyPy,
Jython) and CPython under GC pressure may defer closure. Use a `with` statement
to guarantee the handle is released immediately after reading.
`RuleMetadata.from_capa` used `rule.meta.get("capa/subscope", False)` and
`Field(False, alias="capa/subscope")`, but the actual key set by
`_extract_subscope_rules_rec` is `"capa/subscope-rule"`. This caused
`is_subscope_rule` to always be `False` in every `RuleMetadata` instance,
making downstream filters in `render/utils.py`, `render/vverbose.py`, and
`scripts/import-to-ida.py` ineffective (though subscope rules are already
excluded from `ResultDocument` before reaching those callers).
`Scopes.from_dict` was decorated with `@classmethod` but named its first
parameter `self` instead of `cls`, and hard-coded `Scopes(...)` in the
return statement instead of `cls(...)`. This meant any subclass calling
`SubScopes.from_dict(...)` would get a `Scopes` instance back rather than
a `SubScopes` instance.
Rename the parameter to `cls` and use it in the return statement so
that subclasses receive the correct type.
`functools.lru_cache` has been in the standard library since Python 3.2.
The project requires Python >=3.10, so the `except ImportError` branch
importing `backports.functools_lru_cache` can never execute.
Remove the try/except block and keep only the direct stdlib import.
Also remove `types-backports` from dev dependencies, `backports` from
`[tool.deptry.known_first_party]`, and `types-backports` from the
DEP002 ignore list in `pyproject.toml`.
EXTENSIONS_ELF = "elf_" was defined but never used: get_format_from_extension
had branches for every other EXTENSIONS_* constant except ELF. Since .elf_
files are real test fixtures and a recognised input format, the fix is to add
the missing elif branch (and import FORMAT_ELF) rather than delete the
constant.
Closes#3031