1070 Commits

Author SHA1 Message Date
Willi Ballenthin 61adf156ee tests: xfail a few known Ghidra analysis failures 2026-05-11 11:14:28 +02:00
Willi Ballenthin a1ff01bc44 fix: Windows path reference in main 2026-05-11 11:14:28 +02:00
Willi Ballenthin a82f4aea88 bump submodules 2026-05-11 11:14:28 +02:00
Willi Ballenthin 9ba497f6f7 idalib: remove custom idalib loading 2026-05-11 11:14:28 +02:00
Willi Ballenthin b5f81e30f0 tests: add negative substring feature test fixture 2026-05-11 11:14:28 +02:00
Willi Ballenthin eb258c719f tests: cleanup tests and fixtures 2026-05-11 11:14:28 +02:00
Willi Ballenthin 2604c91668 fix: lints 2026-05-11 11:14:28 +02:00
Willi Ballenthin 3e2c017dfd tests: ida: better handle stale databases and concurrent access 2026-05-11 11:14:28 +02:00
Willi Ballenthin 018e5b45e5 tests: cleanup tests and fixtures 2026-05-11 11:14:28 +02:00
Willi Ballenthin 251a4e285f tests: consolidate feature test fixtures and runners 2026-05-11 11:14:28 +02:00
Willi Ballenthin 9fd4f8dd74 tests: migrate to data-driven fixtures 2026-05-11 11:14:28 +02:00
Willi Ballenthin 65573944d7 rules: introduce helper to parse features from parts 2026-05-11 11:14:28 +02:00
Willi Ballenthin a28fcce72b fix: linter tests needing placeholder rule sets to function 2026-05-08 17:58:07 +02:00
Willi Ballenthin b505ba7621 fix: remove unused imports and un-suppress F401
closes #2996
2026-05-08 17:58:07 +02:00
Willi Ballenthin 8fca21f808 linter: validate dynamic example offsets
closes #3058
2026-05-08 17:58:07 +02:00
Willi Ballenthin 7a8a0acaa9 fix: remove dead except ValueError clause in capa2sarif.py so JSONDecodeError is caught correctly
json.JSONDecodeError is a subclass of ValueError, so the broader except ValueError
was shadowing the more specific handler, making it unreachable. Keep only the
specific except json.JSONDecodeError handler.
2026-05-08 17:58:07 +02:00
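The shadowing described in 7a8a0acaa9 can be reproduced in isolation. The sketch below is not the code from capa2sarif.py; `classify` and `classify_fixed` are hypothetical names used only to show which handler runs when the broad clause is listed first.

```python
import json


def classify(text: str) -> str:
    """Return the name of the handler that ran (broad clause listed first)."""
    try:
        json.loads(text)
    except ValueError:
        # Also matches json.JSONDecodeError, because it subclasses ValueError.
        return "ValueError handler"
    except json.JSONDecodeError:
        # Never reached: the clause above already matched.
        return "JSONDecodeError handler"
    return "ok"


def classify_fixed(text: str) -> str:
    """Keep only the specific handler, as the commit message describes."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return "JSONDecodeError handler"
    return "ok"
```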
Willi Ballenthin 7d8714098c fix: dedent bulk-process.py main() body so explicit argv is used
The entire main() body was indented inside `if argv is None:`, causing
main() to silently return None when called with an explicit argv list.

Closes SURF-90.
2026-05-08 17:58:07 +02:00
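A minimal sketch of the indentation problem from 7d8714098c, assuming a `main()` that simply counts its arguments. The function names and body are hypothetical stand-ins, not the contents of bulk-process.py.

```python
import sys


def main_buggy(argv=None):
    if argv is None:
        argv = sys.argv[1:]
        # Bug: the rest of the body is nested under the `if`.
        return len(argv)
    # With an explicit argv, control falls off the end and returns None.


def main_fixed(argv=None):
    if argv is None:
        argv = sys.argv[1:]
    # Dedented: runs whether or not argv was supplied by the caller.
    return len(argv)
```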
Willi Ballenthin 861f3b8619 fix: FeatureRegexRegistryControlSetMatchIncomplete checks all Regex features
Dedent `return False` out of the `for` loop body so the method examines
every Regex feature instead of short-circuiting after the first one.
2026-05-08 17:58:07 +02:00
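The short-circuit fixed in 861f3b8619 follows a common pattern. The sketch below uses a hypothetical predicate (`"bad" in pattern`) in place of the real lint condition, which is not shown in the commit message.

```python
def any_flagged_buggy(patterns):
    for pattern in patterns:
        if "bad" in pattern:  # hypothetical stand-in for the real check
            return True
        return False  # bug: inside the loop, so only the first item is examined
    return False


def any_flagged_fixed(patterns):
    for pattern in patterns:
        if "bad" in pattern:
            return True
    return False  # dedented: reached only after every item was examined
```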
Willi Ballenthin bfa09f817b fix: guard MissingStaticScope and MissingDynamicScope against absent scopes dict
When rule.meta lacks a "scopes" key, rule.meta.get("scopes") returns None
and "static"/"dynamic" not in None raises TypeError, crashing lint_rule.
Add isinstance(scopes, dict) guard so both checks return False (no violation)
when scopes is absent, letting MissingScopes report the real problem.
2026-05-08 17:58:07 +02:00
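The crash and the guard from bfa09f817b can be sketched with a plain dict standing in for `rule.meta`. The function names are hypothetical; only the `isinstance(scopes, dict)` guard is taken from the commit message.

```python
def missing_static_buggy(meta: dict) -> bool:
    # meta.get("scopes") is None when the key is absent,
    # and `"static" not in None` raises TypeError.
    return "static" not in meta.get("scopes")


def missing_static_fixed(meta: dict) -> bool:
    scopes = meta.get("scopes")
    if not isinstance(scopes, dict):
        # No violation here; a separate check reports the missing scopes.
        return False
    return "static" not in scopes
```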
Willi Ballenthin c5ae9be3e1 fix: MissingExampleOffset lint reads scopes.static instead of obsolete scope key
The check was reading rule.meta.get("scope") which no longer exists in the
current schema (replaced by scopes.static/scopes.dynamic), causing the lint
to never fire for function/basic-block rules missing example offsets.
2026-05-08 17:58:07 +02:00
Willi Ballenthin 74010ba03f fix: remove dead string literal in test_detect_duplicate_features
The triple-quoted string at lines 230-239 was never assigned and contained
an incomplete sentence ("The scripts"). Deleted entirely as the surrounding
code is self-explanatory.
2026-05-08 17:58:07 +02:00
Willi Ballenthin f93e342e74 fix: remove duplicate Rule.from_yaml call in test_scope_instruction_description 2026-05-08 17:58:07 +02:00
Willi Ballenthin ad538f7ac3 fix: remove unused imports from test_freeze_dynamic.py
Remove capa.helpers, capa.features.basicblock (never referenced), and
redundant bare capa.features.extractors.base_extractor (covered by the
from-import on the next line).

Closes SURF-78
2026-05-08 17:58:07 +02:00
Willi Ballenthin cb1951dd90 fix: correct test_json_meta loop to iterate list of function dicts and use correct serialized address format for matched_basic_blocks assertion 2026-05-08 17:58:07 +02:00
Willi Ballenthin f11c99d0e4 fix: remove unreachable StaticAnalysis assert in assert_meta and cover dynamic proto path
Removes the bare `assert isinstance(meta.analysis, rd.StaticAnalysis)` that
blocked dynamic ResultDocument from being validated, removes the incorrect
direct list comparison in assert_dynamic_analyis, and adds dynamic_a0000a6_rd
to test_doc_to_pb2 so the dynamic proto serialization path is exercised.
2026-05-08 17:58:07 +02:00
Willi Ballenthin 8952151c97 fix: correct self-comparison sa.max == sa.max to sa.max == sb.max in test_proto
RangeStatement.max was compared against itself, so a protobuf round-trip
that corrupted the max field would pass undetected. Fix to compare against
the protobuf counterpart sb.max.
2026-05-08 17:58:07 +02:00
Willi Ballenthin a6dd0faf9f fix: use integer division in get_printable_len for UTF-16 LE operands
`get_printable_len` returned a float for UTF-16 LE operands due to `/`
instead of `//`, violating the `-> int` annotation and silently
propagating a float into `_bb_has_stackstring`'s accumulator. Aligns
with the IDA extractor equivalent.

Closes SURF-58
2026-05-08 17:58:07 +02:00
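The difference between `/` and `//` in a6dd0faf9f is easy to see in isolation. This sketch assumes a hypothetical helper that converts a byte count to a character count at two bytes per character; it is not the body of `get_printable_len`.

```python
def utf16_char_count_buggy(byte_count: int) -> int:
    # True division always yields a float, despite the `-> int` annotation.
    return byte_count / 2


def utf16_char_count_fixed(byte_count: int) -> int:
    # Floor division keeps the result an int.
    return byte_count // 2
```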
Willi Ballenthin 69a1ba862c fix: implement extract_function_loop in dnfile extractor
The stub always raised NotImplementedError and was not registered in
FUNCTION_HANDLERS, so loop detection was silently skipped for all .NET
samples. Detects backward branches (target offset < instruction offset)
as loops, matching the approach used by other extractors.
2026-05-08 17:58:07 +02:00
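The backward-branch heuristic named in 69a1ba862c can be sketched as follows. The instruction model here, a list of `(offset, branch_target)` pairs, is an assumption made for illustration and is not the dnfile extractor's data structure.

```python
from typing import Iterable, Optional, Tuple

# Hypothetical instruction model: (offset, branch target or None).
Insn = Tuple[int, Optional[int]]


def has_loop(insns: Iterable[Insn]) -> bool:
    """Report a loop if any branch targets an offset before the branch itself."""
    return any(
        target is not None and target < offset
        for offset, target in insns
    )
```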
Willi Ballenthin d32492d208 fix: remove extract_file_format from FILE_HANDLERS in five extractors
Five extractors (ghidra, dnfile, viv, binja, ida) stored Format in
global_features during __init__ and also included extract_file_format
in FILE_HANDLERS. This caused find_file_capabilities to emit the Format
feature twice, inflating feature counts. Removing extract_file_format
from FILE_HANDLERS in all five extractors ensures Format is emitted
once via global_features only.
2026-05-08 17:58:07 +02:00
Willi Ballenthin e2c8ab4bff fix: replace assert with guard for 2-operand ARM ADD/SUB instructions
ARM Thumb-2 has legal 2-operand forms like `add sp, #0x10` and `add r0, #1`.
The previous code asserted exactly 3 operands before checking if operand[1]
was a stack register, causing AssertionError on any 2-operand encoding.
The fix converts the assert to a guard condition so 2-operand instructions
fall through to the for-loop and are processed normally.
2026-05-08 17:58:07 +02:00
Willi Ballenthin 723ee16ef7 fix: omit trailing ' -> ' suffix in syscall call names when no return value
DrakvufExtractor.get_call_name always appended ' -> ' even for SystemCall
objects that have no return_value attribute, because f" -> {''}" still
produces the literal ' -> ' string. Conditionally build the suffix only
when a non-empty return value exists.
2026-05-08 17:58:07 +02:00
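A sketch of the suffix problem from 723ee16ef7, assuming a hypothetical helper that takes the call name and an optional return value. It is not the body of `DrakvufExtractor.get_call_name`.

```python
def call_name_buggy(name: str, return_value: str = "") -> str:
    # An empty return value still leaves the literal " -> " in the result.
    return f"{name} -> {return_value}"


def call_name_fixed(name: str, return_value: str = "") -> str:
    # Build the suffix only when there is something to show.
    suffix = f" -> {return_value}" if return_value else ""
    return name + suffix
```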
Willi Ballenthin 1f3a52eea7 fix: assign ConfigDict to model_config in ConciseModel so extra="ignore" is applied
ConciseModel called ConfigDict(extra="ignore") as a bare expression, discarding the result.
model_config was never set, so the extra="ignore" config was never in effect for any subclass.
Assign the result to model_config as intended.
2026-05-08 17:58:07 +02:00
Willi Ballenthin d367622d0c fix: replace assert with isinstance guard in get_callee for invalid MethodSpec tokens
When resolve_dotnet_token returns an InvalidToken (e.g. malformed or
out-of-range MethodSpec table/row index), the assert on line 51 raised
AssertionError instead of gracefully returning None. Replaced the assert
with the isinstance guard pattern already used elsewhere in the same file.
2026-05-08 17:58:07 +02:00
Willi Ballenthin d99ba7d909 fix: correct off-by-one in get_dotnet_table_row so row_index=1 is not rejected
`get_dotnet_table_row` used `if row_index - 1 <= 0` to guard against invalid
indices. Because .NET metadata tables are 1-indexed, row_index=1 is the first
valid row, but the condition is equivalent to `row_index <= 1`, silently
rejecting it and making the first row of every table unreachable.

Changed to `if row_index <= 0`, which correctly rejects only the zero/null
token and leaves all valid rows accessible. Added four unit tests against the
real dd9098ff91717f4906afe9dafdfa2f52.exe_ sample to verify the guard
boundary: row_index=1 returns the first row, row_index=0 returns None, all
row indices 1..N succeed, and an out-of-bounds index returns None.
2026-05-08 17:58:07 +02:00
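The guard boundary described in d99ba7d909 can be checked with a plain list standing in for a metadata table. The function names and the list-based table are assumptions; only the two guard conditions come from the commit message.

```python
def get_row_buggy(rows, row_index):
    if row_index - 1 <= 0:  # equivalent to row_index <= 1
        return None
    try:
        return rows[row_index - 1]
    except IndexError:
        return None


def get_row_fixed(rows, row_index):
    if row_index <= 0:  # rejects only the zero/null index
        return None
    try:
        return rows[row_index - 1]  # tables are 1-indexed
    except IndexError:
        return None
```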
Willi Ballenthin 1b6c26fc35 fix: stop mutating call.api in cape thread.get_calls
`get_calls` iterated `generate_symbols` and overwrote `call.api` with
each generated symbol name, then yielded a `CallHandle` wrapping the
same `call` object. Because the Pydantic model is shared by reference,
every previously-yielded handle ended up with `api` equal to the last
symbol generated in the final iteration.

The correct pattern (used in `call.py:61`) is to leave the model
untouched and let the call extractor expand symbol variants via
`generate_symbols`. `get_calls` now yields exactly one `CallHandle`
per call with the original `api` value preserved.
2026-05-08 17:58:07 +02:00
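The aliasing effect described in 1b6c26fc35 can be sketched with a small dataclass in place of the Pydantic model. `Call`, `variants`, and both generator functions are hypothetical; `variants` is only a stand-in for `generate_symbols` and does not reproduce its behavior.

```python
from dataclasses import dataclass


@dataclass
class Call:
    api: str


def variants(name: str):
    # Hypothetical stand-in for generate_symbols.
    yield name
    yield name + "W"


def get_calls_buggy(call: Call):
    for symbol in variants(call.api):
        call.api = symbol  # mutates the shared object
        yield call         # every yielded handle is the same object


def get_calls_fixed(call: Call):
    # Leave the model untouched and yield it exactly once.
    yield call
```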
Willi Ballenthin d1038e51f3 fix: use instruction_indices in is_security_cookie for single-instruction basic blocks
`is_security_cookie` computes the last address in a terminal basic block by
iterating `bb.instruction_index` and indexing `ir.end_index - 1`. The BinExport2
protobuf spec omits `end_index` for single-element ranges, so protobuf returns 0
as the default. `0 - 1 = -1`, and -1 is not a key in `insn_address_by_index`,
raising `KeyError`.

Use `BinExport2Index.instruction_indices` to enumerate instruction indices, which
already handles single-instruction ranges via `HasField("end_index")`.
2026-05-08 17:58:07 +02:00
Willi Ballenthin 197a84d267 fix: guard get_operand_expressions against empty expression tree
`_build_expression_tree` already returns `[]` for the Ghidra bug where
an operand has no expressions (see
https://github.com/NationalSecurityAgency/ghidra/issues/6817), but
`get_operand_expressions` then called the recursive walker
unconditionally with `tree_index=0`, which indexed into the empty list
and raised `IndexError`. Add an early-return guard so callers receive
`[]` instead.
2026-05-08 17:58:07 +02:00
Willi Ballenthin 5d43fc8fe3 fix: add return after zero-offset yield in extract_insn_offset_features
`extract_insn_offset_features` in the x86/x64 BinExport2 extractor handled
zero-offset patterns (e.g. `mov [reg], reg`) in a nested branch but was
missing a `return` after yielding `Offset(0)` and `OperandOffset(0)`.
Execution then fell through to the general `mask_immediate` path, which read
`immediate` from the last-matched expression node (a register, not an
integer). Since that field defaults to 0, the function emitted duplicate
`Offset(0)` and `OperandOffset(0)` features for every such instruction.

Fix: add `return` after the two yields in the zero-pattern branch.

Tests: add `FEATURE_COUNT_TESTS_BE2_INTEL` covering `MOV [EDI], CX` at
0x401125 in mimikatz, asserting each of `Offset(0)` and `OperandOffset(1,0)`
is emitted exactly once.
2026-05-08 17:58:07 +02:00
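The fall-through fixed in 5d43fc8fe3 is a general property of Python generators: a `yield` does not end the function. The sketch below uses hypothetical parameters and yields bare integers rather than `Offset` features.

```python
def offsets_buggy(is_zero_pattern: bool, immediate: int = 0):
    if is_zero_pattern:
        yield 0
        # Missing `return`: execution continues into the general path.
    yield immediate


def offsets_fixed(is_zero_pattern: bool, immediate: int = 0):
    if is_zero_pattern:
        yield 0
        return  # stop here so the general path does not emit a duplicate
    yield immediate
```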
Willi Ballenthin 06061311fb fix: correct scale/displacement in get_operand_phrase_info 5-expression branch
In the 5-expression branch of get_operand_phrase_info, two cases used
expression3 (the OPERATOR node) where expression4 (the IMMEDIATE_INT
value) was intended:

- Base + (Index * Scale): scale was expression3 ('*' operator), should be expression4 (the numeric scale value)
- (Index * Scale) + Displacement: displacement was expression3 ('+' operator), should be expression4 (the numeric displacement value)

Tests added using mimikatz.exe_.ghidra.BinExport:
- 0x40194d: MOVZX DI, [EAX + ECX * 1] — verifies scale=1 as IMMEDIATE_INT
- 0x401fd4: JMP [EAX * 4 + switchdataD_00402017] — verifies displacement=4202519 as IMMEDIATE_INT
2026-05-08 17:58:07 +02:00
Willi Ballenthin 47d418b7de fix: use HasField to check call-graph edge vertex indices
`_index_vertex_edges` used truthiness checks (`if not edge.source_vertex_index`)
to skip edges with absent fields, but this also silently drops any edge whose
source or target is vertex index 0 — a valid vertex. Both fields are protobuf
optional integers, so the correct absent-field check is `HasField()`, consistent
with `_index_flow_graph_edges` and `_index_call_graph_vertices` in the same file.

In mimikatz.exe_.ida.BinExport, vertex 0 at 0x401000 has 2 callees and 1 caller
that were all being silently discarded.
2026-05-08 17:58:07 +02:00
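The truthiness pitfall from 47d418b7de can be illustrated without protobuf. This sketch models an absent field as `None`, which is an assumption made for illustration; the actual fix uses `HasField()` on the protobuf message.

```python
from typing import List, Optional, Tuple

Edge = Tuple[Optional[int], Optional[int]]


def keep_edges_buggy(edges: List[Edge]) -> List[Edge]:
    kept = []
    for source, target in edges:
        if not source or not target:
            continue  # skips absent fields, but also the valid index 0
        kept.append((source, target))
    return kept


def keep_edges_fixed(edges: List[Edge]) -> List[Edge]:
    kept = []
    for source, target in edges:
        if source is None or target is None:
            continue  # skips only edges with an absent endpoint
        kept.append((source, target))
    return kept
```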
Willi Ballenthin fdd571eaed fix: close file handle in get_file_taste using a with statement
`get_file_taste` opened a file handle with `sample_path.open("rb").read(8)`,
discarding the file object without explicitly closing it. CPython reference-
counting closes it promptly in practice, but other implementations (PyPy,
Jython) and CPython under GC pressure may defer closure. Use a `with` statement
to guarantee the handle is released immediately after reading.
2026-05-08 17:58:07 +02:00
Willi Ballenthin eb81901d71 fix: correct capa/subscope-rule key in RuleMetadata.from_capa
`RuleMetadata.from_capa` used `rule.meta.get("capa/subscope", False)` and
`Field(False, alias="capa/subscope")`, but the actual key set by
`_extract_subscope_rules_rec` is `"capa/subscope-rule"`. This caused
`is_subscope_rule` to always be `False` in every `RuleMetadata` instance,
making downstream filters in `render/utils.py`, `render/vverbose.py`, and
`scripts/import-to-ida.py` ineffective (though subscope rules are already
excluded from `ResultDocument` before reaching those callers).
2026-05-08 17:58:07 +02:00
Willi Ballenthin 1ef6298b45 fix: Scopes.from_dict uses cls instead of self
`Scopes.from_dict` was decorated with `@classmethod` but named its first
parameter `self` instead of `cls`, and hard-coded `Scopes(...)` in the
return statement instead of `cls(...)`. This meant any subclass calling
`SubScopes.from_dict(...)` would get a `Scopes` instance back rather than
a `SubScopes` instance.

Rename the parameter to `cls` and use it in the return statement so
that subclasses receive the correct type.
2026-05-08 17:58:07 +02:00
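The subclass behavior described in 1ef6298b45 can be sketched with simplified classes. The constructor signature and the `from_dict_buggy` name are assumptions; only the `cls` versus hard-coded `Scopes(...)` distinction comes from the commit message.

```python
class Scopes:
    def __init__(self, static=None, dynamic=None):
        self.static = static
        self.dynamic = dynamic

    @classmethod
    def from_dict_buggy(self, scopes: dict):
        # Misnamed first parameter and hard-coded class name.
        return Scopes(**scopes)

    @classmethod
    def from_dict(cls, scopes: dict):
        # Uses cls, so subclasses get an instance of their own type.
        return cls(**scopes)


class SubScopes(Scopes):
    pass
```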
Willi Ballenthin 316aeaf8e5 fix: remove unreachable backports.functools_lru_cache fallback
`functools.lru_cache` has been in the standard library since Python 3.2.
The project requires Python >=3.10, so the `except ImportError` branch
importing `backports.functools_lru_cache` can never execute.

Remove the try/except block and keep only the direct stdlib import.
Also remove `types-backports` from dev dependencies, `backports` from
`[tool.deptry.known_first_party]`, and `types-backports` from the
DEP002 ignore list in `pyproject.toml`.
2026-05-08 17:58:07 +02:00
Willi Ballenthin d9402d8041 fix: add missing ELF branch in get_format_from_extension for .elf_ files
EXTENSIONS_ELF = "elf_" was defined but never used: get_format_from_extension
had branches for every other EXTENSIONS_* constant except ELF. Since .elf_
files are real test fixtures and a recognised input format, the fix is to add
the missing elif branch (and import FORMAT_ELF) rather than delete the
constant.

Closes #3031
2026-05-08 17:58:07 +02:00
Willi Ballenthin b9f830619d update submodules 2026-04-23 18:04:10 +03:00
Willi Ballenthin e745fa6aab style: ruff format changed files 2026-04-23 18:04:10 +03:00
Willi Ballenthin aa9f09db89 fix: render_default always returns empty string
Closes #3012
2026-04-23 18:04:10 +03:00
Willi Ballenthin a5082beed0 fix: remove unused gzip import in test_helpers.py 2026-04-23 18:04:10 +03:00
Willi Ballenthin f6f3380fd3 fix: EXTENSIONS_DYNAMIC has inconsistent leading dots
Closes #3028
2026-04-23 18:04:10 +03:00