Commit Graph

1081 Commits

Author SHA1 Message Date
Ange Albertini 7962d97b9a Better test 2026-05-28 14:00:41 +00:00
Ange Albertini 61c24ebcbb RelativeVirtualAddress deprecation warning 2026-05-28 13:09:53 +00:00
Capa Bot 54da63ef2b Sync capa-testfiles submodule 2026-05-20 18:37:49 +00:00
Capa Bot 7fea0cebcb Sync capa-testfiles submodule 2026-05-20 10:08:27 +00:00
Capa Bot 0f1e0a28f5 Sync capa-testfiles submodule 2026-05-20 09:13:46 +00:00
Capa Bot 49bf8315cd Sync capa-testfiles submodule 2026-05-20 08:23:02 +00:00
Capa Bot 8572bd63e9 Sync capa-testfiles submodule 2026-05-20 08:10:43 +00:00
Capa Bot d9014d055e Sync capa-testfiles submodule 2026-05-20 07:49:30 +00:00
Mike Hunhoff a98fd8240e fix duplicate rule candidate evaluation in optimized matching engine (#3080)
* fix duplicate rule candidate evaluation in optimized matching engine

* update CHANGELOG

* update comments
2026-05-18 17:40:55 -06:00
Mike Hunhoff db0e1536ce incorrect bytes() constructor usage in buf_filled_with (#3077) 2026-05-16 13:14:24 +02:00
Capa Bot 4618822884 Sync capa-testfiles submodule 2026-05-13 17:50:02 +00:00
Willi Ballenthin 61adf156ee tests: xfail a few known Ghidra analysis failures 2026-05-11 11:14:28 +02:00
Willi Ballenthin a1ff01bc44 fix: Windows path reference in main 2026-05-11 11:14:28 +02:00
Willi Ballenthin a82f4aea88 bump submodules 2026-05-11 11:14:28 +02:00
Willi Ballenthin 9ba497f6f7 idalib: remove custom idalib loading 2026-05-11 11:14:28 +02:00
Willi Ballenthin b5f81e30f0 tests: add negative substring feature test fixture 2026-05-11 11:14:28 +02:00
Willi Ballenthin eb258c719f tests: cleanup tests and fixtures 2026-05-11 11:14:28 +02:00
Willi Ballenthin 2604c91668 fix: lints 2026-05-11 11:14:28 +02:00
Willi Ballenthin 3e2c017dfd tests: ida: better handle stale databases and concurrent access 2026-05-11 11:14:28 +02:00
Willi Ballenthin 018e5b45e5 tests: cleanup tests and fixtures 2026-05-11 11:14:28 +02:00
Willi Ballenthin 251a4e285f tests: consolidate feature test fixtures and runners 2026-05-11 11:14:28 +02:00
Willi Ballenthin 9fd4f8dd74 tests: migrate to data-driven fixtures 2026-05-11 11:14:28 +02:00
Willi Ballenthin 65573944d7 rules: introduce helper to parse features from parts 2026-05-11 11:14:28 +02:00
Willi Ballenthin a28fcce72b fix: linter tests needing placeholder rule sets to function 2026-05-08 17:58:07 +02:00
Willi Ballenthin b505ba7621 fix: remove unused imports and un-suppress F401
closes #2996
2026-05-08 17:58:07 +02:00
Willi Ballenthin 8fca21f808 linter: validate dynamic example offsets
closes #3058
2026-05-08 17:58:07 +02:00
Willi Ballenthin 7a8a0acaa9 fix: remove dead except ValueError clause in capa2sarif.py so JSONDecodeError is caught correctly
json.JSONDecodeError is a subclass of ValueError, so the broader except ValueError
was shadowing the more specific handler, making it unreachable. Keep only the
specific except json.JSONDecodeError handler.
2026-05-08 17:58:07 +02:00
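The shadowing described in the commit above follows from Python trying `except` clauses in order and taking the first match. A minimal sketch, assuming two hypothetical functions (`parse_shadowed`, `parse_specific`) that are not taken from capa2sarif.py:

```python
import json


def parse_shadowed(text):
    """Hypothetical: the broad handler is listed first, so it always wins."""
    try:
        return json.loads(text)
    except ValueError:
        return "ValueError handler"
    except json.JSONDecodeError:  # unreachable: JSONDecodeError subclasses ValueError
        return "JSONDecodeError handler"


def parse_specific(text):
    """Hypothetical: only the specific handler is kept."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return "JSONDecodeError handler"
```

With malformed input such as `"{"`, the first function reports the `ValueError` handler and the second reports the `JSONDecodeError` handler.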
Willi Ballenthin 7d8714098c fix: dedent bulk-process.py main() body so explicit argv is used
The entire main() body was indented inside `if argv is None:`, causing
main() to silently return None when called with an explicit argv list.

Closes SURF-90.
2026-05-08 17:58:07 +02:00
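The indentation problem in the commit above can be reproduced in a few lines. This is an illustrative sketch only; `main_buggy` and `main_fixed` are hypothetical names and the body is a placeholder, not the real bulk-process.py logic:

```python
def main_buggy(argv=None):
    if argv is None:
        argv = ["default"]
        # Bug: the remaining body sits inside the `if`, so it only runs
        # when no explicit argv was passed.
        return len(argv)


def main_fixed(argv=None):
    if argv is None:
        argv = ["default"]
    # Dedented: runs for both the default and an explicit argv.
    return len(argv)
```

Calling `main_buggy(["a", "b"])` skips the whole body and falls off the end of the function, which is why it returns `None`.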
Willi Ballenthin 861f3b8619 fix: FeatureRegexRegistryControlSetMatchIncomplete checks all Regex features
Dedent `return False` out of the `for` loop body so the method examines
every Regex feature instead of short-circuiting after the first one.
2026-05-08 17:58:07 +02:00
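The effect of a `return False` placed inside the loop body, as in the commit above, can be sketched as follows. The predicate `is_incomplete` and both check functions are hypothetical stand-ins, not the linter's actual code:

```python
def is_incomplete(pattern):
    # Hypothetical predicate used only for this illustration.
    return pattern.startswith("bad")


def check_buggy(patterns):
    for pattern in patterns:
        if is_incomplete(pattern):
            return True
        return False  # bug: exits after examining only the first pattern
    return False


def check_fixed(patterns):
    for pattern in patterns:
        if is_incomplete(pattern):
            return True
    return False  # reached only after every pattern was examined
```

For the input `["ok", "bad"]`, the buggy version stops at `"ok"` and never sees the second entry.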
Willi Ballenthin bfa09f817b fix: guard MissingStaticScope and MissingDynamicScope against absent scopes dict
When rule.meta lacks a "scopes" key, rule.meta.get("scopes") returns None
and "static"/"dynamic" not in None raises TypeError, crashing lint_rule.
Add isinstance(scopes, dict) guard so both checks return False (no violation)
when scopes is absent, letting MissingScopes report the real problem.
2026-05-08 17:58:07 +02:00
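The `TypeError` mentioned in the commit above comes from applying `in` to `None`. A minimal sketch of the guard, using a plain dict in place of `rule.meta` and hypothetical function names:

```python
def missing_static_scope_unguarded(meta):
    scopes = meta.get("scopes")
    # Raises TypeError when scopes is None.
    return "static" not in scopes


def missing_static_scope_guarded(meta):
    scopes = meta.get("scopes")
    if not isinstance(scopes, dict):
        # No scopes dict at all: report no violation here and leave it
        # to a separate check to flag the missing key.
        return False
    return "static" not in scopes
```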
Willi Ballenthin c5ae9be3e1 fix: MissingExampleOffset lint reads scopes.static instead of obsolete scope key
The check was reading rule.meta.get("scope") which no longer exists in the
current schema (replaced by scopes.static/scopes.dynamic), causing the lint
to never fire for function/basic-block rules missing example offsets.
2026-05-08 17:58:07 +02:00
Willi Ballenthin 74010ba03f fix: remove dead string literal in test_detect_duplicate_features
The triple-quoted string at lines 230-239 was never assigned and contained
an incomplete sentence ("The scripts"). Deleted entirely as the surrounding
code is self-explanatory.
2026-05-08 17:58:07 +02:00
Willi Ballenthin f93e342e74 fix: remove duplicate Rule.from_yaml call in test_scope_instruction_description 2026-05-08 17:58:07 +02:00
Willi Ballenthin ad538f7ac3 fix: remove unused imports from test_freeze_dynamic.py
Remove capa.helpers, capa.features.basicblock (never referenced), and
redundant bare capa.features.extractors.base_extractor (covered by the
from-import on the next line).

Closes SURF-78
2026-05-08 17:58:07 +02:00
Willi Ballenthin cb1951dd90 fix: correct test_json_meta loop to iterate list of function dicts and use correct serialized address format for matched_basic_blocks assertion 2026-05-08 17:58:07 +02:00
Willi Ballenthin f11c99d0e4 fix: remove unreachable StaticAnalysis assert in assert_meta and cover dynamic proto path
Removes the bare `assert isinstance(meta.analysis, rd.StaticAnalysis)` that
blocked dynamic ResultDocument from being validated, removes the incorrect
direct list comparison in assert_dynamic_analyis, and adds dynamic_a0000a6_rd
to test_doc_to_pb2 so the dynamic proto serialization path is exercised.
2026-05-08 17:58:07 +02:00
Willi Ballenthin 8952151c97 fix: correct self-comparison sa.max == sa.max to sa.max == sb.max in test_proto
RangeStatement.max was compared against itself, so a protobuf round-trip
that corrupted the max field would pass undetected. Fix to compare against
the protobuf counterpart sb.max.
2026-05-08 17:58:07 +02:00
Willi Ballenthin a6dd0faf9f fix: use integer division in get_printable_len for UTF-16 LE operands
`get_printable_len` returned a float for UTF-16 LE operands due to `/`
instead of `//`, violating the `-> int` annotation and silently
propagating a float into `_bb_has_stackstring`'s accumulator. Aligns
with the IDA extractor equivalent.

Closes SURF-58
2026-05-08 17:58:07 +02:00
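The difference between `/` and `//` that the commit above relies on can be shown directly. The functions below are simplified, hypothetical stand-ins that assume two bytes per UTF-16 LE character; they are not the real `get_printable_len`:

```python
def printable_len_buggy(num_bytes: int) -> int:
    # True division always yields a float in Python 3, even for even inputs.
    return num_bytes / 2


def printable_len_fixed(num_bytes: int) -> int:
    # Floor division keeps the result an int, matching the annotation.
    return num_bytes // 2
```

The type annotation is not enforced at runtime, so the float from the buggy version propagates silently into any accumulator that adds it up.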
Willi Ballenthin 69a1ba862c fix: implement extract_function_loop in dnfile extractor
The stub always raised NotImplementedError and was not registered in
FUNCTION_HANDLERS, so loop detection was silently skipped for all .NET
samples. Detects backward branches (target offset < instruction offset)
as loops, matching the approach used by other extractors.
2026-05-08 17:58:07 +02:00
Willi Ballenthin d32492d208 fix: remove extract_file_format from FILE_HANDLERS in five extractors
Five extractors (ghidra, dnfile, viv, binja, ida) stored Format in
global_features during __init__ and also included extract_file_format
in FILE_HANDLERS. This caused find_file_capabilities to emit the Format
feature twice, inflating feature counts. Removing extract_file_format
from FILE_HANDLERS in all five extractors ensures Format is emitted
once via global_features only.
2026-05-08 17:58:07 +02:00
Willi Ballenthin e2c8ab4bff fix: replace assert with guard for 2-operand ARM ADD/SUB instructions
ARM Thumb-2 has legal 2-operand forms like `add sp, #0x10` and `add r0, #1`.
The previous code asserted exactly 3 operands before checking if operand[1]
was a stack register, causing AssertionError on any 2-operand encoding.
The fix converts the assert to a guard condition so 2-operand instructions
fall through to the for-loop and are processed normally.
2026-05-08 17:58:07 +02:00
Willi Ballenthin 723ee16ef7 fix: omit trailing ' -> ' suffix in syscall call names when no return value
DrakvufExtractor.get_call_name always appended ' -> ' even for SystemCall
objects that have no return_value attribute, because f" -> {''}" still
produces the literal ' -> ' string. Conditionally build the suffix only
when a non-empty return value exists.
2026-05-08 17:58:07 +02:00
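The stray suffix in the commit above arises because an f-string still emits its literal text when the interpolated value is empty. A small sketch with hypothetical helper names and a made-up syscall name, not the real `DrakvufExtractor.get_call_name`:

```python
def call_name_buggy(name, return_value=""):
    # The literal " -> " is emitted even when return_value is empty.
    return f"{name} -> {return_value}"


def call_name_fixed(name, return_value=""):
    # Build the suffix only when there is a non-empty return value.
    suffix = f" -> {return_value}" if return_value else ""
    return f"{name}{suffix}"
```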
Willi Ballenthin 1f3a52eea7 fix: assign ConfigDict to model_config in ConciseModel so extra="ignore" is applied
ConciseModel called ConfigDict(extra="ignore") as a bare expression, discarding the result.
model_config was never set, so the extra="ignore" config was never in effect for any subclass.
Assign the result to model_config as intended.
2026-05-08 17:58:07 +02:00
Willi Ballenthin d367622d0c fix: replace assert with isinstance guard in get_callee for invalid MethodSpec tokens
When resolve_dotnet_token returns an InvalidToken (e.g. malformed or
out-of-range MethodSpec table/row index), the assert on line 51 raised
AssertionError instead of gracefully returning None. Replaced the assert
with the isinstance guard pattern already used elsewhere in the same file.
2026-05-08 17:58:07 +02:00
Willi Ballenthin d99ba7d909 fix: correct off-by-one in get_dotnet_table_row so row_index=1 is not rejected
`get_dotnet_table_row` used `if row_index - 1 <= 0` to guard against invalid
indices. Because .NET metadata tables are 1-indexed, row_index=1 is the first
valid row, but the condition is equivalent to `row_index <= 1`, silently
rejecting it and making the first row of every table unreachable.

Changed to `if row_index <= 0`, which correctly rejects only the zero/null
token and leaves all valid rows accessible. Added four unit tests against the
real dd9098ff91717f4906afe9dafdfa2f52.exe_ sample to verify the guard
boundary: row_index=1 returns the first row, row_index=0 returns None, all
row indices 1..N succeed, and an out-of-bounds index returns None.
2026-05-08 17:58:07 +02:00
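The guard boundary described in the commit above can be checked against a plain list standing in for a 1-indexed metadata table. The function names and the list-based table are assumptions for illustration; the real `get_dotnet_table_row` works on dnfile tables:

```python
def get_row_buggy(rows, row_index):
    if row_index - 1 <= 0:  # equivalent to row_index <= 1: rejects row 1
        return None
    try:
        return rows[row_index - 1]
    except IndexError:
        return None


def get_row_fixed(rows, row_index):
    if row_index <= 0:  # rejects only the zero/null index
        return None
    try:
        return rows[row_index - 1]
    except IndexError:
        return None
```

With `rows = ["first", "second"]`, the buggy version cannot reach `"first"`, while the fixed version returns it for `row_index=1` and still returns `None` for `0` and for an out-of-bounds index.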
Willi Ballenthin 1b6c26fc35 fix: stop mutating call.api in cape thread.get_calls
`get_calls` iterated `generate_symbols` and overwrote `call.api` with
each generated symbol name, then yielded a `CallHandle` wrapping the
same `call` object. Because the Pydantic model is shared by reference,
every previously-yielded handle ended up with `api` equal to the last
symbol generated in the final iteration.

The correct pattern (used in `call.py:61`) is to leave the model
untouched and let the call extractor expand symbol variants via
`generate_symbols`. `get_calls` now yields exactly one `CallHandle`
per call with the original `api` value preserved.
2026-05-08 17:58:07 +02:00
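The aliasing problem in the commit above does not depend on Pydantic; any mutable object shared by reference behaves the same way. A sketch using a plain class, where `Call`, `generate_symbols`, and both `get_calls_*` functions are simplified stand-ins rather than capa's implementations:

```python
class Call:
    def __init__(self, api):
        self.api = api


def generate_symbols(api):
    # Hypothetical stand-in: yields the name plus one variant.
    yield api
    yield api + "W"


def get_calls_buggy(calls):
    for call in calls:
        for symbol in generate_symbols(call.api):
            call.api = symbol  # mutates the shared object
            yield call         # every yielded handle is the same object


def get_calls_fixed(calls):
    for call in calls:
        yield call  # one handle per call, original api untouched
```

Collecting the buggy generator into a list gives two handles that both read `"CreateFileW"` for an input named `"CreateFile"`, because they are the same object.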
Willi Ballenthin d1038e51f3 fix: use instruction_indices in is_security_cookie for single-instruction basic blocks
`is_security_cookie` computes the last address in a terminal basic block by
iterating `bb.instruction_index` and indexing `ir.end_index - 1`. The BinExport2
protobuf spec omits `end_index` for single-element ranges, so protobuf returns 0
as the default. `0 - 1 = -1`, and -1 is not a key in `insn_address_by_index`,
raising `KeyError`.

Use `BinExport2Index.instruction_indices` to enumerate instruction indices, which
already handles single-instruction ranges via `HasField("end_index")`.
2026-05-08 17:58:07 +02:00
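The arithmetic behind the `KeyError` in the commit above can be sketched without protobuf. Here an unset `end_index` is modelled as a default argument, which is an assumption made for illustration; the real code checks field presence with `HasField("end_index")`:

```python
# Hypothetical index-to-address map with a single instruction at index 7.
insn_address_by_index = {7: 0x401000}


def last_index_buggy(begin_index, end_index=0):
    # An omitted end_index reads as 0, so this yields -1.
    return end_index - 1


def last_index_fixed(begin_index, end_index=None):
    if end_index is None:
        # Single-element range: the only index is begin_index.
        return begin_index
    return end_index - 1
```

Looking up `insn_address_by_index[last_index_buggy(7)]` fails because `-1` is not a key, while the fixed variant resolves to index 7.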
Willi Ballenthin 197a84d267 fix: guard get_operand_expressions against empty expression tree
`_build_expression_tree` already returns `[]` for the Ghidra bug where
an operand has no expressions (see
https://github.com/NationalSecurityAgency/ghidra/issues/6817), but
`get_operand_expressions` then called the recursive walker
unconditionally with `tree_index=0`, which indexed into the empty list
and raised `IndexError`. Add an early-return guard so callers receive
`[]` instead.
2026-05-08 17:58:07 +02:00
Willi Ballenthin 5d43fc8fe3 fix: add return after zero-offset yield in extract_insn_offset_features
`extract_insn_offset_features` in the x86/x64 BinExport2 extractor handled
zero-offset patterns (e.g. `mov [reg], reg`) in a nested branch but was
missing a `return` after yielding `Offset(0)` and `OperandOffset(0)`.
Execution then fell through to the general `mask_immediate` path, which read
`immediate` from the last-matched expression node (a register, not an
integer). Since that field defaults to 0, the function emitted duplicate
`Offset(0)` and `OperandOffset(0)` features for every such instruction.

Fix: add `return` after the two yields in the zero-pattern branch.

Tests: add `FEATURE_COUNT_TESTS_BE2_INTEL` covering `MOV [EDI], CX` at
0x401125 in mimikatz, asserting each of `Offset(0)` and `OperandOffset(1,0)`
is emitted exactly once.
2026-05-08 17:58:07 +02:00
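The fall-through described in the commit above is a general property of generators: a `yield` does not end the function. A reduced sketch, in which the tuple-based features and function names are hypothetical and stand in for the extractor's `Offset` features:

```python
def offsets_buggy(is_zero_pattern, immediate=0):
    if is_zero_pattern:
        yield ("offset", 0)
        # Missing `return`: execution continues into the general path.
    yield ("offset", immediate)


def offsets_fixed(is_zero_pattern, immediate=0):
    if is_zero_pattern:
        yield ("offset", 0)
        return  # stop here so the general path does not run
    yield ("offset", immediate)
```

For a zero-offset pattern whose `immediate` defaults to 0, the buggy generator emits the same feature twice and the fixed one emits it once.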
Willi Ballenthin 06061311fb fix: correct scale/displacement in get_operand_phrase_info 5-expression branch
In the 5-expression branch of get_operand_phrase_info, two cases used
expression3 (the OPERATOR node) where expression4 (the IMMEDIATE_INT
value) was intended:

- Base + (Index * Scale): scale was expression3 ('*' operator), should be expression4 (the numeric scale value)
- (Index * Scale) + Displacement: displacement was expression3 ('+' operator), should be expression4 (the numeric displacement value)

Tests added using mimikatz.exe_.ghidra.BinExport:
- 0x40194d: MOVZX DI, [EAX + ECX * 1] — verifies scale=1 as IMMEDIATE_INT
- 0x401fd4: JMP [EAX * 4 + switchdataD_00402017] — verifies displacement=4202519 as IMMEDIATE_INT
2026-05-08 17:58:07 +02:00