gitea-mirror/capa

Mirror of https://github.com/mandiant/capa.git, synced 2026-07-28 22:50:59 -07:00

Author	SHA1	Message	Date
Capa Bot	4618822884	Sync capa-testfiles submodule	2026-05-13 17:50:02 +00:00
Willi Ballenthin	61adf156ee	tests: xfail a few known Ghidra analysis failures	2026-05-11 11:14:28 +02:00
Willi Ballenthin	a1ff01bc44	fix: Windows path reference in main	2026-05-11 11:14:28 +02:00
Willi Ballenthin	a82f4aea88	bump submodules	2026-05-11 11:14:28 +02:00
Willi Ballenthin	9ba497f6f7	idalib: remove custom idalib loading	2026-05-11 11:14:28 +02:00
Willi Ballenthin	b5f81e30f0	tests: add negative substring feature test fixture	2026-05-11 11:14:28 +02:00
Willi Ballenthin	eb258c719f	tests: cleanup tests and fixtures	2026-05-11 11:14:28 +02:00
Willi Ballenthin	2604c91668	fix: lints	2026-05-11 11:14:28 +02:00
Willi Ballenthin	3e2c017dfd	tests: ida: better handle stale databases and concurrent access	2026-05-11 11:14:28 +02:00
Willi Ballenthin	018e5b45e5	tests: cleanup tests and fixtures	2026-05-11 11:14:28 +02:00
Willi Ballenthin	251a4e285f	tests: consolidate feature test fixtures and runners	2026-05-11 11:14:28 +02:00
Willi Ballenthin	9fd4f8dd74	tests: migrate to data-driven fixtures	2026-05-11 11:14:28 +02:00
Willi Ballenthin	65573944d7	rules: introduce helper to parse features from parts	2026-05-11 11:14:28 +02:00
Willi Ballenthin	a28fcce72b	fix: linter tests needing placeholder rule sets to function	2026-05-08 17:58:07 +02:00
Willi Ballenthin	b505ba7621	fix: remove unused imports and un-suppress F401. Closes #2996	2026-05-08 17:58:07 +02:00
Willi Ballenthin	8fca21f808	linter: validate dynamic example offsets. Closes #3058	2026-05-08 17:58:07 +02:00
Willi Ballenthin	7a8a0acaa9	fix: remove dead except ValueError clause in capa2sarif.py so JSONDecodeError is caught correctly. json.JSONDecodeError is a subclass of ValueError, so the broader `except ValueError` was shadowing the more specific handler, making it unreachable. Keep only the specific `except json.JSONDecodeError` handler.	2026-05-08 17:58:07 +02:00
Willi Ballenthin	7d8714098c	fix: dedent bulk-process.py main() body so explicit argv is used. The entire main() body was indented inside `if argv is None:`, causing main() to silently return None when called with an explicit argv list. Closes SURF-90.	2026-05-08 17:58:07 +02:00
Willi Ballenthin	861f3b8619	fix: FeatureRegexRegistryControlSetMatchIncomplete checks all Regex features. Dedent `return False` out of the `for` loop body so the method examines every Regex feature instead of short-circuiting after the first one.	2026-05-08 17:58:07 +02:00
Willi Ballenthin	bfa09f817b	fix: guard MissingStaticScope and MissingDynamicScope against absent scopes dict. When rule.meta lacks a "scopes" key, rule.meta.get("scopes") returns None, and `"static" not in None` (or `"dynamic" not in None`) raises TypeError, crashing lint_rule. Add an isinstance(scopes, dict) guard so both checks return False (no violation) when scopes is absent, letting MissingScopes report the real problem.	2026-05-08 17:58:07 +02:00
Willi Ballenthin	c5ae9be3e1	fix: MissingExampleOffset lint reads scopes.static instead of obsolete scope key. The check was reading rule.meta.get("scope"), a key that no longer exists in the current schema (replaced by scopes.static/scopes.dynamic), so the lint never fired for function/basic-block rules missing example offsets.	2026-05-08 17:58:07 +02:00
Willi Ballenthin	74010ba03f	fix: remove dead string literal in test_detect_duplicate_features. The triple-quoted string at lines 230-239 was never assigned and contained an incomplete sentence ("The scripts"). It was deleted entirely, as the surrounding code is self-explanatory.	2026-05-08 17:58:07 +02:00
Willi Ballenthin	f93e342e74	fix: remove duplicate Rule.from_yaml call in test_scope_instruction_description	2026-05-08 17:58:07 +02:00
Willi Ballenthin	ad538f7ac3	fix: remove unused imports from test_freeze_dynamic.py. Remove capa.helpers and capa.features.basicblock (never referenced), and the redundant bare import of capa.features.extractors.base_extractor (covered by the from-import on the next line). Closes SURF-78.	2026-05-08 17:58:07 +02:00
Willi Ballenthin	cb1951dd90	fix: correct test_json_meta loop to iterate over the list of function dicts, and use the correct serialized address format for the matched_basic_blocks assertion	2026-05-08 17:58:07 +02:00
Willi Ballenthin	f11c99d0e4	fix: remove unreachable StaticAnalysis assert in assert_meta and cover dynamic proto path. Removes the bare `assert isinstance(meta.analysis, rd.StaticAnalysis)` that prevented dynamic ResultDocuments from being validated, removes the incorrect direct list comparison in assert_dynamic_analyis, and adds dynamic_a0000a6_rd to test_doc_to_pb2 so the dynamic proto serialization path is exercised.	2026-05-08 17:58:07 +02:00
Willi Ballenthin	8952151c97	fix: correct self-comparison sa.max == sa.max to sa.max == sb.max in test_proto. RangeStatement.max was compared against itself, so a protobuf round-trip that corrupted the max field would pass undetected. Compare against the protobuf counterpart sb.max instead.	2026-05-08 17:58:07 +02:00
Willi Ballenthin	a6dd0faf9f	fix: use integer division in get_printable_len for UTF-16 LE operands. `get_printable_len` returned a float for UTF-16 LE operands because it used `/` instead of `//`, violating the `-> int` annotation and silently propagating a float into `_bb_has_stackstring`'s accumulator. This aligns it with the IDA extractor equivalent. Closes SURF-58.	2026-05-08 17:58:07 +02:00
Willi Ballenthin	69a1ba862c	fix: implement extract_function_loop in dnfile extractor. The stub always raised NotImplementedError and was not registered in FUNCTION_HANDLERS, so loop detection was silently skipped for all .NET samples. The implementation detects backward branches (target offset < instruction offset) as loops, matching the approach used by other extractors.	2026-05-08 17:58:07 +02:00
Willi Ballenthin	d32492d208	fix: remove extract_file_format from FILE_HANDLERS in five extractors. Five extractors (ghidra, dnfile, viv, binja, ida) stored Format in global_features during __init__ and also included extract_file_format in FILE_HANDLERS, so find_file_capabilities emitted the Format feature twice, inflating feature counts. Removing extract_file_format from FILE_HANDLERS in all five extractors ensures Format is emitted only once, via global_features.	2026-05-08 17:58:07 +02:00
Willi Ballenthin	e2c8ab4bff	fix: replace assert with guard for 2-operand ARM ADD/SUB instructions. ARM Thumb-2 has legal 2-operand forms like `add sp, #0x10` and `add r0, #1`. The previous code asserted exactly 3 operands before checking whether operand[1] was a stack register, causing an AssertionError on any 2-operand encoding. The fix converts the assert into a guard condition so 2-operand instructions fall through to the for-loop and are processed normally.	2026-05-08 17:58:07 +02:00
Willi Ballenthin	723ee16ef7	fix: omit trailing ' -> ' suffix in syscall call names when there is no return value. DrakvufExtractor.get_call_name always appended ' -> ', even for SystemCall objects without a return_value attribute, because f" -> {''}" still produces the literal ' -> ' string. Build the suffix only when a non-empty return value exists.	2026-05-08 17:58:07 +02:00
Willi Ballenthin	1f3a52eea7	fix: assign ConfigDict to model_config in ConciseModel so extra="ignore" is applied. ConciseModel called ConfigDict(extra="ignore") as a bare expression, discarding the result. model_config was never set, so the extra="ignore" config was never in effect for any subclass. Assign the result to model_config as intended.	2026-05-08 17:58:07 +02:00
Willi Ballenthin	d367622d0c	fix: replace assert with isinstance guard in get_callee for invalid MethodSpec tokens. When resolve_dotnet_token returns an InvalidToken (e.g. a malformed or out-of-range MethodSpec table/row index), the assert on line 51 raised AssertionError instead of gracefully returning None. Replace the assert with the isinstance guard pattern already used elsewhere in the same file.	2026-05-08 17:58:07 +02:00
Willi Ballenthin	d99ba7d909	fix: correct off-by-one in get_dotnet_table_row so row_index=1 is not rejected. `get_dotnet_table_row` used `if row_index - 1 <= 0` to guard against invalid indices. Because .NET metadata tables are 1-indexed, row_index=1 is the first valid row, but the condition is equivalent to `row_index <= 1`, silently rejecting it and making the first row of every table unreachable. Changed to `if row_index <= 0`, which rejects only the zero/null token and leaves all valid rows accessible. Added four unit tests against the real dd9098ff91717f4906afe9dafdfa2f52.exe_ sample to verify the guard boundary: row_index=1 returns the first row, row_index=0 returns None, all row indices 1..N succeed, and an out-of-bounds index returns None.	2026-05-08 17:58:07 +02:00
Willi Ballenthin	1b6c26fc35	fix: stop mutating call.api in cape thread.get_calls. `get_calls` iterated `generate_symbols` and overwrote `call.api` with each generated symbol name, then yielded a `CallHandle` wrapping the same `call` object. Because the Pydantic model is shared by reference, every previously yielded handle ended up with `api` equal to the last symbol generated in the final iteration. The correct pattern (used in `call.py:61`) is to leave the model untouched and let the call extractor expand symbol variants via `generate_symbols`. `get_calls` now yields exactly one `CallHandle` per call, with the original `api` value preserved.	2026-05-08 17:58:07 +02:00
Willi Ballenthin	d1038e51f3	fix: use instruction_indices in is_security_cookie for single-instruction basic blocks. `is_security_cookie` computes the last address in a terminal basic block by iterating `bb.instruction_index` and indexing `ir.end_index - 1`. The BinExport2 protobuf spec omits `end_index` for single-element ranges, so protobuf returns the default value 0; `0 - 1 = -1`, and -1 is not a key in `insn_address_by_index`, raising `KeyError`. Use `BinExport2Index.instruction_indices` to enumerate instruction indices, which already handles single-instruction ranges via `HasField("end_index")`.	2026-05-08 17:58:07 +02:00
Willi Ballenthin	197a84d267	fix: guard get_operand_expressions against empty expression tree. `_build_expression_tree` already returns `[]` for the Ghidra bug where an operand has no expressions (see https://github.com/NationalSecurityAgency/ghidra/issues/6817), but `get_operand_expressions` then called the recursive walker unconditionally with `tree_index=0`, which indexed into the empty list and raised `IndexError`. Add an early-return guard so callers receive `[]` instead.	2026-05-08 17:58:07 +02:00
Willi Ballenthin	5d43fc8fe3	fix: add return after zero-offset yield in extract_insn_offset_features. `extract_insn_offset_features` in the x86/x64 BinExport2 extractor handled zero-offset patterns (e.g. `mov [reg], reg`) in a nested branch but was missing a `return` after yielding `Offset(0)` and `OperandOffset(0)`. Execution then fell through to the general `mask_immediate` path, which read `immediate` from the last-matched expression node (a register, not an integer). Since that field defaults to 0, the function emitted duplicate `Offset(0)` and `OperandOffset(0)` features for every such instruction. Fix: add `return` after the two yields in the zero-pattern branch. Tests: add `FEATURE_COUNT_TESTS_BE2_INTEL` covering `MOV [EDI], CX` at 0x401125 in mimikatz, asserting that `Offset(0)` and `OperandOffset(1,0)` are each emitted exactly once.	2026-05-08 17:58:07 +02:00
Willi Ballenthin	06061311fb	fix: correct scale/displacement in get_operand_phrase_info 5-expression branch. In the 5-expression branch of get_operand_phrase_info, two cases used expression3 (the OPERATOR node) where expression4 (the IMMEDIATE_INT value) was intended: - Base + (Index * Scale): scale was expression3 ('*' operator), should be expression4 (the numeric scale value) - (Index * Scale) + Displacement: displacement was expression3 ('+' operator), should be expression4 (the numeric displacement value). Tests added using mimikatz.exe_.ghidra.BinExport: - 0x40194d: MOVZX DI, [EAX + ECX * 1] (verifies scale=1 as IMMEDIATE_INT) - 0x401fd4: JMP [EAX * 4 + switchdataD_00402017] (verifies displacement=4202519 as IMMEDIATE_INT)	2026-05-08 17:58:07 +02:00
Willi Ballenthin	47d418b7de	fix: use HasField to check call-graph edge vertex indices. `_index_vertex_edges` used truthiness checks (`if not edge.source_vertex_index`) to skip edges with absent fields, but this also silently dropped any edge whose source or target is vertex index 0, which is a valid vertex. Both fields are protobuf optional integers, so the correct absent-field check is `HasField()`, consistent with `_index_flow_graph_edges` and `_index_call_graph_vertices` in the same file. In mimikatz.exe_.ida.BinExport, vertex 0 at 0x401000 has 2 callees and 1 caller that were all being silently discarded.	2026-05-08 17:58:07 +02:00
Willi Ballenthin	fdd571eaed	fix: close file handle in get_file_taste using a with statement. `get_file_taste` opened a file handle with `sample_path.open("rb").read(8)`, discarding the file object without explicitly closing it. CPython's reference counting closes it promptly in practice, but other implementations (PyPy, Jython) and CPython under GC pressure may defer closure. Use a `with` statement to guarantee the handle is released immediately after reading.	2026-05-08 17:58:07 +02:00
Willi Ballenthin	eb81901d71	fix: correct capa/subscope-rule key in RuleMetadata.from_capa. `RuleMetadata.from_capa` used `rule.meta.get("capa/subscope", False)` and `Field(False, alias="capa/subscope")`, but the actual key set by `_extract_subscope_rules_rec` is `"capa/subscope-rule"`. This caused `is_subscope_rule` to always be `False` in every `RuleMetadata` instance, making downstream filters in `render/utils.py`, `render/vverbose.py`, and `scripts/import-to-ida.py` ineffective (though subscope rules are already excluded from `ResultDocument` before reaching those callers).	2026-05-08 17:58:07 +02:00
Willi Ballenthin	1ef6298b45	fix: Scopes.from_dict uses cls instead of self. `Scopes.from_dict` was decorated with `@classmethod` but named its first parameter `self` instead of `cls`, and hard-coded `Scopes(...)` in the return statement instead of `cls(...)`. As a result, a call such as `SubScopes.from_dict(...)` on a subclass returned a `Scopes` instance rather than a `SubScopes` instance. Rename the parameter to `cls` and use it in the return statement so that subclasses receive the correct type.	2026-05-08 17:58:07 +02:00
Willi Ballenthin	316aeaf8e5	fix: remove unreachable backports.functools_lru_cache fallback. `functools.lru_cache` has been in the standard library since Python 3.2. The project requires Python >=3.10, so the `except ImportError` branch importing `backports.functools_lru_cache` can never execute. Remove the try/except block and keep only the direct stdlib import. Also remove `types-backports` from dev dependencies, `backports` from `[tool.deptry.known_first_party]`, and `types-backports` from the DEP002 ignore list in `pyproject.toml`.	2026-05-08 17:58:07 +02:00
Willi Ballenthin	d9402d8041	fix: add missing ELF branch in get_format_from_extension for .elf_ files. EXTENSIONS_ELF = "elf_" was defined but never used: get_format_from_extension had branches for every other EXTENSIONS_* constant except ELF. Since .elf_ files are real test fixtures and a recognised input format, the fix is to add the missing elif branch (and import FORMAT_ELF) rather than delete the constant. Closes #3031	2026-05-08 17:58:07 +02:00
Willi Ballenthin	b9f830619d	update submodules	2026-04-23 18:04:10 +03:00
Willi Ballenthin	e745fa6aab	style: ruff format changed files	2026-04-23 18:04:10 +03:00
Willi Ballenthin	aa9f09db89	fix: render_default always returns empty string. Closes #3012	2026-04-23 18:04:10 +03:00
Willi Ballenthin	a5082beed0	fix: remove unused gzip import in test_helpers.py	2026-04-23 18:04:10 +03:00
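
Commit 7a8a0acaa9 relies on how Python matches `except` clauses: they are tried top to bottom, and because `json.JSONDecodeError` subclasses `ValueError`, a preceding `except ValueError` always catches it first. A minimal sketch, using hypothetical function names rather than the actual capa2sarif.py code:

```python
import json


def parse_shadowed(text: str) -> str:
    """Hypothetical: mirrors the old clause order."""
    try:
        json.loads(text)
        return "ok"
    except ValueError:
        # JSONDecodeError is a ValueError, so it always lands here first.
        return "generic ValueError handler"
    except json.JSONDecodeError:  # unreachable
        return "JSONDecodeError handler"


def parse_fixed(text: str) -> str:
    """Hypothetical: keeps only the specific handler, as the commit describes."""
    try:
        json.loads(text)
        return "ok"
    except json.JSONDecodeError:
        return "JSONDecodeError handler"


print(parse_shadowed("{"))  # generic ValueError handler
print(parse_fixed("{"))     # JSONDecodeError handler
```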
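
The `return False` indentation bug fixed in 861f3b8619 can be reproduced in a few lines. This is only a sketch: `looks_incomplete` is a hypothetical stand-in for the real ControlSet check, and the pattern strings are illustrative.

```python
def looks_incomplete(pattern: str) -> bool:
    """Hypothetical predicate standing in for the real ControlSet check."""
    return pattern.endswith("ControlSet")


def any_incomplete_buggy(patterns: list[str]) -> bool:
    for pattern in patterns:
        if looks_incomplete(pattern):
            return True
        return False  # bug: inside the loop, so only patterns[0] is checked
    return False


def any_incomplete_fixed(patterns: list[str]) -> bool:
    for pattern in patterns:
        if looks_incomplete(pattern):
            return True
    return False  # dedented: reached only after every pattern was examined


patterns = [r"SYSTEM\CurrentControlSet\Services", r"SYSTEM\ControlSet"]
print(any_incomplete_buggy(patterns))  # False
print(any_incomplete_fixed(patterns))  # True
```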
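
The guard described in bfa09f817b can be sketched as follows, assuming rule metadata is a plain dict. `missing_static_scope` is a hypothetical function, not the real MissingStaticScope lint class.

```python
def missing_static_scope(meta: dict) -> bool:
    """Hypothetical lint check modelled on the guarded MissingStaticScope."""
    scopes = meta.get("scopes")
    if not isinstance(scopes, dict):
        # Absent (None) or malformed scopes: report no violation here and
        # leave it to a separate "missing scopes" check.
        return False
    return "static" not in scopes


# Without the guard, the membership test runs against None and raises.
try:
    "static" not in None
except TypeError as err:
    print("unguarded check raises TypeError:", err)

print(missing_static_scope({}))                               # False
print(missing_static_scope({"scopes": {"dynamic": "call"}}))  # True
```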
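
Commit 723ee16ef7 notes that formatting an empty return value still leaves the arrow behind. A minimal sketch of the conditional suffix, using a hypothetical `format_call_name` helper and an assumed `name(args) -> ret` layout rather than DrakvufExtractor's exact output:

```python
def format_call_name(name: str, args: list[str], return_value=None) -> str:
    """Hypothetical formatter; not DrakvufExtractor.get_call_name itself."""
    call = f"{name}({', '.join(args)})"
    # f" -> {''}" still evaluates to " -> ", so the suffix must be
    # conditional rather than formatted with an empty value.
    if return_value is None or str(return_value) == "":
        return call
    return f"{call} -> {return_value}"


print(repr(f" -> {''}"))                        # ' -> '
print(format_call_name("NtClose", ["0x4"]))     # NtClose(0x4)
print(format_call_name("NtClose", ["0x4"], 0))  # NtClose(0x4) -> 0
```

Checking `is None` and the empty string explicitly, instead of plain truthiness, keeps a legitimate return value of `0` visible.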
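
The off-by-one in d99ba7d909 comes down to mapping 1-based metadata row numbers onto a 0-based sequence. This sketch models a table as a plain list (an assumption; dnfile's table objects differ) and compares the old and new guards:

```python
from typing import Optional, Sequence


def get_table_row(rows: Sequence[str], row_index: int) -> Optional[str]:
    """Hypothetical 1-indexed lookup using the fixed guard."""
    if row_index <= 0:          # 0 is the null token
        return None
    if row_index > len(rows):   # past the last row
        return None
    return rows[row_index - 1]  # 1-based row -> 0-based position


def get_table_row_buggy(rows: Sequence[str], row_index: int) -> Optional[str]:
    """Same lookup with the old guard, which is equivalent to row_index <= 1."""
    if row_index - 1 <= 0:
        return None
    if row_index > len(rows):
        return None
    return rows[row_index - 1]


rows = ["row1", "row2", "row3"]
print(get_table_row(rows, 1), get_table_row_buggy(rows, 1))  # row1 None
print(get_table_row(rows, 0), get_table_row(rows, 4))        # None None
```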
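
The aliasing problem fixed in 1b6c26fc35 appears whenever a generator mutates and re-yields the same object. A self-contained sketch, with hypothetical `Call`, `CallHandle`, and `expand_variants` definitions standing in for the CAPE Pydantic model and `generate_symbols`:

```python
from dataclasses import dataclass
from typing import Iterator


@dataclass
class Call:
    api: str


@dataclass
class CallHandle:
    inner: Call


def expand_variants(api: str) -> Iterator[str]:
    """Hypothetical stand-in for symbol expansion (e.g. A/W suffixes)."""
    yield api
    yield api + "A"
    yield api + "W"


def get_calls_buggy(calls: list[Call]) -> Iterator[CallHandle]:
    for call in calls:
        for name in expand_variants(call.api):
            call.api = name         # mutates the one shared object...
            yield CallHandle(call)  # ...that every handle points to


def get_calls_fixed(calls: list[Call]) -> Iterator[CallHandle]:
    for call in calls:
        yield CallHandle(call)      # one handle per call, model left untouched


buggy = list(get_calls_buggy([Call("CreateFile")]))
print([h.inner.api for h in buggy])  # ['CreateFileW', 'CreateFileW', 'CreateFileW']
fixed = list(get_calls_fixed([Call("CreateFile")]))
print([h.inner.api for h in fixed])  # ['CreateFile']
```

In the fixed shape, variant expansion would happen later, at the consumer, without writing back into the model.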
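
The fall-through in 5d43fc8fe3 is a general generator pitfall: without a `return`, a generator keeps executing after a special-case `yield`. This sketch uses a hypothetical `Operand` type whose `immediate` defaults to 0, mirroring the default-valued field described in the commit; it is not the BinExport2 extractor code.

```python
from dataclasses import dataclass
from typing import Iterator, Tuple


@dataclass
class Operand:
    """Hypothetical operand: `[reg]` has no displacement, so immediate stays 0."""
    is_plain_register_deref: bool
    immediate: int = 0


def offsets_buggy(op: Operand) -> Iterator[Tuple[str, int]]:
    if op.is_plain_register_deref:
        yield ("Offset", 0)
        # missing `return`: execution falls through to the general path
    yield ("Offset", op.immediate)


def offsets_fixed(op: Operand) -> Iterator[Tuple[str, int]]:
    if op.is_plain_register_deref:
        yield ("Offset", 0)
        return  # stop here; the general path does not apply
    yield ("Offset", op.immediate)


print(list(offsets_buggy(Operand(True))))  # [('Offset', 0), ('Offset', 0)]
print(list(offsets_fixed(Operand(True))))  # [('Offset', 0)]
```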
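
The truthiness bug in 47d418b7de can be shown without protobuf. In this sketch `None` stands in for an unset optional field and `is not None` for `HasField()`; both substitutions are assumptions made for illustration.

```python
from typing import Optional

Edge = tuple[Optional[int], Optional[int]]


def keep_edges_buggy(edges: list[Edge]) -> list[Edge]:
    # Truthiness: `not 0` is True, so vertex 0 is treated as "absent".
    return [(s, t) for s, t in edges if s and t]


def keep_edges_fixed(edges: list[Edge]) -> list[Edge]:
    # Presence check: None models an unset optional field, and
    # `is not None` plays the role of HasField().
    return [(s, t) for s, t in edges if s is not None and t is not None]


edges = [(0, 1), (None, 2), (1, 0), (2, 3)]
print(keep_edges_buggy(edges))  # [(2, 3)]
print(keep_edges_fixed(edges))  # [(0, 1), (1, 0), (2, 3)]
```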
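
A sketch of the `with`-statement pattern from fdd571eaed. It keeps only the 8-byte read described in the commit; any other checks the real `get_file_taste` performs are omitted.

```python
import tempfile
from pathlib import Path


def get_file_taste(sample_path: Path) -> bytes:
    """Read the first 8 bytes; the `with` block closes the handle on exit."""
    with sample_path.open("rb") as f:
        return f.read(8)


with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.bin"
    sample.write_bytes(b"MZ\x90\x00\x03\x00\x00\x00\x04\x00")
    taste = get_file_taste(sample)

print(taste)  # b'MZ\x90\x00\x03\x00\x00\x00'
```

Returning from inside the `with` block is safe: the file is closed before the value reaches the caller, whichever Python implementation is running.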
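
Commit 1ef6298b45 concerns the alternate-constructor idiom: a `@classmethod` should build its result with `cls(...)` so subclasses get their own type back. This sketch uses a hypothetical minimal dataclass, not capa's actual `Scopes`:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Scopes:
    static: Optional[str] = None
    dynamic: Optional[str] = None

    @classmethod
    def from_dict(cls, scopes: dict) -> "Scopes":
        # `cls(...)` rather than a hard-coded `Scopes(...)`, so a call on a
        # subclass constructs an instance of that subclass.
        return cls(static=scopes.get("static"), dynamic=scopes.get("dynamic"))


class SubScopes(Scopes):
    pass


s = SubScopes.from_dict({"static": "function", "dynamic": "call"})
print(type(s).__name__)  # SubScopes
```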
