Commit Graph

6123 Commits

Author SHA1 Message Date
Willi Ballenthin f5383da728 fix: add missing capa.features.extractors.elf import to ghidra and ida helpers
Without this import, any ELF analysis via the Ghidra or IDA plugin raises
AttributeError: module 'capa.features.extractors' has no attribute 'elf'.
2026-05-08 17:58:07 +02:00
Willi Ballenthin 7406e75f96 fix: remove dead view_tab_rulegen assignment from CapaExplorerForm
self.view_tab_rulegen = None was assigned but never read or reassigned anywhere in the codebase. All other attributes in the same block are type-annotated declarations; this one was an actual assignment with no purpose.
2026-05-08 17:58:07 +02:00
Willi Ballenthin ce8cecc059 fix: remove dead reset_query method from CapaExplorerSearchProxyModel
The method delegated to set_query("") but had zero callers; existing
call sites in form.py already call set_query directly.
2026-05-08 17:58:07 +02:00
Willi Ballenthin 3538f8b85f fix: remove unused imports of capa.rules and capa.engine from view.py
Neither module was referenced anywhere in the file; removing them reduces
noise and eliminates unnecessary module load overhead.
2026-05-08 17:58:07 +02:00
Willi Ballenthin ba57a98194 fix: remove unused imports from capa/ida/plugin/form.py
capa.main, capa.render.json, and capa.features.extractors.ida.extractor
were imported but never referenced in the file, adding unnecessary
startup cost.
2026-05-08 17:58:07 +02:00
Willi Ballenthin 42ed5a4086 fix: remove dead trim_function_name from form.py that was never called 2026-05-08 17:58:07 +02:00
Willi Ballenthin f8429009e5 fix: replace get_file_md5/sha256 with version-aware shims in IDA helpers
Remove get_file_md5 and get_file_sha256 (which used different underlying
IDA APIs and duplicated normalization logic) and replace all call sites with
the existing retrieve_input_file_md5/sha256 shims that already handle
IDA <9 vs >=9 return-type differences consistently.
2026-05-08 17:58:07 +02:00
Willi Ballenthin 2eb2cb2e49 fix: rename dragEventEnter to dragEnterEvent in CapaExplorerRulegenEditor
Qt dispatches drag-enter events to dragEnterEvent; the misspelled method
dragEventEnter was dead code and its super() call would raise AttributeError
if ever reached.
2026-05-08 17:58:07 +02:00
Willi Ballenthin 1f6b9082a4 fix: guard against None in lessThan else-branch to prevent AttributeError when sorting empty cells 2026-05-08 17:58:07 +02:00
Willi Ballenthin 7f58fa097a fix: add explicit import capa.loader in ida plugin form.py
capa.loader.compute_layout was called without capa.loader being explicitly
imported; it worked only via capa.main's transitive import. Adding the
explicit import prevents a future AttributeError if import order changes.
2026-05-08 17:58:07 +02:00
Willi Ballenthin 412e6ad725 fix: initialize f=None before try in load_capa_function_results to prevent UnboundLocalError
If idaapi.get_func() raises before assignment, the except handler previously
referenced the unbound variable f, causing a secondary UnboundLocalError that
masked the original exception. Also handles the case where get_func() returns
None by falling back to idaapi.get_screen_ea() in the error log.
2026-05-08 17:58:07 +02:00
Willi Ballenthin a18595bf89 fix: handle NOT CompoundStatement in render_capa_doc_statement_node so NOT rules render children in IDA plugin tree view
Previously, the elif for CompoundStatement+NOT was unreachable (the outer
if already matched all CompoundStatement), causing NOT statements to return
None and their children to be orphaned/dropped from the tree.
2026-05-08 17:58:07 +02:00
Willi Ballenthin da9ccfaef3 fix: use next(iter(addrs)) to avoid mutating the feature cache in parse_features_for_tree
addrs.pop() removed the element from the cached set, so after the first
render the cache entry was empty and subsequent renders showed no address.
2026-05-08 17:58:07 +02:00
Willi Ballenthin a6dd0faf9f fix: use integer division in get_printable_len for UTF-16 LE operands
`get_printable_len` returned a float for UTF-16 LE operands due to `/`
instead of `//`, violating the `-> int` annotation and silently
propagating a float into `_bb_has_stackstring`'s accumulator. Aligns
with the IDA extractor equivalent.

Closes SURF-58
2026-05-08 17:58:07 +02:00
Willi Ballenthin 14a1d9981f fix: break thunk chain loop after resolving import to avoid duplicate API features 2026-05-08 17:58:07 +02:00
Willi Ballenthin 27d7741991 fix: pass insn instead of oper to getOperValue/getOperAddr in viv insn extractor
Four call sites in capa/features/extractors/viv/insn.py passed `oper`
(the operand object) as the first argument to getOperValue/getOperAddr,
where the API expects `insn` (the enclosing opcode). Silent today because
i386ImmOper ignores the argument, but would produce wrong values for
Amd64RipRelOper which uses op.va + op.size + self.imm.

Closes SURF-56
2026-05-08 17:58:07 +02:00
Willi Ballenthin 69a1ba862c fix: implement extract_function_loop in dnfile extractor
The stub always raised NotImplementedError and was not registered in
FUNCTION_HANDLERS, so loop detection was silently skipped for all .NET
samples. Detects backward branches (target offset < instruction offset)
as loops, matching the approach used by other extractors.
2026-05-08 17:58:07 +02:00
Willi Ballenthin f9df8f0a5c fix: remove dead find_process function and helpers.py from cape extractor
helpers.py contained only find_process, which was never called anywhere in the codebase. Its signature used dict-style field access while the rest of the cape extractor migrated to Pydantic models, so calling it today would raise a TypeError.
2026-05-08 17:58:07 +02:00
Willi Ballenthin 74785d74fb fix: remove dead interface_extract_* stub functions from viv extractors
Three stub functions (interface_extract_basic_block_XXX, interface_extract_function_XXX,
interface_extract_instruction_XXX) each raised NotImplementedError unconditionally and
were never called or referenced anywhere in the codebase. They were leftover interface
documentation from an earlier design; the handler-list type annotations already document
the expected signatures.
2026-05-08 17:58:07 +02:00
Willi Ballenthin 9f11491133 fix: remove unused import of capa.features.extractors.strings from binexport2 intel insn.py
The module was imported on line 18 but never referenced anywhere in the file.
It has no side effects, so the import was purely dead code adding noise.
2026-05-08 17:58:07 +02:00
Willi Ballenthin d32492d208 fix: remove extract_file_format from FILE_HANDLERS in five extractors
Five extractors (ghidra, dnfile, viv, binja, ida) stored Format in
global_features during __init__ and also included extract_file_format
in FILE_HANDLERS. This caused find_file_capabilities to emit the Format
feature twice, inflating feature counts. Removing extract_file_format
from FILE_HANDLERS in all five extractors ensures Format is emitted
once via global_features only.
2026-05-08 17:58:07 +02:00
Willi Ballenthin e2c8ab4bff fix: replace assert with guard for 2-operand ARM ADD/SUB instructions
ARM Thumb-2 has legal 2-operand forms like `add sp, #0x10` and `add r0, #1`.
The previous code asserted exactly 3 operands before checking if operand[1]
was a stack register, causing AssertionError on any 2-operand encoding.
The fix converts the assert to a guard condition so 2-operand instructions
fall through to the for-loop and are processed normally.
2026-05-08 17:58:07 +02:00
Willi Ballenthin 723ee16ef7 fix: omit trailing ' -> ' suffix in syscall call names when no return value
DrakvufExtractor.get_call_name always appended ' -> ' even for SystemCall
objects that have no return_value attribute, because f" -> {''}" still
produces the literal ' -> ' string. Conditionally build the suffix only
when a non-empty return value exists.
2026-05-08 17:58:07 +02:00
Willi Ballenthin 52e8fdfc92 fix: use AbsoluteVirtualAddress for string addresses in Ghidra and IDA file extractors
block.getStart().getOffset() and seg.start_ea both return virtual addresses,
not file offsets. Wrapping them in FileOffsetAddress was semantically wrong for
PE/ELF binaries where VA != file offset. Switch to AbsoluteVirtualAddress to
match what the value actually represents.
2026-05-08 17:58:07 +02:00
Willi Ballenthin b348867e55 fix: use .value.value for LLIL_CONST call destinations in binja insn.py
In extract_function_calls_from, the LLIL_CONST branches passed a RegisterValue
object to AbsoluteVirtualAddress instead of an int. Change dest.value to
dest.value.value and indirect_src.value to indirect_src.value.value, matching
the pattern used everywhere else in the file for LLIL_CONST and LLIL_CONST_PTR.
2026-05-08 17:58:07 +02:00
Willi Ballenthin 1bf6ae9149 fix: remove duplicate getPrevLocation call and dead loc variable
get_previous_instructions called vw.getPrevLocation twice with identical
arguments; the first result was assigned to loc and used only as a None
gate, but the inner guard already covered that case. Collapsed the two
nested guards into one, removing the redundant call and dead variable.
2026-05-08 17:58:07 +02:00
Willi Ballenthin 56fcdd32ed fix: unpack getByteDef offset to correctly check ENDBRANCH at target address
getByteDef returns (offset, segment_bytes); the old code indexed [1] to get
segment_bytes and called startswith() on the whole buffer, which checked whether
the segment itself begins with ENDBRANCH rather than the target address.
Unpacking both values and slicing _buf[_offset:] fixes the check.
2026-05-08 17:58:07 +02:00
Willi Ballenthin 4b99c506fa fix: correct inverted loop structure in extract_function_loop
The outer while loop over dests and inner for loop over s_addrs were
swapped, causing s_addrs to be exhausted after the first iteration and
dests.next() to be called multiple times per destination. Fix uses the
block's first start address as a fixed source and iterates dests in the
inner while loop, matching the IDA and Binja extractor pattern.
2026-05-08 17:58:07 +02:00
Willi Ballenthin 885b2c88b8 fix: initialize addr to None in Ghidra import extractors to prevent UnboundLocalError
In both extract_file_import_names (file.py) and get_file_imports (helpers.py), addr was
assigned only inside a conditional branch. If no data reference existed for an external
function, addr was unbound and the subsequent AbsoluteVirtualAddress(addr) or
import_dict.setdefault(addr, ...) raised UnboundLocalError. Fix initializes addr to None
before the inner loop, breaks on the first data reference found, and skips the function
with continue if no data reference was found.
2026-05-08 17:58:07 +02:00
Willi Ballenthin 1f3a52eea7 fix: assign ConfigDict to model_config in ConciseModel so extra="ignore" is applied
ConciseModel called ConfigDict(extra="ignore") as a bare expression, discarding the result.
model_config was never set, so the extra="ignore" config was never in effect for any subclass.
Assign the result to model_config as intended.
2026-05-08 17:58:07 +02:00
Willi Ballenthin d367622d0c fix: replace assert with isinstance guard in get_callee for invalid MethodSpec tokens
When resolve_dotnet_token returns an InvalidToken (e.g. malformed or
out-of-range MethodSpec table/row index), the assert on line 51 raised
AssertionError instead of gracefully returning None. Replaced the assert
with the isinstance guard pattern already used elsewhere in the same file.
2026-05-08 17:58:07 +02:00
Willi Ballenthin d99ba7d909 fix: correct off-by-one in get_dotnet_table_row so row_index=1 is not rejected
`get_dotnet_table_row` used `if row_index - 1 <= 0` to guard against invalid
indices. Because .NET metadata tables are 1-indexed, row_index=1 is the first
valid row, but the condition is equivalent to `row_index <= 1`, silently
rejecting it and making the first row of every table unreachable.

Changed to `if row_index <= 0`, which correctly rejects only the zero/null
token and leaves all valid rows accessible. Added four unit tests against the
real dd9098ff91717f4906afe9dafdfa2f52.exe_ sample to verify the guard
boundary: row_index=1 returns the first row, row_index=0 returns None, all
row indices 1..N succeed, and an out-of-bounds index returns None.
2026-05-08 17:58:07 +02:00
Willi Ballenthin c46eabee24 fix: correct wrong dict key in VMRay _compute_monitor_threads assertion
In `_compute_monitor_threads`, the uniqueness assertion indexed
`monitor_threads_by_monitor_process` by `thread_id` instead of
`process_id`. Because the dict is a `defaultdict(list)`, each lookup on
a novel thread ID creates a fresh empty list, making the assertion
vacuously true. Duplicate thread IDs within a process are never caught.

Line 242 immediately below uses the correct key `process_id` when
appending, so the data structure is populated correctly; only the guard
was broken.
2026-05-08 17:58:07 +02:00
Willi Ballenthin 3d1581503b fix: add explicit assert_never import in cape extractor.py
`format_call` referenced `capa.helpers.assert_never` without importing
`capa.helpers`, causing a NameError at runtime whenever a call argument
had an unexpected type. The module-qualified reference relied on an
implicit import-order dependency from another module having already
imported `capa.helpers`. Replace with an explicit `from capa.helpers
import assert_never`, matching the pattern used in the sibling
`call.py`.
2026-05-08 17:58:07 +02:00
Willi Ballenthin 1b6c26fc35 fix: stop mutating call.api in cape thread.get_calls
`get_calls` iterated `generate_symbols` and overwrote `call.api` with
each generated symbol name, then yielded a `CallHandle` wrapping the
same `call` object. Because the Pydantic model is shared by reference,
every previously-yielded handle ended up with `api` equal to the last
symbol generated in the final iteration.

The correct pattern (used in `call.py:61`) is to leave the model
untouched and let the call extractor expand symbol variants via
`generate_symbols`. `get_calls` now yields exactly one `CallHandle`
per call with the original `api` value preserved.
2026-05-08 17:58:07 +02:00
Willi Ballenthin d1038e51f3 fix: use instruction_indices in is_security_cookie for single-instruction basic blocks
`is_security_cookie` computes the last address in a terminal basic block by
iterating `bb.instruction_index` and indexing `ir.end_index - 1`. The BinExport2
protobuf spec omits `end_index` for single-element ranges, so protobuf returns 0
as the default. `0 - 1 = -1`, and -1 is not a key in `insn_address_by_index`,
raising `KeyError`.

Use `BinExport2Index.instruction_indices` to enumerate instruction indices, which
already handles single-instruction ranges via `HasField("end_index")`.
2026-05-08 17:58:07 +02:00
Willi Ballenthin 197a84d267 fix: guard get_operand_expressions against empty expression tree
`_build_expression_tree` already returns `[]` for the Ghidra bug where
an operand has no expressions (see
https://github.com/NationalSecurityAgency/ghidra/issues/6817), but
`get_operand_expressions` then called the recursive walker
unconditionally with `tree_index=0`, which indexed into the empty list
and raised `IndexError`. Add an early-return guard so callers receive
`[]` instead.
2026-05-08 17:58:07 +02:00
Willi Ballenthin 5d43fc8fe3 fix: add return after zero-offset yield in extract_insn_offset_features
`extract_insn_offset_features` in the x86/x64 BinExport2 extractor handled
zero-offset patterns (e.g. `mov [reg], reg`) in a nested branch but was
missing a `return` after yielding `Offset(0)` and `OperandOffset(0)`.
Execution then fell through to the general `mask_immediate` path, which read
`immediate` from the last-matched expression node (a register, not an
integer). Since that field defaults to 0, the function emitted duplicate
`Offset(0)` and `OperandOffset(0)` features for every such instruction.

Fix: add `return` after the two yields in the zero-pattern branch.

Tests: add `FEATURE_COUNT_TESTS_BE2_INTEL` covering `MOV [EDI], CX` at
0x401125 in mimikatz, asserting each of `Offset(0)` and `OperandOffset(1,0)`
is emitted exactly once.
2026-05-08 17:58:07 +02:00
Willi Ballenthin 218bee954d fix: format ValueError message in binexport2 extractor
`ValueError` takes a single message string. Passing a second argument stores both as a tuple in `args` without any string formatting, so the feature value never appears in the error message. Use an f-string so the feature value is interpolated into the message.
2026-05-08 17:58:07 +02:00
Willi Ballenthin 06061311fb fix: correct scale/displacement in get_operand_phrase_info 5-expression branch
In the 5-expression branch of get_operand_phrase_info, two cases used
expression3 (the OPERATOR node) where expression4 (the IMMEDIATE_INT
value) was intended:

- Base + (Index * Scale): scale was expression3 ('*' operator), should be expression4 (the numeric scale value)
- (Index * Scale) + Displacement: displacement was expression3 ('+' operator), should be expression4 (the numeric displacement value)

Tests added using mimikatz.exe_.ghidra.BinExport:
- 0x40194d: MOVZX DI, [EAX + ECX * 1] — verifies scale=1 as IMMEDIATE_INT
- 0x401fd4: JMP [EAX * 4 + switchdataD_00402017] — verifies displacement=4202519 as IMMEDIATE_INT
2026-05-08 17:58:07 +02:00
Willi Ballenthin 47d418b7de fix: use HasField to check call-graph edge vertex indices
`_index_vertex_edges` used truthiness checks (`if not edge.source_vertex_index`)
to skip edges with absent fields, but this also silently drops any edge whose
source or target is vertex index 0 — a valid vertex. Both fields are protobuf
optional integers, so the correct absent-field check is `HasField()`, consistent
with `_index_flow_graph_edges` and `_index_call_graph_vertices` in the same file.

In mimikatz.exe_.ida.BinExport, vertex 0 at 0x401000 has 2 callees and 1 caller
that were all being silently discarded.
2026-05-08 17:58:07 +02:00
Willi Ballenthin 1820eb09b9 fix: change bare if to elif in get_backend_from_cli DRAKVUF branch
`get_backend_from_cli` had a bare `if` at the FORMAT_DRAKVUF branch where
`elif` was expected. After `if input_format == FORMAT_CAPE: return BACKEND_CAPE`,
the DRAKVUF branch opened a new `if` rather than continuing the chain with
`elif`. The remaining branches used `elif`/`else` correctly. There was no
functional bug because the FORMAT_CAPE branch returns early, but the break
in the chain was misleading and looked like a merge artifact.
2026-05-08 17:58:07 +02:00
Willi Ballenthin fdd571eaed fix: close file handle in get_file_taste using a with statement
`get_file_taste` opened a file handle with `sample_path.open("rb").read(8)`,
discarding the file object without explicitly closing it. CPython reference-
counting closes it promptly in practice, but other implementations (PyPy,
Jython) and CPython under GC pressure may defer closure. Use a `with` statement
to guarantee the handle is released immediately after reading.
2026-05-08 17:58:07 +02:00
Willi Ballenthin 1a7d7ca5d6 fix: correct off-by-one in call_count debug log for dynamic analysis
`enumerate` is 0-based, so after the loop `call_count` held `n - 1` for
`n` calls processed. The debug log at the end of `find_thread_capabilities`
therefore reported one fewer event than was actually analyzed. Replace
`enumerate` with an explicit `call_count += 1` counter so the log is
accurate.
2026-05-08 17:58:07 +02:00
Willi Ballenthin eb81901d71 fix: correct capa/subscope-rule key in RuleMetadata.from_capa
`RuleMetadata.from_capa` used `rule.meta.get("capa/subscope", False)` and
`Field(False, alias="capa/subscope")`, but the actual key set by
`_extract_subscope_rules_rec` is `"capa/subscope-rule"`. This caused
`is_subscope_rule` to always be `False` in every `RuleMetadata` instance,
making downstream filters in `render/utils.py`, `render/vverbose.py`, and
`scripts/import-to-ida.py` ineffective (though subscope rules are already
excluded from `ResultDocument` before reaching those callers).
2026-05-08 17:58:07 +02:00
Willi Ballenthin 1ef6298b45 fix: Scopes.from_dict uses cls instead of self
`Scopes.from_dict` was decorated with `@classmethod` but named its first
parameter `self` instead of `cls`, and hard-coded `Scopes(...)` in the
return statement instead of `cls(...)`. This meant any subclass calling
`SubScopes.from_dict(...)` would get a `Scopes` instance back rather than
a `SubScopes` instance.

Rename the parameter to `cls` and use it in the return statement so
that subclasses receive the correct type.
2026-05-08 17:58:07 +02:00
Willi Ballenthin 316aeaf8e5 fix: remove unreachable backports.functools_lru_cache fallback
`functools.lru_cache` has been in the standard library since Python 3.2.
The project requires Python >=3.10, so the `except ImportError` branch
importing `backports.functools_lru_cache` can never execute.

Remove the try/except block and keep only the direct stdlib import.
Also remove `types-backports` from dev dependencies, `backports` from
`[tool.deptry.known_first_party]`, and `types-backports` from the
DEP002 ignore list in `pyproject.toml`.
2026-05-08 17:58:07 +02:00
Willi Ballenthin 16fb277980 fix: remove Python 2-only Result.__nonzero__
`Result.__nonzero__` is the Python 2 boolean hook; Python 3 calls
`__bool__`, which is already defined immediately above it.
`__nonzero__` is never invoked at runtime in Python 3 and adds noise
that misleads readers into thinking it serves a purpose.
2026-05-08 17:58:07 +02:00
Willi Ballenthin d9402d8041 fix: add missing ELF branch in get_format_from_extension for .elf_ files
EXTENSIONS_ELF = "elf_" was defined but never used: get_format_from_extension
had branches for every other EXTENSIONS_* constant except ELF. Since .elf_
files are real test fixtures and a recognised input format, the fix is to add
the missing elif branch (and import FORMAT_ELF) rather than delete the
constant.

Closes #3031
2026-05-08 17:58:07 +02:00
Capa Bot c57422f94e Sync capa rules submodule 2026-05-07 10:50:09 +00:00