In `_compute_monitor_threads`, the uniqueness assertion indexed
`monitor_threads_by_monitor_process` by `thread_id` instead of
`process_id`. Because the dict is a `defaultdict(list)`, each lookup on
a novel thread ID creates a fresh empty list, making the assertion
vacuously true. Duplicate thread IDs within a process are never caught.
Line 242 immediately below uses the correct key `process_id` when
appending, so the data structure is populated correctly; only the guard
was broken.
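A minimal sketch of why the guard was vacuous. The dict below is a stand-in for `monitor_threads_by_monitor_process`; the IDs and the two helper functions are hypothetical and only illustrate the lookup behaviour.

```python
from collections import defaultdict

# Stand-in for monitor_threads_by_monitor_process:
# process_id -> list of thread_ids seen in that process.
threads_by_process = defaultdict(list)
process_id, thread_id = 100, 7
threads_by_process[process_id].append(thread_id)


def is_unique_buggy(pid, tid):
    # Wrong key: indexing by tid creates a fresh empty list,
    # so the membership test can never find a duplicate.
    return tid not in threads_by_process[tid]


def is_unique_fixed(pid, tid):
    # Right key: search the list that belongs to the process.
    return tid not in threads_by_process[pid]


buggy_result = is_unique_buggy(process_id, thread_id)  # duplicate goes unnoticed
fixed_result = is_unique_fixed(process_id, thread_id)  # duplicate is detected
```

Note that the buggy lookup also leaves a stray empty entry keyed by the thread ID in the dict.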
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
`format_call` referenced `capa.helpers.assert_never` without importing
`capa.helpers`, causing a NameError at runtime whenever a call argument
had an unexpected type. The module-qualified reference relied on an
implicit import-order dependency from another module having already
imported `capa.helpers`. Replace with an explicit `from capa.helpers
import assert_never`, matching the pattern used in the sibling
`call.py`.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
`get_calls` iterated `generate_symbols` and overwrote `call.api` with
each generated symbol name, then yielded a `CallHandle` wrapping the
same `call` object. Because the Pydantic model is shared by reference,
every previously-yielded handle ended up with `api` equal to the last
symbol generated in the final iteration.
The correct pattern (used in `call.py:61`) is to leave the model
untouched and let the call extractor expand symbol variants via
`generate_symbols`. `get_calls` now yields exactly one `CallHandle`
per call with the original `api` value preserved.
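The aliasing problem can be reproduced in isolation. This is a sketch only: `Call` and `symbols` are hypothetical stand-ins for the Pydantic call model and `generate_symbols`, and the handles are omitted.

```python
from dataclasses import dataclass


@dataclass
class Call:  # hypothetical stand-in for the Pydantic call model
    api: str


def symbols(name):  # hypothetical stand-in for generate_symbols
    yield name
    yield name + "A"


def get_calls_buggy(call):
    for sym in symbols(call.api):
        call.api = sym  # mutates the single shared object
        yield call  # every yielded item is that same object


def get_calls_fixed(call):
    yield call  # one item per call, original api preserved


buggy = [c.api for c in list(get_calls_buggy(Call("CreateFile")))]
fixed = [c.api for c in list(get_calls_fixed(Call("CreateFile")))]
```

Once the buggy generator is exhausted, both collected items report the last symbol, because they are the same object.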
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
`is_security_cookie` computes the last address in a terminal basic block by
iterating `bb.instruction_index` and indexing `ir.end_index - 1`. The BinExport2
protobuf spec omits `end_index` for single-element ranges, so protobuf returns 0
as the default. `0 - 1 = -1`, and -1 is not a key in `insn_address_by_index`,
raising `KeyError`.
Use `BinExport2Index.instruction_indices` to enumerate instruction indices, which
already handles single-instruction ranges via `HasField("end_index")`.
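A sketch of the failure mode and the guarded enumeration, without the protobuf dependency. `IndexRange` is a hypothetical stand-in that mimics an optional `end_index` field reading as 0 when unset, and `instruction_indices` here is an illustration of the technique, not the actual `BinExport2Index` method.

```python
class IndexRange:
    """Hypothetical stand-in for a protobuf range with an optional end_index."""

    def __init__(self, begin_index, end_index=None):
        self.begin_index = begin_index
        self._has_end = end_index is not None
        # An unset protobuf integer field reads back as 0.
        self.end_index = end_index if end_index is not None else 0

    def HasField(self, name):
        return name == "end_index" and self._has_end


def instruction_indices(ranges):
    for r in ranges:
        if not r.HasField("end_index"):
            yield r.begin_index  # single-instruction range
        else:
            yield from range(r.begin_index, r.end_index)


single = IndexRange(begin_index=42)
naive_last = single.end_index - 1  # the broken computation
safe_last = list(instruction_indices([single]))[-1]
multi_last = list(instruction_indices([IndexRange(10, 13)]))[-1]
```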
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
`_build_expression_tree` already returns `[]` for the Ghidra bug where
an operand has no expressions (see
https://github.com/NationalSecurityAgency/ghidra/issues/6817), but
`get_operand_expressions` then called the recursive walker
unconditionally with `tree_index=0`, which indexed into the empty list
and raised `IndexError`. Add an early-return guard so callers receive
`[]` instead.
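The shape of the guard, as a sketch. The tree layout (a list of `(value, child_indices)` tuples) and both function names are assumptions made for illustration and do not mirror the real expression tree.

```python
def walk(tree, tree_index, out):
    # Hypothetical recursive walker: tree[i] is (value, child_indices).
    value, children = tree[tree_index]
    out.append(value)
    for child in children:
        walk(tree, child, out)


def get_expressions(tree):
    if not tree:
        # Early return: an empty tree has no index 0 to start from.
        return []
    out = []
    walk(tree, 0, out)
    return out


empty_result = get_expressions([])
normal_result = get_expressions([("+", [1, 2]), ("eax", []), (4, [])])
```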
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
`extract_insn_offset_features` in the x86/x64 BinExport2 extractor handled
zero-offset patterns (e.g. `mov [reg], reg`) in a nested branch but was
missing a `return` after yielding `Offset(0)` and `OperandOffset(0)`.
Execution then fell through to the general `mask_immediate` path, which read
`immediate` from the last-matched expression node (a register, not an
integer). Since that field defaults to 0, the function emitted duplicate
`Offset(0)` and `OperandOffset(0)` features for every such instruction.
Fix: add `return` after the two yields in the zero-pattern branch.
Tests: add `FEATURE_COUNT_TESTS_BE2_INTEL` covering `MOV [EDI], CX` at
0x401125 in mimikatz, asserting each of `Offset(0)` and `OperandOffset(1,0)`
is emitted exactly once.
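The fall-through is easy to reproduce with a toy generator. The function names and the tuple-based features below are hypothetical; only the control flow matches the description above.

```python
def offsets_buggy(is_zero_pattern, immediate=0):
    if is_zero_pattern:
        yield ("offset", 0)
        # Missing return: execution continues into the general path.
    yield ("offset", immediate)


def offsets_fixed(is_zero_pattern, immediate=0):
    if is_zero_pattern:
        yield ("offset", 0)
        return  # stop here so the general path is skipped
    yield ("offset", immediate)


buggy_features = list(offsets_buggy(True))
fixed_features = list(offsets_fixed(True))
general_features = list(offsets_fixed(False, immediate=8))
```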
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
`ValueError` does not format its arguments. Passing a second argument
stores both values as a tuple in `args`, so the error is rendered as a
raw tuple rather than a formatted message. Use an f-string so the
feature value is interpolated into the message.
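A short illustration of the difference. The message text and the value are made up for the example.

```python
value = 0x41

# Two positional arguments: no formatting happens, both land in args.
two_args = ValueError("unexpected feature value: %s", value)

# f-string: the value is interpolated before the exception is built.
formatted = ValueError(f"unexpected feature value: {value}")

two_args_text = str(two_args)
formatted_text = str(formatted)
```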
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
In the 5-expression branch of get_operand_phrase_info, two cases used
expression3 (the OPERATOR node) where expression4 (the IMMEDIATE_INT
value) was intended:
- Base + (Index * Scale): scale was expression3 ('*' operator), should be expression4 (the numeric scale value)
- (Index * Scale) + Displacement: displacement was expression3 ('+' operator), should be expression4 (the numeric displacement value)
Tests added using mimikatz.exe_.ghidra.BinExport:
- 0x40194d: MOVZX DI, [EAX + ECX * 1] — verifies scale=1 as IMMEDIATE_INT
- 0x401fd4: JMP [EAX * 4 + switchdataD_00402017] — verifies displacement=4202519 as IMMEDIATE_INT
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
`_index_vertex_edges` used truthiness checks (`if not edge.source_vertex_index`)
to skip edges with absent fields, but this also silently drops any edge whose
source or target is vertex index 0 — a valid vertex. Both fields are protobuf
optional integers, so the correct absent-field check is `HasField()`, consistent
with `_index_flow_graph_edges` and `_index_call_graph_vertices` in the same file.
In mimikatz.exe_.ida.BinExport, vertex 0 at 0x401000 has 2 callees and 1 caller
that were all being silently discarded.
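A sketch contrasting the two checks. `Edge` is a hypothetical stand-in for the protobuf message, written so that unset fields read as 0 and `HasField()` reports presence; the edge values are invented.

```python
class Edge:
    """Hypothetical stand-in for a protobuf edge with optional integer fields."""

    def __init__(self, source=None, target=None):
        self._present = {
            "source_vertex_index": source is not None,
            "target_vertex_index": target is not None,
        }
        # Unset integer fields read back as 0, like protobuf defaults.
        self.source_vertex_index = source or 0
        self.target_vertex_index = target or 0

    def HasField(self, name):
        return self._present[name]


edges = [Edge(0, 3), Edge(2, 0), Edge(1, 2), Edge(None, 5)]

# Truthiness check: also drops every edge that touches vertex 0.
kept_truthy = [
    e for e in edges if e.source_vertex_index and e.target_vertex_index
]

# Presence check: drops only the edge whose source is really absent.
kept_hasfield = [
    e
    for e in edges
    if e.HasField("source_vertex_index") and e.HasField("target_vertex_index")
]
```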
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
`get_backend_from_cli` had a bare `if` at the FORMAT_DRAKVUF branch where
`elif` was expected. After `if input_format == FORMAT_CAPE: return BACKEND_CAPE`,
the DRAKVUF branch opened a new `if` rather than continuing the chain with
`elif`. The remaining branches used `elif`/`else` correctly. There was no
functional bug because the FORMAT_CAPE branch returns early, but the break
in the chain was misleading and looked like a merge artifact.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
`get_file_taste` read the file header with `sample_path.open("rb").read(8)`,
discarding the file object without explicitly closing it. CPython's reference
counting closes it promptly in practice, but implementations without reference
counting (e.g. PyPy, Jython) may defer closure until a later garbage
collection. Use a `with` statement to guarantee the handle is released
immediately after reading.
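A minimal sketch of the `with` form. `read_taste` is a hypothetical name, and the temporary file exists only to make the example runnable.

```python
import tempfile
from pathlib import Path


def read_taste(sample_path: Path) -> bytes:
    # The handle is closed when the with block exits, on every interpreter.
    with sample_path.open("rb") as f:
        return f.read(8)


with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.bin"
    sample.write_bytes(b"MZ\x90\x00\x03\x00\x00\x00\x04\x00")
    taste = read_taste(sample)
```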
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
`enumerate` is 0-based, so after the loop `call_count` held `n - 1` for
`n` calls processed. The debug log at the end of `find_thread_capabilities`
therefore reported one fewer event than was actually analyzed. Replace
`enumerate` with an explicit `call_count += 1` counter so the log is
accurate.
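The off-by-one in isolation, using a made-up list of calls:

```python
calls = ["a", "b", "c"]

# enumerate starts at 0, so the last index is len(calls) - 1.
call_count = 0
for call_count, _ in enumerate(calls):
    pass
buggy_count = call_count

# Explicit counter: incremented once per processed call.
call_count = 0
for _ in calls:
    call_count += 1
fixed_count = call_count
```

`enumerate(calls, start=1)` would also yield the right final value for a non-empty sequence; the explicit counter is what the commit describes.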
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
`RuleMetadata.from_capa` used `rule.meta.get("capa/subscope", False)` and
`Field(False, alias="capa/subscope")`, but the actual key set by
`_extract_subscope_rules_rec` is `"capa/subscope-rule"`. This caused
`is_subscope_rule` to always be `False` in every `RuleMetadata` instance,
making downstream filters in `render/utils.py`, `render/vverbose.py`, and
`scripts/import-to-ida.py` ineffective (though subscope rules are already
excluded from `ResultDocument` before reaching those callers).
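The key mismatch reduces to a silent default from `dict.get`. A sketch with a made-up metadata dict:

```python
# Hypothetical rule metadata carrying the key that is actually set.
meta = {"name": "example rule", "capa/subscope-rule": True}

# Wrong key: the lookup misses and silently falls back to the default.
wrong = meta.get("capa/subscope", False)

# Right key: the stored flag is returned.
right = meta.get("capa/subscope-rule", False)
```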
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
`Scopes.from_dict` was decorated with `@classmethod` but named its first
parameter `self` instead of `cls`, and hard-coded `Scopes(...)` in the
return statement instead of `cls(...)`. This meant any subclass calling
`SubScopes.from_dict(...)` would get a `Scopes` instance back rather than
a `SubScopes` instance.
Rename the parameter to `cls` and use it in the return statement so
that subclasses receive the correct type.
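A sketch of the difference between the two return statements. The classes below are simplified stand-ins, and `from_dict_hardcoded` exists only to show the old behaviour side by side.

```python
class Scopes:
    """Simplified stand-in; the fields are illustrative."""

    def __init__(self, static=None, dynamic=None):
        self.static = static
        self.dynamic = dynamic

    @classmethod
    def from_dict_hardcoded(cls, d):
        # Old behaviour: always builds the base class.
        return Scopes(**d)

    @classmethod
    def from_dict(cls, d):
        # New behaviour: builds whichever class the method was called on.
        return cls(**d)


class SubScopes(Scopes):  # hypothetical subclass
    pass


old_instance = SubScopes.from_dict_hardcoded({"static": "function"})
new_instance = SubScopes.from_dict({"static": "function"})
```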
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
`functools.lru_cache` has been in the standard library since Python 3.2.
The project requires Python >=3.10, so the `except ImportError` branch
importing `backports.functools_lru_cache` can never execute.
Remove the try/except block and keep only the direct stdlib import.
Also remove `types-backports` from dev dependencies, `backports` from
`[tool.deptry.known_first_party]`, and `types-backports` from the
DEP002 ignore list in `pyproject.toml`.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
`Result.__nonzero__` is the Python 2 boolean hook; Python 3 calls
`__bool__`, which is already defined immediately above it.
`__nonzero__` is never invoked at runtime in Python 3 and adds noise
that misleads readers into thinking it serves a purpose.
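A quick check that Python 3 ignores `__nonzero__`. Both classes are hypothetical.

```python
class OnlyNonzero:
    def __nonzero__(self):  # Python 2 hook, never consulted by Python 3
        return False


class WithBool:
    def __bool__(self):  # Python 3 hook
        return False


only_nonzero_truth = bool(OnlyNonzero())  # falls back to default truthiness
with_bool_truth = bool(WithBool())
```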
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
EXTENSIONS_ELF = "elf_" was defined but never used: get_format_from_extension
had branches for every other EXTENSIONS_* constant except ELF. Since .elf_
files are real test fixtures and a recognised input format, the fix is to add
the missing elif branch (and import FORMAT_ELF) rather than delete the
constant.
Closes #3031
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* Change capa-rules version in installation guide
Updated the installation instructions to reflect the newest version of capa-rules.
* add md files from /doc to bumpversion.toml
* adjust rule installation command
* bump to 9.4.0