self.view_tab_rulegen = None was assigned but never read or reassigned anywhere in the codebase. All other attributes in the same block are type-annotated declarations; this one was an actual assignment with no purpose.
capa.main, capa.render.json, and capa.features.extractors.ida.extractor
were imported but never referenced in the file, adding unnecessary
startup cost.
Remove get_file_md5 and get_file_sha256 (which used different underlying
IDA APIs and duplicated normalization logic) and replace all call sites with
the existing retrieve_input_file_md5/sha256 shims that already handle
IDA <9 vs >=9 return-type differences consistently.
Qt dispatches drag-enter events to dragEnterEvent; the misspelled method
dragEventEnter was dead code and its super() call would raise AttributeError
if ever reached.
capa.loader.compute_layout was called without capa.loader being explicitly
imported; it worked only via capa.main's transitive import. Adding the
explicit import prevents a future AttributeError if import order changes.
If idaapi.get_func() raises before assignment, the except handler previously
referenced the unbound variable f, causing a secondary UnboundLocalError that
masked the original exception. Also handles the case where get_func() returns
None by falling back to idaapi.get_screen_ea() in the error log.
Previously, the elif for CompoundStatement+NOT was unreachable (the outer
if already matched all CompoundStatement), causing NOT statements to return
None and their children to be orphaned/dropped from the tree.
`get_printable_len` returned a float for UTF-16 LE operands due to `/`
instead of `//`, violating the `-> int` annotation and silently
propagating a float into `_bb_has_stackstring`'s accumulator. Aligns
with the IDA extractor equivalent.
Closes SURF-58
Four call sites in capa/features/extractors/viv/insn.py passed `oper`
(the operand object) as the first argument to getOperValue/getOperAddr,
where the API expects `insn` (the enclosing opcode). Silent today because
i386ImmOper ignores the argument, but would produce wrong values for
Amd64RipRelOper which uses op.va + op.size + self.imm.
Closes SURF-56
The stub always raised NotImplementedError and was not registered in
FUNCTION_HANDLERS, so loop detection was silently skipped for all .NET
samples. Detects backward branches (target offset < instruction offset)
as loops, matching the approach used by other extractors.
helpers.py contained only find_process, which was never called anywhere in the codebase. Its signature used dict-style field access while the rest of the cape extractor migrated to Pydantic models, so calling it today would raise a TypeError.
Three stub functions (interface_extract_basic_block_XXX, interface_extract_function_XXX,
interface_extract_instruction_XXX) each raised NotImplementedError unconditionally and
were never called or referenced anywhere in the codebase. They were leftover interface
documentation from an earlier design; the handler-list type annotations already document
the expected signatures.
Five extractors (ghidra, dnfile, viv, binja, ida) stored Format in
global_features during __init__ and also included extract_file_format
in FILE_HANDLERS. This caused find_file_capabilities to emit the Format
feature twice, inflating feature counts. Removing extract_file_format
from FILE_HANDLERS in all five extractors ensures Format is emitted
once via global_features only.
ARM Thumb-2 has legal 2-operand forms like `add sp, #0x10` and `add r0, #1`.
The previous code asserted exactly 3 operands before checking if operand[1]
was a stack register, causing AssertionError on any 2-operand encoding.
The fix converts the assert to a guard condition so 2-operand instructions
fall through to the for-loop and are processed normally.
DrakvufExtractor.get_call_name always appended ' -> ' even for SystemCall
objects that have no return_value attribute, because f" -> {''}" still
produces the literal ' -> ' string. Conditionally build the suffix only
when a non-empty return value exists.
block.getStart().getOffset() and seg.start_ea both return virtual addresses,
not file offsets. Wrapping them in FileOffsetAddress was semantically wrong for
PE/ELF binaries where VA != file offset. Switch to AbsoluteVirtualAddress to
match what the value actually represents.
In extract_function_calls_from, the LLIL_CONST branches passed a RegisterValue
object to AbsoluteVirtualAddress instead of an int. Change dest.value to
dest.value.value and indirect_src.value to indirect_src.value.value, matching
the pattern used everywhere else in the file for LLIL_CONST and LLIL_CONST_PTR.
get_previous_instructions called vw.getPrevLocation twice with identical
arguments; the first result was assigned to loc and used only as a None
gate, but the inner guard already covered that case. Collapsed the two
nested guards into one, removing the redundant call and dead variable.
getByteDef returns (offset, segment_bytes); the old code indexed [1] to get
segment_bytes and called startswith() on the whole buffer, which checked whether
the segment itself begins with ENDBRANCH rather than the target address.
Unpacking both values and slicing _buf[_offset:] fixes the check.
The outer while loop over dests and inner for loop over s_addrs were
swapped, causing s_addrs to be exhausted after the first iteration and
dests.next() to be called multiple times per destination. Fix uses the
block's first start address as a fixed source and iterates dests in the
inner while loop, matching the IDA and Binja extractor pattern.
In both extract_file_import_names (file.py) and get_file_imports (helpers.py), addr was
assigned only inside a conditional branch. If no data reference existed for an external
function, addr was unbound and the subsequent AbsoluteVirtualAddress(addr) or
import_dict.setdefault(addr, ...) raised UnboundLocalError. Fix initializes addr to None
before the inner loop, breaks on the first data reference found, and skips the function
with continue if no data reference was found.
ConciseModel called ConfigDict(extra="ignore") as a bare expression, discarding the result.
model_config was never set, so the extra="ignore" config was never in effect for any subclass.
Assign the result to model_config as intended.
When resolve_dotnet_token returns an InvalidToken (e.g. malformed or
out-of-range MethodSpec table/row index), the assert on line 51 raised
AssertionError instead of gracefully returning None. Replaced the assert
with the isinstance guard pattern already used elsewhere in the same file.
`get_dotnet_table_row` used `if row_index - 1 <= 0` to guard against invalid
indices. Because .NET metadata tables are 1-indexed, row_index=1 is the first
valid row, but the condition is equivalent to `row_index <= 1`, silently
rejecting it and making the first row of every table unreachable.
Changed to `if row_index <= 0`, which correctly rejects only the zero/null
token and leaves all valid rows accessible. Added four unit tests against the
real dd9098ff91717f4906afe9dafdfa2f52.exe_ sample to verify the guard
boundary: row_index=1 returns the first row, row_index=0 returns None, all
row indices 1..N succeed, and an out-of-bounds index returns None.
In `_compute_monitor_threads`, the uniqueness assertion indexed
`monitor_threads_by_monitor_process` by `thread_id` instead of
`process_id`. Because the dict is a `defaultdict(list)`, each lookup on
a novel thread ID creates a fresh empty list, making the assertion
vacuously true. Duplicate thread IDs within a process are never caught.
Line 242 immediately below uses the correct key `process_id` when
appending, so the data structure is populated correctly; only the guard
was broken.
`format_call` referenced `capa.helpers.assert_never` without importing
`capa.helpers`, causing a NameError at runtime whenever a call argument
had an unexpected type. The module-qualified reference relied on an
implicit import-order dependency from another module having already
imported `capa.helpers`. Replace with an explicit `from capa.helpers
import assert_never`, matching the pattern used in the sibling
`call.py`.
`get_calls` iterated `generate_symbols` and overwrote `call.api` with
each generated symbol name, then yielded a `CallHandle` wrapping the
same `call` object. Because the Pydantic model is shared by reference,
every previously-yielded handle ended up with `api` equal to the last
symbol generated in the final iteration.
The correct pattern (used in `call.py:61`) is to leave the model
untouched and let the call extractor expand symbol variants via
`generate_symbols`. `get_calls` now yields exactly one `CallHandle`
per call with the original `api` value preserved.
`is_security_cookie` computes the last address in a terminal basic block by
iterating `bb.instruction_index` and indexing `ir.end_index - 1`. The BinExport2
protobuf spec omits `end_index` for single-element ranges, so protobuf returns 0
as the default. `0 - 1 = -1`, and -1 is not a key in `insn_address_by_index`,
raising `KeyError`.
Use `BinExport2Index.instruction_indices` to enumerate instruction indices, which
already handles single-instruction ranges via `HasField("end_index")`.
`_build_expression_tree` already returns `[]` for the Ghidra bug where
an operand has no expressions (see
https://github.com/NationalSecurityAgency/ghidra/issues/6817), but
`get_operand_expressions` then called the recursive walker
unconditionally with `tree_index=0`, which indexed into the empty list
and raised `IndexError`. Add an early-return guard so callers receive
`[]` instead.
`extract_insn_offset_features` in the x86/x64 BinExport2 extractor handled
zero-offset patterns (e.g. `mov [reg], reg`) in a nested branch but was
missing a `return` after yielding `Offset(0)` and `OperandOffset(0)`.
Execution then fell through to the general `mask_immediate` path, which read
`immediate` from the last-matched expression node (a register, not an
integer). Since that field defaults to 0, the function emitted duplicate
`Offset(0)` and `OperandOffset(0)` features for every such instruction.
Fix: add `return` after the two yields in the zero-pattern branch.
Tests: add `FEATURE_COUNT_TESTS_BE2_INTEL` covering `MOV [EDI], CX` at
0x401125 in mimikatz, asserting each of `Offset(0)` and `OperandOffset(1,0)`
is emitted exactly once.
`ValueError` takes a single message string. Passing a second argument stores both as a tuple in `args` without any string formatting, so the feature value never appears in the error message. Use an f-string so the feature value is interpolated into the message.
In the 5-expression branch of get_operand_phrase_info, two cases used
expression3 (the OPERATOR node) where expression4 (the IMMEDIATE_INT
value) was intended:
- Base + (Index * Scale): scale was expression3 ('*' operator), should be expression4 (the numeric scale value)
- (Index * Scale) + Displacement: displacement was expression3 ('+' operator), should be expression4 (the numeric displacement value)
Tests added using mimikatz.exe_.ghidra.BinExport:
- 0x40194d: MOVZX DI, [EAX + ECX * 1] — verifies scale=1 as IMMEDIATE_INT
- 0x401fd4: JMP [EAX * 4 + switchdataD_00402017] — verifies displacement=4202519 as IMMEDIATE_INT
`_index_vertex_edges` used truthiness checks (`if not edge.source_vertex_index`)
to skip edges with absent fields, but this also silently drops any edge whose
source or target is vertex index 0 — a valid vertex. Both fields are protobuf
optional integers, so the correct absent-field check is `HasField()`, consistent
with `_index_flow_graph_edges` and `_index_call_graph_vertices` in the same file.
In mimikatz.exe_.ida.BinExport, vertex 0 at 0x401000 has 2 callees and 1 caller
that were all being silently discarded.
`get_backend_from_cli` had a bare `if` at the FORMAT_DRAKVUF branch where
`elif` was expected. After `if input_format == FORMAT_CAPE: return BACKEND_CAPE`,
the DRAKVUF branch opened a new `if` rather than continuing the chain with
`elif`. The remaining branches used `elif`/`else` correctly. There was no
functional bug because the FORMAT_CAPE branch returns early, but the break
in the chain was misleading and looked like a merge artifact.
`get_file_taste` opened a file handle with `sample_path.open("rb").read(8)`,
discarding the file object without explicitly closing it. CPython reference-
counting closes it promptly in practice, but other implementations (PyPy,
Jython) and CPython under GC pressure may defer closure. Use a `with` statement
to guarantee the handle is released immediately after reading.
`enumerate` is 0-based, so after the loop `call_count` held `n - 1` for
`n` calls processed. The debug log at the end of `find_thread_capabilities`
therefore reported one fewer event than was actually analyzed. Replace
`enumerate` with an explicit `call_count += 1` counter so the log is
accurate.
`RuleMetadata.from_capa` used `rule.meta.get("capa/subscope", False)` and
`Field(False, alias="capa/subscope")`, but the actual key set by
`_extract_subscope_rules_rec` is `"capa/subscope-rule"`. This caused
`is_subscope_rule` to always be `False` in every `RuleMetadata` instance,
making downstream filters in `render/utils.py`, `render/vverbose.py`, and
`scripts/import-to-ida.py` ineffective (though subscope rules are already
excluded from `ResultDocument` before reaching those callers).
`Scopes.from_dict` was decorated with `@classmethod` but named its first
parameter `self` instead of `cls`, and hard-coded `Scopes(...)` in the
return statement instead of `cls(...)`. This meant any subclass calling
`SubScopes.from_dict(...)` would get a `Scopes` instance back rather than
a `SubScopes` instance.
Rename the parameter to `cls` and use it in the return statement so
that subclasses receive the correct type.
`functools.lru_cache` has been in the standard library since Python 3.2.
The project requires Python >=3.10, so the `except ImportError` branch
importing `backports.functools_lru_cache` can never execute.
Remove the try/except block and keep only the direct stdlib import.
Also remove `types-backports` from dev dependencies, `backports` from
`[tool.deptry.known_first_party]`, and `types-backports` from the
DEP002 ignore list in `pyproject.toml`.
`Result.__nonzero__` is the Python 2 boolean hook; Python 3 calls
`__bool__`, which is already defined immediately above it.
`__nonzero__` is never invoked at runtime in Python 3 and adds noise
that misleads readers into thinking it serves a purpose.
EXTENSIONS_ELF = "elf_" was defined but never used: get_format_from_extension
had branches for every other EXTENSIONS_* constant except ELF. Since .elf_
files are real test fixtures and a recognised input format, the fix is to add
the missing elif branch (and import FORMAT_ELF) rather than delete the
constant.
Closes#3031