Removes the capa.engine, capa.helpers, capa.features, and capa.features.insn
imports from each script in which they were never referenced. Adds the missing
capa.loader import to show-capabilities-by-function.py, which already used
capa.loader.
json.JSONDecodeError is a subclass of ValueError, so the broader except ValueError
clause, listed first, caught every JSONDecodeError and left the more specific
handler unreachable. Keep only the specific except json.JSONDecodeError handler.
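A minimal sketch of the shadowing, assuming a plain json.loads call (parse_shadowed and parse_fixed are hypothetical names, not the script's functions). Python tries except clauses in order, so the first matching base class wins.

```python
import json


def parse_shadowed(text: str) -> str:
    # Buggy ordering: ValueError is listed first and, because
    # json.JSONDecodeError subclasses ValueError, it catches every parse error.
    try:
        json.loads(text)
        return "ok"
    except ValueError:
        return "generic ValueError handler"
    except json.JSONDecodeError:  # never reached for JSON parse errors
        return "JSONDecodeError handler"


def parse_fixed(text: str) -> str:
    # Fixed: only the specific handler remains, so its details are available.
    try:
        json.loads(text)
        return "ok"
    except json.JSONDecodeError as e:
        return f"JSONDecodeError handler (line {e.lineno})"
```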
The entire main() body was indented inside `if argv is None:`, causing
main() to silently return None when called with an explicit argv list.
Closes SURF-90.
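A sketch of the indentation bug, using hypothetical main_buggy/main_fixed functions in place of the real script's main(). When the body is nested under the argv check, an explicit argv skips everything and the function falls off the end, returning None.

```python
import sys
from typing import List, Optional


def main_buggy(argv: Optional[List[str]] = None) -> Optional[int]:
    if argv is None:
        argv = sys.argv[1:]

        # Bug: the rest of the body is nested under the `if`, so a caller
        # passing an explicit argv never reaches it and gets None back.
        print(f"processing {len(argv)} argument(s)")
        return 0


def main_fixed(argv: Optional[List[str]] = None) -> int:
    if argv is None:
        argv = sys.argv[1:]

    # Fixed: the body runs regardless of how argv was supplied.
    print(f"processing {len(argv)} argument(s)")
    return 0
```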
When all runs for a backend fail, durations_by_backend[backend] is empty,
causing StatisticsError from statistics.quantiles (needs >= 2 points) and
statistics.mean (needs >= 1 point). Print placeholder messages instead.
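A sketch of the guard, assuming durations are collected per backend in a dict of lists (summarize and the message wording are hypothetical). As a precaution it also skips quantiles() for a single successful run, because older Python versions require at least two data points there.

```python
import statistics
from typing import Dict, List


def summarize(durations_by_backend: Dict[str, List[float]]) -> List[str]:
    lines = []
    for backend, durations in durations_by_backend.items():
        if not durations:
            # Every run failed: mean() and quantiles() would raise StatisticsError.
            lines.append(f"{backend}: no successful runs")
            continue
        lines.append(f"{backend}: mean {statistics.mean(durations):.2f}s")
        if len(durations) >= 2:
            quartiles = statistics.quantiles(durations, n=4)
            lines.append(f"{backend}: quartiles {[round(q, 2) for q in quartiles]}")
        else:
            lines.append(f"{backend}: too few runs for quartiles")
    return lines
```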
Without parentheses, Python's operator precedence (`and` binds tighter than `or`)
made `kid.name != "Some"` guard only the `Not` branch; `And` and `Or` kids named
`"Some"` bypassed the Some-handling block and fell into the recursive
convert_rule call unguarded.
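An illustrative reconstruction of the precedence issue, assuming the condition chains the And/Or/Not checks with `or` and appends the name check with `and`. Kid is a hypothetical stand-in for capa's statement classes, with the class encoded as a string.

```python
from dataclasses import dataclass


@dataclass
class Kid:
    kind: str  # "and", "or", or "not" (hypothetical stand-in for statement classes)
    name: str


def enters_recursion_buggy(kid: Kid) -> bool:
    # Parsed as: is_and or is_or or (is_not and name != "Some")
    return kid.kind == "and" or kid.kind == "or" or kid.kind == "not" and kid.name != "Some"


def enters_recursion_fixed(kid: Kid) -> bool:
    # Parentheses make the name check apply to every branch.
    return (kid.kind == "and" or kid.kind == "or" or kid.kind == "not") and kid.name != "Some"
```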
The `or "currentcontrolset" in pat` branch triggered the lint for any
regex containing "currentcontrolset", even unrelated paths like
HKLM\Software\CurrentControlSet that don't need the system\\ fix.
Fix by requiring "system\\\\" in both branches of the condition.
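A sketch of the trigger condition before and after, assuming the first branch tests for "controlset" alongside "system\\\\" (that branch's exact content is an assumption; only the `or "currentcontrolset" in pat` branch is described above). lint_fires_buggy and lint_fires_fixed are hypothetical names; the real lint goes on to check the full alternation, which is not modeled here.

```python
def lint_fires_buggy(pat: str) -> bool:
    pat = pat.lower()
    # `and` binds tighter than `or`, so the second branch does not require "system\\".
    return "system\\\\" in pat and "controlset" in pat or "currentcontrolset" in pat


def lint_fires_fixed(pat: str) -> bool:
    pat = pat.lower()
    # Both branches now require the escaped "system\\" path component.
    return "system\\\\" in pat and ("controlset" in pat or "currentcontrolset" in pat)
```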
When rule.meta lacks a "scopes" key, rule.meta.get("scopes") returns None, and
both `"static" not in None` and `"dynamic" not in None` raise TypeError, crashing lint_rule.
Add isinstance(scopes, dict) guard so both checks return False (no violation)
when scopes is absent, letting MissingScopes report the real problem.
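A sketch of the guard, with the lint checks reduced to plain functions over the meta dict (missing_static_scope and missing_dynamic_scope are hypothetical names). Returning False for a non-dict value leaves the absent key for MissingScopes to report.

```python
from typing import Any, Dict


def missing_static_scope(meta: Dict[str, Any]) -> bool:
    scopes = meta.get("scopes")
    # Without this guard, `"static" not in None` raises TypeError.
    if not isinstance(scopes, dict):
        return False
    return "static" not in scopes


def missing_dynamic_scope(meta: Dict[str, Any]) -> bool:
    scopes = meta.get("scopes")
    if not isinstance(scopes, dict):
        return False
    return "dynamic" not in scopes
```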
The check was reading rule.meta.get("scope") which no longer exists in the
current schema (replaced by scopes.static/scopes.dynamic), causing the lint
to never fire for function/basic-block rules missing example offsets.
The condition was skipping FUNCTION-scope rules instead of keeping them,
causing the script to never annotate any functions. Invert to match the
correct logic in import-to-bn.py.
The triple-quoted string at lines 230-239 was never assigned and contained
an incomplete sentence ("The scripts"). Deleted entirely as the surrounding
code is self-explanatory.
Remove the capa.helpers and capa.features.basicblock imports (never referenced)
and the redundant bare capa.features.extractors.base_extractor import (covered
by the from-import on the next line).
Closes SURF-78
Removes the bare `assert isinstance(meta.analysis, rd.StaticAnalysis)` that
prevented a dynamic ResultDocument from being validated, removes the incorrect
direct list comparison in assert_dynamic_analyis, and adds dynamic_a0000a6_rd
to test_doc_to_pb2 so the dynamic proto serialization path is exercised.
RangeStatement.max was compared against itself, so a protobuf round-trip
that corrupted the max field would pass undetected. Compare it against the
protobuf counterpart sb.max instead.
When "type" is absent from node_data, the type lookup yields None,
node_data.get(None) then returns None, and `"description" in None` raises
TypeError. Guard with early returns.
Closes SURF-73
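A sketch of the early-return pattern, assuming node_data keys its per-type details by the value of "type" (get_description and the dict layout are hypothetical).

```python
from typing import Any, Dict, Optional


def get_description(node_data: Dict[Any, Any]) -> Optional[str]:
    node_type = node_data.get("type")
    if node_type is None:
        # Early return: node_data.get(None) would be None and
        # `"description" in None` would raise TypeError.
        return None
    details = node_data.get(node_type)
    if not isinstance(details, dict):
        return None
    return details.get("description")
```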
Previously, a single feat_dict was allocated before the inner loop and
the same object reference was appended on every iteration, causing all
sub-match entries to be identical by the end of the loop.
Closes SURF-72
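A sketch of the aliasing bug, with a hypothetical "location" key standing in for the real sub-match fields. Appending the same dict object stores the same reference each time, so later writes show up in every entry.

```python
from typing import Dict, List


def build_entries_buggy(locations: List[int]) -> List[Dict[str, int]]:
    entries = []
    feat_dict: Dict[str, int] = {}  # allocated once, before the loop
    for loc in locations:
        feat_dict["location"] = loc
        entries.append(feat_dict)  # appends the same object every iteration
    return entries


def build_entries_fixed(locations: List[int]) -> List[Dict[str, int]]:
    entries = []
    for loc in locations:
        feat_dict = {"location": loc}  # fresh dict per iteration
        entries.append(feat_dict)
    return entries
```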
self.view_tab_rulegen = None was assigned but never read or reassigned anywhere in the codebase. All other attributes in the same block are type-annotated declarations; this one was an actual assignment with no purpose.
capa.main, capa.render.json, and capa.features.extractors.ida.extractor
were imported but never referenced in the file, adding unnecessary
startup cost.
Remove get_file_md5 and get_file_sha256 (which used different underlying
IDA APIs and duplicated normalization logic) and replace all call sites with
the existing retrieve_input_file_md5/sha256 shims that already handle
IDA <9 vs >=9 return-type differences consistently.
Qt dispatches drag-enter events to dragEnterEvent; the misspelled method
dragEventEnter was dead code and its super() call would raise AttributeError
if ever reached.
capa.loader.compute_layout was called without capa.loader being explicitly
imported; it worked only via capa.main's transitive import. Adding the
explicit import prevents a future AttributeError if import order changes.
If idaapi.get_func() raises before assignment, the except handler previously
referenced the unbound variable f, causing a secondary UnboundLocalError that
masked the original exception. Also handles the case where get_func() returns
None by falling back to idaapi.get_screen_ea() in the error log.
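A sketch of the fix without IDA, injecting hypothetical get_func, get_screen_ea, and work callables in place of the idaapi calls and the analysis body. Binding f before the try lets the handler always read it.

```python
import logging
from typing import Callable, Optional

logger = logging.getLogger(__name__)


class Func:
    """Hypothetical stand-in for an IDA function object; only start_ea is modeled."""

    def __init__(self, start_ea: int) -> None:
        self.start_ea = start_ea


def analyze_buggy(get_func: Callable[[], Optional[Func]], work: Callable[[Optional[Func]], int]) -> Optional[int]:
    try:
        f = get_func()
        return work(f)
    except Exception:
        # If get_func() raised, `f` is unbound: this raises UnboundLocalError
        # and masks the original exception.
        logger.exception("analysis failed in function at 0x%x", f.start_ea)
        return None


def analyze_fixed(
    get_func: Callable[[], Optional[Func]],
    get_screen_ea: Callable[[], int],
    work: Callable[[Optional[Func]], int],
) -> Optional[int]:
    f: Optional[Func] = None  # bound before the try so the handler can read it
    try:
        f = get_func()
        return work(f)
    except Exception:
        # Fall back to the cursor address when there is no function object.
        ea = f.start_ea if f is not None else get_screen_ea()
        logger.exception("analysis failed near 0x%x", ea)
        return None
```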
Previously, the elif for CompoundStatement+NOT was unreachable (the outer
if already matched all CompoundStatement), causing NOT statements to return
None and their children to be orphaned/dropped from the tree.
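An illustrative sketch of the unreachable elif, assuming the general branch handled only "and"/"or" and fell through for NOT. Node and CompoundStatement here are hypothetical stand-ins, not capa's classes, and leaves are plain strings.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Union


@dataclass
class Node:
    label: str
    children: List[Optional["Node"]] = field(default_factory=list)


@dataclass
class CompoundStatement:
    kind: str  # "and", "or", or "not" (hypothetical stand-in)
    children: List[Union["CompoundStatement", str]]


Stmt = Union[CompoundStatement, str]


def to_node_buggy(stmt: Stmt) -> Optional[Node]:
    if isinstance(stmt, str):
        return Node(stmt)
    if isinstance(stmt, CompoundStatement):
        if stmt.kind in ("and", "or"):
            return Node(stmt.kind, [to_node_buggy(c) for c in stmt.children])
        # "not" is matched here but not handled, so it falls through to None.
    elif isinstance(stmt, CompoundStatement) and stmt.kind == "not":
        # Unreachable: the branch above already matched every CompoundStatement.
        return Node("not", [to_node_buggy(c) for c in stmt.children])
    return None


def to_node_fixed(stmt: Stmt) -> Optional[Node]:
    if isinstance(stmt, str):
        return Node(stmt)
    if isinstance(stmt, CompoundStatement) and stmt.kind == "not":
        # Check the specific case before the general one.
        return Node("not", [to_node_fixed(c) for c in stmt.children])
    if isinstance(stmt, CompoundStatement):
        return Node(stmt.kind, [to_node_fixed(c) for c in stmt.children])
    return None
```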
`get_printable_len` returned a float for UTF-16 LE operands due to `/`
instead of `//`, violating the `-> int` annotation and silently
propagating a float into `_bb_has_stackstring`'s accumulator. Aligns
with the IDA extractor equivalent.
Closes SURF-58
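A simplified, hypothetical version of the check that operates on an integer immediate rather than an extractor operand, to show where `//` matters. UTF-16 LE uses two bytes per character, and `/` would turn the count into a float.

```python
def get_printable_len(value: int, size: int) -> int:
    """Hypothetical: count printable characters in a `size`-byte immediate."""
    raw = value.to_bytes(size, "little")
    if all(32 <= b < 127 for b in raw):
        return size  # ASCII: one byte per character
    if all(32 <= b < 127 for b in raw[0::2]) and all(b == 0 for b in raw[1::2]):
        return size // 2  # UTF-16 LE: floor division keeps the result an int
    return 0
```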
Four call sites in capa/features/extractors/viv/insn.py passed `oper`
(the operand object) as the first argument to getOperValue/getOperAddr,
where the API expects `insn` (the enclosing opcode). Silent today because
i386ImmOper ignores the argument, but would produce wrong values for
Amd64RipRelOper which uses op.va + op.size + self.imm.
Closes SURF-56
The stub always raised NotImplementedError and was not registered in
FUNCTION_HANDLERS, so loop detection was silently skipped for all .NET
samples. Detects backward branches (target offset < instruction offset)
as loops, matching the approach used by other extractors.
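A sketch of backward-branch loop detection over a hypothetical instruction model (Insn with an offset and an optional branch target); it does not use dnfile's API.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Insn:
    offset: int
    branch_target: Optional[int] = None  # None if not a branch (hypothetical model)


def has_loop(insns: Iterable[Insn]) -> bool:
    # A branch whose target precedes the branch itself jumps backward,
    # which is treated as evidence of a loop.
    return any(i.branch_target is not None and i.branch_target < i.offset for i in insns)
```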
helpers.py contained only find_process, which was never called anywhere in the codebase. Its signature used dict-style field access while the rest of the cape extractor migrated to Pydantic models, so calling it today would raise a TypeError.
Three stub functions (interface_extract_basic_block_XXX, interface_extract_function_XXX,
interface_extract_instruction_XXX) each raised NotImplementedError unconditionally and
were never called or referenced anywhere in the codebase. They were leftover interface
documentation from an earlier design; the handler-list type annotations already document
the expected signatures.
Five extractors (ghidra, dnfile, viv, binja, ida) stored Format in
global_features during __init__ and also included extract_file_format
in FILE_HANDLERS. This caused find_file_capabilities to emit the Format
feature twice, inflating feature counts. Removing extract_file_format
from FILE_HANDLERS in all five extractors ensures Format is emitted
once via global_features only.
ARM Thumb-2 has legal 2-operand forms like `add sp, #0x10` and `add r0, #1`.
The previous code asserted exactly 3 operands before checking if operand[1]
was a stack register, causing AssertionError on any 2-operand encoding.
The fix converts the assert to a guard condition so 2-operand instructions
fall through to the for-loop and are processed normally.
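A sketch of converting the assert into a guard, using a hypothetical operand model (register names as strings, immediates as ints). What the real extractor does for the 3-operand stack case is assumed here to be "skip"; the point is that 2-operand forms reach the loop instead of failing the assert.

```python
from dataclasses import dataclass
from typing import Iterator, List, Union

Operand = Union[str, int]  # register name or immediate (hypothetical model)


@dataclass
class Insn:
    mnem: str
    opers: List[Operand]


def extract_immediates(insn: Insn) -> Iterator[int]:
    # Before: `assert len(insn.opers) == 3` crashed on forms like `add r0, #1`.
    # After: the operand count is part of the guard, so other forms fall through.
    if insn.mnem == "add" and len(insn.opers) == 3 and insn.opers[1] == "sp":
        return  # assumed: stack-relative adds are skipped
    for oper in insn.opers:
        if isinstance(oper, int):
            yield oper
```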
DrakvufExtractor.get_call_name always appended ' -> ' even for SystemCall
objects that have no return_value attribute, because f" -> {''}" still
produces the literal ' -> ' string. Conditionally build the suffix only
when a non-empty return value exists.
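A sketch of the conditional suffix; SystemCall and ApiCall are hypothetical stand-ins for the DRAKVUF call models, and the output format is simplified.

```python
from dataclasses import dataclass


@dataclass
class SystemCall:  # hypothetical stand-in: has no return_value attribute
    name: str


@dataclass
class ApiCall:  # hypothetical stand-in with a return value
    name: str
    return_value: str


def get_call_name(call) -> str:
    rv = getattr(call, "return_value", "")
    # f" -> {''}" still renders " -> ", so add the suffix only for a real value.
    suffix = f" -> {rv}" if rv else ""
    return f"{call.name}(){suffix}"
```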
block.getStart().getOffset() and seg.start_ea both return virtual addresses,
not file offsets. Wrapping them in FileOffsetAddress was semantically wrong for
PE/ELF binaries where VA != file offset. Switch to AbsoluteVirtualAddress to
match what the value actually represents.
In extract_function_calls_from, the LLIL_CONST branches passed a RegisterValue
object to AbsoluteVirtualAddress instead of an int. Change dest.value to
dest.value.value and indirect_src.value to indirect_src.value.value, matching
the pattern used everywhere else in the file for LLIL_CONST and LLIL_CONST_PTR.
get_previous_instructions called vw.getPrevLocation twice with identical
arguments; the first result was assigned to loc and used only as a None
gate, but the inner guard already covered that case. Collapsed the two
nested guards into one, removing the redundant call and dead variable.
getByteDef returns (offset, segment_bytes); the old code indexed [1] to get
segment_bytes and called startswith() on the whole buffer, which checked whether
the segment itself begins with ENDBRANCH rather than whether the bytes at the
target address do. Unpacking both values and slicing _buf[_offset:] fixes the check.
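A sketch of the offset-aware check. get_byte_def is a hypothetical stand-in that mimics the (offset, segment_bytes) return shape described above, and the endbr64 encoding (F3 0F 1E FA) is used as the ENDBRANCH pattern.

```python
from typing import Tuple

ENDBR64 = b"\xf3\x0f\x1e\xfa"  # x86-64 endbr64 encoding


def get_byte_def(segment: bytes, seg_start: int, va: int) -> Tuple[int, bytes]:
    """Hypothetical stand-in: returns (offset of va within segment, segment bytes)."""
    return va - seg_start, segment


def starts_with_endbranch(segment: bytes, seg_start: int, va: int) -> bool:
    _offset, _buf = get_byte_def(segment, seg_start, va)
    # Test the bytes at the target; _buf.startswith() alone tests the segment start.
    return _buf[_offset:].startswith(ENDBR64)
```

`_buf.startswith(ENDBR64, _offset)` is an equivalent alternative that avoids copying the slice.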
The outer while loop over dests and inner for loop over s_addrs were
swapped, causing s_addrs to be exhausted after the first iteration and
dests.next() to be called multiple times per destination. Fix uses the
block's first start address as a fixed source and iterates dests in the
inner while loop, matching the IDA and Binja extractor pattern.
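A sketch of the fixed loop shape. RefIterator is a hypothetical stand-in for a Java-style reference iterator with hasNext()/next(), and edges_fixed is a hypothetical name. It also includes a small demonstration of why the old inner loop ran dry: a one-shot iterator is empty on its second pass.

```python
from typing import Iterable, List, Tuple


class RefIterator:
    """Hypothetical stand-in for a Java-style iterator with hasNext()/next()."""

    def __init__(self, items: Iterable[int]) -> None:
        self._items = list(items)
        self._i = 0

    def hasNext(self) -> bool:
        return self._i < len(self._items)

    def next(self) -> int:
        item = self._items[self._i]
        self._i += 1
        return item


def edges_fixed(block_starts: List[int], dests: RefIterator) -> List[Tuple[int, int]]:
    source = block_starts[0]  # the block's first start address is the fixed source
    edges = []
    while dests.hasNext():
        edges.append((source, dests.next()))  # exactly one next() per destination
    return edges


# A one-shot iterator is exhausted after one pass, which starved the old inner loop.
s_addrs = iter([0x10, 0x20])
first_pass = list(s_addrs)
second_pass = list(s_addrs)
```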