Commit Graph

6146 Commits

Author SHA1 Message Date
Willi Ballenthin 8fca21f808 linter: validate dynamic example offsets
closes #3058
2026-05-08 17:58:07 +02:00
Willi Ballenthin 8e464e6041 fix: formatting 2026-05-08 17:58:07 +02:00
Willi Ballenthin 555bbdecda fix: guard getByteDef against None for unmapped addresses in viv insn extractor 2026-05-08 17:58:07 +02:00
Willi Ballenthin c8d47085ee fix: remove unused imports from cache-ruleset.py, detect-binexport2-capabilities.py, show-capabilities-by-function.py
Removes capa.engine, capa.helpers, capa.features, and capa.features.insn
imports that were never referenced in each script. Adds missing capa.loader
import to show-capabilities-by-function.py which was already being used.
2026-05-08 17:58:07 +02:00
Willi Ballenthin 7a8a0acaa9 fix: remove dead except ValueError clause in capa2sarif.py so JSONDecodeError is caught correctly
json.JSONDecodeError is a subclass of ValueError, so the broader except ValueError
was shadowing the more specific handler, making it unreachable. Keep only the
specific except json.JSONDecodeError handler.
2026-05-08 17:58:07 +02:00
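A minimal sketch of the shadowing described above. The function names and return strings are hypothetical stand-ins, not code from capa2sarif.py: with the broad `except ValueError` listed first, a `json.loads` failure never reaches the `json.JSONDecodeError` handler.

```python
import json


def shadowed(text: str) -> str:
    # Pre-fix ordering (hypothetical): the broad clause comes first.
    try:
        json.loads(text)
    except ValueError:  # also matches json.JSONDecodeError, a subclass
        return "ValueError"
    except json.JSONDecodeError:  # never reached for json.loads failures
        return "JSONDecodeError"
    return "ok"


def fixed(text: str) -> str:
    # Post-fix ordering: only the specific handler remains.
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return "JSONDecodeError"
    return "ok"
```

Python does not reject the unreachable clause; only the runtime order of the `except` clauses decides which handler runs.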
Willi Ballenthin 7d8714098c fix: dedent bulk-process.py main() body so explicit argv is used
The entire main() body was indented inside `if argv is None:`, causing
main() to silently return None when called with an explicit argv list.

Closes SURF-90.
2026-05-08 17:58:07 +02:00
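The indentation bug can be reproduced in a few lines. This is a sketch, not bulk-process.py itself: the single `input` argument and the returned length are hypothetical placeholders for the script's real work.

```python
import argparse
import sys
from typing import List, Optional


def main_buggy(argv: Optional[List[str]] = None) -> Optional[int]:
    if argv is None:
        argv = sys.argv[1:]
        # Pre-fix shape: the real work sits under the `if`, so an
        # explicit argv skips it and the function returns None.
        parser = argparse.ArgumentParser()
        parser.add_argument("input")  # hypothetical argument
        args = parser.parse_args(args=argv)
        return len(args.input)


def main_fixed(argv: Optional[List[str]] = None) -> int:
    if argv is None:
        argv = sys.argv[1:]
    # Post-fix shape: the body runs whether or not argv was supplied.
    parser = argparse.ArgumentParser()
    parser.add_argument("input")  # hypothetical argument
    args = parser.parse_args(args=argv)
    return len(args.input)
```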
Willi Ballenthin a938c87fa4 fix: guard statistics calls in compare-backends.py against empty duration lists
When all runs for a backend fail, durations_by_backend[backend] is empty,
causing StatisticsError from statistics.quantiles (needs >= 2 points) and
statistics.mean (needs >= 1 point). Print placeholder messages instead.
2026-05-08 17:58:07 +02:00
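A sketch of the guard, assuming a helper that formats one backend's timings (the `summarize` name and output format are hypothetical). Empty lists skip both calls, and single-element lists skip `statistics.quantiles`, which rejects fewer than two points before Python 3.13.

```python
import statistics
from typing import List


def summarize(durations: List[float]) -> str:
    """Format timing stats, printing placeholders when data is too short."""
    if not durations:
        # statistics.mean([]) raises StatisticsError.
        return "mean: n/a (no successful runs)"
    mean = statistics.mean(durations)
    if len(durations) < 2:
        # statistics.quantiles needs at least 2 points before Python 3.13.
        return f"mean: {mean:.2f}s, quartiles: n/a"
    quartiles = ", ".join(f"{x:.2f}" for x in statistics.quantiles(durations, n=4))
    return f"mean: {mean:.2f}s, quartiles: {quartiles}"
```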
Willi Ballenthin 604fae3519 fix: replace zipfile with pyzipper in minimize_vmray_results.py so output archive is AES-encrypted
zipfile.ZipFile.setpassword() only affects reads; writing encrypted entries requires pyzipper with WZ_AES encryption. Add pyzipper to the scripts optional dependencies.
2026-05-08 17:58:07 +02:00
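The standard-library behaviour behind this fix can be checked directly. The sketch below (entry name and password are arbitrary) calls `setpassword()` on a writable archive and shows that the written entry is still stored unencrypted. Bit 0 of the ZIP general-purpose flag marks encryption. The AES-encrypting pyzipper path is not shown here.

```python
import io
import zipfile


def write_with_password(password: bytes) -> bytes:
    """Write one entry after calling setpassword(); return the archive bytes."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.setpassword(password)  # only consulted when *reading* encrypted entries
        zf.writestr("report.json", b"{}")
    return buf.getvalue()


def entry_is_encrypted(archive: bytes) -> bool:
    with zipfile.ZipFile(io.BytesIO(archive)) as zf:
        info = zf.infolist()[0]
        # Bit 0 of the general-purpose flag marks an encrypted entry.
        return bool(info.flag_bits & 0x1)
```

The entry can be read back without any password, which is exactly the failure the commit addresses.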
Willi Ballenthin e474e477f1 fix: assign yara_strings/yara_condition to empty string when Some has cmin=0 to prevent UnboundLocalError 2026-05-08 17:58:07 +02:00
Willi Ballenthin ae4c2ec82d fix: parenthesize s_type checks in capa2yara so kid.name guard applies to And/Or/Not uniformly
Without parentheses, Python's operator precedence caused `kid.name != "Some"`
to only guard the `Not` branch; `And` and `Or` kids named `"Some"` would
bypass the Some-handling block and enter recursive convert_rule unguarded.
2026-05-08 17:58:07 +02:00
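A reduced sketch of the precedence issue, with hypothetical `s_type`/`kid_name` string parameters standing in for the capa2yara objects. Because `and` binds tighter than `or`, the unparenthesized form only applies the name check to `"Not"`.

```python
def guard_unparenthesized(s_type: str, kid_name: str) -> bool:
    # Parses as: "And" or "Or" or ("Not" and kid_name != "Some")
    return s_type == "And" or s_type == "Or" or s_type == "Not" and kid_name != "Some"


def guard_parenthesized(s_type: str, kid_name: str) -> bool:
    # The name check now applies to all three statement types.
    return (s_type == "And" or s_type == "Or" or s_type == "Not") and kid_name != "Some"
```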
Willi Ballenthin fc7f0533d7 fix: correct operator precedence in FeatureRegexRegistryControlSetMatchIncomplete
The `or "currentcontrolset" in pat` branch triggered the lint for any
regex containing "currentcontrolset", even unrelated paths like
HKLM\Software\CurrentControlSet that don't need the system\\ fix.

Fix by requiring "system\\\\" in both branches of the condition.
2026-05-08 17:58:07 +02:00
Willi Ballenthin 861f3b8619 fix: FeatureRegexRegistryControlSetMatchIncomplete checks all Regex features
Dedent `return False` out of the `for` loop body so the method examines
every Regex feature instead of short-circuiting after the first one.
2026-05-08 17:58:07 +02:00
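A minimal sketch of the early-return bug. The substring test is a simplified, hypothetical predicate rather than the real lint condition: a `return False` inside the loop body means only the first pattern is ever examined.

```python
from typing import List


def any_match_buggy(patterns: List[str]) -> bool:
    for pat in patterns:
        if "currentcontrolset" in pat:
            return True
        return False  # bug: exits after inspecting only the first pattern


def any_match_fixed(patterns: List[str]) -> bool:
    for pat in patterns:
        if "currentcontrolset" in pat:
            return True
    return False  # reached only after every pattern was examined
```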
Willi Ballenthin bfa09f817b fix: guard MissingStaticScope and MissingDynamicScope against absent scopes dict
When rule.meta lacks a "scopes" key, rule.meta.get("scopes") returns None
and "static"/"dynamic" not in None raises TypeError, crashing lint_rule.
Add isinstance(scopes, dict) guard so both checks return False (no violation)
when scopes is absent, letting MissingScopes report the real problem.
2026-05-08 17:58:07 +02:00
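A sketch of the guard, assuming `meta` is a plain dict shaped like a rule's `meta` section (the function name is hypothetical). Without the `isinstance` check, `"static" not in None` raises `TypeError: argument of type 'NoneType' is not iterable`.

```python
from typing import Any, Dict


def missing_static_scope(meta: Dict[str, Any]) -> bool:
    scopes = meta.get("scopes")
    if not isinstance(scopes, dict):
        # No scopes mapping at all: report nothing here and leave it to a
        # separate "missing scopes" check.
        return False
    return "static" not in scopes
```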
Willi Ballenthin c5ae9be3e1 fix: MissingExampleOffset lint reads scopes.static instead of obsolete scope key
The check was reading rule.meta.get("scope") which no longer exists in the
current schema (replaced by scopes.static/scopes.dynamic), causing the lint
to never fire for function/basic-block rules missing example offsets.
2026-05-08 17:58:07 +02:00
Willi Ballenthin 4da1addfb3 fix: invert scope filter in import-to-ida.py so function-scope rules are annotated
The condition was skipping FUNCTION-scope rules instead of keeping them,
causing the script to never annotate any functions. Invert to match the
correct logic in import-to-bn.py.
2026-05-08 17:58:07 +02:00
Willi Ballenthin 74010ba03f fix: remove dead string literal in test_detect_duplicate_features
The triple-quoted string at lines 230-239 was never assigned and contained
an incomplete sentence ("The scripts"). Deleted entirely as the surrounding
code is self-explanatory.
2026-05-08 17:58:07 +02:00
Willi Ballenthin f93e342e74 fix: remove duplicate Rule.from_yaml call in test_scope_instruction_description 2026-05-08 17:58:07 +02:00
Willi Ballenthin ad538f7ac3 fix: remove unused imports from test_freeze_dynamic.py
Remove capa.helpers, capa.features.basicblock (never referenced), and
redundant bare capa.features.extractors.base_extractor (covered by the
from-import on the next line).

Closes SURF-78
2026-05-08 17:58:07 +02:00
Willi Ballenthin cb1951dd90 fix: correct test_json_meta loop to iterate list of function dicts and use correct serialized address format for matched_basic_blocks assertion 2026-05-08 17:58:07 +02:00
Willi Ballenthin f11c99d0e4 fix: remove unreachable StaticAnalysis assert in assert_meta and cover dynamic proto path
Removes the bare `assert isinstance(meta.analysis, rd.StaticAnalysis)` that
blocked dynamic ResultDocument from being validated, removes the incorrect
direct list comparison in assert_dynamic_analyis, and adds dynamic_a0000a6_rd
to test_doc_to_pb2 so the dynamic proto serialization path is exercised.
2026-05-08 17:58:07 +02:00
Willi Ballenthin 8952151c97 fix: correct self-comparison sa.max == sa.max to sa.max == sb.max in test_proto
RangeStatement.max was compared against itself, so a protobuf round-trip
that corrupted the max field would pass undetected. Fix to compare against
the protobuf counterpart sb.max.
2026-05-08 17:58:07 +02:00
Willi Ballenthin 5776bad371 fix: guard parse_node against missing "type" key to avoid TypeError crash
When "type" is absent from node_data, node_data.get(None) returns None
and "description" in None raises TypeError. Guard with early returns.

Closes SURF-73
2026-05-08 17:58:07 +02:00
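A sketch of the early-return guards. It assumes, from the commit text only, that `node_data["type"]` names another key in `node_data` whose value may carry a `"description"`. The function name and dict layout are hypothetical.

```python
from typing import Any, Dict, Optional


def node_description(node_data: Dict[str, Any]) -> Optional[str]:
    node_type = node_data.get("type")
    if node_type is None:
        return None  # early return: no "type" key
    node = node_data.get(node_type)
    if not isinstance(node, dict):
        return None  # early return: "type" names a missing or non-dict entry
    return node.get("description")
```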
Willi Ballenthin 5e3cf87f25 fix: allocate feat_dict per feature in parse_json to avoid shared-reference aliasing
Previously, a single feat_dict was allocated before the inner loop and
the same object reference was appended on every iteration, causing all
sub-match entries to be identical by the end of the loop.

Closes SURF-72
2026-05-08 17:58:07 +02:00
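The aliasing bug in miniature. The function names and the `"feature"` key are hypothetical: appending one dict object repeatedly leaves every list entry pointing at the same final state.

```python
from typing import Dict, List


def build_buggy(features: List[str]) -> List[Dict[str, str]]:
    out = []
    feat_dict: Dict[str, str] = {}  # allocated once, outside the loop
    for feat in features:
        feat_dict["feature"] = feat
        out.append(feat_dict)  # appends the same object every time
    return out


def build_fixed(features: List[str]) -> List[Dict[str, str]]:
    out = []
    for feat in features:
        feat_dict = {"feature": feat}  # fresh dict per iteration
        out.append(feat_dict)
    return out
```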
Willi Ballenthin f5383da728 fix: add missing capa.features.extractors.elf import to ghidra and ida helpers
Without this import, any ELF analysis via the Ghidra or IDA plugin raises
AttributeError: module 'capa.features.extractors' has no attribute 'elf'.
2026-05-08 17:58:07 +02:00
Willi Ballenthin 7406e75f96 fix: remove dead view_tab_rulegen assignment from CapaExplorerForm
self.view_tab_rulegen = None was assigned but never read or reassigned anywhere in the codebase. All other attributes in the same block are type-annotated declarations; this one was an actual assignment with no purpose.
2026-05-08 17:58:07 +02:00
Willi Ballenthin ce8cecc059 fix: remove dead reset_query method from CapaExplorerSearchProxyModel
The method delegated to set_query("") but had zero callers; existing
call sites in form.py already call set_query directly.
2026-05-08 17:58:07 +02:00
Willi Ballenthin 3538f8b85f fix: remove unused imports of capa.rules and capa.engine from view.py
Neither module was referenced anywhere in the file; removing them reduces
noise and eliminates unnecessary module load overhead.
2026-05-08 17:58:07 +02:00
Willi Ballenthin ba57a98194 fix: remove unused imports from capa/ida/plugin/form.py
capa.main, capa.render.json, and capa.features.extractors.ida.extractor
were imported but never referenced in the file, adding unnecessary
startup cost.
2026-05-08 17:58:07 +02:00
Willi Ballenthin 42ed5a4086 fix: remove dead trim_function_name from form.py that was never called 2026-05-08 17:58:07 +02:00
Willi Ballenthin f8429009e5 fix: replace get_file_md5/sha256 with version-aware shims in IDA helpers
Remove get_file_md5 and get_file_sha256 (which used different underlying
IDA APIs and duplicated normalization logic) and replace all call sites with
the existing retrieve_input_file_md5/sha256 shims that already handle
IDA <9 vs >=9 return-type differences consistently.
2026-05-08 17:58:07 +02:00
Willi Ballenthin 2eb2cb2e49 fix: rename dragEventEnter to dragEnterEvent in CapaExplorerRulegenEditor
Qt dispatches drag-enter events to dragEnterEvent; the misspelled method
dragEventEnter was dead code and its super() call would raise AttributeError
if ever reached.
2026-05-08 17:58:07 +02:00
Willi Ballenthin 1f6b9082a4 fix: guard against None in lessThan else-branch to prevent AttributeError when sorting empty cells 2026-05-08 17:58:07 +02:00
Willi Ballenthin 7f58fa097a fix: add explicit import capa.loader in ida plugin form.py
capa.loader.compute_layout was called without capa.loader being explicitly
imported; it worked only via capa.main's transitive import. Adding the
explicit import prevents a future AttributeError if import order changes.
2026-05-08 17:58:07 +02:00
Willi Ballenthin 412e6ad725 fix: initialize f=None before try in load_capa_function_results to prevent UnboundLocalError
If idaapi.get_func() raises before assignment, the except handler previously
referenced the unbound variable f, causing a secondary UnboundLocalError that
masked the original exception. Also handles the case where get_func() returns
None by falling back to idaapi.get_screen_ea() in the error log.
2026-05-08 17:58:07 +02:00
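A sketch of the pattern without IDA. `get_func` is passed in and `Func` is a hypothetical stand-in for `func_t`, and the `RuntimeError` simulates a later failure. Binding `f = None` before the `try` keeps the handler from raising its own `UnboundLocalError`.

```python
import logging
from typing import Callable, Optional

logger = logging.getLogger(__name__)


class Func:
    """Hypothetical stand-in for an IDA func_t, exposing only start_ea."""

    def __init__(self, start_ea: int) -> None:
        self.start_ea = start_ea


def load_results(get_func: Callable[[int], Optional[Func]], ea: int, screen_ea: int) -> int:
    """Return the address that is logged when loading fails."""
    f = None  # bound before the try, so the except block can always read it
    try:
        f = get_func(ea)  # may raise, or may return None
        raise RuntimeError("simulated failure while loading results")
    except Exception:
        # Fall back to the cursor address when no function was resolved.
        err_ea = f.start_ea if f is not None else screen_ea
        logger.error("failed to load results for 0x%x", err_ea)
        return err_ea
```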
Willi Ballenthin a18595bf89 fix: handle NOT CompoundStatement in render_capa_doc_statement_node so NOT rules render children in IDA plugin tree view
Previously, the elif for CompoundStatement+NOT was unreachable (the outer
if already matched all CompoundStatement), causing NOT statements to return
None and their children to be orphaned/dropped from the tree.
2026-05-08 17:58:07 +02:00
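A reduced sketch of the unreachable branch. `CompoundStatement` here is a hypothetical dataclass with a `kind` field, not the result-document model, and the returned strings stand in for tree nodes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CompoundStatement:
    kind: str  # "and", "or", or "not"


def render_buggy(stmt: object) -> Optional[str]:
    if isinstance(stmt, CompoundStatement):
        if stmt.kind in ("and", "or"):
            return f"{stmt.kind} node"
        # a "not" statement skips the return above and falls to `return None`
    elif isinstance(stmt, CompoundStatement) and stmt.kind == "not":
        return "not node"  # unreachable: the first branch already matched
    return None


def render_fixed(stmt: object) -> Optional[str]:
    if isinstance(stmt, CompoundStatement):
        if stmt.kind == "not":
            return "not node"  # handled inside the branch that matches it
        return f"{stmt.kind} node"
    return None
```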
Willi Ballenthin da9ccfaef3 fix: use next(iter(addrs)) to avoid mutating the feature cache in parse_features_for_tree
addrs.pop() removed the element from the cached set, so after the first
render the cache entry was empty and subsequent renders showed no address.
2026-05-08 17:58:07 +02:00
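The difference between the two calls, sketched with a hypothetical helper. `set.pop()` returns an element *and removes it*, while `next(iter(...))` only reads one. The `None` default is an addition for empty sets, not part of the commit.

```python
from typing import Optional, Set


def pick_address(addrs: Set[int]) -> Optional[int]:
    """Return an arbitrary address from the set without modifying it."""
    # addrs.pop() would also return an element, but would empty a shared
    # cache entry after the first render.
    return next(iter(addrs), None)
```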
Willi Ballenthin a6dd0faf9f fix: use integer division in get_printable_len for UTF-16 LE operands
`get_printable_len` returned a float for UTF-16 LE operands due to `/`
instead of `//`, violating the `-> int` annotation and silently
propagating a float into `_bb_has_stackstring`'s accumulator. Aligns
with the IDA extractor equivalent.

Closes SURF-58
2026-05-08 17:58:07 +02:00
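A sketch of the length computation. The printable-byte set and the UTF-16 LE check are simplified assumptions rather than capa's helpers. In Python 3, `/` always returns a float, so `4 / 2` is `2.0`, while `4 // 2` is the int `2`.

```python
import string

# Assumption: "printable" means a byte from Python's string.printable.
PRINTABLE = frozenset(string.printable.encode("ascii"))


def is_printable_utf16le(buf: bytes) -> bool:
    # Every even byte is printable ASCII and every odd byte is 0x00.
    return (
        len(buf) % 2 == 0
        and all(b in PRINTABLE for b in buf[0::2])
        and all(b == 0 for b in buf[1::2])
    )


def get_printable_len(buf: bytes) -> int:
    if all(b in PRINTABLE for b in buf):
        return len(buf)
    if is_printable_utf16le(buf):
        # Two bytes per character; `//` keeps the result an int.
        return len(buf) // 2
    return 0
```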
Willi Ballenthin 14a1d9981f fix: break thunk chain loop after resolving import to avoid duplicate API features 2026-05-08 17:58:07 +02:00
Willi Ballenthin 27d7741991 fix: pass insn instead of oper to getOperValue/getOperAddr in viv insn extractor
Four call sites in capa/features/extractors/viv/insn.py passed `oper`
(the operand object) as the first argument to getOperValue/getOperAddr,
where the API expects `insn` (the enclosing opcode). Silent today because
i386ImmOper ignores the argument, but would produce wrong values for
Amd64RipRelOper which uses op.va + op.size + self.imm.

Closes SURF-56
2026-05-08 17:58:07 +02:00
Willi Ballenthin 69a1ba862c fix: implement extract_function_loop in dnfile extractor
The stub always raised NotImplementedError and was not registered in
FUNCTION_HANDLERS, so loop detection was silently skipped for all .NET
samples. Detects backward branches (target offset < instruction offset)
as loops, matching the approach used by other extractors.
2026-05-08 17:58:07 +02:00
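The backward-branch heuristic, sketched over plain `(offset, branch_target)` pairs. The input shape is a hypothetical simplification, not the dnfile/dncil instruction API.

```python
from typing import Iterable, Optional, Tuple


def has_loop(insns: Iterable[Tuple[int, Optional[int]]]) -> bool:
    """insns yields (offset, branch_target or None) pairs for one method body."""
    # A branch whose target precedes the branch itself jumps backward,
    # which this heuristic treats as a loop.
    return any(target is not None and target < offset for offset, target in insns)
```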
Willi Ballenthin f9df8f0a5c fix: remove dead find_process function and helpers.py from cape extractor
helpers.py contained only find_process, which was never called anywhere in the codebase. Its signature used dict-style field access while the rest of the cape extractor migrated to Pydantic models, so calling it today would raise a TypeError.
2026-05-08 17:58:07 +02:00
Willi Ballenthin 74785d74fb fix: remove dead interface_extract_* stub functions from viv extractors
Three stub functions (interface_extract_basic_block_XXX, interface_extract_function_XXX,
interface_extract_instruction_XXX) each raised NotImplementedError unconditionally and
were never called or referenced anywhere in the codebase. They were leftover interface
documentation from an earlier design; the handler-list type annotations already document
the expected signatures.
2026-05-08 17:58:07 +02:00
Willi Ballenthin 9f11491133 fix: remove unused import of capa.features.extractors.strings from binexport2 intel insn.py
The module was imported on line 18 but never referenced anywhere in the file.
It has no side effects, so the import was purely dead code adding noise.
2026-05-08 17:58:07 +02:00
Willi Ballenthin d32492d208 fix: remove extract_file_format from FILE_HANDLERS in five extractors
Five extractors (ghidra, dnfile, viv, binja, ida) stored Format in
global_features during __init__ and also included extract_file_format
in FILE_HANDLERS. This caused find_file_capabilities to emit the Format
feature twice, inflating feature counts. Removing extract_file_format
from FILE_HANDLERS in all five extractors ensures Format is emitted
once via global_features only.
2026-05-08 17:58:07 +02:00
Willi Ballenthin e2c8ab4bff fix: replace assert with guard for 2-operand ARM ADD/SUB instructions
ARM Thumb-2 has legal 2-operand forms like `add sp, #0x10` and `add r0, #1`.
The previous code asserted exactly 3 operands before checking if operand[1]
was a stack register, causing AssertionError on any 2-operand encoding.
The fix converts the assert to a guard condition so 2-operand instructions
fall through to the for-loop and are processed normally.
2026-05-08 17:58:07 +02:00
Willi Ballenthin 723ee16ef7 fix: omit trailing ' -> ' suffix in syscall call names when no return value
DrakvufExtractor.get_call_name always appended ' -> ' even for SystemCall
objects that have no return_value attribute, because f" -> {''}" still
produces the literal ' -> ' string. Conditionally build the suffix only
when a non-empty return value exists.
2026-05-08 17:58:07 +02:00
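A sketch of the conditional suffix. The `name(args)` layout and the function name are hypothetical, not DrakvufExtractor's exact format: the point is that the ` -> ` separator is only added when there is something to put after it.

```python
from typing import Optional


def call_name(name: str, args: str, return_value: Optional[str] = None) -> str:
    # f" -> {''}" still renders as " -> ", so add the suffix only when a
    # non-empty return value exists.
    suffix = f" -> {return_value}" if return_value else ""
    return f"{name}({args}){suffix}"
```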
Willi Ballenthin 52e8fdfc92 fix: use AbsoluteVirtualAddress for string addresses in Ghidra and IDA file extractors
block.getStart().getOffset() and seg.start_ea both return virtual addresses,
not file offsets. Wrapping them in FileOffsetAddress was semantically wrong for
PE/ELF binaries where VA != file offset. Switch to AbsoluteVirtualAddress to
match what the value actually represents.
2026-05-08 17:58:07 +02:00
Willi Ballenthin b348867e55 fix: use .value.value for LLIL_CONST call destinations in binja insn.py
In extract_function_calls_from, the LLIL_CONST branches passed a RegisterValue
object to AbsoluteVirtualAddress instead of an int. Change dest.value to
dest.value.value and indirect_src.value to indirect_src.value.value, matching
the pattern used everywhere else in the file for LLIL_CONST and LLIL_CONST_PTR.
2026-05-08 17:58:07 +02:00
Willi Ballenthin 1bf6ae9149 fix: remove duplicate getPrevLocation call and dead loc variable
get_previous_instructions called vw.getPrevLocation twice with identical
arguments; the first result was assigned to loc and used only as a None
gate, but the inner guard already covered that case. Collapsed the two
nested guards into one, removing the redundant call and dead variable.
2026-05-08 17:58:07 +02:00
Willi Ballenthin 56fcdd32ed fix: unpack getByteDef offset to correctly check ENDBRANCH at target address
getByteDef returns (offset, segment_bytes); the old code indexed [1] to get
segment_bytes and called startswith() on the whole buffer, which checked whether
the segment itself begins with ENDBRANCH rather than the target address.
Unpacking both values and slicing _buf[_offset:] fixes the check.
2026-05-08 17:58:07 +02:00
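A sketch of the corrected check. It takes the `(offset, segment_bytes)` shape from the commit text and models unmapped addresses as `None`. The function name is hypothetical, and the constants are the ENDBR64 (`F3 0F 1E FA`) and ENDBR32 (`F3 0F 1E FB`) encodings.

```python
from typing import Optional, Tuple

ENDBR64 = b"\xf3\x0f\x1e\xfa"
ENDBR32 = b"\xf3\x0f\x1e\xfb"


def starts_with_endbranch(bytedef: Optional[Tuple[int, bytes]]) -> bool:
    """bytedef mimics (offset_into_segment, segment_bytes), or None if unmapped."""
    if bytedef is None:
        return False  # unmapped address: nothing to check
    _offset, _buf = bytedef
    # Check the bytes at the target address, not at the start of the segment.
    return _buf[_offset:].startswith((ENDBR64, ENDBR32))
```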