Compare commits

...

151 Commits

Author SHA1 Message Date
Willi Ballenthin
1f0cdc3c28 changelog 2026-03-20 10:03:47 +00:00
Willi Ballenthin
9f7264fa51 cache: support *BSD
closes #2930 

thanks @res2500
2026-03-20 10:35:51 +01:00
blenbot
6579e01d15 fix(freeze): use get_base_address in dumps_dynamic
Signed-off-by: blenbot <harshitiszz23@gmail.com>
2026-03-18 19:23:06 +01:00
blenbot
7090aa9c37 fix(range): correct unbounded max sentinel precedence
Signed-off-by: blenbot <harshitiszz23@gmail.com>
2026-03-18 12:47:31 +01:00
EdoardoAllegrini
30d9989538 fix: resolve mypy type error in dumps_dynamic by using actual field name 2026-03-17 07:22:05 +01:00
dependabot[bot]
7b23834d8e build(deps-dev): bump black from 25.12.0 to 26.3.0 (#2902)
* build(deps-dev): bump black from 25.12.0 to 26.3.0

Bumps [black](https://github.com/psf/black) from 25.12.0 to 26.3.0.
- [Release notes](https://github.com/psf/black/releases)
- [Changelog](https://github.com/psf/black/blob/main/CHANGES.md)
- [Commits](https://github.com/psf/black/compare/25.12.0...26.3.0)

---
updated-dependencies:
- dependency-name: black
  dependency-version: 26.3.0
  dependency-type: direct:development
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>

* style: auto-format with black and isort

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Moritz <mr-tz@users.noreply.github.com>
Co-authored-by: Capa Bot <capa-dev@mandiant.com>
2026-03-13 15:46:13 +01:00
Capa Bot
f1800b5eb4 Sync capa rules submodule 2026-03-12 17:41:51 +00:00
Capa Bot
43f556caf9 Sync capa rules submodule 2026-03-12 17:08:39 +00:00
Capa Bot
5f8c06c650 Sync capa rules submodule 2026-03-12 17:04:53 +00:00
Devyansh Somvanshi
ceaa3b6d03 webui: include feature type in global search (match, regex, api, …) (#2906)
* webui: include feature type in global search (match, regex, etc.)

Searching for "match" or "regex" in the capa Explorer web UI produced
no results because PrimeVue's globalFilterFields only included the
name field, while the feature kind (e.g. "match", "regex", "api") is
stored in the separate typeValue field.

Add 'typeValue' to globalFilterFields so that the global search box
matches nodes by both their value (name) and their kind (typeValue).
No change to rendering or data structure; only the set of fields
consulted during filtering is widened.

Fixes #2349.

* changelog: add entry for #2349 webui global search fix
2026-03-12 10:43:49 -06:00
Devyansh Somvanshi
c03d833a84 rules: handle empty or invalid YAML documents in Rule.from_yaml (#2903)
* rules: handle empty or invalid YAML documents in Rule.from_yaml

Empty or whitespace-only .yml files caused a cryptic TypeError in
Rule.from_dict (NoneType not subscriptable) when yaml.load returned None.
This made lint.py abort with a stack trace instead of a clear message.

Add an early guard in Rule.from_yaml that raises InvalidRule with a
descriptive message when the parsed document is None or structurally
invalid.  get_rules() now logs a warning and skips such files so that
scripts/lint.py completes cleanly even when placeholder .yml files
exist in the rules/ or rules/nursery/ directories.

Fixes #2900.

* changelog: add entry for #2900 empty YAML handling

* rules: fix exception check and add get_rules skip test

- Use e.args[0] instead of str(e) to check the error message.
  InvalidRule.__str__ prepends "invalid rule: " so str(e) never
  matched the bare message, causing every InvalidRule to be re-raised.
- Add test_get_rules_skips_empty_yaml to cover the get_rules skip path,
  confirming that an empty file is warned-and-skipped while a valid
  sibling rule is still loaded.

* fix: correct isort import ordering in tests/test_rules.py

Move capa.engine import before capa.rules.cache to satisfy
isort --length-sort ordering.
2026-03-10 15:04:11 -06:00
Devyansh Somvanshi
1f4a16cbcc loader: skip PE files with unrealistically large section virtual sizes (#2905)
* loader: skip PE files with unrealistically large section virtual sizes

Some malformed PE samples declare section virtual sizes orders of
magnitude larger than the file itself (e.g. a ~400 KB file with a
900 MB section).  vivisect attempts to map these regions, causing
unbounded CPU and memory consumption (see #1989).

Add _is_probably_corrupt_pe() which uses pefile (fast_load=True) to
check whether any section's Misc_VirtualSize exceeds
max(file_size * 128, 512 MB).  If the check fires, get_workspace()
raises CorruptFile before vivisect is invoked, keeping the existing
exception handling path consistent.

Thresholds are intentionally conservative to avoid false positives on
large but legitimate binaries.  When pefile is unavailable the helper
returns False and behaviour is unchanged.

Fixes #1989.

* changelog: add entry for #1989 corrupt PE large sections

* loader: apply Gemini review improvements

- Extend corrupt-PE check to FORMAT_AUTO so malformed PE files
  cannot bypass the guard when format is auto-detected (the helper
  returns False for non-PE files so there is no false-positive risk).
- Replace magic literals 128 and 512*1024*1024 with named constants
  _VSIZE_FILE_RATIO and _MAX_REASONABLE_VSIZE for clarity.
- Remove redundant int() cast around getattr(Misc_VirtualSize); keep
  the `or 0` guard for corrupt files where pefile may return None.
- Extend test to cover FORMAT_AUTO path alongside FORMAT_PE.

* tests: remove mock-only corrupt PE test per maintainer request

williballenthin noted the test doesn't add real value since it only
exercises the mock, not the actual heuristic. Removing it per feedback.

* fix: resolve flake8 NIC002 implicit string concat and add missing test

Fix the implicit string concatenation across multiple lines that caused
code_style CI to fail. Also add the test_corrupt_pe_with_unrealistic_section_size_short_circuits
test that was described in the PR body but not committed.
2026-03-10 15:03:35 -06:00
Devyansh Somvanshi
2c9e30c3e1 perf: eliminate O(n²) tuple growth and reduce per-match overhead (#2890)
* perf: eliminate O(n²) tuple growth and reduce per-match overhead

Four data-driven performance improvements identified by profiling
the hot paths in capa's rule-matching and capability-finding pipeline:

1. find_static_capabilities / find_dynamic_capabilities (O(n²) → O(n))
   Tuple concatenation with `t += (item,)` copies the entire tuple on
   every iteration. For a binary with N functions this allocates O(N²)
   total objects. Replace with list accumulation and a single
   `tuple(list)` conversion at the end.

2. RuleSet._match: pre-compute rule_index_by_rule_name (O(n) → O(1))
   `_match` is called once per instruction / basic-block / function scope
   (potentially millions of times). Previously it rebuilt the name→index
   dict on every call. The dict is now computed once in `__init__` and
   stored as `_rule_index_by_scope`, reducing each call to a dict lookup.

3. RuleSet._match: candidate_rules.pop(0) → deque.popleft() (O(n) → O(1))
   `list.pop(0)` is O(n) because it shifts every remaining element.
   Switch to `collections.deque` for O(1) left-side consumption.

4. RuleSet._extract_subscope_rules: list.pop(0) → deque.popleft() (O(n²) → O(n))
   Same issue: BFS over rules used list.pop(0), making the whole loop
   quadratic. Changed to a deque queue for linear-time processing.

Fixes #2880

* perf: use sorted merge instead of full re-sort for new rule candidates

When a rule matches and introduces new dependent candidates into
_match's work queue, the previous approach converted the deque to a
list, extended it with the new items, and re-sorted the whole
collection — O((k+m) log(k+m)).

Because the existing deque is already topologically sorted, we only
need to sort the new additions — O(m log m) — and then merge the two
sorted sequences in O(k+m) using heapq.merge.

Also adds a CHANGELOG entry for the performance improvements in #2890.

* perf: simplify candidate_rules to LIFO list, revert heapq.merge

Address reviewer feedback:
- Replace deque+popleft with list+pop (LIFO stack) in _extract_subscope_rules;
  processing order doesn't affect correctness, and list.pop() is O(1).
- Replace deque+popleft with list+pop (LIFO stack) in _match; sort candidate
  rules descending so pop() from the end yields the topologically-first rule.
- Revert heapq.merge back to the simpler extend+re-sort pattern; the added
  complexity wasn't justified given the typically small candidate set.
- Remove now-unused `import heapq`.
2026-03-10 14:21:48 -06:00
Devyansh Somvanshi
8c138e3d22 loader: handle struct.error from dnfile and raise CorruptFile with a clear message (#2872)
* loader: handle struct.error from dnfile and show clear CorruptFile message

When .NET metadata is truncated or invalid, dnfile can raise struct.error.
Catch it in DnfileFeatureExtractor and DotnetFileFeatureExtractor and
re-raise as CorruptFile with a user-friendly message.

Fixes #2442

* dnfile: centralize dnPE() loading in load_dotnet_image helper

Add load_dotnet_image() to dnfile/helpers.py that calls dnfile.dnPE()
and catches struct.error, raising CorruptFile with the original error
message included (f"Invalid or truncated .NET metadata: {e}").

Both DnfileFeatureExtractor and DotnetFileFeatureExtractor now call the
helper instead of duplicating the try/except block, and their direct
import of struct is removed.

Addresses review feedback on #2872.

* style: reformat dnfile files with black --line-length=120

Fixes CI code_style failure by applying the project's configured
line length (120) instead of black's default (88).

---------

Co-authored-by: Moritz <mr-tz@users.noreply.github.com>
Co-authored-by: Mike Hunhoff <mike.hunhoff@gmail.com>
2026-03-10 14:20:48 -06:00
Moritz
a11a03bc30 build(deps): bump minimatch and editorconfig in /web/explorer (#2892)
Bumps [minimatch](https://github.com/isaacs/minimatch) and [editorconfig](https://github.com/editorconfig/editorconfig-core-js). These dependencies needed to be updated together.

Updates `minimatch` from 3.1.2 to 3.1.5
- [Changelog](https://github.com/isaacs/minimatch/blob/main/changelog.md)
- [Commits](https://github.com/isaacs/minimatch/compare/v3.1.2...v3.1.5)

Updates `minimatch` from 9.0.5 to 9.0.9
- [Changelog](https://github.com/isaacs/minimatch/blob/main/changelog.md)
- [Commits](https://github.com/isaacs/minimatch/compare/v3.1.2...v3.1.5)

Updates `editorconfig` from 1.0.4 to 1.0.7
- [Release notes](https://github.com/editorconfig/editorconfig-core-js/releases)
- [Changelog](https://github.com/editorconfig/editorconfig-core-js/blob/main/CHANGELOG.md)
- [Commits](https://github.com/editorconfig/editorconfig-core-js/compare/v1.0.4...v1.0.7)

---
updated-dependencies:
- dependency-name: minimatch
  dependency-version: 3.1.5
  dependency-type: indirect
- dependency-name: minimatch
  dependency-version: 9.0.9
  dependency-type: indirect
- dependency-name: editorconfig
  dependency-version: 1.0.7
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-03-10 12:14:56 +01:00
dependabot[bot]
1173dc5fa5 build(deps): bump protobuf from 6.33.5 to 7.34.0 (#2891)
Bumps [protobuf](https://github.com/protocolbuffers/protobuf) from 6.33.5 to 7.34.0.
- [Release notes](https://github.com/protocolbuffers/protobuf/releases)
- [Commits](https://github.com/protocolbuffers/protobuf/commits)

---
updated-dependencies:
- dependency-name: protobuf
  dependency-version: 7.34.0
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Mike Hunhoff <mike.hunhoff@gmail.com>
2026-03-05 15:23:16 +01:00
Priyank Patel
e53f6abc1e ci: add black auto-format workflow (#2827) (#2883)
* ci: add black auto-format workflow (#2827)

Signed-off-by: priyank <priyank8445@gmail.com>

* ci: use pre-commit to run black and isort (#2827)

* ci: fix install dependencies to include dev extras

---------

Signed-off-by: priyank <priyank8445@gmail.com>
2026-03-05 12:29:33 +01:00
Aditya Pandey
038c46da16 features: fix Regex.get_value_str() returning escaped pattern, breaking capa2yara #1909 (#2886)
Co-authored-by: Moritz <mr-tz@users.noreply.github.com>
2026-03-05 12:14:27 +01:00
Capa Bot
563071349f Sync capa rules submodule 2026-03-04 20:30:04 +00:00
Capa Bot
517dfe154a Sync capa rules submodule 2026-03-03 07:48:23 +00:00
Devyansh Somvanshi
2e36f67e11 doc: add table comparing ways to consume capa output (#2874)
* doc: add table comparing ways to consume capa output

Add a short table to usage.md for CLI, IDA, Ghidra, CAPE, and web.

Fixes #2273

* doc: add links to each option in the ways-to-consume table

Addresses reviewer feedback to provide a link to learn more for each
consumption method (IDA Pro, Ghidra, CAPE, Web/capa Explorer).

Refs #2273

* doc: add Binary Ninja to ways-to-consume table

Fixes #2273
2026-03-02 10:17:53 -07:00
dependabot[bot]
7bd04fe297 build(deps): bump minimatch and editorconfig in /web/explorer
Bumps [minimatch](https://github.com/isaacs/minimatch) and [editorconfig](https://github.com/editorconfig/editorconfig-core-js). These dependencies needed to be updated together.

Updates `minimatch` from 3.1.2 to 3.1.5
- [Changelog](https://github.com/isaacs/minimatch/blob/main/changelog.md)
- [Commits](https://github.com/isaacs/minimatch/compare/v3.1.2...v3.1.5)

Updates `minimatch` from 9.0.5 to 9.0.9
- [Changelog](https://github.com/isaacs/minimatch/blob/main/changelog.md)
- [Commits](https://github.com/isaacs/minimatch/compare/v3.1.2...v3.1.5)

Updates `editorconfig` from 1.0.4 to 1.0.7
- [Release notes](https://github.com/editorconfig/editorconfig-core-js/releases)
- [Changelog](https://github.com/editorconfig/editorconfig-core-js/blob/main/CHANGELOG.md)
- [Commits](https://github.com/editorconfig/editorconfig-core-js/compare/v1.0.4...v1.0.7)

---
updated-dependencies:
- dependency-name: minimatch
  dependency-version: 3.1.5
  dependency-type: indirect
- dependency-name: minimatch
  dependency-version: 9.0.9
  dependency-type: indirect
- dependency-name: editorconfig
  dependency-version: 1.0.7
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
2026-03-02 16:08:23 +00:00
dependabot[bot]
9f781ec21b build(deps): bump rollup from 4.36.0 to 4.59.0 in /web/explorer (#2885)
Bumps [rollup](https://github.com/rollup/rollup) from 4.36.0 to 4.59.0.
- [Release notes](https://github.com/rollup/rollup/releases)
- [Changelog](https://github.com/rollup/rollup/blob/master/CHANGELOG.md)
- [Commits](https://github.com/rollup/rollup/compare/v4.36.0...v4.59.0)

---
updated-dependencies:
- dependency-name: rollup
  dependency-version: 4.59.0
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Moritz <mr-tz@users.noreply.github.com>
2026-03-02 17:07:20 +01:00
kamran ul haq
da1abed3f8 ci: pin pip-audit action SHAs and update to v1.1.0 (#2884)
* Fix conftest imports to use relative imports

After adding tests/__init__.py, conftest.py needs relative imports

* ci: pin pip-audit action SHAs and update to v1.1.0

* revert: remove unrelated conftest.py changes
2026-02-27 17:36:28 +01:00
Capa Bot
3bce2a9b62 Sync capa rules submodule 2026-02-26 16:44:29 +00:00
Devyansh Somvanshi
d97b61551d webui: show error when JSON does not follow expected result document schema (#2871)
* webui: show error when JSON does not follow expected schema

Validate result document has required fields (meta, meta.version,
meta.analysis, meta.analysis.layout, rules) after parse. Show
user-friendly error; for URL loads suggest reanalyzing (e.g. VT).

Fixes #2363

* webui: fix array validation bug and deduplicate VT suggestion string

- introduce isInvalidObject() helper (checks !v || typeof !== "object" || Array.isArray)
  so that arrays are correctly rejected in schema validation
- extract VT_REANALYZE_SUGGESTION constant to eliminate the duplicated string
  in loadRdoc()

Addresses review feedback on #2871

* webui: address review - validate feature_counts, hoist VT_REANALYZE_SUGGESTION

- Add validation for meta.analysis.feature_counts in validateRdocSchema()
  so parseFunctionCapabilities and other consumers do not hit missing/invalid
  feature_counts at runtime.
- Require feature_counts to have either 'functions' or 'processes' array
  (static vs dynamic result documents).
- Move VT_REANALYZE_SUGGESTION to module top level to avoid redefining
  on every loadRdoc call.

* webui: allow file-scoped-only result documents in schema validation

- Validation: allow feature_counts without functions/processes arrays; if
  present they must be arrays.
- rdocParser: default feature_counts.functions to [] when missing so
  file-scoped-only docs do not throw.

* webui: remove leading space from VT_REANALYZE_SUGGESTION constant

Per review feedback: the concatenation at call sites handles spacing,
so the constant should not carry a leading space.
2026-02-26 09:35:24 -07:00
Devyansh Somvanshi
e1ffa1dd09 ida-explorer: fix TypeError when sorting locations with mixed address types (#2867)
* ida-explorer: fix TypeError when sorting mixed address types

When a feature has multiple locations and those locations contain a mix
of integer-based addresses (e.g. AbsoluteVirtualAddress) and non-integer
addresses (e.g. _NoAddress), calling sorted() raises a TypeError because
Python falls back to the reflected comparison (__gt__) which is not
defined on _NoAddress.

Add a sort key to sorted() that places integer-based addresses first
(sorted by value) and non-integer addresses last, avoiding the
cross-type comparison.

Fixes #2195

* ida-explorer: fix comparison at source so sorted(locations) works everywhere

Implement the gt solution per review: fix comparison for all addresses
so we can use sorted(locations) / sorted(addrs) consistently without
per-call-site sort keys.

- Add _NoAddress.__gt__ so mixed-type comparison works: (real_address <
  NO_ADDRESS) invokes it and NoAddress sorts last. Avoids TypeError
  when sorting AbsoluteVirtualAddress with _NoAddress.
- In ida/plugin/model.py, use sorted(locations) instead of a custom
  key. view.py (lines 1054, 1077) already use sorted(); they now work
  with mixed address types without change.

Fixes #2195

* changelog: move address sort fix to Bug Fixes section

Per maintainer feedback: fix applies beyond ida-explorer.
2026-02-26 09:33:05 -07:00
kamran ul haq
10dfd287b4 ci: cache vivisect workspaces between CI runs to speed up tests (#2881) 2026-02-25 23:15:13 +01:00
Devyansh Somvanshi
e9b3311338 binja: add mypy config for top-level binaryninja module (#2877)
Allow mypy to skip missing 'binaryninja' (not just binaryninja.*)
when the Binary Ninja API is not installed.

Fixes #2399
2026-02-24 08:57:46 -07:00
Moritz
54cc4ee7a3 Merge pull request #2873 from devs6186/fix/1865-vv-api-color 2026-02-23 23:01:39 +01:00
Capa Bot
12863ab4f2 Sync capa rules submodule 2026-02-23 20:52:03 +00:00
Devyansh Somvanshi
e41b5fb150 webui: fix 404 for "View rule in capa-rules" by using proper URL encoding (#2868)
* webui: fix 404 for \"View rule in capa-rules\" links

The createCapaRulesUrl function was constructing URLs by lowercasing
the rule name and replacing spaces with hyphens, which produced URLs
like /rules/packaged-as-single-file-.net-application/ (404).

The capa-rules website uses the original rule name with URL encoding
(e.g. /rules/packaged%20as%20single-file%20.NET%20application/).

Use encodeURIComponent() on the rule name to produce correct URLs.

Fixes #2482

* refactor: extract baseUrl constant in createCapaRulesUrl per code review
2026-02-23 13:10:31 -07:00
dependabot[bot]
4697902310 build(deps-dev): bump isort from 7.0.0 to 8.0.0 (#2879)
Bumps [isort](https://github.com/PyCQA/isort) from 7.0.0 to 8.0.0.
- [Release notes](https://github.com/PyCQA/isort/releases)
- [Changelog](https://github.com/PyCQA/isort/blob/main/CHANGELOG.md)
- [Commits](https://github.com/PyCQA/isort/compare/7.0.0...8.0.0)

---
updated-dependencies:
- dependency-name: isort
  dependency-version: 8.0.0
  dependency-type: direct:development
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-02-23 12:50:28 -07:00
Capa Bot
ed0783c31e Sync capa rules submodule 2026-02-23 16:33:25 +00:00
devs6186
f03ee75d69 doc: document that default output shows top-level matches only; -v/-vv show nested matches (#2875) 2026-02-22 07:41:15 +01:00
devs6186
f526357def main: suggest --os flag when ELF OS detection fails (#2869)
* main: suggest --os flag when OS detection fails for ELF files

When capa cannot detect the target OS of an ELF file, it exits with an
error. Some ELF files lack the standard metadata capa uses for OS
detection (GNU ABI tag, OSABI field, library dependencies, etc.) even
though they do target a valid OS (e.g. a stripped Linux binary using
only raw syscalls).

Add a hint to the unsupported-OS error message telling users they can
specify the OS explicitly with the --os flag, matching the workaround
recommended in the issue.

Fixes #2577
2026-02-20 14:28:43 +01:00
Moritz
c1ec826a9f Merge pull request #2866 from devs6186/fix/2699-rich-markup-escape
render: escape sample-controlled strings to prevent rich MarkupError
2026-02-20 14:06:45 +01:00
devs6186
5ef4ad96ee doc: fix typo and add documentation links in README
- usage.md: fix 'occurance' -> 'occurrence'
- README: add short doc links (usage, installation, limitations, FAQ)

Fixes #2274
2026-02-20 11:15:01 +01:00
Capa Bot
8aef630a7f Sync capa rules submodule 2026-02-19 20:33:40 +00:00
Moritz
d1c9d20668 Merge pull request #2865 from mandiant/lsc-1771433500.551532
Refactor Github Action per b/485167538
2026-02-19 21:32:29 +01:00
devs6186
8ccd35d0cf render: use default styling for dynamic -vv API/call details
Stop wrapping call name and arguments in mute (dim) so API details
remain readable in -vv output.

Fixes #1865
2026-02-20 00:52:14 +05:30
devs6186
3f72b43f48 render: escape sample-controlled strings to prevent rich MarkupError
Strings extracted from analyzed samples may contain bracket characters
that Rich interprets as markup (e.g. [/tag]). When these are embedded
directly in markup templates like f"[dim]{s}", Rich raises a
MarkupError if the brackets form an invalid tag.

Use rich.markup.escape() to sanitize all user-controlled strings before
embedding them in Rich markup templates in bold(), bold2(), mute(), and
warn().

Fixes #2699
2026-02-19 03:42:05 +05:30
Ben Knutson
f7bb889f30 Refactor Github Action per b/485167538 2026-02-18 16:51:42 +00:00
Capa Bot
e0bd6d5ea6 Sync capa rules submodule 2026-02-17 21:19:08 +00:00
Capa Bot
239bafd285 Sync capa-testfiles submodule 2026-02-17 21:10:09 +00:00
dependabot[bot]
2033c4ab83 build(deps-dev): bump pyinstaller from 6.18.0 to 6.19.0 (#2856)
Bumps [pyinstaller](https://github.com/pyinstaller/pyinstaller) from 6.18.0 to 6.19.0.
- [Release notes](https://github.com/pyinstaller/pyinstaller/releases)
- [Changelog](https://github.com/pyinstaller/pyinstaller/blob/develop/doc/CHANGES.rst)
- [Commits](https://github.com/pyinstaller/pyinstaller/compare/v6.18.0...v6.19.0)

---
updated-dependencies:
- dependency-name: pyinstaller
  dependency-version: 6.19.0
  dependency-type: direct:development
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-02-17 13:40:23 -07:00
dependabot[bot]
cbe005ae0f bump ruff from 0.14.7 to 0.15.0 (#2853)
---
updated-dependencies:
- dependency-name: ruff
  dependency-version: 0.15.0
  dependency-type: direct:development
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-02-09 13:55:24 -07:00
kamran ul haq
26aba8067f loader: handle SegmentationViolation for malformed ELF files (#2799)
Catch envi.exc.SegmentationViolation raised by vivisect when processing
malformed ELF files with invalid relocations and convert it to a
CorruptFile exception with a descriptive message.

Closes #2794

Co-authored-by: Mike Hunhoff <mike.hunhoff@gmail.com>
2026-02-05 12:24:48 -07:00
Aditya Pandey
3582bce6fd vmray: skip processes with invalid PID or missing filename (#2807) (#2845)
Co-authored-by: Mike Hunhoff <mike.hunhoff@gmail.com>
2026-02-05 12:11:26 -07:00
dependabot[bot]
535faf281d build(deps): bump protobuf from 6.33.1 to 6.33.5 (#2851)
Bumps [protobuf](https://github.com/protocolbuffers/protobuf) from 6.33.1 to 6.33.5.
- [Release notes](https://github.com/protocolbuffers/protobuf/releases)
- [Commits](https://github.com/protocolbuffers/protobuf/commits)

---
updated-dependencies:
- dependency-name: protobuf
  dependency-version: 6.33.5
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Mike Hunhoff <mike.hunhoff@gmail.com>
2026-02-05 10:55:26 -07:00
dependabot[bot]
fe27335136 build(deps): bump pip from 25.3 to 26.0 (#2847)
Bumps [pip](https://github.com/pypa/pip) from 25.3 to 26.0.
- [Changelog](https://github.com/pypa/pip/blob/main/NEWS.rst)
- [Commits](https://github.com/pypa/pip/compare/25.3...26.0)

---
updated-dependencies:
- dependency-name: pip
  dependency-version: '26.0'
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Mike Hunhoff <mike.hunhoff@gmail.com>
2026-02-05 10:53:55 -07:00
dependabot[bot]
a40ae162ef build(deps): bump dnfile from 0.17.0 to 0.18.0 (#2848)
Bumps [dnfile](https://github.com/malwarefrank/dnfile) from 0.17.0 to 0.18.0.
- [Changelog](https://github.com/malwarefrank/dnfile/blob/master/HISTORY.rst)
- [Commits](https://github.com/malwarefrank/dnfile/compare/v0.17.0...v0.18.0)

---
updated-dependencies:
- dependency-name: dnfile
  dependency-version: 0.18.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Mike Hunhoff <mike.hunhoff@gmail.com>
2026-02-05 10:50:00 -07:00
dependabot[bot]
1500a34984 build(deps): bump rich from 14.2.0 to 14.3.2 (#2849)
* build(deps): bump rich from 14.2.0 to 14.3.2

Bumps [rich](https://github.com/Textualize/rich) from 14.2.0 to 14.3.2.
- [Release notes](https://github.com/Textualize/rich/releases)
- [Changelog](https://github.com/Textualize/rich/blob/master/CHANGELOG.md)
- [Commits](https://github.com/Textualize/rich/compare/v14.2.0...v14.3.2)

---
updated-dependencies:
- dependency-name: rich
  dependency-version: 14.3.2
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

* add hiddenimports for rich module

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Mike Hunhoff <mike.hunhoff@gmail.com>
2026-02-05 09:31:15 -07:00
Daniel Adeboye
77440c03f5 vmray: extract number features for registry key handles (#2835)
* vmray: extract number features for whitelisted void_ptr parameters

* added changelog

* Apply suggestions from code review

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

* fix lint

* fix lint

* fix test

* remove unused import

* Add hKey parameter extraction and tests

---------

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Mike Hunhoff <mike.hunhoff@gmail.com>
2026-01-30 15:10:57 -07:00
Capa Bot
26fd6b8569 Sync capa rules submodule 2026-01-30 17:41:05 +00:00
Capa Bot
2540dd688b Sync capa rules submodule 2026-01-30 17:04:59 +00:00
Moritz
ff8e7ef52f Add AI usage checkbox (#2844)
* Add AI usage checkbox

* Apply suggestion from @gemini-code-assist[bot]

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

---------

Co-authored-by: Mike Hunhoff <mike.hunhoff@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-01-29 09:12:54 -07:00
Capa Bot
6f078734c3 Sync capa rules submodule 2026-01-28 17:43:11 +00:00
Capa Bot
93c11d2d4e Sync capa-testfiles submodule 2026-01-28 16:22:42 +00:00
Capa Bot
89c71f4d81 Sync capa rules submodule 2026-01-26 16:41:20 +00:00
dependabot[bot]
9599fbac02 build(deps): bump setuptools from 80.9.0 to 80.10.1 (#2837)
Bumps [setuptools](https://github.com/pypa/setuptools) from 80.9.0 to 80.10.1.
- [Release notes](https://github.com/pypa/setuptools/releases)
- [Changelog](https://github.com/pypa/setuptools/blob/main/NEWS.rst)
- [Commits](https://github.com/pypa/setuptools/compare/v80.9.0...v80.10.1)

---
updated-dependencies:
- dependency-name: setuptools
  dependency-version: 80.10.1
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-01-23 12:41:04 -07:00
dependabot[bot]
b4c0f1369e build(deps): bump pycparser from 2.23 to 3.0 (#2838)
Bumps [pycparser](https://github.com/eliben/pycparser) from 2.23 to 3.0.
- [Release notes](https://github.com/eliben/pycparser/releases)
- [Commits](https://github.com/eliben/pycparser/compare/release_v2.23...release_v3.00)

---
updated-dependencies:
- dependency-name: pycparser
  dependency-version: '3.0'
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-01-23 12:37:46 -07:00
Daniel Adeboye
37f2a897ff tests: remove redundant test_ida_features.py (#2834) 2026-01-23 09:46:58 -07:00
Maijin
e39e610f66 Create a vivisect group in dependabot.yml (#2830)
* Add msgpack group in dependabot.yml

Add msgpack group in dependabot.yml

* Change to make a vivisect group

Change to make a vivisect group

* Update dependabot.yml
2026-01-23 09:37:04 -07:00
Maijin
073760f279 fix(lint): disable rule caching during linting (#2817) 2026-01-22 09:27:02 -07:00
dependabot[bot]
52a761ebb3 build(deps-dev): bump lodash from 4.17.21 to 4.17.23 in /web/explorer (#2833)
Bumps [lodash](https://github.com/lodash/lodash) from 4.17.21 to 4.17.23.
- [Release notes](https://github.com/lodash/lodash/releases)
- [Commits](https://github.com/lodash/lodash/compare/4.17.21...4.17.23)

---
updated-dependencies:
- dependency-name: lodash
  dependency-version: 4.17.23
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-01-22 08:56:03 -07:00
Moritz
2a44482076 Merge pull request #2821 from mandiant/dependabot/pip/mypy-protobuf-5.0.0
build(deps-dev): bump mypy-protobuf from 4.0.0 to 5.0.0
2026-01-20 10:31:57 +01:00
Moritz
a359745765 build(deps-dev): bump pyinstaller from 6.17.0 to 6.18.0 (#2822)
Bumps [pyinstaller](https://github.com/pyinstaller/pyinstaller) from 6.17.0 to 6.18.0.
- [Release notes](https://github.com/pyinstaller/pyinstaller/releases)
- [Changelog](https://github.com/pyinstaller/pyinstaller/blob/develop/doc/CHANGES.rst)
- [Commits](https://github.com/pyinstaller/pyinstaller/compare/v6.17.0...v6.18.0)

---
updated-dependencies:
- dependency-name: pyinstaller
  dependency-version: 6.18.0
  dependency-type: direct:development
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-01-20 10:31:35 +01:00
Maijin
203cc0aa0c Merge pull request #2824 from Maijin/patch-1
Group pyasn modules and vivisect in dependabot.yml
2026-01-20 10:18:35 +01:00
Moritz
3642ca94a6 Merge pull request #2820 from mandiant/dependabot/pip/vivisect-1.3.0
build(deps): bump vivisect from 1.2.1 to 1.3.0
2026-01-19 20:57:00 +01:00
dependabot[bot]
8e233ca69d build(deps-dev): bump pyinstaller from 6.17.0 to 6.18.0
Bumps [pyinstaller](https://github.com/pyinstaller/pyinstaller) from 6.17.0 to 6.18.0.
- [Release notes](https://github.com/pyinstaller/pyinstaller/releases)
- [Changelog](https://github.com/pyinstaller/pyinstaller/blob/develop/doc/CHANGES.rst)
- [Commits](https://github.com/pyinstaller/pyinstaller/compare/v6.17.0...v6.18.0)

---
updated-dependencies:
- dependency-name: pyinstaller
  dependency-version: 6.18.0
  dependency-type: direct:development
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2026-01-19 16:45:40 +00:00
dependabot[bot]
d5c23486e3 build(deps-dev): bump mypy-protobuf from 4.0.0 to 5.0.0
Bumps [mypy-protobuf](https://github.com/nipunn1313/mypy-protobuf) from 4.0.0 to 5.0.0.
- [Changelog](https://github.com/nipunn1313/mypy-protobuf/blob/main/CHANGELOG.md)
- [Commits](https://github.com/nipunn1313/mypy-protobuf/commits)

---
updated-dependencies:
- dependency-name: mypy-protobuf
  dependency-version: 5.0.0
  dependency-type: direct:development
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
2026-01-19 16:45:32 +00:00
dependabot[bot]
7600dd077b build(deps): bump vivisect from 1.2.1 to 1.3.0
Bumps [vivisect](https://github.com/vivisect/vivisect) from 1.2.1 to 1.3.0.
- [Changelog](https://github.com/vivisect/vivisect/blob/master/CHANGELOG.rst)
- [Commits](https://github.com/vivisect/vivisect/compare/v1.2.1...v1.3.0)

---
updated-dependencies:
- dependency-name: vivisect
  dependency-version: 1.3.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2026-01-19 16:45:26 +00:00
Moritz
3de84eff1b Merge pull request #2813 from doomedraven/patch-1
Add '2.5-CAPE' to tested versions
2026-01-16 20:28:39 +01:00
doomedraven
7e16ed741c Add '2.5-CAPE' to tested versions
hello, we just released CAPE v2.5, there are no behavior/structural changes. Is focused on webgui improvements, and some other improvements that doesnt impact CAPA.
2026-01-16 14:58:48 +00:00
Mike Hunhoff
5a5545aa14 ghidra: fix unit tests (#2812)
* ghidra: fix unit tests

* fix formatting
2026-01-15 12:34:43 -07:00
Moritz
6ad4fbbb9b Merge pull request #2742 from mandiant/idalib-tests 2026-01-13 21:48:30 +01:00
dependabot[bot]
8105214dc6 build(deps-dev): bump build from 1.3.0 to 1.4.0 (#2809)
Bumps [build](https://github.com/pypa/build) from 1.3.0 to 1.4.0.
- [Release notes](https://github.com/pypa/build/releases)
- [Changelog](https://github.com/pypa/build/blob/main/CHANGELOG.rst)
- [Commits](https://github.com/pypa/build/compare/1.3.0...1.4.0)

---
updated-dependencies:
- dependency-name: build
  dependency-version: 1.4.0
  dependency-type: direct:development
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-01-13 09:08:58 -07:00
Willi Ballenthin
d1fc8446f6 pyproject: ida: silence SWIG related warnings from IDA bindings 2026-01-13 16:15:31 +01:00
Willi Ballenthin
0686305f43 ida: loader: load resource sections to help discovery of embedded files 2026-01-13 16:15:31 +01:00
Willi Ballenthin
8d6b878e79 ida: fix return value from open_database 2026-01-13 16:15:31 +01:00
Willi Ballenthin
3646fcefa2 ida: helpers: refactor discovery of alternative names 2026-01-13 16:15:31 +01:00
Willi Ballenthin
ce67d99e49 ida: skip function-name features for default names (sub_*) 2026-01-13 16:15:31 +01:00
Willi Ballenthin
c89871f257 ci: pin setup-uv 2026-01-13 16:15:31 +01:00
Willi Ballenthin
03cc901f7b tests: idalib: xfail resource test on 9.0 2026-01-13 16:15:31 +01:00
Willi Ballenthin
412ab62c42 ida: pep8 2026-01-13 16:15:31 +01:00
Willi Ballenthin
f72bd49a5f ci: enable testing of IDA 9.0 2026-01-13 16:15:31 +01:00
Willi Ballenthin
1d561bd038 tests: idalib: xfail two tests on 9.0 and 9.1 2026-01-13 16:15:31 +01:00
Willi Ballenthin
c5808c4c41 tests: idalib: use 9.1 instead of 9.0 as min ver
9.0 doesn't support disabling lumina (or loading resources, for that
matter, too)
2026-01-13 16:15:31 +01:00
Willi Ballenthin
200c8037dd tests: fix logging message 2026-01-13 16:15:31 +01:00
mr-tz
4fb6ac0d1b add ida version to test matrix name 2026-01-13 16:15:31 +01:00
mr-tz
87fb96d08b load resource for test sample 2026-01-13 16:15:31 +01:00
Willi Ballenthin
e1fd184805 ida: function: extract function name
somehow we were extracting alternate names but not function names
2026-01-13 16:15:31 +01:00
Willi Ballenthin
82be20be64 loader: idalib: disable lumina
see #2742 in which Lumina names overwrote names provided by debug info
2026-01-13 16:15:31 +01:00
Willi Ballenthin
132e64a991 tests: idalib: better detect missing idapro package 2026-01-13 16:15:31 +01:00
Willi Ballenthin
9c6db00775 ci: add configuration for idalib tests 2026-01-13 16:15:31 +01:00
Moritz
7bdd1f11bb Merge branch 'master' into idalib-tests 2026-01-13 16:15:31 +01:00
kamran ul haq
7f3e35ee62 loader: gracefully handle ELF files with unsupported architectures (#2800)
* loader: gracefully handle ELF files with unsupported architectures

When analyzing ELF files with unsupported architectures (e.g., ARM64 variant),
vivisect raises a generic Exception with message 'Unsupported Architecture: %d'.
This was not caught by existing error handlers, causing capa to crash with an
unfriendly error message.

This change adds exception handling to detect the 'Unsupported Architecture'
error message and convert it to a user-friendly CorruptFile exception,
following the same pattern as the existing 'Couldn't convert rva' handler.

The architecture number is extracted from the exception args and included
in the error message to help users understand what went wrong.

closes #2793

* loader: address review feedback for PR #2800

- Add e.args check to prevent IndexError when accessing exception arguments
- Use error_msg variable instead of directly accessing e.args[0]
- Update CHANGELOG to reference PR #2800 instead of issue #2793

Addresses feedback from @mike-hunhoff and gemini-code-assist bot

* chore: move unsupported architecture bug fix to master (unreleased) section
2026-01-09 16:20:43 -07:00
Capa Bot
80c085b08b Sync capa rules submodule 2026-01-06 17:02:03 +00:00
Capa Bot
bfd1b09176 Sync capa-testfiles submodule 2026-01-06 16:50:00 +00:00
dependabot[bot]
dc47de1439 build(deps): bump ruamel-yaml from 0.18.6 to 0.19.1 (#2803)
Bumps ruamel-yaml from 0.18.6 to 0.19.1.

---
updated-dependencies:
- dependency-name: ruamel-yaml
  dependency-version: 0.19.1
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-01-05 10:08:43 -07:00
dependabot[bot]
2f7db1f446 build(deps-dev): bump flake8-simplify from 0.22.0 to 0.30.0 (#2804)
Bumps [flake8-simplify](https://github.com/MartinThoma/flake8-simplify) from 0.22.0 to 0.30.0.
- [Release notes](https://github.com/MartinThoma/flake8-simplify/releases)
- [Changelog](https://github.com/MartinThoma/flake8-simplify/blob/main/CHANGELOG.md)
- [Commits](https://github.com/MartinThoma/flake8-simplify/commits)

---
updated-dependencies:
- dependency-name: flake8-simplify
  dependency-version: 0.30.0
  dependency-type: direct:development
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-01-05 10:08:35 -07:00
dependabot[bot]
0908343ca1 build(deps): bump intervaltree from 3.1.0 to 3.2.1 (#2805)
Bumps [intervaltree](https://github.com/chaimleib/intervaltree) from 3.1.0 to 3.2.1.
- [Release notes](https://github.com/chaimleib/intervaltree/releases)
- [Changelog](https://github.com/chaimleib/intervaltree/blob/master/CHANGELOG.md)
- [Commits](https://github.com/chaimleib/intervaltree/compare/3.1.0...3.2.1)

---
updated-dependencies:
- dependency-name: intervaltree
  dependency-version: 3.2.1
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-01-05 10:07:36 -07:00
dependabot[bot]
342cb9d15a build(deps-dev): bump psutil from 7.1.2 to 7.2.1 (#2806)
Bumps [psutil](https://github.com/giampaolo/psutil) from 7.1.2 to 7.2.1.
- [Changelog](https://github.com/giampaolo/psutil/blob/master/HISTORY.rst)
- [Commits](https://github.com/giampaolo/psutil/compare/release-7.1.2...release-7.2.1)

---
updated-dependencies:
- dependency-name: psutil
  dependency-version: 7.2.1
  dependency-type: direct:development
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-01-05 10:07:17 -07:00
Capa Bot
9aad2591c4 Sync capa rules submodule 2025-12-29 17:21:22 +00:00
dependabot[bot]
1153ca4cf7 build(deps-dev): bump types-psutil from 7.1.3.20251202 to 7.2.0.20251228 (#2801)
Bumps [types-psutil](https://github.com/typeshed-internal/stub_uploader) from 7.1.3.20251202 to 7.2.0.20251228.
- [Commits](https://github.com/typeshed-internal/stub_uploader/commits)

---
updated-dependencies:
- dependency-name: types-psutil
  dependency-version: 7.2.0.20251228
  dependency-type: direct:development
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-12-29 09:48:14 -07:00
dependabot[bot]
4500dd80b3 build(deps-dev): bump pygithub from 2.6.0 to 2.8.1 (#2798)
Bumps [pygithub](https://github.com/pygithub/pygithub) from 2.6.0 to 2.8.1.
- [Release notes](https://github.com/pygithub/pygithub/releases)
- [Changelog](https://github.com/PyGithub/PyGithub/blob/main/doc/changes.rst)
- [Commits](https://github.com/pygithub/pygithub/compare/v2.6.0...v2.8.1)

---
updated-dependencies:
- dependency-name: pygithub
  dependency-version: 2.8.1
  dependency-type: direct:development
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-12-29 09:46:44 -07:00
dependabot[bot]
a35379d32b build(deps): bump humanize from 4.14.0 to 4.15.0 (#2797)
Bumps [humanize](https://github.com/python-humanize/humanize) from 4.14.0 to 4.15.0.
- [Release notes](https://github.com/python-humanize/humanize/releases)
- [Commits](https://github.com/python-humanize/humanize/compare/4.14.0...4.15.0)

---
updated-dependencies:
- dependency-name: humanize
  dependency-version: 4.15.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-12-29 09:44:53 -07:00
dependabot[bot]
29a8fa263e build(deps-dev): bump mypy-protobuf from 3.6.0 to 4.0.0 (#2796)
Bumps [mypy-protobuf](https://github.com/nipunn1313/mypy-protobuf) from 3.6.0 to 4.0.0.
- [Changelog](https://github.com/nipunn1313/mypy-protobuf/blob/main/CHANGELOG.md)
- [Commits](https://github.com/nipunn1313/mypy-protobuf/compare/v3.6.0...v4.0.0)

---
updated-dependencies:
- dependency-name: mypy-protobuf
  dependency-version: 4.0.0
  dependency-type: direct:development
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-12-29 09:44:12 -07:00
dependabot[bot]
5dcf98b1af build(deps-dev): bump pytest from 8.0.0 to 9.0.2 (#2795)
Bumps [pytest](https://github.com/pytest-dev/pytest) from 8.0.0 to 9.0.2.
- [Release notes](https://github.com/pytest-dev/pytest/releases)
- [Changelog](https://github.com/pytest-dev/pytest/blob/main/CHANGELOG.rst)
- [Commits](https://github.com/pytest-dev/pytest/compare/8.0.0...9.0.2)

---
updated-dependencies:
- dependency-name: pytest
  dependency-version: 9.0.2
  dependency-type: direct:development
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-12-29 09:43:11 -07:00
dependabot[bot]
0ad45bfdcc build(deps-dev): bump mypy from 1.17.1 to 1.19.1 (#2789)
Bumps [mypy](https://github.com/python/mypy) from 1.17.1 to 1.19.1.
- [Changelog](https://github.com/python/mypy/blob/master/CHANGELOG.md)
- [Commits](https://github.com/python/mypy/compare/v1.17.1...v1.19.1)

---
updated-dependencies:
- dependency-name: mypy
  dependency-version: 1.19.1
  dependency-type: direct:development
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Mike Hunhoff <mike.hunhoff@gmail.com>
2025-12-19 10:29:59 -07:00
dependabot[bot]
acad501b07 build(deps-dev): bump pyinstaller from 6.16.0 to 6.17.0 (#2790)
Bumps [pyinstaller](https://github.com/pyinstaller/pyinstaller) from 6.16.0 to 6.17.0.
- [Release notes](https://github.com/pyinstaller/pyinstaller/releases)
- [Changelog](https://github.com/pyinstaller/pyinstaller/blob/develop/doc/CHANGES.rst)
- [Commits](https://github.com/pyinstaller/pyinstaller/compare/v6.16.0...v6.17.0)

---
updated-dependencies:
- dependency-name: pyinstaller
  dependency-version: 6.17.0
  dependency-type: direct:development
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Mike Hunhoff <mike.hunhoff@gmail.com>
2025-12-19 09:37:59 -07:00
dependabot[bot]
6da6035e7e build(deps-dev): bump isort from 6.0.0 to 7.0.0 (#2791)
Bumps [isort](https://github.com/PyCQA/isort) from 6.0.0 to 7.0.0.
- [Release notes](https://github.com/PyCQA/isort/releases)
- [Changelog](https://github.com/PyCQA/isort/blob/main/CHANGELOG.md)
- [Commits](https://github.com/PyCQA/isort/compare/6.0.0...7.0.0)

---
updated-dependencies:
- dependency-name: isort
  dependency-version: 7.0.0
  dependency-type: direct:development
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Mike Hunhoff <mike.hunhoff@gmail.com>
2025-12-19 09:36:51 -07:00
Mike Hunhoff
66dc70a775 ghidra: support PyGhidra (#2788)
* ghidra: init commit switch to PyGhidra

* update CHANGELOG and PyGhidra version requirements

* Update capa/features/extractors/ghidra/helpers.py

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

* fix black errors

* support Ghidra v12

* remove deprecated APIs

* refactor outdated code

* fix pyinstaller, code refactoring

* address PR feedback

* add back capa_explorer.py

* beef up capa_explorer.py script

* refactor README

* refactor README

* fix #2747

* add sha256 check for workflows

* add sha256 check for workflows

* add sha256 check for workflows

---------

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-12-18 17:55:49 -07:00
Moritz
50300f1c8e Merge pull request #2792 from mandiant/dependabot/pip/deptry-0.24.0 2025-12-16 21:08:55 +01:00
dependabot[bot]
03f94536ca build(deps-dev): bump deptry from 0.23.0 to 0.24.0
Bumps [deptry](https://github.com/fpgmaas/deptry) from 0.23.0 to 0.24.0.
- [Release notes](https://github.com/fpgmaas/deptry/releases)
- [Changelog](https://github.com/fpgmaas/deptry/blob/main/CHANGELOG.md)
- [Commits](https://github.com/fpgmaas/deptry/compare/0.23.0...0.24.0)

---
updated-dependencies:
- dependency-name: deptry
  dependency-version: 0.24.0
  dependency-type: direct:development
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2025-12-15 14:02:35 +00:00
mr-tz
dc08843e2d address idalib-based test fails 2025-12-11 14:18:13 +00:00
Moritz
40b01f0998 Merge pull request #2787 from mandiant/dependabot/pip/msgspec-0.20.0
build(deps): bump msgspec from 0.19.0 to 0.20.0
2025-12-11 11:17:57 +01:00
Moritz
b96a3b6b23 Merge pull request #2786 from mandiant/dependabot/pip/black-25.12.0
build(deps-dev): bump black from 25.11.0 to 25.12.0
2025-12-11 11:17:33 +01:00
Moritz
43e5e60901 Merge pull request #2785 from mandiant/dependabot/pip/types-psutil-7.1.3.20251202
build(deps-dev): bump types-psutil from 7.0.0.20250218 to 7.1.3.20251202
2025-12-11 11:17:14 +01:00
Moritz
0f9f72dbd5 build(deps-dev): bump flake8-bugbear from 25.10.21 to 25.11.29 (#2784)
Bumps [flake8-bugbear](https://github.com/PyCQA/flake8-bugbear) from 25.10.21 to 25.11.29.
- [Release notes](https://github.com/PyCQA/flake8-bugbear/releases)
- [Commits](https://github.com/PyCQA/flake8-bugbear/compare/25.10.21...25.11.29)

---
updated-dependencies:
- dependency-name: flake8-bugbear
  dependency-version: 25.11.29
  dependency-type: direct:development
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-12-11 11:16:51 +01:00
dependabot[bot]
fd9f584cc4 build(deps): bump msgspec from 0.19.0 to 0.20.0
Bumps [msgspec](https://github.com/jcrist/msgspec) from 0.19.0 to 0.20.0.
- [Release notes](https://github.com/jcrist/msgspec/releases)
- [Changelog](https://github.com/jcrist/msgspec/blob/main/docs/changelog.md)
- [Commits](https://github.com/jcrist/msgspec/compare/0.19.0...0.20.0)

---
updated-dependencies:
- dependency-name: msgspec
  dependency-version: 0.20.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2025-12-08 14:02:34 +00:00
dependabot[bot]
c3b785e217 build(deps-dev): bump black from 25.11.0 to 25.12.0
Bumps [black](https://github.com/psf/black) from 25.11.0 to 25.12.0.
- [Release notes](https://github.com/psf/black/releases)
- [Changelog](https://github.com/psf/black/blob/main/CHANGES.md)
- [Commits](https://github.com/psf/black/compare/25.11.0...25.12.0)

---
updated-dependencies:
- dependency-name: black
  dependency-version: 25.12.0
  dependency-type: direct:development
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2025-12-08 14:02:27 +00:00
dependabot[bot]
6ae17f7ef4 build(deps-dev): bump types-psutil from 7.0.0.20250218 to 7.1.3.20251202
Bumps [types-psutil](https://github.com/typeshed-internal/stub_uploader) from 7.0.0.20250218 to 7.1.3.20251202.
- [Commits](https://github.com/typeshed-internal/stub_uploader/commits)

---
updated-dependencies:
- dependency-name: types-psutil
  dependency-version: 7.1.3.20251202
  dependency-type: direct:development
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2025-12-08 14:02:22 +00:00
dependabot[bot]
13297ad324 build(deps-dev): bump flake8-bugbear from 25.10.21 to 25.11.29
Bumps [flake8-bugbear](https://github.com/PyCQA/flake8-bugbear) from 25.10.21 to 25.11.29.
- [Release notes](https://github.com/PyCQA/flake8-bugbear/releases)
- [Commits](https://github.com/PyCQA/flake8-bugbear/compare/25.10.21...25.11.29)

---
updated-dependencies:
- dependency-name: flake8-bugbear
  dependency-version: 25.11.29
  dependency-type: direct:development
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2025-12-08 14:02:19 +00:00
dependabot[bot]
9b42b45d21 build(deps-dev): bump flake8-bugbear from 24.12.12 to 25.10.21 (#2773)
* build(deps-dev): bump flake8-bugbear from 24.12.12 to 25.10.21

Bumps [flake8-bugbear](https://github.com/PyCQA/flake8-bugbear) from 24.12.12 to 25.10.21.
- [Release notes](https://github.com/PyCQA/flake8-bugbear/releases)
- [Commits](https://github.com/PyCQA/flake8-bugbear/compare/24.12.12...25.10.21)

---
updated-dependencies:
- dependency-name: flake8-bugbear
  dependency-version: 25.10.21
  dependency-type: direct:development
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>

* fix flake8 raised bugs

* use super

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: mr-tz <moritz.raabe@mandiant.com>
Co-authored-by: Moritz <mr-tz@users.noreply.github.com>
2025-12-04 10:19:16 -07:00
Capa Bot
d17264c928 Sync capa rules submodule 2025-12-04 17:17:51 +00:00
Capa Bot
f313852e70 Sync capa rules submodule 2025-12-04 12:11:09 +00:00
Capa Bot
c0ae1352c6 Sync capa-testfiles submodule 2025-12-03 21:00:48 +00:00
dependabot[bot]
ccb3e6de74 build(deps-dev): bump flake8-comprehensions from 3.16.0 to 3.17.0 (#2782)
Bumps [flake8-comprehensions](https://github.com/adamchainz/flake8-comprehensions) from 3.16.0 to 3.17.0.
- [Changelog](https://github.com/adamchainz/flake8-comprehensions/blob/main/CHANGELOG.rst)
- [Commits](https://github.com/adamchainz/flake8-comprehensions/compare/3.16.0...3.17.0)

---
updated-dependencies:
- dependency-name: flake8-comprehensions
  dependency-version: 3.17.0
  dependency-type: direct:development
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-12-01 11:27:08 -07:00
dependabot[bot]
26c6ffd62d build(deps-dev): bump ruff from 0.12.0 to 0.14.7 (#2781)
Bumps [ruff](https://github.com/astral-sh/ruff) from 0.12.0 to 0.14.7.
- [Release notes](https://github.com/astral-sh/ruff/releases)
- [Changelog](https://github.com/astral-sh/ruff/blob/main/CHANGELOG.md)
- [Commits](https://github.com/astral-sh/ruff/compare/0.12.0...0.14.7)

---
updated-dependencies:
- dependency-name: ruff
  dependency-version: 0.14.7
  dependency-type: direct:development
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-12-01 11:26:34 -07:00
Capa Bot
18923601c7 Sync capa rules submodule 2025-11-25 20:39:18 +00:00
0x1622
1568ce4832 Use SafeLoader for YAML (#2776) 2025-11-25 07:01:23 -07:00
Mike Hunhoff
ffce77b13d ci: deprecate macos-13 runner and use Python v3.13 for testing (#2777) 2025-11-24 19:53:39 -07:00
Moritz
074f7c742c Merge branch 'master' into idalib-tests 2025-11-24 19:52:40 +01:00
dependabot[bot]
895b2440c0 build(deps-dev): bump pre-commit from 4.2.0 to 4.5.0 (#2772)
Bumps [pre-commit](https://github.com/pre-commit/pre-commit) from 4.2.0 to 4.5.0.
- [Release notes](https://github.com/pre-commit/pre-commit/releases)
- [Changelog](https://github.com/pre-commit/pre-commit/blob/main/CHANGELOG.md)
- [Commits](https://github.com/pre-commit/pre-commit/compare/v4.2.0...v4.5.0)

---
updated-dependencies:
- dependency-name: pre-commit
  dependency-version: 4.5.0
  dependency-type: direct:development
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-11-24 08:40:13 -07:00
dependabot[bot]
c901f809a2 build(deps-dev): bump black from 25.1.0 to 25.11.0 (#2771)
Bumps [black](https://github.com/psf/black) from 25.1.0 to 25.11.0.
- [Release notes](https://github.com/psf/black/releases)
- [Changelog](https://github.com/psf/black/blob/main/CHANGES.md)
- [Commits](https://github.com/psf/black/compare/25.1.0...25.11.0)

---
updated-dependencies:
- dependency-name: black
  dependency-version: 25.11.0
  dependency-type: direct:development
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-11-24 08:39:17 -07:00
dependabot[bot]
308b3e5c1c build(deps): bump xmltodict from 0.14.2 to 1.0.2 (#2774)
Bumps [xmltodict](https://github.com/martinblech/xmltodict) from 0.14.2 to 1.0.2.
- [Release notes](https://github.com/martinblech/xmltodict/releases)
- [Changelog](https://github.com/martinblech/xmltodict/blob/master/CHANGELOG.md)
- [Commits](https://github.com/martinblech/xmltodict/compare/v0.14.2...v1.0.2)

---
updated-dependencies:
- dependency-name: xmltodict
  dependency-version: 1.0.2
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-11-24 08:30:16 -07:00
Mike Hunhoff
7844ebb144 v9.3.1 (#2769) 2025-11-20 08:37:49 -07:00
dependabot[bot]
e393cff0e1 build(deps): bump glob from 10.4.2 to 10.5.0 in /web/explorer (#2766)
Bumps [glob](https://github.com/isaacs/node-glob) from 10.4.2 to 10.5.0.
- [Changelog](https://github.com/isaacs/node-glob/blob/main/changelog.md)
- [Commits](https://github.com/isaacs/node-glob/compare/v10.4.2...v10.5.0)

---
updated-dependencies:
- dependency-name: glob
  dependency-version: 10.5.0
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-11-19 08:34:25 -07:00
Mike Hunhoff
7780b9e8a8 explorer: add missing ida-netnode dependency to project.toml (#2765) 2025-11-18 08:55:57 -07:00
Mike Hunhoff
8d39765e7b ci: bump binja minor version (#2763) 2025-11-17 11:10:46 -07:00
dependabot[bot]
dec0bcfe79 build(deps-dev): bump js-yaml from 4.1.0 to 4.1.1 in /web/explorer (#2758)
Bumps [js-yaml](https://github.com/nodeca/js-yaml) from 4.1.0 to 4.1.1.
- [Changelog](https://github.com/nodeca/js-yaml/blob/master/CHANGELOG.md)
- [Commits](https://github.com/nodeca/js-yaml/compare/4.1.0...4.1.1)

---
updated-dependencies:
- dependency-name: js-yaml
  dependency-version: 4.1.1
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-11-17 10:02:54 -07:00
dependabot[bot]
99ccecba4e build(deps): bump humanize from 4.13.0 to 4.14.0 (#2762)
Bumps [humanize](https://github.com/python-humanize/humanize) from 4.13.0 to 4.14.0.
- [Release notes](https://github.com/python-humanize/humanize/releases)
- [Commits](https://github.com/python-humanize/humanize/compare/4.13.0...4.14.0)

---
updated-dependencies:
- dependency-name: humanize
  dependency-version: 4.14.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-11-17 09:10:40 -07:00
dependabot[bot]
af27463c37 build(deps-dev): bump pyinstaller from 6.14.1 to 6.16.0 (#2761)
Bumps [pyinstaller](https://github.com/pyinstaller/pyinstaller) from 6.14.1 to 6.16.0.
- [Release notes](https://github.com/pyinstaller/pyinstaller/releases)
- [Changelog](https://github.com/pyinstaller/pyinstaller/blob/develop/doc/CHANGES.rst)
- [Commits](https://github.com/pyinstaller/pyinstaller/compare/v6.14.1...v6.16.0)

---
updated-dependencies:
- dependency-name: pyinstaller
  dependency-version: 6.16.0
  dependency-type: direct:development
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-11-17 09:10:30 -07:00
dependabot[bot]
f4f47b4d55 build(deps): bump protobuf from 6.31.1 to 6.33.1 (#2760)
Bumps [protobuf](https://github.com/protocolbuffers/protobuf) from 6.31.1 to 6.33.1.
- [Release notes](https://github.com/protocolbuffers/protobuf/releases)
- [Changelog](https://github.com/protocolbuffers/protobuf/blob/main/protobuf_release.bzl)
- [Commits](https://github.com/protocolbuffers/protobuf/commits)

---
updated-dependencies:
- dependency-name: protobuf
  dependency-version: 6.33.1
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-11-17 09:10:19 -07:00
dependabot[bot]
adc2401136 build(deps): bump pycparser from 2.22 to 2.23 (#2759)
Bumps [pycparser](https://github.com/eliben/pycparser) from 2.22 to 2.23.
- [Release notes](https://github.com/eliben/pycparser/releases)
- [Changelog](https://github.com/eliben/pycparser/blob/main/CHANGES)
- [Commits](https://github.com/eliben/pycparser/compare/release_v2.22...release_v2.23)

---
updated-dependencies:
- dependency-name: pycparser
  dependency-version: '2.23'
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-11-17 09:10:09 -07:00
Willi Ballenthin
cf463676b2 fixtures: remove dups 2025-11-03 12:47:12 +01:00
Willi Ballenthin
b5e5840a63 lints 2025-10-29 20:29:08 +01:00
Willi Ballenthin
f252b6bbd0 changelog 2025-10-29 20:23:12 +01:00
Willi Ballenthin
eda53ab3c1 tests: add feature tests for idalib 2025-10-29 20:20:57 +01:00
113 changed files with 2308 additions and 2131 deletions

View File

@@ -1,5 +1,5 @@
[tool.bumpversion]
current_version = "9.3.0"
current_version = "9.3.1"
[[tool.bumpversion.files]]
filename = "capa/version.py"

View File

@@ -4,6 +4,13 @@ updates:
directory: "/"
schedule:
interval: "weekly"
groups:
vivisect:
patterns:
- "vivisect"
- "pyasn1"
- "pyasn1-modules"
- "msgpack"
ignore:
- dependency-name: "*"
update-types: ["version-update:semver-patch"]

2
.github/flake8.ini vendored
View File

@@ -33,8 +33,6 @@ per-file-ignores =
scripts/*: T201
# capa.exe is meant to print output
capa/main.py: T201
# IDA tests emit results to output window so need to print
tests/test_ida_features.py: T201
# utility used to find the Binary Ninja API via invoking python.exe
capa/features/extractors/binja/find_binja_api.py: T201

View File

@@ -63,6 +63,9 @@ ignore_missing_imports = True
[mypy-PyQt5.*]
ignore_missing_imports = True
[mypy-binaryninja]
ignore_missing_imports = True
[mypy-binaryninja.*]
ignore_missing_imports = True

View File

@@ -20,3 +20,5 @@ closes #issue_number
- [ ] No new tests needed
<!-- Please help us keeping capa documentation up-to-date -->
- [ ] No documentation update needed
<!-- Please indicate if and how you have used AI to generate (parts of) your code submission. Include your prompt, model, tool, etc. -->
- [ ] This submission includes AI-generated code and I have provided details in the description.

View File

@@ -17,6 +17,8 @@ import sys
import capa.rules.cache
from PyInstaller.utils.hooks import collect_submodules
from pathlib import Path
# SPECPATH is a global variable which points to .spec file path
@@ -34,6 +36,7 @@ a = Analysis(
["../../capa/main.py"],
pathex=["capa"],
binaries=None,
hiddenimports=collect_submodules('rich'),
datas=[
# when invoking pyinstaller from the project root,
# this gets invoked from the directory of the spec file,
@@ -74,6 +77,7 @@ a = Analysis(
# only be installed locally.
"binaryninja",
"ida",
"ghidra",
# remove once https://github.com/mandiant/capa/issues/2681 has
# been addressed by PyInstaller
"pkg_resources",

62
.github/workflows/black-format.yml vendored Normal file
View File

@@ -0,0 +1,62 @@
name: black auto-format
on:
pull_request:
branches: [ master ]
paths-ignore:
- 'web/**'
- 'doc/**'
- '**.md'
workflow_dispatch: # allow manual trigger
permissions:
contents: write
jobs:
black-format:
# only run on dependabot PRs or manual trigger
if: github.actor == 'dependabot[bot]' || github.event_name == 'workflow_dispatch'
runs-on: ubuntu-22.04
steps:
- name: Checkout repository
uses: actions/checkout@b4ffde65f46336ab88eb53be808477a3936bae11 # v4.1.1
with:
ref: ${{ github.head_ref }}
# need a token with write access to push the commit
token: ${{ secrets.GITHUB_TOKEN }}
- name: Set up Python 3.13
uses: actions/setup-python@0a5c61591373683505ea898e09a3ea4f39ef2b9c # v5.0.0
with:
python-version: "3.13"
- name: Install dependencies
run: |
pip install -r requirements.txt
pip install -e .[dev,scripts]
- name: Run isort
run: pre-commit run isort --all-files
- name: Run black/continue
# black returns non-zero error code after formatting, which is what we expect
continue-on-error: true
run: pre-commit run black --all-files
- name: Check for changes
id: changes
run: |
if git diff --quiet; then
echo "has_changes=false" >> "$GITHUB_OUTPUT"
else
echo "has_changes=true" >> "$GITHUB_OUTPUT"
fi
- name: Commit and push formatting changes
if: steps.changes.outputs.has_changes == 'true'
run: |
git config user.name "${GITHUB_ACTOR}"
git config user.email "${GITHUB_ACTOR_ID}+${GITHUB_ACTOR}@users.noreply.github.com"
git add -A
git commit -m "style: auto-format with black and isort"
git push

View File

@@ -28,6 +28,11 @@ jobs:
artifact_name: capa
asset_name: linux
python_version: '3.10'
# for Ghidra
java-version: '21'
ghidra-version: '12.0'
public-version: 'PUBLIC_20251205'
ghidra-sha256: 'af43e8cfb2fa4490cf6020c3a2bde25c159d83f45236a0542688a024e8fc1941'
- os: ubuntu-22.04-arm
artifact_name: capa
asset_name: linux-arm64
@@ -46,8 +51,8 @@ jobs:
# artifact_name: capa.exe
# asset_name: windows-arm64
# python_version: '3.12'
- os: macos-13
# use older macOS for assumed better portability
- os: macos-15-intel
# macos-15-intel is the lowest native intel build
artifact_name: capa
asset_name: macos
python_version: '3.10'
@@ -106,6 +111,24 @@ jobs:
run: |
7z e "tests/data/dynamic/cape/v2.2/d46900384c78863420fb3e297d0a2f743cd2b6b3f7f82bf64059a168e07aceb7.json.gz"
dist/capa -d "d46900384c78863420fb3e297d0a2f743cd2b6b3f7f82bf64059a168e07aceb7.json"
- name: Set up Java ${{ matrix.java-version }}
if: matrix.os == 'ubuntu-22.04' && matrix.python_version == '3.10'
uses: actions/setup-java@387ac29b308b003ca37ba93a6cab5eb57c8f5f93 # v4.0.0
with:
distribution: 'temurin'
java-version: ${{ matrix.java-version }}
- name: Install Ghidra ${{ matrix.ghidra-version }}
if: matrix.os == 'ubuntu-22.04' && matrix.python_version == '3.10'
run: |
mkdir ./.github/ghidra
wget "https://github.com/NationalSecurityAgency/ghidra/releases/download/Ghidra_${{ matrix.ghidra-version }}_build/ghidra_${{ matrix.ghidra-version }}_${{ matrix.public-version }}.zip" -O ./.github/ghidra/ghidra_${{ matrix.ghidra-version }}_PUBLIC.zip
echo "${{ matrix.ghidra-sha256 }} ./.github/ghidra/ghidra_${{ matrix.ghidra-version }}_PUBLIC.zip" | sha256sum -c -
unzip .github/ghidra/ghidra_${{ matrix.ghidra-version }}_PUBLIC.zip -d .github/ghidra/
- name: Does it run (Ghidra)?
if: matrix.os == 'ubuntu-22.04' && matrix.python_version == '3.10'
env:
GHIDRA_INSTALL_DIR: ${{ github.workspace }}/.github/ghidra/ghidra_${{ matrix.ghidra-version }}_PUBLIC
run: dist/capa -b ghidra -d "tests/data/Practical Malware Analysis Lab 01-01.dll_"
- uses: actions/upload-artifact@5d5d22a31266ced268874388b861e4b58bb5c2f3 # v4.3.1
with:
name: ${{ matrix.asset_name }}
@@ -144,7 +167,7 @@ jobs:
- name: Set zip name
run: echo "zip_name=capa-${GITHUB_REF#refs/tags/}-${{ matrix.asset_name }}.zip" >> $GITHUB_ENV
- name: Zip ${{ matrix.artifact_name }} into ${{ env.zip_name }}
run: zip ${{ env.zip_name }} ${{ matrix.artifact_name }}
run: zip ${ZIP_NAME} ${{ matrix.artifact_name }}
- name: Upload ${{ env.zip_name }} to GH Release
uses: svenstaro/upload-release-action@2728235f7dc9ff598bd86ce3c274b74f802d2208 # v2
with:

View File

@@ -14,8 +14,8 @@ jobs:
steps:
- name: Check out repository code
uses: actions/checkout@v4
uses: actions/checkout@b4ffde65f46336ab88eb53be808477a3936bae11 # v4.1.1
- uses: pypa/gh-action-pip-audit@v1.0.8
- uses: pypa/gh-action-pip-audit@1220774d901786e6f652ae159f7b6bc8fea6d266 # v1.1.0
with:
inputs: .

View File

@@ -21,8 +21,10 @@ jobs:
# user information is needed to create annotated tags (with a message)
git config user.email 'capa-dev@mandiant.com'
git config user.name 'Capa Bot'
name=${{ github.event.release.tag_name }}
name=${GITHUB_EVENT_RELEASE_TAG_NAME}
git tag $name -m "https://github.com/mandiant/capa/releases/$name"
env:
GITHUB_EVENT_RELEASE_TAG_NAME: ${{ github.event.release.tag_name }}
# TODO update branch name-major=${name%%.*}
- name: Push tag to capa-rules
uses: ad-m/github-push-action@d91a481090679876dfc4178fef17f286781251df # v0.8.0

View File

@@ -42,10 +42,10 @@ jobs:
- name: Checkout capa
uses: actions/checkout@b4ffde65f46336ab88eb53be808477a3936bae11 # v4.1.1
# use latest available python to take advantage of best performance
- name: Set up Python 3.12
- name: Set up Python 3.13
uses: actions/setup-python@0a5c61591373683505ea898e09a3ea4f39ef2b9c # v5.0.0
with:
python-version: "3.12"
python-version: "3.13"
- name: Install dependencies
run: |
pip install -r requirements.txt
@@ -70,10 +70,10 @@ jobs:
uses: actions/checkout@b4ffde65f46336ab88eb53be808477a3936bae11 # v4.1.1
with:
submodules: recursive
- name: Set up Python 3.12
- name: Set up Python 3.13
uses: actions/setup-python@0a5c61591373683505ea898e09a3ea4f39ef2b9c # v5.0.0
with:
python-version: "3.12"
python-version: "3.13"
- name: Install capa
run: |
pip install -r requirements.txt
@@ -88,13 +88,11 @@ jobs:
strategy:
fail-fast: false
matrix:
os: [ubuntu-22.04, windows-2022, macos-13]
os: [ubuntu-22.04, ubuntu-22.04-arm, windows-2022, macos-15-intel, macos-14]
# across all operating systems
python-version: ["3.10", "3.11"]
python-version: ["3.10", "3.13"]
include:
# on Ubuntu run these as well
- os: ubuntu-22.04
python-version: "3.10"
- os: ubuntu-22.04
python-version: "3.11"
- os: ubuntu-22.04
@@ -115,6 +113,11 @@ jobs:
run: |
pip install -r requirements.txt
pip install -e .[dev,scripts]
- name: Cache vivisect workspaces
uses: actions/cache@0057852bfaa89a56745cba8c7296529d2fc39830 # v4.3.0
with:
path: tests/data/**/*.viv
key: viv-${{ runner.os }}-${{ runner.arch }}-${{ matrix.python-version }}-${{ hashFiles('**/requirements.txt') }}
- name: Run tests (fast)
# this set of tests runs about 80% of the cases in 20% of the time,
# and should catch most errors quickly.
@@ -131,7 +134,7 @@ jobs:
strategy:
fail-fast: false
matrix:
python-version: ["3.10", "3.11"]
python-version: ["3.10", "3.13"]
steps:
- name: Checkout capa with submodules
# do only run if BN_SERIAL is available, have to do this in every step, see https://github.com/orgs/community/discussions/26726#discussioncomment-3253118
@@ -157,7 +160,7 @@ jobs:
run: |
mkdir ./.github/binja
curl "https://raw.githubusercontent.com/Vector35/binaryninja-api/6812c97/scripts/download_headless.py" -o ./.github/binja/download_headless.py
python ./.github/binja/download_headless.py --serial ${{ env.BN_SERIAL }} --output .github/binja/BinaryNinja-headless.zip
python ./.github/binja/download_headless.py --serial ${BN_SERIAL} --output .github/binja/BinaryNinja-headless.zip
unzip .github/binja/BinaryNinja-headless.zip -d .github/binja/
python .github/binja/binaryninja/scripts/install_api.py --install-on-root --silent
- name: Run tests
@@ -173,11 +176,11 @@ jobs:
strategy:
fail-fast: false
matrix:
python-version: ["3.10", "3.11"]
java-version: ["17"]
ghidra-version: ["11.0.1"]
public-version: ["PUBLIC_20240130"] # for ghidra releases
ghidrathon-version: ["4.0.0"]
python-version: ["3.10", "3.13"]
java-version: ["21"]
ghidra-version: ["12.0"]
public-version: ["PUBLIC_20251205"] # for ghidra releases
ghidra-sha256: ['af43e8cfb2fa4490cf6020c3a2bde25c159d83f45236a0542688a024e8fc1941']
steps:
- name: Checkout capa with submodules
uses: actions/checkout@b4ffde65f46336ab88eb53be808477a3936bae11 # v4.1.1
@@ -196,26 +199,66 @@ jobs:
run: |
mkdir ./.github/ghidra
wget "https://github.com/NationalSecurityAgency/ghidra/releases/download/Ghidra_${{ matrix.ghidra-version }}_build/ghidra_${{ matrix.ghidra-version }}_${{ matrix.public-version }}.zip" -O ./.github/ghidra/ghidra_${{ matrix.ghidra-version }}_PUBLIC.zip
echo "${{ matrix.ghidra-sha256 }} ./.github/ghidra/ghidra_${{ matrix.ghidra-version }}_PUBLIC.zip" | sha256sum -c -
unzip .github/ghidra/ghidra_${{ matrix.ghidra-version }}_PUBLIC.zip -d .github/ghidra/
- name: Install Ghidrathon
run : |
mkdir ./.github/ghidrathon
wget "https://github.com/mandiant/Ghidrathon/releases/download/v${{ matrix.ghidrathon-version }}/Ghidrathon-v${{ matrix.ghidrathon-version}}.zip" -O ./.github/ghidrathon/ghidrathon-v${{ matrix.ghidrathon-version }}.zip
unzip .github/ghidrathon/ghidrathon-v${{ matrix.ghidrathon-version }}.zip -d .github/ghidrathon/
python -m pip install -r .github/ghidrathon/requirements.txt
python .github/ghidrathon/ghidrathon_configure.py $(pwd)/.github/ghidra/ghidra_${{ matrix.ghidra-version }}_PUBLIC
unzip .github/ghidrathon/Ghidrathon-v${{ matrix.ghidrathon-version }}.zip -d .github/ghidra/ghidra_${{ matrix.ghidra-version }}_PUBLIC/Ghidra/Extensions
- name: Install pyyaml
run: sudo apt-get install -y libyaml-dev
- name: Install capa with Ghidra extra
run: |
pip install -e .[dev,ghidra]
- name: Run tests
env:
GHIDRA_INSTALL_DIR: ${{ github.workspace }}/.github/ghidra/ghidra_${{ matrix.ghidra-version }}_PUBLIC
run: pytest -v tests/test_ghidra_features.py
idalib-tests:
name: IDA ${{ matrix.ida.version }} tests for ${{ matrix.python-version }}
runs-on: ubuntu-22.04
needs: [tests]
env:
IDA_LICENSE_ID: ${{ secrets.IDA_LICENSE_ID }}
strategy:
fail-fast: false
matrix:
python-version: ["3.10", "3.13"]
ida:
- version: 9.0
slug: "release/9.0/ida-essential/ida-essential_90_x64linux.run"
- version: 9.1
slug: "release/9.1/ida-essential/ida-essential_91_x64linux.run"
- version: 9.2
slug: "release/9.2/ida-essential/ida-essential_92_x64linux.run"
steps:
- name: Checkout capa with submodules
# do only run if IDA_LICENSE_ID is available, have to do this in every step, see https://github.com/orgs/community/discussions/26726#discussioncomment-3253118
if: ${{ env.IDA_LICENSE_ID != 0 }}
uses: actions/checkout@b4ffde65f46336ab88eb53be808477a3936bae11 # v4.1.1
with:
submodules: recursive
- name: Set up Python ${{ matrix.python-version }}
if: ${{ env.IDA_LICENSE_ID != 0 }}
uses: actions/setup-python@0a5c61591373683505ea898e09a3ea4f39ef2b9c # v5.0.0
with:
python-version: ${{ matrix.python-version }}
- name: Setup uv
if: ${{ env.IDA_LICENSE_ID != 0 }}
uses: astral-sh/setup-uv@61cb8a9741eeb8a550a1b8544337180c0fc8476b # v7.2.0
- name: Install dependencies
if: ${{ env.IDA_LICENSE_ID != 0 }}
run: sudo apt-get install -y libyaml-dev
- name: Install capa
if: ${{ env.IDA_LICENSE_ID != 0 }}
run: |
pip install -r requirements.txt
pip install -e .[dev,scripts]
pip install idapro
- name: Install IDA ${{ matrix.ida.version }}
if: ${{ env.IDA_LICENSE_ID != 0 }}
run: |
uv run hcli --disable-updates ida install --download-id ${{ matrix.ida.slug }} --license-id ${{ secrets.IDA_LICENSE_ID }} --set-default --yes
env:
HCLI_API_KEY: ${{ secrets.HCLI_API_KEY }}
IDA_LICENSE_ID: ${{ secrets.IDA_LICENSE_ID }}
- name: Run tests
run: |
mkdir ./.github/ghidra/project
.github/ghidra/ghidra_${{ matrix.ghidra-version }}_PUBLIC/support/analyzeHeadless .github/ghidra/project ghidra_test -Import ./tests/data/mimikatz.exe_ -ScriptPath ./tests/ -PostScript test_ghidra_features.py > ../output.log
cat ../output.log
exit_code=$(cat ../output.log | grep exit | awk '{print $NF}')
exit $exit_code
if: ${{ env.IDA_LICENSE_ID != 0 }}
run: pytest -v tests/test_idalib_features.py # explicitly refer to the idalib tests for performance. other tests run above.

View File

@@ -18,14 +18,18 @@ jobs:
- uses: actions/checkout@v4
- name: Set release name
run: echo "RELEASE_NAME=capa-explorer-web-v${{ github.event.inputs.version }}-${GITHUB_SHA::7}" >> $GITHUB_ENV
run: echo "RELEASE_NAME=capa-explorer-web-v${GITHUB_EVENT_INPUTS_VERSION}-${GITHUB_SHA::7}" >> $GITHUB_ENV
env:
GITHUB_EVENT_INPUTS_VERSION: ${{ github.event.inputs.version }}
- name: Check if release already exists
run: |
if ls web/explorer/releases/capa-explorer-web-v${{ github.event.inputs.version }}-* 1> /dev/null 2>&1; then
echo "::error:: A release with version ${{ github.event.inputs.version }} already exists"
if ls web/explorer/releases/capa-explorer-web-v${GITHUB_EVENT_INPUTS_VERSION}-* 1> /dev/null 2>&1; then
echo "::error:: A release with version ${GITHUB_EVENT_INPUTS_VERSION} already exists"
exit 1
fi
env:
GITHUB_EVENT_INPUTS_VERSION: ${{ github.event.inputs.version }}
- name: Set up Node.js
uses: actions/setup-node@0a44ba7841725637a19e28fa30b79a866c81b0a6 # v4.0.4
@@ -43,24 +47,24 @@ jobs:
working-directory: web/explorer
- name: Compress bundle
run: zip -r ${{ env.RELEASE_NAME }}.zip capa-explorer-web
run: zip -r ${RELEASE_NAME}.zip capa-explorer-web
working-directory: web/explorer
- name: Create releases directory
run: mkdir -vp web/explorer/releases
- name: Move release to releases folder
run: mv web/explorer/${{ env.RELEASE_NAME }}.zip web/explorer/releases
run: mv web/explorer/${RELEASE_NAME}.zip web/explorer/releases
- name: Compute release SHA256 hash
run: |
echo "RELEASE_SHA256=$(sha256sum web/explorer/releases/${{ env.RELEASE_NAME }}.zip | awk '{print $1}')" >> $GITHUB_ENV
echo "RELEASE_SHA256=$(sha256sum web/explorer/releases/${RELEASE_NAME}.zip | awk '{print $1}')" >> $GITHUB_ENV
- name: Update CHANGELOG.md
run: |
echo "## ${{ env.RELEASE_NAME }}" >> web/explorer/releases/CHANGELOG.md
echo "## ${RELEASE_NAME}" >> web/explorer/releases/CHANGELOG.md
echo "- Release Date: $(date -u '+%Y-%m-%d %H:%M:%S UTC')" >> web/explorer/releases/CHANGELOG.md
echo "- SHA256: ${{ env.RELEASE_SHA256 }}" >> web/explorer/releases/CHANGELOG.md
echo "- SHA256: ${RELEASE_SHA256}" >> web/explorer/releases/CHANGELOG.md
echo "" >> web/explorer/releases/CHANGELOG.md
cat web/explorer/releases/CHANGELOG.md
@@ -73,7 +77,7 @@ jobs:
run: |
git config --local user.email "capa-dev@mandiant.com"
git config --local user.name "Capa Bot"
git add -f web/explorer/releases/${{ env.RELEASE_NAME }}.zip web/explorer/releases/CHANGELOG.md
git add -f web/explorer/releases/${RELEASE_NAME}.zip web/explorer/releases/CHANGELOG.md
git add -u web/explorer/releases/
- name: Create Pull Request

View File

@@ -136,8 +136,8 @@ repos:
- "tests/"
- "--ignore=tests/test_binja_features.py"
- "--ignore=tests/test_ghidra_features.py"
- "--ignore=tests/test_ida_features.py"
- "--ignore=tests/test_viv_features.py"
- "--ignore=tests/test_idalib_features.py"
- "--ignore=tests/test_main.py"
- "--ignore=tests/test_scripts.py"
always_run: true

View File

@@ -4,20 +4,91 @@
### New Features
- ghidra: support PyGhidra @mike-hunhoff #2788
- vmray: extract number features from whitelisted void_ptr parameters (hKey, hKeyRoot) @adeboyedn #2835
### Breaking Changes
### New Rules (0)
### New Rules (23)
- nursery/run-as-nodejs-native-module mehunhoff@google.com
- nursery/inject-shellcode-using-thread-pool-work-insertion-with-tp_io still@teamt5.org
- nursery/inject-shellcode-using-thread-pool-work-insertion-with-tp_timer still@teamt5.org
- nursery/inject-shellcode-using-thread-pool-work-insertion-with-tp_work still@teamt5.org
- data-manipulation/encryption/hc-256/encrypt-data-using-hc-256 wballenthin@hex-rays.com
- anti-analysis/anti-llm/terminate-anthropic-session-via-magic-strings wballenthin@hex-rays.com
- nursery/access-aws-credentials maximemorin@google.com
- nursery/access-cloudflare-credentials maximemorin@google.com
- nursery/access-docker-credentials maximemorin@google.com
- nursery/access-gcp-credentials maximemorin@google.com
- nursery/access-kubernetes-credentials maximemorin@google.com
- nursery/enumerate-aws-cloudformation maximemorin@google.com
- nursery/enumerate-aws-cloudtrail maximemorin@google.com
- nursery/enumerate-aws-direct-connect maximemorin@google.com
- nursery/enumerate-aws-ec2 maximemorin@google.com
- nursery/enumerate-aws-iam maximemorin@google.com
- nursery/enumerate-aws-s3 maximemorin@google.com
- nursery/enumerate-aws-support-cases maximemorin@google.com
- persistence/registry/persist-via-shellserviceobjectdelayload-registry-key xpzhxhm@gmail.com
- nursery/get-http-response-date @cosmoworker
- host-interaction/process/create/create-process-in-dotnet moritz.raabe@mandiant.com social.tarang@gmail.com
- nursery/read-file-in-dotnet moritz.raabe@mandiant.com anushka.virgaonkar@mandiant.com
- nursery/write-file-in-dotnet william.ballenthin@mandiant.com anushka.virgaonkar@mandiant.com
-
### Bug Fixes
- main: suggest --os flag in unsupported OS error message to help users override ELF OS detection @devs6186 #2577
- render: escape sample-controlled strings before passing to Rich to prevent MarkupError @devs6186 #2699
- rules: handle empty or invalid YAML documents gracefully in `Rule.from_yaml` and `get_rules` @devs6186 #2900
- Fixed insecure deserialization vulnerability in YAML loading @0x1622 (#2770)
- loader: gracefully handle ELF files with unsupported architectures kamranulhaq2002@gmail.com #2800
- loader: handle SegmentationViolation for malformed ELF files @kami922 #2799
- lint: disable rule caching during linting @Maijin #2817
- vmray: skip processes with invalid PID or missing filename @EclipseAditya #2807
- features: fix Regex.get_value_str() returning escaped pattern instead of raw regex @EclipseAditya #1909
- render: use default styling for dynamic -vv API/call details so they are easier to see @devs6186 #1865
- loader: handle struct.error from dnfile and show clear CorruptFile message @devs6186 #2442
- address: fix TypeError when sorting locations containing mixed address types @devs6186 #2195
- loader: skip PE files with unrealistically large section virtual sizes to prevent resource exhaustion @devs6186 #1989
- engine/render: fix unbounded range sentinel precedence so `count(...): N or more` uses explicit `((1 << 64) - 1)` @blenbot #2936
- cache: support *BSD @williballenthin @res2500 #2949
### capa Explorer Web
- webui: fix 404 for "View rule in capa-rules" by using encodeURIComponent for rule name in URL @devs6186 #2482
- webui: show error when JSON does not follow expected result document schema; suggest reanalyzing for VT URLs @devs6186 #2363
- webui: fix global search to match feature types (match, regex, api, …) @devs6186 #2349
### capa Explorer IDA Pro plugin
### Performance
- perf: eliminate O(n²) tuple growth and reduce per-match overhead @devs6186 #2890
### Development
- doc: document that default output shows top-level matches only; -v/-vv show nested matches @devs6186 #1410
- doc: fix typo in usage.md, add documentation links to README @devs6186 #2274
- doc: add table comparing ways to consume capa output (CLI, IDA, Ghidra, dynamic sandbox, web) @devs6186 #2273
- binja: add mypy config for top-level binaryninja module to fix mypy issues @devs6186 #2399
- ci: deprecate macos-13 runner and use Python v3.13 for testing @mike-hunhoff #2777
- ci: pin pip-audit action SHAs and update to v1.1.0 @kami922 #1131
### Raw diffs
- [capa v9.3.1...master](https://github.com/mandiant/capa/compare/v9.3.1...master)
- [capa-rules v9.3.1...master](https://github.com/mandiant/capa-rules/compare/v9.3.1...master)
## v9.3.1
This patch release fixes a missing import for the capa explorer plugin for IDA Pro.
### Bug Fixes
- add missing ida-netnode dependency to project.toml @mike-hunhoff #2765
### Development
- ci: bump binja min version @mike-hunhoff #2763
### Raw diffs
- [capa v9.3.0...master](https://github.com/mandiant/capa/compare/v9.3.0...master)
- [capa-rules v9.3.0...master](https://github.com/mandiant/capa-rules/compare/v9.3.0...master)
@@ -31,6 +102,7 @@ Additionally a Binary Ninja bug has been fixed. Released binaries now include AR
### New Features
- ci: add support for arm64 binary releases
- tests: run tests against IDA via idalib @williballenthin #2742
### Breaking Changes

View File

@@ -87,6 +87,8 @@ Download stable releases of the standalone capa binaries [here](https://github.c
To use capa as a library or integrate with another tool, see [doc/installation.md](https://github.com/mandiant/capa/blob/master/doc/installation.md) for further setup instructions.
**Documentation:** [Usage and tips](doc/usage.md) · [Installation](doc/installation.md) · [Limitations](doc/limitations.md) · [FAQ](doc/faq.md)
# capa Explorer Web
The [capa Explorer Web](https://mandiant.github.io/capa/explorer/) enables you to interactively explore capa results in your web browser. Besides the online version you can download a standalone HTML file for local offline usage.
@@ -291,11 +293,17 @@ It also uses your local changes to the .idb to extract better features, such as
![capa + IDA Pro integration](https://github.com/mandiant/capa/blob/master/doc/img/explorer_expanded.png)
# Ghidra integration
If you use Ghidra, then you can use the [capa + Ghidra integration](/capa/ghidra/) to run capa's analysis directly on your Ghidra database and render the results in Ghidra's user interface.
capa supports using Ghidra (via [PyGhidra](https://github.com/NationalSecurityAgency/ghidra/tree/master/Ghidra/Features/PyGhidra)) as a feature extraction backend. This allows you to run capa against binaries using Ghidra's analysis engine.
You can run and view capa results in the Ghidra UI using [capa explorer for Ghidra](https://github.com/mandiant/capa/tree/master/capa/ghidra/plugin).
<img src="https://github.com/mandiant/capa/assets/66766340/eeae33f4-99d4-42dc-a5e8-4c1b8c661492" width=300>
You can also run capa from the command line using the [Ghidra backend](https://github.com/mandiant/capa/tree/master/capa/ghidra).
# blog posts
- [Riding Dragons: capa Harnesses Ghidra](https://www.mandiant.com/resources/blog/capa-harnesses-ghidra)
- [Dynamic capa: Exploring Executable Run-Time Behavior with the CAPE Sandbox](https://www.mandiant.com/resources/blog/dynamic-capa-executable-behavior-cape-sandbox)
- [capa v4: casting a wider .NET](https://www.mandiant.com/resources/blog/capa-v4-casting-wider-net) (.NET support)
- [ELFant in the Room capa v3](https://www.mandiant.com/resources/elfant-in-the-room-capa-v3) (ELF support)

View File

@@ -277,7 +277,9 @@ def find_dynamic_capabilities(
all_span_matches: MatchResults = collections.defaultdict(list)
all_call_matches: MatchResults = collections.defaultdict(list)
feature_counts = rdoc.DynamicFeatureCounts(file=0, processes=())
# Accumulate into a list to avoid O(n²) tuple concatenation.
# Tuples are immutable, so `t += (x,)` copies the entire tuple each time.
process_feature_counts: list[rdoc.ProcessFeatureCount] = []
assert isinstance(extractor, DynamicFeatureExtractor)
processes: list[ProcessHandle] = list(extractor.get_processes())
@@ -289,10 +291,10 @@ def find_dynamic_capabilities(
task = pbar.add_task("matching", total=n_processes, unit="processes")
for p in processes:
process_capabilities = find_process_capabilities(ruleset, extractor, p)
feature_counts.processes += (
process_feature_counts.append(
rdoc.ProcessFeatureCount(
address=frz.Address.from_capa(p.address), count=process_capabilities.feature_count
),
)
)
for rule_name, res in process_capabilities.process_matches.items():
@@ -317,7 +319,11 @@ def find_dynamic_capabilities(
capa.engine.index_rule_matches(process_and_lower_features, rule, locations)
all_file_capabilities = find_file_capabilities(ruleset, extractor, process_and_lower_features)
feature_counts.file = all_file_capabilities.feature_count
feature_counts = rdoc.DynamicFeatureCounts(
file=all_file_capabilities.feature_count,
processes=tuple(process_feature_counts),
)
matches = dict(
itertools.chain(

View File

@@ -156,8 +156,11 @@ def find_static_capabilities(
all_bb_matches: MatchResults = collections.defaultdict(list)
all_insn_matches: MatchResults = collections.defaultdict(list)
feature_counts = rdoc.StaticFeatureCounts(file=0, functions=())
library_functions: tuple[rdoc.LibraryFunction, ...] = ()
# Accumulate into lists to avoid O(n²) tuple concatenation.
# Tuples are immutable, so `t += (x,)` copies the entire tuple each time.
# For binaries with thousands of functions this becomes quadratic in memory work.
function_feature_counts: list[rdoc.FunctionFeatureCount] = []
library_functions_list: list[rdoc.LibraryFunction] = []
assert isinstance(extractor, StaticFeatureExtractor)
functions: list[FunctionHandle] = list(extractor.get_functions())
@@ -176,20 +179,20 @@ def find_static_capabilities(
if extractor.is_library_function(f.address):
function_name = extractor.get_function_name(f.address)
logger.debug("skipping library function 0x%x (%s)", f.address, function_name)
library_functions += (
rdoc.LibraryFunction(address=frz.Address.from_capa(f.address), name=function_name),
library_functions_list.append(
rdoc.LibraryFunction(address=frz.Address.from_capa(f.address), name=function_name)
)
n_libs = len(library_functions)
n_libs = len(library_functions_list)
percentage = round(100 * (n_libs / n_funcs))
pbar.update(task, postfix=f"skipped {n_libs} library functions, {percentage}%")
pbar.advance(task)
continue
code_capabilities = find_code_capabilities(ruleset, extractor, f)
feature_counts.functions += (
function_feature_counts.append(
rdoc.FunctionFeatureCount(
address=frz.Address.from_capa(f.address), count=code_capabilities.feature_count
),
)
)
t1 = time.time()
@@ -230,7 +233,11 @@ def find_static_capabilities(
capa.engine.index_rule_matches(function_and_lower_features, rule, locations)
all_file_capabilities = find_file_capabilities(ruleset, extractor, function_and_lower_features)
feature_counts.file = all_file_capabilities.feature_count
feature_counts = rdoc.StaticFeatureCounts(
file=all_file_capabilities.feature_count,
functions=tuple(function_feature_counts),
)
matches: MatchResults = dict(
itertools.chain(
@@ -244,4 +251,4 @@ def find_static_capabilities(
)
)
return Capabilities(matches, feature_counts, library_functions)
return Capabilities(matches, feature_counts, tuple(library_functions_list))

View File

@@ -227,7 +227,7 @@ class Range(Statement):
super().__init__(description=description)
self.child = child
self.min = min if min is not None else 0
self.max = max if max is not None else (1 << 64 - 1)
self.max = max if max is not None else ((1 << 64) - 1)
def evaluate(self, features: FeatureSet, short_circuit=True):
capa.perf.counters["evaluate.feature"] += 1
@@ -240,7 +240,7 @@ class Range(Statement):
return Result(self.min <= count <= self.max, self, [], locations=features.get(self.child))
def __str__(self):
if self.max == (1 << 64 - 1):
if self.max == ((1 << 64) - 1):
return f"range({str(self.child)}, min={self.min}, max=infinity)"
else:
return f"range({str(self.child)}, min={self.min}, max={self.max})"

View File

@@ -189,6 +189,11 @@ class _NoAddress(Address):
def __lt__(self, other):
return False
def __gt__(self, other):
# Mixed-type comparison: (real_address < NO_ADDRESS) invokes this so sort works.
# NoAddress sorts last.
return other is not self
def __hash__(self):
return hash(0)

View File

@@ -369,6 +369,12 @@ class Regex(String):
else:
return Result(False, _MatchedRegex(self, {}), [])
def get_value_str(self) -> str:
# return the raw regex pattern, not the escaped version from String.get_value_str().
# see #1909.
assert isinstance(self.value, str)
return self.value
def __str__(self):
assert isinstance(self.value, str)
return f"regex(string =~ {self.value})"

View File

@@ -20,6 +20,7 @@ Proto files generated via protobuf v24.4:
from BinExport2 at 6916731d5f6693c4a4f0a052501fd3bd92cfd08b
https://github.com/google/binexport/blob/6916731/binexport2.proto
"""
import io
import hashlib
import logging

View File

@@ -84,16 +84,14 @@ def extract_insn_number_features(
yield OperandOffset(i, value), ih.address
OFFSET_PATTERNS = BinExport2InstructionPatternMatcher.from_str(
"""
OFFSET_PATTERNS = BinExport2InstructionPatternMatcher.from_str("""
ldr|ldrb|ldrh|ldrsb|ldrsh|ldrex|ldrd|str|strb|strh|strex|strd reg, [reg(not-stack), #int] ; capture #int
ldr|ldrb|ldrh|ldrsb|ldrsh|ldrex|ldrd|str|strb|strh|strex|strd reg, [reg(not-stack), #int]! ; capture #int
ldr|ldrb|ldrh|ldrsb|ldrsh|ldrex|ldrd|str|strb|strh|strex|strd reg, [reg(not-stack)], #int ; capture #int
ldp|ldpd|stp|stpd reg, reg, [reg(not-stack), #int] ; capture #int
ldp|ldpd|stp|stpd reg, reg, [reg(not-stack), #int]! ; capture #int
ldp|ldpd|stp|stpd reg, reg, [reg(not-stack)], #int ; capture #int
"""
)
""")
def extract_insn_offset_features(
@@ -117,12 +115,10 @@ def extract_insn_offset_features(
yield OperandOffset(match.operand_index, value), ih.address
NZXOR_PATTERNS = BinExport2InstructionPatternMatcher.from_str(
"""
NZXOR_PATTERNS = BinExport2InstructionPatternMatcher.from_str("""
eor reg, reg, reg
eor reg, reg, #int
"""
)
""")
def extract_insn_nzxor_characteristic_features(
@@ -144,11 +140,9 @@ def extract_insn_nzxor_characteristic_features(
yield Characteristic("nzxor"), ih.address
INDIRECT_CALL_PATTERNS = BinExport2InstructionPatternMatcher.from_str(
"""
INDIRECT_CALL_PATTERNS = BinExport2InstructionPatternMatcher.from_str("""
blx|bx|blr reg
"""
)
""")
def extract_function_indirect_call_characteristic_features(

View File

@@ -34,17 +34,14 @@ from capa.features.extractors.binexport2.arch.intel.helpers import SECURITY_COOK
logger = logging.getLogger(__name__)
IGNORE_NUMBER_PATTERNS = BinExport2InstructionPatternMatcher.from_str(
"""
IGNORE_NUMBER_PATTERNS = BinExport2InstructionPatternMatcher.from_str("""
ret #int
retn #int
add reg(stack), #int
sub reg(stack), #int
"""
)
""")
NUMBER_PATTERNS = BinExport2InstructionPatternMatcher.from_str(
"""
NUMBER_PATTERNS = BinExport2InstructionPatternMatcher.from_str("""
push #int0 ; capture #int0
# its a little tedious to enumerate all the address forms
@@ -64,8 +61,7 @@ NUMBER_PATTERNS = BinExport2InstructionPatternMatcher.from_str(
# imagine reg is zero'd out, then this is like `mov reg, #int`
# which is not uncommon.
lea reg, [reg + #int] ; capture #int
"""
)
""")
def extract_insn_number_features(
@@ -100,8 +96,7 @@ def extract_insn_number_features(
yield OperandOffset(match.operand_index, value), ih.address
OFFSET_PATTERNS = BinExport2InstructionPatternMatcher.from_str(
"""
OFFSET_PATTERNS = BinExport2InstructionPatternMatcher.from_str("""
mov|movzx|movsb|cmp [reg + reg * #int + #int0], #int ; capture #int0
mov|movzx|movsb|cmp [reg * #int + #int0], #int ; capture #int0
mov|movzx|movsb|cmp [reg + reg + #int0], #int ; capture #int0
@@ -114,18 +109,15 @@ OFFSET_PATTERNS = BinExport2InstructionPatternMatcher.from_str(
mov|movzx|movsb|cmp|lea reg, [reg * #int + #int0] ; capture #int0
mov|movzx|movsb|cmp|lea reg, [reg + reg + #int0] ; capture #int0
mov|movzx|movsb|cmp|lea reg, [reg(not-stack) + #int0] ; capture #int0
"""
)
""")
# these are patterns that access offset 0 from some pointer
# (pointer is not the stack pointer).
OFFSET_ZERO_PATTERNS = BinExport2InstructionPatternMatcher.from_str(
"""
OFFSET_ZERO_PATTERNS = BinExport2InstructionPatternMatcher.from_str("""
mov|movzx|movsb [reg(not-stack)], reg
mov|movzx|movsb [reg(not-stack)], #int
lea reg, [reg(not-stack)]
"""
)
""")
def extract_insn_offset_features(
@@ -189,12 +181,10 @@ def is_security_cookie(
return False
NZXOR_PATTERNS = BinExport2InstructionPatternMatcher.from_str(
"""
NZXOR_PATTERNS = BinExport2InstructionPatternMatcher.from_str("""
xor|xorpd|xorps|pxor reg, reg
xor|xorpd|xorps|pxor reg, #int
"""
)
""")
def extract_insn_nzxor_characteristic_features(
@@ -228,8 +218,7 @@ def extract_insn_nzxor_characteristic_features(
yield Characteristic("nzxor"), ih.address
INDIRECT_CALL_PATTERNS = BinExport2InstructionPatternMatcher.from_str(
"""
INDIRECT_CALL_PATTERNS = BinExport2InstructionPatternMatcher.from_str("""
call|jmp reg0
call|jmp [reg + reg * #int + #int]
call|jmp [reg + reg * #int]
@@ -237,8 +226,7 @@ INDIRECT_CALL_PATTERNS = BinExport2InstructionPatternMatcher.from_str(
call|jmp [reg + reg + #int]
call|jmp [reg + #int]
call|jmp [reg]
"""
)
""")
def extract_function_indirect_call_characteristic_features(

View File

@@ -35,7 +35,7 @@ from capa.features.extractors.base_extractor import (
logger = logging.getLogger(__name__)
TESTED_VERSIONS = {"2.2-CAPE", "2.4-CAPE"}
TESTED_VERSIONS = {"2.2-CAPE", "2.4-CAPE", "2.5-CAPE"}
class CapeExtractor(DynamicFeatureExtractor):

View File

@@ -27,7 +27,12 @@ import capa.features.extractors.dnfile.file
import capa.features.extractors.dnfile.insn
import capa.features.extractors.dnfile.function
from capa.features.common import Feature
from capa.features.address import NO_ADDRESS, Address, DNTokenAddress, DNTokenOffsetAddress
from capa.features.address import (
NO_ADDRESS,
Address,
DNTokenAddress,
DNTokenOffsetAddress,
)
from capa.features.extractors.dnfile.types import DnType, DnUnmanagedMethod
from capa.features.extractors.base_extractor import (
BBHandle,
@@ -39,6 +44,7 @@ from capa.features.extractors.base_extractor import (
from capa.features.extractors.dnfile.helpers import (
get_dotnet_types,
get_dotnet_fields,
load_dotnet_image,
get_dotnet_managed_imports,
get_dotnet_managed_methods,
get_dotnet_unmanaged_imports,
@@ -83,7 +89,7 @@ class DnFileFeatureExtractorCache:
class DnfileFeatureExtractor(StaticFeatureExtractor):
def __init__(self, path: Path):
self.pe: dnfile.dnPE = dnfile.dnPE(str(path))
self.pe = load_dotnet_image(path)
super().__init__(hashes=SampleHashes.from_bytes(path.read_bytes()))
# pre-compute .NET token lookup tables; each .NET method has access to this cache for feature extraction
@@ -112,7 +118,12 @@ class DnfileFeatureExtractor(StaticFeatureExtractor):
fh: FunctionHandle = FunctionHandle(
address=DNTokenAddress(token),
inner=method,
ctx={"pe": self.pe, "calls_from": set(), "calls_to": set(), "cache": self.token_cache},
ctx={
"pe": self.pe,
"calls_from": set(),
"calls_to": set(),
"cache": self.token_cache,
},
)
# method tokens should be unique

View File

@@ -15,8 +15,10 @@
from __future__ import annotations
import struct
import logging
from typing import Union, Iterator, Optional
from pathlib import Path
import dnfile
from dncil.cil.body import CilMethodBody
@@ -30,6 +32,16 @@ from capa.features.extractors.dnfile.types import DnType, DnUnmanagedMethod
logger = logging.getLogger(__name__)
def load_dotnet_image(path: Path) -> dnfile.dnPE:
"""load a .NET PE file, raising CorruptFile on struct.error with the original error message."""
try:
return dnfile.dnPE(str(path))
except struct.error as e:
from capa.loader import CorruptFile
raise CorruptFile(f"Invalid or truncated .NET metadata: {e}") from e
class DnfileMethodBodyReader(CilMethodBodyReaderBase):
def __init__(self, pe: dnfile.dnPE, row: dnfile.mdtable.MethodDefRow):
self.pe: dnfile.dnPE = pe
@@ -151,7 +163,9 @@ def get_dotnet_managed_imports(pe: dnfile.dnPE) -> Iterator[DnType]:
)
def get_dotnet_methoddef_property_accessors(pe: dnfile.dnPE) -> Iterator[tuple[int, str]]:
def get_dotnet_methoddef_property_accessors(
pe: dnfile.dnPE,
) -> Iterator[tuple[int, str]]:
"""get MethodDef methods used to access properties
see https://www.ntcore.com/files/dotnetformat.htm
@@ -226,7 +240,13 @@ def get_dotnet_managed_methods(pe: dnfile.dnPE) -> Iterator[DnType]:
typedefnamespace, typedefname = resolve_nested_typedef_name(nested_class_table, rid, typedef, pe)
yield DnType(token, typedefname, namespace=typedefnamespace, member=method_name, access=access)
yield DnType(
token,
typedefname,
namespace=typedefnamespace,
member=method_name,
access=access,
)
def get_dotnet_fields(pe: dnfile.dnPE) -> Iterator[DnType]:
@@ -259,7 +279,9 @@ def get_dotnet_fields(pe: dnfile.dnPE) -> Iterator[DnType]:
yield DnType(token, typedefname, namespace=typedefnamespace, member=field.row.Name)
def get_dotnet_managed_method_bodies(pe: dnfile.dnPE) -> Iterator[tuple[int, CilMethodBody]]:
def get_dotnet_managed_method_bodies(
pe: dnfile.dnPE,
) -> Iterator[tuple[int, CilMethodBody]]:
"""get managed methods from MethodDef table"""
for rid, method_def in iter_dotnet_table(pe, dnfile.mdtable.MethodDef.number):
assert isinstance(method_def, dnfile.mdtable.MethodDefRow)
@@ -338,7 +360,10 @@ def get_dotnet_table_row(pe: dnfile.dnPE, table_index: int, row_index: int) -> O
def resolve_nested_typedef_name(
nested_class_table: dict, index: int, typedef: dnfile.mdtable.TypeDefRow, pe: dnfile.dnPE
nested_class_table: dict,
index: int,
typedef: dnfile.mdtable.TypeDefRow,
pe: dnfile.dnPE,
) -> tuple[str, tuple[str, ...]]:
"""Resolves all nested TypeDef class names. Returns the namespace as a str and the nested TypeRef name as a tuple"""

View File

@@ -42,6 +42,7 @@ from capa.features.extractors.dnfile.types import DnType
from capa.features.extractors.base_extractor import SampleHashes, StaticFeatureExtractor
from capa.features.extractors.dnfile.helpers import (
iter_dotnet_table,
load_dotnet_image,
is_dotnet_mixed_mode,
get_dotnet_managed_imports,
get_dotnet_managed_methods,
@@ -184,8 +185,8 @@ GLOBAL_HANDLERS = (
class DotnetFileFeatureExtractor(StaticFeatureExtractor):
def __init__(self, path: Path):
super().__init__(hashes=SampleHashes.from_bytes(path.read_bytes()))
self.path: Path = path
self.pe: dnfile.dnPE = dnfile.dnPE(str(path))
self.path = path
self.pe = load_dotnet_image(path)
def get_base_address(self):
return NO_ADDRESS
@@ -217,7 +218,10 @@ class DotnetFileFeatureExtractor(StaticFeatureExtractor):
assert self.pe.net.struct.MajorRuntimeVersion is not None
assert self.pe.net.struct.MinorRuntimeVersion is not None
return self.pe.net.struct.MajorRuntimeVersion, self.pe.net.struct.MinorRuntimeVersion
return (
self.pe.net.struct.MajorRuntimeVersion,
self.pe.net.struct.MinorRuntimeVersion,
)
def get_meta_version_string(self) -> str:
assert self.pe.net is not None

View File

@@ -83,7 +83,7 @@ def bb_contains_stackstring(bb: ghidra.program.model.block.CodeBlock) -> bool:
true if basic block contains enough moves of constant bytes to the stack
"""
count = 0
for insn in currentProgram().getListing().getInstructions(bb, True): # type: ignore [name-defined] # noqa: F821
for insn in capa.features.extractors.ghidra.helpers.get_current_program().getListing().getInstructions(bb, True):
if is_mov_imm_to_stack(insn):
count += get_printable_len(insn.getScalar(1))
if count > MIN_STACKSTRING_LEN:
@@ -96,7 +96,9 @@ def _bb_has_tight_loop(bb: ghidra.program.model.block.CodeBlock):
parse tight loops, true if last instruction in basic block branches to bb start
"""
# Reverse Ordered, first InstructionDB
last_insn = currentProgram().getListing().getInstructions(bb, False).next() # type: ignore [name-defined] # noqa: F821
last_insn = (
capa.features.extractors.ghidra.helpers.get_current_program().getListing().getInstructions(bb, False).next()
)
if last_insn.getFlowType().isJump():
return last_insn.getAddress(0) == bb.getMinAddress()
@@ -140,20 +142,3 @@ def extract_features(fh: FunctionHandle, bbh: BBHandle) -> Iterator[tuple[Featur
for bb_handler in BASIC_BLOCK_HANDLERS:
for feature, addr in bb_handler(fh, bbh):
yield feature, addr
def main():
features = []
from capa.features.extractors.ghidra.extractor import GhidraFeatureExtractor
for fh in GhidraFeatureExtractor().get_functions():
for bbh in capa.features.extractors.ghidra.helpers.get_function_blocks(fh):
features.extend(list(extract_features(fh, bbh)))
import pprint
pprint.pprint(features) # noqa: T203
if __name__ == "__main__":
main()

View File

@@ -0,0 +1,44 @@
# Copyright 2023 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from typing import Optional
class GhidraContext:
"""
State holder for the Ghidra backend to avoid passing state to every function.
PyGhidra uses a context manager to set up the Ghidra environment (program, transaction, etc.).
We store the relevant objects here to allow easy access throughout the extractor
without needing to pass them as arguments to every feature extraction method.
"""
def __init__(self, program, flat_api, monitor):
self.program = program
self.flat_api = flat_api
self.monitor = monitor
_context: Optional[GhidraContext] = None
def set_context(program, flat_api, monitor):
global _context
_context = GhidraContext(program, flat_api, monitor)
def get_context() -> GhidraContext:
if _context is None:
raise RuntimeError("GhidraContext not initialized")
return _context

View File

@@ -12,11 +12,14 @@
# See the License for the specific language governing permissions and
# limitations under the License.
import weakref
import contextlib
from typing import Iterator
import capa.features.extractors.ghidra.file
import capa.features.extractors.ghidra.insn
import capa.features.extractors.ghidra.global_
import capa.features.extractors.ghidra.helpers as ghidra_helpers
import capa.features.extractors.ghidra.function
import capa.features.extractors.ghidra.basicblock
from capa.features.common import Feature
@@ -31,19 +34,20 @@ from capa.features.extractors.base_extractor import (
class GhidraFeatureExtractor(StaticFeatureExtractor):
def __init__(self):
import capa.features.extractors.ghidra.helpers as ghidra_helpers
def __init__(self, ctx_manager=None, tmpdir=None):
self.ctx_manager = ctx_manager
self.tmpdir = tmpdir
super().__init__(
SampleHashes(
md5=capa.ghidra.helpers.get_file_md5(),
md5=ghidra_helpers.get_current_program().getExecutableMD5(),
# ghidra doesn't expose this hash.
# https://ghidra.re/ghidra_docs/api/ghidra/program/model/listing/Program.html
#
# the hashes are stored in the database, not computed on the fly,
# so it's probably not trivial to add SHA1.
sha1="",
sha256=capa.ghidra.helpers.get_file_sha256(),
sha256=ghidra_helpers.get_current_program().getExecutableSHA256(),
)
)
@@ -55,8 +59,14 @@ class GhidraFeatureExtractor(StaticFeatureExtractor):
self.externs = ghidra_helpers.get_file_externs()
self.fakes = ghidra_helpers.map_fake_import_addrs()
# Register cleanup to run when the extractor is garbage collected or when the program exits.
# We use weakref.finalize instead of __del__ to avoid issues with reference cycles and
# to ensure deterministic cleanup on interpreter shutdown.
if self.ctx_manager or self.tmpdir:
weakref.finalize(self, cleanup, self.ctx_manager, self.tmpdir)
def get_base_address(self):
return AbsoluteVirtualAddress(currentProgram().getImageBase().getOffset()) # type: ignore [name-defined] # noqa: F821
return AbsoluteVirtualAddress(ghidra_helpers.get_current_program().getImageBase().getOffset())
def extract_global_features(self):
yield from self.global_features
@@ -65,7 +75,6 @@ class GhidraFeatureExtractor(StaticFeatureExtractor):
yield from capa.features.extractors.ghidra.file.extract_features()
def get_functions(self) -> Iterator[FunctionHandle]:
import capa.features.extractors.ghidra.helpers as ghidra_helpers
for fhandle in ghidra_helpers.get_function_symbols():
fh: FunctionHandle = FunctionHandle(
@@ -77,14 +86,14 @@ class GhidraFeatureExtractor(StaticFeatureExtractor):
@staticmethod
def get_function(addr: int) -> FunctionHandle:
func = getFunctionContaining(toAddr(addr)) # type: ignore [name-defined] # noqa: F821
func = ghidra_helpers.get_flat_api().getFunctionContaining(ghidra_helpers.get_flat_api().toAddr(addr))
return FunctionHandle(address=AbsoluteVirtualAddress(func.getEntryPoint().getOffset()), inner=func)
def extract_function_features(self, fh: FunctionHandle) -> Iterator[tuple[Feature, Address]]:
yield from capa.features.extractors.ghidra.function.extract_features(fh)
def get_basic_blocks(self, fh: FunctionHandle) -> Iterator[BBHandle]:
import capa.features.extractors.ghidra.helpers as ghidra_helpers
yield from ghidra_helpers.get_function_blocks(fh)
@@ -92,9 +101,17 @@ class GhidraFeatureExtractor(StaticFeatureExtractor):
yield from capa.features.extractors.ghidra.basicblock.extract_features(fh, bbh)
def get_instructions(self, fh: FunctionHandle, bbh: BBHandle) -> Iterator[InsnHandle]:
import capa.features.extractors.ghidra.helpers as ghidra_helpers
yield from ghidra_helpers.get_insn_in_range(bbh)
def extract_insn_features(self, fh: FunctionHandle, bbh: BBHandle, ih: InsnHandle):
yield from capa.features.extractors.ghidra.insn.extract_features(fh, bbh, ih)
def cleanup(ctx_manager, tmpdir):
if ctx_manager:
with contextlib.suppress(Exception):
ctx_manager.__exit__(None, None, None)
if tmpdir:
with contextlib.suppress(Exception):
tmpdir.cleanup()

View File

@@ -80,22 +80,54 @@ def extract_file_embedded_pe() -> Iterator[tuple[Feature, Address]]:
for i in range(256)
]
for block in currentProgram().getMemory().getBlocks(): # type: ignore [name-defined] # noqa: F821
for block in capa.features.extractors.ghidra.helpers.get_current_program().getMemory().getBlocks():
if not all((block.isLoaded(), block.isInitialized(), "Headers" not in block.getName())):
continue
for off, _ in find_embedded_pe(capa.features.extractors.ghidra.helpers.get_block_bytes(block), mz_xor):
# add offset back to block start
ea: int = block.getStart().add(off).getOffset()
ea_addr = block.getStart().add(off)
ea = ea_addr.getOffset()
f_offset = capa.features.extractors.ghidra.helpers.get_file_offset(ea_addr)
if f_offset != -1:
ea = f_offset
yield Characteristic("embedded pe"), FileOffsetAddress(ea)
def extract_file_export_names() -> Iterator[tuple[Feature, Address]]:
"""extract function exports"""
st = currentProgram().getSymbolTable() # type: ignore [name-defined] # noqa: F821
program = capa.features.extractors.ghidra.helpers.get_current_program()
st = program.getSymbolTable()
for addr in st.getExternalEntryPointIterator():
yield Export(st.getPrimarySymbol(addr).getName()), AbsoluteVirtualAddress(addr.getOffset())
sym = st.getPrimarySymbol(addr)
name = sym.getName()
# Check for forwarded export
is_forwarded = False
refs = program.getReferenceManager().getReferencesFrom(addr)
for ref in refs:
if ref.getToAddress().isExternalAddress():
ext_sym = st.getPrimarySymbol(ref.getToAddress())
if ext_sym:
ext_loc = program.getExternalManager().getExternalLocation(ext_sym)
if ext_loc:
# It is a forwarded export
libname = ext_loc.getLibraryName()
if libname.lower().endswith(".dll"):
libname = libname[:-4]
forwarded_name = f"{libname}.{ext_loc.getLabel()}"
forwarded_name = capa.features.extractors.helpers.reformat_forwarded_export_name(forwarded_name)
yield Export(forwarded_name), AbsoluteVirtualAddress(addr.getOffset())
yield Characteristic("forwarded export"), AbsoluteVirtualAddress(addr.getOffset())
is_forwarded = True
break
if not is_forwarded:
yield Export(name), AbsoluteVirtualAddress(addr.getOffset())
def extract_file_import_names() -> Iterator[tuple[Feature, Address]]:
@@ -110,7 +142,7 @@ def extract_file_import_names() -> Iterator[tuple[Feature, Address]]:
- importname
"""
for f in currentProgram().getFunctionManager().getExternalFunctions(): # type: ignore [name-defined] # noqa: F821
for f in capa.features.extractors.ghidra.helpers.get_current_program().getFunctionManager().getExternalFunctions():
for r in f.getSymbol().getReferences():
if r.getReferenceType().isData():
addr = r.getFromAddress().getOffset() # gets pointer to fake external addr
@@ -126,14 +158,14 @@ def extract_file_import_names() -> Iterator[tuple[Feature, Address]]:
def extract_file_section_names() -> Iterator[tuple[Feature, Address]]:
"""extract section names"""
for block in currentProgram().getMemory().getBlocks(): # type: ignore [name-defined] # noqa: F821
for block in capa.features.extractors.ghidra.helpers.get_current_program().getMemory().getBlocks():
yield Section(block.getName()), AbsoluteVirtualAddress(block.getStart().getOffset())
def extract_file_strings() -> Iterator[tuple[Feature, Address]]:
"""extract ASCII and UTF-16 LE strings"""
for block in currentProgram().getMemory().getBlocks(): # type: ignore [name-defined] # noqa: F821
for block in capa.features.extractors.ghidra.helpers.get_current_program().getMemory().getBlocks():
if not block.isInitialized():
continue
@@ -153,7 +185,8 @@ def extract_file_function_names() -> Iterator[tuple[Feature, Address]]:
extract the names of statically-linked library functions.
"""
for sym in currentProgram().getSymbolTable().getAllSymbols(True): # type: ignore [name-defined] # noqa: F821
for sym in capa.features.extractors.ghidra.helpers.get_current_program().getSymbolTable().getAllSymbols(True):
# .isExternal() misses more than this config for the function symbols
if sym.getSymbolType() == SymbolType.FUNCTION and sym.getSource() == SourceType.ANALYSIS and sym.isGlobal():
name = sym.getName() # starts to resolve names based on Ghidra's FidDB
@@ -170,7 +203,7 @@ def extract_file_function_names() -> Iterator[tuple[Feature, Address]]:
def extract_file_format() -> Iterator[tuple[Feature, Address]]:
ef = currentProgram().getExecutableFormat() # type: ignore [name-defined] # noqa: F821
ef = capa.features.extractors.ghidra.helpers.get_current_program().getExecutableFormat()
if "PE" in ef:
yield Format(FORMAT_PE), NO_ADDRESS
elif "ELF" in ef:
@@ -198,14 +231,3 @@ FILE_HANDLERS = (
extract_file_function_names,
extract_file_format,
)
def main():
""" """
import pprint
pprint.pprint(list(extract_features())) # noqa: T203
if __name__ == "__main__":
main()

View File

@@ -26,21 +26,25 @@ from capa.features.extractors.base_extractor import FunctionHandle
def extract_function_calls_to(fh: FunctionHandle):
"""extract callers to a function"""
f: ghidra.program.database.function.FunctionDB = fh.inner
f: "ghidra.program.database.function.FunctionDB" = fh.inner
for ref in f.getSymbol().getReferences():
if ref.getReferenceType().isCall():
yield Characteristic("calls to"), AbsoluteVirtualAddress(ref.getFromAddress().getOffset())
def extract_function_loop(fh: FunctionHandle):
f: ghidra.program.database.function.FunctionDB = fh.inner
f: "ghidra.program.database.function.FunctionDB" = fh.inner
edges = []
for block in SimpleBlockIterator(BasicBlockModel(currentProgram()), f.getBody(), monitor()): # type: ignore [name-defined] # noqa: F821
dests = block.getDestinations(monitor()) # type: ignore [name-defined] # noqa: F821
for block in SimpleBlockIterator(
BasicBlockModel(capa.features.extractors.ghidra.helpers.get_current_program()),
f.getBody(),
capa.features.extractors.ghidra.helpers.get_monitor(),
):
dests = block.getDestinations(capa.features.extractors.ghidra.helpers.get_monitor())
s_addrs = block.getStartAddresses()
while dests.hasNext(): # For loop throws Python TypeError
while dests.hasNext():
for addr in s_addrs:
edges.append((addr.getOffset(), dests.next().getDestinationAddress().getOffset()))
@@ -49,32 +53,17 @@ def extract_function_loop(fh: FunctionHandle):
def extract_recursive_call(fh: FunctionHandle):
f: ghidra.program.database.function.FunctionDB = fh.inner
f: "ghidra.program.database.function.FunctionDB" = fh.inner
for func in f.getCalledFunctions(monitor()): # type: ignore [name-defined] # noqa: F821
for func in f.getCalledFunctions(capa.features.extractors.ghidra.helpers.get_monitor()):
if func.getEntryPoint().getOffset() == f.getEntryPoint().getOffset():
yield Characteristic("recursive call"), AbsoluteVirtualAddress(f.getEntryPoint().getOffset())
def extract_features(fh: FunctionHandle) -> Iterator[tuple[Feature, Address]]:
for func_handler in FUNCTION_HANDLERS:
for feature, addr in func_handler(fh):
for function_handler in FUNCTION_HANDLERS:
for feature, addr in function_handler(fh):
yield feature, addr
FUNCTION_HANDLERS = (extract_function_calls_to, extract_function_loop, extract_recursive_call)
def main():
""" """
features = []
for fhandle in capa.features.extractors.ghidra.helpers.get_function_symbols():
features.extend(list(extract_features(fhandle)))
import pprint
pprint.pprint(features) # noqa: T203
if __name__ == "__main__":
main()

View File

@@ -26,7 +26,7 @@ logger = logging.getLogger(__name__)
def extract_os() -> Iterator[tuple[Feature, Address]]:
format_name: str = currentProgram().getExecutableFormat() # type: ignore [name-defined] # noqa: F821
format_name: str = capa.features.extractors.ghidra.helpers.get_current_program().getExecutableFormat()
if "PE" in format_name:
yield OS(OS_WINDOWS), NO_ADDRESS
@@ -53,7 +53,7 @@ def extract_os() -> Iterator[tuple[Feature, Address]]:
def extract_arch() -> Iterator[tuple[Feature, Address]]:
lang_id = currentProgram().getMetadata().get("Language ID") # type: ignore [name-defined] # noqa: F821
lang_id = capa.features.extractors.ghidra.helpers.get_current_program().getMetadata().get("Language ID")
if "x86" in lang_id and "64" in lang_id:
yield Arch(ARCH_AMD64), NO_ADDRESS

View File

@@ -22,9 +22,22 @@ from ghidra.program.model.symbol import SourceType, SymbolType
from ghidra.program.model.address import AddressSpace
import capa.features.extractors.helpers
import capa.features.extractors.ghidra.context as ghidra_context
from capa.features.common import THUNK_CHAIN_DEPTH_DELTA
from capa.features.address import AbsoluteVirtualAddress
from capa.features.extractors.base_extractor import BBHandle, InsnHandle, FunctionHandle
from capa.features.extractors.base_extractor import BBHandle, InsnHandle
def get_current_program():
return ghidra_context.get_context().program
def get_monitor():
return ghidra_context.get_context().monitor
def get_flat_api():
return ghidra_context.get_context().flat_api
def ints_to_bytes(bytez: list[int]) -> bytes:
@@ -36,7 +49,7 @@ def ints_to_bytes(bytez: list[int]) -> bytes:
return bytes([b & 0xFF for b in bytez])
def find_byte_sequence(addr: ghidra.program.model.address.Address, seq: bytes) -> Iterator[int]:
def find_byte_sequence(addr: "ghidra.program.model.address.Address", seq: bytes) -> Iterator[int]:
"""yield all ea of a given byte sequence
args:
@@ -44,12 +57,25 @@ def find_byte_sequence(addr: ghidra.program.model.address.Address, seq: bytes) -
seq: bytes to search e.g. b"\x01\x03"
"""
seqstr = "".join([f"\\x{b:02x}" for b in seq])
eas = findBytes(addr, seqstr, java.lang.Integer.MAX_VALUE, 1) # type: ignore [name-defined] # noqa: F821
eas = get_flat_api().findBytes(addr, seqstr, java.lang.Integer.MAX_VALUE, 1)
yield from eas
def get_bytes(addr: ghidra.program.model.address.Address, length: int) -> bytes:
def get_file_offset(addr: "ghidra.program.model.address.Address") -> int:
"""get file offset for an address"""
block = get_current_program().getMemory().getBlock(addr)
if not block:
return -1
for info in block.getSourceInfos():
if info.contains(addr):
return info.getFileBytesOffset(addr)
return -1
def get_bytes(addr: "ghidra.program.model.address.Address", length: int) -> bytes:
"""yield length bytes at addr
args:
@@ -57,12 +83,12 @@ def get_bytes(addr: ghidra.program.model.address.Address, length: int) -> bytes:
length: length of bytes to pull
"""
try:
return ints_to_bytes(getBytes(addr, length)) # type: ignore [name-defined] # noqa: F821
except RuntimeError:
return ints_to_bytes(get_flat_api().getBytes(addr, int(length)))
except Exception:
return b""
def get_block_bytes(block: ghidra.program.model.mem.MemoryBlock) -> bytes:
def get_block_bytes(block: "ghidra.program.model.mem.MemoryBlock") -> bytes:
"""yield all bytes in a given block
args:
@@ -73,20 +99,21 @@ def get_block_bytes(block: ghidra.program.model.mem.MemoryBlock) -> bytes:
def get_function_symbols():
"""yield all non-external function symbols"""
yield from currentProgram().getFunctionManager().getFunctionsNoStubs(True) # type: ignore [name-defined] # noqa: F821
yield from get_current_program().getFunctionManager().getFunctionsNoStubs(True)
def get_function_blocks(fh: FunctionHandle) -> Iterator[BBHandle]:
"""yield BBHandle for each bb in a given function"""
def get_function_blocks(fh: "capa.features.extractors.base_extractor.FunctionHandle") -> Iterator[BBHandle]:
"""
yield the basic blocks of the function
"""
func: ghidra.program.database.function.FunctionDB = fh.inner
for bb in SimpleBlockIterator(BasicBlockModel(currentProgram()), func.getBody(), monitor()): # type: ignore [name-defined] # noqa: F821
yield BBHandle(address=AbsoluteVirtualAddress(bb.getMinAddress().getOffset()), inner=bb)
for block in SimpleBlockIterator(BasicBlockModel(get_current_program()), fh.inner.getBody(), get_monitor()):
yield BBHandle(address=AbsoluteVirtualAddress(block.getMinAddress().getOffset()), inner=block)
def get_insn_in_range(bbh: BBHandle) -> Iterator[InsnHandle]:
"""yield InshHandle for each insn in a given basicblock"""
for insn in currentProgram().getListing().getInstructions(bbh.inner, True): # type: ignore [name-defined] # noqa: F821
for insn in get_current_program().getListing().getInstructions(bbh.inner, True):
yield InsnHandle(address=AbsoluteVirtualAddress(insn.getAddress().getOffset()), inner=insn)
@@ -95,7 +122,7 @@ def get_file_imports() -> dict[int, list[str]]:
import_dict: dict[int, list[str]] = {}
for f in currentProgram().getFunctionManager().getExternalFunctions(): # type: ignore [name-defined] # noqa: F821
for f in get_current_program().getFunctionManager().getExternalFunctions():
for r in f.getSymbol().getReferences():
if r.getReferenceType().isData():
addr = r.getFromAddress().getOffset() # gets pointer to fake external addr
@@ -133,7 +160,7 @@ def get_file_externs() -> dict[int, list[str]]:
extern_dict: dict[int, list[str]] = {}
for sym in currentProgram().getSymbolTable().getAllSymbols(True): # type: ignore [name-defined] # noqa: F821
for sym in get_current_program().getSymbolTable().getAllSymbols(True):
# .isExternal() misses more than this config for the function symbols
if sym.getSymbolType() == SymbolType.FUNCTION and sym.getSource() == SourceType.ANALYSIS and sym.isGlobal():
name = sym.getName() # starts to resolve names based on Ghidra's FidDB
@@ -171,7 +198,7 @@ def map_fake_import_addrs() -> dict[int, list[int]]:
"""
fake_dict: dict[int, list[int]] = {}
for f in currentProgram().getFunctionManager().getExternalFunctions(): # type: ignore [name-defined] # noqa: F821
for f in get_current_program().getFunctionManager().getExternalFunctions():
for r in f.getSymbol().getReferences():
if r.getReferenceType().isData():
fake_dict.setdefault(f.getEntryPoint().getOffset(), []).append(r.getFromAddress().getOffset())
@@ -180,7 +207,7 @@ def map_fake_import_addrs() -> dict[int, list[int]]:
def check_addr_for_api(
addr: ghidra.program.model.address.Address,
addr: "ghidra.program.model.address.Address",
fakes: dict[int, list[int]],
imports: dict[int, list[str]],
externs: dict[int, list[str]],
@@ -202,18 +229,18 @@ def check_addr_for_api(
return False
def is_call_or_jmp(insn: ghidra.program.database.code.InstructionDB) -> bool:
def is_call_or_jmp(insn: "ghidra.program.database.code.InstructionDB") -> bool:
return any(mnem in insn.getMnemonicString() for mnem in ["CALL", "J"]) # JMP, JNE, JNZ, etc
def is_sp_modified(insn: ghidra.program.database.code.InstructionDB) -> bool:
def is_sp_modified(insn: "ghidra.program.database.code.InstructionDB") -> bool:
for i in range(insn.getNumOperands()):
if insn.getOperandType(i) == OperandType.REGISTER:
return "SP" in insn.getRegister(i).getName() and insn.getOperandRefType(i).isWrite()
return False
def is_stack_referenced(insn: ghidra.program.database.code.InstructionDB) -> bool:
def is_stack_referenced(insn: "ghidra.program.database.code.InstructionDB") -> bool:
"""generic catch-all for stack references"""
for i in range(insn.getNumOperands()):
if insn.getOperandType(i) == OperandType.REGISTER:
@@ -225,7 +252,7 @@ def is_stack_referenced(insn: ghidra.program.database.code.InstructionDB) -> boo
return any(ref.isStackReference() for ref in insn.getReferencesFrom())
def is_zxor(insn: ghidra.program.database.code.InstructionDB) -> bool:
def is_zxor(insn: "ghidra.program.database.code.InstructionDB") -> bool:
# assume XOR insn
# XOR's against the same operand zero out
ops = []
@@ -241,29 +268,29 @@ def is_zxor(insn: ghidra.program.database.code.InstructionDB) -> bool:
return all(n == operands[0] for n in operands)
def handle_thunk(addr: ghidra.program.model.address.Address):
def handle_thunk(addr: "ghidra.program.model.address.Address"):
"""Follow thunk chains down to a reasonable depth"""
ref = addr
for _ in range(THUNK_CHAIN_DEPTH_DELTA):
thunk_jmp = getInstructionAt(ref) # type: ignore [name-defined] # noqa: F821
thunk_jmp = get_flat_api().getInstructionAt(ref)
if thunk_jmp and is_call_or_jmp(thunk_jmp):
if OperandType.isAddress(thunk_jmp.getOperandType(0)):
ref = thunk_jmp.getAddress(0)
else:
thunk_dat = getDataContaining(ref) # type: ignore [name-defined] # noqa: F821
thunk_dat = get_flat_api().getDataContaining(ref)
if thunk_dat and thunk_dat.isDefined() and thunk_dat.isPointer():
ref = thunk_dat.getValue()
break # end of thunk chain reached
return ref
def dereference_ptr(insn: ghidra.program.database.code.InstructionDB):
def dereference_ptr(insn: "ghidra.program.database.code.InstructionDB"):
addr_code = OperandType.ADDRESS | OperandType.CODE
to_deref = insn.getAddress(0)
dat = getDataContaining(to_deref) # type: ignore [name-defined] # noqa: F821
dat = get_flat_api().getDataContaining(to_deref)
if insn.getOperandType(0) == addr_code:
thfunc = getFunctionContaining(to_deref) # type: ignore [name-defined] # noqa: F821
thfunc = get_flat_api().getFunctionContaining(to_deref)
if thfunc and thfunc.isThunk():
return handle_thunk(to_deref)
else:
@@ -294,7 +321,7 @@ def find_data_references_from_insn(insn, max_depth: int = 10):
to_addr = reference.getToAddress()
for _ in range(max_depth - 1):
data = getDataAt(to_addr) # type: ignore [name-defined] # noqa: F821
data = get_flat_api().getDataAt(to_addr)
if data and data.isPointer():
ptr_value = data.getValue()

View File

@@ -234,7 +234,7 @@ def extract_insn_bytes_features(fh: FunctionHandle, bb: BBHandle, ih: InsnHandle
push offset iid_004118d4_IShellLinkA ; riid
"""
for addr in capa.features.extractors.ghidra.helpers.find_data_references_from_insn(ih.inner):
data = getDataAt(addr) # type: ignore [name-defined] # noqa: F821
data = capa.features.extractors.ghidra.helpers.get_flat_api().getDataAt(addr)
if data and not data.hasStringValue():
extracted_bytes = capa.features.extractors.ghidra.helpers.get_bytes(addr, MAX_BYTES_FEATURE_SIZE)
if extracted_bytes and not capa.features.extractors.helpers.all_zeros(extracted_bytes):
@@ -249,9 +249,9 @@ def extract_insn_string_features(fh: FunctionHandle, bb: BBHandle, ih: InsnHandl
push offset aAcr ; "ACR > "
"""
for addr in capa.features.extractors.ghidra.helpers.find_data_references_from_insn(ih.inner):
data = getDataAt(addr) # type: ignore [name-defined] # noqa: F821
data = capa.features.extractors.ghidra.helpers.get_flat_api().getDataAt(addr)
if data and data.hasStringValue():
yield String(data.getValue()), ih.address
yield String(str(data.getValue())), ih.address
def extract_insn_mnemonic_features(
@@ -361,8 +361,8 @@ def extract_insn_cross_section_cflow(
if capa.features.extractors.ghidra.helpers.check_addr_for_api(ref, fakes, imports, externs):
return
this_mem_block = getMemoryBlock(insn.getAddress()) # type: ignore [name-defined] # noqa: F821
ref_block = getMemoryBlock(ref) # type: ignore [name-defined] # noqa: F821
this_mem_block = capa.features.extractors.ghidra.helpers.get_flat_api().getMemoryBlock(insn.getAddress())
ref_block = capa.features.extractors.ghidra.helpers.get_flat_api().getMemoryBlock(ref)
if ref_block != this_mem_block:
yield Characteristic("cross section flow"), ih.address
@@ -425,19 +425,19 @@ def check_nzxor_security_cookie_delta(
Check if insn within last addr of last bb - delta
"""
model = SimpleBlockModel(currentProgram()) # type: ignore [name-defined] # noqa: F821
model = SimpleBlockModel(capa.features.extractors.ghidra.helpers.get_current_program())
insn_addr = insn.getAddress()
func_asv = fh.getBody()
first_addr = func_asv.getMinAddress()
if insn_addr < first_addr.add(SECURITY_COOKIE_BYTES_DELTA):
first_bb = model.getFirstCodeBlockContaining(first_addr, monitor()) # type: ignore [name-defined] # noqa: F821
first_bb = model.getFirstCodeBlockContaining(first_addr, capa.features.extractors.ghidra.helpers.get_monitor())
if first_bb.contains(insn_addr):
return True
last_addr = func_asv.getMaxAddress()
if insn_addr > last_addr.add(SECURITY_COOKIE_BYTES_DELTA * -1):
last_bb = model.getFirstCodeBlockContaining(last_addr, monitor()) # type: ignore [name-defined] # noqa: F821
last_bb = model.getFirstCodeBlockContaining(last_addr, capa.features.extractors.ghidra.helpers.get_monitor())
if last_bb.contains(insn_addr):
return True
@@ -488,22 +488,3 @@ INSTRUCTION_HANDLERS = (
extract_function_calls_from,
extract_function_indirect_call_characteristic_features,
)
def main():
""" """
features = []
from capa.features.extractors.ghidra.extractor import GhidraFeatureExtractor
for fh in GhidraFeatureExtractor().get_functions():
for bb in capa.features.extractors.ghidra.helpers.get_function_blocks(fh):
for insn in capa.features.extractors.ghidra.helpers.get_insn_in_range(bb):
features.extend(list(extract_features(fh, bb, insn)))
import pprint
pprint.pprint(features) # noqa: T203
if __name__ == "__main__":
main()

View File

@@ -18,6 +18,7 @@ import idaapi
import idautils
import capa.features.extractors.ida.helpers
from capa.features.file import FunctionName
from capa.features.common import Feature, Characteristic
from capa.features.address import Address, AbsoluteVirtualAddress
from capa.features.extractors import loops
@@ -50,10 +51,39 @@ def extract_recursive_call(fh: FunctionHandle):
yield Characteristic("recursive call"), fh.address
def extract_function_name(fh: FunctionHandle) -> Iterator[tuple[Feature, Address]]:
ea = fh.inner.start_ea
name = idaapi.get_name(ea)
if name.startswith("sub_"):
# skip default names, like "sub_401000"
return
yield FunctionName(name), fh.address
if name.startswith("_"):
# some linkers may prefix linked routines with a `_` to avoid name collisions.
# extract features for both the mangled and un-mangled representations.
# e.g. `_fwrite` -> `fwrite`
# see: https://stackoverflow.com/a/2628384/87207
yield FunctionName(name[1:]), fh.address
def extract_function_alternative_names(fh: FunctionHandle):
"""Get all alternative names for an address."""
for aname in capa.features.extractors.ida.helpers.get_function_alternative_names(fh.inner.start_ea):
yield FunctionName(aname), fh.address
def extract_features(fh: FunctionHandle) -> Iterator[tuple[Feature, Address]]:
for func_handler in FUNCTION_HANDLERS:
for feature, addr in func_handler(fh):
yield feature, addr
FUNCTION_HANDLERS = (extract_function_calls_to, extract_function_loop, extract_recursive_call)
FUNCTION_HANDLERS = (
extract_function_calls_to,
extract_function_loop,
extract_recursive_call,
extract_function_name,
extract_function_alternative_names,
)

View File

@@ -20,6 +20,7 @@ import idaapi
import ida_nalt
import idautils
import ida_bytes
import ida_funcs
import ida_segment
from capa.features.address import AbsoluteVirtualAddress
@@ -436,3 +437,16 @@ def is_basic_block_return(bb: idaapi.BasicBlock) -> bool:
def has_sib(oper: idaapi.op_t) -> bool:
# via: https://reverseengineering.stackexchange.com/a/14300
return oper.specflag1 == 1
def find_alternative_names(cmt: str):
for line in cmt.split("\n"):
if line.startswith("Alternative name is '") and line.endswith("'"):
name = line[len("Alternative name is '") : -1] # Extract name between quotes
yield name
def get_function_alternative_names(fva: int):
"""Get all alternative names for an address."""
yield from find_alternative_names(ida_bytes.get_cmt(fva, False) or "")
yield from find_alternative_names(ida_funcs.get_func_cmt(idaapi.get_func(fva), False) or "")

View File

@@ -22,6 +22,7 @@ import idautils
import capa.features.extractors.helpers
import capa.features.extractors.ida.helpers
from capa.features.file import FunctionName
from capa.features.insn import API, MAX_STRUCTURE_SIZE, Number, Offset, Mnemonic, OperandNumber, OperandOffset
from capa.features.common import MAX_BYTES_FEATURE_SIZE, THUNK_CHAIN_DEPTH_DELTA, Bytes, String, Feature, Characteristic
from capa.features.address import Address, AbsoluteVirtualAddress
@@ -129,8 +130,8 @@ def extract_insn_api_features(fh: FunctionHandle, bbh: BBHandle, ih: InsnHandle)
# not a function (start)
return
if target_func.flags & idaapi.FUNC_LIB:
name = idaapi.get_name(target_func.start_ea)
name = idaapi.get_name(target_func.start_ea)
if target_func.flags & idaapi.FUNC_LIB or not name.startswith("sub_"):
yield API(name), ih.address
if name.startswith("_"):
# some linkers may prefix linked routines with a `_` to avoid name collisions.
@@ -139,6 +140,10 @@ def extract_insn_api_features(fh: FunctionHandle, bbh: BBHandle, ih: InsnHandle)
# see: https://stackoverflow.com/a/2628384/87207
yield API(name[1:]), ih.address
for altname in capa.features.extractors.ida.helpers.get_function_alternative_names(target_func.start_ea):
yield FunctionName(altname), ih.address
yield API(altname), ih.address
def extract_insn_number_features(
fh: FunctionHandle, bbh: BBHandle, ih: InsnHandle

View File

@@ -56,7 +56,7 @@ def get_previous_instructions(vw: VivWorkspace, va: int) -> list[int]:
if ploc is not None:
# from vivisect.const:
# location: (L_VA, L_SIZE, L_LTYPE, L_TINFO)
(pva, _, ptype, pinfo) = ploc
pva, _, ptype, pinfo = ploc
if ptype == LOC_OP and not (pinfo & IF_NOFALL):
ret.append(pva)

View File

@@ -176,7 +176,7 @@ def extract_insn_api_features(fh: FunctionHandle, bb, ih: InsnHandle) -> Iterato
elif isinstance(insn.opers[0], envi.archs.i386.disasm.i386RegOper):
try:
(_, target) = resolve_indirect_call(f.vw, insn.va, insn=insn)
_, target = resolve_indirect_call(f.vw, insn.va, insn=insn)
except NotFoundError:
# not able to resolve the indirect call, sorry
return

View File

@@ -26,6 +26,16 @@ from capa.features.extractors.base_extractor import CallHandle, ThreadHandle, Pr
logger = logging.getLogger(__name__)
VOID_PTR_NUMBER_PARAMS = frozenset(
{
"hKey",
"hKeyRoot",
"hkResult",
"samDesired",
}
)
def get_call_param_features(param: Param, ch: CallHandle) -> Iterator[tuple[Feature, Address]]:
if param.deref is not None:
# pointer types contain a special "deref" member that stores the deref'd value
@@ -39,10 +49,31 @@ def get_call_param_features(param: Param, ch: CallHandle) -> Iterator[tuple[Feat
# parsing the data up to here results in double-escaped backslashes, remove those here
yield String(param.deref.value.replace("\\\\", "\\")), ch.address
else:
logger.debug("skipping deref param type %s", param.deref.type_)
if param.name in VOID_PTR_NUMBER_PARAMS:
try:
yield Number(hexint(param.deref.value)), ch.address
except (ValueError, TypeError) as e:
logger.debug(
"failed to parse whitelisted void_ptr param %s value %s: %s",
param.name,
param.deref.value,
e,
)
else:
logger.debug("skipping deref param type %s", param.deref.type_)
elif param.value is not None:
if param.type_ in PARAM_TYPE_INT:
yield Number(hexint(param.value)), ch.address
elif param.type_ == "void_ptr" and param.name in VOID_PTR_NUMBER_PARAMS:
try:
yield Number(hexint(param.value)), ch.address
except (ValueError, TypeError) as e:
logger.debug(
"failed to parse whitelisted void_ptr param %s value %s: %s",
param.name,
param.value,
e,
)
def extract_call_features(ph: ProcessHandle, th: ThreadHandle, ch: CallHandle) -> Iterator[tuple[Feature, Address]]:

View File

@@ -12,7 +12,7 @@
# See the License for the specific language governing permissions and
# limitations under the License.
import logging
from typing import Iterator
from pathlib import Path
@@ -39,6 +39,8 @@ from capa.features.extractors.base_extractor import (
DynamicFeatureExtractor,
)
logger = logging.getLogger(__name__)
def get_formatted_params(params: ParamList) -> list[str]:
params_list: list[str] = []
@@ -87,6 +89,16 @@ class VMRayExtractor(DynamicFeatureExtractor):
def get_processes(self) -> Iterator[ProcessHandle]:
for monitor_process in self.analysis.monitor_processes.values():
# skip invalid/incomplete monitor process entries, see #2807
if monitor_process.pid == 0 or not monitor_process.filename:
logger.debug(
"skipping incomplete process entry: pid=%d, filename=%s, monitor_id=%d",
monitor_process.pid,
monitor_process.filename,
monitor_process.monitor_id,
)
continue
address: ProcessAddress = ProcessAddress(pid=monitor_process.pid, ppid=monitor_process.ppid)
yield ProcessHandle(address, inner=monitor_process)

View File

@@ -490,11 +490,10 @@ def dumps_dynamic(extractor: DynamicFeatureExtractor) -> str:
taddr = Address.from_capa(t.address)
tfeatures = [
ThreadFeature(
basic_block=taddr,
thread=taddr,
address=Address.from_capa(addr),
feature=feature_from_capa(feature),
) # type: ignore
# Mypy is unable to recognise `basic_block` as an argument due to alias
)
for feature, addr in extractor.extract_thread_features(p, t)
]
@@ -544,7 +543,7 @@ def dumps_dynamic(extractor: DynamicFeatureExtractor) -> str:
# Mypy is unable to recognise `global_` as an argument due to alias
# workaround around mypy issue: https://github.com/python/mypy/issues/1424
get_base_addr = getattr(extractor, "get_base_addr", None)
get_base_addr = getattr(extractor, "get_base_address", None)
base_addr = get_base_addr() if get_base_addr else capa.features.address.NO_ADDRESS
freeze = Freeze(

View File

@@ -1,107 +1,75 @@
<div align="center">
<img src="../../doc/img/ghidra_backend_logo.png" width=240 height=125>
</div>
# capa analysis using Ghidra
# capa + Ghidra
capa supports using Ghidra (via [PyGhidra](https://github.com/NationalSecurityAgency/ghidra/tree/master/Ghidra/Features/PyGhidra)) as a feature extraction backend. This enables you to run capa against binaries using Ghidra's analysis engine.
[capa](https://github.com/mandiant/capa) is the FLARE teams open-source tool that detects capabilities in executable files. [Ghidra](https://github.com/NationalSecurityAgency/ghidra) is an open-source software reverse engineering framework created and maintained by the National Security Agency Research Directorate. capa + Ghidra brings capas detection capabilities directly to Ghidras user interface helping speed up your reverse engineering tasks by identifying what parts of a program suggest interesting behavior, such as setting a registry value. You can execute the included Python 3 scripts [capa_explorer.py](https://raw.githubusercontent.com/mandiant/capa/master/capa/ghidra/capa_explorer.py) or [capa_ghidra.py](https://raw.githubusercontent.com/mandiant/capa/master/capa/ghidra/capa_ghidra.py) to run capas analysis and view the results in Ghidra. You may be asking yourself, “Python 3 scripts in Ghidra?”. You read that correctly. This integration is written entirely in Python 3 and relies on [Ghidrathon]( https://github.com/mandiant/ghidrathon), an open source Ghidra extension that adds Python 3 scripting to Ghidra.
Check out our capa + Ghidra blog posts:
* [Riding Dragons: capa Harnesses Ghidra](https://www.mandiant.com/resources/blog/capa-harnesses-ghidra)
## UI Integration
[capa_explorer.py](https://raw.githubusercontent.com/mandiant/capa/master/capa/ghidra/capa_explorer.py) renders capa results in Ghidra's UI to help you quickly navigate them. This includes adding matched functions to Ghidras Symbol Tree and Bookmarks windows and adding comments to functions that indicate matched capabilities and features. You can execute this script using Ghidras Script Manager window.
### Symbol Tree Window
Matched functions are added to Ghidra's Symbol Tree window under a custom namespace that maps to the capabilities' [capa namespace](https://github.com/mandiant/capa-rules/blob/master/doc/format.md#rule-namespace).
<div align="center">
<img src="https://github.com/mandiant/capa/assets/66766340/eeae33f4-99d4-42dc-a5e8-4c1b8c661492" width=300>
</div>
### Comments
Comments are added at the beginning of matched functions indicating matched capabilities and inline comments are added to functions indicating matched features. You can view these comments in Ghidras Disassembly Listing and Decompile windows.
<div align="center">
<img src="https://github.com/mandiant/capa/assets/66766340/bb2b4170-7fd4-45fc-8c7b-ff8f2e2f101b" width=1000>
</div>
### Bookmarks
Bookmarks are added to functions that matched a capability that is mapped to a MITRE ATT&CK and/or Malware Behavior Catalog (MBC) technique. You can view these bookmarks in Ghidra's Bookmarks window.
<div align="center">
<img src="https://github.com/mandiant/capa/assets/66766340/7f9a66a9-7be7-4223-91c6-4b8fc4651336" width=825>
</div>
## Text-based Integration
[capa_ghidra.py](https://raw.githubusercontent.com/mandiant/capa/master/capa/ghidra/capa_ghidra.py) outputs text-based capa results that mirror the output of capas standalone tool. You can execute this script using Ghidras Script Manager and view its output in Ghidras Console window.
<div align="center">
<img src="../../doc/img/ghidra_script_mngr_output.png" width=700>
</div>
You can also execute [capa_ghidra.py](https://raw.githubusercontent.com/mandiant/capa/master/capa/ghidra/capa_ghidra.py) using Ghidra's Headless Analyzer to view its output in a terminal window.
<div align="center">
<img src="../../doc/img/ghidra_headless_analyzer.png">
</div>
# Getting Started
## Requirements
| Tool | Version | Source |
|------------|---------|--------|
| capa | `>= 7.0.0` | https://github.com/mandiant/capa/releases |
| Ghidrathon | `>= 3.0.0` | https://github.com/mandiant/Ghidrathon/releases |
| Ghidra | `>= 10.3.2` | https://github.com/NationalSecurityAgency/ghidra/releases |
| Python | `>= 3.10.0` | https://www.python.org/downloads |
## Installation
**Note**: capa + Ghidra relies on [Ghidrathon]( https://github.com/mandiant/ghidrathon) to execute Python 3 code in Ghidra. You must first install and configure Ghidrathon using the [steps outlined in its README]( https://github.com/mandiant/ghidrathon?tab=readme-ov-file#installing-ghidrathon). Then, you must use the Python 3 interpreter that you configured with Ghidrathon to complete the following steps:
1. Install capa and its dependencies from PyPI using the following command:
```bash
$ pip install flare-capa
$ capa -b ghidra Practical\ Malware\ Analysis\ Lab\ 01-01.exe_
┌──────────┬──────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ md5 │ bb7425b82141a1c0f7d60e5106676bb1 │
│ sha1 │ │
│ sha256 │ 58898bd42c5bd3bf9b1389f0eee5b39cd59180e8370eb9ea838a0b327bd6fe47 │
│ analysis │ static │
│ os │ windows │
│ format │ pe │
│ arch │ i386 │
│ path │ ~/Documents/capa/tests/data/Practical Malware Analysis Lab 01-01.exe_ │
└──────────┴──────────────────────────────────────────────────────────────────────────────────────────────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ ATT&CK Tactic ┃ ATT&CK Technique ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ DISCOVERY │ File and Directory Discovery [T1083]
└────────────────────────────────────┴─────────────────────────────────────────────────────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ MBC Objective ┃ MBC Behavior ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ DISCOVERY │ File and Directory Discovery [E1083]
│ FILE SYSTEM │ Copy File [C0045]
│ │ Read File [C0051]
│ PROCESS │ Terminate Process [C0018]
└────────────────────────────────────┴─────────────────────────────────────────────────────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Capability ┃ Namespace ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ copy file │ host-interaction/file-system/copy │
│ enumerate files recursively │ host-interaction/file-system/files/list │
read file via mapping (2 matches) │ host-interaction/file-system/read │
│ terminate process (2 matches) │ host-interaction/process/terminate │
│ resolve function by parsing PE exports │ load-code/pe │
└────────────────────────────────────────────────┴─────────────────────────────────────────────────┘
```
2. Download and extract the [official capa rules](https://github.com/mandiant/capa-rules/releases) that match the capa version you have installed. You can use the following command to view the version of capa you have installed:
## getting started
### requirements
- [Ghidra](https://github.com/NationalSecurityAgency/ghidra) >= 12.0 must be installed and available via the `GHIDRA_INSTALL_DIR` environment variable.
#### standalone binary (recommended)
The capa [standalone binary](https://github.com/mandiant/capa/releases) is the preferred way to run capa with the Ghidra backend.
Although the binary does not bundle the Java environment or Ghidra itself, it will dynamically load them at runtime.
#### python package
You can also use the Ghidra backend with the capa Python package by installing `flare-capa` with the `ghidra` extra.
```bash
$ pip show flare-capa
OR
$ capa --version
$ pip install "flare-capa[ghidra]"
```
3. Copy [capa_explorer.py](https://raw.githubusercontent.com/mandiant/capa/master/capa/ghidra/capa_explorer.py) and [capa_ghidra.py](https://raw.githubusercontent.com/mandiant/capa/master/capa/ghidra/capa_ghidra.py) to your `ghidra_scripts` directory or manually add the parent directory of each script using Ghidras Script Manager.
### usage
## Usage
To use the Ghidra backend, specify it with the `-b` or `--backend` flag:
You can execute [capa_explorer.py](https://raw.githubusercontent.com/mandiant/capa/master/capa/ghidra/capa_explorer.py) and [capa_ghidra.py](https://raw.githubusercontent.com/mandiant/capa/master/capa/ghidra/capa_ghidra.py) using Ghidras Script Manager. [capa_ghidra.py](https://raw.githubusercontent.com/mandiant/capa/master/capa/ghidra/capa_ghidra.py) can also be executed using Ghidra's Headless Analyzer.
### Execution using Ghidras Script Manager
You can execute [capa_explorer.py](https://raw.githubusercontent.com/mandiant/capa/master/capa/ghidra/capa_explorer.py) and [capa_ghidra.py](https://raw.githubusercontent.com/mandiant/capa/master/capa/ghidra/capa_ghidra.py) using Ghidra's Script Manager as follows:
1. Navigate to `Window > Script Manager`
2. Expand the `Python 3 > capa` category
3. Double-click a script to execute it
Both scripts ask you to provide the path of your capa rules directory (see installation step 2). [capa_ghidra.py](https://raw.githubusercontent.com/mandiant/capa/master/capa/ghidra/capa_ghidra.py) also has you choose one of `default`, `verbose`, and `vverbose` output formats which mirror the output formats of capas standalone tool.
### Execution using Ghidras Headless Analyzer
You can execute [capa_ghidra.py](https://raw.githubusercontent.com/mandiant/capa/master/capa/ghidra/capa_ghidra.py) using Ghidras Headless Analyzer by invoking the `analyzeHeadless` script included with Ghidra in its `support` directory. The following arguments must be provided:
| Argument | Description |
|----|----|
|`<project_path>`| Path to Ghidra project|
| `<project_name>`| Name of Ghidra Project|
| `-Process <sample_name>` OR `-Import <sample_path>`| Name of sample `<sample_name>` already imported into `<project_name>` OR absolute path of sample `<sample_path>` to import into `<project_name>`|
| `-ScriptPath <script_path>`| OPTIONAL parent directory `<script_path>` of [capa_ghidra.py](https://raw.githubusercontent.com/mandiant/capa/master/capa/ghidra/capa_ghidra.py)|
| `-PostScript capa_ghidra.py`| Execute [capa_ghidra.py](https://raw.githubusercontent.com/mandiant/capa/master/capa/ghidra/capa_ghidra.py) after Ghidra analysis|
| `"<script_args>"`| Quoted string `"<script_args>"` containing script arguments passed to [capa_ghidra.py](https://raw.githubusercontent.com/mandiant/capa/master/capa/ghidra/capa_ghidra.py) that must specify a capa rules path and optionally the output format (`--verbose`, `--vverbose`, `--json`) you can specify `”help”` to view the scripts help message |
The following is an example of combining these arguments into a single `analyzeHeadless` script command:
```bash
$ analyzeHeadless /home/wumbo/demo demo -Import /home/wumbo/capa/tests/data/Practical\ Malware\ Analysis\ Lab\ 01-01.dll_ -PostScript capa_ghidra.py "/home/wumbo/capa/rules --verbose"
$ capa -b ghidra /path/to/sample
```
capa will:
1. Initialize a headless Ghidra instance.
2. Create a temporary project.
3. Import and analyze the sample.
4. Extract features and match rules.
5. Clean up the temporary project.
**Note:** The first time you run this, it may take a few moments to initialize the Ghidra environment.

View File

@@ -1,174 +0,0 @@
# Run capa against loaded Ghidra database and render results in Ghidra Console window
# @author Mike Hunhoff (mehunhoff@google.com)
# @category Python 3.capa
# Copyright 2023 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import sys
import logging
import pathlib
import argparse
import capa
import capa.main
import capa.rules
import capa.ghidra.helpers
import capa.render.default
import capa.capabilities.common
import capa.features.extractors.ghidra.extractor
logger = logging.getLogger("capa_ghidra")
def run_headless():
parser = argparse.ArgumentParser(description="The FLARE team's open-source tool to integrate capa with Ghidra.")
parser.add_argument(
"rules",
type=str,
help="path to rule file or directory",
)
parser.add_argument(
"-v", "--verbose", action="store_true", help="enable verbose result document (no effect with --json)"
)
parser.add_argument(
"-vv", "--vverbose", action="store_true", help="enable very verbose result document (no effect with --json)"
)
parser.add_argument("-d", "--debug", action="store_true", help="enable debugging output on STDERR")
parser.add_argument("-q", "--quiet", action="store_true", help="disable all output but errors")
parser.add_argument("-j", "--json", action="store_true", help="emit JSON instead of text")
script_args = list(getScriptArgs()) # type: ignore [name-defined] # noqa: F821
if not script_args or len(script_args) > 1:
script_args = []
else:
script_args = script_args[0].split()
for idx, arg in enumerate(script_args):
if arg.lower() == "help":
script_args[idx] = "--help"
args = parser.parse_args(args=script_args)
if args.quiet:
logging.basicConfig(level=logging.WARNING)
logging.getLogger().setLevel(logging.WARNING)
elif args.debug:
logging.basicConfig(level=logging.DEBUG)
logging.getLogger().setLevel(logging.DEBUG)
else:
logging.basicConfig(level=logging.INFO)
logging.getLogger().setLevel(logging.INFO)
logger.debug("running in Ghidra headless mode")
rules_path = pathlib.Path(args.rules)
logger.debug("rule path: %s", rules_path)
rules = capa.rules.get_rules([rules_path])
meta = capa.ghidra.helpers.collect_metadata([rules_path])
extractor = capa.features.extractors.ghidra.extractor.GhidraFeatureExtractor()
capabilities = capa.capabilities.common.find_capabilities(rules, extractor, False)
meta.analysis.feature_counts = capabilities.feature_counts
meta.analysis.library_functions = capabilities.library_functions
meta.analysis.layout = capa.loader.compute_layout(rules, extractor, capabilities.matches)
if capa.capabilities.common.has_static_limitation(rules, capabilities, is_standalone=True):
logger.info("capa encountered warnings during analysis")
if args.json:
print(capa.render.json.render(meta, rules, capabilities.matches)) # noqa: T201
elif args.vverbose:
print(capa.render.vverbose.render(meta, rules, capabilities.matches)) # noqa: T201
elif args.verbose:
print(capa.render.verbose.render(meta, rules, capabilities.matches)) # noqa: T201
else:
print(capa.render.default.render(meta, rules, capabilities.matches)) # noqa: T201
return 0
def run_ui():
logging.basicConfig(level=logging.INFO)
logging.getLogger().setLevel(logging.INFO)
rules_dir: str = ""
try:
selected_dir = askDirectory("Choose capa rules directory", "Ok") # type: ignore [name-defined] # noqa: F821
if selected_dir:
rules_dir = selected_dir.getPath()
except RuntimeError:
# RuntimeError thrown when user selects "Cancel"
pass
if not rules_dir:
logger.info("You must choose a capa rules directory before running capa.")
return capa.main.E_MISSING_RULES
verbose = askChoice( # type: ignore [name-defined] # noqa: F821
"capa output verbosity", "Choose capa output verbosity", ["default", "verbose", "vverbose"], "default"
)
rules_path: pathlib.Path = pathlib.Path(rules_dir)
logger.info("running capa using rules from %s", str(rules_path))
rules = capa.rules.get_rules([rules_path])
meta = capa.ghidra.helpers.collect_metadata([rules_path])
extractor = capa.features.extractors.ghidra.extractor.GhidraFeatureExtractor()
capabilities = capa.capabilities.common.find_capabilities(rules, extractor, True)
meta.analysis.feature_counts = capabilities.feature_counts
meta.analysis.library_functions = capabilities.library_functions
meta.analysis.layout = capa.loader.compute_layout(rules, extractor, capabilities.matches)
if capa.capabilities.common.has_static_limitation(rules, capabilities, is_standalone=False):
logger.info("capa encountered warnings during analysis")
if verbose == "vverbose":
print(capa.render.vverbose.render(meta, rules, capabilities.matches)) # noqa: T201
elif verbose == "verbose":
print(capa.render.verbose.render(meta, rules, capabilities.matches)) # noqa: T201
else:
print(capa.render.default.render(meta, rules, capabilities.matches)) # noqa: T201
return 0
def main():
if not capa.ghidra.helpers.is_supported_ghidra_version():
return capa.main.E_UNSUPPORTED_GHIDRA_VERSION
if not capa.ghidra.helpers.is_supported_file_type():
return capa.main.E_INVALID_FILE_TYPE
if not capa.ghidra.helpers.is_supported_arch_type():
return capa.main.E_INVALID_FILE_ARCH
if isRunningHeadless(): # type: ignore [name-defined] # noqa: F821
return run_headless()
else:
return run_ui()
if __name__ == "__main__":
if sys.version_info < (3, 10):
from capa.exceptions import UnsupportedRuntimeError
raise UnsupportedRuntimeError("This version of capa can only be used with Python 3.10+")
sys.exit(main())

View File

@@ -22,6 +22,7 @@ import capa.version
import capa.features.common
import capa.features.freeze
import capa.render.result_document as rdoc
import capa.features.extractors.ghidra.context as ghidra_context
import capa.features.extractors.ghidra.helpers
from capa.features.address import AbsoluteVirtualAddress
@@ -31,6 +32,18 @@ logger = logging.getLogger("capa")
SUPPORTED_FILE_TYPES = ("Executable and Linking Format (ELF)", "Portable Executable (PE)", "Raw Binary")
def get_current_program():
return ghidra_context.get_context().program
def get_flat_api():
return ghidra_context.get_context().flat_api
def get_monitor():
return ghidra_context.get_context().monitor
class GHIDRAIO:
"""
An object that acts as a file-like object,
@@ -48,7 +61,12 @@ class GHIDRAIO:
self.offset = offset
def read(self, size):
logger.debug("reading 0x%x bytes at 0x%x (ea: 0x%x)", size, self.offset, currentProgram().getImageBase().add(self.offset).getOffset()) # type: ignore [name-defined] # noqa: F821
logger.debug(
"reading 0x%x bytes at 0x%x (ea: 0x%x)",
size,
self.offset,
get_current_program().getImageBase().add(self.offset).getOffset(),
)
if size > len(self.bytes_) - self.offset:
logger.debug("cannot read 0x%x bytes at 0x%x (ea: BADADDR)", size, self.offset)
@@ -60,7 +78,7 @@ class GHIDRAIO:
return
def get_bytes(self):
file_bytes = currentProgram().getMemory().getAllFileBytes()[0] # type: ignore [name-defined] # noqa: F821
file_bytes = get_current_program().getMemory().getAllFileBytes()[0]
# getOriginalByte() allows for raw file parsing on the Ghidra side
# other functions will fail as Ghidra will think that it's reading uninitialized memory
@@ -70,21 +88,32 @@ class GHIDRAIO:
def is_supported_ghidra_version():
version = float(getGhidraVersion()[:4]) # type: ignore [name-defined] # noqa: F821
if version < 10.2:
warning_msg = "capa does not support this Ghidra version"
logger.warning(warning_msg)
logger.warning("Your Ghidra version is: %s. Supported versions are: Ghidra >= 10.2", version)
import ghidra.framework
version = ghidra.framework.Application.getApplicationVersion()
try:
# version format example: "11.1.2" or "11.4"
major, minor = map(int, version.split(".")[:2])
if major < 12:
logger.error("-" * 80)
logger.error(" Ghidra version %s is not supported.", version)
logger.error(" ")
logger.error(" capa requires Ghidra 12.0 or higher.")
logger.error("-" * 80)
return False
except ValueError:
logger.warning("could not parse Ghidra version: %s", version)
return False
return True
def is_running_headless():
return isRunningHeadless() # type: ignore [name-defined] # noqa: F821
return True # PyGhidra is always headless in this context
def is_supported_file_type():
file_info = currentProgram().getExecutableFormat() # type: ignore [name-defined] # noqa: F821
file_info = get_current_program().getExecutableFormat()
if file_info not in SUPPORTED_FILE_TYPES:
logger.error("-" * 80)
logger.error(" Input file does not appear to be a supported file type.")
@@ -99,7 +128,7 @@ def is_supported_file_type():
def is_supported_arch_type():
lang_id = str(currentProgram().getLanguageID()).lower() # type: ignore [name-defined] # noqa: F821
lang_id = str(get_current_program().getLanguageID()).lower()
if not all((lang_id.startswith("x86"), any(arch in lang_id for arch in ("32", "64")))):
logger.error("-" * 80)
@@ -112,18 +141,18 @@ def is_supported_arch_type():
def get_file_md5():
return currentProgram().getExecutableMD5() # type: ignore [name-defined] # noqa: F821
return get_current_program().getExecutableMD5()
def get_file_sha256():
return currentProgram().getExecutableSHA256() # type: ignore [name-defined] # noqa: F821
return get_current_program().getExecutableSHA256()
def collect_metadata(rules: list[Path]):
md5 = get_file_md5()
sha256 = get_file_sha256()
info = currentProgram().getLanguageID().toString() # type: ignore [name-defined] # noqa: F821
info = get_current_program().getLanguageID().toString()
if "x86" in info and "64" in info:
arch = "x86_64"
elif "x86" in info and "32" in info:
@@ -131,11 +160,11 @@ def collect_metadata(rules: list[Path]):
else:
arch = "unknown arch"
format_name: str = currentProgram().getExecutableFormat() # type: ignore [name-defined] # noqa: F821
format_name: str = get_current_program().getExecutableFormat()
if "PE" in format_name:
os = "windows"
elif "ELF" in format_name:
with contextlib.closing(capa.ghidra.helpers.GHIDRAIO()) as f:
with contextlib.closing(GHIDRAIO()) as f:
os = capa.features.extractors.elf.detect_elf_os(f)
else:
os = "unknown os"
@@ -148,16 +177,18 @@ def collect_metadata(rules: list[Path]):
md5=md5,
sha1="",
sha256=sha256,
path=currentProgram().getExecutablePath(), # type: ignore [name-defined] # noqa: F821
path=get_current_program().getExecutablePath(),
),
flavor=rdoc.Flavor.STATIC,
analysis=rdoc.StaticAnalysis(
format=currentProgram().getExecutableFormat(), # type: ignore [name-defined] # noqa: F821
format=get_current_program().getExecutableFormat(),
arch=arch,
os=os,
extractor="ghidra",
rules=tuple(r.resolve().absolute().as_posix() for r in rules),
base_address=capa.features.freeze.Address.from_capa(AbsoluteVirtualAddress(currentProgram().getImageBase().getOffset())), # type: ignore [name-defined] # noqa: F821
base_address=capa.features.freeze.Address.from_capa(
AbsoluteVirtualAddress(get_current_program().getImageBase().getOffset())
),
layout=rdoc.StaticLayout(
functions=(),
),

View File

@@ -0,0 +1,54 @@
<div align="center">
<img src="https://github.com/mandiant/capa/blob/master/doc/img/ghidra_backend_logo.png" width=240 height=125>
</div>
# capa explorer for Ghidra
capa explorer for Ghidra brings capas detection capabilities directly to Ghidras user interface helping speed up your reverse engineering tasks by identifying what parts of a program suggest interesting behavior, such as setting a registry value. You can execute (via [PyGhidra](https://github.com/NationalSecurityAgency/ghidra/tree/master/Ghidra/Features/PyGhidra)) the script [capa_explorer.py](https://raw.githubusercontent.com/mandiant/capa/master/capa/ghidra/plugin/capa_explorer.py) using Ghidras Script Manager window to run capas analysis and view the results in Ghidra.
## ui integration
[capa_explorer.py](https://raw.githubusercontent.com/mandiant/capa/master/capa/ghidra/capa_explorer.py) renders capa results in Ghidra's UI to help you quickly navigate them. This includes adding matched functions to Ghidras Symbol Tree and Bookmarks windows and adding comments to functions that indicate matched capabilities and features. You can execute this script using Ghidras Script Manager window.
### symbol tree window
Matched functions are added to Ghidra's Symbol Tree window under a custom namespace that maps to the capabilities' [capa namespace](https://github.com/mandiant/capa-rules/blob/master/doc/format.md#rule-namespace).
<div align="center">
<img src="https://github.com/mandiant/capa/assets/66766340/eeae33f4-99d4-42dc-a5e8-4c1b8c661492" width=300>
</div>
### comments
Comments are added at the beginning of matched functions indicating matched capabilities and inline comments are added to functions indicating matched features. You can view these comments in Ghidras Disassembly Listing and Decompile windows.
<div align="center">
<img src="https://github.com/mandiant/capa/assets/66766340/bb2b4170-7fd4-45fc-8c7b-ff8f2e2f101b" width=1000>
</div>
### bookmarks
Bookmarks are added to functions that matched a capability that is mapped to a MITRE ATT&CK and/or Malware Behavior Catalog (MBC) technique. You can view these bookmarks in Ghidra's Bookmarks window.
<div align="center">
<img src="https://github.com/mandiant/capa/assets/66766340/7f9a66a9-7be7-4223-91c6-4b8fc4651336" width=825>
</div>
# getting started
## requirements
- [Ghidra](https://github.com/NationalSecurityAgency/ghidra) >= 12.0 must be installed.
- [flare-capa](https://pypi.org/project/flare-capa/) >= 10.0 must be installed (virtual environment recommended) with the `ghidra` extra (e.g., `pip install "flare-capa[ghidra]"`).
- [capa rules](https://github.com/mandiant/capa-rules) must be downloaded for the version of capa you are using.
## execution
### 1. run Ghidra with PyGhidra
You must start Ghidra using the `pyghidraRun` script provided in the support directory of your Ghidra installation to ensure the Python environment is correctly loaded. You should execute `pyghidraRun` from within the Python environment that you used to install capa.
```bash
<ghidra_install>/support/pyghidraRun
```
### 2. run capa_explorer.py
1. Open your Ghidra project and CodeBrowser.
2. Open the Script Manager.
3. Add [capa_explorer.py](https://raw.githubusercontent.com/mandiant/capa/master/capa/ghidra/plugin/capa_explorer.py) to the script directories.
4. Filter for capa and run the script.
5. When prompted, select the directory containing the downloaded capa rules.

View File

@@ -1,7 +1,3 @@
# Run capa against loaded Ghidra database and render results in Ghidra UI
# @author Colton Gabertan (gabertan.colton@gmail.com)
# @category Python 3.capa
# Copyright 2024 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
@@ -16,36 +12,63 @@
# See the License for the specific language governing permissions and
# limitations under the License.
import sys
# Run capa against loaded Ghidra database and render results in Ghidra UI
# @author Colton Gabertan (gabertan.colton@gmail.com)
# @category capa
# @runtime PyGhidra
import json
import logging
import pathlib
from typing import Any
from java.util import ArrayList
from ghidra.util import Msg
from ghidra.app.cmd.label import AddLabelCmd, CreateNamespacesCmd
from ghidra.util.exception import CancelledException
from ghidra.program.flatapi import FlatProgramAPI
from ghidra.program.model.symbol import Namespace, SourceType, SymbolType
import capa
import capa.main
import capa.rules
import capa.version
import capa.render.json
import capa.ghidra.helpers
import capa.capabilities.common
import capa.features.extractors.ghidra.context
import capa.features.extractors.ghidra.extractor
logger = logging.getLogger("capa_explorer")
def show_monitor_message(msg):
capa.ghidra.helpers.get_monitor().checkCanceled()
capa.ghidra.helpers.get_monitor().setMessage(msg)
def show_error(msg):
Msg.showError(None, None, "capa explorer", msg)
def show_warn(msg):
Msg.showWarn(None, None, "capa explorer", msg)
def show_info(msg):
Msg.showInfo(None, None, "capa explorer", msg)
def add_bookmark(addr, txt, category="CapaExplorer"):
"""create bookmark at addr"""
currentProgram().getBookmarkManager().setBookmark(addr, "Info", category, txt) # type: ignore [name-defined] # noqa: F821
capa.ghidra.helpers.get_current_program().getBookmarkManager().setBookmark(addr, "Info", category, txt)
def create_namespace(namespace_str):
"""create new Ghidra namespace for each capa namespace"""
cmd = CreateNamespacesCmd(namespace_str, SourceType.USER_DEFINED)
cmd.applyTo(currentProgram()) # type: ignore [name-defined] # noqa: F821
cmd.applyTo(capa.ghidra.helpers.get_current_program())
return cmd.getNamespace()
@@ -53,7 +76,7 @@ def create_label(ghidra_addr, name, capa_namespace):
"""custom label cmd to overlay symbols under capa-generated namespaces"""
# prevent duplicate labels under the same capa-generated namespace
symbol_table = currentProgram().getSymbolTable() # type: ignore [name-defined] # noqa: F821
symbol_table = capa.ghidra.helpers.get_current_program().getSymbolTable()
for sym in symbol_table.getSymbols(ghidra_addr):
if sym.getName(True) == capa_namespace.getName(True) + Namespace.DELIMITER + name:
return
@@ -61,7 +84,7 @@ def create_label(ghidra_addr, name, capa_namespace):
# create SymbolType.LABEL at addr
# prioritize capa-generated namespace (duplicate match @ new addr), else put under global Ghidra one (new match)
cmd = AddLabelCmd(ghidra_addr, name, True, SourceType.USER_DEFINED)
cmd.applyTo(currentProgram()) # type: ignore [name-defined] # noqa: F821
cmd.applyTo(capa.ghidra.helpers.get_current_program())
# assign new match overlay label to capa-generated namespace
cmd.getSymbol().setNamespace(capa_namespace)
@@ -92,8 +115,8 @@ class CapaMatchData:
return
for key in self.matches.keys():
addr = toAddr(hex(key)) # type: ignore [name-defined] # noqa: F821
func = getFunctionContaining(addr) # type: ignore [name-defined] # noqa: F821
addr = capa.ghidra.helpers.get_flat_api().toAddr(hex(key))
func = capa.ghidra.helpers.get_flat_api().getFunctionContaining(addr)
# bookmark & tag MITRE ATT&CK tactics & MBC @ function scope
if func is not None:
@@ -117,140 +140,160 @@ class CapaMatchData:
def set_plate_comment(self, ghidra_addr):
"""set plate comments at matched functions"""
comment = getPlateComment(ghidra_addr) # type: ignore [name-defined] # noqa: F821
comment = capa.ghidra.helpers.get_flat_api().getPlateComment(ghidra_addr)
rule_path = self.namespace.replace(Namespace.DELIMITER, "/")
# 2 calls to avoid duplicate comments via subsequent script runs
if comment is None:
# first comment @ function
comment = rule_path + "\n"
setPlateComment(ghidra_addr, comment) # type: ignore [name-defined] # noqa: F821
capa.ghidra.helpers.get_flat_api().setPlateComment(ghidra_addr, comment)
elif rule_path not in comment:
comment = comment + rule_path + "\n"
setPlateComment(ghidra_addr, comment) # type: ignore [name-defined] # noqa: F821
capa.ghidra.helpers.get_flat_api().setPlateComment(ghidra_addr, comment)
else:
return
def set_pre_comment(self, ghidra_addr, sub_type, description):
"""set pre comments at subscoped matches of main rules"""
comment = getPreComment(ghidra_addr) # type: ignore [name-defined] # noqa: F821
comment = capa.ghidra.helpers.get_flat_api().getPreComment(ghidra_addr)
if comment is None:
comment = "capa: " + sub_type + "(" + description + ")" + ' matched in "' + self.capability + '"\n'
setPreComment(ghidra_addr, comment) # type: ignore [name-defined] # noqa: F821
capa.ghidra.helpers.get_flat_api().setPreComment(ghidra_addr, comment)
elif self.capability not in comment:
comment = (
comment + "capa: " + sub_type + "(" + description + ")" + ' matched in "' + self.capability + '"\n'
)
setPreComment(ghidra_addr, comment) # type: ignore [name-defined] # noqa: F821
capa.ghidra.helpers.get_flat_api().setPreComment(ghidra_addr, comment)
else:
return
def label_matches(self):
def label_matches(self, do_namespaces, do_comments):
"""label findings at function scopes and comment on subscope matches"""
capa_namespace = create_namespace(self.namespace)
symbol_table = currentProgram().getSymbolTable() # type: ignore [name-defined] # noqa: F821
capa_namespace = None
if do_namespaces:
capa_namespace = create_namespace(self.namespace)
symbol_table = capa.ghidra.helpers.get_current_program().getSymbolTable()
# handle function main scope of matched rule
# these will typically contain further matches within
if self.scope == "function":
for addr in self.matches.keys():
ghidra_addr = toAddr(hex(addr)) # type: ignore [name-defined] # noqa: F821
ghidra_addr = capa.ghidra.helpers.get_flat_api().toAddr(hex(addr))
# classify new function label under capa-generated namespace
sym = symbol_table.getPrimarySymbol(ghidra_addr)
if sym is not None:
if sym.getSymbolType() == SymbolType.FUNCTION:
create_label(ghidra_addr, sym.getName(), capa_namespace)
self.set_plate_comment(ghidra_addr)
if do_namespaces:
sym = symbol_table.getPrimarySymbol(ghidra_addr)
if sym is not None:
if sym.getSymbolType() == SymbolType.FUNCTION:
create_label(ghidra_addr, sym.getName(), capa_namespace)
# parse the corresponding nodes, and pre-comment subscope matched features
# under the encompassing function(s)
for sub_match in self.matches.get(addr):
for loc, node in sub_match.items():
sub_ghidra_addr = toAddr(hex(loc)) # type: ignore [name-defined] # noqa: F821
if sub_ghidra_addr == ghidra_addr:
# skip duplicates
continue
if do_comments:
self.set_plate_comment(ghidra_addr)
# precomment subscope matches under the function
if node != {}:
for sub_type, description in parse_node(node):
self.set_pre_comment(sub_ghidra_addr, sub_type, description)
# parse the corresponding nodes, and pre-comment subscope matched features
# under the encompassing function(s)
for sub_match in self.matches.get(addr):
for loc, node in sub_match.items():
sub_ghidra_addr = capa.ghidra.helpers.get_flat_api().toAddr(hex(loc))
if sub_ghidra_addr == ghidra_addr:
# skip duplicates
continue
# precomment subscope matches under the function
if node != {} and do_comments:
for sub_type, description in parse_node(node):
self.set_pre_comment(sub_ghidra_addr, sub_type, description)
else:
# resolve the encompassing function for the capa namespace
# of non-function scoped main matches
for addr in self.matches.keys():
ghidra_addr = toAddr(hex(addr)) # type: ignore [name-defined] # noqa: F821
ghidra_addr = capa.ghidra.helpers.get_flat_api().toAddr(hex(addr))
# basic block / insn scoped main matches
# Ex. See "Create Process on Windows" Rule
func = getFunctionContaining(ghidra_addr) # type: ignore [name-defined] # noqa: F821
func = capa.ghidra.helpers.get_flat_api().getFunctionContaining(ghidra_addr)
if func is not None:
func_addr = func.getEntryPoint()
create_label(func_addr, func.getName(), capa_namespace)
self.set_plate_comment(func_addr)
if do_namespaces:
create_label(func_addr, func.getName(), capa_namespace)
if do_comments:
self.set_plate_comment(func_addr)
# create subscope match precomments
for sub_match in self.matches.get(addr):
for loc, node in sub_match.items():
sub_ghidra_addr = toAddr(hex(loc)) # type: ignore [name-defined] # noqa: F821
sub_ghidra_addr = capa.ghidra.helpers.get_flat_api().toAddr(hex(loc))
if node != {}:
if func is not None:
# basic block/ insn scope under resolved function
for sub_type, description in parse_node(node):
self.set_pre_comment(sub_ghidra_addr, sub_type, description)
if do_comments:
for sub_type, description in parse_node(node):
self.set_pre_comment(sub_ghidra_addr, sub_type, description)
else:
# this would be a global/file scoped main match
# try to resolve the encompassing function via the subscope match, instead
# Ex. "run as service" rule
sub_func = getFunctionContaining(sub_ghidra_addr) # type: ignore [name-defined] # noqa: F821
sub_func = capa.ghidra.helpers.get_flat_api().getFunctionContaining(sub_ghidra_addr)
if sub_func is not None:
sub_func_addr = sub_func.getEntryPoint()
# place function in capa namespace & create the subscope match label in Ghidra's global namespace
create_label(sub_func_addr, sub_func.getName(), capa_namespace)
self.set_plate_comment(sub_func_addr)
for sub_type, description in parse_node(node):
self.set_pre_comment(sub_ghidra_addr, sub_type, description)
if do_namespaces:
create_label(sub_func_addr, sub_func.getName(), capa_namespace)
if do_comments:
self.set_plate_comment(sub_func_addr)
if do_comments:
for sub_type, description in parse_node(node):
self.set_pre_comment(sub_ghidra_addr, sub_type, description)
else:
# addr is in some other file section like .data
# represent this location with a label symbol under the capa namespace
# Ex. See "Reference Base64 String" rule
for sub_type, description in parse_node(node):
# in many cases, these will be ghidra-labeled data, so just add the existing
# label symbol to the capa namespace
for sym in symbol_table.getSymbols(sub_ghidra_addr):
if sym.getSymbolType() == SymbolType.LABEL:
sym.setNamespace(capa_namespace)
self.set_pre_comment(sub_ghidra_addr, sub_type, description)
if do_namespaces:
for _sub_type, _description in parse_node(node):
# in many cases, these will be ghidra-labeled data, so just add the existing
# label symbol to the capa namespace
for sym in symbol_table.getSymbols(sub_ghidra_addr):
if sym.getSymbolType() == SymbolType.LABEL:
sym.setNamespace(capa_namespace)
if do_comments:
for sub_type, description in parse_node(node):
self.set_pre_comment(sub_ghidra_addr, sub_type, description)
def get_capabilities():
rules_dir: str = ""
try:
selected_dir = askDirectory("Choose capa rules directory", "Ok") # type: ignore [name-defined] # noqa: F821
if selected_dir:
rules_dir = selected_dir.getPath()
except RuntimeError:
# RuntimeError thrown when user selects "Cancel"
pass
rules_dir = ""
show_monitor_message(f"requesting capa {capa.version.__version__} rules directory")
selected_dir = askDirectory(f"choose capa {capa.version.__version__} rules directory", "Ok") # type: ignore [name-defined] # noqa: F821
if selected_dir:
rules_dir = selected_dir.getPath()
if not rules_dir:
logger.info("You must choose a capa rules directory before running capa.")
return "" # return empty str to avoid handling both int and str types
raise CancelledException
rules_path: pathlib.Path = pathlib.Path(rules_dir)
logger.info("running capa using rules from %s", str(rules_path))
show_monitor_message(f"loading rules from {rules_path}")
rules = capa.rules.get_rules([rules_path])
meta = capa.ghidra.helpers.collect_metadata([rules_path])
extractor = capa.features.extractors.ghidra.extractor.GhidraFeatureExtractor()
show_monitor_message("collecting binary metadata")
meta = capa.ghidra.helpers.collect_metadata([rules_path])
show_monitor_message("running capa analysis")
extractor = capa.features.extractors.ghidra.extractor.GhidraFeatureExtractor()
capabilities = capa.capabilities.common.find_capabilities(rules, extractor, True)
show_monitor_message("checking for static limitations")
if capa.capabilities.common.has_static_limitation(rules, capabilities, is_standalone=False):
popup("capa explorer encountered warnings during analysis. Please check the console output for more information.") # type: ignore [name-defined] # noqa: F821
logger.info("capa encountered warnings during analysis")
show_warn(
"capa explorer encountered warnings during analysis. Please check the console output for more information.",
)
show_monitor_message("rendering results")
return capa.render.json.render(meta, rules, capabilities.matches)
@@ -328,12 +371,12 @@ def parse_json(capa_data):
# this requires the correct delimiter used by Ghidra
# Ex. 'communication/named-pipe/create/create pipe' -> capa::communication::named-pipe::create::create-pipe
namespace_str = Namespace.DELIMITER.join(meta["namespace"].split("/"))
namespace = "capa" + Namespace.DELIMITER + namespace_str + fmt_rule
namespace = "capa_explorer" + Namespace.DELIMITER + namespace_str + fmt_rule
else:
# lib rules via the official rules repo will not contain data
# for the "namespaces" key, so format using rule itself
# Ex. 'contain loop' -> capa::lib::contain-loop
namespace = "capa" + Namespace.DELIMITER + "lib" + fmt_rule
namespace = "capa_explorer" + Namespace.DELIMITER + "lib" + fmt_rule
yield CapaMatchData(namespace, scope, rule, rule_matches, attack, mbc)
@@ -342,44 +385,79 @@ def main():
logging.basicConfig(level=logging.INFO)
logging.getLogger().setLevel(logging.INFO)
if isRunningHeadless(): # type: ignore [name-defined] # noqa: F821
logger.error("unsupported Ghidra execution mode")
return capa.main.E_UNSUPPORTED_GHIDRA_EXECUTION_MODE
choices = ["namespaces", "bookmarks", "comments"]
# use ArrayList to resolve ambiguous askChoices overloads (List vs List, List) in PyGhidra
choices_java = ArrayList()
for c in choices:
choices_java.add(c)
choice_labels = [
'add "capa_explorer" namespace for matched functions',
"add bookmarks for matched functions",
"add comments to matched functions",
]
# use ArrayList to resolve ambiguous askChoices overloads (List vs List, List) in PyGhidra
choice_labels_java = ArrayList()
for c in choice_labels:
choice_labels_java.add(c)
selected = list(askChoices("capa explorer", "select actions:", choices_java, choice_labels_java)) # type: ignore [name-defined] # noqa: F821
do_namespaces = "namespaces" in selected
do_comments = "comments" in selected
do_bookmarks = "bookmarks" in selected
if not any((do_namespaces, do_comments, do_bookmarks)):
raise CancelledException("no actions selected")
# initialize the context for the extractor/helpers
capa.features.extractors.ghidra.context.set_context(
currentProgram, # type: ignore [name-defined] # noqa: F821
FlatProgramAPI(currentProgram), # type: ignore [name-defined] # noqa: F821
monitor, # type: ignore [name-defined] # noqa: F821
)
show_monitor_message("checking supported Ghidra version")
if not capa.ghidra.helpers.is_supported_ghidra_version():
logger.error("unsupported Ghidra version")
show_error("unsupported Ghidra version")
return capa.main.E_UNSUPPORTED_GHIDRA_VERSION
show_monitor_message("checking supported file type")
if not capa.ghidra.helpers.is_supported_file_type():
logger.error("unsupported file type")
show_error("unsupported file type")
return capa.main.E_INVALID_FILE_TYPE
show_monitor_message("checking supported file architecture")
if not capa.ghidra.helpers.is_supported_arch_type():
logger.error("unsupported file architecture")
show_error("unsupported file architecture")
return capa.main.E_INVALID_FILE_ARCH
# capa_data will always contain {'meta':..., 'rules':...}
# if the 'rules' key contains no values, then there were no matches
capa_data = json.loads(get_capabilities())
if capa_data.get("rules") is None:
logger.info("capa explorer found no matches")
popup("capa explorer found no matches.") # type: ignore [name-defined] # noqa: F821
show_info("capa explorer found no matches.")
return capa.main.E_EMPTY_REPORT
show_monitor_message("processing matches")
for item in parse_json(capa_data):
item.bookmark_functions()
item.label_matches()
logger.info("capa explorer analysis complete")
popup("capa explorer analysis complete.\nPlease see results in the Bookmarks Window and Namespaces section of the Symbol Tree Window.") # type: ignore [name-defined] # noqa: F821
if do_bookmarks:
show_monitor_message("adding bookmarks")
item.bookmark_functions()
if do_namespaces or do_comments:
show_monitor_message("adding labels")
item.label_matches(do_namespaces, do_comments)
show_info("capa explorer analysis complete.")
return 0
if __name__ == "__main__":
if sys.version_info < (3, 10):
from capa.exceptions import UnsupportedRuntimeError
raise UnsupportedRuntimeError("This version of capa can only be used with Python 3.10+")
exit_code = main()
if exit_code != 0:
popup("capa explorer encountered errors during analysis. Please check the console output for more information.") # type: ignore [name-defined] # noqa: F821
sys.exit(exit_code)
try:
if main() != 0:
show_error(
"capa explorer encountered errors during analysis. Please check the console output for more information.",
)
except CancelledException:
show_info("capa explorer analysis cancelled.")

View File

@@ -96,11 +96,7 @@ def is_runtime_ida():
def is_runtime_ghidra():
try:
currentProgram # type: ignore [name-defined] # noqa: F821
except NameError:
return False
return True
return importlib.util.find_spec("ghidra") is not None
def assert_never(value) -> NoReturn:
@@ -331,6 +327,9 @@ def log_unsupported_os_error():
logger.error(" ")
logger.error(" capa currently only analyzes executables for some operating systems")
logger.error(" (including Windows, Linux, and Android).")
logger.error(" ")
logger.error(" If you know the target OS, you can specify it explicitly, for example:")
logger.error(" capa --os linux <sample>")
logger.error("-" * 80)

View File

@@ -3,7 +3,7 @@
"plugin": {
"name": "capa",
"entryPoint": "capa_explorer.py",
"version": "9.3.0",
"version": "9.3.1",
"idaVersions": ">=7.4",
"description": "Identify capabilities in executable files using FLARE's capa framework",
"license": "Apache-2.0",
@@ -12,7 +12,7 @@
"api-scripting-and-automation",
"ui-ux-and-visualization"
],
"pythonDependencies": ["flare-capa==9.3.0"],
"pythonDependencies": ["flare-capa==9.3.1"],
"urls": {
"repository": "https://github.com/mandiant/capa"
},

View File

@@ -403,7 +403,7 @@ class CapaExplorerDataModel(QtCore.QAbstractItemModel):
display += f"{statement.min}"
elif statement.min == 0:
display += f"{statement.max} or fewer"
elif statement.max == (1 << 64 - 1):
elif statement.max == ((1 << 64) - 1):
display += f"{statement.min} or more"
else:
display += f"between {statement.min} and {statement.max}"

View File

@@ -12,7 +12,6 @@
# See the License for the specific language governing permissions and
# limitations under the License.
import io
import os
import logging
import datetime
@@ -23,24 +22,13 @@ from pathlib import Path
from rich.console import Console
from typing_extensions import assert_never
import capa.perf
import capa.rules
import capa.engine
import capa.helpers
import capa.version
import capa.render.json
import capa.rules.cache
import capa.render.default
import capa.render.verbose
import capa.features.common
import capa.features.freeze as frz
import capa.render.vverbose
import capa.features.extractors
import capa.render.result_document
import capa.render.result_document as rdoc
import capa.features.extractors.common
import capa.features.extractors.base_extractor
import capa.features.extractors.cape.extractor
from capa.rules import RuleSet
from capa.engine import MatchResults
from capa.exceptions import UnsupportedOSError, UnsupportedArchError, UnsupportedFormatError
@@ -79,6 +67,7 @@ BACKEND_VMRAY = "vmray"
BACKEND_FREEZE = "freeze"
BACKEND_BINEXPORT2 = "binexport2"
BACKEND_IDA = "ida"
BACKEND_GHIDRA = "ghidra"
class CorruptFile(ValueError):
@@ -137,6 +126,57 @@ def get_meta_str(vw):
return f"{', '.join(meta)}, number of functions: {len(vw.getFunctions())}"
def _is_probably_corrupt_pe(path: Path) -> bool:
"""
Heuristic check for obviously malformed PE samples that provoke
pathological behavior in vivisect (see GH-1989).
We treat a PE as "probably corrupt" when any section declares an
unrealistically large virtual size compared to the file size, e.g.
hundreds of megabytes in a tiny file. Such cases lead vivisect to
try to map enormous regions and can exhaust CPU/memory.
"""
try:
import pefile
except Exception:
# If pefile is unavailable, fall back to existing behavior.
return False
try:
pe = pefile.PE(str(path), fast_load=True)
except pefile.PEFormatError:
# Not a PE file (or badly formed); let existing checks handle it.
return False
except Exception:
return False
try:
file_size = path.stat().st_size
except OSError:
return False
if file_size <= 0:
return False
# Flag sections whose declared virtual size is wildly disproportionate
# to the file size (e.g. 900MB section in a ~400KB sample).
_VSIZE_FILE_RATIO = 128
_MAX_REASONABLE_VSIZE = 512 * 1024 * 1024 # 512 MB
max_reasonable = max(file_size * _VSIZE_FILE_RATIO, _MAX_REASONABLE_VSIZE)
for section in getattr(pe, "sections", []):
vsize = getattr(section, "Misc_VirtualSize", 0) or 0
if vsize > max_reasonable:
logger.debug(
"detected unrealistic PE section virtual size: 0x%x (file size: 0x%x), treating as corrupt",
vsize,
file_size,
)
return True
return False
def get_workspace(path: Path, input_format: str, sigpaths: list[Path]):
"""
load the program at the given path into a vivisect workspace using the given format.
@@ -154,11 +194,18 @@ def get_workspace(path: Path, input_format: str, sigpaths: list[Path]):
"""
# lazy import enables us to not require viv if user wants another backend.
import envi.exc
import viv_utils
import viv_utils.flirt
logger.debug("generating vivisect workspace for: %s", path)
if input_format in (FORMAT_PE, FORMAT_AUTO) and _is_probably_corrupt_pe(path):
raise CorruptFile(
"PE file appears to contain unrealistically large sections and is likely corrupt"
+ " - skipping analysis to avoid excessive resource usage."
)
try:
if input_format == FORMAT_AUTO:
if not is_supported_format(path):
@@ -175,11 +222,20 @@ def get_workspace(path: Path, input_format: str, sigpaths: list[Path]):
vw = viv_utils.getShellcodeWorkspaceFromFile(str(path), arch="amd64", analyze=False)
else:
raise ValueError("unexpected format: " + input_format)
except envi.exc.SegmentationViolation as e:
raise CorruptFile(f"Invalid memory access during binary parsing: {e}") from e
except Exception as e:
# vivisect raises raw Exception instances, and we don't want
# to do a subclass check via isinstance.
if type(e) is Exception and "Couldn't convert rva" in e.args[0]:
raise CorruptFile(e.args[0]) from e
if type(e) is Exception and e.args:
error_msg = str(e.args[0])
if "Couldn't convert rva" in error_msg:
raise CorruptFile(error_msg) from e
elif "Unsupported Architecture" in error_msg:
# Extract architecture number if available
arch_info = e.args[1] if len(e.args) > 1 else "unknown"
raise CorruptFile(f"Unsupported architecture: {arch_info}") from e
raise
viv_utils.flirt.register_flirt_signature_analyzers(vw, [str(s) for s in sigpaths])
@@ -338,12 +394,24 @@ def get_extractor(
import capa.features.extractors.ida.extractor
logger.debug("idalib: opening database...")
# idalib writes to stdout (ugh), so we have to capture that
# so as not to screw up structured output.
with capa.helpers.stdout_redirector(io.BytesIO()):
with console.status("analyzing program...", spinner="dots"):
if idapro.open_database(str(input_path), run_auto_analysis=True):
raise RuntimeError("failed to analyze input file")
idapro.enable_console_messages(False)
with console.status("analyzing program...", spinner="dots"):
# we set the primary and secondary Lumina servers to 0.0.0.0 to disable Lumina,
# which sometimes provides bad names, including overwriting names from debug info.
#
# use -R to load resources, which can help us embedded PE files.
#
# return values from open_database:
# 0 - Success
# 2 - User cancelled or 32-64 bit conversion failed
# 4 - Database initialization failed
# -1 - Generic errors (database already open, auto-analysis failed, etc.)
# -2 - User cancelled operation
ret = idapro.open_database(
str(input_path), run_auto_analysis=True, args="-Olumina:host=0.0.0.0 -Osecondary_lumina:host=0.0.0.0 -R"
)
if ret != 0:
raise RuntimeError("failed to analyze input file")
logger.debug("idalib: waiting for analysis...")
ida_auto.auto_wait()
@@ -351,6 +419,69 @@ def get_extractor(
return capa.features.extractors.ida.extractor.IdaFeatureExtractor()
elif backend == BACKEND_GHIDRA:
import pyghidra
with console.status("analyzing program...", spinner="dots"):
if not pyghidra.started():
pyghidra.start()
import capa.ghidra.helpers
if not capa.ghidra.helpers.is_supported_ghidra_version():
raise RuntimeError("unsupported Ghidra version")
import tempfile
tmpdir = tempfile.TemporaryDirectory()
project_cm = pyghidra.open_project(tmpdir.name, "CapaProject", create=True)
project = project_cm.__enter__()
try:
from ghidra.util.task import TaskMonitor
monitor = TaskMonitor.DUMMY
# Import file
loader = pyghidra.program_loader().project(project).source(str(input_path)).name(input_path.name)
with loader.load() as load_results:
load_results.save(monitor)
# Open program
program, consumer = pyghidra.consume_program(project, "/" + input_path.name)
# Analyze
pyghidra.analyze(program, monitor)
from ghidra.program.flatapi import FlatProgramAPI
flat_api = FlatProgramAPI(program)
import capa.features.extractors.ghidra.context as ghidra_context
ghidra_context.set_context(program, flat_api, monitor)
# Wrapper to handle cleanup of program (consumer) and project
class GhidraContextWrapper:
def __init__(self, project_cm, program, consumer):
self.project_cm = project_cm
self.program = program
self.consumer = consumer
def __exit__(self, exc_type, exc_val, exc_tb):
self.program.release(self.consumer)
self.project_cm.__exit__(exc_type, exc_val, exc_tb)
cm = GhidraContextWrapper(project_cm, program, consumer)
except Exception:
project_cm.__exit__(None, None, None)
tmpdir.cleanup()
raise
import capa.features.extractors.ghidra.extractor
return capa.features.extractors.ghidra.extractor.GhidraFeatureExtractor(ctx_manager=cm, tmpdir=tmpdir)
else:
raise ValueError("unexpected backend: " + backend)

View File

@@ -55,6 +55,7 @@ from capa.loader import (
BACKEND_VMRAY,
BACKEND_DOTNET,
BACKEND_FREEZE,
BACKEND_GHIDRA,
BACKEND_PEFILE,
BACKEND_DRAKVUF,
BACKEND_BINEXPORT2,
@@ -298,6 +299,7 @@ def install_common_args(parser, wanted=None):
(BACKEND_BINJA, "Binary Ninja"),
(BACKEND_DOTNET, ".NET"),
(BACKEND_BINEXPORT2, "BinExport2"),
(BACKEND_GHIDRA, "Ghidra"),
(BACKEND_FREEZE, "capa freeze"),
(BACKEND_CAPE, "CAPE"),
(BACKEND_DRAKVUF, "DRAKVUF"),
@@ -392,6 +394,7 @@ class ShouldExitError(Exception):
"""raised when a main-related routine indicates the program should exit."""
def __init__(self, status_code: int):
super().__init__(status_code)
self.status_code = status_code
@@ -658,7 +661,9 @@ def get_rules_from_cli(args) -> RuleSet:
raises:
ShouldExitError: if the program is invoked incorrectly and should exit.
"""
enable_cache: bool = True
enable_cache: bool = getattr(args, "enable_cache", True)
# this allows calling functions to easily disable rule caching, e.g., used by the rule linter to avoid
try:
if capa.helpers.is_running_standalone() and args.is_default_rules:
cache_dir = get_default_root() / "cache"
@@ -940,8 +945,7 @@ def main(argv: Optional[list[str]] = None):
argv = sys.argv[1:]
desc = "The FLARE team's open-source tool to identify capabilities in executable files."
epilog = textwrap.dedent(
"""
epilog = textwrap.dedent("""
By default, capa uses a default set of embedded rules.
You can see the rule set here:
https://github.com/mandiant/capa-rules
@@ -968,8 +972,7 @@ def main(argv: Optional[list[str]] = None):
filter rules by meta fields, e.g. rule name or namespace
capa -t "create TCP socket" suspicious.exe
"""
)
""")
parser = argparse.ArgumentParser(
description=desc, epilog=epilog, formatter_class=argparse.RawDescriptionHelpFormatter
@@ -1104,14 +1107,26 @@ def ida_main():
def ghidra_main():
from ghidra.program.flatapi import FlatProgramAPI
import capa.rules
import capa.ghidra.helpers
import capa.render.default
import capa.features.extractors.ghidra.context
import capa.features.extractors.ghidra.extractor
logging.basicConfig(level=logging.INFO)
logging.getLogger().setLevel(logging.INFO)
# These are provided by the Ghidra scripting environment
# but are not available when running standard python
# so we have to ignore the linting errors
program = currentProgram # type: ignore [name-defined] # noqa: F821
monitor_ = monitor # type: ignore [name-defined] # noqa: F821
flat_api = FlatProgramAPI(program)
capa.features.extractors.ghidra.context.set_context(program, flat_api, monitor_)
logger.debug("-" * 80)
logger.debug(" Using default embedded rules.")
logger.debug(" ")

View File

@@ -31,6 +31,7 @@ $ protoc.exe --python_out=. --mypy_out=. <path_to_proto> (e.g. capa/render/proto
Alternatively, --pyi_out=. can be used to generate a Python Interface file that supports development
"""
import datetime
from typing import Any, Union

View File

@@ -17,6 +17,7 @@ import io
from typing import Union, Iterator, Optional
import rich.console
from rich.markup import escape
from rich.progress import Text
import capa.render.result_document as rd
@@ -24,21 +25,21 @@ import capa.render.result_document as rd
def bold(s: str) -> Text:
"""draw attention to the given string"""
return Text.from_markup(f"[cyan]{s}")
return Text.from_markup(f"[cyan]{escape(s)}")
def bold2(s: str) -> Text:
"""draw attention to the given string, within a `bold` section"""
return Text.from_markup(f"[green]{s}")
return Text.from_markup(f"[green]{escape(s)}")
def mute(s: str) -> Text:
"""draw attention away from the given string"""
return Text.from_markup(f"[dim]{s}")
return Text.from_markup(f"[dim]{escape(s)}")
def warn(s: str) -> Text:
return Text.from_markup(f"[yellow]{s}")
return Text.from_markup(f"[yellow]{escape(s)}")
def format_parts_id(data: Union[rd.AttackSpec, rd.MBCSpec]):

View File

@@ -159,9 +159,8 @@ def render_call(layout: rd.DynamicLayout, addr: frz.Address) -> str:
s.append(f"){rest}")
newline = "\n"
return (
f"{pname}{{pid:{call.thread.process.pid},tid:{call.thread.tid},call:{call.id}}}\n{rutils.mute(newline.join(s))}"
)
# Use default (non-dim) styling for API details so they remain readable in -vv output
return f"{pname}{{pid:{call.thread.process.pid},tid:{call.thread.tid},call:{call.id}}}\n{newline.join(s)}"
def render_short_call(layout: rd.DynamicLayout, addr: frz.Address) -> str:
@@ -180,7 +179,8 @@ def render_short_call(layout: rd.DynamicLayout, addr: frz.Address) -> str:
s.append(f"){rest}")
newline = "\n"
return f"call:{call.id}\n{rutils.mute(newline.join(s))}"
# Use default (non-dim) styling for API details so they remain readable in -vv output
return f"call:{call.id}\n{newline.join(s)}"
def render_static_meta(console: Console, meta: rd.StaticMetadata):

View File

@@ -172,7 +172,7 @@ def render_statement(console: Console, layout: rd.Layout, match: rd.Match, state
console.write(f"{statement.min}")
elif statement.min == 0:
console.write(f"{statement.max} or fewer")
elif statement.max == (1 << 64 - 1):
elif statement.max == ((1 << 64) - 1):
console.write(f"{statement.min} or more")
else:
console.write(f"between {statement.min} and {statement.max}")

View File

@@ -274,12 +274,8 @@ SUPPORTED_FEATURES[Scope.FUNCTION].update(SUPPORTED_FEATURES[Scope.BASIC_BLOCK])
class InvalidRule(ValueError):
def __init__(self, msg):
super().__init__()
self.msg = msg
def __str__(self):
return f"invalid rule: {self.msg}"
return f"invalid rule: {super().__str__()}"
def __repr__(self):
return str(self)
@@ -289,20 +285,15 @@ class InvalidRuleWithPath(InvalidRule):
def __init__(self, path, msg):
super().__init__(msg)
self.path = path
self.msg = msg
self.__cause__ = None
def __str__(self):
return f"invalid rule: {self.path}: {self.msg}"
return f"invalid rule: {self.path}: {super(InvalidRule, self).__str__()}"
class InvalidRuleSet(ValueError):
def __init__(self, msg):
super().__init__()
self.msg = msg
def __str__(self):
return f"invalid rule set: {self.msg}"
return f"invalid rule set: {super().__str__()}"
def __repr__(self):
return str(self)
@@ -1102,15 +1093,15 @@ class Rule:
@lru_cache()
def _get_yaml_loader():
try:
# prefer to use CLoader to be fast, see #306
# prefer to use CLoader to be fast, see #306 / CSafeLoader is the same as CLoader but with safe loading
# on Linux, make sure you install libyaml-dev or similar
# on Windows, get WHLs from pyyaml.org/pypi
logger.debug("using libyaml CLoader.")
return yaml.CLoader
logger.debug("using libyaml CSafeLoader.")
return yaml.CSafeLoader
except Exception:
logger.debug("unable to import libyaml CLoader, falling back to Python yaml parser.")
logger.debug("unable to import libyaml CSafeLoader, falling back to Python yaml parser.")
logger.debug("this will be slower to load rules.")
return yaml.Loader
return yaml.SafeLoader
@staticmethod
def _get_ruamel_yaml_parser():
@@ -1152,6 +1143,8 @@ class Rule:
else:
# use pyyaml because it can be much faster than ruamel (pure python)
doc = yaml.load(s, Loader=cls._get_yaml_loader())
if doc is None or not isinstance(doc, dict) or "rule" not in doc:
raise InvalidRule("empty or invalid YAML document")
return cls.from_dict(doc, s)
@classmethod
@@ -1456,6 +1449,13 @@ class RuleSet:
scope: self._index_rules_by_feature(scope, self.rules_by_scope[scope], scores_by_rule) for scope in scopes
}
# Pre-compute the topological index mapping for each scope.
# This avoids rebuilding the dict on every call to _match (which runs once per
# instruction/basic-block/function/file scope, i.e. potentially millions of times).
self._rule_index_by_scope: dict[Scope, dict[str, int]] = {
scope: {rule.name: i for i, rule in enumerate(self.rules_by_scope[scope])} for scope in scopes
}
@property
def file_rules(self):
return self.rules_by_scope[Scope.FILE]
@@ -1885,11 +1885,13 @@ class RuleSet:
"""
done = []
# use a queue of rules, because we'll be modifying the list (appending new items) as we go.
while rules:
rule = rules.pop(0)
# use a list as a stack: append new items and pop() from the end, both O(1).
# order doesn't matter here since every rule in the queue is processed eventually.
rules_stack = list(rules)
while rules_stack:
rule = rules_stack.pop()
for subscope_rule in rule.extract_subscope_rules():
rules.append(subscope_rule)
rules_stack.append(subscope_rule)
done.append(rule)
return done
@@ -1938,11 +1940,11 @@ class RuleSet:
"""
feature_index: RuleSet._RuleFeatureIndex = self._feature_indexes_by_scopes[scope]
rules: list[Rule] = self.rules_by_scope[scope]
# Topologic location of rule given its name.
# That is, rules with a lower index should be evaluated first, since their dependencies
# will be evaluated later.
rule_index_by_rule_name = {rule.name: i for i, rule in enumerate(rules)}
# Pre-computed in __init__ to avoid rebuilding on every _match call.
rule_index_by_rule_name = self._rule_index_by_scope[scope]
# This algorithm is optimized to evaluate as few rules as possible,
# because the less work we do, the faster capa can run.
@@ -2038,7 +2040,9 @@ class RuleSet:
candidate_rules = [self.rules[name] for name in candidate_rule_names]
# Order rules topologically, so that rules with dependencies work correctly.
# Sort descending so pop() from the end yields the topologically-first rule in O(1).
RuleSet._sort_rules_by_index(rule_index_by_rule_name, candidate_rules)
candidate_rules.reverse()
#
# The following is derived from ceng.match
@@ -2053,7 +2057,7 @@ class RuleSet:
augmented_features = features
while candidate_rules:
rule = candidate_rules.pop(0)
rule = candidate_rules.pop()
res = rule.evaluate(augmented_features, short_circuit=True)
if res:
# we first matched the rule with short circuiting enabled.
@@ -2092,6 +2096,7 @@ class RuleSet:
candidate_rule_names.update(new_candidates)
candidate_rules.extend([self.rules[rule_name] for rule_name in new_candidates])
RuleSet._sort_rules_by_index(rule_index_by_rule_name, candidate_rules)
candidate_rules.reverse()
return (augmented_features, results)
@@ -2228,7 +2233,10 @@ def get_rules(
try:
rule = Rule.from_yaml(content.decode("utf-8"))
except InvalidRule:
except InvalidRule as e:
if e.args and e.args[0] == "empty or invalid YAML document":
logger.warning("skipping %s: %s", path, e)
continue
raise
else:
rule.meta["capa/path"] = path.as_posix()

View File

@@ -59,7 +59,7 @@ def get_default_cache_directory() -> Path:
# MacOS: ~/Library/Caches/capa
# ref: https://stackoverflow.com/a/8220141/87207
if sys.platform == "linux" or sys.platform == "linux2":
if sys.platform.startswith(("linux", "freebsd", "openbsd", "netbsd", "dragonfly")):
directory = Path(os.environ.get("XDG_CACHE_HOME", Path.home() / ".cache" / "capa"))
elif sys.platform == "darwin":
directory = Path.home() / "Library" / "Caches" / "capa"

View File

@@ -12,7 +12,7 @@
# See the License for the specific language governing permissions and
# limitations under the License.
__version__ = "9.3.0"
__version__ = "9.3.1"
def get_major_version():

Binary file not shown.

Before

Width:  |  Height:  |  Size: 210 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 108 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 110 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 79 KiB

View File

@@ -2,6 +2,21 @@
See `capa -h` for all supported arguments and usage examples.
## Ways to consume capa output
| Method | Output / interface | Typical use |
|--------|--------------------|-------------|
| **CLI** | Text (default, `-v`, `-vv`), JSON (`-j`), or other formats | Scripting, CI, one-off analysis |
| [**IDA Pro**](https://github.com/mandiant/capa/tree/master/capa/ida/plugin) | capa Explorer plugin inside IDA | Interactive analysis with jump-to-address |
| [**Ghidra**](https://github.com/mandiant/capa/tree/master/capa/ghidra/plugin) | capa Explorer plugin inside Ghidra | Interactive analysis with Ghidra integration |
| [**Binary Ninja**](https://github.com/mandiant/capa/tree/master/capa/features/extractors/binja) | capa run using Binary Ninja as the analysis backend | Interactive analysis with Binary Ninja integration |
| [**Dynamic (Sandbox)**](https://www.mandiant.com/resources/blog/dynamic-capa-executable-behavior-cape-sandbox) | capa run on dynamic sandbox report (CAPE, VMRay, etc.) | Dynamic analysis of sandbox output |
| [**Web (capa Explorer)**](https://mandiant.github.io/capa/explorer/) | Web UI (upload JSON or load from URL) | Sharing results, viewing from VirusTotal or similar |
## Default vs verbose output
By default, capa shows only *top-level* rule matches: capabilities that are not already implied by another displayed rule. For example, if a rule "persist via Run registry key" matches and it *contains* a match for "set registry value", the default output lists only "persist via Run registry key". This keeps the default output short while still reflecting all detected capabilities at the top level. Use **`-v`** to see all rule matches, including nested ones. Use **`-vv`** for an even more detailed view that shows how each rule matched.
## tips and tricks
### only run selected rules
@@ -11,7 +26,7 @@ For example, `capa -t william.ballenthin@mandiant.com` runs rules that reference
### only analyze selected functions
Use the `--restrict-to-functions` option to extract capabilities from only a selected set of functions. This is useful for analyzing
large functions and figuring out their capabilities and their address of occurance; for example: PEB access, RC4 encryption, etc.
large functions and figuring out their capabilities and their address of occurrence; for example: PEB access, RC4 encryption, etc.
To use this, you can copy the virtual addresses from your favorite disassembler and pass them to capa as follows:
`capa sample.exe --restrict-to-functions 0x4019C0,0x401CD0`. If you add the `-v` option then capa will extract the interesting parts of a function for you.

View File

@@ -74,6 +74,7 @@ dependencies = [
# comments and context.
"pyyaml>=6",
"colorama>=0.4",
"ida-netnode>=3.0",
"ida-settings>=3.1.0",
"ruamel.yaml>=0.18",
"pefile>=2023.2.7",
@@ -108,6 +109,13 @@ dependencies = [
]
dynamic = ["version"]
[tool.pytest.ini_options]
filterwarnings = [
"ignore:builtin type SwigPyPacked has no __module__ attribute:DeprecationWarning",
"ignore:builtin type SwigPyObject has no __module__ attribute:DeprecationWarning",
"ignore:builtin type swigvarlink has no __module__ attribute:DeprecationWarning",
]
[tool.setuptools.dynamic]
version = {attr = "capa.version.__version__"}
@@ -121,55 +129,58 @@ dev = [
# we want all developer environments to be consistent.
# These dependencies are not used in production environments
# and should not conflict with other libraries/tooling.
"pre-commit==4.2.0",
"pytest==8.0.0",
"pre-commit==4.5.0",
"pytest==9.0.2",
"pytest-sugar==1.1.1",
"pytest-instafail==0.5.0",
"flake8==7.3.0",
"flake8-bugbear==24.12.12",
"flake8-bugbear==25.11.29",
"flake8-encodings==0.5.1",
"flake8-comprehensions==3.16.0",
"flake8-comprehensions==3.17.0",
"flake8-logging-format==0.9.0",
"flake8-no-implicit-concat==0.3.5",
"flake8-print==5.0.0",
"flake8-todos==0.3.1",
"flake8-simplify==0.22.0",
"flake8-simplify==0.30.0",
"flake8-use-pathlib==0.3.0",
"flake8-copyright==0.2.4",
"ruff==0.12.0",
"black==25.1.0",
"isort==6.0.0",
"mypy==1.17.1",
"mypy-protobuf==3.6.0",
"PyGithub==2.6.0",
"ruff==0.15.0",
"black==26.3.0",
"isort==8.0.0",
"mypy==1.19.1",
"mypy-protobuf==5.0.0",
"PyGithub==2.8.1",
"bump-my-version==1.2.4",
# type stubs for mypy
"types-backports==0.1.3",
"types-colorama==0.4.15.11",
"types-PyYAML==6.0.8",
"types-psutil==7.0.0.20250218",
"types-psutil==7.2.0.20251228",
"types_requests==2.32.0.20240712",
"types-protobuf==6.32.1.20250918",
"deptry==0.23.0"
"deptry==0.24.0"
]
build = [
# Dev and build dependencies are not relaxed because
# we want all developer environments to be consistent.
# These dependencies are not used in production environments
# and should not conflict with other libraries/tooling.
"pyinstaller==6.14.1",
"setuptools==80.9.0",
"build==1.3.0"
"pyinstaller==6.19.0",
"setuptools==80.10.1",
"build==1.4.0"
]
scripts = [
# can (optionally) be more lenient on dependencies here
# see comment on dependencies for more context
"jschema_to_python==1.2.3",
"psutil==7.1.2",
"psutil==7.2.1",
"stix2==3.0.1",
"sarif_om==1.0.4",
"requests>=2.32.4",
]
ghidra = [
"pyghidra>=3.0.0",
]
[tool.deptry]
extend_exclude = [

View File

@@ -10,39 +10,40 @@ annotated-types==0.7.0
colorama==0.4.6
cxxfilt==0.3.0
dncil==1.0.2
dnfile==0.17.0
dnfile==0.18.0
funcy==2.0
humanize==4.13.0
humanize==4.15.0
ida-netnode==3.0
ida-settings==3.2.2
intervaltree==3.1.0
intervaltree==3.2.1
markdown-it-py==4.0.0
mdurl==0.1.2
msgpack==1.0.8
networkx==3.4.2
pefile==2024.8.26
pip==25.3
protobuf==6.31.1
pip==26.0
protobuf==7.34.0
pyasn1==0.5.1
pyasn1-modules==0.3.0
pycparser==2.22
pycparser==3.0
pydantic==2.12.4
# pydantic pins pydantic-core,
# but dependabot updates these separately (which is broken) and is annoying,
# so we rely on pydantic to pull in the right version of pydantic-core.
# pydantic-core==2.23.4
xmltodict==0.14.2
xmltodict==1.0.2
pyelftools==0.32
pygments==2.19.1
pyghidra==3.0.0
python-flirt==0.9.2
pyyaml==6.0.2
rich==14.2.0
ruamel-yaml==0.18.6
rich==14.3.2
ruamel-yaml==0.19.1
ruamel-yaml-clib==0.2.14
setuptools==80.9.0
setuptools==80.10.1
six==1.17.0
sortedcontainers==2.4.0
viv-utils==0.8.0
vivisect==1.2.1
msgspec==0.19.0
vivisect==1.3.0
msgspec==0.20.0
bump-my-version==1.2.4

2
rules

Submodule rules updated: b0b486fe0c...03a20f69ae

View File

@@ -61,6 +61,7 @@ usage:
parallelism factor
--no-mp disable subprocesses
"""
import sys
import json
import logging

View File

@@ -28,6 +28,7 @@ Requires:
- sarif_om 1.0.4
- jschema_to_python 1.2.3
"""
import sys
import json
import logging

View File

@@ -32,6 +32,7 @@ Example:
│00000070│ 39 31 37 36 61 64 36 38 ┊ 32 66 66 64 64 36 35 66 │9176ad68┊2ffdd65f│
│00000080│ 30 61 36 36 39 12 28 61 ┊ 34 62 33 35 64 65 37 31 │0a669•(a┊4b35de71│
"""
import sys
import logging
import argparse

View File

@@ -18,6 +18,7 @@ detect-elf-os
Attempt to detect the underlying OS that the given ELF file targets.
"""
import sys
import logging
import argparse

View File

@@ -36,6 +36,7 @@ Check the log window for any errors, and/or the summary of changes.
Derived from: https://github.com/mandiant/capa/blob/master/scripts/import-to-ida.py
"""
import os
import json
from pathlib import Path

View File

@@ -1229,6 +1229,7 @@ def main(argv=None):
time0 = time.time()
args.enable_cache = False
try:
rules = capa.main.get_rules_from_cli(args)
except capa.main.ShouldExitError as e:

View File

@@ -54,6 +54,7 @@ Example::
0x44cb60: ?
0x44cba0: __guard_icall_checks_enforced
"""
import sys
import logging
import argparse

View File

@@ -16,6 +16,7 @@
"""
Extract files relevant to capa analysis from VMRay Analysis Archive and create a new ZIP file.
"""
import sys
import logging
import zipfile

View File

@@ -43,6 +43,7 @@ example:
^^^ --label or git hash
"""
import sys
import timeit
import logging

View File

@@ -34,6 +34,7 @@ Example:
│00000080│ 30 61 36 36 39 12 28 61 ┊ 34 62 33 35 64 65 37 31 │0a669•(a┊4b35de71│
"""
import sys
import logging
import argparse

View File

@@ -37,6 +37,7 @@ Example:
────┴────────────────────────────────────────────────────
"""
import sys
import logging
import argparse

View File

@@ -46,6 +46,7 @@ Example:
2022-01-24 22:35:39,839 [INFO] Starting extraction...
2022-01-24 22:35:42,632 [INFO] Writing results to linter-data.json
"""
import json
import logging
import argparse

View File

@@ -54,6 +54,7 @@ Example::
- connect TCP socket
...
"""
import sys
import logging
import argparse

View File

@@ -70,6 +70,7 @@ Example::
insn: 0x10001027: mnemonic(shl)
...
"""
import sys
import logging
import argparse

View File

@@ -12,7 +12,7 @@
# See the License for the specific language governing permissions and
# limitations under the License.
import logging
import contextlib
import collections
from pathlib import Path
@@ -20,7 +20,7 @@ from functools import lru_cache
import pytest
import capa.main
import capa.loader
import capa.features.file
import capa.features.insn
import capa.features.common
@@ -53,6 +53,7 @@ from capa.features.extractors.base_extractor import (
)
from capa.features.extractors.dnfile.extractor import DnfileFeatureExtractor
logger = logging.getLogger(__name__)
CD = Path(__file__).resolve().parent
DOTNET_DIR = CD / "data" / "dotnet"
DNFILE_TESTFILES = DOTNET_DIR / "dnfile-testfiles"
@@ -200,6 +201,73 @@ def get_binja_extractor(path: Path):
return extractor
# we can't easily cache this because the extractor relies on global state (the opened database)
# which also has to be closed elsewhere. so, the idalib tests will just take a little bit to run.
def get_idalib_extractor(path: Path):
import capa.features.extractors.ida.idalib as idalib
if not idalib.has_idalib():
raise RuntimeError("cannot find IDA idalib module.")
if not idalib.load_idalib():
raise RuntimeError("failed to load IDA idalib module.")
import idapro
import ida_auto
import capa.features.extractors.ida.extractor
logger.debug("idalib: opening database...")
idapro.enable_console_messages(False)
# we set the primary and secondary Lumina servers to 0.0.0.0 to disable Lumina,
# which sometimes provides bad names, including overwriting names from debug info.
#
# use -R to load resources, which can help us embedded PE files.
#
# return values from open_database:
# 0 - Success
# 2 - User cancelled or 32-64 bit conversion failed
# 4 - Database initialization failed
# -1 - Generic errors (database already open, auto-analysis failed, etc.)
# -2 - User cancelled operation
ret = idapro.open_database(
str(path), run_auto_analysis=True, args="-Olumina:host=0.0.0.0 -Osecondary_lumina:host=0.0.0.0 -R"
)
if ret != 0:
raise RuntimeError("failed to analyze input file")
logger.debug("idalib: waiting for analysis...")
ida_auto.auto_wait()
logger.debug("idalib: opened database.")
extractor = capa.features.extractors.ida.extractor.IdaFeatureExtractor()
fixup_idalib(path, extractor)
return extractor
def fixup_idalib(path: Path, extractor):
"""
IDA fixups to overcome differences between backends
"""
import idaapi
import ida_funcs
def remove_library_id_flag(fva):
f = idaapi.get_func(fva)
f.flags &= ~ida_funcs.FUNC_LIB
ida_funcs.update_func(f)
if "kernel32-64" in path.name:
# remove (correct) library function id, so we can test x64 thunk
remove_library_id_flag(0x1800202B0)
if "al-khaser_x64" in path.name:
# remove (correct) library function id, so we can test x64 nested thunk
remove_library_id_flag(0x14004B4F0)
@lru_cache(maxsize=1)
def get_cape_extractor(path):
from capa.helpers import load_json_from_path
@@ -227,13 +295,33 @@ def get_vmray_extractor(path):
return VMRayExtractor.from_zipfile(path)
@lru_cache(maxsize=1)
GHIDRA_CACHE: dict[Path, tuple] = {}
def get_ghidra_extractor(path: Path):
# we need to start PyGhidra before importing the extractor
# because the extractor imports Ghidra modules that are only available after PyGhidra is started
import pyghidra
if not pyghidra.started():
pyghidra.start()
import capa.features.extractors.ghidra.context
import capa.features.extractors.ghidra.extractor
extractor = capa.features.extractors.ghidra.extractor.GhidraFeatureExtractor()
setattr(extractor, "path", path.as_posix())
if path in GHIDRA_CACHE:
extractor, program, flat_api, monitor = GHIDRA_CACHE[path]
capa.features.extractors.ghidra.context.set_context(program, flat_api, monitor)
return extractor
# We use a larger cache size to avoid re-opening the same file multiple times
# which is very slow with Ghidra.
extractor = capa.loader.get_extractor(
path, FORMAT_AUTO, OS_AUTO, capa.loader.BACKEND_GHIDRA, [], disable_progress=True
)
ctx = capa.features.extractors.ghidra.context.get_context()
GHIDRA_CACHE[path] = (extractor, ctx.program, ctx.flat_api, ctx.monitor)
return extractor
@@ -894,20 +982,8 @@ FEATURE_PRESENCE_TESTS = sorted(
("mimikatz", "function=0x4556E5", capa.features.insn.API("advapi32.LsaQueryInformationPolicy"), False),
("mimikatz", "function=0x4556E5", capa.features.insn.API("LsaQueryInformationPolicy"), True),
# insn/api: x64
(
"kernel32-64",
"function=0x180001010",
capa.features.insn.API("RtlVirtualUnwind"),
True,
),
("kernel32-64", "function=0x180001010", capa.features.insn.API("RtlVirtualUnwind"), True),
# insn/api: x64 thunk
(
"kernel32-64",
"function=0x1800202B0",
capa.features.insn.API("RtlCaptureContext"),
True,
),
("kernel32-64", "function=0x1800202B0", capa.features.insn.API("RtlCaptureContext"), True),
# insn/api: x64 nested thunk
("al-khaser x64", "function=0x14004B4F0", capa.features.insn.API("__vcrt_GetModuleHandle"), True),
@@ -995,20 +1071,20 @@ FEATURE_PRESENCE_TESTS = sorted(
("pma16-01", "file", OS(OS_WINDOWS), True),
("pma16-01", "file", OS(OS_LINUX), False),
("mimikatz", "file", OS(OS_WINDOWS), True),
("pma16-01", "function=0x404356", OS(OS_WINDOWS), True),
("pma16-01", "function=0x404356,bb=0x4043B9", OS(OS_WINDOWS), True),
("pma16-01", "function=0x401100", OS(OS_WINDOWS), True),
("pma16-01", "function=0x401100,bb=0x401130", OS(OS_WINDOWS), True),
("mimikatz", "function=0x40105D", OS(OS_WINDOWS), True),
("pma16-01", "file", Arch(ARCH_I386), True),
("pma16-01", "file", Arch(ARCH_AMD64), False),
("mimikatz", "file", Arch(ARCH_I386), True),
("pma16-01", "function=0x404356", Arch(ARCH_I386), True),
("pma16-01", "function=0x404356,bb=0x4043B9", Arch(ARCH_I386), True),
("pma16-01", "function=0x401100", Arch(ARCH_I386), True),
("pma16-01", "function=0x401100,bb=0x401130", Arch(ARCH_I386), True),
("mimikatz", "function=0x40105D", Arch(ARCH_I386), True),
("pma16-01", "file", Format(FORMAT_PE), True),
("pma16-01", "file", Format(FORMAT_ELF), False),
("mimikatz", "file", Format(FORMAT_PE), True),
# format is also a global feature
("pma16-01", "function=0x404356", Format(FORMAT_PE), True),
("pma16-01", "function=0x401100", Format(FORMAT_PE), True),
("mimikatz", "function=0x456BB9", Format(FORMAT_PE), True),
# elf support
("7351f.elf", "file", OS(OS_LINUX), True),

View File

@@ -458,9 +458,7 @@ def test_pattern_parsing():
capture="#int",
)
assert (
BinExport2InstructionPatternMatcher.from_str(
"""
assert BinExport2InstructionPatternMatcher.from_str("""
# comment
br reg
br reg(not-stack)
@@ -481,10 +479,7 @@ def test_pattern_parsing():
call [reg * #int + #int]
call [reg + reg + #int]
call [reg + #int]
"""
).queries
is not None
)
""").queries is not None
def match_address(extractor: BinExport2FeatureExtractor, queries: BinExport2InstructionPatternMatcher, address: int):
@@ -507,8 +502,7 @@ def match_address_with_be2(
def test_pattern_matching():
queries = BinExport2InstructionPatternMatcher.from_str(
"""
queries = BinExport2InstructionPatternMatcher.from_str("""
br reg(stack) ; capture reg
br reg(not-stack) ; capture reg
mov reg0, reg1 ; capture reg0
@@ -522,8 +516,7 @@ def test_pattern_matching():
ldp|stp reg, reg, [reg, #int]! ; capture #int
ldp|stp reg, reg, [reg], #int ; capture #int
ldrb reg0, [reg1(not-stack), reg2] ; capture reg2
"""
)
""")
# 0x210184: ldrb w2, [x0, x1]
# query: ldrb reg0, [reg1(not-stack), reg2] ; capture reg2"
@@ -550,11 +543,9 @@ BE2_EXTRACTOR_687 = fixtures.get_binexport_extractor(
def test_pattern_matching_exclamation():
queries = BinExport2InstructionPatternMatcher.from_str(
"""
queries = BinExport2InstructionPatternMatcher.from_str("""
stp reg, reg, [reg, #int]! ; capture #int
"""
)
""")
# note this captures the sp
# 0x107918: stp x20, x19, [sp,0xFFFFFFFFFFFFFFE0]!
@@ -564,11 +555,9 @@ def test_pattern_matching_exclamation():
def test_pattern_matching_stack():
queries = BinExport2InstructionPatternMatcher.from_str(
"""
queries = BinExport2InstructionPatternMatcher.from_str("""
stp reg, reg, [reg(stack), #int]! ; capture #int
"""
)
""")
# note this does capture the sp
# compare this with the test above (exclamation)
@@ -579,11 +568,9 @@ def test_pattern_matching_stack():
def test_pattern_matching_not_stack():
queries = BinExport2InstructionPatternMatcher.from_str(
"""
queries = BinExport2InstructionPatternMatcher.from_str("""
stp reg, reg, [reg(not-stack), #int]! ; capture #int
"""
)
""")
# note this does not capture the sp
# compare this with the test above (exclamation)
@@ -597,11 +584,9 @@ BE2_EXTRACTOR_MIMI = fixtures.get_binexport_extractor(CD / "data" / "binexport2"
def test_pattern_matching_x86():
queries = BinExport2InstructionPatternMatcher.from_str(
"""
queries = BinExport2InstructionPatternMatcher.from_str("""
cmp|lea reg, [reg(not-stack) + #int0] ; capture #int0
"""
)
""")
# 0x4018c0: LEA ECX, [EBX+0x2]
# query: cmp|lea reg, [reg(not-stack) + #int0] ; capture #int0

View File

@@ -70,4 +70,4 @@ def test_standalone_binja_backend():
@pytest.mark.skipif(binja_present is False, reason="Skip binja tests if the binaryninja Python API is not installed")
def test_binja_version():
version = binaryninja.core_version_info()
assert version.major == 5 and version.minor == 1
assert version.major == 5 and version.minor == 2

View File

@@ -23,9 +23,7 @@ def test_match_across_scopes_file_function(z9324d_extractor):
rules = capa.rules.RuleSet(
[
# this rule should match on a function (0x4073F0)
capa.rules.Rule.from_yaml(
textwrap.dedent(
"""
capa.rules.Rule.from_yaml(textwrap.dedent("""
rule:
meta:
name: install service
@@ -39,13 +37,9 @@ def test_match_across_scopes_file_function(z9324d_extractor):
- api: advapi32.OpenSCManagerA
- api: advapi32.CreateServiceA
- api: advapi32.StartServiceA
"""
)
),
""")),
# this rule should match on a file feature
capa.rules.Rule.from_yaml(
textwrap.dedent(
"""
capa.rules.Rule.from_yaml(textwrap.dedent("""
rule:
meta:
name: .text section
@@ -56,15 +50,11 @@ def test_match_across_scopes_file_function(z9324d_extractor):
- 9324d1a8ae37a36ae560c37448c9705a
features:
- section: .text
"""
)
),
""")),
# this rule should match on earlier rule matches:
# - install service, with function scope
# - .text section, with file scope
capa.rules.Rule.from_yaml(
textwrap.dedent(
"""
capa.rules.Rule.from_yaml(textwrap.dedent("""
rule:
meta:
name: .text section and install service
@@ -77,9 +67,7 @@ def test_match_across_scopes_file_function(z9324d_extractor):
- and:
- match: install service
- match: .text section
"""
)
),
""")),
]
)
capabilities = capa.capabilities.common.find_capabilities(rules, z9324d_extractor)
@@ -92,9 +80,7 @@ def test_match_across_scopes(z9324d_extractor):
rules = capa.rules.RuleSet(
[
# this rule should match on a basic block (including at least 0x403685)
capa.rules.Rule.from_yaml(
textwrap.dedent(
"""
capa.rules.Rule.from_yaml(textwrap.dedent("""
rule:
meta:
name: tight loop
@@ -105,14 +91,10 @@ def test_match_across_scopes(z9324d_extractor):
- 9324d1a8ae37a36ae560c37448c9705a:0x403685
features:
- characteristic: tight loop
"""
)
),
""")),
# this rule should match on a function (0x403660)
# based on API, as well as prior basic block rule match
capa.rules.Rule.from_yaml(
textwrap.dedent(
"""
capa.rules.Rule.from_yaml(textwrap.dedent("""
rule:
meta:
name: kill thread loop
@@ -126,13 +108,9 @@ def test_match_across_scopes(z9324d_extractor):
- api: kernel32.TerminateThread
- api: kernel32.CloseHandle
- match: tight loop
"""
)
),
""")),
# this rule should match on a file feature and a prior function rule match
capa.rules.Rule.from_yaml(
textwrap.dedent(
"""
capa.rules.Rule.from_yaml(textwrap.dedent("""
rule:
meta:
name: kill thread program
@@ -145,9 +123,7 @@ def test_match_across_scopes(z9324d_extractor):
- and:
- section: .text
- match: kill thread loop
"""
)
),
""")),
]
)
capabilities = capa.capabilities.common.find_capabilities(rules, z9324d_extractor)
@@ -157,11 +133,7 @@ def test_match_across_scopes(z9324d_extractor):
def test_subscope_bb_rules(z9324d_extractor):
rules = capa.rules.RuleSet(
[
capa.rules.Rule.from_yaml(
textwrap.dedent(
"""
rules = capa.rules.RuleSet([capa.rules.Rule.from_yaml(textwrap.dedent("""
rule:
meta:
name: test rule
@@ -172,22 +144,14 @@ def test_subscope_bb_rules(z9324d_extractor):
- and:
- basic block:
- characteristic: tight loop
"""
)
)
]
)
"""))])
# tight loop at 0x403685
capabilities = capa.capabilities.common.find_capabilities(rules, z9324d_extractor)
assert "test rule" in capabilities.matches
def test_match_specific_functions(z9324d_extractor):
rules = capa.rules.RuleSet(
[
capa.rules.Rule.from_yaml(
textwrap.dedent(
"""
rules = capa.rules.RuleSet([capa.rules.Rule.from_yaml(textwrap.dedent("""
rule:
meta:
name: receive data
@@ -199,11 +163,7 @@ def test_match_specific_functions(z9324d_extractor):
features:
- or:
- api: recv
"""
)
)
]
)
"""))])
extractor = FunctionFilter(z9324d_extractor, {0x4019C0})
capabilities = capa.capabilities.common.find_capabilities(rules, extractor)
matches = capabilities.matches["receive data"]
@@ -214,11 +174,7 @@ def test_match_specific_functions(z9324d_extractor):
def test_byte_matching(z9324d_extractor):
rules = capa.rules.RuleSet(
[
capa.rules.Rule.from_yaml(
textwrap.dedent(
"""
rules = capa.rules.RuleSet([capa.rules.Rule.from_yaml(textwrap.dedent("""
rule:
meta:
name: byte match test
@@ -228,21 +184,13 @@ def test_byte_matching(z9324d_extractor):
features:
- and:
- bytes: ED 24 9E F4 52 A9 07 47 55 8E E1 AB 30 8E 23 61
"""
)
)
]
)
"""))])
capabilities = capa.capabilities.common.find_capabilities(rules, z9324d_extractor)
assert "byte match test" in capabilities.matches
def test_com_feature_matching(z395eb_extractor):
rules = capa.rules.RuleSet(
[
capa.rules.Rule.from_yaml(
textwrap.dedent(
"""
rules = capa.rules.RuleSet([capa.rules.Rule.from_yaml(textwrap.dedent("""
rule:
meta:
name: initialize IWebBrowser2
@@ -254,21 +202,13 @@ def test_com_feature_matching(z395eb_extractor):
- api: ole32.CoCreateInstance
- com/class: InternetExplorer #bytes: 01 DF 02 00 00 00 00 00 C0 00 00 00 00 00 00 46 = CLSID_InternetExplorer
- com/interface: IWebBrowser2 #bytes: 61 16 0C D3 AF CD D0 11 8A 3E 00 C0 4F C9 E2 6E = IID_IWebBrowser2
"""
)
)
]
)
"""))])
capabilities = capa.main.find_capabilities(rules, z395eb_extractor)
assert "initialize IWebBrowser2" in capabilities.matches
def test_count_bb(z9324d_extractor):
rules = capa.rules.RuleSet(
[
capa.rules.Rule.from_yaml(
textwrap.dedent(
"""
rules = capa.rules.RuleSet([capa.rules.Rule.from_yaml(textwrap.dedent("""
rule:
meta:
name: count bb
@@ -279,22 +219,14 @@ def test_count_bb(z9324d_extractor):
features:
- and:
- count(basic blocks): 1 or more
"""
)
)
]
)
"""))])
capabilities = capa.capabilities.common.find_capabilities(rules, z9324d_extractor)
assert "count bb" in capabilities.matches
def test_instruction_scope(z9324d_extractor):
# .text:004071A4 68 E8 03 00 00 push 3E8h
rules = capa.rules.RuleSet(
[
capa.rules.Rule.from_yaml(
textwrap.dedent(
"""
rules = capa.rules.RuleSet([capa.rules.Rule.from_yaml(textwrap.dedent("""
rule:
meta:
name: push 1000
@@ -306,11 +238,7 @@ def test_instruction_scope(z9324d_extractor):
- and:
- mnemonic: push
- number: 1000
"""
)
)
]
)
"""))])
capabilities = capa.capabilities.common.find_capabilities(rules, z9324d_extractor)
assert "push 1000" in capabilities.matches
assert 0x4071A4 in {result[0] for result in capabilities.matches["push 1000"]}
@@ -320,11 +248,7 @@ def test_instruction_subscope(z9324d_extractor):
# .text:00406F60 sub_406F60 proc near
# [...]
# .text:004071A4 68 E8 03 00 00 push 3E8h
rules = capa.rules.RuleSet(
[
capa.rules.Rule.from_yaml(
textwrap.dedent(
"""
rules = capa.rules.RuleSet([capa.rules.Rule.from_yaml(textwrap.dedent("""
rule:
meta:
name: push 1000 on i386
@@ -338,11 +262,7 @@ def test_instruction_subscope(z9324d_extractor):
- instruction:
- mnemonic: push
- number: 1000
"""
)
)
]
)
"""))])
capabilities = capa.capabilities.common.find_capabilities(rules, z9324d_extractor)
assert "push 1000 on i386" in capabilities.matches
assert 0x406F60 in {result[0] for result in capabilities.matches["push 1000 on i386"]}

View File

@@ -81,8 +81,7 @@ def test_cape_extractor(version: str, filename: str, exception: Type[BaseExcepti
def test_cape_model_argument():
call = Call.model_validate_json(
"""
call = Call.model_validate_json("""
{
"timestamp": "2023-10-20 12:30:14,015",
"thread_id": "2380",
@@ -105,7 +104,6 @@ def test_cape_model_argument():
"repeated": 19,
"id": 0
}
"""
)
""")
assert call.arguments[0].value == 30
assert call.arguments[1].value == 0x30

View File

@@ -18,8 +18,7 @@ from capa.features.extractors.drakvuf.models import SystemCall
def test_syscall_argument_construction():
call_dictionary = json.loads(
r"""
call_dictionary = json.loads(r"""
{
"Plugin": "syscall",
"TimeStamp": "1716999134.581449",
@@ -43,8 +42,7 @@ def test_syscall_argument_construction():
"Timeout": "0xfffff506a02846d8",
"Alertable": "0x0"
}
"""
)
""")
call = SystemCall(**call_dictionary)
assert len(call.arguments) == call.nargs
assert call.arguments["IoCompletionHandle"] == "0xffffffff80001ac0"

View File

@@ -83,8 +83,7 @@ def get_call_ids(matches) -> Iterator[int]:
def test_dynamic_call_scope():
extractor = get_0000a657_thread3064()
rule = textwrap.dedent(
"""
rule = textwrap.dedent("""
rule:
meta:
name: test rule
@@ -93,8 +92,7 @@ def test_dynamic_call_scope():
dynamic: call
features:
- api: GetSystemTimeAsFileTime
"""
)
""")
r = capa.rules.Rule.from_yaml(rule)
ruleset = capa.rules.RuleSet([r])
@@ -116,8 +114,7 @@ def test_dynamic_call_scope():
def test_dynamic_span_scope():
extractor = get_0000a657_thread3064()
rule = textwrap.dedent(
"""
rule = textwrap.dedent("""
rule:
meta:
name: test rule
@@ -131,8 +128,7 @@ def test_dynamic_span_scope():
- api: LdrGetDllHandle
- api: LdrGetProcedureAddress
- count(api(LdrGetDllHandle)): 2
"""
)
""")
r = capa.rules.Rule.from_yaml(rule)
ruleset = capa.rules.RuleSet([r])
@@ -158,8 +154,7 @@ def test_dynamic_span_scope():
def test_dynamic_span_scope_length():
extractor = get_0000a657_thread3064()
rule = textwrap.dedent(
"""
rule = textwrap.dedent("""
rule:
meta:
name: test rule
@@ -170,8 +165,7 @@ def test_dynamic_span_scope_length():
- and:
- api: GetSystemTimeAsFileTime
- api: RtlAddVectoredExceptionHandler
"""
)
""")
r = capa.rules.Rule.from_yaml(rule)
ruleset = capa.rules.RuleSet([r])
@@ -196,8 +190,7 @@ def test_dynamic_span_scope_length():
def test_dynamic_span_call_subscope():
extractor = get_0000a657_thread3064()
rule = textwrap.dedent(
"""
rule = textwrap.dedent("""
rule:
meta:
name: test rule
@@ -210,8 +203,7 @@ def test_dynamic_span_call_subscope():
- and:
- api: LdrGetProcedureAddress
- string: AddVectoredExceptionHandler
"""
)
""")
r = capa.rules.Rule.from_yaml(rule)
ruleset = capa.rules.RuleSet([r])
@@ -234,8 +226,7 @@ def test_dynamic_span_call_subscope():
def test_dynamic_span_scope_span_subscope():
extractor = get_0000a657_thread3064()
rule = textwrap.dedent(
"""
rule = textwrap.dedent("""
rule:
meta:
name: test rule
@@ -256,8 +247,7 @@ def test_dynamic_span_scope_span_subscope():
- api: LdrGetDllHandle
- api: LdrGetProcedureAddress
- string: RemoveVectoredExceptionHandler
"""
)
""")
r = capa.rules.Rule.from_yaml(rule)
ruleset = capa.rules.RuleSet([r])
@@ -269,8 +259,7 @@ def test_dynamic_span_scope_span_subscope():
# show that you can't use thread subscope in span rules.
def test_dynamic_span_scope_thread_subscope():
rule = textwrap.dedent(
"""
rule = textwrap.dedent("""
rule:
meta:
name: test rule
@@ -281,8 +270,7 @@ def test_dynamic_span_scope_thread_subscope():
- and:
- thread:
- string: "foo"
"""
)
""")
with pytest.raises(capa.rules.InvalidRule):
capa.rules.Rule.from_yaml(rule)
@@ -300,8 +288,7 @@ def test_dynamic_span_scope_thread_subscope():
def test_dynamic_span_example():
extractor = get_0000a657_thread3064()
rule = textwrap.dedent(
"""
rule = textwrap.dedent("""
rule:
meta:
name: test rule
@@ -319,8 +306,7 @@ def test_dynamic_span_example():
- api: LdrGetProcedureAddress
- string: "AddVectoredExceptionHandler"
- api: RtlAddVectoredExceptionHandler
"""
)
""")
r = capa.rules.Rule.from_yaml(rule)
ruleset = capa.rules.RuleSet([r])
@@ -345,8 +331,7 @@ def test_dynamic_span_example():
def test_dynamic_span_multiple_spans_overlapping_single_event():
extractor = get_0000a657_thread3064()
rule = textwrap.dedent(
"""
rule = textwrap.dedent("""
rule:
meta:
name: test rule
@@ -359,8 +344,7 @@ def test_dynamic_span_multiple_spans_overlapping_single_event():
- and:
- api: LdrGetProcedureAddress
- string: "AddVectoredExceptionHandler"
"""
)
""")
r = capa.rules.Rule.from_yaml(rule)
ruleset = capa.rules.RuleSet([r])
@@ -386,9 +370,7 @@ def test_dynamic_span_scope_match_statements():
ruleset = capa.rules.RuleSet(
[
capa.rules.Rule.from_yaml(
textwrap.dedent(
"""
capa.rules.Rule.from_yaml(textwrap.dedent("""
rule:
meta:
name: resolve add VEH
@@ -401,12 +383,8 @@ def test_dynamic_span_scope_match_statements():
- api: LdrGetDllHandle
- api: LdrGetProcedureAddress
- string: AddVectoredExceptionHandler
"""
)
),
capa.rules.Rule.from_yaml(
textwrap.dedent(
"""
""")),
capa.rules.Rule.from_yaml(textwrap.dedent("""
rule:
meta:
name: resolve remove VEH
@@ -419,12 +397,8 @@ def test_dynamic_span_scope_match_statements():
- api: LdrGetDllHandle
- api: LdrGetProcedureAddress
- string: RemoveVectoredExceptionHandler
"""
)
),
capa.rules.Rule.from_yaml(
textwrap.dedent(
"""
""")),
capa.rules.Rule.from_yaml(textwrap.dedent("""
rule:
meta:
name: resolve add and remove VEH
@@ -435,12 +409,8 @@ def test_dynamic_span_scope_match_statements():
- and:
- match: resolve add VEH
- match: resolve remove VEH
"""
)
),
capa.rules.Rule.from_yaml(
textwrap.dedent(
"""
""")),
capa.rules.Rule.from_yaml(textwrap.dedent("""
rule:
meta:
name: has VEH runtime linking
@@ -450,9 +420,7 @@ def test_dynamic_span_scope_match_statements():
features:
- and:
- match: linking/runtime-linking/veh
"""
)
),
""")),
]
)

View File

@@ -17,8 +17,7 @@ import textwrap
import capa.rules
EXPECTED = textwrap.dedent(
"""\
EXPECTED = textwrap.dedent("""\
rule:
meta:
name: test rule
@@ -34,13 +33,11 @@ EXPECTED = textwrap.dedent(
- and:
- number: 1
- number: 2
"""
)
""")
def test_rule_reformat_top_level_elements():
rule = textwrap.dedent(
"""
rule = textwrap.dedent("""
rule:
features:
- and:
@@ -56,15 +53,13 @@ def test_rule_reformat_top_level_elements():
examples:
- foo1234
- bar5678
"""
)
""")
assert capa.rules.Rule.from_yaml(rule).to_yaml() == EXPECTED
def test_rule_reformat_indentation():
rule = textwrap.dedent(
"""
rule = textwrap.dedent("""
rule:
meta:
name: test rule
@@ -80,15 +75,13 @@ def test_rule_reformat_indentation():
- and:
- number: 1
- number: 2
"""
)
""")
assert capa.rules.Rule.from_yaml(rule).to_yaml() == EXPECTED
def test_rule_reformat_order():
rule = textwrap.dedent(
"""
rule = textwrap.dedent("""
rule:
meta:
authors:
@@ -104,8 +97,7 @@ def test_rule_reformat_order():
- and:
- number: 1
- number: 2
"""
)
""")
assert capa.rules.Rule.from_yaml(rule).to_yaml() == EXPECTED
@@ -113,8 +105,7 @@ def test_rule_reformat_order():
def test_rule_reformat_meta_update():
# test updating the rule content after parsing
src = textwrap.dedent(
"""
src = textwrap.dedent("""
rule:
meta:
authors:
@@ -130,8 +121,7 @@ def test_rule_reformat_meta_update():
- and:
- number: 1
- number: 2
"""
)
""")
rule = capa.rules.Rule.from_yaml(src)
rule.name = "test rule"
@@ -141,8 +131,7 @@ def test_rule_reformat_meta_update():
def test_rule_reformat_string_description():
# the `description` should be aligned with the preceding feature name.
# see #263
src = textwrap.dedent(
"""
src = textwrap.dedent("""
rule:
meta:
name: test rule
@@ -155,8 +144,7 @@ def test_rule_reformat_string_description():
- and:
- string: foo
description: bar
"""
).lstrip()
""").lstrip()
rule = capa.rules.Rule.from_yaml(src)
assert rule.to_yaml() == src

View File

@@ -108,9 +108,7 @@ def test_null_feature_extractor():
rules = capa.rules.RuleSet(
[
capa.rules.Rule.from_yaml(
textwrap.dedent(
"""
capa.rules.Rule.from_yaml(textwrap.dedent("""
rule:
meta:
name: create file
@@ -120,9 +118,7 @@ def test_null_feature_extractor():
features:
- and:
- api: CreateFile
"""
)
),
""")),
]
)
capabilities = capa.main.find_capabilities(rules, EXTRACTOR)

View File

@@ -88,9 +88,7 @@ def test_null_feature_extractor():
rules = capa.rules.RuleSet(
[
capa.rules.Rule.from_yaml(
textwrap.dedent(
"""
capa.rules.Rule.from_yaml(textwrap.dedent("""
rule:
meta:
name: xor loop
@@ -102,9 +100,7 @@ def test_null_feature_extractor():
- characteristic: tight loop
- mnemonic: xor
- characteristic: nzxor
"""
)
),
""")),
]
)
capabilities = capa.main.find_capabilities(rules, EXTRACTOR)

View File

@@ -11,95 +11,42 @@
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""
Must invoke this script from within the Ghidra Runtime Environment
"""
import sys
import logging
from pathlib import Path
import os
import importlib.util
import pytest
import fixtures
try:
sys.path.append(str(Path(__file__).parent))
import fixtures
finally:
sys.path.pop()
import capa.features.common
ghidra_present = importlib.util.find_spec("pyghidra") is not None and "GHIDRA_INSTALL_DIR" in os.environ
logger = logging.getLogger("test_ghidra_features")
ghidra_present: bool = False
try:
import ghidra # noqa: F401
ghidra_present = True
except ImportError:
pass
def standardize_posix_str(psx_str):
"""fixture test passes the PosixPath to the test data
params: psx_str - PosixPath() to the test data
return: string that matches test-id sample name
"""
if "Practical Malware Analysis Lab" in str(psx_str):
# <PosixPath>/'Practical Malware Analysis Lab 16-01.exe_' -> 'pma16-01'
wanted_str = "pma" + str(psx_str).split("/")[-1][len("Practical Malware Analysis Lab ") : -5]
else:
# <PosixPath>/mimikatz.exe_ -> mimikatz
wanted_str = str(psx_str).split("/")[-1][:-5]
if "_" in wanted_str:
# al-khaser_x86 -> al-khaser x86
wanted_str = wanted_str.replace("_", " ")
return wanted_str
def check_input_file(wanted):
"""check that test is running on the loaded sample
params: wanted - PosixPath() passed from test arg
"""
import capa.ghidra.helpers as ghidra_helpers
found = ghidra_helpers.get_file_md5()
sample_name = standardize_posix_str(wanted)
if not found.startswith(fixtures.get_sample_md5_by_name(sample_name)):
raise RuntimeError(f"please run the tests against sample with MD5: `{found}`")
@pytest.mark.skipif(ghidra_present is False, reason="Ghidra tests must be ran within Ghidra")
@fixtures.parametrize("sample,scope,feature,expected", fixtures.FEATURE_PRESENCE_TESTS, indirect=["sample", "scope"])
@pytest.mark.skipif(ghidra_present is False, reason="PyGhidra not installed")
@fixtures.parametrize(
"sample,scope,feature,expected",
[
(
pytest.param(
*t,
marks=pytest.mark.xfail(
reason="specific to Vivisect and basic blocks do not align with Ghidra's analysis"
),
)
if t[0] == "294b8d..." and t[2] == capa.features.common.String("\r\n\x00:ht")
else t
)
for t in fixtures.FEATURE_PRESENCE_TESTS
],
indirect=["sample", "scope"],
)
def test_ghidra_features(sample, scope, feature, expected):
try:
check_input_file(sample)
except RuntimeError:
pytest.skip(reason="Test must be ran against sample loaded in Ghidra")
fixtures.do_test_feature_presence(fixtures.get_ghidra_extractor, sample, scope, feature, expected)
@pytest.mark.skipif(ghidra_present is False, reason="Ghidra tests must be ran within Ghidra")
@pytest.mark.skipif(ghidra_present is False, reason="PyGhidra not installed")
@fixtures.parametrize(
"sample,scope,feature,expected", fixtures.FEATURE_COUNT_TESTS_GHIDRA, indirect=["sample", "scope"]
)
def test_ghidra_feature_counts(sample, scope, feature, expected):
try:
check_input_file(sample)
except RuntimeError:
pytest.skip(reason="Test must be ran against sample loaded in Ghidra")
fixtures.do_test_feature_count(fixtures.get_ghidra_extractor, sample, scope, feature, expected)
if __name__ == "__main__":
# No support for faulthandler module in Ghidrathon, see:
# https://github.com/mandiant/Ghidrathon/issues/70
sys.exit(pytest.main(["--pyargs", "-p no:faulthandler", "test_ghidra_features"]))

View File

@@ -1,187 +0,0 @@
# Copyright 2020 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""
run this script from within IDA to test the IDA feature extractor.
you must have loaded a file referenced by a test case in order
for this to do anything meaningful. for example, mimikatz.exe from testfiles.
you can invoke from the command line like this:
& 'C:\\Program Files\\IDA Pro 8.2\\idat.exe' \
-S"C:\\Exclusions\\code\\capa\\tests\\test_ida_features.py --CAPA_AUTOEXIT=true" \
-A \
-Lidalog \
'C:\\Exclusions\\code\\capa\\tests\\data\\mimikatz.exe_'
if you invoke from the command line, and provide the script argument `--CAPA_AUTOEXIT=true`,
then the script will exit IDA after running the tests.
the output (in idalog) will look like this:
```
Loading processor module C:\\Program Files\\IDA Pro 8.2\\procs\\pc.dll for metapc...Initializing processor module metapc...OK
Loading type libraries...
Autoanalysis subsystem has been initialized.
Database for file 'mimikatz.exe_' has been loaded.
--------------------------------------------------------------------------------
PASS: test_ida_feature_counts/mimikatz-function=0x40E5C2-basic block-7
PASS: test_ida_feature_counts/mimikatz-function=0x4702FD-characteristic(calls from)-0
SKIP: test_ida_features/294b8d...-function=0x404970,bb=0x404970,insn=0x40499F-string(\r\n\x00:ht)-False
SKIP: test_ida_features/64d9f-function=0x10001510,bb=0x100015B0-offset(0x4000)-True
...
SKIP: test_ida_features/pma16-01-function=0x404356,bb=0x4043B9-arch(i386)-True
PASS: test_ida_features/mimikatz-file-import(cabinet.FCIAddFile)-True
DONE
C:\\Exclusions\\code\\capa\\tests\\test_ida_features.py: Traceback (most recent call last):
File "C:\\Program Files\\IDA Pro 8.2\\python\\3\\ida_idaapi.py", line 588, in IDAPython_ExecScript
exec(code, g)
File "C:/Exclusions/code/capa/tests/test_ida_features.py", line 120, in <module>
sys.exit(0)
SystemExit: 0
-> OK
Flushing buffers, please wait...ok
```
Look for lines that start with "FAIL" to identify test failures.
"""
import io
import sys
import inspect
import logging
import traceback
from pathlib import Path
import pytest
try:
sys.path.append(str(Path(__file__).parent))
import fixtures
finally:
sys.path.pop()
logger = logging.getLogger("test_ida_features")
def check_input_file(wanted):
import idautils
# some versions (7.4) of IDA return a truncated version of the MD5.
# https://github.com/idapython/bin/issues/11
try:
found = idautils.GetInputFileMD5()[:31].decode("ascii").lower()
except UnicodeDecodeError:
# in IDA 7.5 or so, GetInputFileMD5 started returning raw binary
# rather than the hex digest
found = bytes.hex(idautils.GetInputFileMD5()[:15]).lower()
if not wanted.startswith(found):
raise RuntimeError(f"please run the tests against sample with MD5: `{wanted}`")
def get_ida_extractor(_path):
# have to import this inline so pytest doesn't bail outside of IDA
import capa.features.extractors.ida.extractor
return capa.features.extractors.ida.extractor.IdaFeatureExtractor()
def nocollect(f):
"don't collect the decorated function as a pytest test"
f.__test__ = False
return f
# although these look like pytest tests, they're not, because they don't run within pytest
# (the runner is below) and they use `yield`, which is deprecated.
@nocollect
@pytest.mark.skip(reason="IDA Pro tests must be run within IDA")
def test_ida_features():
# we're guaranteed to be in a function here, so there's a stack frame
this_name = inspect.currentframe().f_code.co_name # type: ignore
for sample, scope, feature, expected in fixtures.FEATURE_PRESENCE_TESTS + fixtures.FEATURE_PRESENCE_TESTS_IDA:
id = fixtures.make_test_id((sample, scope, feature, expected))
try:
check_input_file(fixtures.get_sample_md5_by_name(sample))
except RuntimeError:
yield this_name, id, "skip", None
continue
scope = fixtures.resolve_scope(scope)
sample = fixtures.resolve_sample(sample)
try:
fixtures.do_test_feature_presence(get_ida_extractor, sample, scope, feature, expected)
except Exception:
f = io.StringIO()
traceback.print_exc(file=f)
yield this_name, id, "fail", f.getvalue()
else:
yield this_name, id, "pass", None
@nocollect
@pytest.mark.skip(reason="IDA Pro tests must be run within IDA")
def test_ida_feature_counts():
# we're guaranteed to be in a function here, so there's a stack frame
this_name = inspect.currentframe().f_code.co_name # type: ignore
for sample, scope, feature, expected in fixtures.FEATURE_COUNT_TESTS:
id = fixtures.make_test_id((sample, scope, feature, expected))
try:
check_input_file(fixtures.get_sample_md5_by_name(sample))
except RuntimeError:
yield this_name, id, "skip", None
continue
scope = fixtures.resolve_scope(scope)
sample = fixtures.resolve_sample(sample)
try:
fixtures.do_test_feature_count(get_ida_extractor, sample, scope, feature, expected)
except Exception:
f = io.StringIO()
traceback.print_exc(file=f)
yield this_name, id, "fail", f.getvalue()
else:
yield this_name, id, "pass", None
if __name__ == "__main__":
import idc
import ida_auto
ida_auto.auto_wait()
print("-" * 80)
# invoke all functions in this module that start with `test_`
for name in dir(sys.modules[__name__]):
if not name.startswith("test_"):
continue
test = getattr(sys.modules[__name__], name)
logger.debug("invoking test: %s", name)
sys.stderr.flush()
for name, id, state, info in test():
print(f"{state.upper()}: {name}/{id}")
if info:
print(info)
print("DONE")
if "--CAPA_AUTOEXIT=true" in idc.ARGV:
sys.exit(0)

View File

@@ -0,0 +1,86 @@
# Copyright 2020 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import logging
from pathlib import Path
import pytest
import fixtures
import capa.features.extractors.ida.idalib
from capa.features.file import FunctionName
from capa.features.insn import API
from capa.features.common import Characteristic
logger = logging.getLogger(__name__)
idalib_present = capa.features.extractors.ida.idalib.has_idalib()
if idalib_present:
try:
import idapro # noqa: F401 [imported but unused]
import ida_kernwin
kernel_version: str = ida_kernwin.get_kernel_version()
except ImportError:
idalib_present = False
kernel_version = "0.0"
@pytest.mark.skipif(idalib_present is False, reason="Skip idalib tests if the idalib Python API is not installed")
@fixtures.parametrize(
"sample,scope,feature,expected",
fixtures.FEATURE_PRESENCE_TESTS + fixtures.FEATURE_SYMTAB_FUNC_TESTS,
indirect=["sample", "scope"],
)
def test_idalib_features(sample: Path, scope, feature, expected):
if kernel_version in {"9.0", "9.1"} and sample.name.startswith("2bf18d"):
if isinstance(feature, (API, FunctionName)) and feature.value == "__libc_connect":
# see discussion here: https://github.com/mandiant/capa/pull/2742#issuecomment-3674146335
#
# > i confirmed that there were changes in 9.2 related to the ELF loader handling names,
# > so I think its reasonable to conclude that 9.1 and older had a bug that
# > prevented this name from surfacing.
pytest.xfail(f"IDA {kernel_version} does not extract all ELF symbols")
if kernel_version in {"9.0"} and sample.name.startswith("Practical Malware Analysis Lab 12-04.exe_"):
if isinstance(feature, Characteristic) and feature.value == "embedded pe":
# see discussion here: https://github.com/mandiant/capa/pull/2742#issuecomment-3667086165
#
# idalib for IDA 9.0 doesn't support argv arguments, so we can't ask that resources are loaded
pytest.xfail("idalib 9.0 does not support loading resource segments")
try:
fixtures.do_test_feature_presence(fixtures.get_idalib_extractor, sample, scope, feature, expected)
finally:
logger.debug("closing database...")
import idapro
idapro.close_database(save=False)
logger.debug("closed database.")
@pytest.mark.skipif(idalib_present is False, reason="Skip idalib tests if the idalib Python API is not installed")
@fixtures.parametrize(
"sample,scope,feature,expected",
fixtures.FEATURE_COUNT_TESTS,
indirect=["sample", "scope"],
)
def test_idalib_feature_counts(sample, scope, feature, expected):
try:
fixtures.do_test_feature_count(fixtures.get_idalib_extractor, sample, scope, feature, expected)
finally:
logger.debug("closing database...")
import idapro
idapro.close_database(save=False)
logger.debug("closed database.")

View File

@@ -0,0 +1,60 @@
# Copyright 2025 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from pathlib import Path
from unittest.mock import patch
import pytest
import envi.exc
from capa.loader import CorruptFile, get_workspace
from capa.features.common import FORMAT_PE, FORMAT_ELF
def test_segmentation_violation_handling():
"""
Test that SegmentationViolation from vivisect is caught and
converted to a CorruptFile exception.
See #2794.
"""
fake_path = Path("/tmp/fake_malformed.elf")
with patch("viv_utils.getWorkspace") as mock_workspace:
mock_workspace.side_effect = envi.exc.SegmentationViolation(
0x30A4B8BD60,
)
with pytest.raises(CorruptFile, match="Invalid memory access"):
get_workspace(fake_path, FORMAT_ELF, [])
def test_corrupt_pe_with_unrealistic_section_size_short_circuits():
"""
Test that a PE with an unrealistically large section virtual size
is caught early and raises CorruptFile before vivisect is invoked.
See #1989.
"""
fake_path = Path("/tmp/fake_corrupt.exe")
with (
patch("capa.loader._is_probably_corrupt_pe", return_value=True),
patch("viv_utils.getWorkspace") as mock_workspace,
):
with pytest.raises(CorruptFile, match="unrealistically large sections"):
get_workspace(fake_path, FORMAT_PE, [])
# vivisect should never have been called
mock_workspace.assert_not_called()

View File

@@ -38,8 +38,7 @@ def test_main(z9324d_extractor):
def test_main_single_rule(z9324d_extractor, tmpdir):
# tests a single rule can be loaded successfully
RULE_CONTENT = textwrap.dedent(
"""
RULE_CONTENT = textwrap.dedent("""
rule:
meta:
name: test rule
@@ -50,8 +49,7 @@ def test_main_single_rule(z9324d_extractor, tmpdir):
- test
features:
- string: test
"""
)
""")
path = z9324d_extractor.path
rule_file = tmpdir.mkdir("capa").join("rule.yml")
rule_file.write(RULE_CONTENT)
@@ -100,9 +98,7 @@ def test_main_shellcode(z499c2_extractor):
def test_ruleset():
rules = capa.rules.RuleSet(
[
capa.rules.Rule.from_yaml(
textwrap.dedent(
"""
capa.rules.Rule.from_yaml(textwrap.dedent("""
rule:
meta:
name: file rule
@@ -111,12 +107,8 @@ def test_ruleset():
dynamic: process
features:
- characteristic: embedded pe
"""
)
),
capa.rules.Rule.from_yaml(
textwrap.dedent(
"""
""")),
capa.rules.Rule.from_yaml(textwrap.dedent("""
rule:
meta:
name: function rule
@@ -125,12 +117,8 @@ def test_ruleset():
dynamic: process
features:
- characteristic: tight loop
"""
)
),
capa.rules.Rule.from_yaml(
textwrap.dedent(
"""
""")),
capa.rules.Rule.from_yaml(textwrap.dedent("""
rule:
meta:
name: basic block rule
@@ -139,12 +127,8 @@ def test_ruleset():
dynamic: process
features:
- characteristic: nzxor
"""
)
),
capa.rules.Rule.from_yaml(
textwrap.dedent(
"""
""")),
capa.rules.Rule.from_yaml(textwrap.dedent("""
rule:
meta:
name: process rule
@@ -153,12 +137,8 @@ def test_ruleset():
dynamic: process
features:
- string: "explorer.exe"
"""
)
),
capa.rules.Rule.from_yaml(
textwrap.dedent(
"""
""")),
capa.rules.Rule.from_yaml(textwrap.dedent("""
rule:
meta:
name: thread rule
@@ -167,12 +147,8 @@ def test_ruleset():
dynamic: thread
features:
- api: RegDeleteKey
"""
)
),
capa.rules.Rule.from_yaml(
textwrap.dedent(
"""
""")),
capa.rules.Rule.from_yaml(textwrap.dedent("""
rule:
meta:
name: test call subscope
@@ -184,12 +160,8 @@ def test_ruleset():
- string: "explorer.exe"
- call:
- api: HttpOpenRequestW
"""
)
),
capa.rules.Rule.from_yaml(
textwrap.dedent(
"""
""")),
capa.rules.Rule.from_yaml(textwrap.dedent("""
rule:
meta:
name: test rule
@@ -207,9 +179,7 @@ def test_ruleset():
- number: 6 = IPPROTO_TCP
- number: 1 = SOCK_STREAM
- number: 2 = AF_INET
"""
)
),
""")),
]
)
assert len(rules.file_rules) == 2
@@ -322,9 +292,7 @@ def test_main_cape1(tmp_path):
# https://github.com/mandiant/capa/pull/1696
rules = tmp_path / "rules"
rules.mkdir()
(rules / "create-or-open-registry-key.yml").write_text(
textwrap.dedent(
"""
(rules / "create-or-open-registry-key.yml").write_text(textwrap.dedent("""
rule:
meta:
name: create or open registry key
@@ -354,9 +322,7 @@ def test_main_cape1(tmp_path):
- api: SHRegOpenUSKey
- api: SHRegCreateUSKey
- api: RtlCreateRegistryKey
"""
)
)
"""))
assert capa.main.main([str(path), "-r", str(rules)]) == 0
assert capa.main.main([str(path), "-q", "-r", str(rules)]) == 0

View File

@@ -46,8 +46,7 @@ def match(rules, features, va, scope=Scope.FUNCTION):
def test_match_simple():
rule = textwrap.dedent(
"""
rule = textwrap.dedent("""
rule:
meta:
name: test rule
@@ -57,8 +56,7 @@ def test_match_simple():
namespace: testns1/testns2
features:
- number: 100
"""
)
""")
r = capa.rules.Rule.from_yaml(rule)
features, matches = match([r], {capa.features.insn.Number(100): {1, 2}}, 0x0)
@@ -69,8 +67,7 @@ def test_match_simple():
def test_match_range_exact():
rule = textwrap.dedent(
"""
rule = textwrap.dedent("""
rule:
meta:
name: test rule
@@ -79,8 +76,7 @@ def test_match_range_exact():
dynamic: process
features:
- count(number(100)): 2
"""
)
""")
r = capa.rules.Rule.from_yaml(rule)
# just enough matches
@@ -97,8 +93,7 @@ def test_match_range_exact():
def test_match_range_range():
rule = textwrap.dedent(
"""
rule = textwrap.dedent("""
rule:
meta:
name: test rule
@@ -107,8 +102,7 @@ def test_match_range_range():
dynamic: process
features:
- count(number(100)): (2, 3)
"""
)
""")
r = capa.rules.Rule.from_yaml(rule)
# just enough matches
@@ -129,8 +123,7 @@ def test_match_range_range():
def test_match_range_exact_zero():
rule = textwrap.dedent(
"""
rule = textwrap.dedent("""
rule:
meta:
name: test rule
@@ -146,8 +139,7 @@ def test_match_range_exact_zero():
# so we have this additional trivial feature.
- mnemonic: mov
"""
)
""")
r = capa.rules.Rule.from_yaml(rule)
# feature isn't indexed - good.
@@ -165,8 +157,7 @@ def test_match_range_exact_zero():
def test_match_range_with_zero():
rule = textwrap.dedent(
"""
rule = textwrap.dedent("""
rule:
meta:
name: test rule
@@ -181,8 +172,7 @@ def test_match_range_with_zero():
# since we don't support top level NOT statements.
# so we have this additional trivial feature.
- mnemonic: mov
"""
)
""")
r = capa.rules.Rule.from_yaml(rule)
# ok
@@ -200,8 +190,7 @@ def test_match_range_with_zero():
def test_match_adds_matched_rule_feature():
"""show that using `match` adds a feature for matched rules."""
rule = textwrap.dedent(
"""
rule = textwrap.dedent("""
rule:
meta:
name: test rule
@@ -210,8 +199,7 @@ def test_match_adds_matched_rule_feature():
dynamic: process
features:
- number: 100
"""
)
""")
r = capa.rules.Rule.from_yaml(rule)
features, _ = match([r], {capa.features.insn.Number(100): {1}}, 0x0)
assert capa.features.common.MatchedRule("test rule") in features
@@ -220,9 +208,7 @@ def test_match_adds_matched_rule_feature():
def test_match_matched_rules():
"""show that using `match` adds a feature for matched rules."""
rules = [
capa.rules.Rule.from_yaml(
textwrap.dedent(
"""
capa.rules.Rule.from_yaml(textwrap.dedent("""
rule:
meta:
name: test rule1
@@ -231,12 +217,8 @@ def test_match_matched_rules():
dynamic: process
features:
- number: 100
"""
)
),
capa.rules.Rule.from_yaml(
textwrap.dedent(
"""
""")),
capa.rules.Rule.from_yaml(textwrap.dedent("""
rule:
meta:
name: test rule2
@@ -245,9 +227,7 @@ def test_match_matched_rules():
dynamic: process
features:
- match: test rule1
"""
)
),
""")),
]
features, _ = match(
@@ -271,9 +251,7 @@ def test_match_matched_rules():
def test_match_namespace():
rules = [
capa.rules.Rule.from_yaml(
textwrap.dedent(
"""
capa.rules.Rule.from_yaml(textwrap.dedent("""
rule:
meta:
name: CreateFile API
@@ -283,12 +261,8 @@ def test_match_namespace():
namespace: file/create/CreateFile
features:
- api: CreateFile
"""
)
),
capa.rules.Rule.from_yaml(
textwrap.dedent(
"""
""")),
capa.rules.Rule.from_yaml(textwrap.dedent("""
rule:
meta:
name: WriteFile API
@@ -298,12 +272,8 @@ def test_match_namespace():
namespace: file/write
features:
- api: WriteFile
"""
)
),
capa.rules.Rule.from_yaml(
textwrap.dedent(
"""
""")),
capa.rules.Rule.from_yaml(textwrap.dedent("""
rule:
meta:
name: file-create
@@ -312,12 +282,8 @@ def test_match_namespace():
dynamic: process
features:
- match: file/create
"""
)
),
capa.rules.Rule.from_yaml(
textwrap.dedent(
"""
""")),
capa.rules.Rule.from_yaml(textwrap.dedent("""
rule:
meta:
name: filesystem-any
@@ -326,9 +292,7 @@ def test_match_namespace():
dynamic: process
features:
- match: file
"""
)
),
""")),
]
features, matches = match(
@@ -355,9 +319,7 @@ def test_match_namespace():
def test_match_substring():
rules = [
capa.rules.Rule.from_yaml(
textwrap.dedent(
"""
capa.rules.Rule.from_yaml(textwrap.dedent("""
rule:
meta:
name: test rule
@@ -367,9 +329,7 @@ def test_match_substring():
features:
- and:
- substring: abc
"""
)
),
""")),
]
features, _ = match(
capa.rules.topologically_order_rules(rules),
@@ -409,9 +369,7 @@ def test_match_substring():
def test_match_regex():
rules = [
capa.rules.Rule.from_yaml(
textwrap.dedent(
"""
capa.rules.Rule.from_yaml(textwrap.dedent("""
rule:
meta:
name: test rule
@@ -421,12 +379,8 @@ def test_match_regex():
features:
- and:
- string: /.*bbbb.*/
"""
)
),
capa.rules.Rule.from_yaml(
textwrap.dedent(
"""
""")),
capa.rules.Rule.from_yaml(textwrap.dedent("""
rule:
meta:
name: rule with implied wildcards
@@ -436,12 +390,8 @@ def test_match_regex():
features:
- and:
- string: /bbbb/
"""
)
),
capa.rules.Rule.from_yaml(
textwrap.dedent(
"""
""")),
capa.rules.Rule.from_yaml(textwrap.dedent("""
rule:
meta:
name: rule with anchor
@@ -451,9 +401,7 @@ def test_match_regex():
features:
- and:
- string: /^bbbb/
"""
)
),
""")),
]
features, _ = match(
capa.rules.topologically_order_rules(rules),
@@ -488,9 +436,7 @@ def test_match_regex():
def test_match_regex_ignorecase():
rules = [
capa.rules.Rule.from_yaml(
textwrap.dedent(
"""
capa.rules.Rule.from_yaml(textwrap.dedent("""
rule:
meta:
name: test rule
@@ -500,9 +446,7 @@ def test_match_regex_ignorecase():
features:
- and:
- string: /.*bbbb.*/i
"""
)
),
""")),
]
features, _ = match(
capa.rules.topologically_order_rules(rules),
@@ -514,9 +458,7 @@ def test_match_regex_ignorecase():
def test_match_regex_complex():
rules = [
capa.rules.Rule.from_yaml(
textwrap.dedent(
r"""
capa.rules.Rule.from_yaml(textwrap.dedent(r"""
rule:
meta:
name: test rule
@@ -526,9 +468,7 @@ def test_match_regex_complex():
features:
- or:
- string: /.*HARDWARE\\Key\\key with spaces\\.*/i
"""
)
),
""")),
]
features, _ = match(
capa.rules.topologically_order_rules(rules),
@@ -540,9 +480,7 @@ def test_match_regex_complex():
def test_match_regex_values_always_string():
rules = [
capa.rules.Rule.from_yaml(
textwrap.dedent(
"""
capa.rules.Rule.from_yaml(textwrap.dedent("""
rule:
meta:
name: test rule
@@ -553,9 +491,7 @@ def test_match_regex_values_always_string():
- or:
- string: /123/
- string: /0x123/
"""
)
),
""")),
]
features, _ = match(
capa.rules.topologically_order_rules(rules),
@@ -572,10 +508,22 @@ def test_match_regex_values_always_string():
assert capa.features.common.MatchedRule("test rule") in features
@pytest.mark.parametrize(
"pattern",
[
"/test\\.exe/",
"/hello/i",
"/foo\\\\bar/",
],
)
def test_regex_get_value_str(pattern):
# Regex.get_value_str() must return the raw pattern without escaping, see #1909.
assert capa.features.common.Regex(pattern).get_value_str() == pattern
@pytest.mark.xfail(reason="can't have top level NOT")
def test_match_only_not():
rule = textwrap.dedent(
"""
rule = textwrap.dedent("""
rule:
meta:
name: test rule
@@ -586,8 +534,7 @@ def test_match_only_not():
features:
- not:
- number: 99
"""
)
""")
r = capa.rules.Rule.from_yaml(rule)
_, matches = match([r], {capa.features.insn.Number(100): {1, 2}}, 0x0)
@@ -595,8 +542,7 @@ def test_match_only_not():
def test_match_not():
rule = textwrap.dedent(
"""
rule = textwrap.dedent("""
rule:
meta:
name: test rule
@@ -609,8 +555,7 @@ def test_match_not():
- mnemonic: mov
- not:
- number: 99
"""
)
""")
r = capa.rules.Rule.from_yaml(rule)
_, matches = match([r], {capa.features.insn.Number(100): {1, 2}, capa.features.insn.Mnemonic("mov"): {1, 2}}, 0x0)
@@ -619,8 +564,7 @@ def test_match_not():
@pytest.mark.xfail(reason="can't have nested NOT")
def test_match_not_not():
rule = textwrap.dedent(
"""
rule = textwrap.dedent("""
rule:
meta:
name: test rule
@@ -632,8 +576,7 @@ def test_match_not_not():
- not:
- not:
- number: 100
"""
)
""")
r = capa.rules.Rule.from_yaml(rule)
_, matches = match([r], {capa.features.insn.Number(100): {1, 2}}, 0x0)
@@ -641,8 +584,7 @@ def test_match_not_not():
def test_match_operand_number():
rule = textwrap.dedent(
"""
rule = textwrap.dedent("""
rule:
meta:
name: test rule
@@ -652,8 +594,7 @@ def test_match_operand_number():
features:
- and:
- operand[0].number: 0x10
"""
)
""")
r = capa.rules.Rule.from_yaml(rule)
assert capa.features.insn.OperandNumber(0, 0x10) in {capa.features.insn.OperandNumber(0, 0x10)}
@@ -671,8 +612,7 @@ def test_match_operand_number():
def test_match_operand_offset():
rule = textwrap.dedent(
"""
rule = textwrap.dedent("""
rule:
meta:
name: test rule
@@ -682,8 +622,7 @@ def test_match_operand_offset():
features:
- and:
- operand[0].offset: 0x10
"""
)
""")
r = capa.rules.Rule.from_yaml(rule)
assert capa.features.insn.OperandOffset(0, 0x10) in {capa.features.insn.OperandOffset(0, 0x10)}
@@ -701,8 +640,7 @@ def test_match_operand_offset():
def test_match_property_access():
rule = textwrap.dedent(
"""
rule = textwrap.dedent("""
rule:
meta:
name: test rule
@@ -712,8 +650,7 @@ def test_match_property_access():
features:
- and:
- property/read: System.IO.FileInfo::Length
"""
)
""")
r = capa.rules.Rule.from_yaml(rule)
assert capa.features.insn.Property("System.IO.FileInfo::Length", capa.features.common.FeatureAccess.READ) in {
@@ -745,8 +682,7 @@ def test_match_property_access():
def test_match_os_any():
rule = textwrap.dedent(
"""
rule = textwrap.dedent("""
rule:
meta:
name: test rule
@@ -764,8 +700,7 @@ def test_match_os_any():
- and:
- os: any
- string: "Goodbye world"
"""
)
""")
r = capa.rules.Rule.from_yaml(rule)
_, matches = match(
@@ -799,8 +734,7 @@ def test_match_os_any():
# this test demonstrates the behavior of unstable features that may change before the next major release.
def test_index_features_and_unstable():
rule = textwrap.dedent(
"""
rule = textwrap.dedent("""
rule:
meta:
name: test rule
@@ -811,8 +745,7 @@ def test_index_features_and_unstable():
- and:
- mnemonic: mov
- api: CreateFileW
"""
)
""")
r = capa.rules.Rule.from_yaml(rule)
rr = capa.rules.RuleSet([r])
index: capa.rules.RuleSet._RuleFeatureIndex = rr._feature_indexes_by_scopes[capa.rules.Scope.FUNCTION]
@@ -828,8 +761,7 @@ def test_index_features_and_unstable():
# this test demonstrates the behavior of unstable features that may change before the next major release.
def test_index_features_or_unstable():
rule = textwrap.dedent(
"""
rule = textwrap.dedent("""
rule:
meta:
name: test rule
@@ -840,8 +772,7 @@ def test_index_features_or_unstable():
- or:
- mnemonic: mov
- api: CreateFileW
"""
)
""")
r = capa.rules.Rule.from_yaml(rule)
rr = capa.rules.RuleSet([r])
index: capa.rules.RuleSet._RuleFeatureIndex = rr._feature_indexes_by_scopes[capa.rules.Scope.FUNCTION]
@@ -858,8 +789,7 @@ def test_index_features_or_unstable():
# this test demonstrates the behavior of unstable features that may change before the next major release.
def test_index_features_nested_unstable():
rule = textwrap.dedent(
"""
rule = textwrap.dedent("""
rule:
meta:
name: test rule
@@ -872,8 +802,7 @@ def test_index_features_nested_unstable():
- or:
- api: CreateFileW
- string: foo
"""
)
""")
r = capa.rules.Rule.from_yaml(rule)
rr = capa.rules.RuleSet([r])
index: capa.rules.RuleSet._RuleFeatureIndex = rr._feature_indexes_by_scopes[capa.rules.Scope.FUNCTION]

View File

@@ -25,8 +25,7 @@ from capa.features.common import Arch, Substring
def test_optimizer_order():
rule = textwrap.dedent(
"""
rule = textwrap.dedent("""
rule:
meta:
name: test rule
@@ -44,8 +43,7 @@ def test_optimizer_order():
- or:
- number: 1
- offset: 4
"""
)
""")
r = capa.rules.Rule.from_yaml(rule)
# before optimization

Some files were not shown because too many files have changed in this diff Show More