24 Commits

Author SHA1 Message Date
Mike Hunhoff
ed7e0cd77d lint: replace black/isort/flake8 with ruff (#2992)
* lint: replace isort/flake8 with ruff

* update ruff links

* remove stale isort reference

* update CHANGELOG

* address review

* remove unused imports

* remove unnecessary list comprehension

* remove quotes from type annotation

* use dict.get instead of if-else block

* remove unnecessary utf-8 encoding declaration

* Revert "remove unused imports"

This reverts commit 18ba50a22b.

* skip check for unused imports

* fix UP036 Version block is outdated for minimum Python version

* add TODO comment for unused imports

* replace black with ruff

* address review comments
2026-04-07 12:10:41 -06:00
Mike Hunhoff
a6ac839eea fix mypy formatting (#2973) 2026-03-27 10:54:28 -06:00
dependabot[bot]
4ba1b5d233 build(deps): bump bump-my-version from 1.2.4 to 1.3.0 (#2963)
* build(deps): bump bump-my-version from 1.2.4 to 1.3.0

Bumps [bump-my-version](https://github.com/callowayproject/bump-my-version) from 1.2.4 to 1.3.0.
- [Release notes](https://github.com/callowayproject/bump-my-version/releases)
- [Changelog](https://github.com/callowayproject/bump-my-version/blob/master/CHANGELOG.md)
- [Commits](https://github.com/callowayproject/bump-my-version/compare/1.2.4...v1.3)

---
updated-dependencies:
- dependency-name: bump-my-version
  dependency-version: 1.3.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

* style: auto-format with black and isort

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-03-26 15:30:46 -06:00
devs6186
c930891c21 rules: address code review feedback for bytes prefix index
- remove bytes_rules from _RuleFeatureIndex; bytes_prefix_index is the
  only structure needed for candidate selection
- build bytes_prefix_index directly in _index_rules_by_feature() instead
  of building bytes_rules then converting, removing one full pass
- add if -1 in bytes_prefix_index guard to avoid temporary object
  creation for the short-pattern fallback (almost never taken)
- remove assert isinstance(feature.value, bytes) checks in _match();
  add Bytes.value: bytes class-level annotation so mypy narrows the
  type without the runtime check
- remove cache structure compatibility block from cache.py per reviewer
  request to handle in a separate PR
- update test assertions from bytes_rules to bytes_prefix_index
2026-03-20 21:37:04 +01:00
devs6186
f572c01d10 rules: clarify bytes_prefix_index guard and add mixed-pattern test
- Change _match() guard from bytes_rules to bytes_prefix_index
  so the guard references the field actually used for candidate selection.
- Update stale comment to describe the prefix-bucket strategy.
- Clarify bytes_rules dataclass comment (retained for logging only).
- Add test_bytes_prefix_index_mixed_short_and_long_patterns covering
  rules with both short (<4B) and long (>=4B) patterns exercised together.
2026-03-20 21:37:04 +01:00
devs6186
b868be55b8 rules: simplify bytes prefix indexing and add collision tests 2026-03-20 21:37:04 +01:00
devs6186
ed256d2416 rules: index extracted bytes by length prefix for O(1) candidate selection
Instead of iterating all extracted Bytes features for every bytes-based rule,
build a prefix index keyed by fixed bucket sizes (4, 8, 16, 32, 64, 128, 256)
once per scope evaluation.  Each bytes pattern is looked up in the largest
bucket that fits its length, then only candidates sharing that prefix are
compared, replacing the previous O(n) linear scan with an O(1) hash lookup.
Patterns shorter than the minimum bucket still fall back to the full scan.
Adds a test to verify correctness for exact match, startswith match, mismatch,
and short-bytes cases.

Closes: https://github.com/mandiant/capa/issues/2128
2026-03-20 21:37:04 +01:00
dependabot[bot]
7b23834d8e build(deps-dev): bump black from 25.12.0 to 26.3.0 (#2902)
* build(deps-dev): bump black from 25.12.0 to 26.3.0

Bumps [black](https://github.com/psf/black) from 25.12.0 to 26.3.0.
- [Release notes](https://github.com/psf/black/releases)
- [Changelog](https://github.com/psf/black/blob/main/CHANGES.md)
- [Commits](https://github.com/psf/black/compare/25.12.0...26.3.0)

---
updated-dependencies:
- dependency-name: black
  dependency-version: 26.3.0
  dependency-type: direct:development
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>

* style: auto-format with black and isort

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Moritz <mr-tz@users.noreply.github.com>
Co-authored-by: Capa Bot <capa-dev@mandiant.com>
2026-03-13 15:46:13 +01:00
Aditya Pandey
038c46da16 features: fix Regex.get_value_str() returning escaped pattern, breaking capa2yara #1909 (#2886)
Co-authored-by: Moritz <mr-tz@users.noreply.github.com>
2026-03-05 12:14:27 +01:00
Ana Maria Martinez Gomez
3cd97ae9f2 [copyright + license] Fix headers
Replace the header from source code files using the following script:
```Python
for dir_path, dir_names, file_names in os.walk("capa"):
    for file_name in file_names:
        # header are only in `.py` and `.toml` files
        if file_name[-3:] not in (".py", "oml"):
            continue
        file_path = f"{dir_path}/{file_name}"
        f = open(file_path, "rb+")
        content = f.read()
        m = re.search(OLD_HEADER, content)
        if not m:
            continue
        print(f"{file_path}: {m.group('year')}")
        content = content.replace(m.group(0), NEW_HEADER % m.group("year"))
        f.seek(0)
        f.write(content)
```

Some files had the copyright headers inside a `"""` comment and needed
manual changes before applying the script. `hook-vivisect.py` and
`pyinstaller.spec` didn't include the license in the header and also
needed manual changes.

The old header had the confusing sentence `All rights reserved`, which
does not make sense for an open source license. Replace the header by
the default Google header that corrects this issue and keep capa
consistent with other Google projects.

Adapt the linter to work with the new header.

Replace also the copyright text in the `web/public/index.html` file for
consistency.
2025-01-15 08:52:42 -07:00
Willi Ballenthin
b068890fa6 rules: match: optimize rule matching by better indexing rule by features
Implement the "tighten rule pre-selection" algorithm described here:
https://github.com/mandiant/capa/issues/2063#issuecomment-2100498720

In summary:

> Rather than indexing all features from all rules,
> we should pick and index the minimal set (ideally, one) of
> features from each rule that must be present for the rule to match.
> When we have multiple candidates, pick the feature that is
> probably most uncommon and therefore "selective".

This seems to work pretty well. Total evaluations when running against
mimikatz drop from 19M to 1.1M (wow!) and capa seems to match around
3x more functions per second (wow wow).

When doing large scale runs, capa is about 25% faster when using the
vivisect backend (analysis heavy) or 3x faster when using the
upcoming BinExport2 backend (minimal analysis).
2024-06-07 05:54:49 +02:00
N0stalgikow
0eb4291b25 Updating copyright across all files based on when it was first introduced. (#2027)
* updating copyright, back to the date of origin of file

* updating regex to account for linter violation
2024-03-13 14:04:53 +01:00
Yacine Elhamer
462024ad03 update tests to explicitely specify scopes 2023-08-01 07:41:47 +01:00
Willi Ballenthin
c86ab51210 fix copyright headers everywhere 2023-07-13 05:03:33 +02:00
Willi Ballenthin
9441da4887 isort 2023-07-06 17:50:34 +02:00
Willi Ballenthin
47074fd129 fix ruff issues 2023-07-06 17:49:40 +02:00
Harsh Mehta
74009eb4a4 Updated Copyright (#1383)
* Updated Copyright
2023-03-14 17:58:43 +01:00
Mike Hunhoff
a07ca443f0 update OS to match OS_ANY for all supported OSes (#1324) 2023-02-24 07:51:40 -07:00
Willi Ballenthin
b819033da0 lots of mypy 2022-12-14 10:37:39 +01:00
Mike Hunhoff
3c1cd67f60 dotnet: support property feature extraction (#1168) 2022-09-09 12:09:41 -06:00
Willi Ballenthin
9da4ff10da *: rename OperandImmediate to OperandNumber 2022-03-31 10:37:06 -06:00
Willi Ballenthin
c7aadca25c tests: demonstrate OperandOffset and OperandImmediate 2022-03-30 13:13:50 -06:00
William Ballenthin
2d68fb2536 pep8 2021-11-10 12:51:27 -07:00
William Ballenthin
845df282ef tests: split out match tests and validate alternative algorithms 2021-11-10 12:44:58 -07:00