Commit Graph

15 Commits

Author SHA1 Message Date
Ana Maria Martinez Gomez
3cd97ae9f2 [copyright + license] Fix headers
Replace the header from source code files using the following script:
```Python
for dir_path, dir_names, file_names in os.walk("capa"):
    for file_name in file_names:
        # header are only in `.py` and `.toml` files
        if file_name[-3:] not in (".py", "oml"):
            continue
        file_path = f"{dir_path}/{file_name}"
        f = open(file_path, "rb+")
        content = f.read()
        m = re.search(OLD_HEADER, content)
        if not m:
            continue
        print(f"{file_path}: {m.group('year')}")
        content = content.replace(m.group(0), NEW_HEADER % m.group("year"))
        f.seek(0)
        f.write(content)
```

Some files had the copyright headers inside a `"""` comment and needed
manual changes before applying the script. `hook-vivisect.py` and
`pyinstaller.spec` didn't include the license in the header and also
needed manual changes.

The old header had the confusing sentence `All rights reserved`, which
does not make sense for an open source license. Replace the header by
the default Google header that corrects this issue and keep capa
consistent with other Google projects.

Adapt the linter to work with the new header.

Replace also the copyright text in the `web/public/index.html` file for
consistency.
2025-01-15 08:52:42 -07:00
Willi Ballenthin
b068890fa6 rules: match: optimize rule matching by better indexing rule by features
Implement the "tighten rule pre-selection" algorithm described here:
https://github.com/mandiant/capa/issues/2063#issuecomment-2100498720

In summary:

> Rather than indexing all features from all rules,
> we should pick and index the minimal set (ideally, one) of
> features from each rule that must be present for the rule to match.
> When we have multiple candidates, pick the feature that is
> probably most uncommon and therefore "selective".

This seems to work pretty well. Total evaluations when running against
mimikatz drop from 19M to 1.1M (wow!) and capa seems to match around
3x more functions per second (wow wow).

When doing large scale runs, capa is about 25% faster when using the
vivisect backend (analysis heavy) or 3x faster when using the
upcoming BinExport2 backend (minimal analysis).
2024-06-07 05:54:49 +02:00
N0stalgikow
0eb4291b25 Updating copyright across all files based on when it was first introduced. (#2027)
* updating copyright, back to the date of origin of file

* updating regex to account for linter violation
2024-03-13 14:04:53 +01:00
Yacine Elhamer
462024ad03 update tests to explicitely specify scopes 2023-08-01 07:41:47 +01:00
Willi Ballenthin
c86ab51210 fix copyright headers everywhere 2023-07-13 05:03:33 +02:00
Willi Ballenthin
9441da4887 isort 2023-07-06 17:50:34 +02:00
Willi Ballenthin
47074fd129 fix ruff issues 2023-07-06 17:49:40 +02:00
Harsh Mehta
74009eb4a4 Updated Copyright (#1383)
* Updated Copyright
2023-03-14 17:58:43 +01:00
Mike Hunhoff
a07ca443f0 update OS to match OS_ANY for all supported OSes (#1324) 2023-02-24 07:51:40 -07:00
Willi Ballenthin
b819033da0 lots of mypy 2022-12-14 10:37:39 +01:00
Mike Hunhoff
3c1cd67f60 dotnet: support property feature extraction (#1168) 2022-09-09 12:09:41 -06:00
Willi Ballenthin
9da4ff10da *: rename OperandImmediate to OperandNumber 2022-03-31 10:37:06 -06:00
Willi Ballenthin
c7aadca25c tests: demonstrate OperandOffset and OperandImmediate 2022-03-30 13:13:50 -06:00
William Ballenthin
2d68fb2536 pep8 2021-11-10 12:51:27 -07:00
William Ballenthin
845df282ef tests: split out match tests and validate alternative algorithms 2021-11-10 12:44:58 -07:00