capa/tests/test_match.py at 585dff8b481268fb67608dd9d6682b0f2b7be901

mirror of https://github.com/mandiant/capa.git synced 2026-01-06 17:53:59 -08:00

Files

Willi Ballenthin b068890fa6 rules: match: optimize rule matching by better indexing rule by features

Implement the "tighten rule pre-selection" algorithm described here:
https://github.com/mandiant/capa/issues/2063#issuecomment-2100498720

In summary:

> Rather than indexing all features from all rules,
> we should pick and index the minimal set (ideally, one) of
> features from each rule that must be present for the rule to match.
> When we have multiple candidates, pick the feature that is
> probably most uncommon and therefore "selective".

This seems to work pretty well. Total evaluations when running against
mimikatz drop from 19M to 1.1M (wow!) and capa seems to match around
3x more functions per second (wow wow).

When doing large scale runs, capa is about 25% faster when using the
vivisect backend (analysis heavy) or 3x faster when using the
upcoming BinExport2 backend (minimal analysis).

2024-06-07 05:54:49 +02:00

26 KiB

Raw Blame History

View Raw

26 KiB Raw Blame History

26 KiB

Raw Blame History