Files
capa/tests/test_match.py
Willi Ballenthin b068890fa6 rules: match: optimize rule matching by better indexing rule by features
Implement the "tighten rule pre-selection" algorithm described here:
https://github.com/mandiant/capa/issues/2063#issuecomment-2100498720

In summary:

> Rather than indexing all features from all rules,
> we should pick and index the minimal set (ideally, one) of
> features from each rule that must be present for the rule to match.
> When we have multiple candidates, pick the feature that is
> probably most uncommon and therefore "selective".

This seems to work pretty well. Total evaluations when running against
mimikatz drop from 19M to 1.1M (wow!) and capa seems to match around
3x more functions per second (wow wow).

When doing large scale runs, capa is about 25% faster when using the
vivisect backend (analysis heavy) or 3x faster when using the
upcoming BinExport2 backend (minimal analysis).
2024-06-07 05:54:49 +02:00

26 KiB