Merge pull request #863 from mandiant/v3.1.0

version: v3.1.0
changelog: add additional contributor
2025-12-19 02:32:30 -08:00 · 2022-01-12 14:18:22 -07:00 · 2022-01-11 14:29:15 -07:00 · 2022-01-11 14:28:17 -07:00 · 2022-01-11 14:27:59 -07:00 · 2022-01-11 10:05:40 -07:00
30 changed files with 1497 additions and 599 deletions
--- a/.github/workflows/tests.yml
+++ b/.github/workflows/tests.yml
@@ -30,7 +30,7 @@ jobs:
    - name: Set up Python 3.8
      uses: actions/setup-python@v2
      with:
-        python-version: 3.8
+        python-version: "3.8"
    - name: Install dependencies
      run: pip install -e .[dev]
    - name: Lint with isort
@@ -50,7 +50,7 @@ jobs:
    - name: Set up Python 3.8
      uses: actions/setup-python@v2
      with:
-        python-version: 3.8
+        python-version: "3.8"
    - name: Install capa
      run: pip install -e .
    - name: Run rule linter
@@ -65,13 +65,15 @@ jobs:
      matrix:
        os: [ubuntu-20.04, windows-2019, macos-10.15]
        # across all operating systems
-        python-version: [3.6, 3.9]
+        python-version: ["3.6", "3.10"]
        include:
          # on Ubuntu run these as well
          - os: ubuntu-20.04
-            python-version: 3.7
+            python-version: "3.7"
          - os: ubuntu-20.04
-            python-version: 3.8
+            python-version: "3.8"
+          - os: ubuntu-20.04
+            python-version: "3.9"
    steps:
    - name: Checkout capa with submodules
      uses: actions/checkout@v2
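Note on the quoting change above: an unquoted `3.10` in YAML is resolved as a floating-point number, so it collapses to `3.1`. The sketch below uses a Python float as a stand-in for what a YAML parser would produce; it does not call a YAML library, and the variable names are illustrative only.

```python
# Stand-in for YAML scalar resolution (assumption: the parser resolves an
# unquoted 3.10 to a float, as the YAML core schema does).
unquoted = 3.10    # what `python-version: 3.10` becomes after parsing
quoted = "3.10"    # what `python-version: "3.10"` becomes after parsing

# The float loses the trailing zero, so the requested version reads as 3.1.
as_requested = str(unquoted)
print(as_requested)
print(quoted)
```

Quoting every version string keeps the matrix entries uniform and avoids this class of surprise for future versions.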
--- a/.gitignore
+++ b/.gitignore
@@ -115,3 +115,6 @@ isort-output.log
 black-output.log
 rule-linter-output.log
 .vscode
+scripts/perf/*.txt
+scripts/perf/*.svg
+scripts/perf/*.zip
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -17,8 +17,81 @@
 ### Development

 ### Raw diffs
- [capa <release>...master](https://github.com/mandiant/capa/compare/v3.0.3...master)
- [capa-rules <release>...master](https://github.com/mandiant/capa-rules/compare/v3.0.3...master)
+- [capa v3.1.0...master](https://github.com/mandiant/capa/compare/v3.1.0...master)
+- [capa-rules v3.1.0...master](https://github.com/mandiant/capa-rules/compare/v3.1.0...master)
+
+## v3.1.0 (2022-01-10)
+This release improves the performance of capa while also adding 23 new rules and many code quality enhancements. We profiled capa's CPU usage and optimized the way that it matches rules, such as by short circuiting when appropriate. According to our testing, the matching phase is approximately 66% faster than v3.0.3! We also added support for Python 3.10, aarch64 builds, and additional MAEC metadata in the rule headers.
+  
+This release adds 23 new rules, including nine by Jakub Jozwiak of Mandiant. @ryantxu1 and @dzbeck updated the ATT&CK and MBC mappings for many rules. Thank you!
+  
+And as always, welcome first time contributors!
+
+  - @kn0wl3dge
+  - @jtothej
+  - @cl30
+  
+
+### New Features
+
+- engine: short circuit logic nodes for better performance #824 @williballenthin
+- engine: add optimizer to order faster nodes first #829 @williballenthin
+- engine: optimize rule evaluation by skipping rules that can't match #830 @williballenthin
+- support python 3.10 #816 @williballenthin
+- support aarch64 #683 @williballenthin
+- rules: support maec/malware-family meta #841 @mr-tz
+- engine: better type annotations/exhaustiveness checking #839 @cl30
+
+### Breaking Changes: None
+
+### New Rules (23)
+
+- nursery/delete-windows-backup-catalog michael.hunhoff@mandiant.com
+- nursery/disable-automatic-windows-recovery-features michael.hunhoff@mandiant.com
+- nursery/capture-webcam-video @johnk3r
+- nursery/create-registry-key-via-stdregprov michael.hunhoff@mandiant.com
+- nursery/delete-registry-key-via-stdregprov michael.hunhoff@mandiant.com
+- nursery/delete-registry-value-via-stdregprov michael.hunhoff@mandiant.com
+- nursery/query-or-enumerate-registry-key-via-stdregprov michael.hunhoff@mandiant.com
+- nursery/query-or-enumerate-registry-value-via-stdregprov michael.hunhoff@mandiant.com
+- nursery/set-registry-value-via-stdregprov michael.hunhoff@mandiant.com
+- data-manipulation/compression/decompress-data-using-ucl jakub.jozwiak@mandiant.com
+- linking/static/wolfcrypt/linked-against-wolfcrypt jakub.jozwiak@mandiant.com
+- linking/static/wolfssl/linked-against-wolfssl jakub.jozwiak@mandiant.com
+- anti-analysis/packer/pespin/packed-with-pespin jakub.jozwiak@mandiant.com
+- load-code/shellcode/execute-shellcode-via-windows-fibers jakub.jozwiak@mandiant.com
+- load-code/shellcode/execute-shellcode-via-enumuilanguages jakub.jozwiak@mandiant.com
+- anti-analysis/packer/themida/packed-with-themida william.ballenthin@mandiant.com
+- load-code/shellcode/execute-shellcode-via-createthreadpoolwait jakub.jozwiak@mandiant.com
+- host-interaction/process/inject/inject-shellcode-using-a-file-mapping-object jakub.jozwiak@mandiant.com
+- load-code/shellcode/execute-shellcode-via-copyfile2 jakub.jozwiak@mandiant.com
+- malware-family/plugx/match-known-plugx-module still@teamt5.org
+
+### Rule Changes
+
+  - update ATT&CK mappings by @ryantxu1
+  - update ATT&CK and MBC mappings by @dzbeck
+  - aplib detection by @cdong1012
+  - golang runtime detection by @stevemk14eber
+
+### Bug Fixes
+
+- fix circular import error #825 @williballenthin
+- fix smda negative number extraction #430 @kn0wl3dge
+
+### capa explorer IDA Pro plugin
+
+- pin supported versions to >= 7.4 and < 8.0 #849 @mike-hunhoff
+
+### Development
+
+- add profiling infrastructure #828 @williballenthin
+- linter: detect shellcode extension #820 @mr-tz
+- show features script: add backend flag #430 @kn0wl3dge
+
+### Raw diffs
+- [capa v3.0.3...v3.1.0](https://github.com/mandiant/capa/compare/v3.0.3...v3.1.0)
+- [capa-rules v3.0.3...v3.1.0](https://github.com/mandiant/capa-rules/compare/v3.0.3...v3.1.0)


 ## v3.0.3 (2021-10-27)
--- a/README.md
+++ b/README.md
@@ -2,7 +2,7 @@

 [![PyPI - Python Version](https://img.shields.io/pypi/pyversions/flare-capa)](https://pypi.org/project/flare-capa)
 [![Last release](https://img.shields.io/github/v/release/mandiant/capa)](https://github.com/mandiant/capa/releases)
-[![Number of rules](https://img.shields.io/badge/rules-639-blue.svg)](https://github.com/mandiant/capa-rules)
+[![Number of rules](https://img.shields.io/badge/rules-658-blue.svg)](https://github.com/mandiant/capa-rules)
 [![CI status](https://github.com/mandiant/capa/workflows/CI/badge.svg)](https://github.com/mandiant/capa/actions?query=workflow%3ACI+event%3Apush+branch%3Amaster)
 [![Downloads](https://img.shields.io/github/downloads/mandiant/capa/total)](https://github.com/mandiant/capa/releases)
 [![License](https://img.shields.io/badge/license-Apache--2.0-green.svg)](LICENSE.txt)
--- a/capa/engine.py
+++ b/capa/engine.py
@@ -8,11 +8,16 @@

 import copy
 import collections
-from typing import Set, Dict, List, Tuple, Union, Mapping, Iterable
+from typing import TYPE_CHECKING, Set, Dict, List, Tuple, Mapping, Iterable

-import capa.rules
+import capa.perf
 import capa.features.common
-from capa.features.common import Feature
+from capa.features.common import Result, Feature
+
+if TYPE_CHECKING:
+    # circular import, otherwise
+    import capa.rules
+

 # a collection of features and the locations at which they are found.
 #
@@ -45,15 +50,12 @@ class Statement:
    def __repr__(self):
        return str(self)

-    def evaluate(self, features: FeatureSet) -> "Result":
+    def evaluate(self, features: FeatureSet, short_circuit=True) -> Result:
        """
        classes that inherit `Statement` must implement `evaluate`

        args:
-          ctx (defaultdict[Feature, set[VA]])
-
-        returns:
-          Result
+            short_circuit (bool): if true, then statements like and/or/some may short circuit.
        """
        raise NotImplementedError()

@@ -77,70 +79,70 @@ class Statement:
                    children[i] = new


-class Result:
-    """
-    represents the results of an evaluation of statements against features.
-
-    instances of this class should behave like a bool,
-    e.g. `assert Result(True, ...) == True`
-
-    instances track additional metadata about evaluation results.
-    they contain references to the statement node (e.g. an And statement),
-     as well as the children Result instances.
-
-    we need this so that we can render the tree of expressions and their results.
-    """
-
-    def __init__(self, success: bool, statement: Union[Statement, Feature], children: List["Result"], locations=None):
-        """
-        args:
-          success (bool)
-          statement (capa.engine.Statement or capa.features.Feature)
-          children (list[Result])
-          locations (iterable[VA])
-        """
-        super(Result, self).__init__()
-        self.success = success
-        self.statement = statement
-        self.children = children
-        self.locations = locations if locations is not None else ()
-
-    def __eq__(self, other):
-        if isinstance(other, bool):
-            return self.success == other
-        return False
-
-    def __bool__(self):
-        return self.success
-
-    def __nonzero__(self):
-        return self.success
-
-
 class And(Statement):
-    """match if all of the children evaluate to True."""
+    """
+    match if all of the children evaluate to True.
+
+    the order of evaluation is dictated by the property
+    `And.children` (type: List[Statement|Feature]).
+    a query optimizer may safely manipulate the order of these children.
+    """

    def __init__(self, children, description=None):
        super(And, self).__init__(description=description)
        self.children = children

-    def evaluate(self, ctx):
-        results = [child.evaluate(ctx) for child in self.children]
-        success = all(results)
-        return Result(success, self, results)
+    def evaluate(self, ctx, short_circuit=True):
+        capa.perf.counters["evaluate.feature"] += 1
+        capa.perf.counters["evaluate.feature.and"] += 1
+
+        if short_circuit:
+            results = []
+            for child in self.children:
+                result = child.evaluate(ctx, short_circuit=short_circuit)
+                results.append(result)
+                if not result:
+                    # short circuit
+                    return Result(False, self, results)
+
+            return Result(True, self, results)
+        else:
+            results = [child.evaluate(ctx, short_circuit=short_circuit) for child in self.children]
+            success = all(results)
+            return Result(success, self, results)


 class Or(Statement):
-    """match if any of the children evaluate to True."""
+    """
+    match if any of the children evaluate to True.
+
+    the order of evaluation is dictated by the property
+    `Or.children` (type: List[Statement|Feature]).
+    a query optimizer may safely manipulate the order of these children.
+    """

    def __init__(self, children, description=None):
        super(Or, self).__init__(description=description)
        self.children = children

-    def evaluate(self, ctx):
-        results = [child.evaluate(ctx) for child in self.children]
-        success = any(results)
-        return Result(success, self, results)
+    def evaluate(self, ctx, short_circuit=True):
+        capa.perf.counters["evaluate.feature"] += 1
+        capa.perf.counters["evaluate.feature.or"] += 1
+
+        if short_circuit:
+            results = []
+            for child in self.children:
+                result = child.evaluate(ctx, short_circuit=short_circuit)
+                results.append(result)
+                if result:
+                    # short circuit as soon as we hit one match
+                    return Result(True, self, results)
+
+            return Result(False, self, results)
+        else:
+            results = [child.evaluate(ctx, short_circuit=short_circuit) for child in self.children]
+            success = any(results)
+            return Result(success, self, results)


 class Not(Statement):
@@ -150,28 +152,55 @@ class Not(Statement):
        super(Not, self).__init__(description=description)
        self.child = child

-    def evaluate(self, ctx):
-        results = [self.child.evaluate(ctx)]
+    def evaluate(self, ctx, short_circuit=True):
+        capa.perf.counters["evaluate.feature"] += 1
+        capa.perf.counters["evaluate.feature.not"] += 1
+
+        results = [self.child.evaluate(ctx, short_circuit=short_circuit)]
        success = not results[0]
        return Result(success, self, results)


 class Some(Statement):
-    """match if at least N of the children evaluate to True."""
+    """
+    match if at least N of the children evaluate to True.
+
+    the order of evaluation is dictated by the property
+    `Some.children` (type: List[Statement|Feature]).
+    a query optimizer may safely manipulate the order of these children.
+    """

    def __init__(self, count, children, description=None):
        super(Some, self).__init__(description=description)
        self.count = count
        self.children = children

-    def evaluate(self, ctx):
-        results = [child.evaluate(ctx) for child in self.children]
-        # note that here we cast the child result as a bool
-        # because we've overridden `__bool__` above.
-        #
-        # we can't use `if child is True` because the instance is not True.
-        success = sum([1 for child in results if bool(child) is True]) >= self.count
-        return Result(success, self, results)
+    def evaluate(self, ctx, short_circuit=True):
+        capa.perf.counters["evaluate.feature"] += 1
+        capa.perf.counters["evaluate.feature.some"] += 1
+
+        if short_circuit:
+            results = []
+            satisfied_children_count = 0
+            for child in self.children:
+                result = child.evaluate(ctx, short_circuit=short_circuit)
+                results.append(result)
+                if result:
+                    satisfied_children_count += 1
+
+                if satisfied_children_count >= self.count:
+                    # short circuit as soon as we hit the threshold
+                    return Result(True, self, results)
+
+            return Result(False, self, results)
+        else:
+            results = [child.evaluate(ctx, short_circuit=short_circuit) for child in self.children]
+            # note that here we cast the child result as a bool
+            # because we've overridden `__bool__` above.
+            #
+            # we can't use `if child is True` because the instance is not True.
+            success = sum([1 for child in results if bool(child) is True]) >= self.count
+            return Result(success, self, results)


 class Range(Statement):
@@ -183,7 +212,10 @@ class Range(Statement):
        self.min = min if min is not None else 0
        self.max = max if max is not None else (1 << 64 - 1)

-    def evaluate(self, ctx):
+    def evaluate(self, ctx, **kwargs):
+        capa.perf.counters["evaluate.feature"] += 1
+        capa.perf.counters["evaluate.feature.range"] += 1
+
        count = len(ctx.get(self.child, []))
        if self.min == 0 and count == 0:
            return Result(True, self, [])
@@ -208,7 +240,7 @@ class Subscope(Statement):
        self.scope = scope
        self.child = child

-    def evaluate(self, ctx):
+    def evaluate(self, ctx, **kwargs):
        raise ValueError("cannot evaluate a subscope directly!")


@@ -247,15 +279,20 @@ def index_rule_matches(features: FeatureSet, rule: "capa.rules.Rule", locations:

 def match(rules: List["capa.rules.Rule"], features: FeatureSet, va: int) -> Tuple[FeatureSet, MatchResults]:
    """
-    Args:
-      rules (List[capa.rules.Rule]): these must already be ordered topologically by dependency.
-      features (Mapping[capa.features.Feature, int]):
-      va (int): location of the features
+    match the given rules against the given features,
+    returning an updated set of features and the matches.

-    Returns:
-      Tuple[FeatureSet, MatchResults]: two-tuple with entries:
-        - set of features used for matching (which may be a superset of the given `features` argument, due to rule match features), and
-        - mapping from rule name to [(location of match, result object)]
+    the updated features are just like the input,
+    but extended to include the match features (e.g. names of rules that matched).
+    the given feature set is not modified; an updated copy is returned.
+
+    the given list of rules must be ordered topologically by dependency,
+    or else `match` statements will not be handled correctly.
+
+    this routine should be fairly optimized, but is not guaranteed to be the fastest matcher possible.
+    it has a particularly convenient signature: (rules, features) -> matches
+    other strategies can be imagined that match differently; implement these elsewhere.
+    specifically, this routine does "top down" matching of the given rules against the feature set.
    """
    results = collections.defaultdict(list)  # type: MatchResults

@@ -266,8 +303,18 @@ def match(rules: List["capa.rules.Rule"], features: FeatureSet, va: int) -> Tupl
    features = collections.defaultdict(set, copy.copy(features))

    for rule in rules:
-        res = rule.evaluate(features)
+        res = rule.evaluate(features, short_circuit=True)
        if res:
+            # we first matched the rule with short circuiting enabled.
+            # this is much faster than without short circuiting.
+            # however, we want to collect all results thoroughly,
+            # so once we've found a match quickly,
+            # go back and capture results without short circuiting.
+            res = rule.evaluate(features, short_circuit=False)
+
+            # sanity check
+            assert bool(res) is True
+
            results[rule.name].append((va, res))
            # we need to update the current `features`
            # because subsequent iterations of this loop may use newly added features,
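Note on the engine change above: the following is a minimal sketch of the two-pass idea (match quickly with short circuiting, then re-evaluate without it to collect every child result). The `Result`, `Feature`, and `And` classes here are simplified stand-ins, and `match_one` is a hypothetical helper, not capa's API.

```python
from typing import Dict, List, Optional, Set


class Result:
    """Simplified stand-in for capa's Result: truthy when the node matched."""

    def __init__(self, success: bool, children: List["Result"]):
        self.success = success
        self.children = children

    def __bool__(self) -> bool:
        return self.success


class Feature:
    """Hypothetical leaf node: matches when its name is a key in the feature set."""

    def __init__(self, name: str):
        self.name = name

    def evaluate(self, ctx: Dict[str, Set[int]], short_circuit: bool = True) -> Result:
        return Result(self.name in ctx, [])


class And:
    """Matches when all children match; may stop at the first failing child."""

    def __init__(self, children):
        self.children = children

    def evaluate(self, ctx: Dict[str, Set[int]], short_circuit: bool = True) -> Result:
        results = []
        for child in self.children:
            result = child.evaluate(ctx, short_circuit=short_circuit)
            results.append(result)
            if short_circuit and not result:
                # one failing child is enough to reject the whole node
                return Result(False, results)
        return Result(all(results), results)


def match_one(rule: And, ctx: Dict[str, Set[int]]) -> Optional[Result]:
    """Fast pass first; only on success pay for the full, non-short-circuit pass."""
    if not rule.evaluate(ctx, short_circuit=True):
        return None
    return rule.evaluate(ctx, short_circuit=False)


rule = And([Feature("a"), Feature("b"), Feature("c")])
miss = rule.evaluate({"b": {0x10}}, short_circuit=True)
full_miss = rule.evaluate({"b": {0x10}}, short_circuit=False)
hit = match_one(rule, {"a": {0x0}, "b": {0x10}, "c": {0x20}})
```

Most rules do not match most scopes, so the cheap first pass is the common case and the thorough second pass only runs for actual matches.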
--- a/capa/features/common.py
+++ b/capa/features/common.py
@@ -10,9 +10,13 @@ import re
 import codecs
 import logging
 import collections
-from typing import Set, Dict, Union
+from typing import TYPE_CHECKING, Set, Dict, List, Union

-import capa.engine
+if TYPE_CHECKING:
+    # circular import, otherwise
+    import capa.engine
+
+import capa.perf
 import capa.features
 import capa.features.extractors.elf

@@ -46,6 +50,52 @@ def escape_string(s: str) -> str:
    return s


+class Result:
+    """
+    represents the results of an evaluation of statements against features.
+
+    instances of this class should behave like a bool,
+    e.g. `assert Result(True, ...) == True`
+
+    instances track additional metadata about evaluation results.
+    they contain references to the statement node (e.g. an And statement),
+     as well as the children Result instances.
+
+    we need this so that we can render the tree of expressions and their results.
+    """
+
+    def __init__(
+        self,
+        success: bool,
+        statement: Union["capa.engine.Statement", "Feature"],
+        children: List["Result"],
+        locations=None,
+    ):
+        """
+        args:
+          success (bool)
+          statement (capa.engine.Statement or capa.features.Feature)
+          children (list[Result])
+          locations (iterable[VA])
+        """
+        super(Result, self).__init__()
+        self.success = success
+        self.statement = statement
+        self.children = children
+        self.locations = locations if locations is not None else ()
+
+    def __eq__(self, other):
+        if isinstance(other, bool):
+            return self.success == other
+        return False
+
+    def __bool__(self):
+        return self.success
+
+    def __nonzero__(self):
+        return self.success
+
+
 class Feature:
    def __init__(self, value: Union[str, int, bytes], bitness=None, description=None):
        """
@@ -96,8 +146,10 @@ class Feature:
    def __repr__(self):
        return str(self)

-    def evaluate(self, ctx: Dict["Feature", Set[int]]) -> "capa.engine.Result":
-        return capa.engine.Result(self in ctx, self, [], locations=ctx.get(self, []))
+    def evaluate(self, ctx: Dict["Feature", Set[int]], **kwargs) -> Result:
+        capa.perf.counters["evaluate.feature"] += 1
+        capa.perf.counters["evaluate.feature." + self.name] += 1
+        return Result(self in ctx, self, [], locations=ctx.get(self, []))

    def freeze_serialize(self):
        if self.bitness is not None:
@@ -140,7 +192,10 @@ class Substring(String):
        super(Substring, self).__init__(value, description=description)
        self.value = value

-    def evaluate(self, ctx):
+    def evaluate(self, ctx, short_circuit=True):
+        capa.perf.counters["evaluate.feature"] += 1
+        capa.perf.counters["evaluate.feature.substring"] += 1
+
        # mapping from string value to list of locations.
        # will unique the locations later on.
        matches = collections.defaultdict(list)
@@ -155,6 +210,10 @@ class Substring(String):

            if self.value in feature.value:
                matches[feature.value].extend(locations)
+                if short_circuit:
+                    # we found one matching string, that's sufficient to match.
+                    # don't collect other matching strings in this mode.
+                    break

        if matches:
            # finalize: defaultdict -> dict
@@ -170,9 +229,9 @@ class Substring(String):
            # unlike other features, we cannot return put a reference to `self` directly in a `Result`.
            # this is because `self` may match on many strings, so we can't stuff the matched value into it.
            # instead, return a new instance that has a reference to both the substring and the matched values.
-            return capa.engine.Result(True, _MatchedSubstring(self, matches), [], locations=locations)
+            return Result(True, _MatchedSubstring(self, matches), [], locations=locations)
        else:
-            return capa.engine.Result(False, _MatchedSubstring(self, None), [])
+            return Result(False, _MatchedSubstring(self, None), [])

    def __str__(self):
        return "substring(%s)" % self.value
@@ -225,7 +284,10 @@ class Regex(String):
                "invalid regular expression: %s it should use Python syntax, try it at https://pythex.org" % value
            )

-    def evaluate(self, ctx):
+    def evaluate(self, ctx, short_circuit=True):
+        capa.perf.counters["evaluate.feature"] += 1
+        capa.perf.counters["evaluate.feature.regex"] += 1
+
        # mapping from string value to list of locations.
        # will unique the locations later on.
        matches = collections.defaultdict(list)
@@ -244,6 +306,10 @@ class Regex(String):
            # so that they don't have to prefix/suffix their terms like: /.*foo.*/.
            if self.re.search(feature.value):
                matches[feature.value].extend(locations)
+                if short_circuit:
+                    # we found one matching string, that's sufficient to match.
+                    # don't collect other matching strings in this mode.
+                    break

        if matches:
            # finalize: defaultdict -> dict
@@ -260,9 +326,9 @@ class Regex(String):
            # this is because `self` may match on many strings, so we can't stuff the matched value into it.
            # instead, return a new instance that has a reference to both the regex and the matched values.
            # see #262.
-            return capa.engine.Result(True, _MatchedRegex(self, matches), [], locations=locations)
+            return Result(True, _MatchedRegex(self, matches), [], locations=locations)
        else:
-            return capa.engine.Result(False, _MatchedRegex(self, None), [])
+            return Result(False, _MatchedRegex(self, None), [])

    def __str__(self):
        return "regex(string =~ %s)" % self.value
@@ -308,15 +374,18 @@ class Bytes(Feature):
        super(Bytes, self).__init__(value, description=description)
        self.value = value

-    def evaluate(self, ctx):
+    def evaluate(self, ctx, **kwargs):
+        capa.perf.counters["evaluate.feature"] += 1
+        capa.perf.counters["evaluate.feature.bytes"] += 1
+
        for feature, locations in ctx.items():
            if not isinstance(feature, (Bytes,)):
                continue

            if feature.value.startswith(self.value):
-                return capa.engine.Result(True, self, [], locations=locations)
+                return Result(True, self, [], locations=locations)

-        return capa.engine.Result(False, self, [])
+        return Result(False, self, [])

    def get_value_str(self):
        return hex_string(bytes_to_str(self.value))
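Note on the `Substring`/`Regex` change above: a rough sketch of how the `short_circuit` flag trades completeness for speed when scanning string features. `find_substring` and the sample strings are hypothetical and only illustrate the early `break`.

```python
import collections
from typing import Dict, List, Set


def find_substring(needle: str, strings: Dict[str, Set[int]], short_circuit: bool = True) -> Dict[str, List[int]]:
    """Map each string containing `needle` to its locations.

    With short_circuit=True, stop after the first matching string.
    """
    matches = collections.defaultdict(list)
    for value, locations in strings.items():
        if needle in value:
            matches[value].extend(sorted(locations))
            if short_circuit:
                # one match is enough to decide success; skip the rest
                break
    return dict(matches)


strings = {
    "cmd.exe /c": {0x1000},
    "powershell.exe": {0x2000},
    "kernel32.dll": {0x3000},
}

fast = find_substring(".exe", strings, short_circuit=True)
full = find_substring(".exe", strings, short_circuit=False)
```

The fast result is enough to answer "did it match?", while the full result is what a renderer would need to list every matched string.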
--- a/capa/features/extractors/ida/global_.py
+++ b/capa/features/extractors/ida/global_.py
@@ -40,11 +40,11 @@ def extract_os():

 def extract_arch():
    info = idaapi.get_inf_structure()
-    if info.procName == "metapc" and info.is_64bit():
+    if info.procname == "metapc" and info.is_64bit():
        yield Arch(ARCH_AMD64), 0x0
-    elif info.procName == "metapc" and info.is_32bit():
+    elif info.procname == "metapc" and info.is_32bit():
        yield Arch(ARCH_I386), 0x0
-    elif info.procName == "metapc":
+    elif info.procname == "metapc":
        logger.debug("unsupported architecture: non-32-bit nor non-64-bit intel")
        return
    else:
@@ -52,5 +52,5 @@ def extract_arch():
        #  1. handling a new architecture (e.g. aarch64)
        #
        # for (1), this logic will need to be updated as the format is implemented.
-        logger.debug("unsupported architecture: %s", info.procName)
+        logger.debug("unsupported architecture: %s", info.procname)
        return
--- a/capa/features/extractors/smda/insn.py
+++ b/capa/features/extractors/smda/insn.py
@@ -84,8 +84,12 @@ def extract_insn_number_features(f, bb, insn):
        return
    for operand in operands:
        try:
-            yield Number(int(operand, 16)), insn.offset
-            yield Number(int(operand, 16), bitness=get_bitness(f.smda_report)), insn.offset
+            # The result of bitwise operations is calculated as though carried out
+            # in two’s complement with an infinite number of sign bits
+            value = int(operand, 16) & ((1 << f.smda_report.bitness) - 1)
+
+            yield Number(value), insn.offset
+            yield Number(value, bitness=get_bitness(f.smda_report)), insn.offset
        except:
            continue

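Note on the smda fix above: masking with `(1 << bitness) - 1` converts a negative operand into its unsigned two's complement representation at the given width. A small sketch, where `to_unsigned` is a hypothetical helper and `bitness` is assumed to be a bit count such as 32 or 64:

```python
def to_unsigned(operand: str, bitness: int) -> int:
    """Parse a hex operand string and wrap it into the range [0, 2**bitness)."""
    # Python ints behave as if they had infinitely many sign bits,
    # so AND-ing with an all-ones mask keeps only the low `bitness` bits.
    return int(operand, 16) & ((1 << bitness) - 1)


minus_one_32 = to_unsigned("-0x1", 32)
minus_one_64 = to_unsigned("-0x1", 64)
positive = to_unsigned("0x10", 32)
print(hex(minus_one_32), hex(minus_one_64), hex(positive))
```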
--- a/capa/features/freeze.py
+++ b/capa/features/freeze.py
@@ -8,12 +8,7 @@ json format:
      'base address': int(base address),
      'functions': {
        int(function va): {
-          'basic blocks': {
-            int(basic block va): {
-              'instructions': [instruction va, ...]
-            },
-            ...
-          },
+          int(basic block va): [int(instruction va), ...]
          ...
        },
        ...
--- a/capa/helpers.py
+++ b/capa/helpers.py
@@ -7,6 +7,7 @@
 # See the License for the specific language governing permissions and limitations under the License.

 import os
+from typing import NoReturn

 _hex = hex

@@ -30,3 +31,7 @@ def is_runtime_ida():
        return False
    else:
        return True
+
+
+def assert_never(value: NoReturn) -> NoReturn:
+    assert False, f"Unhandled value: {value} ({type(value).__name__})"
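Note on `assert_never` above: a sketch of how such a helper is typically used for exhaustiveness checking. The `Color` enum and `describe` function are hypothetical; the point is that a type checker can flag the final branch if a new enum member is added without a matching case.

```python
from enum import Enum
from typing import NoReturn


def assert_never(value: NoReturn) -> NoReturn:
    # Reached at runtime only if a case was missed.
    assert False, f"Unhandled value: {value} ({type(value).__name__})"


class Color(Enum):
    # hypothetical enum, for illustration only
    RED = "red"
    GREEN = "green"


def describe(color: Color) -> str:
    if color is Color.RED:
        return "warm"
    elif color is Color.GREEN:
        return "cool"
    else:
        # A type checker narrows `color` to "no remaining values" here;
        # adding a new member without a branch would be reported.
        assert_never(color)


warm = describe(Color.RED)
cool = describe(Color.GREEN)
```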
--- a/capa/ida/helpers.py
+++ b/capa/ida/helpers.py
@@ -21,13 +21,6 @@ import capa.features.common

 logger = logging.getLogger("capa")

-# IDA version as returned by idaapi.get_kernel_version()
-SUPPORTED_IDA_VERSIONS = (
-    "7.4",
-    "7.5",
-    "7.6",
-)
-
 # file type as returned by idainfo.file_type
 SUPPORTED_FILE_TYPES = (
    idaapi.f_PE,
@@ -45,13 +38,11 @@ def inform_user_ida_ui(message):


 def is_supported_ida_version():
-    version = idaapi.get_kernel_version()
-    if version not in SUPPORTED_IDA_VERSIONS:
+    version = float(idaapi.get_kernel_version())
+    if version < 7.4 or version >= 8:
        warning_msg = "This plugin does not support your IDA Pro version"
        logger.warning(warning_msg)
-        logger.warning(
-            "Your IDA Pro version is: %s. Supported versions are: %s." % (version, ", ".join(SUPPORTED_IDA_VERSIONS))
-        )
+        logger.warning("Your IDA Pro version is: %s. Supported versions are: IDA >= 7.4 and IDA < 8.0." % version)
        return False
    return True

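Note on the version check above: a sketch of the range test in isolation, without `idaapi`. `is_supported_version` is a hypothetical helper that takes the version string directly. It assumes the string has the form "major.minor"; `float()` would raise `ValueError` on something like "7.4.1".

```python
def is_supported_version(version: str) -> bool:
    """Return True when 7.4 <= version < 8.0, parsing the string as a float."""
    parsed = float(version)  # assumption: "major.minor" with no patch component
    return 7.4 <= parsed < 8.0


checks = {v: is_supported_version(v) for v in ("7.3", "7.4", "7.7", "8.0")}
print(checks)
```

Compared with the removed tuple of exact strings, a range accepts new point releases such as 7.7 without a code change.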
--- a/capa/ida/plugin/README.md
+++ b/capa/ida/plugin/README.md
@@ -39,6 +39,7 @@ capa explorer supports Python versions >= 3.6.x and the following IDA Pro versio
 * IDA 7.4
 * IDA 7.5
 * IDA 7.6 (caveat below)
+* IDA 7.7

 capa explorer is however limited to the Python versions supported by your IDA installation (which may not include all Python versions >= 3.6.x). Based on our testing the following matrix shows the Python versions supported
 by each supported IDA version:
--- a/capa/main.py
+++ b/capa/main.py
@@ -10,6 +10,7 @@ See the License for the specific language governing permissions and limitations
 """
 import os
 import sys
+import time
 import hashlib
 import logging
 import os.path
@@ -17,6 +18,7 @@ import argparse
 import datetime
 import textwrap
 import itertools
+import contextlib
 import collections
 from typing import Any, Dict, List, Tuple

@@ -26,6 +28,7 @@ import colorama
 from pefile import PEFormatError
 from elftools.common.exceptions import ELFError

+import capa.perf
 import capa.rules
 import capa.engine
 import capa.version
@@ -39,7 +42,7 @@ import capa.features.extractors
 import capa.features.extractors.common
 import capa.features.extractors.pefile
 import capa.features.extractors.elffile
-from capa.rules import Rule, RuleSet
+from capa.rules import Rule, Scope, RuleSet
 from capa.engine import FeatureSet, MatchResults
 from capa.helpers import get_file_taste
 from capa.features.extractors.base_extractor import FunctionHandle, FeatureExtractor
@@ -65,6 +68,14 @@ E_UNSUPPORTED_IDA_VERSION = -19
 logger = logging.getLogger("capa")


+@contextlib.contextmanager
+def timing(msg: str):
+    t0 = time.time()
+    yield
+    t1 = time.time()
+    logger.debug("perf: %s: %0.2fs", msg, t1 - t0)
+
+
 def set_vivisect_log_level(level):
    logging.getLogger("vivisect").setLevel(level)
    logging.getLogger("vivisect.base").setLevel(level)
@@ -103,7 +114,7 @@ def find_function_capabilities(ruleset: RuleSet, extractor: FeatureExtractor, f:
                bb_features[feature].add(va)
                function_features[feature].add(va)

-        _, matches = capa.engine.match(ruleset.basic_block_rules, bb_features, int(bb))
+        _, matches = ruleset.match(Scope.BASIC_BLOCK, bb_features, int(bb))

        for rule_name, res in matches.items():
            bb_matches[rule_name].extend(res)
@@ -111,7 +122,7 @@ def find_function_capabilities(ruleset: RuleSet, extractor: FeatureExtractor, f:
            for va, _ in res:
                capa.engine.index_rule_matches(function_features, rule, [va])

-    _, function_matches = capa.engine.match(ruleset.function_rules, function_features, int(f))
+    _, function_matches = ruleset.match(Scope.FUNCTION, function_features, int(f))
    return function_matches, bb_matches, len(function_features)


@@ -132,7 +143,7 @@ def find_file_capabilities(ruleset: RuleSet, extractor: FeatureExtractor, functi

    file_features.update(function_features)

-    _, matches = capa.engine.match(ruleset.file_rules, file_features, 0x0)
+    _, matches = ruleset.match(Scope.FILE, file_features, 0x0)
    return matches, len(file_features)


@@ -892,6 +903,7 @@ def main(argv=None):
    try:
        rules = get_rules(args.rules, disable_progress=args.quiet)
        rules = capa.rules.RuleSet(rules)
+
        logger.debug(
            "successfully loaded %s rules",
            # during the load of the RuleSet, we extract subscope statements into their own rules
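Note on the `timing` helper added above: a usage sketch showing how the context manager wraps a block and logs its wall-clock duration at debug level. The helper is reproduced so the example is self-contained; the workload and the `basicConfig` call are illustrative only.

```python
import time
import logging
import contextlib

logger = logging.getLogger("capa")


@contextlib.contextmanager
def timing(msg: str):
    # Measure wall-clock time spent inside the `with` block.
    t0 = time.time()
    yield
    t1 = time.time()
    logger.debug("perf: %s: %0.2fs", msg, t1 - t0)


# Debug logging must be enabled for the perf line to be emitted.
logging.basicConfig(level=logging.DEBUG)

with timing("sum numbers"):
    total = sum(range(1_000_000))
```

Because the log call follows the `yield` without a `try`/`finally`, the duration is not logged if the wrapped block raises.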
--- a/capa/optimizer.py
+++ b/capa/optimizer.py
@@ -0,0 +1,70 @@
+import logging
+
+import capa.engine as ceng
+import capa.features.common
+
+logger = logging.getLogger(__name__)
+
+
+def get_node_cost(node):
+    if isinstance(node, (capa.features.common.OS, capa.features.common.Arch, capa.features.common.Format)):
+        # we assume these are the most restrictive features:
+        # authors commonly use them at the start of rules to restrict the category of samples to inspect
+        return 0
+
+    # elif "everything else":
+    #   return 1
+    #
+    # this should be all hash-lookup features.
+    # see below.
+
+    elif isinstance(node, (capa.features.common.Substring, capa.features.common.Regex, capa.features.common.Bytes)):
+        # substring and regex features require a full scan of each string
+        # which we anticipate is more expensive than a hash lookup feature (e.g. mnemonic or count).
+        #
+        # TODO: compute the average cost of these features relative to hash features
+        # and adjust the factor accordingly.
+        return 2
+
+    elif isinstance(node, (ceng.Not, ceng.Range)):
+        # the cost of these nodes is defined by the complexity of their single child.
+        return 1 + get_node_cost(node.child)
+
+    elif isinstance(node, (ceng.And, ceng.Or, ceng.Some)):
+        # the cost of these nodes is the full cost of their children
+        # as this is the worst-case scenario.
+        return 1 + sum(map(get_node_cost, node.children))
+
+    else:
+        # this should be all hash-lookup features.
+        # we give this an arbitrary weight of 1.
+        # the only thing more "important" than this is checking OS/Arch/Format.
+        return 1
+
+
+def optimize_statement(statement):
+    # this routine operates in-place
+
+    if isinstance(statement, (ceng.And, ceng.Or, ceng.Some)):
+        # has .children
+        statement.children = sorted(statement.children, key=lambda n: get_node_cost(n))
+        return
+    elif isinstance(statement, (ceng.Not, ceng.Range)):
+        # has .child
+        optimize_statement(statement.child)
+        return
+    else:
+        # appears to be "simple"
+        return
+
+
+def optimize_rule(rule):
+    # this routine operates in-place
+    optimize_statement(rule.statement)
+
+
+def optimize_rules(rules):
+    logger.debug("optimizing %d rules", len(rules))
+    for rule in rules:
+        optimize_rule(rule)
+    return rules
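Note on the optimizer above: a minimal, self-contained sketch of the cost-based child ordering, assuming simplified stand-in classes. `Feature`, `Scan`, `Os`, `And`, `node_cost`, and `reorder` are hypothetical names for illustration only and are not capa's API. As in `optimize_statement`, only the direct children of the given node are sorted.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Feature:
    # hypothetical stand-in for a hash-lookup feature (cost 1)
    name: str


@dataclass
class Scan(Feature):
    # hypothetical stand-in for substring/regex/bytes features (cost 2)
    pass


@dataclass
class Os(Feature):
    # hypothetical stand-in for OS/Arch/Format features (cost 0)
    pass


@dataclass
class And:
    # hypothetical stand-in for a statement with multiple children
    children: List[object] = field(default_factory=list)


def node_cost(node) -> int:
    # check the specific subclasses before falling back to the generic cost
    if isinstance(node, Os):
        return 0
    if isinstance(node, Scan):
        return 2
    if isinstance(node, And):
        # worst case: every child has to be evaluated
        return 1 + sum(node_cost(child) for child in node.children)
    return 1


def reorder(node: And) -> None:
    # in-place: cheapest children first, so short-circuit evaluation
    # can bail out before reaching the expensive ones
    node.children = sorted(node.children, key=node_cost)


tree = And([Scan("regex"), Feature("mnemonic"), Os("windows")])
reorder(tree)
print([child.name for child in tree.children])
```

Because `sorted` is stable, children with equal cost keep their original relative order.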
--- a/capa/perf.py
+++ b/capa/perf.py
@@ -0,0 +1,10 @@
+import collections
+from typing import Dict
+
+# this structure is unstable and may change before the next major release.
+counters: Dict[str, int] = collections.Counter()
+
+
+def reset():
+    global counters
+    counters = collections.Counter()
--- a/capa/render/utils.py
+++ b/capa/render/utils.py
@@ -60,6 +60,8 @@ def capability_rules(doc):
            continue
        if rule["meta"].get("maec/analysis-conclusion-ov"):
            continue
+        if rule["meta"].get("maec/malware-family"):
+            continue
        if rule["meta"].get("maec/malware-category"):
            continue
        if rule["meta"].get("maec/malware-category-ov"):
--- a/capa/rules.py
+++ b/capa/rules.py
@@ -14,6 +14,9 @@ import logging
 import binascii
 import functools
 import collections
+from enum import Enum
+
+from capa.helpers import assert_never

 try:
    from functools import lru_cache
@@ -22,13 +25,15 @@ except ImportError:
    # https://github.com/python/mypy/issues/1153
    from backports.functools_lru_cache import lru_cache  # type: ignore

-from typing import Any, Dict, List, Union, Iterator
+from typing import Any, Set, Dict, List, Tuple, Union, Iterator

 import yaml
 import ruamel.yaml

+import capa.perf
 import capa.engine as ceng
 import capa.features
+import capa.optimizer
 import capa.features.file
 import capa.features.insn
 import capa.features.common
@@ -46,6 +51,7 @@ META_KEYS = (
    "rule-category",
    "maec/analysis-conclusion",
    "maec/analysis-conclusion-ov",
+    "maec/malware-family",
    "maec/malware-category",
    "maec/malware-category-ov",
    "author",
@@ -64,9 +70,15 @@ META_KEYS = (
 HIDDEN_META_KEYS = ("capa/nursery", "capa/path")


-FILE_SCOPE = "file"
-FUNCTION_SCOPE = "function"
-BASIC_BLOCK_SCOPE = "basic block"
+class Scope(str, Enum):
+    FILE = "file"
+    FUNCTION = "function"
+    BASIC_BLOCK = "basic block"
+
+
+FILE_SCOPE = Scope.FILE.value
+FUNCTION_SCOPE = Scope.FUNCTION.value
+BASIC_BLOCK_SCOPE = Scope.BASIC_BLOCK.value


 SUPPORTED_FEATURES = {
@@ -619,8 +631,10 @@ class Rule:
        for new_rule in self._extract_subscope_rules_rec(self.statement):
            yield new_rule

-    def evaluate(self, features: FeatureSet):
-        return self.statement.evaluate(features)
+    def evaluate(self, features: FeatureSet, short_circuit=True):
+        capa.perf.counters["evaluate.feature"] += 1
+        capa.perf.counters["evaluate.feature.rule"] += 1
+        return self.statement.evaluate(features, short_circuit=short_circuit)

    @classmethod
    def from_dict(cls, d, definition):
@@ -958,12 +972,23 @@ class RuleSet:
        if len(rules) == 0:
            raise InvalidRuleSet("no rules selected")

+        rules = capa.optimizer.optimize_rules(rules)
+
        self.file_rules = self._get_rules_for_scope(rules, FILE_SCOPE)
        self.function_rules = self._get_rules_for_scope(rules, FUNCTION_SCOPE)
        self.basic_block_rules = self._get_rules_for_scope(rules, BASIC_BLOCK_SCOPE)
        self.rules = {rule.name: rule for rule in rules}
        self.rules_by_namespace = index_rules_by_namespace(rules)

+        # unstable
+        (self._easy_file_rules_by_feature, self._hard_file_rules) = self._index_rules_by_feature(self.file_rules)
+        (self._easy_function_rules_by_feature, self._hard_function_rules) = self._index_rules_by_feature(
+            self.function_rules
+        )
+        (self._easy_basic_block_rules_by_feature, self._hard_basic_block_rules) = self._index_rules_by_feature(
+            self.basic_block_rules
+        )
+
    def __len__(self):
        return len(self.rules)

@@ -973,6 +998,141 @@ class RuleSet:
    def __contains__(self, rulename):
        return rulename in self.rules

+    @staticmethod
+    def _index_rules_by_feature(rules) -> Tuple[Dict[Feature, Set[str]], List[str]]:
+        """
+        split the given rules into two structures:
+          - "easy rules" are indexed by feature,
+            such that you can quickly find the rules that contain a given feature.
+          - "hard rules" are those that contain substring/regex/bytes features or match statements.
+            these continue to be ordered topologically.
+
+        a rule evaluator can use the "easy rule" index to restrict the
+        candidate rules that might match a given set of features.
+
+        at this time, a rule evaluator can't do anything special with
+        the "hard rules". it must still do a full top-down match of each
+        rule, in topological order.
+        """
+
+        # we'll do a couple phases:
+        #
+        #  1. recursively visit all nodes in all rules,
+        #    a. indexing all features
+        #    b. recording the types of features found per rule
+        #  2. compute the easy and hard rule sets
+        #  3. remove hard rules from the rules-by-feature index
+        #  4. construct the topologically ordered list of hard rules
+        rules_with_easy_features: Set[str] = set()
+        rules_with_hard_features: Set[str] = set()
+        rules_by_feature: Dict[Feature, Set[str]] = collections.defaultdict(set)
+
+        def rec(rule_name: str, node: Union[Feature, Statement]):
+            """
+            walk through a rule's logic tree, indexing the easy and hard rules,
+            and the features referenced by easy rules.
+            """
+            if isinstance(
+                node,
+                (
+                    # these are the "hard features"
+                    # substring: scanning feature
+                    capa.features.common.Substring,
+                    # regex: scanning feature
+                    capa.features.common.Regex,
+                    # bytes: scanning feature
+                    capa.features.common.Bytes,
+                    # match: dependency on another rule,
+                    # which we have to evaluate first,
+                    # and is therefore tricky.
+                    capa.features.common.MatchedRule,
+                ),
+            ):
+                # hard feature: requires scan or match lookup
+                rules_with_hard_features.add(rule_name)
+            elif isinstance(node, capa.features.common.Feature):
+                # easy feature: hash lookup
+                rules_with_easy_features.add(rule_name)
+                rules_by_feature[node].add(rule_name)
+            elif isinstance(node, (ceng.Not)):
+                # `not:` statements are tricky to deal with.
+                #
+                # first, features found under a `not:` should not be indexed,
+                # because the rule requires them to be absent.
+                # second, `not:` can be nested under another `not:`, or two, etc.
+                # third, `not:` at the root or directly under an `or:`
+                # means the rule will match against *anything* not specified there,
+                # which is a difficult set of things to compute and index.
+                #
+                # so, if a rule has a `not:` statement, it's hard.
+                # as of writing, this is an uncommon statement, with only 6 instances in 740 rules.
+                rules_with_hard_features.add(rule_name)
+            elif isinstance(node, (ceng.Some)) and node.count == 0:
+                # `optional:` and `0 or more:` are tricky to deal with.
+                #
+                # when a subtree is optional, it may match, but not matching
+                # doesn't have any impact either.
+                # now, our rule authors *should* not put this under `or:`
+                # and this is checked by the linter,
+                # but this could still happen (e.g. private rule set without linting)
+                # and would be hard to trace down.
+                #
+                # so better to be safe than sorry and consider this a hard case.
+                rules_with_hard_features.add(rule_name)
+            elif isinstance(node, (ceng.Range)) and node.min == 0:
+                # `count(foo): 0 or more` is tricky to deal with.
+                # because the min is 0,
+                # this subtree *can* match just about any feature
+                # (except the given one)
+                # which is a difficult set of things to compute and index.
+                rules_with_hard_features.add(rule_name)
+            elif isinstance(node, (ceng.Range)):
+                rec(rule_name, node.child)
+            elif isinstance(node, (ceng.And, ceng.Or, ceng.Some)):
+                for child in node.children:
+                    rec(rule_name, child)
+            elif isinstance(node, ceng.Statement):
+                # unhandled type of statement.
+                # this should only happen if a new subtype of `Statement`
+                # has since been added to capa.
+                #
+                # ideally, we'd like to use mypy for exhaustiveness checking
+                # for all the subtypes of `Statement`.
+                # but, as far as i can tell, mypy does not support this type
+                # of checking.
+                #
+                # in a way, this makes some intuitive sense:
+                # the set of subtypes of type A is unbounded,
+                # because any user might come along and create a new subtype B,
+                # so mypy can't reason about this set of types.
+                assert False, f"Unhandled value: {node} ({type(node).__name__})"
+            else:
+                # programming error
+                assert_never(node)
+
+        for rule in rules:
+            rule_name = rule.meta["name"]
+            root = rule.statement
+            rec(rule_name, root)
+
+        # if a rule has a hard feature,
+        # don't consider it easy, and therefore,
+        # don't index any of its features.
+        #
+        # otherwise, it's an easy rule, so index its features
+        for rules_with_feature in rules_by_feature.values():
+            rules_with_feature.difference_update(rules_with_hard_features)
+        easy_rules_by_feature = rules_by_feature
+
+        # `rules` is already topologically ordered,
+        # so extract our hard set into the topological ordering.
+        hard_rules = []
+        for rule in rules:
+            if rule.meta["name"] in rules_with_hard_features:
+                hard_rules.append(rule.meta["name"])
+
+        return (easy_rules_by_feature, hard_rules)
+
    @staticmethod
    def _get_rules_for_scope(rules, scope):
        """
@@ -1035,3 +1195,66 @@ class RuleSet:
                    rules_filtered.update(set(capa.rules.get_rules_and_dependencies(rules, rule.name)))
                    break
        return RuleSet(list(rules_filtered))
+
+    def match(self, scope: Scope, features: FeatureSet, va: int) -> Tuple[FeatureSet, ceng.MatchResults]:
+        """
+        match rules from this ruleset at the given scope against the given features.
+
+        this routine should act just like `capa.engine.match`,
+        except that it may be more performant.
+        """
+        easy_rules_by_feature = {}
+        if scope is Scope.FILE:
+            easy_rules_by_feature = self._easy_file_rules_by_feature
+            hard_rule_names = self._hard_file_rules
+        elif scope is Scope.FUNCTION:
+            easy_rules_by_feature = self._easy_function_rules_by_feature
+            hard_rule_names = self._hard_function_rules
+        elif scope is Scope.BASIC_BLOCK:
+            easy_rules_by_feature = self._easy_basic_block_rules_by_feature
+            hard_rule_names = self._hard_basic_block_rules
+        else:
+            assert_never(scope)
+
+        candidate_rule_names = set()
+        for feature in features:
+            easy_rule_names = easy_rules_by_feature.get(feature)
+            if easy_rule_names:
+                candidate_rule_names.update(easy_rule_names)
+
+        # first, match against the set of rules that have at least one
+        # feature shared with our feature set.
+        candidate_rules = [self.rules[name] for name in candidate_rule_names]
+        features2, easy_matches = ceng.match(candidate_rules, features, va)
+
+        # note that we've stored the updated feature set in `features2`.
+        # this contains a superset of the features in `features`;
+        # it contains additional features for any easy rule matches.
+        # we'll pass this feature set to hard rule matching, since one
+        # of those rules might rely on an easy rule match.
+        #
+        # the updated feature set from hard matching will go into `features3`.
+        # this is a superset of `features2`, which is a superset of `features`.
+        # ultimately, this is what we'll return to the caller.
+        #
+        # in each case, we could have assigned the updated feature set back to `features`,
+        # but this is slightly more explicit how we're tracking the data.
+
+        # now, match against (topologically ordered) list of rules
+        # that we can't really make any guesses about.
+        # these are rules with hard features, like substring/regex/bytes and match statements.
+        hard_rules = [self.rules[name] for name in hard_rule_names]
+        features3, hard_matches = ceng.match(hard_rules, features2, va)
+
+        # note that above, we probably are skipping matching a bunch of
+        # rules that definitely would never hit.
+        # specifically, "easy rules" that don't share any features with
+        # the feature set.
+
+        # MatchResults doesn't technically have an .update() method
+        # but a dict does.
+        matches = {}  # type: ignore
+        matches.update(easy_matches)
+        matches.update(hard_matches)
+
+        return (features3, matches)
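Note on `RuleSet.match` above: a minimal sketch of the candidate-selection idea, assuming rules are reduced to plain sets of feature strings. `build_index` and `candidates` are hypothetical helpers for illustration, not part of capa. Only rules that share at least one feature with the observed feature set are handed to the full matcher.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set


def build_index(rules: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    # map each feature to the names of the rules that reference it
    index: Dict[str, Set[str]] = defaultdict(set)
    for rule_name, features in rules.items():
        for feature in features:
            index[feature].add(rule_name)
    return index


def candidates(index: Dict[str, Set[str]], observed: Iterable[str]) -> Set[str]:
    # collect every rule that shares at least one feature with `observed`;
    # .get() avoids creating empty entries in the defaultdict
    names: Set[str] = set()
    for feature in observed:
        names.update(index.get(feature, ()))
    return names


rules = {
    "create file": {"api:CreateFile", "number:1"},
    "xor loop": {"mnemonic:xor"},
}
index = build_index(rules)
print(candidates(index, ["number:1", "string:foo"]))
```

Rules with "hard" features (substring/regex/bytes, `match`, `not`, optional subtrees) cannot be pruned this way, which is why the diff keeps them in a separate, topologically ordered list and always evaluates them.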
--- a/capa/version.py
+++ b/capa/version.py
@@ -1 +1 @@
-__version__ = "3.0.3"
+__version__ = "3.1.0"
--- a/2
+++ b/2
--- a/scripts/lint.py
+++ b/scripts/lint.py
@@ -230,9 +230,16 @@ def get_sample_capabilities(ctx: Context, path: Path) -> Set[str]:
        logger.debug("found cached results: %s: %d capabilities", nice_path, len(ctx.capabilities_by_sample[path]))
        return ctx.capabilities_by_sample[path]

+    if nice_path.endswith(capa.main.EXTENSIONS_SHELLCODE_32):
+        format = "sc32"
+    elif nice_path.endswith(capa.main.EXTENSIONS_SHELLCODE_64):
+        format = "sc64"
+    else:
+        format = "auto"
+
    logger.debug("analyzing sample: %s", nice_path)
    extractor = capa.main.get_extractor(
-        nice_path, "auto", capa.main.BACKEND_VIV, DEFAULT_SIGNATURES, False, disable_progress=True
+        nice_path, format, capa.main.BACKEND_VIV, DEFAULT_SIGNATURES, False, disable_progress=True
    )

    capabilities, _ = capa.main.find_capabilities(ctx.rules, extractor, disable_progress=True)
@@ -332,6 +339,52 @@ class OrStatementWithAlwaysTrueChild(Lint):
        return self.violation


+class NotNotUnderAnd(Lint):
+    name = "rule contains a `not` statement that's not found under an `and` statement"
+    recommendation = "clarify the rule logic and ensure `not` is always found under `and`"
+    violation = False
+
+    def check_rule(self, ctx: Context, rule: Rule):
+        self.violation = False
+
+        def rec(statement):
+            if isinstance(statement, capa.engine.Statement):
+                if not isinstance(statement, capa.engine.And):
+                    for child in statement.get_children():
+                        if isinstance(child, capa.engine.Not):
+                            self.violation = True
+
+                for child in statement.get_children():
+                    rec(child)
+
+        rec(rule.statement)
+
+        return self.violation
+
+
+class OptionalNotUnderAnd(Lint):
+    name = "rule contains an `optional` or `0 or more` statement that's not found under an `and` statement"
+    recommendation = "clarify the rule logic and ensure `optional` and `0 or more` is always found under `and`"
+    violation = False
+
+    def check_rule(self, ctx: Context, rule: Rule):
+        self.violation = False
+
+        def rec(statement):
+            if isinstance(statement, capa.engine.Statement):
+                if not isinstance(statement, capa.engine.And):
+                    for child in statement.get_children():
+                        if isinstance(child, capa.engine.Some) and child.count == 0:
+                            self.violation = True
+
+                for child in statement.get_children():
+                    rec(child)
+
+        rec(rule.statement)
+
+        return self.violation
+
+
 class UnusualMetaField(Lint):
    name = "unusual meta field"
    recommendation = "Remove the meta field"
@@ -653,6 +706,8 @@ LOGIC_LINTS = (
    DoesntMatchExample(),
    StatementWithSingleChildStatement(),
    OrStatementWithAlwaysTrueChild(),
+    NotNotUnderAnd(),
+    OptionalNotUnderAnd(),
 )


--- a/scripts/profile-time.py
+++ b/scripts/profile-time.py
@@ -0,0 +1,150 @@
+"""
+Invoke capa multiple times and record profiling information.
+Use the --number and --repeat options to change the number of iterations.
+By default, the script will emit a markdown table with a label pulled from git.
+
+Note: you can run this script against pre-generated .frz files to reduce the startup time.
+
+usage:
+
+    usage: profile-time.py [--number NUMBER] [--repeat REPEAT] [--label LABEL] sample
+
+    Profile capa performance
+
+    positional arguments:
+      sample                path to sample to analyze
+
+    optional arguments:
+      --number NUMBER       batch size of profile collection
+      --repeat REPEAT       batch count of profile collection
+      --label LABEL         description of the profile collection
+
+example:
+
+    $ python profile-time.py ./tests/data/kernel32.dll_.frz --number 1 --repeat 2
+
+    | label                                | count(evaluations)   | min(time)   | avg(time)   | max(time)   |
+    |--------------------------------------|----------------------|-------------|-------------|-------------|
+    | 18c30e4 main: remove perf debug msgs | 66,561,622           | 125.14s     | 132.13s     | 139.12s     |
+
+      ^^^ --label or git hash
+"""
+import sys
+import timeit
+import logging
+import argparse
+import subprocess
+
+import tqdm
+import tabulate
+
+import capa.main
+import capa.perf
+import capa.rules
+import capa.engine
+import capa.helpers
+import capa.features
+import capa.features.common
+import capa.features.freeze
+
+logger = logging.getLogger("capa.profile")
+
+
+def main(argv=None):
+    if argv is None:
+        argv = sys.argv[1:]
+
+    label = subprocess.run(
+        "git show --pretty=oneline --abbrev-commit | head -n 1", shell=True, capture_output=True, text=True
+    ).stdout.strip()
+    is_dirty = (
+        subprocess.run(
+            "git status | grep 'modified: ' | grep -v 'rules' | grep -v 'tests/data'",
+            shell=True,
+            capture_output=True,
+            text=True,
+        ).stdout
+        != ""
+    )
+
+    if is_dirty:
+        label += " (dirty)"
+
+    parser = argparse.ArgumentParser(description="Profile capa performance")
+    capa.main.install_common_args(parser, wanted={"format", "sample", "signatures", "rules"})
+
+    parser.add_argument("--number", type=int, default=3, help="batch size of profile collection")
+    parser.add_argument("--repeat", type=int, default=30, help="batch count of profile collection")
+    parser.add_argument("--label", type=str, default=label, help="description of the profile collection")
+
+    args = parser.parse_args(args=argv)
+    capa.main.handle_common_args(args)
+
+    try:
+        taste = capa.helpers.get_file_taste(args.sample)
+    except IOError as e:
+        logger.error("%s", str(e))
+        return -1
+
+    try:
+        with capa.main.timing("load rules"):
+            rules = capa.rules.RuleSet(capa.main.get_rules(args.rules, disable_progress=True))
+    except (IOError) as e:
+        logger.error("%s", str(e))
+        return -1
+
+    try:
+        sig_paths = capa.main.get_signatures(args.signatures)
+    except (IOError) as e:
+        logger.error("%s", str(e))
+        return -1
+
+    if (args.format == "freeze") or (args.format == "auto" and capa.features.freeze.is_freeze(taste)):
+        with open(args.sample, "rb") as f:
+            extractor = capa.features.freeze.load(f.read())
+    else:
+        extractor = capa.main.get_extractor(
+            args.sample, args.format, capa.main.BACKEND_VIV, sig_paths, should_save_workspace=False
+        )
+
+    with tqdm.tqdm(total=args.number * args.repeat) as pbar:
+
+        def do_iteration():
+            capa.perf.reset()
+            capa.main.find_capabilities(rules, extractor, disable_progress=True)
+            pbar.update(1)
+
+        samples = timeit.repeat(do_iteration, number=args.number, repeat=args.repeat)
+
+    logger.debug("perf: find capabilities: min: %0.2fs" % (min(samples) / float(args.number)))
+    logger.debug("perf: find capabilities: avg: %0.2fs" % (sum(samples) / float(args.repeat) / float(args.number)))
+    logger.debug("perf: find capabilities: max: %0.2fs" % (max(samples) / float(args.number)))
+
+    for (counter, count) in capa.perf.counters.most_common():
+        logger.debug("perf: counter: {:}: {:,}".format(counter, count))
+
+    print(
+        tabulate.tabulate(
+            [
+                (
+                    args.label,
+                    "{:,}".format(capa.perf.counters["evaluate.feature"]),
+                    # python documentation indicates that min(samples) should be preferred,
+                    # so let's put that first.
+                    #
+                    # https://docs.python.org/3/library/timeit.html#timeit.Timer.repeat
+                    "%0.2fs" % (min(samples) / float(args.number)),
+                    "%0.2fs" % (sum(samples) / float(args.repeat) / float(args.number)),
+                    "%0.2fs" % (max(samples) / float(args.number)),
+                )
+            ],
+            headers=["label", "count(evaluations)", "min(time)", "avg(time)", "max(time)"],
+            tablefmt="github",
+        )
+    )
+
+    return 0
+
+
+if __name__ == "__main__":
+    sys.exit(main())
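Note on the timing arithmetic in profile-time.py above: `timeit.repeat` returns one total per batch, so each sample has to be divided by `number` to get a per-run time. A minimal sketch, where `summarize` is a hypothetical helper and the workload is a placeholder:

```python
import timeit
from typing import List, Tuple


def summarize(samples: List[float], number: int) -> Tuple[float, float, float]:
    # each sample is the total time for `number` runs; normalize to per-run time
    per_run = [sample / number for sample in samples]
    return min(per_run), sum(per_run) / len(per_run), max(per_run)


# placeholder workload; the script above times capa.main.find_capabilities instead
samples = timeit.repeat(lambda: sum(range(100)), number=10, repeat=3)
lo, avg, hi = summarize(samples, number=10)
print("min: %0.6fs avg: %0.6fs max: %0.6fs" % (lo, avg, hi))
```

The Python documentation recommends reading the minimum as the most representative figure, since higher values usually reflect interference from other processes.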
--- a/scripts/show-features.py
+++ b/scripts/show-features.py
@@ -86,7 +86,7 @@ def main(argv=None):
        argv = sys.argv[1:]

    parser = argparse.ArgumentParser(description="Show the features that capa extracts from the given sample")
-    capa.main.install_common_args(parser, wanted={"format", "sample", "signatures"})
+    capa.main.install_common_args(parser, wanted={"format", "sample", "signatures", "backend"})

    parser.add_argument("-F", "--function", type=lambda x: int(x, 0x10), help="Show features for specific function")
    args = parser.parse_args(args=argv)
@@ -111,7 +111,7 @@ def main(argv=None):
        should_save_workspace = os.environ.get("CAPA_SAVE_WORKSPACE") not in ("0", "no", "NO", "n", None)
        try:
            extractor = capa.main.get_extractor(
-                args.sample, args.format, capa.main.BACKEND_VIV, sig_paths, should_save_workspace
+                args.sample, args.format, args.backend, sig_paths, should_save_workspace
            )
        except capa.main.UnsupportedFormatError:
            logger.error("-" * 80)
--- a/setup.py
+++ b/setup.py
@@ -18,10 +18,10 @@ requirements = [
    "termcolor==1.1.0",
    "wcwidth==0.2.5",
    "ida-settings==2.1.0",
-    "viv-utils[flirt]==0.6.7",
+    "viv-utils[flirt]==0.6.9",
    "halo==0.0.31",
    "networkx==2.5.1",
-    "ruamel.yaml==0.17.16",
+    "ruamel.yaml==0.17.20",
    "vivisect==1.0.5",
    "smda==1.6.2",
    "pefile==2021.9.3",
@@ -72,17 +72,17 @@ setuptools.setup(
            "pytest-instafail==0.4.2",
            "pytest-cov==3.0.0",
            "pycodestyle==2.8.0",
-            "black==21.9b0",
-            "isort==5.9.3",
-            "mypy==0.910",
-            "psutil==5.8.0",
+            "black==21.12b0",
+            "isort==5.10.1",
+            "mypy==0.931",
+            "psutil==5.9.0",
            # type stubs for mypy
            "types-backports==0.1.3",
-            "types-colorama==0.4.4",
-            "types-PyYAML==6.0.0",
-            "types-tabulate==0.8.3",
+            "types-colorama==0.4.5",
+            "types-PyYAML==6.0.3",
+            "types-tabulate==0.8.5",
            "types-termcolor==1.1.2",
-            "types-psutil==5.8.13",
+            "types-psutil==5.8.19",
        ],
    },
    zip_safe=False,
--- a/tests/data
+++ b/tests/data
--- a/tests/fixtures.py
+++ b/tests/fixtures.py
@@ -413,6 +413,7 @@ FEATURE_PRESENCE_TESTS = sorted(
        # insn/number
        ("mimikatz", "function=0x40105D", capa.features.insn.Number(0xFF), True),
        ("mimikatz", "function=0x40105D", capa.features.insn.Number(0x3136B0), True),
+        ("mimikatz", "function=0x401000", capa.features.insn.Number(0x0), True),
        # insn/number: stack adjustments
        ("mimikatz", "function=0x40105D", capa.features.insn.Number(0xC), False),
        ("mimikatz", "function=0x40105D", capa.features.insn.Number(0x10), False),
@@ -420,6 +421,9 @@ FEATURE_PRESENCE_TESTS = sorted(
        ("mimikatz", "function=0x40105D", capa.features.insn.Number(0xFF), True),
        ("mimikatz", "function=0x40105D", capa.features.insn.Number(0xFF, bitness=BITNESS_X32), True),
        ("mimikatz", "function=0x40105D", capa.features.insn.Number(0xFF, bitness=BITNESS_X64), False),
+        # insn/number: negative
+        ("mimikatz", "function=0x401553", capa.features.insn.Number(0xFFFFFFFF), True),
+        ("mimikatz", "function=0x43e543", capa.features.insn.Number(0xFFFFFFF0), True),
        # insn/offset
        ("mimikatz", "function=0x40105D", capa.features.insn.Offset(0x0), True),
        ("mimikatz", "function=0x40105D", capa.features.insn.Offset(0x4), True),
--- a/tests/test_engine.py
+++ b/tests/test_engine.py
@@ -5,13 +5,6 @@
 # Unless required by applicable law or agreed to in writing, software distributed under the License
 #  is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 # See the License for the specific language governing permissions and limitations under the License.
-
-import textwrap
-
-import capa.rules
-import capa.engine
-import capa.features.insn
-import capa.features.common
 from capa.engine import *
 from capa.features import *
 from capa.features.insn import *
@@ -117,419 +110,27 @@ def test_range():
    assert Range(Number(1), min=1, max=3).evaluate({Number(1): {1, 2, 3, 4}}) == False


-def test_range_exact():
-    rule = textwrap.dedent(
-        """
-        rule:
-            meta:
-                name: test rule
-            features:
-                - count(number(100)): 2
-        """
-    )
-    r = capa.rules.Rule.from_yaml(rule)
+def test_short_circuit():
+    assert Or([Number(1), Number(2)]).evaluate({Number(1): {1}}) == True

-    # just enough matches
-    features, matches = capa.engine.match([r], {capa.features.insn.Number(100): {1, 2}}, 0x0)
-    assert "test rule" in matches
-
-    # not enough matches
-    features, matches = capa.engine.match([r], {capa.features.insn.Number(100): {1}}, 0x0)
-    assert "test rule" not in matches
-
-    # too many matches
-    features, matches = capa.engine.match([r], {capa.features.insn.Number(100): {1, 2, 3}}, 0x0)
-    assert "test rule" not in matches
+    # with short circuiting, only the children up until the first satisfied child are captured.
+    assert len(Or([Number(1), Number(2)]).evaluate({Number(1): {1}}, short_circuit=True).children) == 1
+    assert len(Or([Number(1), Number(2)]).evaluate({Number(1): {1}}, short_circuit=False).children) == 2


-def test_range_range():
-    rule = textwrap.dedent(
-        """
-         rule:
-             meta:
-                 name: test rule
-             features:
-                 - count(number(100)): (2, 3)
-         """
-    )
-    r = capa.rules.Rule.from_yaml(rule)
+def test_eval_order():
+    # base cases.
+    assert Or([Number(1), Number(2)]).evaluate({Number(1): {1}}) == True
+    assert Or([Number(1), Number(2)]).evaluate({Number(2): {1}}) == True

-    # just enough matches
-    features, matches = capa.engine.match([r], {capa.features.insn.Number(100): {1, 2}}, 0x0)
-    assert "test rule" in matches
+    # with short circuiting, only the children up until the first satisfied child are captured.
+    assert len(Or([Number(1), Number(2)]).evaluate({Number(1): {1}}).children) == 1
+    assert len(Or([Number(1), Number(2)]).evaluate({Number(2): {1}}).children) == 2
+    assert len(Or([Number(1), Number(2)]).evaluate({Number(1): {1}, Number(2): {1}}).children) == 1

-    # enough matches
-    features, matches = capa.engine.match([r], {capa.features.insn.Number(100): {1, 2, 3}}, 0x0)
-    assert "test rule" in matches
+    # and it's guaranteed that children are evaluated in order.
+    assert Or([Number(1), Number(2)]).evaluate({Number(1): {1}}).children[0].statement == Number(1)
+    assert Or([Number(1), Number(2)]).evaluate({Number(1): {1}}).children[0].statement != Number(2)

-    # not enough matches
-    features, matches = capa.engine.match([r], {capa.features.insn.Number(100): {1}}, 0x0)
-    assert "test rule" not in matches
-
-    # too many matches
-    features, matches = capa.engine.match([r], {capa.features.insn.Number(100): {1, 2, 3, 4}}, 0x0)
-    assert "test rule" not in matches
-
-
-def test_range_exact_zero():
-    rule = textwrap.dedent(
-        """
-        rule:
-            meta:
-                name: test rule
-            features:
-                - count(number(100)): 0
-        """
-    )
-    r = capa.rules.Rule.from_yaml(rule)
-
-    # feature isn't indexed - good.
-    features, matches = capa.engine.match([r], {}, 0x0)
-    assert "test rule" in matches
-
-    # feature is indexed, but no matches.
-    # i don't think we should ever really have this case, but good to check anyways.
-    features, matches = capa.engine.match([r], {capa.features.insn.Number(100): {}}, 0x0)
-    assert "test rule" in matches
-
-    # too many matches
-    features, matches = capa.engine.match([r], {capa.features.insn.Number(100): {1}}, 0x0)
-    assert "test rule" not in matches
-
-
-def test_range_with_zero():
-    rule = textwrap.dedent(
-        """
-         rule:
-             meta:
-                 name: test rule
-             features:
-                 - count(number(100)): (0, 1)
-         """
-    )
-    r = capa.rules.Rule.from_yaml(rule)
-
-    # ok
-    features, matches = capa.engine.match([r], {}, 0x0)
-    assert "test rule" in matches
-    features, matches = capa.engine.match([r], {capa.features.insn.Number(100): {}}, 0x0)
-    assert "test rule" in matches
-    features, matches = capa.engine.match([r], {capa.features.insn.Number(100): {1}}, 0x0)
-    assert "test rule" in matches
-
-    # too many matches
-    features, matches = capa.engine.match([r], {capa.features.insn.Number(100): {1, 2}}, 0x0)
-    assert "test rule" not in matches
-
-
-def test_match_adds_matched_rule_feature():
-    """show that using `match` adds a feature for matched rules."""
-    rule = textwrap.dedent(
-        """
-        rule:
-            meta:
-                name: test rule
-            features:
-                - number: 100
-        """
-    )
-    r = capa.rules.Rule.from_yaml(rule)
-    features, matches = capa.engine.match([r], {capa.features.insn.Number(100): {1}}, 0x0)
-    assert capa.features.common.MatchedRule("test rule") in features
-
-
-def test_match_matched_rules():
-    """show that using `match` adds a feature for matched rules."""
-    rules = [
-        capa.rules.Rule.from_yaml(
-            textwrap.dedent(
-                """
-                rule:
-                    meta:
-                        name: test rule1
-                    features:
-                        - number: 100
-                """
-            )
-        ),
-        capa.rules.Rule.from_yaml(
-            textwrap.dedent(
-                """
-                rule:
-                    meta:
-                        name: test rule2
-                    features:
-                        - match: test rule1
-                """
-            )
-        ),
-    ]
-
-    features, matches = capa.engine.match(
-        capa.rules.topologically_order_rules(rules),
-        {capa.features.insn.Number(100): {1}},
-        0x0,
-    )
-    assert capa.features.common.MatchedRule("test rule1") in features
-    assert capa.features.common.MatchedRule("test rule2") in features
-
-    # the ordering of the rules must not matter,
-    # the engine should match rules in an appropriate order.
-    features, matches = capa.engine.match(
-        capa.rules.topologically_order_rules(reversed(rules)),
-        {capa.features.insn.Number(100): {1}},
-        0x0,
-    )
-    assert capa.features.common.MatchedRule("test rule1") in features
-    assert capa.features.common.MatchedRule("test rule2") in features
-
-
-def test_substring():
-    rules = [
-        capa.rules.Rule.from_yaml(
-            textwrap.dedent(
-                """
-                rule:
-                    meta:
-                        name: test rule
-                    features:
-                        - and:
-                            - substring: abc
-                """
-            )
-        ),
-    ]
-    features, matches = capa.engine.match(
-        capa.rules.topologically_order_rules(rules),
-        {capa.features.common.String("aaaa"): {1}},
-        0x0,
-    )
-    assert capa.features.common.MatchedRule("test rule") not in features
-
-    features, matches = capa.engine.match(
-        capa.rules.topologically_order_rules(rules),
-        {capa.features.common.String("abc"): {1}},
-        0x0,
-    )
-    assert capa.features.common.MatchedRule("test rule") in features
-
-    features, matches = capa.engine.match(
-        capa.rules.topologically_order_rules(rules),
-        {capa.features.common.String("111abc222"): {1}},
-        0x0,
-    )
-    assert capa.features.common.MatchedRule("test rule") in features
-
-    features, matches = capa.engine.match(
-        capa.rules.topologically_order_rules(rules),
-        {capa.features.common.String("111abc"): {1}},
-        0x0,
-    )
-    assert capa.features.common.MatchedRule("test rule") in features
-
-    features, matches = capa.engine.match(
-        capa.rules.topologically_order_rules(rules),
-        {capa.features.common.String("abc222"): {1}},
-        0x0,
-    )
-    assert capa.features.common.MatchedRule("test rule") in features
-
-
-def test_regex():
-    rules = [
-        capa.rules.Rule.from_yaml(
-            textwrap.dedent(
-                """
-                rule:
-                    meta:
-                        name: test rule
-                    features:
-                        - and:
-                            - string: /.*bbbb.*/
-                """
-            )
-        ),
-        capa.rules.Rule.from_yaml(
-            textwrap.dedent(
-                """
-                rule:
-                    meta:
-                        name: rule with implied wildcards
-                    features:
-                        - and:
-                            - string: /bbbb/
-                """
-            )
-        ),
-        capa.rules.Rule.from_yaml(
-            textwrap.dedent(
-                """
-                rule:
-                    meta:
-                        name: rule with anchor
-                    features:
-                        - and:
-                            - string: /^bbbb/
-                """
-            )
-        ),
-    ]
-    features, matches = capa.engine.match(
-        capa.rules.topologically_order_rules(rules),
-        {capa.features.insn.Number(100): {1}},
-        0x0,
-    )
-    assert capa.features.common.MatchedRule("test rule") not in features
-
-    features, matches = capa.engine.match(
-        capa.rules.topologically_order_rules(rules),
-        {capa.features.common.String("aaaa"): {1}},
-        0x0,
-    )
-    assert capa.features.common.MatchedRule("test rule") not in features
-
-    features, matches = capa.engine.match(
-        capa.rules.topologically_order_rules(rules),
-        {capa.features.common.String("aBBBBa"): {1}},
-        0x0,
-    )
-    assert capa.features.common.MatchedRule("test rule") not in features
-
-    features, matches = capa.engine.match(
-        capa.rules.topologically_order_rules(rules),
-        {capa.features.common.String("abbbba"): {1}},
-        0x0,
-    )
-    assert capa.features.common.MatchedRule("test rule") in features
-    assert capa.features.common.MatchedRule("rule with implied wildcards") in features
-    assert capa.features.common.MatchedRule("rule with anchor") not in features
-
-
-def test_regex_ignorecase():
-    rules = [
-        capa.rules.Rule.from_yaml(
-            textwrap.dedent(
-                """
-                rule:
-                    meta:
-                        name: test rule
-                    features:
-                        - and:
-                            - string: /.*bbbb.*/i
-                """
-            )
-        ),
-    ]
-    features, matches = capa.engine.match(
-        capa.rules.topologically_order_rules(rules),
-        {capa.features.common.String("aBBBBa"): {1}},
-        0x0,
-    )
-    assert capa.features.common.MatchedRule("test rule") in features
-
-
-def test_regex_complex():
-    rules = [
-        capa.rules.Rule.from_yaml(
-            textwrap.dedent(
-                r"""
-                rule:
-                    meta:
-                        name: test rule
-                    features:
-                        - or:
-                            - string: /.*HARDWARE\\Key\\key with spaces\\.*/i
-                """
-            )
-        ),
-    ]
-    features, matches = capa.engine.match(
-        capa.rules.topologically_order_rules(rules),
-        {capa.features.common.String(r"Hardware\Key\key with spaces\some value"): {1}},
-        0x0,
-    )
-    assert capa.features.common.MatchedRule("test rule") in features
-
-
-def test_match_namespace():
-    rules = [
-        capa.rules.Rule.from_yaml(
-            textwrap.dedent(
-                """
-                rule:
-                    meta:
-                        name: CreateFile API
-                        namespace: file/create/CreateFile
-                    features:
-                        - api: CreateFile
-                """
-            )
-        ),
-        capa.rules.Rule.from_yaml(
-            textwrap.dedent(
-                """
-                rule:
-                    meta:
-                        name: WriteFile API
-                        namespace: file/write
-                    features:
-                        - api: WriteFile
-                """
-            )
-        ),
-        capa.rules.Rule.from_yaml(
-            textwrap.dedent(
-                """
-                rule:
-                    meta:
-                        name: file-create
-                    features:
-                        - match: file/create
-                """
-            )
-        ),
-        capa.rules.Rule.from_yaml(
-            textwrap.dedent(
-                """
-                rule:
-                    meta:
-                        name: filesystem-any
-                    features:
-                        - match: file
-                """
-            )
-        ),
-    ]
-
-    features, matches = capa.engine.match(
-        capa.rules.topologically_order_rules(rules),
-        {capa.features.insn.API("CreateFile"): {1}},
-        0x0,
-    )
-    assert "CreateFile API" in matches
-    assert "file-create" in matches
-    assert "filesystem-any" in matches
-    assert capa.features.common.MatchedRule("file") in features
-    assert capa.features.common.MatchedRule("file/create") in features
-    assert capa.features.common.MatchedRule("file/create/CreateFile") in features
-
-    features, matches = capa.engine.match(
-        capa.rules.topologically_order_rules(rules),
-        {capa.features.insn.API("WriteFile"): {1}},
-        0x0,
-    )
-    assert "WriteFile API" in matches
-    assert "file-create" not in matches
-    assert "filesystem-any" in matches
-
-
-def test_render_number():
-    assert str(capa.features.insn.Number(1)) == "number(0x1)"
-    assert str(capa.features.insn.Number(1, bitness=capa.features.common.BITNESS_X32)) == "number/x32(0x1)"
-    assert str(capa.features.insn.Number(1, bitness=capa.features.common.BITNESS_X64)) == "number/x64(0x1)"
-
-
-def test_render_offset():
-    assert str(capa.features.insn.Offset(1)) == "offset(0x1)"
-    assert str(capa.features.insn.Offset(1, bitness=capa.features.common.BITNESS_X32)) == "offset/x32(0x1)"
-    assert str(capa.features.insn.Offset(1, bitness=capa.features.common.BITNESS_X64)) == "offset/x64(0x1)"
+    assert Or([Number(1), Number(2)]).evaluate({Number(2): {1}}).children[1].statement == Number(2)
+    assert Or([Number(1), Number(2)]).evaluate({Number(2): {1}}).children[1].statement != Number(1)
--- a/tests/test_match.py
+++ b/tests/test_match.py
@@ -0,0 +1,533 @@
+# Copyright (C) 2020 FireEye, Inc. All Rights Reserved.
+# Licensed under the Apache License, Version 2.0 (the "License");
+#  you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at: [package root]/LICENSE.txt
+# Unless required by applicable law or agreed to in writing, software distributed under the License
+#  is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and limitations under the License.
+
+import textwrap
+
+import capa.rules
+import capa.engine
+import capa.features.insn
+import capa.features.common
+from capa.rules import Scope
+from capa.features import *
+from capa.features.insn import *
+from capa.features.common import *
+
+
+def match(rules, features, va, scope=Scope.FUNCTION):
+    """
+    use all matching algorithms and verify that they compute the same result.
+    then, return those results to the caller so they can make their asserts.
+    """
+    features1, matches1 = capa.engine.match(rules, features, va)
+
+    ruleset = capa.rules.RuleSet(rules)
+    features2, matches2 = ruleset.match(scope, features, va)
+
+    for feature, locations in features1.items():
+        assert feature in features2
+        assert locations == features2[feature]
+
+    for rulename, results in matches1.items():
+        assert rulename in matches2
+        assert len(results) == len(matches2[rulename])
+
+    return features1, matches1
+
+
+def test_match_simple():
+    rule = textwrap.dedent(
+        """
+        rule:
+            meta:
+                name: test rule
+                namespace: testns1/testns2
+            features:
+                - number: 100
+        """
+    )
+    r = capa.rules.Rule.from_yaml(rule)
+
+    features, matches = match([r], {capa.features.insn.Number(100): {1, 2}}, 0x0)
+    assert "test rule" in matches
+    assert MatchedRule("test rule") in features
+    assert MatchedRule("testns1") in features
+    assert MatchedRule("testns1/testns2") in features
+
+
+def test_match_range_exact():
+    rule = textwrap.dedent(
+        """
+        rule:
+            meta:
+                name: test rule
+            features:
+                - count(number(100)): 2
+        """
+    )
+    r = capa.rules.Rule.from_yaml(rule)
+
+    # just enough matches
+    _, matches = match([r], {capa.features.insn.Number(100): {1, 2}}, 0x0)
+    assert "test rule" in matches
+
+    # not enough matches
+    _, matches = match([r], {capa.features.insn.Number(100): {1}}, 0x0)
+    assert "test rule" not in matches
+
+    # too many matches
+    _, matches = match([r], {capa.features.insn.Number(100): {1, 2, 3}}, 0x0)
+    assert "test rule" not in matches
+
+
+def test_match_range_range():
+    rule = textwrap.dedent(
+        """
+         rule:
+             meta:
+                 name: test rule
+             features:
+                 - count(number(100)): (2, 3)
+         """
+    )
+    r = capa.rules.Rule.from_yaml(rule)
+
+    # just enough matches
+    _, matches = match([r], {capa.features.insn.Number(100): {1, 2}}, 0x0)
+    assert "test rule" in matches
+
+    # enough matches
+    _, matches = match([r], {capa.features.insn.Number(100): {1, 2, 3}}, 0x0)
+    assert "test rule" in matches
+
+    # not enough matches
+    _, matches = match([r], {capa.features.insn.Number(100): {1}}, 0x0)
+    assert "test rule" not in matches
+
+    # too many matches
+    _, matches = match([r], {capa.features.insn.Number(100): {1, 2, 3, 4}}, 0x0)
+    assert "test rule" not in matches
+
+
+def test_match_range_exact_zero():
+    rule = textwrap.dedent(
+        """
+        rule:
+            meta:
+                name: test rule
+            features:
+                - count(number(100)): 0
+        """
+    )
+    r = capa.rules.Rule.from_yaml(rule)
+
+    # feature isn't indexed - good.
+    _, matches = match([r], {}, 0x0)
+    assert "test rule" in matches
+
+    # feature is indexed, but no matches.
+    # i don't think we should ever really have this case, but good to check anyways.
+    _, matches = match([r], {capa.features.insn.Number(100): {}}, 0x0)
+    assert "test rule" in matches
+
+    # too many matches
+    _, matches = match([r], {capa.features.insn.Number(100): {1}}, 0x0)
+    assert "test rule" not in matches
+
+
+def test_match_range_with_zero():
+    rule = textwrap.dedent(
+        """
+         rule:
+             meta:
+                 name: test rule
+             features:
+                 - count(number(100)): (0, 1)
+         """
+    )
+    r = capa.rules.Rule.from_yaml(rule)
+
+    # ok
+    _, matches = match([r], {}, 0x0)
+    assert "test rule" in matches
+    _, matches = match([r], {capa.features.insn.Number(100): {}}, 0x0)
+    assert "test rule" in matches
+    _, matches = match([r], {capa.features.insn.Number(100): {1}}, 0x0)
+    assert "test rule" in matches
+
+    # too many matches
+    _, matches = match([r], {capa.features.insn.Number(100): {1, 2}}, 0x0)
+    assert "test rule" not in matches
+
+
+def test_match_adds_matched_rule_feature():
+    """show that using `match` adds a feature for matched rules."""
+    rule = textwrap.dedent(
+        """
+        rule:
+            meta:
+                name: test rule
+            features:
+                - number: 100
+        """
+    )
+    r = capa.rules.Rule.from_yaml(rule)
+    features, _ = match([r], {capa.features.insn.Number(100): {1}}, 0x0)
+    assert capa.features.common.MatchedRule("test rule") in features
+
+
+def test_match_matched_rules():
+    """show that a rule can `match` on another matched rule, regardless of the order the rules are provided in."""
+    rules = [
+        capa.rules.Rule.from_yaml(
+            textwrap.dedent(
+                """
+                rule:
+                    meta:
+                        name: test rule1
+                    features:
+                        - number: 100
+                """
+            )
+        ),
+        capa.rules.Rule.from_yaml(
+            textwrap.dedent(
+                """
+                rule:
+                    meta:
+                        name: test rule2
+                    features:
+                        - match: test rule1
+                """
+            )
+        ),
+    ]
+
+    features, _ = match(
+        capa.rules.topologically_order_rules(rules),
+        {capa.features.insn.Number(100): {1}},
+        0x0,
+    )
+    assert capa.features.common.MatchedRule("test rule1") in features
+    assert capa.features.common.MatchedRule("test rule2") in features
+
+    # the ordering of the rules must not matter,
+    # the engine should match rules in an appropriate order.
+    features, _ = match(
+        capa.rules.topologically_order_rules(reversed(rules)),
+        {capa.features.insn.Number(100): {1}},
+        0x0,
+    )
+    assert capa.features.common.MatchedRule("test rule1") in features
+    assert capa.features.common.MatchedRule("test rule2") in features
+
+
+def test_match_namespace():
+    rules = [
+        capa.rules.Rule.from_yaml(
+            textwrap.dedent(
+                """
+                rule:
+                    meta:
+                        name: CreateFile API
+                        namespace: file/create/CreateFile
+                    features:
+                        - api: CreateFile
+                """
+            )
+        ),
+        capa.rules.Rule.from_yaml(
+            textwrap.dedent(
+                """
+                rule:
+                    meta:
+                        name: WriteFile API
+                        namespace: file/write
+                    features:
+                        - api: WriteFile
+                """
+            )
+        ),
+        capa.rules.Rule.from_yaml(
+            textwrap.dedent(
+                """
+                rule:
+                    meta:
+                        name: file-create
+                    features:
+                        - match: file/create
+                """
+            )
+        ),
+        capa.rules.Rule.from_yaml(
+            textwrap.dedent(
+                """
+                rule:
+                    meta:
+                        name: filesystem-any
+                    features:
+                        - match: file
+                """
+            )
+        ),
+    ]
+
+    features, matches = match(
+        capa.rules.topologically_order_rules(rules),
+        {capa.features.insn.API("CreateFile"): {1}},
+        0x0,
+    )
+    assert "CreateFile API" in matches
+    assert "file-create" in matches
+    assert "filesystem-any" in matches
+    assert capa.features.common.MatchedRule("file") in features
+    assert capa.features.common.MatchedRule("file/create") in features
+    assert capa.features.common.MatchedRule("file/create/CreateFile") in features
+
+    features, matches = match(
+        capa.rules.topologically_order_rules(rules),
+        {capa.features.insn.API("WriteFile"): {1}},
+        0x0,
+    )
+    assert "WriteFile API" in matches
+    assert "file-create" not in matches
+    assert "filesystem-any" in matches
+
+
+def test_match_substring():
+    rules = [
+        capa.rules.Rule.from_yaml(
+            textwrap.dedent(
+                """
+                rule:
+                    meta:
+                        name: test rule
+                    features:
+                        - and:
+                            - substring: abc
+                """
+            )
+        ),
+    ]
+    features, _ = match(
+        capa.rules.topologically_order_rules(rules),
+        {capa.features.common.String("aaaa"): {1}},
+        0x0,
+    )
+    assert capa.features.common.MatchedRule("test rule") not in features
+
+    features, _ = match(
+        capa.rules.topologically_order_rules(rules),
+        {capa.features.common.String("abc"): {1}},
+        0x0,
+    )
+    assert capa.features.common.MatchedRule("test rule") in features
+
+    features, _ = match(
+        capa.rules.topologically_order_rules(rules),
+        {capa.features.common.String("111abc222"): {1}},
+        0x0,
+    )
+    assert capa.features.common.MatchedRule("test rule") in features
+
+    features, _ = match(
+        capa.rules.topologically_order_rules(rules),
+        {capa.features.common.String("111abc"): {1}},
+        0x0,
+    )
+    assert capa.features.common.MatchedRule("test rule") in features
+
+    features, _ = match(
+        capa.rules.topologically_order_rules(rules),
+        {capa.features.common.String("abc222"): {1}},
+        0x0,
+    )
+    assert capa.features.common.MatchedRule("test rule") in features
+
+
+def test_match_regex():
+    rules = [
+        capa.rules.Rule.from_yaml(
+            textwrap.dedent(
+                """
+                rule:
+                    meta:
+                        name: test rule
+                    features:
+                        - and:
+                            - string: /.*bbbb.*/
+                """
+            )
+        ),
+        capa.rules.Rule.from_yaml(
+            textwrap.dedent(
+                """
+                rule:
+                    meta:
+                        name: rule with implied wildcards
+                    features:
+                        - and:
+                            - string: /bbbb/
+                """
+            )
+        ),
+        capa.rules.Rule.from_yaml(
+            textwrap.dedent(
+                """
+                rule:
+                    meta:
+                        name: rule with anchor
+                    features:
+                        - and:
+                            - string: /^bbbb/
+                """
+            )
+        ),
+    ]
+    features, _ = match(
+        capa.rules.topologically_order_rules(rules),
+        {capa.features.insn.Number(100): {1}},
+        0x0,
+    )
+    assert capa.features.common.MatchedRule("test rule") not in features
+
+    features, _ = match(
+        capa.rules.topologically_order_rules(rules),
+        {capa.features.common.String("aaaa"): {1}},
+        0x0,
+    )
+    assert capa.features.common.MatchedRule("test rule") not in features
+
+    features, _ = match(
+        capa.rules.topologically_order_rules(rules),
+        {capa.features.common.String("aBBBBa"): {1}},
+        0x0,
+    )
+    assert capa.features.common.MatchedRule("test rule") not in features
+
+    features, _ = match(
+        capa.rules.topologically_order_rules(rules),
+        {capa.features.common.String("abbbba"): {1}},
+        0x0,
+    )
+    assert capa.features.common.MatchedRule("test rule") in features
+    assert capa.features.common.MatchedRule("rule with implied wildcards") in features
+    assert capa.features.common.MatchedRule("rule with anchor") not in features
+
+
+def test_match_regex_ignorecase():
+    rules = [
+        capa.rules.Rule.from_yaml(
+            textwrap.dedent(
+                """
+                rule:
+                    meta:
+                        name: test rule
+                    features:
+                        - and:
+                            - string: /.*bbbb.*/i
+                """
+            )
+        ),
+    ]
+    features, _ = match(
+        capa.rules.topologically_order_rules(rules),
+        {capa.features.common.String("aBBBBa"): {1}},
+        0x0,
+    )
+    assert capa.features.common.MatchedRule("test rule") in features
+
+
+def test_match_regex_complex():
+    rules = [
+        capa.rules.Rule.from_yaml(
+            textwrap.dedent(
+                r"""
+                rule:
+                    meta:
+                        name: test rule
+                    features:
+                        - or:
+                            - string: /.*HARDWARE\\Key\\key with spaces\\.*/i
+                """
+            )
+        ),
+    ]
+    features, _ = match(
+        capa.rules.topologically_order_rules(rules),
+        {capa.features.common.String(r"Hardware\Key\key with spaces\some value"): {1}},
+        0x0,
+    )
+    assert capa.features.common.MatchedRule("test rule") in features
+
+
+def test_match_regex_values_always_string():
+    rules = [
+        capa.rules.Rule.from_yaml(
+            textwrap.dedent(
+                """
+                rule:
+                    meta:
+                        name: test rule
+                    features:
+                        - or:
+                            - string: /123/
+                            - string: /0x123/
+                """
+            )
+        ),
+    ]
+    features, _ = match(
+        capa.rules.topologically_order_rules(rules),
+        {capa.features.common.String("123"): {1}},
+        0x0,
+    )
+    assert capa.features.common.MatchedRule("test rule") in features
+
+    features, _ = match(
+        capa.rules.topologically_order_rules(rules),
+        {capa.features.common.String("0x123"): {1}},
+        0x0,
+    )
+    assert capa.features.common.MatchedRule("test rule") in features
+
+
+def test_match_not():
+    rule = textwrap.dedent(
+        """
+        rule:
+            meta:
+                name: test rule
+                namespace: testns1/testns2
+            features:
+                - not:
+                    - number: 99
+        """
+    )
+    r = capa.rules.Rule.from_yaml(rule)
+
+    _, matches = match([r], {capa.features.insn.Number(100): {1, 2}}, 0x0)
+    assert "test rule" in matches
+
+
+def test_match_not_not():
+    rule = textwrap.dedent(
+        """
+        rule:
+            meta:
+                name: test rule
+                namespace: testns1/testns2
+            features:
+                - not:
+                    - not:
+                        - number: 100
+        """
+    )
+    r = capa.rules.Rule.from_yaml(rule)
+
+    _, matches = match([r], {capa.features.insn.Number(100): {1, 2}}, 0x0)
+    assert "test rule" in matches
--- a/tests/test_optimizer.py
+++ b/tests/test_optimizer.py
@@ -0,0 +1,65 @@
+# Copyright (C) 2021 FireEye, Inc. All Rights Reserved.
+# Licensed under the Apache License, Version 2.0 (the "License");
+#  you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at: [package root]/LICENSE.txt
+# Unless required by applicable law or agreed to in writing, software distributed under the License
+#  is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and limitations under the License.
+
+import textwrap
+
+import pytest
+
+import capa.rules
+import capa.engine
+import capa.optimizer
+import capa.features.common
+from capa.engine import Or, And
+from capa.features.insn import Mnemonic
+from capa.features.common import Arch, Bytes, Substring
+
+
+def test_optimizer_order():
+    rule = textwrap.dedent(
+        """
+        rule:
+            meta:
+                name: test rule
+                scope: function
+            features:
+                - and:
+                    - substring: "foo"
+                    - arch: amd64
+                    - mnemonic: cmp
+                    - and:
+                      - bytes: 3
+                      - offset: 2
+                    - or:
+                      - number: 1
+                      - offset: 4
+        """
+    )
+    r = capa.rules.Rule.from_yaml(rule)
+
+    # before optimization
+    children = list(r.statement.get_children())
+    assert isinstance(children[0], Substring)
+    assert isinstance(children[1], Arch)
+    assert isinstance(children[2], Mnemonic)
+    assert isinstance(children[3], And)
+    assert isinstance(children[4], Or)
+
+    # after optimization
+    capa.optimizer.optimize_rules([r])
+    children = list(r.statement.get_children())
+
+    # cost: 0
+    assert isinstance(children[0], Arch)
+    # cost: 1
+    assert isinstance(children[1], Mnemonic)
+    # cost: 2
+    assert isinstance(children[2], Substring)
+    # cost: 3
+    assert isinstance(children[3], Or)
+    # cost: 4
+    assert isinstance(children[4], And)
--- a/tests/test_render.py
+++ b/tests/test_render.py
@@ -2,9 +2,23 @@ import textwrap

 import capa.rules
 import capa.render.utils
+import capa.features.insn
+import capa.features.common
 import capa.render.result_document


+def test_render_number():
+    assert str(capa.features.insn.Number(1)) == "number(0x1)"
+    assert str(capa.features.insn.Number(1, bitness=capa.features.common.BITNESS_X32)) == "number/x32(0x1)"
+    assert str(capa.features.insn.Number(1, bitness=capa.features.common.BITNESS_X64)) == "number/x64(0x1)"
+
+
+def test_render_offset():
+    assert str(capa.features.insn.Offset(1)) == "offset(0x1)"
+    assert str(capa.features.insn.Offset(1, bitness=capa.features.common.BITNESS_X32)) == "offset/x32(0x1)"
+    assert str(capa.features.insn.Offset(1, bitness=capa.features.common.BITNESS_X64)) == "offset/x64(0x1)"
+
+
 def test_render_meta_attack():
    # Persistence::Boot or Logon Autostart Execution::Registry Run Keys / Startup Folder [T1547.001]
    id = "T1543.003"
--- a/tests/test_rules.py
+++ b/tests/test_rules.py
@@ -785,37 +785,6 @@ def test_substring_description():
    assert (Substring("abc") in children) == True


-def test_regex_values_always_string():
-    rules = [
-        capa.rules.Rule.from_yaml(
-            textwrap.dedent(
-                """
-                rule:
-                    meta:
-                        name: test rule
-                    features:
-                        - or:
-                            - string: /123/
-                            - string: /0x123/
-                """
-            )
-        ),
-    ]
-    features, matches = capa.engine.match(
-        capa.rules.topologically_order_rules(rules),
-        {capa.features.common.String("123"): {1}},
-        0x0,
-    )
-    assert capa.features.common.MatchedRule("test rule") in features
-
-    features, matches = capa.engine.match(
-        capa.rules.topologically_order_rules(rules),
-        {capa.features.common.String("0x123"): {1}},
-        0x0,
-    )
-    assert capa.features.common.MatchedRule("test rule") in features
-
-
 def test_filter_rules():
    rules = capa.rules.RuleSet(
        [