import source files, forgetting about 938 prior commits

This commit is contained in:
William Ballenthin
2020-06-18 09:13:01 -06:00
parent f2d795090c
commit add3537447
65 changed files with 10322 additions and 0 deletions

.gitignore vendored Normal file

@@ -0,0 +1,111 @@
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class
# C extensions
*.so
# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST
# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec
# Installer logs
pip-log.txt
pip-delete-this-directory.txt
# Unit test / coverage reports
htmlcov/
.tox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
.hypothesis/
.pytest_cache/
# Translations
*.mo
*.pot
# Django stuff:
*.log
local_settings.py
db.sqlite3
# Flask stuff:
instance/
.webassets-cache
# Scrapy stuff:
.scrapy
# Sphinx documentation
docs/_build/
# PyBuilder
target/
# Jupyter Notebook
.ipynb_checkpoints
# pyenv
.python-version
# celery beat schedule file
celerybeat-schedule
# SageMath parsed files
*.sage.py
# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/
# Spyder project settings
.spyderproject
.spyproject
# Rope project settings
.ropeproject
# mkdocs documentation
/site
# mypy
.mypy_cache/
.idea/*
*.prof
*.viv
*.idb
*.i64
!rules/lib

README.md Normal file

@@ -0,0 +1,456 @@
# capa
[![Build Status](https://drone.oneteamed.net/api/badges/FLARE/capa/status.svg)](https://drone.oneteamed.net/FLARE/capa)
capa detects capabilities in executable files.
You run it against a .exe or .dll and it tells you what it thinks the program can do.
For example, it might suggest that the file is a backdoor, is capable of installing services, or relies on HTTP to communicate.
```
λ capa.exe suspicious.exe -q
objectives:
  communication
  data manipulation
  machine access control
behaviors:
  communication-via-http
  encrypt data
  load code functionality
techniques:
  send-http-request
  encrypt data using rc4
  load pe
```
# download
Download capa from the [Releases](/releases) page or get the nightly builds here:
- Windows 64bit: TODO
- Windows 32bit: TODO
- Linux: TODO
- OSX: TODO
# contents
- [installation](#installation)
- [example](#example)
- [rule format](#rule-format)
- [meta block](#meta-block)
- [features block](#features-block)
- [extracted features](#extracted-features)
- [function features](#function-features)
- [api](#api)
- [number](#number)
- [string](#string)
- [bytes](#bytes)
- [offset](#offset)
- [mnemonic](#mnemonic)
- [characteristics](#characteristics)
- [file features](#file-features)
- [string](#file-string)
- [export](#export)
- [import](#import)
- [section](#section)
- [counting](#counting)
- [matching prior rule matches](#matching-prior-rule-matches)
  - [limitations](#limitations)
# installation
See [doc/installation.md](doc/installation.md) for information on how to set up the project, including how to use it as a Python library.
For more information about how to use capa, including running it as an IDA script/plugin, see [doc/usage.md](doc/usage.md).
# example
Here we run capa against an unknown binary (`level32.exe`),
and the tool reports that the program can decode data via XOR,
references data in its resource section, writes to a file, and spawns a new process.
Taken together, this makes us think that `level32.exe` could be a dropper.
Therefore, our next analysis step might be to run `level32.exe` in a sandbox and try to recover the payload.
```
λ capa.exe level32.exe -q
disposition: malicious
category: dropper
objectives:
  data manipulation
  machine access control
behaviors:
  encrypt data
  load code functionality
techniques:
  encrypt data using rc4
  load pe
anomalies:
  embedded PE file
```
By passing the `-vv` flag (for Very Verbose), capa reports exactly where it found evidence of these capabilities.
This is useful for at least two reasons:
- it helps explain why we should trust the results, and enables us to verify the conclusions
- it shows which locations within the binary an experienced analyst might study further with IDA Pro
```
λ capa.exe level32.exe -q -vv
rule load PE file:
  - function 0x401c58:
    or:
      and:
        mnemonic(cmp):
          - virtual address: 0x401c58
          - virtual address: 0x401c68
          - virtual address: 0x401c74
          - virtual address: 0x401c7f
          - virtual address: 0x401c8a
        or:
          number(0x4550):
            - virtual address: 0x401c68
        or:
          number(0x5a4d):
            - virtual address: 0x401c58
...
```
# rule format
capa uses a collection of rules to identify capabilities within a program.
These rules are easy to write, even for those new to reverse engineering.
By authoring rules, you can extend the capabilities that capa recognizes.
In some regards, capa rules are a mixture of the OpenIOC, Yara, and YAML formats.
Here's an example rule used by capa:
```
───────┬────────────────────────────────────────────────────────
│ File: rules/calculate-crc32.yml
───────┼────────────────────────────────────────────────────────
 1 │ rule:
 2 │   meta:
 3 │     name: calculate CRC32
 4 │     rule-category: data-manipulation/hash-data/hash-data-using-crc32
 5 │     author: moritz.raabe@fireeye.com
 6 │     scope: function
 7 │     examples:
 8 │       - 2D3EDC218A90F03089CC01715A9F047F:0x403CBD
 9 │   features:
10 │     - and:
11 │       - mnemonic: shr
12 │       - number: 0xEDB88320
13 │       - number: 8
14 │       - characteristic(nzxor): True
───────┴────────────────────────────────────────────────────────
```
Rules are YAML files that follow a certain schema.
The top level element is a dictionary named `rule` with two required children dictionaries:
`meta` and `features`.
## meta block
The meta block contains metadata that identifies the rule, categorizes it into behaviors,
and provides references to additional documentation.
Here are the common fields:
- `name` is required. This string should uniquely identify the rule.
- `rule-category` is required when a rule describes a behavior (as opposed to matching a role or disposition).
The rule category specifies an objective, behavior, and technique matched by this rule,
using a format like `$objective/$behavior/$technique`.
An objective is a high level goal of a program, such as "communication".
A behavior is something that a program may do, such as "communication via socket".
A technique is a way of implementing some behavior, such as "send-data".
- `maec/malware-category` is required when the rule describes a role, such as `dropper` or `backdoor`.
- `maec/analysis-conclusion` is required when the rule describes a disposition, such as `benign` or `malicious`.
- `scope` indicates to which feature set this rule applies.
  It can take the following values:
- **`basic block`:** limits matches to a basic block.
It is used to achieve locality in rules (for example for parameters of a function).
- **`function`:** identify functions.
It doesn't support child functions (see [doc/limitations.md](doc/limitations.md#wrapper-functions-and-matches-in-child-functions)).
It is the default.
- **`file`:** matches file format aspects.
- **`program`:** *matches the matches* of `function` and `file` scopes.
Not yet implemented.
- `author` specifies the name or handle of the rule author.
- `examples` is a list of references to samples that should match the capability.
When the rule scope is `function`, then the reference should be `<sample hash>:<function va>`.
- `reference` lists related information in a book, article, blog post, etc.
Other fields are allowed but not defined in this specification. `description` is probably a good one.
## features block
This section declares logical statements about the features that must exist for the rule to match.
There are five structural expressions that may be nested:
- `and` - all of the children expressions must match
- `or` - match at least one of the children
- `not` - match when the child expression does not
- `N or more` - match at least `N` or more of the children
- `optional` is an alias for `0 or more`, which is useful for documenting related features. See [write-file.yml](/rules/machine-access-control/file-manipulation/write-file.yml) for an example.
For example, consider the following rule:
```
10 │     - and:
11 │       - mnemonic: shr
12 │       - number: 0xEDB88320
13 │       - number: 8
14 │       - characteristic(nzxor): True
```
For this to match, the function must:
- contain an `shr` instruction, and
- reference the immediate constant `0xEDB88320`, which some may recognize as related to the CRC32 checksum, and
- reference the number `8`, and
- have an unusual feature, in this case, contain a non-zeroing XOR instruction
If only one of these features is found in a function, the rule will not match.
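The all-or-nothing evaluation described above can be sketched as plain boolean logic over an extracted feature set. This is a hypothetical minimal model for illustration, not capa's actual engine; the `evaluate_and` helper and tuple-encoded features are assumptions:

```python
def evaluate_and(required_features, extracted_features):
    """Return True only when every required feature is present in the extracted set."""
    return all(f in extracted_features for f in required_features)

# features hypothetically extracted from a CRC32-style routine
extracted = {
    ("mnemonic", "shr"),
    ("number", 0xEDB88320),
    ("number", 8),
    ("characteristic", "nzxor"),
}

required = [
    ("mnemonic", "shr"),
    ("number", 0xEDB88320),
    ("number", 8),
    ("characteristic", "nzxor"),
]

print(evaluate_and(required, extracted))                     # True: all features present
print(evaluate_and(required, extracted - {("number", 8)}))   # False: one feature missing
```

Removing any single feature from the set fails the whole `and`, which is exactly why a lone `shr` instruction is not enough for a match.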
# extracted features
## function features
capa extracts features from the disassembly of a function, such as which API functions are called.
The tool also reasons about the code structure to guess at function-level constructs.
These are the features supported at the function-scope:
- [api](#api)
- [number](#number)
- [string](#string)
- [bytes](#bytes)
- [offset](#offset)
- [mnemonic](#mnemonic)
- [characteristics](#characteristics)
### api
A call to a named function, probably an import,
though possibly a local function (like `malloc`) extracted via FLIRT.
The parameter is a string describing the function name, specified like `module.functionname` or `functionname`.
Example:

    api: kernel32.CreateFileA
    api: CreateFileA
### number
A number used by the logic of the program.
This should not be a stack or structure offset.
For example, a crypto constant.
The parameter is a number; if prefixed with `0x` then in hex format, otherwise, decimal format.
To associate context with a number, e.g. for constant definitions, append an equal sign and the respective name to
the number definition. This helps with documenting rules and provides context in capa's output.
Examples:

    number: 16
    number: 0x10
    number: 0x40 = PAGE_EXECUTE_READWRITE
TODO: signed vs unsigned.
### string
A string referenced by the logic of the program.
This is probably a pointer to an ASCII or Unicode string.
This could also be an obfuscated string, for example a stack string.
The parameter is a string describing the string.
This can be the verbatim value, or a regex matching the string.
Regexes should be surrounded with `/` characters.
By default, capa uses case-sensitive matching and assumes leading and trailing wildcards.
To perform case-insensitive matching append an `i`. To anchor the regex at the start or end of a string, use `^` and/or `$`.
Examples:

    string: This program cannot be run in DOS mode.
    string: Firefox 64.0
    string: /SELECT.*FROM.*WHERE/
    string: /Hardware\\Description\\System\\CentralProcessor/i
Note that regex matching is expensive (`O(features)` rather than `O(1)`) so they should be used sparingly.
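The string matching semantics described above can be sketched with Python's `re` module. This is an assumed model of the behavior (case-sensitive by default, implied leading/trailing wildcards, trailing `i` for case-insensitivity), not capa's implementation; `compile_rule_pattern` is a hypothetical helper:

```python
import re

def compile_rule_pattern(term):
    """Compile a rule string parameter into a regex, per the assumed semantics above."""
    if term.startswith("/") and term.endswith("/i"):
        # /pattern/i : case-insensitive regex
        return re.compile(term[1:-2], re.DOTALL | re.IGNORECASE)
    elif term.startswith("/") and term.endswith("/"):
        # /pattern/ : case-sensitive regex
        return re.compile(term[1:-1], re.DOTALL)
    # verbatim value: escape it so regex metacharacters match literally
    return re.compile(re.escape(term), re.DOTALL)

pat = compile_rule_pattern("/SELECT.*FROM.*WHERE/")
# re.search scans anywhere in the string, giving the implied wildcards
print(bool(pat.search("a SELECT x FROM y WHERE z")))   # True
print(bool(pat.search("select x from y where z")))     # False (case-sensitive)
print(bool(compile_rule_pattern("/vbox/i").search("VBoxService")))  # True
```

Using `re.search` rather than `re.match` is what gives the implied wildcards, so rule authors do not need to write `/.*foo.*/`.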
### bytes
A sequence of bytes referenced by the logic of the program.
The provided sequence must match from the beginning of the referenced bytes and be no more than `0x100` bytes.
The parameter is a sequence of hexadecimal bytes followed by an optional description.
The example below illustrates byte matching given a COM CLSID pushed onto the stack prior to `CoCreateInstance`.
Disassembly:

    push    offset iid_004118d4_IShellLinkA  ; riid
    push    1                                ; dwClsContext
    push    0                                ; pUnkOuter
    push    offset clsid_004118c4_ShellLink  ; rclsid
    call    ds:CoCreateInstance
Example rule elements:

    bytes: 01 14 02 00 00 00 00 00 C0 00 00 00 00 00 00 46 = CLSID_ShellLink
    bytes: EE 14 02 00 00 00 00 00 C0 00 00 00 00 00 00 46 = IID_IShellLink
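The prefix-matching behavior described above ("must match from the beginning of the referenced bytes") can be sketched in a few lines. This is an illustrative model, not capa's implementation; `bytes_match` is a hypothetical helper, and the 0x100 cap mirrors the stated limit:

```python
MAX_BYTES_FEATURE_SIZE = 0x100  # stated upper bound on a bytes feature

def bytes_match(rule_bytes, referenced_bytes):
    """True when the referenced bytes begin with the rule's byte sequence."""
    if len(rule_bytes) > MAX_BYTES_FEATURE_SIZE:
        raise ValueError("bytes feature too long")
    return referenced_bytes.startswith(rule_bytes)

# the CLSID_ShellLink sequence from the example above
CLSID_SHELLLINK = bytes.fromhex("0114020000000000C000000000000046")

print(bytes_match(CLSID_SHELLLINK[:8], CLSID_SHELLLINK))  # True: rule bytes are a prefix
print(bytes_match(b"\xEE\x14", CLSID_SHELLLINK))          # False: wrong leading bytes
```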
### offset
A structure offset referenced by the logic of the program.
This should not be a stack offset.
The parameter is a number; if prefixed with `0x` then in hex format, otherwise, decimal format.
Examples:

    offset: 0xC
    offset: 0x14
### mnemonic
An instruction mnemonic found in the given function.
The parameter is a string containing the mnemonic.
Examples:

    mnemonic: xor
    mnemonic: shl
### characteristics
Characteristics are features that are extracted by the analysis engine.
They are one-off features that seem interesting to the authors.
For example, the `characteristic(nzxor)` feature describes non-zeroing XOR instructions.
capa does not support instruction pattern matching,
so a select set of interesting instructions is pulled out as characteristics.
| characteristic | scope | description |
|--------------------------------------------|-----------------------|-------------|
| `characteristic(embedded pe): true` | file | (XOR encoded) embedded PE files. |
| `characteristic(switch): true` | function | Function contains a switch or jump table. |
| `characteristic(loop): true` | function | Function contains a loop. |
| `characteristic(recursive call): true` | function | Function is recursive. |
| `characteristic(calls from): true` | function | There are unique calls from this function. Best used like: `count(characteristic(calls from)): 3 or more` |
| `characteristic(calls to): true` | function | There are unique calls to this function. Best used like: `count(characteristic(calls to)): 3 or more` |
| `characteristic(nzxor): true` | basic block, function | Non-zeroing XOR instruction |
| `characteristic(peb access): true` | basic block, function | Access to the process environment block (PEB), e.g. via fs:[30h], gs:[60h], or `NtCurrentPeb` |
| `characteristic(fs access): true` | basic block, function | Access to memory via the `fs` segment. |
| `characteristic(gs access): true` | basic block, function | Access to memory via the `gs` segment. |
| `characteristic(cross section flow): true` | basic block, function | Function contains a call/jump to a different section. This is commonly seen in unpacking stubs. |
| `characteristic(tight loop): true` | basic block | A tight loop where a basic block branches to itself. |
| `characteristic(indirect call): true` | basic block, function | Indirect call instruction; for example, `call edx` or `call qword ptr [rsp+78h]`. |
## file features
capa extracts features from the file data.
File features stem from the file structure, e.g. the PE structure, or from the raw file data.
These are the features supported at the file-scope:
- [string](#file-string)
- [export](#export)
- [import](#import)
- [section](#section)
### file string
An ASCII or UTF-16 LE string present in the file.
The parameter is a string describing the string.
This can be the verbatim value, or a regex matching the string.
Regexes should be surrounded with `/` characters. By default, capa uses case-sensitive matching.
To perform case-insensitive matching append an `i`.
Examples:

    string: Z:\Dev\dropper\dropper.pdb
    string: [ENTER]
    string: /.*VBox.*/
    string: /.*Software\Microsoft\Windows\CurrentVersion\Run.*/i
Note that regex matching is expensive (`O(features)` rather than `O(1)`) so they should be used sparingly.
### export
The name of a routine exported from a shared library.
Examples:

    export: InstallA
### import
The name of a routine imported from a shared library.
Examples:

    import: kernel32.WinExec
    import: WinExec          # wildcard module name
    import: kernel32.#22     # by ordinal
### section
The name of a section in a structured file.
Examples:

    section: .rsrc
## counting
Many rules will inspect the feature set for a select combination of features;
however, some rules may consider the number of times a feature was seen in a feature set.
These rules can be expressed like:

    count(characteristic(nzxor)): 2          # exactly match count==2
    count(characteristic(nzxor)): 2 or more  # at least two matches
    count(characteristic(nzxor)): 2 or fewer # at most two matches
    count(characteristic(nzxor)): (2, 10)    # match any value in the range 2<=count<=10
    count(mnemonic(mov)): 3
    count(basic block): 4
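Count-based matching can be sketched over a mapping from feature to the set of addresses where it was seen. This is an assumed minimal model of the semantics above; `count_in_range` and the string-keyed feature map are illustrative, not capa's API:

```python
def count_in_range(locations, minimum, maximum=None):
    """True when the number of occurrences falls within [minimum, maximum].
    maximum=None models the unbounded 'N or more' form."""
    n = len(locations)
    if maximum is None:
        return n >= minimum
    return minimum <= n <= maximum

# hypothetical feature -> set-of-virtual-addresses mapping for one function
features = {
    "characteristic(nzxor)": {0x401000, 0x401200, 0x401500},
}

nzxor_locs = features.get("characteristic(nzxor)", set())
print(count_in_range(nzxor_locs, 2, 2))    # exactly 2      -> False (count is 3)
print(count_in_range(nzxor_locs, 2))       # 2 or more      -> True
print(count_in_range(nzxor_locs, 0, 2))    # 2 or fewer     -> False
print(count_in_range(nzxor_locs, 2, 10))   # range (2, 10)  -> True
```

Tracking locations rather than a bare counter also preserves the addresses needed for verbose output.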
## matching prior rule matches
capa rules can specify logic for matching on other rule matches.
This allows a rule author to refactor common capability patterns into their own reusable components.
You can specify a rule match expression like so:

    - and:
      - match: file creation
      - match: process creation
Rules are uniquely identified by their `rule.meta.name` property;
this is the value that should appear on the right hand side of the `match` expression.
capa will refuse to run if a rule dependency is not present during matching.
Common rule patterns, such as the various ways to implement "writes to a file", can be refactored into "library rules".
These are rules with `rule.meta.lib: True`.
By default, library rules will not be output to the user as a rule match,
but can be matched by other rules.
When no active rules depend on a library rule, it will not be evaluated, maintaining performance.
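The mechanism above can be sketched as follows: when a rule matches, its name is fed back into the feature set so later rules in the topological order can depend on it. This is an illustrative model with hypothetical rule names and tuple-encoded features, not capa's engine:

```python
def match_rules(ordered_rules, extracted):
    """ordered_rules: list of (name, required_features), topologically ordered
    so that dependencies appear before dependents."""
    features = set(extracted)
    matches = []
    for name, required in ordered_rules:
        if all(f in features for f in required):
            matches.append(name)
            features.add(("match", name))  # expose this match as a feature
    return matches

rules = [
    # two hypothetical library-style rules
    ("file creation", [("api", "kernel32.CreateFileA")]),
    ("process creation", [("api", "kernel32.CreateProcessA")]),
    # a rule that matches on the prior rule matches
    ("dropper-like behavior", [("match", "file creation"),
                               ("match", "process creation")]),
]

extracted = {("api", "kernel32.CreateFileA"), ("api", "kernel32.CreateProcessA")}
print(match_rules(rules, extracted))
# ['file creation', 'process creation', 'dropper-like behavior']
```

The ordering matters: if the dependent rule ran first, its `match` features would not yet exist, which is why dependencies must be evaluated before dependents.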
# limitations
To learn more about capa's current limitations see [here](doc/limitations.md).

capa/__init__.py Normal file

capa/engine.py Normal file

@@ -0,0 +1,286 @@
import re
import sys
import copy
import collections
import capa.features
class Statement(object):
'''
superclass for structural nodes, such as and/or/not.
this exists to provide a default impl for `__str__` and `__repr__`,
and to declare the interface method `evaluate`
'''
def __init__(self):
super(Statement, self).__init__()
self.name = self.__class__.__name__
def __str__(self):
return '%s(%s)' % (self.name.lower(), ','.join(map(str, self.get_children())))
def __repr__(self):
return str(self)
def evaluate(self, ctx):
'''
classes that inherit `Statement` must implement `evaluate`
args:
ctx (defaultdict[Feature, set[VA]])
returns:
Result
'''
raise NotImplementedError()
def get_children(self):
if hasattr(self, 'child'):
yield self.child
if hasattr(self, 'children'):
for child in self.children:
yield child
def replace_child(self, existing, new):
if hasattr(self, 'child'):
if self.child is existing:
self.child = new
if hasattr(self, 'children'):
for i, child in enumerate(self.children):
if child is existing:
self.children[i] = new
class Result(object):
'''
represents the results of an evaluation of statements against features.
instances of this class should behave like a bool,
e.g. `assert Result(True, ...) == True`
instances track additional metadata about evaluation results.
they contain references to the statement node (e.g. an And statement),
as well as the children Result instances.
we need this so that we can render the tree of expressions and their results.
'''
def __init__(self, success, statement, children, locations=None):
'''
args:
success (bool)
statement (capa.engine.Statement or capa.features.Feature)
children (list[Result])
locations (iterable[VA])
'''
super(Result, self).__init__()
self.success = success
self.statement = statement
self.children = children
self.locations = locations if locations is not None else ()
def __eq__(self, other):
if isinstance(other, bool):
return self.success == other
return False
def __bool__(self):
return self.success
def __nonzero__(self):
return self.success
class And(Statement):
'''match if all of the children evaluate to True.'''
def __init__(self, *children):
super(And, self).__init__()
self.children = list(children)
def evaluate(self, ctx):
results = [child.evaluate(ctx) for child in self.children]
success = all(results)
return Result(success, self, results)
class Or(Statement):
'''match if any of the children evaluate to True.'''
def __init__(self, *children):
super(Or, self).__init__()
self.children = list(children)
def evaluate(self, ctx):
results = [child.evaluate(ctx) for child in self.children]
success = any(results)
return Result(success, self, results)
class Not(Statement):
'''match only if the child evaluates to False.'''
def __init__(self, child):
super(Not, self).__init__()
self.child = child
def evaluate(self, ctx):
results = [self.child.evaluate(ctx)]
success = not results[0]
return Result(success, self, results)
class Some(Statement):
'''match if at least N of the children evaluate to True.'''
def __init__(self, count, *children):
super(Some, self).__init__()
self.count = count
self.children = list(children)
def evaluate(self, ctx):
results = [child.evaluate(ctx) for child in self.children]
# note that here we cast the child result as a bool
# because we've overridden `__bool__` above.
#
# we can't use `if child is True` because the instance is not True.
success = sum([1 for child in results if bool(child) is True]) >= self.count
return Result(success, self, results)
class Element(Statement):
'''match if the child is contained in the ctx set.'''
def __init__(self, child):
super(Element, self).__init__()
self.child = child
def __hash__(self):
return hash((self.name, self.child))
def __eq__(self, other):
return self.name == other.name and self.child == other.child
def evaluate(self, ctx):
return Result(self.child in ctx, self, [])
class Range(Statement):
'''match if the child is contained in the ctx set with a count in the given range.'''
def __init__(self, child, min=None, max=None):
super(Range, self).__init__()
self.child = child
self.min = min if min is not None else 0
self.max = max if max is not None else ((1 << 64) - 1)
def evaluate(self, ctx):
if self.child not in ctx:
return Result(False, self, [self.child])
count = len(ctx[self.child])
return Result(self.min <= count <= self.max, self, [], locations=ctx[self.child])
def __str__(self):
if self.max == ((1 << 64) - 1):
return 'range(%s, min=%d, max=infinity)' % (str(self.child), self.min)
else:
return 'range(%s, min=%d, max=%d)' % (str(self.child), self.min, self.max)
class Regex(Statement):
'''match if the given pattern matches a String feature.'''
def __init__(self, pattern):
super(Regex, self).__init__()
self.pattern = pattern
pat = self.pattern[len('/'):-len('/')]
flags = re.DOTALL
if pattern.endswith('/i'):
pat = self.pattern[len('/'):-len('/i')]
flags |= re.IGNORECASE
self.re = re.compile(pat, flags)
self.match = ''
def evaluate(self, ctx):
for feature, locations in ctx.items():
if not isinstance(feature, (capa.features.String, )):
continue
# `re.search` finds a match anywhere in the given string,
# which implies leading and/or trailing wildcards.
# this mode is more convenient for rule authors,
# so that they don't have to prefix/suffix their terms like: /.*foo.*/.
if self.re.search(feature.value):
self.match = feature.value
return Result(True, self, [], locations=locations)
return Result(False, self, [])
def __str__(self):
return 'regex(string =~ %s, matched = "%s")' % (self.pattern, self.match)
class Subscope(Statement):
'''
a subscope element is a placeholder in a rule - it should not be evaluated directly.
the engine should preprocess rules to extract subscope statements into their own rules.
'''
def __init__(self, scope, child):
super(Subscope, self).__init__()
self.scope = scope
self.child = child
def evaluate(self, ctx):
raise ValueError('cannot evaluate a subscope directly!')
def topologically_order_rules(rules):
'''
order the given rules such that dependencies show up before dependents.
this means that as we match rules, we can add features, and these
will be matched by subsequent rules if they follow this order.
assumes that the rule dependency graph is a DAG.
'''
rules = {rule.name: rule for rule in rules}
seen = set([])
ret = []
def rec(rule):
if rule.name in seen:
return
for dep in rule.get_dependencies():
rec(rules[dep])
ret.append(rule)
seen.add(rule.name)
for rule in rules.values():
rec(rule)
return ret
def match(rules, features, va):
'''
Args:
rules (List[capa.rules.Rule]): these must already be ordered topologically by dependency.
features (Mapping[capa.features.Feature, int]):
va (int): location of the features
Returns:
Tuple[List[capa.features.Feature], Dict[str, Tuple[int, capa.engine.Result]]]: two-tuple with entries:
- list of features used for matching (which may be greater than argument, due to rule match features), and
- mapping from rule name to (location of match, result object)
'''
results = collections.defaultdict(list)
# copy features so that we can modify it
# without affecting the caller (keep this function pure)
#
# note: copy doesn't notice this is a defaultdict, so we'll recreate that manually.
features = collections.defaultdict(set, copy.copy(features))
for rule in rules:
res = rule.evaluate(features)
if res:
results[rule.name].append((va, res))
features[capa.features.MatchedRule(rule.name)].add(va)
return (features, results)

capa/features/__init__.py Normal file

@@ -0,0 +1,113 @@
import codecs
import logging
import capa.engine
logger = logging.getLogger(__name__)
MAX_BYTES_FEATURE_SIZE = 0x100
class Feature(object):
def __init__(self, args):
super(Feature, self).__init__()
self.name = self.__class__.__name__
self.args = args
def __hash__(self):
return hash((self.name, tuple(self.args)))
def __eq__(self, other):
return self.name == other.name and self.args == other.args
def __str__(self):
return '%s(%s)' % (self.name.lower(), ','.join(map(str, self.args)))
def __repr__(self):
return str(self)
def evaluate(self, ctx):
return capa.engine.Result(self in ctx, self, [], locations=ctx.get(self, []))
def serialize(self):
return self.__dict__
def freeze_serialize(self):
return (self.__class__.__name__,
self.args)
@classmethod
def freeze_deserialize(cls, args):
return cls(*args)
class MatchedRule(Feature):
def __init__(self, rule_name):
super(MatchedRule, self).__init__([rule_name])
self.rule_name = rule_name
def __str__(self):
return 'match(%s)' % (self.rule_name)
class Characteristic(Feature):
def __init__(self, name, value=None):
'''
when `value` is not provided, this serves as descriptor for a class of characteristics.
this is only used internally, such as in `rules.py` when checking if a statement is
supported by a given scope.
'''
super(Characteristic, self).__init__([name, value])
self.name = name
self.value = value
def evaluate(self, ctx):
if self.value is None:
raise ValueError('cannot evaluate characteristic %s with empty value' % (str(self)))
return super(Characteristic, self).evaluate(ctx)
def __str__(self):
if self.value is None:
return 'characteristic(%s)' % (self.name)
else:
return 'characteristic(%s(%s))' % (self.name, self.value)
class String(Feature):
def __init__(self, value):
super(String, self).__init__([value])
self.value = value
def __str__(self):
return 'string("%s")' % (self.value)
class Bytes(Feature):
def __init__(self, value, symbol=None):
super(Bytes, self).__init__([value])
self.value = value
self.symbol = symbol
def evaluate(self, ctx):
for feature, locations in ctx.items():
if not isinstance(feature, (capa.features.Bytes, )):
continue
if feature.value.startswith(self.value):
return capa.engine.Result(True, self, [], locations=locations)
return capa.engine.Result(False, self, [])
def __str__(self):
if self.symbol:
return 'bytes(0x%s = %s)' % (codecs.encode(self.value, 'hex').upper(), self.symbol)
else:
return 'bytes(0x%s)' % (codecs.encode(self.value, 'hex').upper())
def freeze_serialize(self):
return (self.__class__.__name__,
[codecs.encode(x, 'hex') for x in self.args])
@classmethod
def freeze_deserialize(cls, args):
return cls(*map(lambda x: codecs.decode(x, 'hex'), args))


@@ -0,0 +1,9 @@
from capa.features import Feature
class BasicBlock(Feature):
def __init__(self):
super(BasicBlock, self).__init__([])
def __str__(self):
return 'basic block'


@@ -0,0 +1,274 @@
import abc
try:
import ida
except (ImportError, SyntaxError):
pass
try:
import viv
except (ImportError, SyntaxError):
pass
__all__ = ["ida", "viv"]
class FeatureExtractor(object):
'''
FeatureExtractor defines the interface for fetching features from a sample.
There may be multiple backends that support fetching features for capa.
For example, we use vivisect by default, but also want to support saving
and restoring features from a JSON file.
When we restore the features, we'd like to use exactly the same matching logic
to find matching rules.
Therefore, we can define a FeatureExtractor that provides features from the
serialized JSON file and do matching without a binary analysis pass.
Also, this provides a way to hook in an IDA backend.
This class is not instantiated directly; it is the base class for other implementations.
'''
__metaclass__ = abc.ABCMeta
def __init__(self):
#
# note: a subclass should define ctor parameters for its own use.
# for example, the Vivisect feature extract might require the vw and/or path.
# this base class doesn't know what to do with that info, though.
#
super(FeatureExtractor, self).__init__()
@abc.abstractmethod
def extract_file_features(self):
'''
extract file-scope features.
example::
extractor = VivisectFeatureExtractor(vw, path)
for feature, va in extractor.extract_file_features():
print('0x%x: %s', va, feature)
yields:
Tuple[capa.features.Feature, int]: feature and its location
'''
raise NotImplementedError()
@abc.abstractmethod
def get_functions(self):
'''
enumerate the functions and provide opaque values that will
subsequently be provided to `.extract_function_features()`, etc.
by "opaque value", we mean that this can be any object, as long as it
provides enough context to `.extract_function_features()`.
the opaque value should support casting to int (`__int__`) for the function start address.
yields:
any: the opaque function value.
'''
raise NotImplementedError()
@abc.abstractmethod
def extract_function_features(self, f):
'''
extract function-scope features.
the arguments are opaque values previously provided by `.get_functions()`, etc.
example::
extractor = VivisectFeatureExtractor(vw, path)
for function in extractor.get_functions():
for feature, va in extractor.extract_function_features(function):
print('0x%x: %s', va, feature)
args:
f [any]: an opaque value previously fetched from `.get_functions()`.
yields:
Tuple[capa.features.Feature, int]: feature and its location
'''
raise NotImplementedError()
@abc.abstractmethod
def get_basic_blocks(self, f):
'''
enumerate the basic blocks in the given function and provide opaque values that will
subsequently be provided to `.extract_basic_block_features()`, etc.
by "opaque value", we mean that this can be any object, as long as it
provides enough context to `.extract_basic_block_features()`.
the opaque value should support casting to int (`__int__`) for the basic block start address.
yields:
any: the opaque basic block value.
'''
raise NotImplementedError()
@abc.abstractmethod
def extract_basic_block_features(self, f, bb):
'''
extract basic block-scope features.
the arguments are opaque values previously provided by `.get_functions()`, etc.
example::
extractor = VivisectFeatureExtractor(vw, path)
for function in extractor.get_functions():
for bb in extractor.get_basic_blocks(function):
for feature, va in extractor.extract_basic_block_features(function, bb):
print('0x%x: %s', va, feature)
args:
f [any]: an opaque value previously fetched from `.get_functions()`.
bb [any]: an opaque value previously fetched from `.get_basic_blocks()`.
yields:
Tuple[capa.features.Feature, int]: feature and its location
'''
raise NotImplementedError()
@abc.abstractmethod
def get_instructions(self, f, bb):
'''
enumerate the instructions in the given basic block and provide opaque values that will
subsequently be provided to `.extract_insn_features()`, etc.
by "opaque value", we mean that this can be any object, as long as it
provides enough context to `.extract_insn_features()`.
the opaque value should support casting to int (`__int__`) for the instruction address.
yields:
any: the opaque instruction value.
'''
raise NotImplementedError()
@abc.abstractmethod
def extract_insn_features(self, f, bb, insn):
'''
extract instruction-scope features.
the arguments are opaque values previously provided by `.get_functions()`, etc.
example::
extractor = VivisectFeatureExtractor(vw, path)
for function in extractor.get_functions():
for bb in extractor.get_basic_blocks(function):
for insn in extractor.get_instructions(function, bb):
for feature, va in extractor.extract_insn_features(function, bb, insn):
print('0x%x: %s', va, feature)
args:
f [any]: an opaque value previously fetched from `.get_functions()`.
bb [any]: an opaque value previously fetched from `.get_basic_blocks()`.
insn [any]: an opaque value previously fetched from `.get_instructions()`.
yields:
Tuple[capa.features.Feature, int]: feature and its location
'''
        raise NotImplementedError()
class NullFeatureExtractor(FeatureExtractor):
'''
    An extractor that yields a fixed set of user-provided features.
The structure of the single parameter is demonstrated in the example below.
This is useful for testing, as we can provide expected values and see if matching works.
Also, this is how we represent features deserialized from a freeze file.
example::
extractor = NullFeatureExtractor({
'file features': [
(0x402345, capa.features.Characteristic('embedded pe', True)),
],
'functions': {
0x401000: {
'features': [
(0x401000, capa.features.Characteristic('switch', True)),
],
'basic blocks': {
0x401000: {
'features': [
(0x401000, capa.features.Characteristic('tight-loop', True)),
],
'instructions': {
0x401000: {
'features': [
(0x401000, capa.features.Characteristic('nzxor', True)),
],
},
0x401002: ...
}
},
0x401005: ...
}
},
            0x402000: ...
}
)
'''
def __init__(self, features):
super(NullFeatureExtractor, self).__init__()
self.features = features
def extract_file_features(self):
for p in self.features.get('file features', []):
va, feature = p
yield feature, va
def get_functions(self):
for va in sorted(self.features['functions'].keys()):
yield va
def extract_function_features(self, f):
for p in (self.features # noqa: E127 line over-indented
.get('functions', {})
.get(f, {})
.get('features', [])):
va, feature = p
yield feature, va
def get_basic_blocks(self, f):
for va in sorted(self.features # noqa: E127 line over-indented
.get('functions', {})
.get(f, {})
.get('basic blocks', {})
.keys()):
yield va
def extract_basic_block_features(self, f, bb):
for p in (self.features # noqa: E127 line over-indented
.get('functions', {})
.get(f, {})
.get('basic blocks', {})
.get(bb, {})
.get('features', [])):
va, feature = p
yield feature, va
def get_instructions(self, f, bb):
for va in sorted(self.features # noqa: E127 line over-indented
.get('functions', {})
.get(f, {})
.get('basic blocks', {})
.get(bb, {})
.get('instructions', {})
.keys()):
yield va
def extract_insn_features(self, f, bb, insn):
for p in (self.features # noqa: E127 line over-indented
.get('functions', {})
.get(f, {})
.get('basic blocks', {})
.get(bb, {})
.get('instructions', {})
.get(insn, {})
.get('features', [])):
va, feature = p
yield feature, va
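The nested-dictionary traversal that `NullFeatureExtractor` performs can be sketched standalone. This is a minimal illustration, not the real class: plain strings stand in for `capa` Feature objects, and `iter_function_features` is a hypothetical helper mirroring `extract_function_features`.

```python
# standalone sketch: plain strings stand in for capa Feature objects
features = {
    'file features': [(0x402345, 'embedded pe')],
    'functions': {
        0x401000: {
            'features': [(0x401000, 'switch')],
            'basic blocks': {
                0x401000: {
                    'features': [(0x401000, 'tight-loop')],
                    'instructions': {
                        0x401000: {'features': [(0x401000, 'nzxor')]},
                    },
                },
            },
        },
    },
}

def iter_function_features(features, f):
    # mirror NullFeatureExtractor.extract_function_features:
    # stored as (va, feature), yielded as (feature, va)
    for va, feature in features.get('functions', {}).get(f, {}).get('features', []):
        yield feature, va

print(list(iter_function_features(features, 0x401000)))
```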


@@ -0,0 +1,61 @@
import sys
import builtins
from capa.features.insn import API
MIN_STACKSTRING_LEN = 8
def xor_static(data, i):
if sys.version_info >= (3, 0):
return bytes(c ^ i for c in data)
else:
return ''.join(chr(ord(c) ^ i) for c in data)
def is_aw_function(function_name):
'''
is the given function name an A/W function?
these are variants of functions that, on Windows, accept either a narrow or wide string.
'''
if len(function_name) < 2:
return False
# last character should be 'A' or 'W'
if function_name[-1] not in ('A', 'W'):
return False
    # second-to-last character should be a lowercase letter or digit
return 'a' <= function_name[-2] <= 'z' or '0' <= function_name[-2] <= '9'
def generate_api_features(apiname, va):
'''
for a given function name and address, generate API names.
we over-generate features to make matching easier.
these include:
- kernel32.CreateFileA
- kernel32.CreateFile
- CreateFileA
- CreateFile
'''
# (kernel32.CreateFileA, 0x401000)
yield API(apiname), va
if is_aw_function(apiname):
# (kernel32.CreateFile, 0x401000)
yield API(apiname[:-1]), va
if '.' in apiname:
modname, impname = apiname.split('.')
# strip modname to support importname-only matching
# (CreateFileA, 0x401000)
yield API(impname), va
if is_aw_function(impname):
# (CreateFile, 0x401000)
yield API(impname[:-1]), va
def all_zeros(bytez):
return all(b == 0 for b in builtins.bytes(bytez))
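The over-generation in `generate_api_features` can be sketched without the `capa` feature classes. `aw_variants` is a hypothetical standalone helper that returns bare strings instead of `API` objects, and it simplifies `is_aw_function` to a trailing-A/W check.

```python
# standalone sketch of the name over-generation (bare strings, not API objects)
def aw_variants(apiname):
    names = [apiname]
    if len(apiname) >= 2 and apiname[-1] in ('A', 'W'):
        names.append(apiname[:-1])          # kernel32.CreateFile
    if '.' in apiname:
        modname, impname = apiname.split('.')
        names.append(impname)               # CreateFileA
        if len(impname) >= 2 and impname[-1] in ('A', 'W'):
            names.append(impname[:-1])      # CreateFile
    return names

# note: the real is_aw_function also requires a lowercase letter or digit
# before the trailing A/W; that check is omitted here for brevity
print(aw_variants('kernel32.CreateFileA'))
# ['kernel32.CreateFileA', 'kernel32.CreateFile', 'CreateFileA', 'CreateFile']
```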


@@ -0,0 +1,73 @@
import sys
import types
import idaapi
from capa.features.extractors import FeatureExtractor
import capa.features.extractors.ida.file
import capa.features.extractors.ida.insn
import capa.features.extractors.ida.helpers
import capa.features.extractors.ida.function
import capa.features.extractors.ida.basicblock
def get_va(self):
if isinstance(self, idaapi.BasicBlock):
return self.start_ea
if isinstance(self, idaapi.func_t):
return self.start_ea
if isinstance(self, idaapi.insn_t):
return self.ea
raise TypeError
def add_va_int_cast(o):
'''
dynamically add a cast-to-int (`__int__`) method to the given object
that returns the value of the `.va` property.
    this bit of skullduggery lets us cast IDA objects to ints.
    the correct way of doing this is to subclass the objects here.
'''
if sys.version_info >= (3, 0):
setattr(o, '__int__', types.MethodType(get_va, o))
else:
setattr(o, '__int__', types.MethodType(get_va, o, type(o)))
return o
class IdaFeatureExtractor(FeatureExtractor):
def __init__(self):
super(IdaFeatureExtractor, self).__init__()
def extract_file_features(self):
for feature, va in capa.features.extractors.ida.file.extract_features():
yield feature, va
def get_functions(self):
for f in capa.features.extractors.ida.helpers.get_functions(ignore_thunks=True, ignore_libs=True):
yield add_va_int_cast(f)
def extract_function_features(self, f):
for feature, va in capa.features.extractors.ida.function.extract_features(f):
yield feature, va
def get_basic_blocks(self, f):
for bb in idaapi.FlowChart(f, flags=idaapi.FC_PREDS):
yield add_va_int_cast(bb)
def extract_basic_block_features(self, f, bb):
for feature, va in capa.features.extractors.ida.basicblock.extract_features(f, bb):
yield feature, va
def get_instructions(self, f, bb):
for insn in capa.features.extractors.ida.helpers.get_instructions_in_range(bb.start_ea, bb.end_ea):
yield add_va_int_cast(insn)
def extract_insn_features(self, f, bb, insn):
for feature, va in capa.features.extractors.ida.insn.extract_features(f, bb, insn):
yield feature, va


@@ -0,0 +1,170 @@
import sys
import struct
import string
import pprint
import idautils
import idaapi
import idc
from capa.features.extractors.ida import helpers
from capa.features import Characteristic
from capa.features.basicblock import BasicBlock
from capa.features.extractors.helpers import MIN_STACKSTRING_LEN
def _ida_get_printable_len(op):
''' Return string length if all operand bytes are ascii or utf16-le printable
args:
op (IDA op_t)
'''
op_val = helpers.mask_op_val(op)
if op.dtype == idaapi.dt_byte:
chars = struct.pack('<B', op_val)
elif op.dtype == idaapi.dt_word:
chars = struct.pack('<H', op_val)
elif op.dtype == idaapi.dt_dword:
chars = struct.pack('<I', op_val)
elif op.dtype == idaapi.dt_qword:
chars = struct.pack('<Q', op_val)
else:
raise ValueError('Unhandled operand data type 0x%x.' % op.dtype)
def _is_printable_ascii(chars):
if sys.version_info >= (3, 0):
return all(c < 127 and chr(c) in string.printable for c in chars)
else:
return all(ord(c) < 127 and c in string.printable for c in chars)
def _is_printable_utf16le(chars):
if sys.version_info >= (3, 0):
if all(c == 0x00 for c in chars[1::2]):
return _is_printable_ascii(chars[::2])
else:
if all(c == '\x00' for c in chars[1::2]):
return _is_printable_ascii(chars[::2])
if _is_printable_ascii(chars):
return idaapi.get_dtype_size(op.dtype)
if _is_printable_utf16le(chars):
        return idaapi.get_dtype_size(op.dtype) // 2
return 0
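The printability test above can be exercised without IDA: pack an immediate little-endian and check whether it decodes as printable ASCII or as UTF-16LE (every other byte NUL). `printable_len` is a hypothetical standalone re-implementation for illustration, with the operand size passed explicitly instead of read from `op.dtype`.

```python
import struct
import string

def _printable_ascii(bs):
    # same test as _is_printable_ascii: every byte < 127 and printable
    return all(c < 127 and chr(c) in string.printable for c in bs)

def printable_len(value, size):
    # pack the immediate little-endian, like _ida_get_printable_len does
    chars = struct.pack({1: '<B', 2: '<H', 4: '<I', 8: '<Q'}[size], value)
    if _printable_ascii(chars):
        return size
    # UTF-16LE: odd bytes are NUL, even bytes are printable ASCII
    if all(c == 0 for c in chars[1::2]) and _printable_ascii(chars[::2]):
        return size // 2
    return 0

print(printable_len(0x41424344, 4))  # b'DCBA' -> 4
print(printable_len(0x00410042, 4))  # b'B\x00A\x00' -> 2
```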
def _is_mov_imm_to_stack(insn):
''' verify instruction moves immediate onto stack
args:
insn (IDA insn_t)
'''
if insn.Op2.type != idaapi.o_imm:
return False
if not helpers.is_op_stack_var(insn.ea, 0):
return False
if not insn.get_canon_mnem().startswith('mov'):
return False
return True
def _ida_bb_contains_stackstring(f, bb):
''' check basic block for stackstring indicators
true if basic block contains enough moves of constant bytes to the stack
args:
f (IDA func_t)
bb (IDA BasicBlock)
'''
count = 0
for insn in helpers.get_instructions_in_range(bb.start_ea, bb.end_ea):
if _is_mov_imm_to_stack(insn):
count += _ida_get_printable_len(insn.Op2)
if count > MIN_STACKSTRING_LEN:
return True
return False
def extract_bb_stackstring(f, bb):
''' extract stackstring indicators from basic block
args:
f (IDA func_t)
bb (IDA BasicBlock)
'''
if _ida_bb_contains_stackstring(f, bb):
yield Characteristic('stack string', True), bb.start_ea
def _ida_bb_contains_tight_loop(f, bb):
    ''' check basic block for tight loop indicators
true if last instruction in basic block branches to basic block start
args:
f (IDA func_t)
bb (IDA BasicBlock)
'''
bb_end = idc.prev_head(bb.end_ea)
if bb.start_ea < bb_end:
for ref in idautils.CodeRefsFrom(bb_end, True):
if ref == bb.start_ea:
return True
return False
def extract_bb_tight_loop(f, bb):
''' extract tight loop indicators from a basic block
args:
f (IDA func_t)
bb (IDA BasicBlock)
'''
if _ida_bb_contains_tight_loop(f, bb):
yield Characteristic('tight loop', True), bb.start_ea
def extract_features(f, bb):
''' extract basic block features
args:
f (IDA func_t)
bb (IDA BasicBlock)
'''
yield BasicBlock(), bb.start_ea
for bb_handler in BASIC_BLOCK_HANDLERS:
for feature, va in bb_handler(f, bb):
yield feature, va
BASIC_BLOCK_HANDLERS = (
extract_bb_tight_loop,
extract_bb_stackstring,
)
def main():
features = []
for f in helpers.get_functions(ignore_thunks=True, ignore_libs=True):
for bb in idaapi.FlowChart(f, flags=idaapi.FC_PREDS):
features.extend(list(extract_features(f, bb)))
pprint.pprint(features)
if __name__ == '__main__':
main()


@@ -0,0 +1,155 @@
import struct
import pprint
import idautils
import idaapi
import idc
from capa.features import String
from capa.features import Characteristic
from capa.features.file import Section
from capa.features.file import Export
from capa.features.file import Import
import capa.features.extractors.strings
import capa.features.extractors.helpers
import capa.features.extractors.ida.helpers
def _ida_check_segment_for_pe(seg):
''' check segment for embedded PE
adapted for IDA from:
https://github.com/vivisect/vivisect/blob/7be4037b1cecc4551b397f840405a1fc606f9b53/PE/carve.py#L19
args:
seg (IDA segment_t)
'''
seg_max = seg.end_ea
mz_xor = [(capa.features.extractors.helpers.xor_static(b'MZ', i),
capa.features.extractors.helpers.xor_static(b'PE', i),
i)
for i in range(256)]
todo = [(capa.features.extractors.ida.helpers.find_byte_sequence(seg.start_ea, seg.end_ea, mzx), mzx, pex, i) for mzx, pex, i in mz_xor]
todo = [(off, mzx, pex, i) for (off, mzx, pex, i) in todo if off != idaapi.BADADDR]
while len(todo):
off, mzx, pex, i = todo.pop()
        # the MZ header has one field we will check: e_lfanew, at offset 0x3c
e_lfanew = off + 0x3c
if seg_max < (e_lfanew + 4):
continue
newoff = struct.unpack('<I', capa.features.extractors.helpers.xor_static(idc.get_bytes(e_lfanew, 4), i))[0]
peoff = off + newoff
if seg_max < (peoff + 2):
continue
if idc.get_bytes(peoff, 2) == pex:
yield (off, i)
nextres = capa.features.extractors.ida.helpers.find_byte_sequence(off + 1, seg.end_ea, mzx)
        if nextres != idaapi.BADADDR:
todo.append((nextres, mzx, pex, i))
def extract_file_embedded_pe():
''' extract embedded PE features
IDA must load resource sections for this to be complete
- '-R' from console
- Check 'Load resource sections' when opening binary in IDA manually
'''
for seg in capa.features.extractors.ida.helpers.get_segments():
if seg.is_header_segm():
# IDA may load header segments, skip if present
continue
for ea, _ in _ida_check_segment_for_pe(seg):
yield Characteristic('embedded pe', True), ea
def extract_file_export_names():
''' extract function exports '''
for _, _, ea, name in idautils.Entries():
yield Export(name), ea
def extract_file_import_names():
''' extract function imports
1. imports by ordinal:
- modulename.#ordinal
2. imports by name, results in two features to support importname-only
matching:
- modulename.importname
- importname
'''
for ea, imp_info in capa.features.extractors.ida.helpers.get_file_imports().items():
dllname, name, ordi = imp_info
if name:
yield Import('%s.%s' % (dllname, name)), ea
yield Import(name), ea
if ordi:
yield Import('%s.#%s' % (dllname, str(ordi))), ea
def extract_file_section_names():
''' extract section names
IDA must load resource sections for this to be complete
- '-R' from console
- Check 'Load resource sections' when opening binary in IDA manually
'''
for seg in capa.features.extractors.ida.helpers.get_segments():
if seg.is_header_segm():
# IDA may load header segments, skip if present
continue
yield Section(idaapi.get_segm_name(seg)), seg.start_ea
def extract_file_strings():
''' extract ASCII and UTF-16 LE strings
IDA must load resource sections for this to be complete
- '-R' from console
- Check 'Load resource sections' when opening binary in IDA manually
'''
for seg in capa.features.extractors.ida.helpers.get_segments():
seg_buff = capa.features.extractors.ida.helpers.get_segment_buffer(seg)
for s in capa.features.extractors.strings.extract_ascii_strings(seg_buff):
yield String(s.s), (seg.start_ea + s.offset)
for s in capa.features.extractors.strings.extract_unicode_strings(seg_buff):
yield String(s.s), (seg.start_ea + s.offset)
def extract_features():
''' extract file features '''
for file_handler in FILE_HANDLERS:
for feature, va in file_handler():
yield feature, va
FILE_HANDLERS = (
extract_file_export_names,
extract_file_import_names,
extract_file_strings,
extract_file_section_names,
extract_file_embedded_pe,
)
def main():
pprint.pprint(list(extract_features()))
if __name__ == '__main__':
main()


@@ -0,0 +1,100 @@
import pprint
import idautils
import idaapi
from capa.features import Characteristic
from capa.features.extractors import loops
from capa.features.extractors.ida import helpers
def _ida_function_contains_switch(f):
''' check a function for switch statement indicators
adapted from:
https://reverseengineering.stackexchange.com/questions/17548/calc-switch-cases-in-idapython-cant-iterate-over-results?rq=1
arg:
f (IDA func_t)
'''
for start, end in idautils.Chunks(f.start_ea):
for head in idautils.Heads(start, end):
if idaapi.get_switch_info(head):
return True
return False
def extract_function_switch(f):
''' extract switch indicators from a function
arg:
f (IDA func_t)
'''
if _ida_function_contains_switch(f):
yield Characteristic('switch', True), f.start_ea
def extract_function_calls_to(f):
''' extract callers to a function
args:
f (IDA func_t)
'''
for ea in idautils.CodeRefsTo(f.start_ea, True):
yield Characteristic('calls to', True), ea
def extract_function_loop(f):
''' extract loop indicators from a function
args:
f (IDA func_t)
'''
    edges = []
    for bb in idaapi.FlowChart(f):
        # note: map() is lazy in Python 3, so collect edges with an explicit loop
        for s in bb.succs():
            edges.append((bb.start_ea, s.start_ea))
if edges and loops.has_loop(edges):
yield Characteristic('loop', True), f.start_ea
def extract_recursive_call(f):
''' extract recursive function call
args:
f (IDA func_t)
'''
for ref in idautils.CodeRefsTo(f.start_ea, True):
if f.contains(ref):
yield Characteristic('recursive call', True), f.start_ea
break
def extract_features(f):
''' extract function features
arg:
f (IDA func_t)
'''
for func_handler in FUNCTION_HANDLERS:
for feature, va in func_handler(f):
yield feature, va
FUNCTION_HANDLERS = (
extract_function_calls_to,
extract_function_switch,
extract_function_loop,
extract_recursive_call
)
def main():
features = []
for f in helpers.get_functions(ignore_thunks=True, ignore_libs=True):
features.extend(list(extract_features(f)))
pprint.pprint(features)
if __name__ == '__main__':
main()


@@ -0,0 +1,298 @@
import sys
import string
import idautils
import idaapi
import idc
def find_byte_sequence(start, end, seq):
''' find byte sequence
args:
start: min virtual address
end: max virtual address
seq: bytes to search e.g. b'\x01\x03'
'''
if sys.version_info >= (3, 0):
return idaapi.find_binary(start, end, ' '.join(['%02x' % b for b in seq]), 0, idaapi.SEARCH_DOWN)
else:
return idaapi.find_binary(start, end, ' '.join(['%02x' % ord(b) for b in seq]), 0, idaapi.SEARCH_DOWN)
def get_functions(start=None, end=None, ignore_thunks=False, ignore_libs=False):
''' get functions, range optional
args:
start: min virtual address
end: max virtual address
ret:
yield func_t*
'''
for ea in idautils.Functions(start=start, end=end):
f = idaapi.get_func(ea)
if ignore_thunks and f.flags & idaapi.FUNC_THUNK:
continue
if ignore_libs and f.flags & idaapi.FUNC_LIB:
continue
yield f
def get_segments():
''' Get list of segments (sections) in the binary image '''
for n in range(idaapi.get_segm_qty()):
seg = idaapi.getnseg(n)
if seg:
yield seg
def get_segment_buffer(seg):
''' return bytes stored in a given segment
decrease buffer size until IDA is able to read bytes from the segment
'''
buff = b''
sz = seg.end_ea - seg.start_ea
while sz > 0:
buff = idaapi.get_bytes(seg.start_ea, sz)
if buff:
break
sz -= 0x1000
# IDA returns None if get_bytes fails, so convert for consistent return type
return buff if buff else b''
def get_file_imports():
''' get file imports '''
_imports = {}
for idx in range(idaapi.get_import_module_qty()):
dllname = idaapi.get_import_module_name(idx)
if not dllname:
continue
def _inspect_import(ea, name, ordi):
if name and name.startswith('__imp_'):
            # strip the '__imp_' prefix from mangled import names
name = name[len('__imp_'):]
_imports[ea] = (dllname.lower(), name, ordi)
return True
idaapi.enum_import_names(idx, _inspect_import)
return _imports
def get_instructions_in_range(start, end):
''' yield instructions in range
args:
start: virtual address (inclusive)
end: virtual address (exclusive)
yield:
(insn_t*)
'''
for head in idautils.Heads(start, end):
inst = idautils.DecodeInstruction(head)
if inst:
yield inst
def is_operand_equal(op1, op2):
''' compare two IDA op_t '''
if op1.flags != op2.flags:
return False
if op1.dtype != op2.dtype:
return False
if op1.type != op2.type:
return False
if op1.reg != op2.reg:
return False
if op1.phrase != op2.phrase:
return False
if op1.value != op2.value:
return False
if op1.addr != op2.addr:
return False
return True
def is_basic_block_equal(bb1, bb2):
''' compare two IDA BasicBlock '''
return bb1.start_ea == bb2.start_ea \
and bb1.end_ea == bb2.end_ea \
and bb1.type == bb2.type
def basic_block_size(bb):
''' calculate size of basic block '''
return bb.end_ea - bb.start_ea
def read_bytes_at(ea, count):
segm_end = idc.get_segm_end(ea)
if ea + count > segm_end:
return idc.get_bytes(ea, segm_end - ea)
else:
return idc.get_bytes(ea, count)
def find_string_at(ea, min_len=4):
    ''' check if ASCII string exists at a given virtual address '''
    found = idaapi.get_strlit_contents(ea, -1, idaapi.STRTYPE_C)
    if found and len(found) > min_len:
try:
found = found.decode('ascii')
# hacky check for IDA bug; get_strlit_contents also reads Unicode as
# myy__uunniiccoodde when searching in ASCII mode so we check for that here
# and return the fixed up value
if len(found) >= 3 and found[1::2] == found[2::2]:
found = found[0] + found[1::2]
return found
except UnicodeDecodeError:
pass
return None
def get_op_phrase_info(op):
''' parse phrase features from operand
Pretty much dup of sark's implementation:
https://github.com/tmr232/Sark/blob/master/sark/code/instruction.py#L28-L73
'''
if op.type not in (idaapi.o_phrase, idaapi.o_displ):
return
scale = 1 << ((op.specflag2 & 0xC0) >> 6)
offset = op.addr
if op.specflag1 == 0:
index = None
base = op.reg
elif op.specflag1 == 1:
index = (op.specflag2 & 0x38) >> 3
base = (op.specflag2 & 0x07) >> 0
if op.reg == 0xC:
if base & 4:
base += 8
if index & 4:
index += 8
else:
return
if (index == base == idautils.procregs.sp.reg) and (scale == 1):
# HACK: This is a really ugly hack. For some reason, phrases of the form `[esp + ...]` (`sp`, `rsp` as well)
# set both the `index` and the `base` to `esp`. This is not significant, as `esp` cannot be used as an
# index, but it does cause issues with the parsing.
# This is only relevant to Intel architectures.
index = None
return {'base': base, 'index': index, 'scale': scale, 'offset': offset}
def is_op_write(insn, op):
''' Check if an operand is written to (destination operand) '''
return idaapi.has_cf_chg(insn.get_canon_feature(), op.n)
def is_op_read(insn, op):
''' Check if an operand is read from (source operand) '''
return idaapi.has_cf_use(insn.get_canon_feature(), op.n)
def is_sp_modified(insn):
''' determine if instruction modifies SP, ESP, RSP '''
for op in get_insn_ops(insn, op_type=(idaapi.o_reg,)):
if op.reg != idautils.procregs.sp.reg:
continue
if is_op_write(insn, op):
return True
return False
def is_bp_modified(insn):
''' check if instruction modifies BP, EBP, RBP '''
for op in get_insn_ops(insn, op_type=(idaapi.o_reg,)):
if op.reg != idautils.procregs.bp.reg:
continue
if is_op_write(insn, op):
return True
return False
def is_frame_register(reg):
''' check if register is sp or bp '''
return reg in (idautils.procregs.sp.reg, idautils.procregs.bp.reg)
def get_insn_ops(insn, op_type=None):
''' yield op_t for instruction, filter on type if specified '''
for op in insn.ops:
if op.type == idaapi.o_void:
# avoid looping all 6 ops if only subset exists
break
if op_type and op.type not in op_type:
continue
yield op
def ea_flags(ea):
''' retrieve processor flags for a given address '''
return idaapi.get_flags(ea)
def is_op_stack_var(ea, n):
''' check if operand is a stack variable '''
return idaapi.is_stkvar(ea_flags(ea), n)
def mask_op_val(op):
''' mask off a value based on data type
    necessary due to a bug in 64-bit IDA
Example:
.rsrc:0054C12C mov [ebp+var_4], 0FFFFFFFFh
insn.Op2.dtype == idaapi.dt_dword
insn.Op2.value == 0xffffffffffffffff
'''
masks = {
idaapi.dt_byte: 0xFF,
idaapi.dt_word: 0xFFFF,
idaapi.dt_dword: 0xFFFFFFFF,
idaapi.dt_qword: 0xFFFFFFFFFFFFFFFF
}
mask = masks.get(op.dtype, None)
if not mask:
raise ValueError('No support for operand data type 0x%x' % op.dtype)
return mask & op.value
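The masking can be shown in plain Python. `mask_value` is a hypothetical standalone version of `mask_op_val` with the operand size passed explicitly, illustrating how the sign-extended value from the docstring example collapses back to a dword.

```python
# standalone sketch of mask_op_val: mask a value by its operand size in bytes
MASKS = {1: 0xFF, 2: 0xFFFF, 4: 0xFFFFFFFF, 8: 0xFFFFFFFFFFFFFFFF}

def mask_value(value, size):
    try:
        return MASKS[size] & value
    except KeyError:
        raise ValueError('No support for operand size %d' % size)

# the dword immediate 0FFFFFFFFh reported as a sign-extended qword
print(hex(mask_value(0xFFFFFFFFFFFFFFFF, 4)))  # 0xffffffff
```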
def ea_to_offset(ea):
''' convert virtual address to file offset '''
return idaapi.get_fileregion_offset(ea)


@@ -0,0 +1,420 @@
import pprint
import idautils
import idaapi
import idc
from capa.features import String
from capa.features import Bytes
from capa.features import Characteristic
from capa.features import MAX_BYTES_FEATURE_SIZE
from capa.features.insn import Number
from capa.features.insn import Offset
from capa.features.insn import Mnemonic
import capa.features.extractors.helpers
import capa.features.extractors.ida.helpers
_file_imports_cache = None
def get_imports():
global _file_imports_cache
if _file_imports_cache is None:
_file_imports_cache = capa.features.extractors.ida.helpers.get_file_imports()
return _file_imports_cache
def _check_for_api_call(insn):
''' check instruction for API call '''
if not idaapi.is_call_insn(insn):
return
for call_ref in idautils.CodeRefsFrom(insn.ea, False):
imp = get_imports().get(call_ref, None)
if imp:
yield '%s.%s' % (imp[0], imp[1])
else:
f = idaapi.get_func(call_ref)
if f and f.flags & idaapi.FUNC_THUNK:
# check if call to thunk
# TODO: first instruction might not always be the thunk
for thunk_ref in idautils.DataRefsFrom(call_ref):
# TODO: always data ref for thunk??
imp = get_imports().get(thunk_ref, None)
if imp:
yield '%s.%s' % (imp[0], imp[1])
def extract_insn_api_features(f, bb, insn):
''' parse instruction API features
args:
f (IDA func_t)
bb (IDA BasicBlock)
insn (IDA insn_t)
example:
call dword [0x00473038]
'''
for api_name in _check_for_api_call(insn):
for feature, va in capa.features.extractors.helpers.generate_api_features(api_name, insn.ea):
yield feature, va
def extract_insn_number_features(f, bb, insn):
''' parse instruction number features
args:
f (IDA func_t)
bb (IDA BasicBlock)
insn (IDA insn_t)
example:
push 3136B0h ; dwControlCode
'''
if idaapi.is_ret_insn(insn):
# skip things like:
# .text:0042250E retn 8
return
if capa.features.extractors.ida.helpers.is_sp_modified(insn):
# skip things like:
# .text:00401145 add esp, 0Ch
return
for op in capa.features.extractors.ida.helpers.get_insn_ops(insn, op_type=(idaapi.o_imm,)):
op_val = capa.features.extractors.ida.helpers.mask_op_val(op)
if idaapi.is_mapped(op_val):
# assume valid address is not a constant
continue
yield Number(op_val), insn.ea
def extract_insn_bytes_features(f, bb, insn):
''' parse referenced byte sequences
args:
f (IDA func_t)
bb (IDA BasicBlock)
insn (IDA insn_t)
example:
push offset iid_004118d4_IShellLinkA ; riid
'''
if idaapi.is_call_insn(insn):
# ignore call instructions
return
for ref in idautils.DataRefsFrom(insn.ea):
extracted_bytes = capa.features.extractors.ida.helpers.read_bytes_at(ref, MAX_BYTES_FEATURE_SIZE)
if extracted_bytes:
if not capa.features.extractors.helpers.all_zeros(extracted_bytes):
yield Bytes(extracted_bytes), insn.ea
def extract_insn_string_features(f, bb, insn):
''' parse instruction string features
args:
f (IDA func_t)
bb (IDA BasicBlock)
insn (IDA insn_t)
example:
push offset aAcr ; "ACR > "
'''
for ref in idautils.DataRefsFrom(insn.ea):
found = capa.features.extractors.ida.helpers.find_string_at(ref)
if found:
yield String(found), insn.ea
def extract_insn_offset_features(f, bb, insn):
''' parse instruction structure offset features
args:
f (IDA func_t)
bb (IDA BasicBlock)
insn (IDA insn_t)
example:
.text:0040112F cmp [esi+4], ebx
'''
for op in capa.features.extractors.ida.helpers.get_insn_ops(insn, op_type=(idaapi.o_phrase, idaapi.o_displ)):
if capa.features.extractors.ida.helpers.is_op_stack_var(insn.ea, op.n):
# skip stack offsets
continue
p_info = capa.features.extractors.ida.helpers.get_op_phrase_info(op)
if not p_info:
continue
op_off = p_info['offset']
if 0 == op_off:
# TODO: Do we want to record offset of zero?
continue
if idaapi.is_mapped(op_off):
# Ignore:
# mov esi, dword_1005B148[esi]
continue
# TODO: Do we handle two's complement?
yield Offset(op_off), insn.ea
def _contains_stack_cookie_keywords(s):
''' check if string contains stack cookie keywords
Examples:
xor ecx, ebp ; StackCookie
mov eax, ___security_cookie
'''
if not s:
return False
s = s.strip().lower()
if 'cookie' not in s:
return False
return any(keyword in s for keyword in ('stack', 'security'))
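The keyword check is pure Python and easy to exercise against the docstring's own examples. `contains_stack_cookie_keywords` below is a standalone copy of the logic for illustration.

```python
# standalone sketch of _contains_stack_cookie_keywords
def contains_stack_cookie_keywords(s):
    if not s:
        return False
    s = s.strip().lower()
    return 'cookie' in s and any(k in s for k in ('stack', 'security'))

print(contains_stack_cookie_keywords('xor ecx, ebp ; StackCookie'))   # True
print(contains_stack_cookie_keywords('mov eax, ___security_cookie'))  # True
print(contains_stack_cookie_keywords('xor eax, ebp'))                 # False
```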
def _bb_stack_cookie_registers(bb):
''' scan basic block for stack cookie operations
yield registers ids that may have been used for stack cookie operations
assume instruction that sets stack cookie and nzxor exist in same block
and stack cookie register is not modified prior to nzxor
Example:
.text:004062DA mov eax, ___security_cookie <-- stack cookie
.text:004062DF mov ecx, eax
.text:004062E1 mov ebx, [esi]
.text:004062E3 and ecx, 1Fh
.text:004062E6 mov edi, [esi+4]
.text:004062E9 xor ebx, eax
.text:004062EB mov esi, [esi+8]
.text:004062EE xor edi, eax <-- ignore
.text:004062F0 xor esi, eax <-- ignore
.text:004062F2 ror edi, cl
.text:004062F4 ror esi, cl
.text:004062F6 ror ebx, cl
.text:004062F8 cmp edi, esi
.text:004062FA jnz loc_40639D
TODO: this is expensive, but necessary?...
'''
for insn in capa.features.extractors.ida.helpers.get_instructions_in_range(bb.start_ea, bb.end_ea):
if _contains_stack_cookie_keywords(idc.GetDisasm(insn.ea)):
for op in capa.features.extractors.ida.helpers.get_insn_ops(insn, op_type=(idaapi.o_reg,)):
if capa.features.extractors.ida.helpers.is_op_write(insn, op):
# only include modified registers
yield op.reg
def _is_nzxor_stack_cookie(f, bb, insn):
''' check if nzxor is related to stack cookie '''
if _contains_stack_cookie_keywords(idaapi.get_cmt(insn.ea, False)):
# Example:
# xor ecx, ebp ; StackCookie
return True
if any(op_reg in _bb_stack_cookie_registers(bb) for op_reg in (insn.Op1.reg, insn.Op2.reg)):
# Example:
# mov eax, ___security_cookie
# xor eax, ebp
return True
return False
def extract_insn_nzxor_characteristic_features(f, bb, insn):
''' parse instruction non-zeroing XOR instruction
ignore expected non-zeroing XORs, e.g. security cookies
args:
f (IDA func_t)
bb (IDA BasicBlock)
insn (IDA insn_t)
'''
if insn.itype != idaapi.NN_xor:
return
if capa.features.extractors.ida.helpers.is_operand_equal(insn.Op1, insn.Op2):
return
if _is_nzxor_stack_cookie(f, bb, insn):
return
yield Characteristic('nzxor', True), insn.ea
def extract_insn_mnemonic_features(f, bb, insn):
''' parse instruction mnemonic features
args:
f (IDA func_t)
bb (IDA BasicBlock)
insn (IDA insn_t)
'''
yield Mnemonic(insn.get_canon_mnem()), insn.ea
def extract_insn_peb_access_characteristic_features(f, bb, insn):
''' parse instruction peb access
fs:[0x30] on x86, gs:[0x60] on x64
TODO:
IDA should be able to do this..
'''
if insn.itype not in (idaapi.NN_push, idaapi.NN_mov):
return
    if all(op.type != idaapi.o_mem for op in insn.ops):
        # only consider memory references, as an optimization
        return
disasm = idc.GetDisasm(insn.ea)
if ' fs:30h' in disasm or ' gs:60h' in disasm:
# TODO: replace above with proper IDA
yield Characteristic('peb access', True), insn.ea
def extract_insn_segment_access_features(f, bb, insn):
''' parse instruction fs or gs access
TODO:
IDA should be able to do this...
'''
    if all(op.type != idaapi.o_mem for op in insn.ops):
        # only consider memory references, as an optimization
        return
disasm = idc.GetDisasm(insn.ea)
if ' fs:' in disasm:
# TODO: replace above with proper IDA
yield Characteristic('fs access', True), insn.ea
if ' gs:' in disasm:
# TODO: replace above with proper IDA
yield Characteristic('gs access', True), insn.ea
def extract_insn_cross_section_cflow(f, bb, insn):
''' inspect the instruction for a CALL or JMP that crosses section boundaries
args:
f (IDA func_t)
bb (IDA BasicBlock)
insn (IDA insn_t)
'''
for ref in idautils.CodeRefsFrom(insn.ea, False):
if ref in get_imports().keys():
# ignore API calls
continue
if not idaapi.getseg(ref):
# handle IDA API bug
continue
if idaapi.getseg(ref) == idaapi.getseg(insn.ea):
continue
yield Characteristic('cross section flow', True), insn.ea
def extract_function_calls_from(f, bb, insn):
''' extract functions calls from features
    most relevant at the function scope; however, it's most efficient to extract at the instruction scope
args:
f (IDA func_t)
bb (IDA BasicBlock)
insn (IDA insn_t)
'''
if not idaapi.is_call_insn(insn):
# ignore jmp, etc.
return
for ref in idautils.CodeRefsFrom(insn.ea, False):
yield Characteristic('calls from', True), ref
def extract_function_indirect_call_characteristic_features(f, bb, insn):
''' extract indirect function calls (e.g., call eax or call dword ptr [edx+4])
does not include calls like => call ds:dword_ABD4974
most relevant at the function or basic block scope;
    however, it's most efficient to extract at the instruction scope
args:
f (IDA func_t)
bb (IDA BasicBlock)
insn (IDA insn_t)
'''
if not idaapi.is_call_insn(insn):
return
if idc.get_operand_type(insn.ea, 0) in (idc.o_reg, idc.o_phrase, idc.o_displ):
yield Characteristic('indirect call', True), insn.ea
def extract_features(f, bb, insn):
''' extract instruction features
args:
f (IDA func_t)
bb (IDA BasicBlock)
insn (IDA insn_t)
'''
for inst_handler in INSTRUCTION_HANDLERS:
for feature, va in inst_handler(f, bb, insn):
yield feature, va
INSTRUCTION_HANDLERS = (
extract_insn_api_features,
extract_insn_number_features,
extract_insn_bytes_features,
extract_insn_string_features,
extract_insn_offset_features,
extract_insn_nzxor_characteristic_features,
extract_insn_mnemonic_features,
extract_insn_peb_access_characteristic_features,
extract_insn_cross_section_cflow,
extract_insn_segment_access_features,
extract_function_calls_from,
extract_function_indirect_call_characteristic_features
)
def main():
features = []
for f in capa.features.extractors.ida.helpers.get_functions(ignore_thunks=True, ignore_libs=True):
for bb in idaapi.FlowChart(f, flags=idaapi.FC_PREDS):
for insn in capa.features.extractors.ida.helpers.get_instructions_in_range(bb.start_ea, bb.end_ea):
features.extend(list(extract_features(f, bb, insn)))
pprint.pprint(features)
if __name__ == '__main__':
main()


@@ -0,0 +1,17 @@
from networkx.algorithms.components import strongly_connected_components
import networkx as nx
def has_loop(edges, threshold=2):
''' check if a list of edges representing a directed graph contains a loop
args:
edges: list of edge sets representing a directed graph i.e. [(1, 2), (2, 1)]
threshold: min number of nodes contained in loop
returns:
bool
'''
g = nx.DiGraph()
g.add_edges_from(edges)
return any(len(comp) >= threshold for comp in strongly_connected_components(g))
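For the default threshold of 2, the SCC test is equivalent to asking whether some edge `(u, v)` with `u != v` closes a cycle, i.e. `v` can reach `u`. The stdlib-only sketch below (`has_loop_simple`, a hypothetical helper, not the networkx-based implementation above) shows that equivalence; it does not generalize to larger thresholds.

```python
from collections import defaultdict, deque

def has_loop_simple(edges):
    # equivalent to has_loop(edges, threshold=2): some non-self edge (u, v)
    # closes a cycle, i.e. u is reachable from v
    succs = defaultdict(list)
    for u, v in edges:
        succs[u].append(v)

    def reaches(src, dst):
        seen, todo = set(), deque([src])
        while todo:
            n = todo.popleft()
            if n == dst:
                return True
            if n in seen:
                continue
            seen.add(n)
            todo.extend(succs[n])
        return False

    return any(u != v and reaches(v, u) for u, v in edges)

print(has_loop_simple([(1, 2), (2, 3), (3, 1)]))  # True
print(has_loop_simple([(1, 2), (2, 3)]))          # False
```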


@@ -0,0 +1,98 @@
# Copyright (C) 2017 FireEye, Inc. All Rights Reserved.
#
# strings code from FLOSS, https://github.com/fireeye/flare-floss
#
import re
from collections import namedtuple
ASCII_BYTE = r" !\"#\$%&\'\(\)\*\+,-\./0123456789:;<=>\?@ABCDEFGHIJKLMNOPQRSTUVWXYZ\[\]\^_`abcdefghijklmnopqrstuvwxyz\{\|\}\\\~\t".encode('ascii')
ASCII_RE_4 = re.compile(b"([%s]{%d,})" % (ASCII_BYTE, 4))
UNICODE_RE_4 = re.compile(b"((?:[%s]\x00){%d,})" % (ASCII_BYTE, 4))
REPEATS = [b"A", b"\x00", b"\xfe", b"\xff"]
SLICE_SIZE = 4096
String = namedtuple("String", ["s", "offset"])
def buf_filled_with(buf, character):
dupe_chunk = character * SLICE_SIZE
for offset in range(0, len(buf), SLICE_SIZE):
new_chunk = buf[offset: offset + SLICE_SIZE]
if dupe_chunk[:len(new_chunk)] != new_chunk:
return False
return True
def extract_ascii_strings(buf, n=4):
'''
Extract ASCII strings from the given binary data.
:param buf: A bytestring.
:type buf: str
:param n: The minimum length of strings to extract.
:type n: int
:rtype: Sequence[String]
'''
if not buf:
return
if (buf[0] in REPEATS) and buf_filled_with(buf, buf[0]):
return
r = None
if n == 4:
r = ASCII_RE_4
else:
reg = b"([%s]{%d,})" % (ASCII_BYTE, n)
r = re.compile(reg)
for match in r.finditer(buf):
yield String(match.group().decode("ascii"), match.start())
def extract_unicode_strings(buf, n=4):
'''
Extract naive UTF-16 strings from the given binary data.
:param buf: A bytestring.
:type buf: str
:param n: The minimum length of strings to extract.
:type n: int
:rtype: Sequence[String]
'''
if not buf:
return
if (buf[0] in REPEATS) and buf_filled_with(buf, buf[0]):
return
if n == 4:
r = UNICODE_RE_4
else:
reg = b"((?:[%s]\x00){%d,})" % (ASCII_BYTE, n)
r = re.compile(reg)
for match in r.finditer(buf):
try:
yield String(match.group().decode("utf-16"), match.start())
except UnicodeDecodeError:
pass
def main():
import sys
with open(sys.argv[1], 'rb') as f:
b = f.read()
for s in extract_ascii_strings(b):
print('0x{:x}: {:s}'.format(s.offset, s.s))
for s in extract_unicode_strings(b):
print('0x{:x}: {:s}'.format(s.offset, s.s))
if __name__ == '__main__':
main()
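The regex-based extraction above can be illustrated with a trimmed-down, self-contained sketch; the `[ -~\t]` character class is a simplification of `ASCII_BYTE`, not the exact class used above:

```python
import re
from collections import namedtuple

String = namedtuple('String', ['s', 'offset'])

# simplified sketch of extract_ascii_strings: printable ascii (plus tab)
# runs of at least n bytes, reported with their file offset.
def extract_ascii_strings(buf, n=4):
    pattern = re.compile(b'([ -~\t]{%d,})' % n)
    for match in pattern.finditer(buf):
        yield String(match.group().decode('ascii'), match.start())

found = list(extract_ascii_strings(b'\x00\x01hello\xffworld!\x00ab\x00'))
print(found)  # 'hello' at offset 2, 'world!' at offset 8; 'ab' is too short
```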


@@ -0,0 +1,73 @@
import types
import viv_utils
import capa.features.extractors
import capa.features.extractors.viv.file
import capa.features.extractors.viv.function
import capa.features.extractors.viv.basicblock
import capa.features.extractors.viv.insn
from capa.features.extractors import FeatureExtractor
import file
import function
import basicblock
import insn
__all__ = ["file", "function", "basicblock", "insn"]
def get_va(self):
try:
# vivisect type
return self.va
except AttributeError:
pass
raise TypeError()
def add_va_int_cast(o):
'''
dynamically add a cast-to-int (`__int__`) method to the given object
that returns the value of the `.va` property.
this bit of skullduggery lets us cast viv-utils objects as ints.
the correct way of doing this is to update viv-utils (or subclass the objects here).
'''
setattr(o, '__int__', types.MethodType(get_va, o, type(o)))
return o
class VivisectFeatureExtractor(FeatureExtractor):
def __init__(self, vw, path):
super(VivisectFeatureExtractor, self).__init__()
self.vw = vw
self.path = path
def extract_file_features(self):
for feature, va in capa.features.extractors.viv.file.extract_features(self.vw, self.path):
yield feature, va
def get_functions(self):
for va in sorted(self.vw.getFunctions()):
yield add_va_int_cast(viv_utils.Function(self.vw, va))
def extract_function_features(self, f):
for feature, va in capa.features.extractors.viv.function.extract_features(f):
yield feature, va
def get_basic_blocks(self, f):
for bb in f.basic_blocks:
yield add_va_int_cast(bb)
def extract_basic_block_features(self, f, bb):
for feature, va in capa.features.extractors.viv.basicblock.extract_features(f, bb):
yield feature, va
def get_instructions(self, f, bb):
for insn in bb.instructions:
yield add_va_int_cast(insn)
def extract_insn_features(self, f, bb, insn):
for feature, va in capa.features.extractors.viv.insn.extract_features(f, bb, insn):
yield feature, va


@@ -0,0 +1,147 @@
import struct
import string
import envi
import vivisect.const
from capa.features import Characteristic
from capa.features.basicblock import BasicBlock
from capa.features.extractors.helpers import MIN_STACKSTRING_LEN
def interface_extract_basic_block_XXX(f, bb):
'''
parse features from the given basic block.
args:
f (viv_utils.Function): the function to process.
bb (viv_utils.BasicBlock): the basic block to process.
yields:
(Feature, int): the feature and the address at which it's found.
'''
yield NotImplementedError('feature'), NotImplementedError('virtual address')
def _bb_has_tight_loop(f, bb):
'''
parse tight loops, true if last instruction in basic block branches to bb start
'''
if len(bb.instructions) > 0:
for bva, bflags in bb.instructions[-1].getBranches():
if bflags & vivisect.envi.BR_COND:
if bva == bb.va:
return True
return False
def extract_bb_tight_loop(f, bb):
''' check basic block for tight loop indicators '''
if _bb_has_tight_loop(f, bb):
yield Characteristic('tight loop', True), bb.va
def _bb_has_stackstring(f, bb):
'''
extract potential stackstring creation, using the following heuristics:
- basic block contains enough moves of constant bytes to the stack
'''
count = 0
for instr in bb.instructions:
if is_mov_imm_to_stack(instr):
# add number of operand bytes
src = instr.getOperands()[1]
count += get_printable_len(src)
if count > MIN_STACKSTRING_LEN:
return True
return False
def extract_stackstring(f, bb):
''' check basic block for stackstring indicators '''
if _bb_has_stackstring(f, bb):
yield Characteristic('stack string', True), bb.va
def is_mov_imm_to_stack(instr):
'''
Return if instruction moves immediate onto stack
'''
if not instr.mnem.startswith('mov'):
return False
try:
dst, src = instr.getOperands()
except ValueError:
# not two operands
return False
if not src.isImmed():
return False
# TODO what about 64-bit operands?
if not isinstance(dst, envi.archs.i386.disasm.i386SibOper) and \
not isinstance(dst, envi.archs.i386.disasm.i386RegMemOper):
return False
if not dst.reg:
return False
rname = dst._dis_regctx.getRegisterName(dst.reg)
if rname not in ['ebp', 'rbp', 'esp', 'rsp']:
return False
return True
def get_printable_len(oper):
'''
Return string length if all operand bytes are ascii or utf16-le printable
'''
if oper.tsize == 1:
chars = struct.pack('<B', oper.imm)
elif oper.tsize == 2:
chars = struct.pack('<H', oper.imm)
elif oper.tsize == 4:
chars = struct.pack('<I', oper.imm)
elif oper.tsize == 8:
chars = struct.pack('<Q', oper.imm)
else:
# unexpected operand size
return 0
if is_printable_ascii(chars):
return oper.tsize
if is_printable_utf16le(chars):
return oper.tsize // 2
return 0
def is_printable_ascii(chars):
return all(ord(c) < 127 and c in string.printable for c in chars)
def is_printable_utf16le(chars):
if all(c == '\x00' for c in chars[1::2]):
return is_printable_ascii(chars[::2])
return False
def extract_features(f, bb):
'''
extract features from the given basic block.
args:
f (viv_utils.Function): the function from which to extract features
bb (viv_utils.BasicBlock): the basic block to process.
yields:
Feature, set[VA]: the features and their location found in this basic block.
'''
yield BasicBlock(), bb.va
for bb_handler in BASIC_BLOCK_HANDLERS:
for feature, va in bb_handler(f, bb):
yield feature, va
BASIC_BLOCK_HANDLERS = (
extract_bb_tight_loop,
extract_stackstring,
)
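The stackstring heuristic above counts printable bytes of immediate operands moved to the stack. A standalone Python 3 sketch of that printable-length check (`printable_len` is a hypothetical helper mirroring `get_printable_len`, operating on a raw immediate and operand size rather than a vivisect operand):

```python
import string
import struct

# pack an immediate into its operand-sized byte string, then count how
# many bytes decode as printable ascii or utf-16le (ascii chars with
# interleaved NUL bytes).
def printable_len(imm, size):
    try:
        chars = struct.pack({1: '<B', 2: '<H', 4: '<I', 8: '<Q'}[size], imm)
    except KeyError:
        # unexpected operand size
        return 0
    if all(c < 127 and chr(c) in string.printable for c in chars):
        return size
    if all(c == 0 for c in chars[1::2]) \
            and all(c < 127 and chr(c) in string.printable for c in chars[::2]):
        return size // 2
    return 0

print(printable_len(0x41414141, 4))  # 4: b'AAAA' is printable ascii
print(printable_len(0x0041, 2))      # 1: b'A\x00' is utf-16le 'A'
```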


@@ -0,0 +1,102 @@
import PE.carve as pe_carve # vivisect PE
from capa.features import Characteristic
from capa.features.file import Export
from capa.features.file import Import
from capa.features.file import Section
from capa.features import String
import capa.features.extractors.strings
def extract_file_embedded_pe(vw, file_path):
with open(file_path, 'rb') as f:
fbytes = f.read()
for offset, i in pe_carve.carve(fbytes, 1):
yield Characteristic('embedded pe', True), offset
def extract_file_export_names(vw, file_path):
for va, etype, name, _ in vw.getExports():
yield Export(name), va
def extract_file_import_names(vw, file_path):
'''
extract imported function names
1. imports by ordinal:
- modulename.#ordinal
2. imports by name, results in two features to support importname-only matching:
- modulename.importname
- importname
'''
for va, _, _, tinfo in vw.getImports():
# vivisect source: tinfo = "%s.%s" % (libname, impname)
modname, impname = tinfo.split('.')
if is_viv_ord_impname(impname):
# replace ord prefix with #
impname = '#%s' % impname[len('ord'):]
tinfo = '%s.%s' % (modname, impname)
yield Import(tinfo), va
else:
yield Import(tinfo), va
yield Import(impname), va
def is_viv_ord_impname(impname):
'''
return if import name matches vivisect's ordinal naming scheme `'ord%d' % ord`
'''
if not impname.startswith('ord'):
return False
try:
int(impname[len('ord'):])
except ValueError:
return False
else:
return True
def extract_file_section_names(vw, file_path):
for va, _, segname, _ in vw.getSegments():
yield Section(segname), va
def extract_file_strings(vw, file_path):
'''
extract ASCII and UTF-16 LE strings from file
'''
with open(file_path, 'rb') as f:
b = f.read()
for s in capa.features.extractors.strings.extract_ascii_strings(b):
yield String(s.s), s.offset
for s in capa.features.extractors.strings.extract_unicode_strings(b):
yield String(s.s), s.offset
def extract_features(vw, file_path):
'''
extract file features from given workspace
args:
vw (vivisect.VivWorkspace): the vivisect workspace
file_path: path to the input file
yields:
Tuple[Feature, VA]: a feature and its location.
'''
for file_handler in FILE_HANDLERS:
for feature, va in file_handler(vw, file_path):
yield feature, va
FILE_HANDLERS = (
extract_file_embedded_pe,
extract_file_export_names,
extract_file_import_names,
extract_file_section_names,
extract_file_strings,
)
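The ordinal-import handling in `extract_file_import_names` / `is_viv_ord_impname` can be exercised standalone; `rename_ordinal` is a hypothetical helper for illustration, not a function in the file above:

```python
# vivisect names ordinal imports 'ord%d'; capa rewrites them to '#%d'
# so rules can match 'modulename.#ordinal'.
def is_viv_ord_impname(impname):
    if not impname.startswith('ord'):
        return False
    try:
        int(impname[len('ord'):])
    except ValueError:
        return False
    return True

def rename_ordinal(tinfo):
    modname, _, impname = tinfo.partition('.')
    if is_viv_ord_impname(impname):
        impname = '#%s' % impname[len('ord'):]
    return '%s.%s' % (modname, impname)

print(rename_ordinal('ws2_32.ord1'))     # ws2_32.#1
print(rename_ordinal('kernel32.Sleep'))  # kernel32.Sleep (unchanged)
```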


@@ -0,0 +1,99 @@
import vivisect.const
from capa.features import Characteristic
from capa.features.extractors import loops
def interface_extract_function_XXX(f):
'''
parse features from the given function.
args:
f (viv_utils.Function): the function to process.
yields:
(Feature, int): the feature and the address at which it's found.
'''
yield NotImplementedError('feature'), NotImplementedError('virtual address')
def get_switches(vw):
'''
caching accessor to vivisect workspace switch constructs.
'''
if 'switches' in vw.metadata:
return vw.metadata['switches']
else:
# addresses of switches in the program
switches = set()
for case_va, _ in filter(lambda t: 'case' in t[1], vw.getNames()):
# assume that the xref to a case location is a switch construct
for switch_va, _, _, _ in vw.getXrefsTo(case_va):
switches.add(switch_va)
vw.metadata['switches'] = switches
return switches
def get_functions_with_switch(vw):
if 'functions_with_switch' in vw.metadata:
return vw.metadata['functions_with_switch']
else:
functions = set()
for switch in get_switches(vw):
functions.add(vw.getFunction(switch))
vw.metadata['functions_with_switch'] = functions
return functions
def extract_function_switch(f):
'''
parse if a function contains a switch statement based on location names
method can be optimized
'''
if f.va in get_functions_with_switch(f.vw):
yield Characteristic('switch', True), f.va
def extract_function_calls_to(f):
for src, _, _, _ in f.vw.getXrefsTo(f.va, rtype=vivisect.const.REF_CODE):
yield Characteristic('calls to', True), src
def extract_function_loop(f):
'''
parse if a function has a loop
'''
edges = []
for bb in f.basic_blocks:
if len(bb.instructions) > 0:
for bva, bflags in bb.instructions[-1].getBranches():
if bflags & vivisect.envi.BR_COND or bflags & vivisect.envi.BR_FALL or bflags & vivisect.envi.BR_TABLE:
edges.append((bb.va, bva))
if edges and loops.has_loop(edges):
yield Characteristic('loop', True), f.va
def extract_features(f):
'''
extract features from the given function.
args:
f (viv_utils.Function): the function from which to extract features
yields:
Feature, set[VA]: the features and their location found in this function.
'''
for func_handler in FUNCTION_HANDLERS:
for feature, va in func_handler(f):
yield feature, va
FUNCTION_HANDLERS = (
extract_function_switch,
extract_function_calls_to,
extract_function_loop
)


@@ -0,0 +1,154 @@
import collections
import envi
import envi.archs.i386.disasm
import envi.archs.amd64.disasm
import vivisect.const
# pull out consts for lookup performance
i386RegOper = envi.archs.i386.disasm.i386RegOper
i386ImmOper = envi.archs.i386.disasm.i386ImmOper
i386ImmMemOper = envi.archs.i386.disasm.i386ImmMemOper
Amd64RipRelOper = envi.archs.amd64.disasm.Amd64RipRelOper
LOC_OP = vivisect.const.LOC_OP
IF_NOFALL = envi.IF_NOFALL
REF_CODE = vivisect.const.REF_CODE
FAR_BRANCH_MASK = (envi.BR_PROC | envi.BR_DEREF | envi.BR_ARCH)
DESTRUCTIVE_MNEMONICS = ('mov', 'lea', 'pop', 'xor')
def get_previous_instructions(vw, va):
'''
collect the instructions that flow to the given address, local to the current function.
args:
vw (vivisect.Workspace)
va (int): the virtual address to inspect
returns:
List[int]: the prior instructions, which may fallthrough and/or jump here
'''
ret = []
# find the immediate prior instruction.
# ensure that it falls through to this one.
loc = vw.getPrevLocation(va, adjacent=True)
if loc is not None:
# from vivisect.const:
# location: (L_VA, L_SIZE, L_LTYPE, L_TINFO)
(pva, _, ptype, pinfo) = loc
if ptype == LOC_OP and not (pinfo & IF_NOFALL):
ret.append(pva)
# find any code refs, e.g. jmp, to this location.
# ignore any calls.
#
# from vivisect.const:
# xref: (XR_FROM, XR_TO, XR_RTYPE, XR_RFLAG)
for (xfrom, _, _, xflag) in vw.getXrefsTo(va, REF_CODE):
if (xflag & FAR_BRANCH_MASK) != 0:
continue
ret.append(xfrom)
return ret
class NotFoundError(Exception):
pass
def find_definition(vw, va, reg):
'''
scan backwards from the given address looking for assignments to the given register.
if a constant, return that value.
args:
vw (vivisect.Workspace)
va (int): the virtual address at which to start analysis
reg (int): the vivisect register to study
returns:
(va: int, value?: int|None): the address of the assignment and the value, if a constant.
raises:
NotFoundError: when the definition cannot be found.
'''
q = collections.deque()
seen = set([])
q.extend(get_previous_instructions(vw, va))
while q:
cur = q.popleft()
# skip if we've already processed this location
if cur in seen:
continue
seen.add(cur)
insn = vw.parseOpcode(cur)
if len(insn.opers) == 0:
q.extend(get_previous_instructions(vw, cur))
continue
opnd0 = insn.opers[0]
if not \
(isinstance(opnd0, i386RegOper)
and opnd0.reg == reg
and insn.mnem in DESTRUCTIVE_MNEMONICS):
q.extend(get_previous_instructions(vw, cur))
continue
# if we reach here, the instruction is destructive to our target register.
# we currently only support extracting the constant from something like: `mov $reg, IAT`
# so, any other pattern results in an unknown value, represented by None.
# this is a good place to extend in the future, if we need more robust support.
if insn.mnem != 'mov':
return (cur, None)
else:
opnd1 = insn.opers[1]
if isinstance(opnd1, i386ImmOper):
return (cur, opnd1.getOperValue(opnd1))
elif isinstance(opnd1, i386ImmMemOper):
return (cur, opnd1.getOperAddr(opnd1))
elif isinstance(opnd1, Amd64RipRelOper):
return (cur, opnd1.getOperAddr(insn))
else:
# might be something like: `mov $reg, dword_401000[eax]`
return (cur, None)
raise NotFoundError()
def is_indirect_call(vw, va, insn=None):
if insn is None:
insn = vw.parseOpcode(va)
return (insn.mnem == 'call'
and isinstance(insn.opers[0], envi.archs.i386.disasm.i386RegOper))
def resolve_indirect_call(vw, va, insn=None):
'''
inspect the given indirect call instruction and attempt to resolve the target address.
args:
vw (vivisect.Workspace)
va (int): the virtual address at which to start analysis
returns:
(va: int, value?: int|None): the address of the assignment and the value, if a constant.
raises:
NotFoundError: when the definition cannot be found.
'''
if insn is None:
insn = vw.parseOpcode(va)
assert is_indirect_call(vw, va, insn=insn)
return find_definition(vw, va, insn.opers[0].reg)
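The backward scan in `find_definition` can be sketched on a toy instruction model (a hypothetical `Insn` namedtuple and predecessor map, not vivisect objects): walk predecessors breadth-first until an instruction destructively writes the target register, and report a constant only for a `mov reg, imm`.

```python
import collections

Insn = collections.namedtuple('Insn', ['va', 'mnem', 'dst', 'imm'])
DESTRUCTIVE_MNEMONICS = ('mov', 'lea', 'pop', 'xor')

def find_definition(insns_by_va, preds, va, reg):
    q = collections.deque(preds.get(va, ()))
    seen = set()
    while q:
        cur = q.popleft()
        if cur in seen:
            continue
        seen.add(cur)
        insn = insns_by_va[cur]
        if insn.dst != reg or insn.mnem not in DESTRUCTIVE_MNEMONICS:
            # not a destructive write to our register: keep walking back
            q.extend(preds.get(cur, ()))
            continue
        # destructive write found; only `mov reg, imm` yields a value
        return (cur, insn.imm if insn.mnem == 'mov' else None)
    raise LookupError(va)

insns = {
    0x10: Insn(0x10, 'mov', 'eax', 0x401000),
    0x12: Insn(0x12, 'push', 'ebx', None),
    0x14: Insn(0x14, 'call', None, None),
}
preds = {0x14: [0x12], 0x12: [0x10]}
print(find_definition(insns, preds, 0x14, 'eax'))  # (0x10, 0x401000)
```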


@@ -0,0 +1,465 @@
import envi.memory
import envi.archs.i386.disasm
import envi.archs.amd64.disasm
import vivisect.exc
import vivisect.const
from capa.features import String
from capa.features import Bytes
from capa.features import Characteristic
from capa.features import MAX_BYTES_FEATURE_SIZE
from capa.features.insn import Number
from capa.features.insn import Offset
from capa.features.insn import Mnemonic
import capa.features.extractors.helpers
from capa.features.extractors.viv.indirect_calls import NotFoundError
from capa.features.extractors.viv.indirect_calls import resolve_indirect_call
def interface_extract_instruction_XXX(f, bb, insn):
'''
parse features from the given instruction.
args:
f (viv_utils.Function): the function to process.
bb (viv_utils.BasicBlock): the basic block to process.
insn (vivisect...Instruction): the instruction to process.
yields:
(Feature, int): the feature and the address at which it's found.
'''
yield NotImplementedError('feature'), NotImplementedError('virtual address')
def get_imports(vw):
'''
caching accessor to vivisect workspace imports
avoids performance issues in vivisect when collecting locations
'''
if 'imports' in vw.metadata:
return vw.metadata['imports']
else:
imports = {p[0]: p[3] for p in vw.getImports()}
vw.metadata['imports'] = imports
return imports
def extract_insn_api_features(f, bb, insn):
'''parse API features from the given instruction.'''
# example:
#
# call dword [0x00473038]
if insn.mnem != 'call':
return
# traditional call via IAT
if isinstance(insn.opers[0], envi.archs.i386.disasm.i386ImmMemOper):
oper = insn.opers[0]
target = oper.getOperAddr(insn)
imports = get_imports(f.vw)
if target in imports.keys():
for feature, va in capa.features.extractors.helpers.generate_api_features(imports[target], insn.va):
yield feature, va
# call via thunk on x86,
# see 9324d1a8ae37a36ae560c37448c9705a at 0x407985
#
# this is also how calls to internal functions may be decoded on x64.
# see Lab21-01.exe_:0x140001178
elif isinstance(insn.opers[0], envi.archs.i386.disasm.i386PcRelOper):
target = insn.opers[0].getOperValue(insn)
try:
thunk = f.vw.getFunctionMeta(target, 'Thunk')
except vivisect.exc.InvalidFunction:
return
else:
if thunk:
for feature, va in capa.features.extractors.helpers.generate_api_features(thunk, insn.va):
yield feature, va
# call via import on x64
# see Lab21-01.exe_:0x14000118C
elif isinstance(insn.opers[0], envi.archs.amd64.disasm.Amd64RipRelOper):
op = insn.opers[0]
target = op.getOperAddr(insn)
imports = get_imports(f.vw)
if target in imports.keys():
for feature, va in capa.features.extractors.helpers.generate_api_features(imports[target], insn.va):
yield feature, va
elif isinstance(insn.opers[0], envi.archs.i386.disasm.i386RegOper):
try:
(_, target) = resolve_indirect_call(f.vw, insn.va, insn=insn)
except NotFoundError:
# not able to resolve the indirect call, sorry
return
if target is None:
# not able to resolve the indirect call, sorry
return
imports = get_imports(f.vw)
if target in imports.keys():
for feature, va in capa.features.extractors.helpers.generate_api_features(imports[target], insn.va):
yield feature, va
def extract_insn_number_features(f, bb, insn):
'''parse number features from the given instruction.'''
# example:
#
# push 3136B0h ; dwControlCode
for oper in insn.opers:
# this is for both x32 and x64
if not isinstance(oper, envi.archs.i386.disasm.i386ImmOper):
continue
v = oper.getOperValue(oper)
if f.vw.probeMemory(v, 1, envi.memory.MM_READ):
# this is a valid address
# assume it's not also a constant.
continue
if insn.mnem == 'add' \
and insn.opers[0].isReg() \
and insn.opers[0].reg == envi.archs.i386.disasm.REG_ESP:
# skip things like:
#
# .text:00401140 call sub_407E2B
# .text:00401145 add esp, 0Ch
return
yield Number(v), insn.va
def extract_insn_bytes_features(f, bb, insn):
'''
parse byte sequence features from the given instruction.
example:
# push offset iid_004118d4_IShellLinkA ; riid
'''
for oper in insn.opers:
if insn.mnem == 'call':
# ignore call instructions
continue
if isinstance(oper, envi.archs.i386.disasm.i386ImmOper):
v = oper.getOperValue(oper)
elif isinstance(oper, envi.archs.i386.disasm.i386RegMemOper):
# handle case like:
# movzx ecx, ds:byte_423258[eax]
v = oper.disp
elif isinstance(oper, envi.archs.amd64.disasm.Amd64RipRelOper):
# see: Lab21-01.exe_:0x1400010D3
v = oper.getOperAddr(insn)
else:
continue
segm = f.vw.getSegment(v)
if not segm:
continue
segm_end = segm[0] + segm[1]
try:
# Do not read beyond the end of a segment
if v + MAX_BYTES_FEATURE_SIZE > segm_end:
extracted_bytes = f.vw.readMemory(v, segm_end - v)
else:
extracted_bytes = f.vw.readMemory(v, MAX_BYTES_FEATURE_SIZE)
except envi.SegmentationViolation:
pass
else:
if not capa.features.extractors.helpers.all_zeros(extracted_bytes):
yield Bytes(extracted_bytes), insn.va
def read_string(vw, offset):
try:
alen = vw.detectString(offset)
except envi.SegmentationViolation:
pass
else:
if alen > 0:
return vw.readMemory(offset, alen).decode('utf-8')
try:
ulen = vw.detectUnicode(offset)
except envi.SegmentationViolation:
pass
except IndexError:
# potential vivisect bug detecting Unicode at segment end
pass
else:
if ulen > 0:
if ulen % 2 == 1:
# vivisect seems to mis-detect the end of unicode strings
# off by one, too short
ulen += 1
return vw.readMemory(offset, ulen).decode('utf-16')
raise ValueError('not a string', offset)
def extract_insn_string_features(f, bb, insn):
'''parse string features from the given instruction.'''
# example:
#
# push offset aAcr ; "ACR > "
for oper in insn.opers:
if isinstance(oper, envi.archs.i386.disasm.i386ImmOper):
v = oper.getOperValue(oper)
elif isinstance(oper, envi.archs.amd64.disasm.Amd64RipRelOper):
v = oper.getOperAddr(insn)
else:
continue
try:
s = read_string(f.vw, v)
except ValueError:
continue
else:
yield String(s.rstrip('\x00')), insn.va
def extract_insn_offset_features(f, bb, insn):
'''parse structure offset features from the given instruction.'''
# example:
#
# .text:0040112F cmp [esi+4], ebx
for oper in insn.opers:
# this is for both x32 and x64
if not isinstance(oper, envi.archs.i386.disasm.i386RegMemOper):
continue
if oper.reg == envi.archs.i386.disasm.REG_ESP:
continue
if oper.reg == envi.archs.i386.disasm.REG_EBP:
continue
# TODO: do x64 support for real.
if oper.reg == envi.archs.amd64.disasm.REG_RBP:
continue
yield Offset(oper.disp), insn.va
def is_security_cookie(f, bb, insn):
'''
check if an instruction is related to security cookie checks
'''
# security cookie check should use SP or BP
oper = insn.opers[1]
if oper.isReg() \
and oper.reg not in [envi.archs.i386.disasm.REG_ESP, envi.archs.i386.disasm.REG_EBP,
# TODO: do x64 support for real.
envi.archs.amd64.disasm.REG_RBP, envi.archs.amd64.disasm.REG_RSP]:
return False
# expect security cookie init in first basic block within first bytes (instructions)
bb0 = f.basic_blocks[0]
if bb == bb0 and insn.va < bb.va + 30:
return True
# ... or within last bytes (instructions) before a return
elif bb.instructions[-1].isReturn() and insn.va > bb.va + bb.size - 30:
return True
return False
def extract_insn_nzxor_characteristic_features(f, bb, insn):
'''
parse non-zeroing XOR instruction from the given instruction.
ignore expected non-zeroing XORs, e.g. security cookies.
'''
if insn.mnem != 'xor':
return
if insn.opers[0] == insn.opers[1]:
return
if is_security_cookie(f, bb, insn):
return
yield Characteristic('nzxor', True), insn.va
def extract_insn_mnemonic_features(f, bb, insn):
'''parse mnemonic features from the given instruction.'''
yield Mnemonic(insn.mnem), insn.va
def extract_insn_peb_access_characteristic_features(f, bb, insn):
'''
parse peb access from the given function. fs:[0x30] on x86, gs:[0x60] on x64
'''
# TODO extract x64
if insn.mnem not in ['push', 'mov']:
return
if 'fs' in insn.getPrefixName():
for oper in insn.opers:
# examples
#
# IDA: mov eax, large fs:30h
# viv: fs: mov eax,dword [0x00000030] ; i386ImmMemOper
# IDA: push large dword ptr fs:30h
# viv: fs: push dword [0x00000030]
# fs: push dword [eax + 0x30] ; i386RegMemOper, with eax = 0
if (isinstance(oper, envi.archs.i386.disasm.i386RegMemOper) and oper.disp == 0x30) or \
(isinstance(oper, envi.archs.i386.disasm.i386ImmMemOper) and oper.imm == 0x30):
yield Characteristic('peb access', True), insn.va
elif 'gs' in insn.getPrefixName():
for oper in insn.opers:
if (isinstance(oper, envi.archs.amd64.disasm.i386RegMemOper) and oper.disp == 0x60) or \
(isinstance(oper, envi.archs.amd64.disasm.i386ImmMemOper) and oper.imm == 0x60):
yield Characteristic('peb access', True), insn.va
def extract_insn_segment_access_features(f, bb, insn):
''' parse the instruction for access to fs or gs '''
prefix = insn.getPrefixName()
if prefix == 'fs':
yield Characteristic('fs access', True), insn.va
if prefix == 'gs':
yield Characteristic('gs access', True), insn.va
def get_section(vw, va):
for start, length, _, __ in vw.getMemoryMaps():
if start <= va < start + length:
return start
raise KeyError(va)
def extract_insn_cross_section_cflow(f, bb, insn):
'''
inspect the instruction for a CALL or JMP that crosses section boundaries.
'''
for va, flags in insn.getBranches():
if flags & envi.BR_FALL:
continue
try:
# skip 32-bit calls to imports
if insn.mnem == 'call' and isinstance(insn.opers[0], envi.archs.i386.disasm.i386ImmMemOper):
oper = insn.opers[0]
target = oper.getOperAddr(insn)
if target in get_imports(f.vw):
continue
# skip 64-bit calls to imports
elif insn.mnem == 'call' and isinstance(insn.opers[0], envi.archs.amd64.disasm.Amd64RipRelOper):
op = insn.opers[0]
target = op.getOperAddr(insn)
if target in get_imports(f.vw):
continue
if get_section(f.vw, insn.va) != get_section(f.vw, va):
yield Characteristic('cross section flow', True), insn.va
except KeyError:
continue
# this is a feature that's most relevant at the function scope,
# however, it's most efficient to extract at the instruction scope.
def extract_function_calls_from(f, bb, insn):
if insn.mnem != 'call':
return
target = None
# traditional call via IAT, x32
if isinstance(insn.opers[0], envi.archs.i386.disasm.i386ImmMemOper):
oper = insn.opers[0]
target = oper.getOperAddr(insn)
yield Characteristic('calls from', True), target
# call via thunk on x86,
# see 9324d1a8ae37a36ae560c37448c9705a at 0x407985
#
# call to internal function on x64
# see Lab21-01.exe_:0x140001178
elif isinstance(insn.opers[0], envi.archs.i386.disasm.i386PcRelOper):
target = insn.opers[0].getOperValue(insn)
yield Characteristic('calls from', True), target
# call via IAT, x64
elif isinstance(insn.opers[0], envi.archs.amd64.disasm.Amd64RipRelOper):
op = insn.opers[0]
target = op.getOperAddr(insn)
yield Characteristic('calls from', True), target
if target and target == f.va:
# if we found a jump target and it's the function address
# mark as recursive
yield Characteristic('recursive call', True), target
# this is a feature that's most relevant at the function or basic block scope,
# however, it's most efficient to extract at the instruction scope.
def extract_function_indirect_call_characteristic_features(f, bb, insn):
'''
extract indirect function call characteristic (e.g., call eax or call dword ptr [edx+4])
does not include calls like => call ds:dword_ABD4974
'''
if insn.mnem != 'call':
return
# Checks below work for x86 and x64
if isinstance(insn.opers[0], envi.archs.i386.disasm.i386RegOper):
# call edx
yield Characteristic('indirect call', True), insn.va
elif isinstance(insn.opers[0], envi.archs.i386.disasm.i386RegMemOper):
# call dword ptr [eax+50h]
yield Characteristic('indirect call', True), insn.va
elif isinstance(insn.opers[0], envi.archs.i386.disasm.i386SibOper):
# call qword ptr [rsp+78h]
yield Characteristic('indirect call', True), insn.va
def extract_features(f, bb, insn):
'''
extract features from the given insn.
args:
f (viv_utils.Function): the function from which to extract features
bb (viv_utils.BasicBlock): the basic block to process.
insn (vivisect...Instruction): the instruction to process.
yields:
Feature, set[VA]: the features and their location found in this insn.
'''
for insn_handler in INSTRUCTION_HANDLERS:
for feature, va in insn_handler(f, bb, insn):
yield feature, va
INSTRUCTION_HANDLERS = (
extract_insn_api_features,
extract_insn_number_features,
extract_insn_string_features,
extract_insn_bytes_features,
extract_insn_offset_features,
extract_insn_nzxor_characteristic_features,
extract_insn_mnemonic_features,
extract_insn_peb_access_characteristic_features,
extract_insn_cross_section_cflow,
extract_insn_segment_access_features,
extract_function_calls_from,
extract_function_indirect_call_characteristic_features
)

31
capa/features/file.py Normal file

@@ -0,0 +1,31 @@
from capa.features import Feature
class Export(Feature):
def __init__(self, value):
# value is export name
super(Export, self).__init__([value])
self.value = value
def __str__(self):
return 'Export(%s)' % (self.value)
class Import(Feature):
def __init__(self, value):
# value is import name
super(Import, self).__init__([value])
self.value = value
def __str__(self):
return 'Import(%s)' % (self.value)
class Section(Feature):
def __init__(self, value):
# value is section name
super(Section, self).__init__([value])
self.value = value
def __str__(self):
return 'Section(%s)' % (self.value)

276
capa/features/freeze.py Normal file

@@ -0,0 +1,276 @@
'''
capa freeze file format: `| capa0000 | + zlib(utf-8(json(...)))`
json format:
{
'version': 1,
'functions': {
int(function va): {
'basic blocks': {
int(basic block va): {
'instructions': [instruction va, ...]
},
...
},
...
},
...
},
'scopes': {
'file': [
(str(name), [any(arg), ...], int(va), ()),
...
],
'function': [
(str(name), [any(arg), ...], int(va), (int(function va), )),
...
],
'basic block': [
(str(name), [any(arg), ...], int(va), (int(function va),
int(basic block va))),
...
],
'instruction': [
(str(name), [any(arg), ...], int(va), (int(function va),
int(basic block va),
int(instruction va))),
...
],
}
}
'''
import json
import zlib
import logging
import capa.features.extractors
import capa.features
import capa.features.file
import capa.features.function
import capa.features.basicblock
import capa.features.insn
from capa.helpers import hex
logger = logging.getLogger(__name__)
def serialize_feature(feature):
return feature.freeze_serialize()
KNOWN_FEATURES = {
F.__name__: F
for F in capa.features.Feature.__subclasses__()
}
def deserialize_feature(doc):
F = KNOWN_FEATURES[doc[0]]
return F.freeze_deserialize(doc[1])
def dumps(extractor):
'''
serialize the given extractor to a string
args:
extractor: capa.features.extractor.FeatureExtractor:
returns:
str: the serialized features.
'''
ret = {
'version': 1,
'functions': {},
'scopes': {
'file': [],
'function': [],
'basic block': [],
'instruction': [],
}
}
for feature, va in extractor.extract_file_features():
ret['scopes']['file'].append(
serialize_feature(feature) + (hex(va), ())
)
for f in extractor.get_functions():
ret['functions'][hex(f)] = {}
for feature, va in extractor.extract_function_features(f):
ret['scopes']['function'].append(
serialize_feature(feature) + (hex(va), (hex(f), ))
)
for bb in extractor.get_basic_blocks(f):
ret['functions'][hex(f)][hex(bb)] = []
for feature, va in extractor.extract_basic_block_features(f, bb):
ret['scopes']['basic block'].append(
serialize_feature(feature) + (hex(va), (hex(f), hex(bb), ))
)
for insn, insnva in sorted([(insn, int(insn)) for insn in extractor.get_instructions(f, bb)]):
ret['functions'][hex(f)][hex(bb)].append(hex(insnva))
for feature, va in extractor.extract_insn_features(f, bb, insn):
ret['scopes']['instruction'].append(
serialize_feature(feature) + (hex(va), (hex(f), hex(bb), hex(insnva), ))
)
return json.dumps(ret)
def loads(s):
'''deserialize a set of features (as a NullFeatureExtractor) from a string.'''
doc = json.loads(s)
if doc.get('version') != 1:
raise ValueError('unsupported freeze format version: %s' % (doc.get('version')))
features = {
'file features': [],
'functions': {},
}
for fva, function in doc.get('functions', {}).items():
fva = int(fva, 0x10)
features['functions'][fva] = {
'features': [],
'basic blocks': {},
}
for bbva, bb in function.items():
bbva = int(bbva, 0x10)
features['functions'][fva]['basic blocks'][bbva] = {
'features': [],
'instructions': {},
}
for insnva in bb:
insnva = int(insnva, 0x10)
features['functions'][fva]['basic blocks'][bbva]['instructions'][insnva] = {
'features': [],
}
# in the following blocks, each entry looks like:
#
# ('MatchedRule', ('foo', ), '0x401000', ('0x401000', ))
# ^^^^^^^^^^^^^ ^^^^^^^^^ ^^^^^^^^^^ ^^^^^^^^^^^^^^
# feature name args addr func/bb/insn
for feature in doc.get('scopes', {}).get('file', []):
va, loc = feature[2:]
va = int(va, 0x10)
feature = deserialize_feature(feature[:2])
features['file features'].append((va, feature))
for feature in doc.get('scopes', {}).get('function', []):
# fetch the pair like:
#
# ('0x401000', ('0x401000', ))
# ^^^^^^^^^^ ^^^^^^^^^^^^^^
# addr func/bb/insn
va, loc = feature[2:]
va = int(va, 0x10)
loc = [int(lo, 0x10) for lo in loc]
# decode the feature from the pair like:
#
# ('MatchedRule', ('foo', ))
# ^^^^^^^^^^^^^ ^^^^^^^^^
# feature name args
feature = deserialize_feature(feature[:2])
features['functions'][loc[0]]['features'].append((va, feature))
for feature in doc.get('scopes', {}).get('basic block', []):
va, loc = feature[2:]
va = int(va, 0x10)
loc = [int(lo, 0x10) for lo in loc]
feature = deserialize_feature(feature[:2])
features['functions'][loc[0]]['basic blocks'][loc[1]]['features'].append((va, feature))
for feature in doc.get('scopes', {}).get('instruction', []):
va, loc = feature[2:]
va = int(va, 0x10)
loc = [int(lo, 0x10) for lo in loc]
feature = deserialize_feature(feature[:2])
features['functions'][loc[0]]['basic blocks'][loc[1]]['instructions'][loc[2]]['features'].append((va, feature))
return capa.features.extractors.NullFeatureExtractor(features)
MAGIC = 'capa0000'.encode('ascii')
def dump(extractor):
'''serialize the given extractor to a byte array.'''
return MAGIC + zlib.compress(dumps(extractor).encode('utf-8'))
def is_freeze(buf):
return buf[:len(MAGIC)] == MAGIC
def load(buf):
'''deserialize a set of features (as a NullFeatureExtractor) from a byte array.'''
if not is_freeze(buf):
raise ValueError('missing magic header')
return loads(zlib.decompress(buf[len(MAGIC):]).decode('utf-8'))
def main(argv=None):
import sys
import argparse
import capa.main
if argv is None:
argv = sys.argv[1:]
formats = [
('auto', '(default) detect file type automatically'),
('pe', 'Windows PE file'),
('sc32', '32-bit shellcode'),
('sc64', '64-bit shellcode'),
]
format_help = ', '.join(['%s: %s' % (f[0], f[1]) for f in formats])
parser = argparse.ArgumentParser(description="save capa features to a file")
parser.add_argument("sample", type=str,
help="Path to sample to analyze")
parser.add_argument("output", type=str,
help="Path to output file")
parser.add_argument("-v", "--verbose", action="store_true",
help="Enable verbose output")
parser.add_argument("-q", "--quiet", action="store_true",
help="Disable all output but errors")
parser.add_argument("-f", "--format", choices=[f[0] for f in formats], default="auto",
help="Select sample format, %s" % format_help)
args = parser.parse_args(args=argv)
if args.quiet:
logging.basicConfig(level=logging.ERROR)
logging.getLogger().setLevel(logging.ERROR)
elif args.verbose:
logging.basicConfig(level=logging.DEBUG)
logging.getLogger().setLevel(logging.DEBUG)
else:
logging.basicConfig(level=logging.INFO)
logging.getLogger().setLevel(logging.INFO)
vw = capa.main.get_workspace(args.sample, args.format)
# don't import this at top level to support ida/py3 backend
import capa.features.extractors.viv
extractor = capa.features.extractors.viv.VivisectFeatureExtractor(vw, args.sample)
with open(args.output, 'wb') as f:
f.write(dump(extractor))
return 0
if __name__ == "__main__":
import sys
sys.exit(main())
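The on-disk container produced by `dump()` and consumed by `load()` above is simply the 8-byte magic `capa0000` followed by zlib-compressed JSON. A minimal standalone sketch of the round trip (`dump_doc` and `load_doc` are hypothetical names; no capa imports needed):

```python
import zlib

MAGIC = b"capa0000"

def dump_doc(doc):
    # prepend the magic header to the zlib-compressed UTF-8 payload
    return MAGIC + zlib.compress(doc.encode("utf-8"))

def is_freeze(buf):
    # a freeze file is identified purely by its leading magic bytes
    return buf[:len(MAGIC)] == MAGIC

def load_doc(buf):
    # refuse anything that does not carry the magic header
    if not is_freeze(buf):
        raise ValueError("missing magic header")
    return zlib.decompress(buf[len(MAGIC):]).decode("utf-8")
```

Because the check only inspects the first eight bytes, `is_freeze()` is cheap enough to probe arbitrary input files before attempting decompression.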

capa/features/insn.py (new file, 46 lines)
@@ -0,0 +1,46 @@
from capa.features import Feature
class API(Feature):
def __init__(self, name):
# downcase the library name, if given;
# partition() tolerates extra dots in the symbol name
if '.' in name:
modname, _, impname = name.partition('.')
name = modname.lower() + '.' + impname
super(API, self).__init__([name])
class Number(Feature):
def __init__(self, value, symbol=None):
super(Number, self).__init__([value])
self.value = value
self.symbol = symbol
def __str__(self):
if self.symbol:
return 'number(0x%x = %s)' % (self.value, self.symbol)
else:
return 'number(0x%x)' % (self.value)
class Offset(Feature):
def __init__(self, value, symbol=None):
super(Offset, self).__init__([value])
self.value = value
self.symbol = symbol
def __str__(self):
if self.symbol:
return 'offset(0x%x = %s)' % (self.value, self.symbol)
else:
return 'offset(0x%x)' % (self.value)
class Mnemonic(Feature):
def __init__(self, value):
super(Mnemonic, self).__init__([value])
self.value = value
def __str__(self):
return 'mnemonic(%s)' % (self.value)
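These feature classes participate in the `freeze_serialize()`/`freeze_deserialize()` protocol that the freeze module's `KNOWN_FEATURES` registry relies on. A simplified sketch of that pattern (a toy hierarchy, not capa's actual class layout):

```python
class Feature(object):
    def __init__(self, args):
        self.args = list(args)

    def freeze_serialize(self):
        # encode as (class name, constructor args) so the pair survives JSON
        return (type(self).__name__, self.args)

    @classmethod
    def freeze_deserialize(cls, args):
        return cls(args)


class Mnemonic(Feature):
    pass


# map class names back to classes, discovered via __subclasses__()
KNOWN_FEATURES = {F.__name__: F for F in Feature.__subclasses__()}

def deserialize_feature(doc):
    return KNOWN_FEATURES[doc[0]].freeze_deserialize(doc[1])
```

Building the registry from `__subclasses__()` means a new feature type only has to subclass `Feature` to become serializable.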

capa/helpers.py (new file, 18 lines)
@@ -0,0 +1,18 @@
_hex = hex
def hex(i):
# under py2.7, long integers get formatted with a trailing `L`
# and this is not pretty. so strip it out.
return _hex(oint(i)).rstrip('L')
def oint(i):
# there seems to be some trouble with using `int(viv_utils.Function)`
# with the black magic we do with binding the `__int__()` routine.
# i haven't had a chance to debug this yet (and i have no hotel wifi).
# so in the meantime, detect this, and call the method directly.
try:
return int(i)
except TypeError:
return i.__int__()
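The fallback in `oint()` exists because `int()` resolves `__int__` on the type, not the instance, while viv_utils binds `__int__` directly onto function objects. A sketch reproducing that situation (`capa_hex` is a hypothetical rename to avoid shadowing the builtin, and `FakeFunction` only mimics the assumed viv_utils binding):

```python
_hex = hex

def capa_hex(i):
    # py2 long ints format with a trailing "L"; strip it for clean output
    return _hex(oint(i)).rstrip("L")

def oint(i):
    # int() looks up __int__ on the type, so an instance-bound __int__
    # makes int() raise TypeError; call the bound method directly instead
    try:
        return int(i)
    except TypeError:
        return i.__int__()

class FakeFunction(object):
    def __init__(self, va):
        # bind __int__ on the instance, mimicking viv_utils' binding
        self.__int__ = lambda: va
```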

capa/ida/__init__.py (new empty file)

capa/ida/explorer/item.py (new file, 250 lines)
@@ -0,0 +1,250 @@
import binascii
import codecs
import sys
from PyQt5 import QtCore
import idaapi
import idc
import capa.ida.helpers
def info_to_name(s):
''' extract name from view format, e.g. "function(sub_401000)" -> "sub_401000" '''
try:
return s.split('(')[1].rstrip(')')
except IndexError:
return ''
def ea_to_hex_str(ea):
''' format effective address as zero-padded hex string '''
return '%08X' % ea
class CapaExplorerDataItem(object):
''' store data for CapaExplorerDataModel
TODO
'''
def __init__(self, parent, data):
''' '''
self._parent = parent
self._data = data
self._children = []
self._checked = False
self.flags = (QtCore.Qt.ItemIsEnabled | QtCore.Qt.ItemIsSelectable | QtCore.Qt.ItemIsTristate | QtCore.Qt.ItemIsUserCheckable)
if self._parent:
self._parent.appendChild(self)
def setIsEditable(self, isEditable=False):
''' modify item flags to be editable or not '''
if isEditable:
self.flags |= QtCore.Qt.ItemIsEditable
else:
self.flags &= ~QtCore.Qt.ItemIsEditable
def setChecked(self, checked):
''' set item as checked '''
self._checked = checked
def isChecked(self):
''' get item is checked '''
return self._checked
def appendChild(self, item):
''' add child item
@param item: CapaExplorerDataItem*
'''
self._children.append(item)
def child(self, row):
''' get child row
@param row: TODO
'''
return self._children[row]
def childCount(self):
''' get child count '''
return len(self._children)
def columnCount(self):
''' get column count '''
return len(self._data)
def data(self, column):
''' get data at column '''
try:
return self._data[column]
except IndexError:
return None
def parent(self):
''' get parent '''
return self._parent
def row(self):
''' get row location '''
if self._parent:
return self._parent._children.index(self)
return 0
def setData(self, column, value):
''' set data in column '''
self._data[column] = value
def children(self):
''' yield children '''
for child in self._children:
yield child
def removeChildren(self):
''' remove all child items '''
del self._children[:]
def __str__(self):
''' get string representation of columns '''
return ' '.join([data for data in self._data if data])
@property
def info(self):
''' '''
return self._data[0]
@property
def ea(self):
''' '''
try:
return int(self._data[1], 16)
except ValueError:
return None
@property
def details(self):
''' '''
return self._data[2]
class CapaExplorerRuleItem(CapaExplorerDataItem):
''' store data relevant to capa rule result '''
view_fmt = '%s (%d)'
def __init__(self, parent, name, count, definition):
''' '''
self._definition = definition
name = CapaExplorerRuleItem.view_fmt % (name, count) if count else name
super(CapaExplorerRuleItem, self).__init__(parent, [name, '', ''])
@property
def definition(self):
''' '''
return self._definition
class CapaExplorerFunctionItem(CapaExplorerDataItem):
''' store data relevant to capa function result '''
view_fmt = 'function(%s)'
def __init__(self, parent, name, ea):
''' '''
address = ea_to_hex_str(ea)
name = CapaExplorerFunctionItem.view_fmt % name
super(CapaExplorerFunctionItem, self).__init__(parent, [name, address, ''])
@property
def info(self):
''' '''
info = super(CapaExplorerFunctionItem, self).info
name = info_to_name(info)
return name if name else info
@info.setter
def info(self, name):
''' '''
self._data[0] = CapaExplorerFunctionItem.view_fmt % name
class CapaExplorerBlockItem(CapaExplorerDataItem):
''' store data relevant to capa basic block results '''
view_fmt = 'basic block(loc_%s)'
def __init__(self, parent, ea):
''' '''
address = ea_to_hex_str(ea)
name = CapaExplorerBlockItem.view_fmt % address
super(CapaExplorerBlockItem, self).__init__(parent, [name, address, ''])
class CapaExplorerDefaultItem(CapaExplorerDataItem):
''' store data relevant to capa default result '''
def __init__(self, parent, name, ea=None):
''' '''
if ea:
address = ea_to_hex_str(ea)
else:
address = ''
super(CapaExplorerDefaultItem, self).__init__(parent, [name, address, ''])
class CapaExplorerFeatureItem(CapaExplorerDataItem):
''' store data relevant to capa feature result '''
def __init__(self, parent, data):
super(CapaExplorerFeatureItem, self).__init__(parent, data)
class CapaExplorerInstructionViewItem(CapaExplorerFeatureItem):
def __init__(self, parent, name, ea):
''' '''
details = capa.ida.helpers.get_disasm_line(ea)
address = ea_to_hex_str(ea)
super(CapaExplorerInstructionViewItem, self).__init__(parent, [name, address, details])
self.ida_highlight = idc.get_color(ea, idc.CIC_ITEM)
class CapaExplorerByteViewItem(CapaExplorerFeatureItem):
def __init__(self, parent, name, ea):
''' '''
address = ea_to_hex_str(ea)
byte_snap = idaapi.get_bytes(ea, 32)
if byte_snap:
byte_snap = codecs.encode(byte_snap, 'hex').upper()
# TODO: better way?
if sys.version_info >= (3, 0):
details = ' '.join([byte_snap[i:i + 2].decode() for i in range(0, len(byte_snap), 2)])
else:
details = ' '.join([byte_snap[i:i + 2] for i in range(0, len(byte_snap), 2)])
else:
details = ''
super(CapaExplorerByteViewItem, self).__init__(parent, [name, address, details])
self.ida_highlight = idc.get_color(ea, idc.CIC_ITEM)
class CapaExplorerStringViewItem(CapaExplorerFeatureItem):
def __init__(self, parent, name, ea, value):
''' '''
address = ea_to_hex_str(ea)
super(CapaExplorerStringViewItem, self).__init__(parent, [name, address, value])
self.ida_highlight = idc.get_color(ea, idc.CIC_ITEM)
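The parent/child bookkeeping these item classes inherit from CapaExplorerDataItem is what lets a QAbstractItemModel navigate the tree; it can be sketched without PyQt or IDA (a hypothetical minimal `TreeItem`):

```python
class TreeItem(object):
    # hypothetical minimal version of CapaExplorerDataItem's bookkeeping
    def __init__(self, parent, data):
        self._parent = parent
        self._data = data
        self._children = []
        if parent is not None:
            # children register themselves with their parent on construction
            parent._children.append(self)

    def child(self, row):
        return self._children[row]

    def childCount(self):
        return len(self._children)

    def row(self):
        # an item's row is its position among its parent's children
        if self._parent is not None:
            return self._parent._children.index(self)
        return 0
```

Registering children in the constructor is why the real item classes never call `appendChild()` explicitly when building the result tree.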

capa/ida/explorer/model.py (new file, 423 lines)
@@ -0,0 +1,423 @@
from PyQt5 import QtCore, QtGui
from collections import deque
import binascii
import idaapi
import idc
from capa.ida.explorer.item import (
CapaExplorerDataItem,
CapaExplorerDefaultItem,
CapaExplorerFeatureItem,
CapaExplorerFunctionItem,
CapaExplorerRuleItem,
CapaExplorerStringViewItem,
CapaExplorerInstructionViewItem,
CapaExplorerByteViewItem,
CapaExplorerBlockItem
)
import capa.ida.helpers
# default highlight color used in IDA window
DEFAULT_HIGHLIGHT = 0xD096FF
class CapaExplorerDataModel(QtCore.QAbstractItemModel):
''' '''
COLUMN_INDEX_RULE_INFORMATION = 0
COLUMN_INDEX_VIRTUAL_ADDRESS = 1
COLUMN_INDEX_DETAILS = 2
COLUMN_COUNT = 3
def __init__(self, parent=None):
''' '''
super(CapaExplorerDataModel, self).__init__(parent)
self._root = CapaExplorerDataItem(None, ['Rule Information', 'Address', 'Details'])
def reset(self):
''' '''
# reset checkboxes and color highlights
# TODO: make less hacky
for idx in range(self._root.childCount()):
rindex = self.index(idx, 0, QtCore.QModelIndex())
for mindex in self.iterateChildrenIndexFromRootIndex(rindex, ignore_root=False):
mindex.internalPointer().setChecked(False)
self._util_reset_ida_highlighting(mindex.internalPointer(), False)
self.dataChanged.emit(mindex, mindex)
def clear(self):
''' '''
self.beginResetModel()
# TODO: make sure this isn't for memory
self._root.removeChildren()
self.endResetModel()
def columnCount(self, mindex):
''' get the number of columns for the children of the given parent
@param mindex: QModelIndex*
@retval column count
'''
if mindex.isValid():
return mindex.internalPointer().columnCount()
else:
return self._root.columnCount()
def data(self, mindex, role):
''' get data stored under the given role for the item referred to by the index
@param mindex: QModelIndex*
@param role: QtCore.Qt.*
@retval data to be displayed
'''
if not mindex.isValid():
return None
if role == QtCore.Qt.DisplayRole:
# display data in corresponding column
return mindex.internalPointer().data(mindex.column())
if role == QtCore.Qt.ToolTipRole and \
CapaExplorerDataModel.COLUMN_INDEX_RULE_INFORMATION == mindex.column() and \
isinstance(mindex.internalPointer(), CapaExplorerRuleItem):
# show tooltip containing rule definition
return mindex.internalPointer().definition
if role == QtCore.Qt.CheckStateRole and mindex.column() == CapaExplorerDataModel.COLUMN_INDEX_RULE_INFORMATION:
# inform view how to display content of checkbox - un/checked
return QtCore.Qt.Checked if mindex.internalPointer().isChecked() else QtCore.Qt.Unchecked
if role == QtCore.Qt.FontRole and mindex.column() in (CapaExplorerDataModel.COLUMN_INDEX_VIRTUAL_ADDRESS, CapaExplorerDataModel.COLUMN_INDEX_DETAILS):
return QtGui.QFont('Courier', weight=QtGui.QFont.Medium)
if role == QtCore.Qt.FontRole and mindex.internalPointer() == self._root:
font = QtGui.QFont()
font.setBold(True)
return font
return None
def flags(self, mindex):
''' get item flags for given index
@param mindex: QModelIndex*
@retval QtCore.Qt.ItemFlags
'''
if not mindex.isValid():
return QtCore.Qt.NoItemFlags
return mindex.internalPointer().flags
def headerData(self, section, orientation, role):
''' get data for the given role and section in the header with the specified orientation
@param section: int
@param orientation: QtCore.Qt.Orientation
@param role: QtCore.Qt.DisplayRole
@retval header data list()
'''
if orientation == QtCore.Qt.Horizontal and role == QtCore.Qt.DisplayRole:
return self._root.data(section)
return None
def index(self, row, column, parent):
''' get index of the item in the model specified by the given row, column and parent index
@param row: int
@param column: int
@param parent: QModelIndex*
@retval QModelIndex*
'''
if not self.hasIndex(row, column, parent):
return QtCore.QModelIndex()
if not parent.isValid():
parent_item = self._root
else:
parent_item = parent.internalPointer()
child_item = parent_item.child(row)
if child_item:
return self.createIndex(row, column, child_item)
else:
return QtCore.QModelIndex()
def parent(self, mindex):
''' get parent of the model item with the given index
if the item has no parent, an invalid QModelIndex* is returned
@param mindex: QModelIndex*
@retval QModelIndex*
'''
if not mindex.isValid():
return QtCore.QModelIndex()
child = mindex.internalPointer()
parent = child.parent()
if parent == self._root:
return QtCore.QModelIndex()
return self.createIndex(parent.row(), 0, parent)
def iterateChildrenIndexFromRootIndex(self, mindex, ignore_root=True):
''' depth-first traversal of child nodes
@param mindex: QModelIndex*
@retval yield QModelIndex*
'''
visited = set()
stack = deque((mindex,))
while True:
try:
cmindex = stack.pop()
except IndexError:
break
if cmindex not in visited:
if not ignore_root or cmindex is not mindex:
# ignore root
yield cmindex
visited.add(cmindex)
for idx in range(self.rowCount(cmindex)):
stack.append(cmindex.child(idx, 0))
def _util_reset_ida_highlighting(self, item, checked):
''' '''
if not isinstance(item, (CapaExplorerStringViewItem, CapaExplorerInstructionViewItem, CapaExplorerByteViewItem)):
# ignore other item types
return
curr_highlight = idc.get_color(item.ea, idc.CIC_ITEM)
if checked:
# item checked - record current highlight and set to new
item.ida_highlight = curr_highlight
idc.set_color(item.ea, idc.CIC_ITEM, DEFAULT_HIGHLIGHT)
else:
# item unchecked - reset highlight
if curr_highlight != DEFAULT_HIGHLIGHT:
# user modified highlight - record new highlight and do not modify
item.ida_highlight = curr_highlight
else:
# reset highlight to previous
idc.set_color(item.ea, idc.CIC_ITEM, item.ida_highlight)
def setData(self, mindex, value, role):
''' set the role data for the item at index to value
@param mindex: QModelIndex*
@param value: QVariant*
@param role: QtCore.Qt.EditRole
@retval True/False
'''
if not mindex.isValid():
return False
if role == QtCore.Qt.CheckStateRole and mindex.column() == CapaExplorerDataModel.COLUMN_INDEX_RULE_INFORMATION:
# user un/checked box - un/check parent and children
for cindex in self.iterateChildrenIndexFromRootIndex(mindex, ignore_root=False):
cindex.internalPointer().setChecked(value)
self._util_reset_ida_highlighting(cindex.internalPointer(), value)
self.dataChanged.emit(cindex, cindex)
return True
if role == QtCore.Qt.EditRole and value and \
mindex.column() == CapaExplorerDataModel.COLUMN_INDEX_RULE_INFORMATION and \
isinstance(mindex.internalPointer(), CapaExplorerFunctionItem):
# user renamed function - update IDA database and data model
old_name = mindex.internalPointer().info
new_name = str(value)
if idaapi.set_name(mindex.internalPointer().ea, new_name):
# success update IDA database - update data model
self.update_function_name(old_name, new_name)
return True
# no handle
return False
def rowCount(self, mindex):
''' get the number of rows under the given parent
when the parent index is valid, return the number of
children of that parent
@param mindex: QModelIndex*
@retval row count
'''
if mindex.column() > 0:
return 0
if not mindex.isValid():
item = self._root
else:
item = mindex.internalPointer()
return item.childCount()
def render_capa_results(self, rule_set, results):
''' populate data model with capa results
@param rule_set: TODO
@param results: TODO
'''
# prepare data model for changes
self.beginResetModel()
for (rule, ress) in results.items():
if rule_set.rules[rule].meta.get('lib', False):
# skip library rules
continue
# top level item is rule
parent = CapaExplorerRuleItem(self._root, rule, len(ress), rule_set.rules[rule].definition)
for (ea, res) in sorted(ress, key=lambda p: p[0]):
if rule_set.rules[rule].scope == capa.rules.FILE_SCOPE:
# file scope - parent is rule
parent2 = parent
elif rule_set.rules[rule].scope == capa.rules.FUNCTION_SCOPE:
parent2 = CapaExplorerFunctionItem(parent, idaapi.get_name(ea), ea)
elif rule_set.rules[rule].scope == capa.rules.BASIC_BLOCK_SCOPE:
parent2 = CapaExplorerBlockItem(parent, ea)
else:
# TODO: better way to notify a missed scope?
parent2 = CapaExplorerDefaultItem(parent, '', ea)
self._render_result(rule_set, res, parent2)
# reset data model after making changes
self.endResetModel()
def _render_result(self, rule_set, result, parent):
''' '''
if not result.success:
# TODO: display failed branches??
return
if isinstance(result.statement, capa.engine.Some):
if result.statement.count == 0:
if sum(map(lambda c: c.success, result.children)) > 0:
parent2 = CapaExplorerDefaultItem(parent, 'optional')
else:
parent2 = parent
else:
parent2 = CapaExplorerDefaultItem(parent, '%d or more' % result.statement.count)
elif not isinstance(result.statement, (capa.features.Feature, capa.engine.Element, capa.engine.Range, capa.engine.Regex)):
# when rendering a structural node (and/or/not), we only care about the node name.
'''
succs = list(filter(lambda c: bool(c), result.children))
if len(succs) == 1:
# skip structural node with single succeeding child
parent2 = parent
else:
parent2 = CapaExplorerDefaultItem(parent, result.statement.name.lower())
'''
parent2 = CapaExplorerDefaultItem(parent, result.statement.name.lower())
else:
# but when rendering a Feature, want to see any arguments to it
if len(result.locations) == 1:
# ea = result.locations.pop()
ea = next(iter(result.locations))
parent2 = self._render_feature(rule_set, parent, result.statement, ea, str(result.statement))
else:
parent2 = CapaExplorerDefaultItem(parent, str(result.statement))
for ea in sorted(result.locations):
self._render_feature(rule_set, parent2, result.statement, ea)
for child in result.children:
self._render_result(rule_set, child, parent2)
def _render_feature(self, rule_set, parent, feature, ea, name='-'):
''' render a given feature
@param rule_set: TODO
@param parent: TODO
@param result: TODO
@param ea: virtual address
@param name: TODO
'''
instruction_view = (
capa.features.Bytes,
capa.features.String,
capa.features.insn.API,
capa.features.insn.Mnemonic,
capa.features.insn.Number,
capa.features.insn.Offset
)
byte_view = (
capa.features.file.Section,
)
string_view = (
capa.engine.Regex,
)
if isinstance(feature, instruction_view):
return CapaExplorerInstructionViewItem(parent, name, ea)
if isinstance(feature, byte_view):
return CapaExplorerByteViewItem(parent, name, ea)
if isinstance(feature, string_view):
# TODO: move string collection to item constructor
if isinstance(feature, capa.engine.Regex):
return CapaExplorerStringViewItem(parent, name, ea, feature.match)
if isinstance(feature, capa.features.Characteristic):
# special rendering for characteristics
if feature.name in ('loop', 'recursive call', 'tight loop', 'switch'):
return CapaExplorerDefaultItem(parent, name)
if feature.name in ('embedded pe',):
return CapaExplorerByteViewItem(parent, name, ea)
return CapaExplorerInstructionViewItem(parent, name, ea)
if isinstance(feature, capa.features.MatchedRule):
# render feature as a rule item
return CapaExplorerRuleItem(parent, name, 0, rule_set.rules[feature.rule_name].definition)
if isinstance(feature, capa.engine.Range):
# render feature based upon type child
return self._render_feature(rule_set, parent, feature.child, ea, name)
# no handle, default to name and virtual address display
return CapaExplorerDefaultItem(parent, name, ea)
def update_function_name(self, old_name, new_name):
''' update all instances of function name
@param old_name: previous function name
@param new_name: new function name
'''
rmindex = self.index(0, 0, QtCore.QModelIndex())
# convert name to view format for matching
# TODO: handle this better
old_name = CapaExplorerFunctionItem.view_fmt % old_name
for mindex in self.match(rmindex, QtCore.Qt.DisplayRole, old_name, hits=-1, flags=QtCore.Qt.MatchRecursive):
if not isinstance(mindex.internalPointer(), CapaExplorerFunctionItem):
continue
mindex.internalPointer().info = new_name
self.dataChanged.emit(mindex, mindex)
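The iterative depth-first traversal in `iterateChildrenIndexFromRootIndex()` above can be demonstrated on a plain tree, independent of Qt model indexes (hypothetical `Node`; the visited-set bookkeeping is omitted since a tree has no cycles):

```python
from collections import deque

class Node(object):
    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)

def iter_descendants(root, ignore_root=True):
    # same shape as the model's traversal: an explicit stack, popped
    # from the right, optionally skipping the starting node
    stack = deque((root,))
    while stack:
        node = stack.pop()
        if not ignore_root or node is not root:
            yield node
        for child in node.children:
            stack.append(child)
```

Using an explicit stack instead of recursion keeps the traversal safe for deep result trees.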

@@ -0,0 +1,75 @@
from PyQt5 import QtCore
from capa.ida.explorer.model import CapaExplorerDataModel
class CapaExplorerSortFilterProxyModel(QtCore.QSortFilterProxyModel):
def __init__(self, parent=None):
''' '''
super(CapaExplorerSortFilterProxyModel, self).__init__(parent)
def lessThan(self, left, right):
''' return True if the value of the left item is less than the value of the right item
@param left: QModelIndex*
@param right: QModelIndex*
@retval True/False
'''
ldata = left.internalPointer().data(left.column())
rdata = right.internalPointer().data(right.column())
if ldata and rdata and left.column() == CapaExplorerDataModel.COLUMN_INDEX_VIRTUAL_ADDRESS and left.column() == right.column():
# convert virtual address before compare
return int(ldata, 16) < int(rdata, 16)
else:
# compare as lowercase
return ldata.lower() < rdata.lower()
def filterAcceptsRow(self, row, parent):
''' true if the item in the row indicated by the given row and parent
should be included in the model; otherwise returns false
@param row: int
@param parent: QModelIndex*
@retval True/False
'''
if self._filter_accepts_row_self(row, parent):
return True
alpha = parent
while alpha.isValid():
if self._filter_accepts_row_self(alpha.row(), alpha.parent()):
return True
alpha = alpha.parent()
if self._index_has_accepted_children(row, parent):
return True
return False
def add_single_string_filter(self, column, string):
''' add fixed string filter
@param column: key column
@param string: string to sort
'''
self.setFilterKeyColumn(column)
self.setFilterFixedString(string)
def _index_has_accepted_children(self, row, parent):
''' '''
mindex = self.sourceModel().index(row, 0, parent)
if mindex.isValid():
for idx in range(self.sourceModel().rowCount(mindex)):
if self._filter_accepts_row_self(idx, mindex):
return True
if self._index_has_accepted_children(idx, mindex):
return True
return False
def _filter_accepts_row_self(self, row, parent):
''' '''
return super(CapaExplorerSortFilterProxyModel, self).filterAcceptsRow(row, parent)
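`filterAcceptsRow()` above keeps a row when the row itself matches, any ancestor matches, or any descendant matches, so a matching rule keeps its whole subtree visible. A Qt-free sketch of that acceptance rule (hypothetical dict-based nodes and helper names):

```python
def row_matches(node, needle):
    # case-insensitive fixed-string match, like setFilterFixedString
    return needle.lower() in node["name"].lower()

def any_descendant_matches(node, needle):
    return any(
        row_matches(child, needle) or any_descendant_matches(child, needle)
        for child in node.get("children", [])
    )

def filter_accepts(node, needle, ancestors=()):
    # keep the row if it, an ancestor, or a descendant matches
    if row_matches(node, needle):
        return True
    if any(row_matches(a, needle) for a in ancestors):
        return True
    return any_descendant_matches(node, needle)
```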

capa/ida/explorer/view.py (new file, 281 lines)
@@ -0,0 +1,281 @@
from PyQt5 import QtWidgets, QtCore, QtGui
import idaapi
import idc
from capa.ida.explorer.model import CapaExplorerDataModel
from capa.ida.explorer.item import CapaExplorerFunctionItem
class CapaExplorerQtreeView(QtWidgets.QTreeView):
''' capa explorer QTreeView implementation
view controls UI action responses and displays data from
CapaExplorerDataModel
view does not modify CapaExplorerDataModel directly - data
modifications should be implemented in CapaExplorerDataModel
'''
def __init__(self, model, parent=None):
''' initialize CapaExplorerQTreeView
TODO
@param model: TODO
@param parent: TODO
'''
super(CapaExplorerQtreeView, self).__init__(parent)
self.setModel(model)
# TODO: get from parent??
self._model = model
self._parent = parent
# configure custom UI controls
self.setContextMenuPolicy(QtCore.Qt.CustomContextMenu)
self.setExpandsOnDoubleClick(False)
self.setSortingEnabled(True)
self._model.setDynamicSortFilter(False)
# configure view columns to auto-resize
for idx in range(CapaExplorerDataModel.COLUMN_COUNT):
self.header().setSectionResizeMode(idx, QtWidgets.QHeaderView.Interactive)
# connect slots to resize columns when expanded or collapsed
self.expanded.connect(self.resize_columns_to_content)
self.collapsed.connect(self.resize_columns_to_content)
# connect slots
self.customContextMenuRequested.connect(self._slot_custom_context_menu_requested)
self.doubleClicked.connect(self._slot_double_click)
# self.clicked.connect(self._slot_click)
self.setStyleSheet('QTreeView::item {padding-right: 15 px;padding-bottom: 2 px;}')
def reset(self):
''' reset user interface changes
called when view should reset any user interface changes
made since the last reset e.g. IDA window highlighting
'''
self.collapseAll()
self.resize_columns_to_content()
def resize_columns_to_content(self):
''' reset view columns to contents
TODO: prevent columns from shrinking
'''
self.header().resizeSections(QtWidgets.QHeaderView.ResizeToContents)
def _map_index_to_source_item(self, mindex):
''' map proxy model index to source model item
@param mindex: QModelIndex*
@retval QObject*
'''
return self._model.mapToSource(mindex).internalPointer()
def _send_data_to_clipboard(self, data):
''' copy data to the clipboard
@param data: data to be copied
'''
clip = QtWidgets.QApplication.clipboard()
clip.clear(mode=clip.Clipboard)
clip.setText(data, mode=clip.Clipboard)
def _new_action(self, display, data, slot):
''' create action for context menu
@param display: text displayed to user in context menu
@param data: data passed to slot
@param slot: slot to connect
@retval QAction*
'''
action = QtWidgets.QAction(display, self._parent)
action.setData(data)
action.triggered.connect(lambda checked: slot(action))
return action
def _load_default_context_menu_actions(self, data):
''' yield default custom context menu actions
@param data: tuple
@yield QAction*
'''
default_actions = [
('Copy column', data, self._slot_copy_column),
('Copy row', data, self._slot_copy_row),
# ('Filter', data, self._slot_filter),
]
# add default actions
for action in default_actions:
yield self._new_action(*action)
def _load_function_context_menu_actions(self, data):
''' yield actions specific to function custom context menu
@param data: tuple
@yield QAction*
'''
function_actions = [
('Rename function', data, self._slot_rename_function),
]
# add function actions
for action in function_actions:
yield self._new_action(*action)
# add default actions
for action in self._load_default_context_menu_actions(data):
yield action
def _load_default_context_menu(self, pos, item, mindex):
''' create default custom context menu
creates custom context menu containing default actions
@param pos: TODO
@param item: TODO
@param mindex: TODO
@retval QMenu*
'''
menu = QtWidgets.QMenu()
for action in self._load_default_context_menu_actions((pos, item, mindex)):
menu.addAction(action)
return menu
def _load_function_item_context_menu(self, pos, item, mindex):
''' create function custom context menu
creates custom context menu containing actions specific to functions
and the default actions
@param pos: TODO
@param item: TODO
@param mindex: TODO
@retval QMenu*
'''
menu = QtWidgets.QMenu()
for action in self._load_function_context_menu_actions((pos, item, mindex)):
menu.addAction(action)
return menu
def _show_custom_context_menu(self, menu, pos):
''' display custom context menu in view
@param menu: TODO
@param pos: TODO
'''
if not menu:
return
menu.exec_(self.viewport().mapToGlobal(pos))
def _slot_copy_column(self, action):
''' slot connected to custom context menu
allows user to select a column and copy the data
to clipboard
@param action: QAction*
'''
_, item, mindex = action.data()
self._send_data_to_clipboard(item.data(mindex.column()))
def _slot_copy_row(self, action):
''' slot connected to custom context menu
allows user to select a row and copy the space-delimited
data to clipboard
@param action: QAction*
'''
_, item, _ = action.data()
self._send_data_to_clipboard(str(item))
def _slot_rename_function(self, action):
''' slot connected to custom context menu
allows user to edit a function name and push
changes to IDA
@param action: QAction*
'''
_, item, mindex = action.data()
# make item temporary edit, reset after user is finished
item.setIsEditable(True)
self.edit(mindex)
item.setIsEditable(False)
def _slot_custom_context_menu_requested(self, pos):
''' slot connected to custom context menu request
displays custom context menu to user containing action
relevant to the data item selected
@param pos: TODO
'''
mindex = self.indexAt(pos)
if not mindex.isValid():
return
item = self._map_index_to_source_item(mindex)
column = mindex.column()
menu = None
if CapaExplorerDataModel.COLUMN_INDEX_RULE_INFORMATION == column and isinstance(item, CapaExplorerFunctionItem):
# user hovered function item
menu = self._load_function_item_context_menu(pos, item, mindex)
else:
# user hovered default item
menu = self._load_default_context_menu(pos, item, mindex)
# show custom context menu at view position
self._show_custom_context_menu(menu, pos)
def _slot_click(self):
''' slot connected to single click event '''
pass
def _slot_double_click(self, mindex):
''' slot connected to double click event
@param mindex: QModelIndex*
'''
if not mindex.isValid():
return
item = self._map_index_to_source_item(mindex)
column = mindex.column()
if CapaExplorerDataModel.COLUMN_INDEX_VIRTUAL_ADDRESS == column:
# user double-clicked virtual address column - navigate IDA to address
try:
idc.jumpto(int(item.data(1), 16))
except ValueError:
pass
if CapaExplorerDataModel.COLUMN_INDEX_RULE_INFORMATION == column:
# user double-clicked information column - un/expand
if self.isExpanded(mindex):
self.collapse(mindex)
else:
self.expand(mindex)

@@ -0,0 +1,19 @@
import idaapi
import idc
def get_disasm_line(va):
''' get disassembly line at the given virtual address '''
return idc.generate_disasm_line(va, idc.GENDSM_FORCE_CODE)
def is_func_start(ea):
''' check if a function starts at the given virtual address '''
f = idaapi.get_func(ea)
return f and f.start_ea == ea
def get_func_start_ea(ea):
''' get start address of the function containing ea, or None '''
f = idaapi.get_func(ea)
return f if f is None else f.start_ea

@@ -0,0 +1,459 @@
import os
import logging
import collections
from PyQt5.QtWidgets import (
QHeaderView,
QAbstractItemView,
QMenuBar,
QAction,
QTabWidget,
QWidget,
QTextEdit,
QMenu,
QApplication,
QVBoxLayout,
QToolTip,
QCheckBox,
QTableWidget,
QTableWidgetItem
)
from PyQt5.QtGui import QCursor, QIcon
from PyQt5.QtCore import Qt
import idaapi
import capa.main
import capa.rules
import capa.ida.helpers
import capa.ida.explorer.item
import capa.features.extractors.ida
from capa.ida.explorer.view import CapaExplorerQtreeView
from capa.ida.explorer.model import CapaExplorerDataModel
from capa.ida.explorer.proxy import CapaExplorerSortFilterProxyModel
PLUGIN_NAME = 'capaex'
logger = logging.getLogger(PLUGIN_NAME)
class CapaExplorerIdaHooks(idaapi.UI_Hooks):
def __init__(self, screen_ea_changed_hook, action_hooks):
''' facilitate IDA UI hooks
@param screen_ea_changed_hook: callback invoked when the screen ea changes
@param action_hooks: mapping of IDA action name to callback
'''
super(CapaExplorerIdaHooks, self).__init__()
self._screen_ea_changed_hook = screen_ea_changed_hook
self._process_action_hooks = action_hooks
self._process_action_handle = None
self._process_action_meta = {}
def preprocess_action(self, name):
''' called prior to action completed
@param name: name of action defined by idagui.cfg
@retval must be 0
'''
self._process_action_handle = self._process_action_hooks.get(name, None)
if self._process_action_handle:
self._process_action_handle(self._process_action_meta)
# must return 0 for IDA
return 0
def postprocess_action(self):
''' called after action completed '''
if not self._process_action_handle:
return
self._process_action_handle(self._process_action_meta, post=True)
self._reset()
def screen_ea_changed(self, curr_ea, prev_ea):
''' called after screen ea is changed
@param curr_ea: current ea
@param prev_ea: prev ea
'''
self._screen_ea_changed_hook(idaapi.get_current_widget(), curr_ea, prev_ea)
def _reset(self):
''' reset internal state '''
self._process_action_handle = None
self._process_action_meta.clear()
class CapaExplorerForm(idaapi.PluginForm):
def __init__(self):
''' initialize plugin form '''
super(CapaExplorerForm, self).__init__()
self.form_title = PLUGIN_NAME
self.parent = None
self._file_loc = __file__
self._ida_hooks = None
# models
self._model_data = None
self._model_proxy = None
# user interface elements
self._view_checkbox_limit_by = None
self._view_tree = None
self._view_summary = None
self._view_tabs = None
self._view_menu_bar = None
def OnCreate(self, form):
''' called when the plugin form is created '''
self.parent = self.FormToPyQtWidget(form)
self._load_interface()
self._load_capa_results()
self._load_ida_hooks()
self._view_tree.reset()
logger.info('form created.')
def Show(self):
''' show the plugin form '''
return idaapi.PluginForm.Show(self, self.form_title, options=(
idaapi.PluginForm.WOPN_TAB | idaapi.PluginForm.WCLS_CLOSE_LATER
))
def OnClose(self, form):
''' form is closed '''
self._unload_ida_hooks()
self._ida_reset()
logger.info('form closed.')
def _load_interface(self):
''' load user interface '''
# load models
self._model_data = CapaExplorerDataModel()
self._model_proxy = CapaExplorerSortFilterProxyModel()
self._model_proxy.setSourceModel(self._model_data)
# load tree
self._view_tree = CapaExplorerQtreeView(self._model_proxy, self.parent)
# load summary table
self._load_view_summary()
# load parent tab and children tab views
self._load_view_tabs()
self._load_view_checkbox_limit_by()
self._load_view_summary_tab()
self._load_view_tree_tab()
# load menu bar and sub menus
self._load_view_menu_bar()
self._load_file_menu()
# load parent view
self._load_view_parent()
def _load_view_tabs(self):
''' load tab widget '''
tabs = QTabWidget()
self._view_tabs = tabs
def _load_view_menu_bar(self):
''' load menu bar '''
bar = QMenuBar()
# bar.hovered.connect(self._slot_menu_bar_hovered)
self._view_menu_bar = bar
def _load_view_summary(self):
''' load results summary table '''
table = QTableWidget()
table.setColumnCount(4)
table.verticalHeader().setVisible(False)
table.setSortingEnabled(False)
table.setEditTriggers(QAbstractItemView.NoEditTriggers)
table.setFocusPolicy(Qt.NoFocus)
table.setSelectionMode(QAbstractItemView.NoSelection)
table.setHorizontalHeaderLabels([
'Objectives',
'Behaviors',
'Techniques',
'Rule Hits'
])
table.horizontalHeader().setDefaultAlignment(Qt.AlignLeft)
table.setStyleSheet('QTableWidget::item { border: none; padding: 15px; }')
table.setShowGrid(False)
self._view_summary = table
def _load_view_checkbox_limit_by(self):
''' load checkbox to limit results to the current function '''
check = QCheckBox('Limit results to current function')
check.setChecked(False)
check.stateChanged.connect(self._slot_checkbox_limit_by_changed)
self._view_checkbox_limit_by = check
def _load_view_parent(self):
''' load view parent '''
layout = QVBoxLayout()
layout.addWidget(self._view_tabs)
layout.setMenuBar(self._view_menu_bar)
self.parent.setLayout(layout)
def _load_view_tree_tab(self):
''' load view tree tab '''
layout = QVBoxLayout()
layout.addWidget(self._view_checkbox_limit_by)
layout.addWidget(self._view_tree)
tab = QWidget()
tab.setLayout(layout)
self._view_tabs.addTab(tab, 'Tree View')
def _load_view_summary_tab(self):
''' load summary tab '''
layout = QVBoxLayout()
layout.addWidget(self._view_summary)
tab = QWidget()
tab.setLayout(layout)
self._view_tabs.addTab(tab, 'Summary')
def _load_file_menu(self):
''' load file menu actions '''
actions = (
('Reset view', 'Reset plugin view', self.reset),
('Run analysis', 'Run capa analysis on current database', self.reload),
)
menu = self._view_menu_bar.addMenu('File')
for name, _, handle in actions:
action = QAction(name, self.parent)
action.triggered.connect(handle)
# action.setToolTip(tip)
menu.addAction(action)
def _load_ida_hooks(self):
''' install IDA UI hooks '''
action_hooks = {
'MakeName': self._ida_hook_rename,
'EditFunction': self._ida_hook_rename,
}
self._ida_hooks = CapaExplorerIdaHooks(self._ida_hook_screen_ea_changed, action_hooks)
self._ida_hooks.hook()
def _unload_ida_hooks(self):
''' unhook IDA user interface '''
if self._ida_hooks:
self._ida_hooks.unhook()
def _ida_hook_rename(self, meta, post=False):
''' hook for IDA rename action
called twice, once before action and once after
action completes
@param meta: dict used to persist state between the pre and post callbacks
@param post: False before the action runs, True after it completes
'''
ea = idaapi.get_screen_ea()
if not ea or not capa.ida.helpers.is_func_start(ea):
return
curr_name = idaapi.get_name(ea)
if post:
# post action update data model w/ current name
self._model_data.update_function_name(meta.get('prev_name', ''), curr_name)
else:
# pre action so save current name for replacement later
meta['prev_name'] = curr_name
def _ida_hook_screen_ea_changed(self, widget, new_ea, old_ea):
''' hook for screen ea changed; update function filter when limit-by checkbox is checked '''
if not self._view_checkbox_limit_by.isChecked():
# ignore if checkbox not selected
return
if idaapi.get_widget_type(widget) != idaapi.BWN_DISASM:
# ignore views other than asm
return
# attempt to map virtual addresses to function start addresses
new_func_start = capa.ida.helpers.get_func_start_ea(new_ea)
old_func_start = capa.ida.helpers.get_func_start_ea(old_ea)
if new_func_start and new_func_start == old_func_start:
# navigated within the same function - do nothing
return
if new_func_start:
# navigated to new function - filter for function start virtual address
match = capa.ida.explorer.item.ea_to_hex_str(new_func_start)
else:
# navigated to virtual address not in valid function - clear filter
match = ''
# filter on virtual address to avoid updating filter string if function name is changed
self._model_proxy.add_single_string_filter(CapaExplorerDataModel.COLUMN_INDEX_VIRTUAL_ADDRESS, match)
def _load_capa_results(self):
''' run capa analysis against the current database and render the results '''
logger.info('-' * 80)
logger.info(' Using default embedded rules.')
logger.info(' ')
logger.info(' You can see the current default rule set here:')
logger.info(' https://github.com/fireeye/capa-rules')
logger.info('-' * 80)
rules_path = os.path.join(os.path.dirname(self._file_loc), '../..', 'rules')
rules = capa.main.get_rules(rules_path)
rules = capa.rules.RuleSet(rules)
results = capa.main.find_capabilities(rules, capa.features.extractors.ida.IdaFeatureExtractor(), True)
logger.info('analysis completed.')
self._model_data.render_capa_results(rules, results)
self._render_capa_summary(rules, results)
logger.info('render views completed.')
def _render_capa_summary(self, ruleset, results):
''' render results summary table
keep sync with capa.main
@param ruleset: capa.rules.RuleSet used for analysis
@param results: mapping of rule name to match results
'''
rules = set(filter(lambda x: not ruleset.rules[x].meta.get('lib', False), results.keys()))
objectives = set()
behaviors = set()
techniques = set()
for rule in rules:
parts = ruleset.rules[rule].meta.get(capa.main.RULE_CATEGORY, '').split('/')
if len(parts) == 0 or list(parts) == ['']:
continue
if len(parts) > 0:
objective = parts[0].replace('-', ' ')
objectives.add(objective)
if len(parts) > 1:
behavior = parts[1].replace('-', ' ')
behaviors.add(behavior)
if len(parts) > 2:
technique = parts[2].replace('-', ' ')
techniques.add(technique)
if len(parts) > 3:
raise capa.rules.InvalidRule(capa.main.RULE_CATEGORY + " tag must have at most three components")
# set row count to max set size
self._view_summary.setRowCount(max(map(len, (rules, objectives, behaviors, techniques))))
# format rule hits
rules = map(lambda x: '%s (%d)' % (x, len(results[x])), rules)
# sort results
columns = list(map(lambda x: sorted(x, key=lambda s: s.lower()), (objectives, behaviors, techniques, rules)))
# load results into table by column
for idx, column in enumerate(columns):
self._load_view_summary_column(idx, column)
# resize columns to content
self._view_summary.resizeColumnsToContents()
def _load_view_summary_column(self, column, texts):
''' load texts into the given summary table column '''
for row, text in enumerate(texts):
self._view_summary.setItem(row, column, QTableWidgetItem(text))
def _ida_reset(self):
''' reset IDA user interface '''
self._model_data.reset()
self._view_tree.reset()
self._view_checkbox_limit_by.setChecked(False)
def reload(self):
''' reload views and re-run capa analysis '''
self._ida_reset()
self._model_proxy.invalidate()
self._model_data.clear()
self._view_summary.setRowCount(0)
self._load_capa_results()
logger.info('reload complete.')
idaapi.info('%s reload completed.' % PLUGIN_NAME)
def reset(self):
''' reset user interface elements
e.g. checkboxes and IDA highlighting
'''
self._ida_reset()
logger.info('reset completed.')
idaapi.info('%s reset completed.' % PLUGIN_NAME)
def _slot_menu_bar_hovered(self, action):
''' display menu action tooltip
@param action: QAction*
@reference: https://stackoverflow.com/questions/21725119/why-wont-qtooltips-appear-on-qactions-within-a-qmenu
'''
QToolTip.showText(QCursor.pos(), action.toolTip(), self._view_menu_bar, self._view_menu_bar.actionGeometry(action))
def _slot_checkbox_limit_by_changed(self):
''' slot activated if checkbox clicked
if checked, configure function filter if screen ea is located
in function, otherwise clear filter
'''
match = ''
if self._view_checkbox_limit_by.isChecked():
ea = capa.ida.helpers.get_func_start_ea(idaapi.get_screen_ea())
if ea:
match = capa.ida.explorer.item.ea_to_hex_str(ea)
self._model_proxy.add_single_string_filter(CapaExplorerDataModel.COLUMN_INDEX_VIRTUAL_ADDRESS, match)
self._view_tree.resize_columns_to_content()
def main():
''' plugin entry point; TODO: move to idaapi.plugin_t class '''
logging.basicConfig(level=logging.INFO)
global CAPA_EXPLORER_FORM
try:
# there is an instance, reload it
CAPA_EXPLORER_FORM
CAPA_EXPLORER_FORM.Close()
CAPA_EXPLORER_FORM = CapaExplorerForm()
except Exception:
# there is no instance yet
CAPA_EXPLORER_FORM = CapaExplorerForm()
CAPA_EXPLORER_FORM.Show()
if __name__ == '__main__':
main()
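The summary table above buckets each matched rule by its `rule-category` meta tag, of the form `$objective[/$behavior[/$technique]]`. That parsing can be sketched outside IDA; `parse_rule_category` is a hypothetical helper for illustration, not part of capa:

```python
def parse_rule_category(tag):
    ''' split a rule-category tag into (objective, behavior, technique), hyphens as spaces '''
    parts = tag.split('/')
    if parts == ['']:
        # rule has no rule-category meta
        return (None, None, None)
    if len(parts) > 3:
        raise ValueError('rule-category tag must have at most three components')
    parts = [p.replace('-', ' ') for p in parts]
    # pad missing trailing components with None
    return tuple(parts + [None] * (3 - len(parts)))

print(parse_rule_category('host-interaction/process/create'))
# -> ('host interaction', 'process', 'create')
```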


@@ -0,0 +1,284 @@
# TODO documentation
import logging
import binascii
import textwrap
from collections import Counter, defaultdict
from PyQt5 import QtWidgets, QtCore
from PyQt5.QtWidgets import QTreeWidget, QTreeWidgetItem, QTextEdit, QHeaderView
import idc
import idaapi
import capa
import capa.main
from capa.ida import plugin_helpers
import capa.features.extractors.ida.helpers
logger = logging.getLogger('rulegen')
AUTHOR_NAME = ''
COLOR_HIGHLIGHT = 0xD096FF
def get_func_start(ea):
f = idaapi.get_func(ea)
if f:
return f.start_ea
else:
return None
class Hooks(idaapi.UI_Hooks):
'''
Notifies the plugin when navigating to another function
NOTE: it uses the global variable RULE_GEN_FORM to access the
PluginForm object. This looks nasty, maybe there is a better way?
'''
def screen_ea_changed(self, ea, prev_ea):
widget = idaapi.get_current_widget()
if idaapi.get_widget_type(widget) != idaapi.BWN_DISASM:
# Ignore non disassembly views
return
try:
f1 = get_func_start(ea)
f2 = get_func_start(prev_ea)
if f1 != f2:
# changed to another function
RULE_GEN_FORM.reload_features_tree()
except Exception as e:
logger.warning('exception: %s', e)
class RuleGeneratorForm(idaapi.PluginForm):
def __init__(self):
super(RuleGeneratorForm, self).__init__()
self.title = 'capa rule generator'
self.parent = None
self.parent_items = {}
self.orig_colors = None
self.hooks = Hooks() # dirty?
if self.hooks.hook():
logger.info('UI notification hook installed successfully')
def init_ui(self):
self.tree = QTreeWidget()
self.rule_text = QTextEdit()
self.rule_text.setMinimumWidth(350)
self.reload_features_tree()
button_reset = QtWidgets.QPushButton('&Reset')
button_reset.clicked.connect(self.reset)
h_layout = QtWidgets.QHBoxLayout()
v_layout = QtWidgets.QVBoxLayout()
h_layout.addWidget(self.tree)
h_layout.addWidget(self.rule_text)
v_layout.addLayout(h_layout)
v_layout.addWidget(button_reset)
self.parent.setLayout(v_layout)
def reset(self):
plugin_helpers.reset_selection(self.tree)
plugin_helpers.reset_colors(self.orig_colors)
self.rule_text.setText('')
def reload_features_tree(self):
self.reset()
self.tree.clear()
self.orig_colors = None
self.parent_items = {}
features = self.get_features()
if not features:
return
feature_vas = set().union(*features.values())
self.orig_colors = plugin_helpers.get_orig_color_feature_vas(feature_vas)
self.create_tree(features)
self.tree.update()
def get_features(self):
# load like standalone tool
extractor = capa.features.extractors.ida.IdaFeatureExtractor()
f = idaapi.get_func(idaapi.get_screen_ea())
if not f:
logger.info('function does not exist at 0x%x', idaapi.get_screen_ea())
return
return self.extract_function_features(f)
def extract_function_features(self, f):
features = defaultdict(set)
for bb in idaapi.FlowChart(f, flags=idaapi.FC_PREDS):
for insn in capa.features.extractors.ida.helpers.get_instructions_in_range(bb.start_ea, bb.end_ea):
for feature, va in capa.features.extractors.ida.insn.extract_features(f, bb, insn):
features[feature].add(va)
for feature, va in capa.features.extractors.ida.basicblock.extract_features(f, bb):
features[feature].add(va)
return features
def create_tree(self, features):
self.tree.setMinimumWidth(400)
# self.tree.setMinimumHeight(300)
self.tree.setHeaderLabels(['Feature', 'Virtual Address', 'Disassembly'])
# auto resize columns
self.tree.header().setSectionResizeMode(QHeaderView.ResizeToContents)
self.tree.itemClicked.connect(self.on_item_clicked)
# features sorted by location of first occurrence
# TODO fix characteristic features display and rule text
for feature, vas in sorted(features.items(), key=lambda k: sorted(k[1])):
# level 0
if type(feature) not in self.parent_items:
self.parent_items[type(feature)] = plugin_helpers.add_child_item(self.tree, [feature.name.lower()])
# level 1
if feature not in self.parent_items:
self.parent_items[feature] = plugin_helpers.add_child_item(self.parent_items[type(feature)], [str(feature)])
# level n > 1
if len(vas) > 1:
for va in sorted(vas):
plugin_helpers.add_child_item(self.parent_items[feature], [str(feature), '0x%X' % va, plugin_helpers.get_disasm_line(va)], feature)
else:
va = vas.pop()
self.parent_items[feature].setText(0, str(feature))
self.parent_items[feature].setText(1, '0x%X' % va)
self.parent_items[feature].setText(2, plugin_helpers.get_disasm_line(va))
self.parent_items[feature].setData(0, 0x100, feature)
# @QtCore.pyqtSlot(QTreeWidgetItem, int)
def on_item_clicked(self, it, col):
# logger.debug('clicked %s, %s, %s', it, col, it.text(col))
# jump to address
if col == 1 and it.text(col):
va = int(it.text(col), 0x10)
if va:
idc.jumpto(va)
# highlight in disassembly
plugin_helpers.reset_colors(self.orig_colors)
selected = self.get_selected_items()
for va in selected.keys():
idc.set_color(va, idc.CIC_ITEM, COLOR_HIGHLIGHT)
self.update_rule_text()
def update_rule_text(self):
features = self.get_selected_items().values()
rule = self.get_rule_from_features(features)
self.rule_text.setText(rule)
def get_rule_from_features(self, features):
rule_parts = []
# map each feature to the number of times it occurred
counted = Counter(features).items()
# single features
for k, v in filter(lambda t: t[1] == 1, counted):
# TODO args to hex if int
if k.name.lower() == 'bytes':
# Convert raw bytes to uppercase hex representation (e.g., '12 34 56')
upper_hex_bytes = binascii.hexlify(args_to_str(k.args)).upper()
rule_value_str = ' '.join(upper_hex_bytes[i:i + 2] for i in range(0, len(upper_hex_bytes), 2))
r = ' - %s: %s' % (k.name.lower(), rule_value_str)
else:
r = ' - %s: %s' % (k.name.lower(), args_to_str(k.args))
rule_parts.append(r)
# counted features
for k, v in filter(lambda t: t[1] > 1, counted):
r = ' - count(%s): %d' % (str(k), v)
rule_parts.append(r)
rule_prefix = textwrap.dedent('''
rule:
meta:
name:
author: %s
scope: function
examples:
- %s:0x%X
features:
''' % (AUTHOR_NAME, idc.retrieve_input_file_md5(), get_func_start(idc.here()))).strip()
return '%s\n%s' % (rule_prefix, '\n'.join(sorted(rule_parts)))
# TODO merge into capa_idautils, get feature data
def get_selected_items(self):
selected = {}
iterator = QtWidgets.QTreeWidgetItemIterator(self.tree, QtWidgets.QTreeWidgetItemIterator.Checked)
while iterator.value():
item = iterator.value()
if item.text(1):
# logger.debug('selected %s, %s, %s', item.text(1), item.text(0), item.data(0, 0x100))
selected[int(item.text(1), 0x10)] = item.data(0, 0x100)
iterator += 1
return selected
# ----------------------------------------------------------
# IDA Plugin API
# ----------------------------------------------------------
def OnCreate(self, form):
self.parent = self.FormToPyQtWidget(form)
self.init_ui()
def Show(self):
return idaapi.PluginForm.Show(self, self.title, options=(
idaapi.PluginForm.WOPN_RESTORE
| idaapi.PluginForm.WOPN_PERSIST
))
def OnClose(self, form):
self.reset()
if self.hooks.unhook():
logger.info('UI notification hook uninstalled successfully')
logger.info('RuleGeneratorForm closed')
def args_to_str(args):
a = []
for arg in args:
if (isinstance(arg, int) or isinstance(arg, long)) and arg > 10:
a.append('0x%X' % arg)
else:
a.append(str(arg))
return ','.join(a)
def main():
logging.basicConfig(level=logging.INFO)
global RULE_GEN_FORM
try:
# there is an instance, reload it
RULE_GEN_FORM
RULE_GEN_FORM.Close()
RULE_GEN_FORM = RuleGeneratorForm()
except Exception:
# there is no instance yet
RULE_GEN_FORM = RuleGeneratorForm()
RULE_GEN_FORM.Show()
if __name__ == '__main__':
main()
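get_rule_from_features above splits features into single occurrences and repeated ones (rendered as `count(...)`) via `collections.Counter`. A minimal sketch of that split, with plain strings standing in for capa feature objects; `split_features` is a hypothetical name used only here:

```python
from collections import Counter

def split_features(features):
    ''' partition features into sorted single occurrences and (feature, count) repeats '''
    counted = Counter(features).items()
    singles = sorted(k for k, v in counted if v == 1)
    repeats = sorted((k, v) for k, v in counted if v > 1)
    return singles, repeats

features = ['api: CreateFile', 'mnemonic: xor', 'mnemonic: xor', 'string: ACR > ']
print(split_features(features))
# -> (['api: CreateFile', 'string: ACR > '], [('mnemonic: xor', 2)])
```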


@@ -0,0 +1,93 @@
import os
import logging
from PyQt5.QtWidgets import QTreeWidgetItem, QTreeWidgetItemIterator
from PyQt5.QtCore import Qt
import idc
import idaapi
CAPA_EXTENSION = '.capas'
logger = logging.getLogger('capa_ida')
def get_input_file(freeze=True):
'''
get input file path
freeze (bool): if True, get freeze file if it exists
'''
# try original file in same directory as idb/i64 without idb/i64 file extension
input_file = idc.get_idb_path()[:-4]
if freeze:
# use frozen file if it exists
freeze_file_cand = '%s%s' % (input_file, CAPA_EXTENSION)
if os.path.isfile(freeze_file_cand):
return freeze_file_cand
if not os.path.isfile(input_file):
# TM naming
input_file = '%s.mal_' % idc.get_idb_path()[:-4]
if not os.path.isfile(input_file):
input_file = idaapi.ask_file(0, '*.*', 'Please specify input file.')
if not input_file:
raise ValueError('could not find input file')
return input_file
def get_orig_color_feature_vas(vas):
orig_colors = {}
for va in vas:
orig_colors[va] = idc.get_color(va, idc.CIC_ITEM)
return orig_colors
def reset_colors(orig_colors):
if orig_colors:
for va, color in orig_colors.iteritems():
idc.set_color(va, idc.CIC_ITEM, color)
def reset_selection(tree):
iterator = QTreeWidgetItemIterator(tree, QTreeWidgetItemIterator.Checked)
while iterator.value():
item = iterator.value()
item.setCheckState(0, Qt.Unchecked) # column, state
iterator += 1
def get_disasm_line(va):
return idc.generate_disasm_line(va, idc.GENDSM_FORCE_CODE)
def get_selected_items(tree, skip_level_1=False):
selected = []
iterator = QTreeWidgetItemIterator(tree, QTreeWidgetItemIterator.Checked)
while iterator.value():
item = iterator.value()
if skip_level_1:
# hacky way to check if item is at level 1, if so, skip
# alternative, check if text in disasm column
if item.parent() and item.parent().parent() is None:
iterator += 1
continue
if item.text(1):
# logger.debug('selected %s, %s', item.text(0), item.text(1))
selected.append(int(item.text(1), 0x10))
iterator += 1
return selected
def add_child_item(parent, values, feature=None):
child = QTreeWidgetItem(parent)
child.setFlags(child.flags() | Qt.ItemIsTristate | Qt.ItemIsUserCheckable)
for i, v in enumerate(values):
child.setText(i, v)
if feature:
child.setData(0, 0x100, feature)
child.setCheckState(0, Qt.Unchecked)
return child
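get_input_file above probes several candidate paths derived from the IDB path. The candidate ordering can be factored out of IDA for testing; this is a sketch, and `candidate_input_files` is a hypothetical helper mirroring the plugin's `.capas` freeze-file and `.mal_` naming conventions:

```python
def candidate_input_files(idb_path, freeze=True):
    ''' return candidate input file paths, in the order get_input_file probes them '''
    base = idb_path[:-4]  # strip '.idb' / '.i64'
    cands = []
    if freeze:
        # frozen feature file takes priority if present
        cands.append(base + '.capas')
    cands.append(base)            # original file next to the idb
    cands.append(base + '.mal_')  # TM naming convention
    return cands

print(candidate_input_files('/samples/suspicious.idb'))
# -> ['/samples/suspicious.capas', '/samples/suspicious', '/samples/suspicious.mal_']
```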

777 capa/main.py Normal file

@@ -0,0 +1,777 @@
#!/usr/bin/env python2
'''
capa - detect capabilities in programs.
'''
import os
import os.path
import sys
import logging
import collections
import tqdm
import argparse
import capa.rules
import capa.engine
import capa.features
import capa.features.freeze
import capa.features.extractors
from capa.helpers import oint
SUPPORTED_FILE_MAGIC = set(['MZ'])
logger = logging.getLogger('capa')
def set_vivisect_log_level(level):
logging.getLogger('vivisect').setLevel(level)
logging.getLogger('vtrace').setLevel(level)
logging.getLogger('envi').setLevel(level)
def find_function_capabilities(ruleset, extractor, f):
# contains features from:
# - insns
# - function
function_features = collections.defaultdict(set)
bb_matches = collections.defaultdict(list)
for feature, va in extractor.extract_function_features(f):
function_features[feature].add(va)
for bb in extractor.get_basic_blocks(f):
# contains features from:
# - insns
# - basic blocks
bb_features = collections.defaultdict(set)
for feature, va in extractor.extract_basic_block_features(f, bb):
bb_features[feature].add(va)
for insn in extractor.get_instructions(f, bb):
for feature, va in extractor.extract_insn_features(f, bb, insn):
bb_features[feature].add(va)
function_features[feature].add(va)
_, matches = capa.engine.match(ruleset.basic_block_rules, bb_features, oint(bb))
for rule_name, res in matches.items():
bb_matches[rule_name].extend(res)
for va, _ in res:
function_features[capa.features.MatchedRule(rule_name)].add(va)
_, function_matches = capa.engine.match(ruleset.function_rules, function_features, oint(f))
return function_matches, bb_matches
def find_file_capabilities(ruleset, extractor, function_features):
file_features = collections.defaultdict(set)
for feature, va in extractor.extract_file_features():
# not all file features may have virtual addresses.
# if not, then at least ensure the feature shows up in the index.
# the set of addresses will still be empty.
if va:
file_features[feature].add(va)
else:
if feature not in file_features:
file_features[feature] = set()
logger.info('analyzed file and extracted %d features', len(file_features))
file_features.update(function_features)
_, matches = capa.engine.match(ruleset.file_rules, file_features, 0x0)
return matches
def find_capabilities(ruleset, extractor, disable_progress=None):
all_function_matches = collections.defaultdict(list)
all_bb_matches = collections.defaultdict(list)
for f in tqdm.tqdm(extractor.get_functions(), disable=disable_progress, unit=' functions'):
function_matches, bb_matches = find_function_capabilities(ruleset, extractor, f)
for rule_name, res in function_matches.items():
all_function_matches[rule_name].extend(res)
for rule_name, res in bb_matches.items():
all_bb_matches[rule_name].extend(res)
# mapping from matched rule feature to set of addresses at which it matched.
# type: Dict[MatchedRule, Set[int]]
function_features = {capa.features.MatchedRule(rule_name): set(map(lambda p: p[0], results))
for rule_name, results in all_function_matches.items()}
all_file_matches = find_file_capabilities(ruleset, extractor, function_features)
matches = {}
matches.update(all_bb_matches)
matches.update(all_function_matches)
matches.update(all_file_matches)
return matches
def pluck_meta(rules, key):
for rule in rules:
value = rule.meta.get(key)
if value:
yield value
def get_dispositions(matched_rules):
for disposition in pluck_meta(matched_rules, 'maec/analysis-conclusion'):
yield disposition
for disposition in pluck_meta(matched_rules, 'maec/analysis-conclusion-ov'):
yield disposition
def get_roles(matched_rules):
for role in pluck_meta(matched_rules, 'maec/malware-category'):
yield role
for role in pluck_meta(matched_rules, 'maec/malware-category-ov'):
yield role
RULE_CATEGORY = 'rule-category'
def is_other_feature_rule(rule):
'''
does this rule *not* have any of:
- maec/malware-category
- maec/analysis-conclusion
- rule-category
if so, it will be placed into the "other features" bucket
'''
if rule.meta.get('lib', False):
return False
for meta in ('maec/analysis-conclusion',
'maec/analysis-conclusion-ov',
'maec/malware-category',
'maec/malware-category-ov',
RULE_CATEGORY):
if meta in rule.meta:
return False
return True
def render_capabilities_default(ruleset, results):
rules = [ruleset.rules[rule_name] for rule_name in results.keys()]
# we render the highest level conclusions first:
#
# 1. is it malware?
# 2. what is the role? (dropper, backdoor, etc.)
#
# after this, we'll enumerate the specific objectives, behaviors, and techniques.
dispositions = list(sorted(get_dispositions(rules)))
if dispositions:
print('disposition: ' + ', '.join(dispositions))
categories = list(sorted(get_roles(rules)))
if categories:
print('role: ' + ', '.join(categories))
# rules may have a meta tag `rule-category` that specifies:
#
# rule-category: $objective[/$behavior[/$technique]]
#
# this classification describes a tree of increasingly specific conclusions.
# the tree allows us to tie a high-level conclusion, e.g. an objective, to
# the evidence of this - the behaviors, techniques, rules, and ultimately, features.
# this data structure is a nested map:
#
# objective name -> behavior name -> technique name -> rule name -> rule
#
# at each level, a matched rule is also legal.
# this indicates that only a portion of the rule-category was provided.
o = collections.defaultdict(
lambda: collections.defaultdict(
lambda: collections.defaultdict(
dict
)
)
)
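# for example, a rule named "create service" tagged with
#   rule-category: persistence/service/create-service
# is stored as:
#   o['persistence']['service']['create service']['create service'] = rule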
objectives = set()
behaviors = set()
techniques = set()
for rule in rules:
objective = None
behavior = None
technique = None
parts = rule.meta.get(RULE_CATEGORY, '').split('/')
if len(parts) == 0 or list(parts) == ['']:
continue
if len(parts) > 0:
objective = parts[0].replace('-', ' ')
objectives.add(objective)
if len(parts) > 1:
behavior = parts[1].replace('-', ' ')
behaviors.add(behavior)
if len(parts) > 2:
technique = parts[2].replace('-', ' ')
techniques.add(technique)
if len(parts) > 3:
raise capa.rules.InvalidRule(RULE_CATEGORY + " tag must have at most three components")
if technique:
o[objective][behavior][technique][rule.name] = rule
elif behavior:
o[objective][behavior][rule.name] = rule
elif objective:
o[objective][rule.name] = rule
if objectives:
print('\nobjectives:')
for objective in sorted(objectives):
print(' ' + objective)
if behaviors:
print('\nbehaviors:')
for behavior in sorted(behaviors):
print(' ' + behavior)
if techniques:
print('\ntechniques:')
for technique in sorted(techniques):
print(' ' + technique)
other_features = list(filter(is_other_feature_rule, rules))
if other_features:
print('\nother features:')
for rule in sorted(map(lambda r: r.name, other_features)):
print(' ' + rule)
# now, render a tree of the objectives, behaviors, techniques, and matched rule names.
# it will look something like:
#
# details:
# load data
# load data from self
# load data from resource
# extract resource via API
#
# implementation note:
# when we enumerate the items in this tree, we have two cases:
#
# 1. usually, we'll get a pair (objective name, map of children); but it's possible that
# 2. we'll get a pair (rule name, rule instance)
#
# this is why we do the `isinstance(..., Rule)` check below.
#
# i believe the alternative, to have separate data structures for the tree and rules,
# is probably more code and more confusing.
if o:
print('\ndetails:')
for objective, behaviors in o.items():
print(' ' + objective)
if isinstance(behaviors, capa.rules.Rule):
continue
for behavior, techniques in behaviors.items():
print(' ' + behavior)
if isinstance(techniques, capa.rules.Rule):
continue
for technique, rules in techniques.items():
print(' ' + technique)
if isinstance(rules, capa.rules.Rule):
continue
for rule in rules.keys():
print(' ' + rule)
def render_capabilities_concise(results):
'''
print the matching rules, newline separated.
example:
foo
bar
mimikatz::kull_m_arc_sendrecv
'''
for rule in sorted(results.keys()):
print(rule)
def render_capabilities_verbose(results):
'''
print the matching rules, and the functions in which they matched.
example:
foo:
- 0x401000
- 0x401005
bar:
- 0x402044
- 0x402076
mimikatz::kull_m_arc_sendrecv:
- 0x40105d
'''
for rule, ress in results.items():
print('%s:' % (rule))
seen = set([])
for (fva, _) in sorted(ress, key=lambda p: p[0]):
if fva in seen:
continue
print(' - 0x%x' % (fva))
seen.add(fva)
def render_result(res, indent=''):
'''
render the given Result to stdout.
args:
res (capa.engine.Result)
indent (str)
'''
# prune failing branches
if not res.success:
return
if isinstance(res.statement, capa.engine.Some):
if res.statement.count == 0:
# we asked for optional, so we'll match even if no children matched.
# but in this case, it's not worth rendering the optional node.
if sum(map(lambda c: c.success, res.children)) > 0:
print('%soptional:' % indent)
else:
print("%s%d or more" % (indent, res.statement.count))
elif not isinstance(res.statement, (capa.features.Feature, capa.engine.Element, capa.engine.Range, capa.engine.Regex)):
# when rendering a structural node (and/or/not),
# then we only care about the node name.
#
# for example:
#
# and:
# Number(0x3136b0): True
# Number(0x3136b0): True
print('%s%s:' % (indent, res.statement.name.lower()))
else:
# but when rendering a Feature, we want to see any arguments to it
#
# for example:
#
# Number(0x3136b0): True
print('%s%s:' % (indent, res.statement))
for location in sorted(res.locations):
print('%s - virtual address: 0x%x' % (indent, location))
for children in res.children:
render_result(children, indent=indent + ' ')
def render_capabilities_vverbose(results):
'''
print the matching rules, the functions in which they matched,
and the logic tree with annotated matching features.
example:
function mimikatz::kull_m_arc_sendrecv:
- 0x40105d
Or:
And:
string("ACR > "):
- virtual address: 0x401089
number(0x3136b0):
- virtual address: 0x4010c8
'''
for rule, ress in results.items():
print('rule %s:' % (rule))
for (fva, res) in sorted(ress, key=lambda p: p[0]):
print(' - function 0x%x:' % (fva))
render_result(res, indent=' ')
def appears_rule_cat(rules, capabilities, rule_cat):
for rule_name in capabilities.keys():
if rules.rules[rule_name].meta.get('rule-category', '').startswith(rule_cat):
return True
return False
def is_supported_file_type(sample):
'''
Return if this is a supported file based on magic header values
'''
with open(sample, 'rb') as f:
magic = f.read(2)
return magic in SUPPORTED_FILE_MAGIC
def get_shellcode_vw(sample, arch='auto'):
'''
Return shellcode workspace using explicit arch or via auto detect
'''
import viv_utils
with open(sample, 'rb') as f:
sample_bytes = f.read()
if arch == 'auto':
# choose arch with most functions, idea by Jay G.
vw_cands = []
for arch in ['i386', 'amd64']:
vw_cands.append(viv_utils.getShellcodeWorkspace(sample_bytes, arch))
if not vw_cands:
raise ValueError('could not generate vivisect workspace')
vw = max(vw_cands, key=lambda vw: len(vw.getFunctions()))
else:
vw = viv_utils.getShellcodeWorkspace(sample_bytes, arch)
vw.setMeta('Format', 'blob') # TODO fix in viv_utils
return vw
def get_meta_str(vw):
'''
Return workspace meta information string
'''
meta = []
for k in ['Format', 'Platform', 'Architecture']:
if k in vw.metadata:
meta.append('%s: %s' % (k.lower(), vw.metadata[k]))
return '%s, number of functions: %d' % (', '.join(meta), len(vw.getFunctions()))
class UnsupportedFormatError(ValueError):
pass
def get_workspace(path, format):
import viv_utils
logger.info('generating vivisect workspace for: %s', path)
if format == 'auto':
if not is_supported_file_type(path):
raise UnsupportedFormatError()
vw = viv_utils.getWorkspace(path)
elif format == 'pe':
vw = viv_utils.getWorkspace(path)
elif format == 'sc32':
vw = get_shellcode_vw(path, arch='i386')
elif format == 'sc64':
vw = get_shellcode_vw(path, arch='amd64')
logger.info('%s', get_meta_str(vw))
return vw
def get_extractor_py2(path, format):
import capa.features.extractors.viv
vw = get_workspace(path, format)
return capa.features.extractors.viv.VivisectFeatureExtractor(vw, path)
class UnsupportedRuntimeError(RuntimeError):
pass
def get_extractor_py3(path, format):
raise UnsupportedRuntimeError()
def get_extractor(path, format):
'''
raises:
UnsupportedFormatError:
'''
if sys.version_info >= (3, 0):
return get_extractor_py3(path, format)
else:
return get_extractor_py2(path, format)
def is_nursery_rule_path(path):
'''
The nursery is a spot for rules that have not yet been fully polished.
    For example, they may not have references to a public example of a technique.
Yet, we still want to capture and report on their matches.
The nursery is currently a subdirectory of the rules directory with that name.
When nursery rules are loaded, their metadata section should be updated with:
`nursery=True`.
'''
return 'nursery' in path
def get_rules(rule_path):
if not os.path.exists(rule_path):
raise IOError('%s does not exist or cannot be accessed' % rule_path)
rules = []
if os.path.isfile(rule_path):
logger.info('reading rule file: %s', rule_path)
with open(rule_path, 'rb') as f:
rule = capa.rules.Rule.from_yaml(f.read().decode('utf-8'))
        if is_nursery_rule_path(rule_path):
rule.meta['nursery'] = True
rules.append(rule)
logger.debug('rule: %s scope: %s', rule.name, rule.scope)
elif os.path.isdir(rule_path):
logger.info('reading rules from directory %s', rule_path)
for root, dirs, files in os.walk(rule_path):
for file in files:
if not file.endswith('.yml'):
logger.warning('skipping non-.yml file: %s', file)
continue
path = os.path.join(root, file)
logger.debug('reading rule file: %s', path)
try:
rule = capa.rules.Rule.from_yaml_file(path)
except capa.rules.InvalidRule:
raise
else:
if is_nursery_rule_path(root):
rule.meta['nursery'] = True
rules.append(rule)
logger.debug('rule: %s scope: %s', rule.name, rule.scope)
return rules
def main(argv=None):
if argv is None:
argv = sys.argv[1:]
formats = [
('auto', '(default) detect file type automatically'),
('pe', 'Windows PE file'),
('sc32', '32-bit shellcode'),
('sc64', '64-bit shellcode'),
('freeze', 'features previously frozen by capa'),
]
format_help = ', '.join(['%s: %s' % (f[0], f[1]) for f in formats])
parser = argparse.ArgumentParser(description='detect capabilities in programs.')
parser.add_argument('sample', type=str,
help='Path to sample to analyze')
parser.add_argument('-r', '--rules', type=str, default='(embedded rules)',
help='Path to rule file or directory, use embedded rules by default')
parser.add_argument('-t', '--tag', type=str,
help='Filter on rule meta field values')
parser.add_argument('-v', '--verbose', action='store_true',
help='Enable verbose output')
parser.add_argument('-vv', '--vverbose', action='store_true',
help='Enable very verbose output')
parser.add_argument('-q', '--quiet', action='store_true',
help='Disable all output but errors')
parser.add_argument('-f', '--format', choices=[f[0] for f in formats], default='auto',
help='Select sample format, %s' % format_help)
args = parser.parse_args(args=argv)
if args.quiet:
logging.basicConfig(level=logging.ERROR)
logging.getLogger().setLevel(logging.ERROR)
elif args.verbose:
logging.basicConfig(level=logging.DEBUG)
logging.getLogger().setLevel(logging.DEBUG)
else:
logging.basicConfig(level=logging.INFO)
logging.getLogger().setLevel(logging.INFO)
# disable vivisect-related logging, it's verbose and not relevant for capa users
set_vivisect_log_level(logging.CRITICAL)
# py2 doesn't know about cp65001, which is a variant of utf-8 on windows
# tqdm bails when trying to render the progress bar in this setup.
# because cp65001 is utf-8, we just map that codepage to the utf-8 codec.
# see #380 and: https://stackoverflow.com/a/3259271/87207
import codecs
codecs.register(lambda name: codecs.lookup('utf-8') if name == 'cp65001' else None)
if args.rules == '(embedded rules)':
logger.info('-' * 80)
logger.info(' Using default embedded rules.')
logger.info(' To provide your own rules, use the form `capa.exe ./path/to/rules/ /path/to/mal.exe`.')
logger.info(' You can see the current default rule set here:')
logger.info(' https://github.com/fireeye/capa-rules')
logger.info('-' * 80)
if hasattr(sys, 'frozen') and hasattr(sys, '_MEIPASS'):
logger.debug('detected running under PyInstaller')
args.rules = os.path.join(sys._MEIPASS, 'rules')
logger.debug('default rule path (PyInstaller method): %s', args.rules)
else:
logger.debug('detected running from source')
args.rules = os.path.join(os.path.dirname(__file__), '..', 'rules')
logger.debug('default rule path (source method): %s', args.rules)
else:
logger.info('using rules path: %s', args.rules)
try:
rules = get_rules(args.rules)
rules = capa.rules.RuleSet(rules)
logger.info('successfully loaded %s rules', len(rules))
if args.tag:
rules = rules.filter_rules_by_meta(args.tag)
logger.info('selected %s rules', len(rules))
except (IOError, capa.rules.InvalidRule, capa.rules.InvalidRuleSet) as e:
logger.error('%s', str(e))
return -1
with open(args.sample, 'rb') as f:
taste = f.read(8)
if ((args.format == 'freeze')
or (args.format == 'auto' and capa.features.freeze.is_freeze(taste))):
with open(args.sample, 'rb') as f:
extractor = capa.features.freeze.load(f.read())
else:
try:
extractor = get_extractor(args.sample, args.format)
except UnsupportedFormatError:
logger.error("-" * 80)
logger.error(" Input file does not appear to be a PE file.")
logger.error(" ")
            logger.error(" Today, capa only supports analyzing PE files (or shellcode, when using --format sc32|sc64).")
logger.error(" If you don't know the input file type, you can try using the `file` utility to guess it.")
logger.error("-" * 80)
return -1
except UnsupportedRuntimeError:
logger.error("-" * 80)
logger.error(" Unsupported runtime or Python interpreter.")
logger.error(" ")
logger.error(" Today, capa supports running under Python 2.7 using Vivisect for binary analysis.")
logger.error(" It can also run within IDA Pro, using either Python 2.7 or 3.5+.")
logger.error(" ")
logger.error(" If you're seeing this message on the command line, please ensure you're running Python 2.7.")
logger.error("-" * 80)
return -1
capabilities = find_capabilities(rules, extractor)
if appears_rule_cat(rules, capabilities, 'other-features/installer/'):
logger.warning("-" * 80)
logger.warning(" This sample appears to be an installer.")
logger.warning(" ")
logger.warning(" capa cannot handle installers well. This means the results may be misleading or incomplete.")
logger.warning(" You should try to understand the install mechanism and analyze created files with capa.")
logger.warning(" ")
logger.warning(" Use -v or -vv if you really want to see the capabilities identified by capa.")
logger.warning("-" * 80)
# capa will likely detect installer specific functionality.
# this is probably not what the user wants.
#
# do show the output in verbose mode, though.
if not (args.verbose or args.vverbose):
return -1
if appears_rule_cat(rules, capabilities, 'other-features/compiled-to-dot-net'):
logger.warning("-" * 80)
logger.warning(" This sample appears to be a .NET module.")
logger.warning(" ")
logger.warning(" .NET is a cross-platform framework for running managed applications.")
logger.warning(
" Today, capa cannot handle non-native files. This means that the results may be misleading or incomplete.")
logger.warning(" You may have to analyze the file manually, using a tool like the .NET decompiler dnSpy.")
logger.warning(" ")
logger.warning(" Use -v or -vv if you really want to see the capabilities identified by capa.")
logger.warning("-" * 80)
# capa won't detect much in .NET samples.
# it might match some file-level things.
# for consistency, bail on things that we don't support.
#
# do show the output in verbose mode, though.
if not (args.verbose or args.vverbose):
return -1
if appears_rule_cat(rules, capabilities, 'other-features/compiled-with-autoit'):
logger.warning("-" * 80)
logger.warning(" This sample appears to be compiled with AutoIt.")
logger.warning(" ")
logger.warning(" AutoIt is a freeware BASIC-like scripting language designed for automating the Windows GUI.")
logger.warning(
" Today, capa cannot handle AutoIt scripts. This means that the results will be misleading or incomplete.")
logger.warning(" You may have to analyze the file manually, using a tool like the AutoIt decompiler MyAut2Exe.")
logger.warning(" ")
logger.warning(" Use -v or -vv if you really want to see the capabilities identified by capa.")
logger.warning("-" * 80)
# capa will detect dozens of capabilities for AutoIt samples,
# but these are due to the AutoIt runtime, not the payload script.
# so, don't confuse the user with FP matches - bail instead
#
# do show the output in verbose mode, though.
if not (args.verbose or args.vverbose):
return -1
if appears_rule_cat(rules, capabilities, 'anti-analysis/packing/'):
logger.warning("-" * 80)
logger.warning(" This sample appears packed.")
logger.warning(" ")
logger.warning(" Packed samples have often been obfuscated to hide their logic.")
logger.warning(" capa cannot handle obfuscation well. This means the results may be misleading or incomplete.")
logger.warning(" If possible, you should try to unpack this input file before analyzing it with capa.")
logger.warning("-" * 80)
if args.vverbose:
render_capabilities_vverbose(capabilities)
elif args.verbose:
render_capabilities_verbose(capabilities)
else:
render_capabilities_default(rules, capabilities)
logger.info('done.')
return 0
def ida_main():
logging.basicConfig(level=logging.INFO)
logging.getLogger().setLevel(logging.INFO)
logger.info('-' * 80)
logger.info(' Using default embedded rules.')
logger.info(' ')
logger.info(' You can see the current default rule set here:')
logger.info(' https://github.com/fireeye/capa-rules')
logger.info('-' * 80)
if hasattr(sys, 'frozen') and hasattr(sys, '_MEIPASS'):
logger.debug('detected running under PyInstaller')
rules_path = os.path.join(sys._MEIPASS, 'rules')
logger.debug('default rule path (PyInstaller method): %s', rules_path)
else:
logger.debug('detected running from source')
rules_path = os.path.join(os.path.dirname(__file__), '..', 'rules')
logger.debug('default rule path (source method): %s', rules_path)
rules = get_rules(rules_path)
import capa.rules
rules = capa.rules.RuleSet(rules)
import capa.features.extractors.ida
capabilities = find_capabilities(rules, capa.features.extractors.ida.IdaFeatureExtractor())
render_capabilities_default(rules, capabilities)
def is_runtime_ida():
try:
import idc
except ImportError:
return False
else:
return True
if __name__ == "__main__":
if is_runtime_ida():
ida_main()
else:
sys.exit(main())

669
capa/rules.py Normal file

@@ -0,0 +1,669 @@
import re
import yaml
import uuid
import codecs
import logging
import binascii
import capa.engine
from capa.engine import *
import capa.features
import capa.features.file
import capa.features.function
import capa.features.basicblock
import capa.features.insn
from capa.features import MAX_BYTES_FEATURE_SIZE
logger = logging.getLogger(__name__)
FILE_SCOPE = 'file'
FUNCTION_SCOPE = 'function'
BASIC_BLOCK_SCOPE = 'basic block'
SUPPORTED_FEATURES = {
FILE_SCOPE: set([
capa.engine.Element,
capa.features.MatchedRule,
capa.features.file.Export,
capa.features.file.Import,
capa.features.file.Section,
capa.features.Characteristic('embedded pe'),
capa.features.String,
]),
FUNCTION_SCOPE: set([
capa.engine.Element,
capa.features.MatchedRule,
capa.features.insn.API,
capa.features.insn.Number,
capa.features.String,
capa.features.Bytes,
capa.features.insn.Offset,
capa.features.insn.Mnemonic,
capa.features.basicblock.BasicBlock,
capa.features.Characteristic('switch'),
capa.features.Characteristic('nzxor'),
capa.features.Characteristic('peb access'),
capa.features.Characteristic('fs access'),
capa.features.Characteristic('gs access'),
capa.features.Characteristic('cross section flow'),
capa.features.Characteristic('stack string'),
capa.features.Characteristic('calls from'),
capa.features.Characteristic('calls to'),
capa.features.Characteristic('indirect call'),
capa.features.Characteristic('loop'),
capa.features.Characteristic('recursive call')
]),
BASIC_BLOCK_SCOPE: set([
capa.engine.Element,
capa.features.MatchedRule,
capa.features.insn.API,
capa.features.insn.Number,
capa.features.String,
capa.features.Bytes,
capa.features.insn.Offset,
capa.features.insn.Mnemonic,
capa.features.Characteristic('nzxor'),
capa.features.Characteristic('peb access'),
capa.features.Characteristic('fs access'),
capa.features.Characteristic('gs access'),
capa.features.Characteristic('cross section flow'),
capa.features.Characteristic('tight loop'),
capa.features.Characteristic('stack string'),
capa.features.Characteristic('indirect call')
]),
}
class InvalidRule(ValueError):
def __init__(self, msg):
super(InvalidRule, self).__init__()
self.msg = msg
def __str__(self):
return 'invalid rule: %s' % (self.msg)
def __repr__(self):
return str(self)
class InvalidRuleWithPath(InvalidRule):
def __init__(self, path, msg):
super(InvalidRuleWithPath, self).__init__(msg)
self.path = path
self.msg = msg
self.__cause__ = None
def __str__(self):
return 'invalid rule: %s: %s' % (self.path, self.msg)
class InvalidRuleSet(ValueError):
def __init__(self, msg):
super(InvalidRuleSet, self).__init__()
self.msg = msg
def __str__(self):
return 'invalid rule set: %s' % (self.msg)
def __repr__(self):
return str(self)
def ensure_feature_valid_for_scope(scope, feature):
if isinstance(feature, capa.features.Characteristic):
if capa.features.Characteristic(feature.name) not in SUPPORTED_FEATURES[scope]:
            raise InvalidRule('feature %s not supported for scope %s' % (feature, scope))
    elif not isinstance(feature, tuple(filter(lambda t: isinstance(t, type), SUPPORTED_FEATURES[scope]))):
        raise InvalidRule('feature %s not supported for scope %s' % (feature, scope))
def parse_int(s):
if s.startswith('0x'):
return int(s, 0x10)
else:
return int(s, 10)
def parse_range(s):
'''
parse a string "(0, 1)" into a range (min, max).
min and/or max may by None to indicate an unbound range.
'''
# we want to use `{` characters, but this is a dict in yaml.
if not s.startswith('('):
raise InvalidRule('invalid range: %s' % (s))
if not s.endswith(')'):
raise InvalidRule('invalid range: %s' % (s))
s = s[len('('):-len(')')]
min, _, max = s.partition(',')
min = min.strip()
max = max.strip()
if min:
min = parse_int(min.strip())
if min < 0:
raise InvalidRule('range min less than zero')
else:
min = None
if max:
max = parse_int(max.strip())
if max < 0:
raise InvalidRule('range max less than zero')
else:
max = None
if min is not None and max is not None:
if max < min:
raise InvalidRule('range max less than min')
return min, max
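The accepted range grammar can be sketched standalone (a simplified reimplementation for illustration, not the capa API; it raises `ValueError` instead of `InvalidRule`):

```python
def parse_range(s):
    # parse "(min, max)" into a (min, max) tuple;
    # either bound may be omitted to leave the range unbounded
    if not (s.startswith('(') and s.endswith(')')):
        raise ValueError('invalid range: %s' % s)
    min_s, _, max_s = s[1:-1].partition(',')
    # base 0 lets int() accept both decimal and 0x-prefixed hex
    lo = int(min_s.strip(), 0) if min_s.strip() else None
    hi = int(max_s.strip(), 0) if max_s.strip() else None
    if lo is not None and hi is not None and hi < lo:
        raise ValueError('range max less than min')
    return lo, hi
```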
def parse_feature(key):
# keep this in sync with supported features
if key == 'api':
return capa.features.insn.API
elif key == 'string':
return capa.features.String
elif key == 'bytes':
return capa.features.Bytes
elif key == 'number':
return capa.features.insn.Number
elif key == 'offset':
return capa.features.insn.Offset
elif key == 'mnemonic':
return capa.features.insn.Mnemonic
elif key == 'basic blocks':
return capa.features.basicblock.BasicBlock
elif key == 'element':
return Element
elif key.startswith('characteristic(') and key.endswith(')'):
characteristic = key[len('characteristic('):-len(')')]
return lambda v: capa.features.Characteristic(characteristic, v)
elif key == 'export':
return capa.features.file.Export
elif key == 'import':
return capa.features.file.Import
elif key == 'section':
return capa.features.file.Section
elif key == 'match':
return capa.features.MatchedRule
else:
raise InvalidRule('unexpected statement: %s' % key)
def parse_symbol(s, value_type):
'''
s can be an int or a string
'''
if isinstance(s, str) and '=' in s:
value, symbol = s.split('=', 1)
symbol = symbol.strip()
if symbol == '':
raise InvalidRule('unexpected value: "%s", symbol name cannot be empty' % s)
else:
value = s
symbol = None
if isinstance(value, str):
if value_type == 'bytes':
try:
value = codecs.decode(value.replace(' ', ''), 'hex')
# TODO: Remove TypeError when Python2 is not used anymore
except (TypeError, binascii.Error):
raise InvalidRule('unexpected bytes value: "%s", must be a valid hex sequence' % value)
if len(value) > MAX_BYTES_FEATURE_SIZE:
raise InvalidRule('unexpected bytes value: byte sequences must be no larger than %s bytes' %
MAX_BYTES_FEATURE_SIZE)
else:
try:
value = parse_int(value)
except ValueError:
raise InvalidRule('unexpected value: "%s", must begin with numerical value' % value)
return value, symbol
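The `value = symbol` syntax handled above (e.g. `0x4550 = IMAGE_DOS_SIGNATURE` in a `number:` feature) can be sketched standalone for the integer case — a simplified illustration, not the capa API:

```python
def parse_symbol(s):
    # split "0x4550 = IMAGE_DOS_SIGNATURE" into (0x4550, 'IMAGE_DOS_SIGNATURE');
    # plain values yield (value, None)
    if isinstance(s, str) and '=' in s:
        value, _, symbol = s.partition('=')
        symbol = symbol.strip()
        if not symbol:
            raise ValueError('symbol name cannot be empty: %s' % s)
        return int(value.strip(), 0), symbol
    return (int(s, 0), None) if isinstance(s, str) else (s, None)
```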
def build_statements(d, scope):
if len(d.keys()) != 1:
raise InvalidRule('too many statements')
key = list(d.keys())[0]
if key == 'and':
return And(*[build_statements(dd, scope) for dd in d[key]])
elif key == 'or':
return Or(*[build_statements(dd, scope) for dd in d[key]])
elif key == 'not':
if len(d[key]) != 1:
raise InvalidRule('not statement must have exactly one child statement')
return Not(*[build_statements(dd, scope) for dd in d[key]])
elif key.endswith(' or more'):
        count = int(key[:-len(' or more')])
return Some(count, *[build_statements(dd, scope) for dd in d[key]])
elif key == 'optional':
# `optional` is an alias for `0 or more`
# which is useful for documenting behaviors,
# like with `write file`, we might say that `WriteFile` is optionally found alongside `CreateFileA`.
return Some(0, *[build_statements(dd, scope) for dd in d[key]])
elif key == 'function':
if scope != FILE_SCOPE:
raise InvalidRule('function subscope supported only for file scope')
if len(d[key]) != 1:
raise InvalidRule('subscope must have exactly one child statement')
return Subscope(FUNCTION_SCOPE, *[build_statements(dd, FUNCTION_SCOPE) for dd in d[key]])
elif key == 'basic block':
if scope != FUNCTION_SCOPE:
raise InvalidRule('basic block subscope supported only for function scope')
if len(d[key]) != 1:
raise InvalidRule('subscope must have exactly one child statement')
return Subscope(BASIC_BLOCK_SCOPE, *[build_statements(dd, BASIC_BLOCK_SCOPE) for dd in d[key]])
elif key.startswith('count(') and key.endswith(')'):
# e.g.:
#
# count(basic block)
# count(mnemonic(mov))
# count(characteristic(nzxor))
term = key[len('count('):-len(')')]
if term.startswith('characteristic('):
# characteristic features are specified a bit specially:
# they simply indicate the presence of something unusual/interesting,
# and we embed the name in the feature name, like `characteristic(nzxor)`.
#
# when we're dealing with counts, like `count(characteristic(nzxor))`,
# we can simply extract the feature and assume we're looking for `True` values.
Feature = parse_feature(term)
feature = Feature(True)
ensure_feature_valid_for_scope(scope, feature)
else:
# however, for remaining counted features, like `count(mnemonic(mov))`,
# we have to jump through hoops.
#
            # when looking for the existence of such a feature, our rule might look like:
# - mnemonic: mov
#
# but here we deal with the form: `mnemonic(mov)`.
term, _, arg = term.partition('(')
Feature = parse_feature(term)
if arg:
arg = arg[:-len(')')]
# can't rely on yaml parsing ints embedded within strings
# like:
#
# count(offset(0xC))
# count(number(0x11223344))
# count(number(0x100 = symbol name))
if term in ('number', 'offset', 'bytes'):
value, symbol = parse_symbol(arg, term)
feature = Feature(value, symbol)
                elif term == 'element':
arg = parse_int(arg)
feature = Feature(arg)
else:
# arg is string, like:
#
# count(mnemonic(mov))
# count(string(error))
# TODO: what about embedded newlines?
feature = Feature(arg)
else:
feature = Feature()
ensure_feature_valid_for_scope(scope, feature)
count = d[key]
if isinstance(count, int):
return Range(feature, min=count, max=count)
elif count.endswith(' or more'):
min = parse_int(count[:-len(' or more')])
max = None
return Range(feature, min=min, max=max)
elif count.endswith(' or fewer'):
min = None
max = parse_int(count[:-len(' or fewer')])
return Range(feature, min=min, max=max)
elif count.startswith('('):
min, max = parse_range(count)
return Range(feature, min=min, max=max)
else:
raise InvalidRule('unexpected range: %s' % (count))
elif key == 'string' and d[key].startswith('/') and (d[key].endswith('/') or d[key].endswith('/i')):
try:
return Regex(d[key])
except re.error:
if d[key].endswith('/i'):
d[key] = d[key][:-len('i')]
            raise InvalidRule('invalid regular expression: %s, it should use Python syntax; try it at https://pythex.org' % d[key])
else:
Feature = parse_feature(key)
if key in ('number', 'offset', 'bytes'):
# parse numbers with symbol description, e.g. 0x4550 = IMAGE_DOS_SIGNATURE
# or regular numbers, e.g. 37
value, symbol = parse_symbol(d[key], key)
feature = Feature(value, symbol)
else:
feature = Feature(d[key])
ensure_feature_valid_for_scope(scope, feature)
return feature
def first(s):
return s[0]
def second(s):
return s[1]
class Rule(object):
def __init__(self, name, scope, statement, meta, definition=''):
super(Rule, self).__init__()
self.name = name
self.scope = scope
self.statement = statement
self.meta = meta
self.definition = definition
def __str__(self):
return 'Rule(name=%s)' % (self.name)
def __repr__(self):
return 'Rule(scope=%s, name=%s)' % (self.scope, self.name)
def get_dependencies(self):
'''
fetch the names of rules this rule relies upon.
these are only the direct dependencies; a user must
        compute the transitive dependency graph themselves, if they want it.
Returns:
List[str]: names of rules upon which this rule depends.
'''
deps = set([])
def rec(statement):
if isinstance(statement, capa.features.MatchedRule):
deps.add(statement.rule_name)
elif isinstance(statement, Statement):
for child in statement.get_children():
rec(child)
# else: might be a Feature, etc.
# which we don't care about here.
rec(self.statement)
return deps
def _extract_subscope_rules_rec(self, statement):
if isinstance(statement, Statement):
# for each child that is a subscope,
for subscope in filter(lambda statement: isinstance(statement, capa.engine.Subscope), statement.get_children()):
# create a new rule from it.
# the name is a randomly generated, hopefully unique value.
                # ideally, this won't ever be rendered to a user.
name = self.name + '/' + uuid.uuid4().hex
new_rule = Rule(name, subscope.scope, subscope.child, {
'name': name,
'scope': subscope.scope,
# these derived rules are never meant to be inspected separately,
# they are dependencies for the parent rule,
# so mark it as such.
'lib': True,
# metadata that indicates this is derived from a subscope statement
'capa/subscope-rule': True,
                # metadata that links the child rule to the parent rule
'capa/parent': self.name,
})
# update the existing statement to `match` the new rule
new_node = capa.features.MatchedRule(name)
statement.replace_child(subscope, new_node)
# and yield the new rule to our caller
yield new_rule
# now recurse to other nodes in the logic tree.
# note: we cannot recurse into the subscope sub-tree,
            # because it's been replaced by a `match` statement.
for child in statement.get_children():
for new_rule in self._extract_subscope_rules_rec(child):
yield new_rule
def extract_subscope_rules(self):
'''
scan through the statements of this rule,
replacing subscope statements with `match` references to a newly created rule,
which are yielded from this routine.
note: this mutates the current rule.
example::
for derived_rule in rule.extract_subscope_rules():
assert derived_rule.meta['capa/parent'] == rule.name
'''
# recurse through statements
# when encounter Subscope statement
# create new transient rule
# copy logic into the new rule
# replace old node with reference to new rule
# yield new rule
for new_rule in self._extract_subscope_rules_rec(self.statement):
yield new_rule
def evaluate(self, features):
return self.statement.evaluate(features)
@classmethod
def from_dict(cls, d, s):
name = d['rule']['meta']['name']
# if scope is not specified, default to function scope.
# this is probably the mode that rule authors will start with.
scope = d['rule']['meta'].get('scope', FUNCTION_SCOPE)
statements = d['rule']['features']
# the rule must start with a single logic node.
# doing anything else is too implicit and difficult to remove (AND vs OR ???).
if len(statements) != 1:
raise InvalidRule('rule must begin with a single top level statement')
if isinstance(statements[0], capa.engine.Subscope):
raise InvalidRule('top level statement may not be a subscope')
return cls(
name,
scope,
build_statements(statements[0], scope),
d['rule']['meta'],
s
)
@classmethod
def from_yaml(cls, s):
return cls.from_dict(yaml.safe_load(s), s)
@classmethod
def from_yaml_file(cls, path):
with open(path, 'rb') as f:
try:
return cls.from_yaml(f.read().decode('utf-8'))
except InvalidRule as e:
raise InvalidRuleWithPath(path, str(e))
def get_rules_with_scope(rules, scope):
'''
from the given collection of rules, select those with the given scope.
args:
rules (List[capa.rules.Rule]):
scope (str): one of the capa.rules.*_SCOPE constants.
returns:
List[capa.rules.Rule]:
'''
return list(rule for rule in rules if rule.scope == scope)
def get_rules_and_dependencies(rules, rule_name):
'''
from the given collection of rules, select a rule and its dependencies (transitively).
args:
rules (List[Rule]):
rule_name (str):
yields:
Rule:
'''
rules = {rule.name: rule for rule in rules}
wanted = set([rule_name])
def rec(rule):
wanted.add(rule.name)
for dep in rule.get_dependencies():
rec(rules[dep])
rec(rules[rule_name])
for rule in rules.values():
if rule.name in wanted:
yield rule
def ensure_rules_are_unique(rules):
seen = set([])
for rule in rules:
if rule.name in seen:
raise InvalidRule('duplicate rule name: ' + rule.name)
seen.add(rule.name)
def ensure_rule_dependencies_are_met(rules):
'''
raise an exception if a rule dependency does not exist.
raises:
InvalidRule: if a dependency is not met.
'''
rules = {rule.name: rule for rule in rules}
for rule in rules.values():
for dep in rule.get_dependencies():
if dep not in rules:
raise InvalidRule('rule "%s" depends on missing rule "%s"' % (rule.name, dep))
class RuleSet(object):
'''
a ruleset is initialized with a collection of rules, which it verifies and sorts into scopes.
each set of scoped rules is sorted topologically, which enables rules to match on past rule matches.
example:
ruleset = RuleSet([
Rule(...),
Rule(...),
...
])
capa.engine.match(ruleset.file_rules, ...)
'''
def __init__(self, rules):
super(RuleSet, self).__init__()
ensure_rules_are_unique(rules)
rules = self._extract_subscope_rules(rules)
ensure_rule_dependencies_are_met(rules)
if len(rules) == 0:
raise InvalidRuleSet('no rules selected')
self.file_rules = self._get_rules_for_scope(rules, FILE_SCOPE)
self.function_rules = self._get_rules_for_scope(rules, FUNCTION_SCOPE)
self.basic_block_rules = self._get_rules_for_scope(rules, BASIC_BLOCK_SCOPE)
self.rules = {rule.name: rule for rule in rules}
def __len__(self):
return len(self.rules)
@staticmethod
def _get_rules_for_scope(rules, scope):
'''
given a collection of rules, collect the rules that are needed at the given scope.
these rules are ordered topologically.
don't include "lib" rules, unless they are dependencies of other rules.
'''
scope_rules = set([])
# we need to process all rules, not just rules with the given scope.
# this is because rules with a higher scope, e.g. file scope, may have subscope rules
# at lower scope, e.g. function scope.
# so, we find all dependencies of all rules, and later will filter them down.
for rule in rules:
if rule.meta.get('lib', False):
continue
scope_rules.update(get_rules_and_dependencies(rules, rule.name))
return get_rules_with_scope(capa.engine.topologically_order_rules(scope_rules), scope)
@staticmethod
def _extract_subscope_rules(rules):
'''
process the given sequence of rules.
for each one, extract any embedded subscope rules into their own rule.
process these recursively.
then return a list of the refactored rules.
note: this operation mutates the rules passed in - they may now have `match` statements
for the extracted subscope rules.
'''
done = []
# use a queue of rules, because we'll be modifying the list (appending new items) as we go.
while rules:
rule = rules.pop(0)
for subscope_rule in rule.extract_subscope_rules():
rules.append(subscope_rule)
done.append(rule)
return done
def filter_rules_by_meta(self, tag):
        '''
        return a new rule set with rules filtered on their meta field values, adding all dependency rules.
        applies the tag-based rule filter, assuming that all required rules are loaded.
        this can be used to select specific rules, versus providing a rules child directory
        where capa cannot resolve dependencies from unknown paths.
        TODO: handle circular dependencies?
        TODO: support -t=metafield <k>
        '''
rules = self.rules.values()
rules_filtered = set([])
for rule in rules:
for k, v in rule.meta.items():
if isinstance(v, str) and tag in v:
logger.debug('using rule "%s" and dependencies, found tag in meta.%s: %s', rule.name, k, v)
rules_filtered.update(set(capa.rules.get_rules_and_dependencies(rules, rule.name)))
break
return RuleSet(list(rules_filtered))
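`RuleSet` sorts each scope's rules topologically so that `match` statements can see earlier results. The idea can be sketched with a depth-first post-order walk over a dependency map — a simplified stand-in for `capa.engine.topologically_order_rules`, which operates on Rule objects rather than names:

```python
def topologically_order_rules(deps):
    # deps: dict mapping rule name -> names of rules it matches on.
    # emit each rule only after all of its dependencies (post-order DFS).
    seen, order = set(), []

    def visit(name):
        if name in seen:
            return
        seen.add(name)
        for dep in deps.get(name, ()):
            visit(dep)
        order.append(name)

    for name in deps:
        visit(name)
    return order
```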

2
capa/version.py Normal file

@@ -0,0 +1,2 @@
__version__ = '0.0.0'
__commit__ = '00000000'

BIN
ci/logo.ico Normal file


BIN
ci/logo.png Normal file


8
ci/tox.ini Normal file

@@ -0,0 +1,8 @@
[pycodestyle]
; E402: module level import not at top of file
; W503: line break before binary operator
ignore = E402,W503
max-line-length = 160
statistics = True
count = True
exclude = .*

BIN
doc/capa_explorer.png Normal file


44
doc/installation.md Normal file

@@ -0,0 +1,44 @@
# Installation
You can install capa in a few different ways. First, if you simply want to use capa, just download the [standalone binary](https://github.com/fireeye/capa/releases). If you want to use capa as a Python library, you can install the package directly from GitHub using `pip`. If you'd like to contribute patches or features to capa, you can work with a local copy of the source code.
## Method 1: Standalone installation
If you simply want to use capa, use the standalone binaries we host on GitHub: https://github.com/fireeye/capa/releases. These binary executable files contain all the source code, Python interpreter, and associated resources needed to make capa run. This means you can run it without any installation! Just invoke the file using your terminal shell to see the help documentation.
We used PyInstaller to create these packages.
## Method 2: Using capa as a Python library
To install capa as a Python library, you'll need to install a few dependencies, and then use `pip` to fetch the capa module.
### 1. Install requirements
First, install the requirements.
`$ pip install https://github.com/williballenthin/vivisect/zipball/master`
### 2. Install capa module
Second, use `pip` to install the capa module to your local Python environment. This fetches the library code to your computer, but does not keep editable source files around for you to hack on. If you'd like to edit the source files, see below.
`$ pip install https://github.com/fireeye/capa/archive/master.zip`
### 3. Use capa
You can now import the `capa` module from a Python script or use the IDA Pro plugins from the `capa/ida` directory. For more information please see the [usage](usage.md) documentation.
## Method 3: Inspecting the capa source code
If you'd like to review and modify the capa source code, you'll need to check it out from Github and install it locally. By following these instructions, you'll maintain a local directory of source code that you can modify and run easily.
### 1. Install requirements
First, install the requirements.
`$ pip install https://github.com/williballenthin/vivisect/zipball/master`
### 2. Check out source code
Next, clone the capa git repository.
#### SSH
`$ git clone git@github.com:fireeye/capa.git /local/path/to/src`
#### HTTPS
`$ git clone https://github.com/fireeye/capa.git /local/path/to/src`
### 3. Install the local source code
Next, use `pip` to install the source code in "editable" mode. This means that Python will load the capa module from this local directory rather than copying it to `site-packages` or `dist-packages`. This is good because it is easy to modify files and see the effects reflected immediately. But be careful not to remove this directory unless you are uninstalling capa.
`$ pip install -e /local/path/to/src`
You'll find that the `capa.exe` (Windows) or `capa` (Linux) executables in your path now invoke the capa binary from this directory.

61
doc/limitations.md Normal file
View File

@@ -0,0 +1,61 @@
# Packers
Packed programs have often been obfuscated to hide their logic. Since capa cannot handle obfuscation well, results may be misleading or incomplete. If possible, users should unpack input files before analyzing them with capa.
If capa's rules detect that a program may be packed, it warns the user.
# Installers, run-time programs, etc.
capa cannot handle installers, run-time programs like .NET applications, or other packaged applications like AutoIt well. This means that the results may be misleading or incomplete.
If capa detects an installer, run-time program, etc. it warns the user.
# Wrapper functions and matches in child functions
Currently capa does not handle wrapper functions or other matches in child functions.
Consider this example call tree where `f1` calls a wrapper function `f2` and the `CreateProcess` API. `f2` writes to a file.
```
f1
f2 (WriteFile wrapper)
CreateFile
WriteFile
CreateProcess
```
Here capa does not match a rule combining file creation and process execution on function `f1`, because the file-writing APIs are only referenced in the child function `f2`.
Software often contains such nested calls because programmers wrap API calls in helper functions or because specific compilers or languages, such as Go, layer calls.
While a feature to capture nested functionality is desirable it introduces various issues and complications. These include:
- how to assign matches from child to parent functions?
- a potential significant increase in analysis requirements and rule matching complexity
Moreover, we require more real-world samples to see how prevalent this really is and how much it would improve capa's results.
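As a rough illustration of the first question, propagating matches from child to parent functions could look like the following toy sketch. This is hypothetical and not how capa works today; the function names and feature strings are made up:

```python
# Toy sketch (not part of capa): let each function inherit the features
# of its (transitive) callees, so a rule requiring both WriteFile and
# CreateProcess could match on f1 even though WriteFile lives in f2.
def propagate_features(call_graph, features):
    """Return per-function feature sets including callee features."""
    def collect(func, seen):
        if func in seen:  # guard against recursive call cycles
            return set()
        seen.add(func)
        result = set(features.get(func, ()))
        for callee in call_graph.get(func, ()):
            result |= collect(callee, seen)
        return result

    return {f: collect(f, set()) for f in features}


# call tree from the example above: f1 calls f2; f2 wraps CreateFile/WriteFile
call_graph = {'f1': ['f2'], 'f2': []}
features = {
    'f1': {'api(CreateProcess)'},
    'f2': {'api(CreateFile)', 'api(WriteFile)'},
}
merged = propagate_features(call_graph, features)
assert 'api(WriteFile)' in merged['f1']  # f1 now "sees" the wrapped call
```

Even this toy version hints at the cost: every parent's feature set grows with its whole call subtree, which is part of the analysis and matching complexity mentioned above.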
# Loop scope
Encryption, encoding, or processing functions often contain loops, and it could be beneficial to capture functionality within loops.
However, tracking all basic blocks that are part of a loop, especially with nested loop constructs, is not trivial.
As a compromise, capa provides the `characteristic(loop)` feature to filter on functions that contain a loop.
We need more practical use cases and test samples to justify the additional workload to implement a full loop scope feature.
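For intuition, a `characteristic(loop)`-style check can be sketched as back-edge detection over a function's control flow graph. This is a toy model, not capa's actual implementation:

```python
# Toy sketch (not capa's implementation): a function "contains a loop"
# if its control flow graph has a back edge, found via iterative DFS.
def has_loop(cfg, entry):
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited / on DFS stack / finished
    color = {bb: WHITE for bb in cfg}
    stack = [(entry, iter(cfg[entry]))]
    color[entry] = GRAY
    while stack:
        node, succs = stack[-1]
        for succ in succs:
            if color[succ] == GRAY:  # edge back into the DFS stack: loop
                return True
            if color[succ] == WHITE:
                color[succ] = GRAY
                stack.append((succ, iter(cfg[succ])))
                break
        else:
            color[node] = BLACK
            stack.pop()
    return False


# basic blocks as nodes, branch targets as edges; c -> b forms a loop
looping = {'a': ['b'], 'b': ['c'], 'c': ['b', 'd'], 'd': []}
straight = {'a': ['b'], 'b': ['c'], 'c': []}
assert has_loop(looping, 'a') is True
assert has_loop(straight, 'a') is False
```

Note that this only answers "does a loop exist"; identifying exactly which blocks belong to which (possibly nested) loop is the harder problem a full loop scope would require.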
# ATT&CK, MAEC, MBC, and other capability tagging
capa uses a custom category-tagging scheme that labels capabilities with objective, behavior, and technique (see https://github.com/fireeye/capa#meta-block).
The category tagging is loosely based on the ELWUN/Nucleus capability tags.
While exploring other tagging mechanisms we discovered the following shortcomings:
- ATT&CK: does not cover all the capabilities we are trying to express and is intended for a different purpose (general adversary tactics and techniques)
- MAEC: the ELWUN tags are related to the MAEC format, but express capabilities more appropriately for us
- MBC: this has the right scope, but is a rather new project; if there is more support and demand in the community for this schema, further work in this direction could be promising
Adding tags from a new schema to the existing rules is a cumbersome process. We will hold off on amending rules until we have identified an appropriate schema.
Additionally, if we choose to support a public standard, we would like to provide expertise back to the community.

26
doc/usage.md Normal file
View File

@@ -0,0 +1,26 @@
# Usage
## Command line
After you have downloaded the standalone version of capa or installed it via `pip` (see the [installation](installation.md) documentation) you can run capa directly from your terminal shell.
- `$ capa -h`
- `$ capa malware.exe`
In this mode, capa relies on vivisect, which only runs under Python 2.
## IDA Pro
capa runs from within IDA Pro. Run `capa/main.py` via File - Script file... (ALT + F7).
When running in IDA, capa uses IDA's disassembly and file analysis as its backend. These results may vary from the standalone version that uses vivisect.
In IDA, capa supports Python 2 and Python 3. If you encounter issues with your specific setup please open a new [Issue](https://github.com/fireeye/capa/issues).
## IDA plugins
capa comes with two IDA Pro plugins located in the `capa/ida` directory.
### capa explorer
The capa explorer allows you to interactively display and browse capabilities capa identified in a binary.
![capa explorer](capa_explorer.png)
### Rule generator
The rule generator helps you to easily write new rules based on the function you are currently analyzing in your IDA disassembly view.

403
scripts/lint.py Normal file
View File

@@ -0,0 +1,403 @@
'''
Check the given capa rules for style issues.
Usage:
$ python scripts/lint.py rules/
'''
import os
import sys
import string
import hashlib
import logging
import os.path
import itertools
import argparse
import capa.main
import capa.rules
import capa.engine
import capa.features
logger = logging.getLogger('capa.lint')
class Lint(object):
name = 'lint'
recommendation = ''
def check_rule(self, ctx, rule):
return False
class NameCasing(Lint):
name = 'rule name casing'
recommendation = 'Rename rule to start with a lowercase letter'
def check_rule(self, ctx, rule):
return (rule.name[0] in string.ascii_uppercase and
rule.name[1] not in string.ascii_uppercase)
class MissingRuleCategory(Lint):
name = 'missing rule category'
recommendation = 'Add meta.rule-category so that the rule is emitted correctly'
def check_rule(self, ctx, rule):
return ('rule-category' not in rule.meta and
'maec/malware-category' not in rule.meta and
'lib' not in rule.meta)
class MissingScope(Lint):
name = 'missing scope'
recommendation = 'Add meta.scope so that the scope is explicit (defaults to `function`)'
def check_rule(self, ctx, rule):
return 'scope' not in rule.meta
class InvalidScope(Lint):
name = 'invalid scope'
recommendation = 'Use only file, function, or basic block rule scopes'
def check_rule(self, ctx, rule):
return rule.meta.get('scope') not in ('file', 'function', 'basic block')
class MissingAuthor(Lint):
name = 'missing author'
recommendation = 'Add meta.author so that users know who to contact with questions'
def check_rule(self, ctx, rule):
return 'author' not in rule.meta
class MissingExamples(Lint):
name = 'missing examples'
recommendation = 'Add meta.examples so that the rule can be tested and verified'
def check_rule(self, ctx, rule):
return ('examples' not in rule.meta or
not isinstance(rule.meta['examples'], list) or
len(rule.meta['examples']) == 0 or
rule.meta['examples'] == [None])
class MissingExampleOffset(Lint):
name = 'missing example offset'
recommendation = 'Add offset of example function'
def check_rule(self, ctx, rule):
if rule.meta.get('scope') in ('function', 'basic block'):
for example in rule.meta.get('examples', []):
if example and ':' not in example:
logger.debug('example: %s', example)
return True
return False
class ExampleFileDNE(Lint):
name = 'referenced example doesn\'t exist'
recommendation = 'Add the referenced example to samples directory ($capa-root/tests/data or supplied via --samples)'
def check_rule(self, ctx, rule):
if not rule.meta.get('examples'):
# let the MissingExamples lint catch this case, don't double report.
return False
found = False
for example in rule.meta.get('examples', []):
if example:
example_id = example.partition(':')[0]
if example_id in ctx['samples']:
found = True
break
return not found
class DoesntMatchExample(Lint):
name = 'doesn\'t match on referenced example'
recommendation = 'Fix the rule logic or provide a different example'
def check_rule(self, ctx, rule):
if not ctx['is_thorough']:
return False
for example in rule.meta.get('examples', []):
example_id = example.partition(':')[0]
try:
path = ctx['samples'][example_id]
except KeyError:
# lint ExampleFileDNE will catch this.
# don't double report.
continue
try:
extractor = capa.main.get_extractor(path, 'auto')
capabilities = capa.main.find_capabilities(ctx['rules'], extractor, disable_progress=True)
except Exception as e:
logger.error('failed to extract capabilities: %s %s %s', rule.name, path, e)
return True
if rule.name not in capabilities:
return True
return False
class FeatureStringTooShort(Lint):
name = 'feature string too short'
recommendation_template = 'capa only extracts strings with length >= 4; will not match on "{:s}"'
def check_features(self, ctx, features):
for feature in features:
if isinstance(feature, capa.features.String):
if len(feature.value) < 4:
# format a fresh copy so a prior match doesn't leave a stale value
self.recommendation = self.recommendation_template.format(feature.value)
return True
return False
def run_lints(lints, ctx, rule):
for lint in lints:
if lint.check_rule(ctx, rule):
yield lint
def run_feature_lints(lints, ctx, features):
for lint in lints:
if lint.check_features(ctx, features):
yield lint
NAME_LINTS = (
NameCasing(),
)
def lint_name(ctx, rule):
return run_lints(NAME_LINTS, ctx, rule)
SCOPE_LINTS = (
MissingScope(),
InvalidScope(),
)
def lint_scope(ctx, rule):
return run_lints(SCOPE_LINTS, ctx, rule)
META_LINTS = (
MissingRuleCategory(),
MissingAuthor(),
MissingExamples(),
MissingExampleOffset(),
ExampleFileDNE(),
)
def lint_meta(ctx, rule):
return run_lints(META_LINTS, ctx, rule)
FEATURE_LINTS = (
FeatureStringTooShort(),
)
def lint_features(ctx, rule):
features = get_features(ctx, rule)
return run_feature_lints(FEATURE_LINTS, ctx, features)
def get_features(ctx, rule):
# get features from rule and all dependencies including subscopes and matched rules
features = []
deps = [ctx['rules'].rules[dep] for dep in rule.get_dependencies()]
for r in [rule] + deps:
features.extend(get_rule_features(r))
return features
def get_rule_features(rule):
features = []
def rec(statement):
if isinstance(statement, capa.engine.Statement):
for child in statement.get_children():
rec(child)
else:
features.append(statement)
rec(rule.statement)
return features
LOGIC_LINTS = (
DoesntMatchExample(),
)
def lint_logic(ctx, rule):
return run_lints(LOGIC_LINTS, ctx, rule)
def is_nursery_rule(rule):
'''
The nursery is a spot for rules that have not yet been fully polished.
For example, they may not have references to public example of a technique.
Yet, we still want to capture and report on their matches.
'''
return rule.meta.get('nursery')
def lint_rule(ctx, rule):
logger.debug(rule.name)
violations = list(itertools.chain(
lint_name(ctx, rule),
lint_scope(ctx, rule),
lint_meta(ctx, rule),
lint_logic(ctx, rule),
lint_features(ctx, rule),
))
if len(violations) > 0:
category = rule.meta.get('rule-category')
print('')
print('%s%s %s' % (' (nursery) ' if is_nursery_rule(rule) else '',
rule.name,
('(%s)' % category) if category else ''))
level = 'WARN' if is_nursery_rule(rule) else 'FAIL'
for violation in violations:
print('%s %s: %s: %s' % (
' ' if is_nursery_rule(rule) else '', level, violation.name, violation.recommendation))
return len(violations) > 0 and not is_nursery_rule(rule)
def lint(ctx, rules):
'''
Args:
ctx (dict): lint context with keys `samples` (map from sample id to path,
keyed by sha256, md5, and filename; see `collect_samples(path)`),
`rules`, and `is_thorough`.
rules (RuleSet): the rules to lint.
'''
did_suggest_fix = False
for rule in rules.rules.values():
if rule.meta.get('capa/subscope-rule', False):
continue
did_suggest_fix = lint_rule(ctx, rule) or did_suggest_fix
return did_suggest_fix
def collect_samples(path):
'''
recurse through the given path, collecting all file paths, indexed by their content sha256, md5, and filename.
'''
samples = {}
for root, dirs, files in os.walk(path):
for name in files:
if name.endswith('.viv'):
continue
if name.endswith('.idb'):
continue
if name.endswith('.i64'):
continue
path = os.path.join(root, name)
try:
with open(path, 'rb') as f:
buf = f.read()
except IOError:
continue
sha256 = hashlib.sha256()
sha256.update(buf)
md5 = hashlib.md5()
md5.update(buf)
samples[sha256.hexdigest().lower()] = path
samples[sha256.hexdigest().upper()] = path
samples[md5.hexdigest().lower()] = path
samples[md5.hexdigest().upper()] = path
samples[name] = path
return samples
def main(argv=None):
if argv is None:
argv = sys.argv[1:]
samples_path = os.path.join(os.path.dirname(__file__), '..', 'tests', 'data')
parser = argparse.ArgumentParser(description='Check capa rules for style issues.')
parser.add_argument('rules', type=str,
help='Path to rules')
parser.add_argument('--samples', type=str, default=samples_path,
help='Path to samples')
parser.add_argument('--thorough', action='store_true',
help='Enable thorough linting - takes more time, but does a better job')
parser.add_argument('-v', '--verbose', action='store_true',
help='Enable debug logging')
parser.add_argument('-q', '--quiet', action='store_true',
help='Disable all output but errors')
args = parser.parse_args(args=argv)
if args.verbose:
level = logging.DEBUG
elif args.quiet:
level = logging.ERROR
else:
level = logging.INFO
logging.basicConfig(level=level)
logging.getLogger('capa.lint').setLevel(level)
capa.main.set_vivisect_log_level(logging.CRITICAL)
logging.getLogger('capa').setLevel(logging.CRITICAL)
try:
rules = capa.main.get_rules(args.rules)
rules = capa.rules.RuleSet(rules)
logger.info('successfully loaded %s rules', len(rules))
except IOError as e:
logger.error('%s', str(e))
return -1
except capa.rules.InvalidRule as e:
logger.error('%s', str(e))
return -1
logger.info('collecting potentially referenced samples')
if not os.path.exists(args.samples):
logger.error('samples path %s does not exist', args.samples)
return -1
samples = collect_samples(args.samples)
ctx = {
'samples': samples,
'rules': rules,
'is_thorough': args.thorough,
}
did_violate = lint(ctx, rules)
if not did_violate:
logger.info('no suggestions, nice!')
return 0
else:
return 1
if __name__ == '__main__':
sys.exit(main())

81
scripts/show-features.py Normal file
View File

@@ -0,0 +1,81 @@
#!/usr/bin/env python2
'''
show the features extracted by capa.
'''
import sys
import logging
import argparse
import capa.main
import capa.rules
import capa.engine
import capa.features
import capa.features.freeze
import capa.features.extractors.viv
def main(argv=None):
if argv is None:
argv = sys.argv[1:]
formats = [
('auto', '(default) detect file type automatically'),
('pe', 'Windows PE file'),
('sc32', '32-bit shellcode'),
('sc64', '64-bit shellcode'),
('freeze', 'features previously frozen by capa'),
]
format_help = ', '.join(['%s: %s' % (f[0], f[1]) for f in formats])
parser = argparse.ArgumentParser(description="detect capabilities in programs.")
parser.add_argument("sample", type=str,
help="Path to sample to analyze")
parser.add_argument("-f", "--format", choices=[f[0] for f in formats], default="auto",
help="Select sample format, %s" % format_help)
parser.add_argument("-F", "--function", type=lambda x: int(x, 0),
help="Show features for specific function")
args = parser.parse_args(args=argv)
logging.basicConfig(level=logging.INFO)
logging.getLogger().setLevel(logging.INFO)
if args.format == 'freeze':
with open(args.sample, 'rb') as f:
extractor = capa.features.freeze.load(f.read())
else:
vw = capa.main.get_workspace(args.sample, args.format)
extractor = capa.features.extractors.viv.VivisectFeatureExtractor(vw, args.sample)
if not args.function:
for feature, va in extractor.extract_file_features():
if va:
print('file: 0x%08x: %s' % (va, feature))
else:
print('file: 0x00000000: %s' % (feature))
functions = extractor.get_functions()
if args.function:
if args.format == 'freeze':
functions = filter(lambda f: f == args.function, functions)
else:
functions = filter(lambda f: f.va == args.function, functions)
for f in functions:
for feature, va in extractor.extract_function_features(f):
print('func: 0x%08x: %s' % (va, feature))
for bb in extractor.get_basic_blocks(f):
for feature, va in extractor.extract_basic_block_features(f, bb):
print('bb : 0x%08x: %s' % (va, feature))
for insn in extractor.get_instructions(f, bb):
for feature, va in extractor.extract_insn_features(f, bb, insn):
print('insn: 0x%08x: %s' % (va, feature))
return 0
if __name__ == "__main__":
sys.exit(main())

71
scripts/testbed/README.md Normal file
View File

@@ -0,0 +1,71 @@
# Testbed
The goal of the testbed is to support the development of new `capa` rules. Its scripts allow testing rules against a large sample set and batch processing samples, e.g. to freeze features or to generate other metadata used for testing.
The testbed contains malicious and benign files. Data sources are:
- Microsoft EXE and DLL files from `C:\Windows\System32`, `C:\Windows\SysWOW64`, etc.
- samples analyzed and annotated by FLARE analysts during malware analysis
Samples containing the keyword `slow` in their path indicate a longer test run time (>20 seconds) and can be ignored via the `-f` argument.
Running a rule against a large set of executable programs quickly shows on which functions/samples the rule hits. This helps to identify:
- true positives: hits on expected functions
- false positives: hits on unexpected functions, for example
- if a rule is too generic, or
- if a rule hits on a capability present in many (benign) samples
To provide additional context the testbed contains function names from the following data sources:
- benign files: function names from Microsoft's PDB information
- malicious files: function names provided by FLARE analysts and obtained from
the LabelMaker 2000 (LM2k) annotations repository
For each test sample the testbed contains the following files:
- a `.frz` file storing the extracted `capa` features
- `capa`'s serialized features, via `capa.features.freeze`
- a `.fnames` file mapping function addresses to function names
- JSON file that maps fvas to function names or
- CSV file with entries `idbmd5;md5;fva;fname`
- (optional) the binary file with extension `.exe_`, `.dll_`, or `.mal_`
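A minimal sketch of loading either `.fnames` format into a `{fva: fname}` mapping (assumptions: JSON keys may be decimal or hex strings; malformed CSV lines are skipped):

```python
import json


def parse_fnames(text):
    """Parse .fnames content: JSON {"fva": "fname"} or CSV idbmd5;md5;fva;fname."""
    try:
        fnames = json.loads(text)
    except ValueError:
        # not JSON; fall back to semicolon-separated CSV lines
        fnames = {}
        for line in text.splitlines():
            try:
                _idbmd5, _md5, fva, fname = line.split(';', 3)
            except ValueError:
                continue  # skip malformed lines
            fnames[fva] = fname
    # base 0 accepts both decimal ("4198400") and hex ("0x401000") keys
    return {int(fva, 0): fname for fva, fname in fnames.items()}


assert parse_fnames('{"4198400": "main"}') == {4198400: 'main'}
assert parse_fnames('abc;def;0x401000;WinMain') == {0x401000: 'WinMain'}
```

`run_rule_on_testbed.py` performs an equivalent conversion before resolving hit addresses to names.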
## Scripts
### `run_rule_on_testbed.py`
Run a `capa` rule file against the testbed (frozen features in a directory).
Meant to be run on directories that contain `.frz` and `.fnames` files.
Example usage:
run_rule_on_testbed.py <testbed dir>
run_rule_on_testbed.py samples
With the `-s <image_path>` argument, the script exports images of function graphs to the provided path.
Converting the images requires `graphviz`. See https://graphviz.gitlab.io/about/; get the Python interface via `pip install graphviz`.
## Helper Scripts
### `freeze_features.py`
Use `freeze_features.py` to freeze `capa` features of a file or of files in a directory.
Example usage:
freeze_features.py <testbed dir>
freeze_features.py samples
### `start_ida_dump_fnames.py`
Start IDA Pro in autonomous mode to dump JSON file of function names `{fva: fname}`. Processes a single file or a directory.
This script uses `_dump_fnames.py` to dump the JSON file of function names and is meant to be run on benign files with PDB information. IDA should apply function names from the PDB information automatically.
Example usage:
start_ida_dump_fnames.py <candidate files dir>
start_ida_dump_fnames.py samples\benign
### `start_ida_export_fimages.py`
Start IDA Pro in autonomous mode to export images of function graphs.
`run_rule_on_testbed.py` integrates the export mechanism (`-s` option).
This script uses `_export_fimages.py` to export DOT files of function graphs and then converts them to PNG images using `graphviz`.
Example usage:
start_ida_export_fimages.py <target file> <output dir> -f <function list>
start_ida_export_fimages.py test.exe imgs -f 0x401000,0x402F90

View File

@@ -0,0 +1,2 @@
FNAMES_EXTENSION = '.fnames'
FREEZE_EXTENSION = '.frz'

View File

@@ -0,0 +1,46 @@
'''
IDAPython script to dump JSON file of function names { fva: fname }.
Meant to be run on benign files with PDB information. IDA should apply function names from the PDB files automatically.
Can also be run on annotated IDA database files.
Example usage (via IDA autonomous mode):
ida.exe -A -S_dump_fnames.py "<output path>" <sample_path>
'''
import json
import idc
import idautils
def main():
if len(idc.ARGV) != 2:
# requires output file path argument
idc.qexit(-1)
# wait for auto-analysis to finish
idc.auto_wait()
INF_SHORT_DN_ATTR = idc.get_inf_attr(idc.INF_SHORT_DN) # short form of demangled names
fnames = {}
for f in idautils.Functions():
fname = idc.get_name(f)
if fname.startswith("sub_"):
continue
name_demangled = idc.demangle_name(fname, INF_SHORT_DN_ATTR)
if name_demangled:
fname = name_demangled
fnames[f] = fname
with open(idc.ARGV[1], "w") as f:
json.dump(fnames, f)
# exit IDA
idc.qexit(0)
if __name__ == "__main__":
main()

View File

@@ -0,0 +1,44 @@
'''
IDAPython script to export DOT files of function graphs.
Example usage (via IDA autonomous mode):
ida.exe -A -S_export_fimages.py "<output dir>" <fva1> [<fva2> ...] <sample_path>
'''
import os
import idc
import idaapi
import ida_gdl
def main():
if len(idc.ARGV) < 3:
# requires output directory and function VAs argument(s)
idc.qexit(-1)
# wait for auto-analysis to finish
idc.auto_wait()
out_dir = idc.ARGV[1]
fvas = [int(fva, 0x10) for fva in idc.ARGV[2:]]
idb_name = os.path.split(idc.get_idb_path())[-1]
for fva in fvas:
fstart = idc.get_func_attr(fva, idc.FUNCATTR_START)
name = '%s_0x%x' % (idb_name.replace('.', '_'), fstart)
out_path = os.path.join(out_dir, name)
fname = idc.get_name(fstart)
if not ida_gdl.gen_flow_graph(out_path, '%s (0x%x)' % (fname, fstart), idaapi.get_func(fstart), 0, 0,
ida_gdl.CHART_GEN_DOT | ida_gdl.CHART_PRINT_NAMES):
print('IDA error generating flow graph')
# TODO add label to DOT file, see https://stackoverflow.com/a/6452088/10548020
# TODO highlight where rule matched
# exit IDA
idc.qexit(0)
if __name__ == "__main__":
main()

View File

@@ -0,0 +1,102 @@
'''
Freeze capa features.
Example usage:
freeze_features.py <test files dir>
freeze_features.py samples\benign
'''
import os
import sys
import time
import logging
import argparse
from scripts.testbed import FREEZE_EXTENSION
from capa.features.freeze import main as freeze_features
# only process files with these extensions
TARGET_EXTENSIONS = [
'.mal_',
'.exe_',
'.dll_',
'.sys_'
]
logger = logging.getLogger('freeze_features')
def freeze(input_path, reprocess):
if not os.path.exists(input_path):
raise IOError('%s does not exist or cannot be accessed' % input_path)
if os.path.isfile(input_path):
outfile = '%s%s' % (input_path, FREEZE_EXTENSION)
freeze_file(input_path, outfile, reprocess)
elif os.path.isdir(input_path):
logger.info('freezing features of %s files in %s', '|'.join(TARGET_EXTENSIONS), input_path)
for root, dirs, files in os.walk(input_path):
for file in files:
if not os.path.splitext(file)[1] in TARGET_EXTENSIONS:
logger.debug('skipping non-target file: %s', file)
continue
path = os.path.join(root, file)
outfile = '%s%s' % (path, FREEZE_EXTENSION)
freeze_file(path, outfile, reprocess)
def freeze_file(path, output, reprocess=False):
logger.info('freezing features of %s', path)
if os.path.exists(output) and not reprocess:
logger.info('%s already exists, provide -r argument to reprocess', output)
return
try:
freeze_features([path, output]) # args: sample, output
except Exception as e:
logger.error('could not freeze features for %s: %s', path, str(e))
def main(argv=None):
if argv is None:
argv = sys.argv[1:]
parser = argparse.ArgumentParser(description="Freeze capa features of a file or of files in a directory")
parser.add_argument("file_path", type=str,
help="Path to file or directory to analyze")
parser.add_argument("-r", "--reprocess", action="store_true", default=False,
help="Overwrite existing analysis")
parser.add_argument("-v", "--verbose", action="store_true",
help="Enable verbose output")
parser.add_argument("-q", "--quiet", action="store_true",
help="Disable all output but errors")
args = parser.parse_args(args=argv)
if args.quiet:
logging.basicConfig(level=logging.ERROR)
logging.getLogger().setLevel(logging.ERROR)
elif args.verbose:
logging.basicConfig(level=logging.DEBUG)
logging.getLogger().setLevel(logging.DEBUG)
else:
logging.basicConfig(level=logging.INFO)
logging.getLogger().setLevel(logging.INFO)
time0 = time.time()
try:
freeze(args.file_path, args.reprocess)
except IOError as e:
logger.error('%s', str(e))
return -1
logger.info('freezing features took %d seconds', time.time() - time0)
return 0
if __name__ == "__main__":
sys.exit(main())

View File

@@ -0,0 +1,297 @@
'''
Run a capa rule file against the testbed (frozen features in a directory).
Example usage:
run_rule_on_testbed.py <path to rules> <rule name> <testbed dir>
run_rule_on_testbed.py ..\\rules "create pipe" samples
'''
import os
import sys
import json
import time
import logging
from collections import defaultdict
import argparse
import capa.main
import capa.rules
import capa.features.freeze
from scripts.testbed import FNAMES_EXTENSION, FREEZE_EXTENSION
from start_ida_export_fimages import export_fimages
logger = logging.getLogger(__name__)
# sorry globals...
file_count = 0
file_hits = 0
mal_hits = 0
other_hits = 0
function_hits = 0
errors = 0
function_names = set([])
CATEGORY = {
'malicious': 'MAL',
'benign': 'BEN',
}
def check_rule(path, rules, rule_name, only_matching, save_image, verbose):
global file_count, file_hits, mal_hits, other_hits, function_hits, errors
try:
capabilities = get_capabilities(path, rules)
except (ValueError, KeyError) as e:
logger.error('cannot load %s due to %s: %s', path, type(e).__name__, str(e))
errors += 1
return
file_count += 1
hits = get_function_hits(capabilities, rule_name)
if hits == 0:
if not only_matching:
render_no_hit(path)
else:
print('[x] rule matches %d function(s) in %s (%s)' % (hits, path, get_category(path)))
file_hits += 1
function_hits += hits
if get_category(path) == 'MAL':
mal_hits += 1
else:
other_hits += 1
if verbose:
render_hit_verbose(capabilities, path, verbose > 1)
if save_image:
fvas = ['0x%x' % fva for fva in get_hit_fvas(capabilities)]
file_path = get_idb_or_sample_path(path)
if file_path:
if not export_fimages(file_path, save_image, fvas):
logger.warning('exporting images failed')
else:
logger.warning('could not get IDB or sample path')
def get_idb_or_sample_path(path):
exts = ['.idb', '.i64', '.exe_', '.dll_', '.mal_']
roots = [os.path.splitext(path)[0], path]
for e in exts:
for r in roots:
p = '%s%s' % (r, e)
if os.path.exists(p):
return p
return None
def get_capabilities(path, rules):
logger.debug('matching rules in %s', path)
with open(path, 'rb') as f:
extractor = capa.features.freeze.load(f.read())
return capa.main.find_capabilities(rules, extractor, disable_progress=True)
def get_function_hits(capabilities, rule_name):
return len(capabilities.get(rule_name, []))
def get_category(path):
for c in CATEGORY:
if c in path:
return CATEGORY[c]
return 'UNK'
def render_no_hit(path):
print('[ ] no match in %s (%s)' % (path, get_category(path)))
def render_hit_verbose(capabilities, path, vverbose):
try:
fnames = load_fnames(path)
except IOError as e:
logger.error('%s', str(e))
fnames = None
for rule, ress in capabilities.items():
for (fva, res) in sorted(ress, key=lambda p: p[0]):
if fnames and fva in fnames:
fname = fnames[fva]
function_names.add(fname)
else:
fname = '<name unknown>'
print(' - function 0x%x (%s)' % (fva, fname))
if vverbose:
capa.main.render_result(res, indent=' ')
def get_hit_fvas(capabilities):
fvas = []
for rule, ress in capabilities.items():
for (fva, res) in sorted(ress, key=lambda p: p[0]):
fvas.append(fva)
return fvas
def load_fnames(path):
fnames_path = path.replace(FREEZE_EXTENSION, FNAMES_EXTENSION)
if not os.path.exists(fnames_path):
raise IOError('%s does not exist' % fnames_path)
logger.debug('fnames path: %s', fnames_path)
try:
# json file with format { fva: fname }
fnames = load_json(fnames_path)
logger.debug('loaded JSON file')
except TypeError:
# csv file with format idbmd5;md5;fva;fname
fnames = load_csv(fnames_path)
logger.debug('loaded CSV file')
fnames = convert_keys_to_int(fnames)
logger.debug('read %d function names' % len(fnames))
return fnames
def load_json(path):
with open(path, 'r') as f:
try:
funcs = json.load(f)
except ValueError as e:
logger.debug('not a JSON file, %s', str(e))
raise TypeError
return funcs
def load_csv(path):
funcs = defaultdict(str)
with open(path, 'r') as f:
data = f.read().splitlines()
for line in data:
try:
# semicolon-separated entries: idbmd5;md5;fva;fname
idbmd5, md5, fva, name = line.split(';', 3)
except ValueError as e:
logger.warning('%s: "%s"', str(e), line)
continue
funcs[fva] = name
return funcs
def convert_keys_to_int(funcs_in):
funcs = {}
for k, v in funcs_in.items():
try:
k = int(k)
except ValueError:
k = int(k, 0x10)
funcs[k] = v
return funcs
def print_summary(verbose, start_time):
global file_count, file_hits, function_hits, errors
print('\n[SUMMARY]')
m, s = divmod(time.time() - start_time, 60)
logger.info('ran for %d:%02d minutes', m, s)
ratio = ' (%d%%)' % ((float(file_hits) / file_count) * 100) if file_count else ''
print('matched %d function(s) in %d/%d%s sample(s), encountered %d error(s)' % (
function_hits, file_hits, file_count, ratio, errors))
print('%d hits on (MAL) files; %d hits on other files' % (mal_hits, other_hits))
if verbose:
if len(function_names) > 0:
print('matched function names (unique):')
for fname in function_names:
print(' - %s' % fname)
def main(argv=None):
if argv is None:
argv = sys.argv[1:]
parser = argparse.ArgumentParser(description="Run capa rule file against frozen features in a directory")
parser.add_argument("rules", type=str,
help="Path to directory containing rules")
parser.add_argument("rule_name", type=str,
help="Name of rule to test")
parser.add_argument("frozen_path", type=str,
help="Path to frozen feature file or directory")
parser.add_argument("-f", "--fast", action="store_true",
help="Don't test slow files")
parser.add_argument("-o", "--only_matching", action="store_true",
help="Print only if rule matches")
parser.add_argument("-s", "--save_image", action="store",
help="Directory to save exported images of function graphs")
parser.add_argument("-v", "--verbose", action="count", default=0,
help="Increase output verbosity")
parser.add_argument("-q", "--quiet", action="store_true",
help="Disable all output but errors")
args = parser.parse_args(args=argv)
if args.quiet:
logging.basicConfig(level=logging.ERROR)
logging.getLogger().setLevel(logging.ERROR)
elif args.verbose:
logging.basicConfig(level=logging.DEBUG)
logging.getLogger().setLevel(logging.DEBUG)
else:
logging.basicConfig(level=logging.INFO)
logging.getLogger().setLevel(logging.INFO)
if not os.path.isdir(args.rules):
logger.error('%s is not a directory', args.rules)
return -1
# load rule
try:
rules = capa.main.get_rules(args.rules)
rules = list(capa.rules.get_rules_and_dependencies(rules, args.rule_name))
rules = capa.rules.RuleSet(rules)
except IOError as e:
logger.error('%s', str(e))
return -1
except capa.rules.InvalidRule as e:
logger.error('%s', str(e))
return -1
time0 = time.time()
print('[RULE %s]' % args.rule_name)
if os.path.isfile(args.frozen_path):
check_rule(args.frozen_path, rules, args.rule_name, args.only_matching, args.save_image, args.verbose)
try:
# get only freeze files from directory
freeze_files = []
for root, dirs, files in os.walk(args.frozen_path):
for file in files:
if not file.endswith(FREEZE_EXTENSION):
continue
path = os.path.join(root, file)
if args.fast and 'slow' in path:
logger.debug('fast mode skipping %s', path)
continue
freeze_files.append(path)
for path in sorted(freeze_files):
sample_time0 = time.time()
check_rule(path, rules, args.rule_name, args.only_matching, args.save_image, args.verbose)
logger.debug('rule check took %d seconds', time.time() - sample_time0)
except KeyboardInterrupt:
logger.info('Received keyboard interrupt, terminating')
print_summary(args.verbose, time0)
if __name__ == "__main__":
sys.exit(main())


@@ -0,0 +1,131 @@
'''
Start IDA Pro in autonomous mode to dump JSON file of function names { fva: fname }.
Processes a single file or a directory.
Only runs on files with supported file extensions.
Example usage:
start_ida_dump_fnames.py <candidate files dir>
start_ida_dump_fnames.py samples\benign
'''
import os
import sys
import json
import hashlib
import logging
import subprocess
import argparse
from scripts.testbed import FNAMES_EXTENSION
IDA32_PATH = 'C:\\Program Files\\IDA Pro 7.3\\ida.exe'
IDA64_PATH = 'C:\\Program Files\\IDA Pro 7.3\\ida64.exe'
# expected in same directory as this file
DUMP_SCRIPT_PATH = os.path.join(os.path.dirname(os.path.abspath(__file__)), '_dump_fnames.py')
SUPPORTED_EXTENSIONS = [
'.exe_',
'.dll_',
'.sys_',
'.idb',
'.i64',
]
logger = logging.getLogger(__name__)
def call_ida_dump_script(sample_path, reprocess):
''' call IDA in autonomous mode and return True if success, False on failure '''
logger.info('processing %s (MD5: %s)', sample_path, get_md5_hexdigest(sample_path))
# TODO detect 64-bit binaries
if os.path.splitext(sample_path)[-1] == '.i64':
IDA_PATH = IDA64_PATH
else:
IDA_PATH = IDA32_PATH
if sample_path.endswith('.idb') or sample_path.endswith('.i64'):
sample_path = sample_path[:-4]
fnames = '%s%s' % (sample_path, FNAMES_EXTENSION)
if os.path.exists(fnames) and not reprocess:
logger.info('%s already exists and contains %d function names, provide -r argument to reprocess',
fnames, len(get_function_names(fnames)))
return True
out_path = os.path.split(fnames)[-1] # relative to IDA database file
args = [IDA_PATH, '-A', '-S%s "%s"' % (DUMP_SCRIPT_PATH, out_path), sample_path]
logger.debug('calling "%s"', ' '.join(args))
subprocess.call(args)
if not os.path.exists(fnames):
logger.warning('%s was not created', fnames)
return False
logger.debug('extracted %d function names to %s', len(get_function_names(fnames)), fnames)
return True
def get_md5_hexdigest(sample_path):
m = hashlib.md5()
with open(sample_path, 'rb') as f:
m.update(f.read())
return m.hexdigest()
def get_function_names(fnames_file):
if not os.path.exists(fnames_file):
return None
with open(fnames_file, 'r') as f:
return json.load(f)
def main():
parser = argparse.ArgumentParser(
description="Launch IDA Pro in autonomous mode to dump function names of a file or of files in a directory")
parser.add_argument("file_path", type=str,
help="File or directory path to analyze")
parser.add_argument("-r", "--reprocess", action="store_true", default=False,
help="Overwrite existing analysis")
parser.add_argument("-v", "--verbose", action="store_true",
help="Enable verbose output")
args = parser.parse_args(args=sys.argv[1:])
if args.verbose:
logging.basicConfig(level=logging.DEBUG)
logging.getLogger().setLevel(logging.DEBUG)
else:
logging.basicConfig(level=logging.INFO)
logging.getLogger().setLevel(logging.INFO)
if not os.path.exists(args.file_path):
logger.warning('%s does not exist', args.file_path)
return -1
if os.path.isfile(args.file_path):
call_ida_dump_script(args.file_path, args.reprocess)
return 0
errors = 0
logger.info('processing files in %s with file extension %s', args.file_path, '|'.join(SUPPORTED_EXTENSIONS))
for root, dirs, files in os.walk(args.file_path):
for file in files:
if os.path.splitext(file)[1] not in SUPPORTED_EXTENSIONS:
logger.debug('%s does not have supported file extension', file)
continue
path = os.path.join(root, file)
if not call_ida_dump_script(path, args.reprocess):
errors += 1
if errors:
logger.warning('encountered %d errors', errors)
return 0
if __name__ == "__main__":
sys.exit(main())
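The docstring above describes the dump script's output as a JSON file of `{ fva: fname }`. A minimal sketch of that format and its round-trip through `json` (the addresses and names here are hypothetical, and the exact key encoding used by the IDA script may differ):

```python
import json

# hypothetical example of the { fva: fname } mapping the IDA script writes
fnames = {
    "0x401000": "sub_401000",
    "0x402f90": "mw_decode_config",
}

# the launcher reads this back via json.load to count extracted names
serialized = json.dumps(fnames)
restored = json.loads(serialized)
assert restored == fnames
```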


@@ -0,0 +1,135 @@
'''
Start IDA Pro in autonomous mode to export images of function graphs.
Example usage:
start_ida_export_fimages.py <target file> <output dir> -f <function list>
start_ida_export_fimages.py test.exe imgs -f 0x401000,0x402F90
'''
import os
import imp
import sys
import hashlib
import logging
import subprocess
import argparse
try:
imp.find_module('graphviz')
from graphviz import Source
graphviz_found = True
except ImportError:
graphviz_found = False
IDA32_PATH = 'C:\\Program Files\\IDA Pro 7.3\\ida.exe'
IDA64_PATH = 'C:\\Program Files\\IDA Pro 7.3\\ida64.exe'
# expected in same directory as this file
EXPORT_SCRIPT_PATH = os.path.join(os.path.dirname(os.path.abspath(__file__)), '_export_fimages.py')
logger = logging.getLogger(__name__)
def export_fimages(file_path, out_dir, functions, manual=False):
'''
Export images of function graphs.
:param file_path: file to analyze
:param out_dir: output directory
:param functions: list of strings of hex formatted fvas
:param manual: non-autonomous mode
:return: True on success, False otherwise
'''
if not graphviz_found:
logger.warning('please install graphviz to export images')
return False
if not os.path.exists(out_dir):
os.mkdir(out_dir)
script_args = [os.path.abspath(out_dir)] + functions
call_ida_script(EXPORT_SCRIPT_PATH, script_args, file_path, manual)
img_count = 0
for root, dirs, files in os.walk(out_dir):
for file in files:
if not file.endswith('.dot'):
continue
try:
s = Source.from_file(file, directory=out_dir)
s.render(file, directory=out_dir, format='png', cleanup=True)
img_count += 1
except Exception:
logger.warning('graphviz error rendering %s', file)
if img_count > 0:
logger.info('exported %d function graph images to "%s"', img_count, os.path.abspath(out_dir))
return True
else:
logger.warning('failed to export function graph images')
return False
def call_ida_script(script_path, script_args, sample_path, manual):
logger.info('processing %s (MD5: %s)', sample_path, get_md5_hexdigest(sample_path))
# TODO detect 64-bit binaries
if os.path.splitext(sample_path)[-1] == '.i64':
IDA_PATH = IDA64_PATH
else:
IDA_PATH = IDA32_PATH
args = [IDA_PATH, '-A', '-S%s %s' % (script_path, ' '.join(script_args)), sample_path]
if manual:
args.remove('-A')
logger.debug('calling "%s"', ' '.join(args))
return subprocess.call(args) == 0
def get_md5_hexdigest(sample_path):
m = hashlib.md5()
with open(sample_path, 'rb') as f:
m.update(f.read())
return m.hexdigest()
def main():
parser = argparse.ArgumentParser(
description="Launch IDA Pro in autonomous mode to export images of function graphs")
parser.add_argument("file_path", type=str,
help="File to export from")
parser.add_argument("out_dir", type=str,
help="Export target directory")
parser.add_argument("-f", "--functions", action="store",
help="Comma separated list of functions to export")
parser.add_argument("-m", "--manual", action="store_true",
help="Manual mode: show IDA dialog boxes")
parser.add_argument("-v", "--verbose", action="store_true",
help="Enable verbose output")
args = parser.parse_args(args=sys.argv[1:])
if args.verbose:
logging.basicConfig(level=logging.DEBUG)
logging.getLogger().setLevel(logging.DEBUG)
else:
logging.basicConfig(level=logging.INFO)
logging.getLogger().setLevel(logging.INFO)
if not os.path.isfile(args.file_path):
logger.warning('%s is not a file', args.file_path)
return -1
functions = args.functions.split(',') if args.functions else []
export_fimages(args.file_path, args.out_dir, functions, args.manual)
return 0
if __name__ == "__main__":
sys.exit(main())

setup.py (new file, 62 lines)
@@ -0,0 +1,62 @@
import os
import sys
import setuptools
requirements = [
"six",
"tqdm",
"pyyaml",
"tabulate",
]
if sys.version_info >= (3, 0):
# py3
requirements.append("networkx")
else:
# py2
requirements.append("enum34")
requirements.append("vivisect")
requirements.append("viv-utils")
requirements.append("networkx==2.2") # v2.2 is last version supported by Python 2.7
# this sets __version__
# via: http://stackoverflow.com/a/7071358/87207
# and: http://stackoverflow.com/a/2073599/87207
with open(os.path.join("capa", "version.py"), "rb") as f:
exec(f.read())
def get_rule_paths():
return [os.path.join('..', x[0], '*.yml') for x in os.walk('rules')]
setuptools.setup(
name='capa',
version=__version__,
description="",
long_description="",
author="Willi Ballenthin, Moritz Raabe",
author_email='william.ballenthin@mandiant.com, moritz.raabe@mandiant.com',
url='https://www.github.com/fireeye/capa',
packages=setuptools.find_packages(exclude=['tests', 'testbed']),
package_dir={'capa': 'capa'},
package_data={'capa': get_rule_paths()},
entry_points={
"console_scripts": [
"capa=capa.main:main",
]
},
include_package_data=True,
install_requires=requirements,
zip_safe=False,
keywords='capa',
classifiers=[
'Development Status :: 3 - Alpha',
'Intended Audience :: Developers',
'Natural Language :: English',
"Programming Language :: Python :: 2",
"Programming Language :: Python :: 3",
],
)
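setup.py sets `__version__` by exec'ing the contents of `capa/version.py` rather than importing the package, so the version is available before capa's dependencies are installed. The pattern, with a hypothetical version string standing in for the real file:

```python
# hypothetical stand-in for the contents of capa/version.py
version_py = b'__version__ = "1.0.0"\n'

# exec the file's bytes into a namespace; this defines __version__
# without importing capa (which would require its dependencies)
namespace = {}
exec(version_py, namespace)
assert namespace["__version__"] == "1.0.0"
```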

tests/fixtures.py (new file, 78 lines)
@@ -0,0 +1,78 @@
import os
import os.path
import collections
import pytest
import viv_utils
CD = os.path.dirname(__file__)
Sample = collections.namedtuple('Sample', ['vw', 'path'])
@pytest.fixture
def mimikatz():
path = os.path.join(CD, 'data', 'mimikatz.exe_')
return Sample(viv_utils.getWorkspace(path), path)
@pytest.fixture
def sample_a933a1a402775cfa94b6bee0963f4b46():
path = os.path.join(CD, 'data', 'a933a1a402775cfa94b6bee0963f4b46.dll_')
return Sample(viv_utils.getWorkspace(path), path)
@pytest.fixture
def kernel32():
path = os.path.join(CD, 'data', 'kernel32.dll_')
return Sample(viv_utils.getWorkspace(path), path)
@pytest.fixture
def sample_a198216798ca38f280dc413f8c57f2c2():
path = os.path.join(CD, 'data', 'a198216798ca38f280dc413f8c57f2c2.exe_')
return Sample(viv_utils.getWorkspace(path), path)
@pytest.fixture
def sample_9324d1a8ae37a36ae560c37448c9705a():
path = os.path.join(CD, 'data', '9324d1a8ae37a36ae560c37448c9705a.exe_')
return Sample(viv_utils.getWorkspace(path), path)
@pytest.fixture
def pma_lab_12_04():
path = os.path.join(CD, 'data', 'Practical Malware Analysis Lab 12-04.exe_')
return Sample(viv_utils.getWorkspace(path), path)
@pytest.fixture
def sample_bfb9b5391a13d0afd787e87ab90f14f5():
path = os.path.join(CD, 'data', 'bfb9b5391a13d0afd787e87ab90f14f5.dll_')
return Sample(viv_utils.getWorkspace(path), path)
@pytest.fixture
def sample_lab21_01():
path = os.path.join(CD, 'data', 'Practical Malware Analysis Lab 21-01.exe_')
return Sample(viv_utils.getWorkspace(path), path)
@pytest.fixture
def sample_c91887d861d9bd4a5872249b641bc9f9():
path = os.path.join(CD, 'data', 'c91887d861d9bd4a5872249b641bc9f9.exe_')
return Sample(viv_utils.getWorkspace(path), path)
@pytest.fixture
def sample_39c05b15e9834ac93f206bc114d0a00c357c888db567ba8f5345da0529cbed41():
path = os.path.join(CD, 'data', '39c05b15e9834ac93f206bc114d0a00c357c888db567ba8f5345da0529cbed41.dll_')
return Sample(viv_utils.getWorkspace(path), path)
@pytest.fixture
def sample_499c2a85f6e8142c3f48d4251c9c7cd6_raw32():
path = os.path.join(CD, 'data', '499c2a85f6e8142c3f48d4251c9c7cd6.raw32')
return Sample(viv_utils.getShellcodeWorkspace(path), path)

tests/test_engine.py (new file, 218 lines)
@@ -0,0 +1,218 @@
import textwrap
import capa.rules
import capa.engine
from capa.engine import *
import capa.features
def test_element():
assert Element(1).evaluate(set([0])) == False
assert Element(1).evaluate(set([1])) == True
assert Element(1).evaluate(set([None])) == False
assert Element(1).evaluate(set([''])) == False
assert Element(1).evaluate(set([False])) == False
def test_and():
assert And(Element(1)).evaluate(set([0])) == False
assert And(Element(1)).evaluate(set([1])) == True
assert And(Element(1), Element(2)).evaluate(set([0])) == False
assert And(Element(1), Element(2)).evaluate(set([1])) == False
assert And(Element(1), Element(2)).evaluate(set([2])) == False
assert And(Element(1), Element(2)).evaluate(set([1, 2])) == True
def test_or():
assert Or(Element(1)).evaluate(set([0])) == False
assert Or(Element(1)).evaluate(set([1])) == True
assert Or(Element(1), Element(2)).evaluate(set([0])) == False
assert Or(Element(1), Element(2)).evaluate(set([1])) == True
assert Or(Element(1), Element(2)).evaluate(set([2])) == True
assert Or(Element(1), Element(2)).evaluate(set([1, 2])) == True
def test_not():
assert Not(Element(1)).evaluate(set([0])) == True
assert Not(Element(1)).evaluate(set([1])) == False
def test_some():
assert Some(0, Element(1)).evaluate(set([0])) == True
assert Some(1, Element(1)).evaluate(set([0])) == False
assert Some(2, Element(1), Element(2), Element(3)).evaluate(set([0])) == False
assert Some(2, Element(1), Element(2), Element(3)).evaluate(set([0, 1])) == False
assert Some(2, Element(1), Element(2), Element(3)).evaluate(set([0, 1, 2])) == True
assert Some(2, Element(1), Element(2), Element(3)).evaluate(set([0, 1, 2, 3])) == True
assert Some(2, Element(1), Element(2), Element(3)).evaluate(set([0, 1, 2, 3, 4])) == True
def test_complex():
assert True == Or(
And(Element(1), Element(2)),
Or(Element(3),
Some(2, Element(4), Element(5), Element(6)))
).evaluate(set([5, 6, 7, 8]))
assert False == Or(
And(Element(1), Element(2)),
Or(Element(3),
Some(2, Element(4), Element(5)))
).evaluate(set([5, 6, 7, 8]))
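The boolean-logic assertions above can be reproduced by a small model of the evaluation semantics. This is an illustrative sketch only, not capa.engine's actual implementation; the class and method names mirror the tests, everything else is assumed:

```python
class Element:
    # a bare element matches when its value is present in the feature set
    def __init__(self, value):
        self.value = value

    def evaluate(self, features):
        return self.value in features


class And:
    def __init__(self, *children):
        self.children = children

    def evaluate(self, features):
        return all(c.evaluate(features) for c in self.children)


class Or:
    def __init__(self, *children):
        self.children = children

    def evaluate(self, features):
        return any(c.evaluate(features) for c in self.children)


class Not:
    def __init__(self, child):
        self.child = child

    def evaluate(self, features):
        return not self.child.evaluate(features)


class Some:
    # "N or more": match when at least `count` children match
    def __init__(self, count, *children):
        self.count = count
        self.children = children

    def evaluate(self, features):
        return sum(c.evaluate(features) for c in self.children) >= self.count
```

With these definitions, the complex assertion above holds: over the feature set `{5, 6, 7, 8}`, `Some(2, Element(4), Element(5), Element(6))` matches (two of three children present), so the outer `Or` matches.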
def test_range():
# unbounded range, but no matching feature
assert Range(Element(1)).evaluate({Element(2): {}}) == False
# unbounded range with matching feature should always match
assert Range(Element(1)).evaluate({Element(1): {}}) == True
assert Range(Element(1)).evaluate({Element(1): {0}}) == True
# unbounded max
assert Range(Element(1), min=1).evaluate({Element(1): {0}}) == True
assert Range(Element(1), min=2).evaluate({Element(1): {0}}) == False
assert Range(Element(1), min=2).evaluate({Element(1): {0, 1}}) == True
# unbounded min
assert Range(Element(1), max=0).evaluate({Element(1): {0}}) == False
assert Range(Element(1), max=1).evaluate({Element(1): {0}}) == True
assert Range(Element(1), max=2).evaluate({Element(1): {0}}) == True
assert Range(Element(1), max=2).evaluate({Element(1): {0, 1}}) == True
assert Range(Element(1), max=2).evaluate({Element(1): {0, 1, 3}}) == False
# we can do an exact match by setting min==max
assert Range(Element(1), min=1, max=1).evaluate({Element(1): {}}) == False
assert Range(Element(1), min=1, max=1).evaluate({Element(1): {1}}) == True
assert Range(Element(1), min=1, max=1).evaluate({Element(1): {1, 2}}) == False
# bounded range
assert Range(Element(1), min=1, max=3).evaluate({Element(1): {}}) == False
assert Range(Element(1), min=1, max=3).evaluate({Element(1): {1}}) == True
assert Range(Element(1), min=1, max=3).evaluate({Element(1): {1, 2}}) == True
assert Range(Element(1), min=1, max=3).evaluate({Element(1): {1, 2, 3}}) == True
assert Range(Element(1), min=1, max=3).evaluate({Element(1): {1, 2, 3, 4}}) == False
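The Range assertions describe counting semantics over a mapping from features to the set of locations where each was observed. A minimal sketch consistent with these tests follows; it is not capa's real implementation, and the key-presence check and infinite default maximum are assumptions inferred from the assertions above:

```python
class Element:
    # hashable so it can key the feature-to-locations mapping
    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        return isinstance(other, Element) and self.value == other.value

    def __hash__(self):
        return hash(self.value)


class Range:
    # matches when the number of locations at which `child` was seen
    # falls within [min, max]; the feature must be present at all,
    # and an omitted max is treated as unbounded
    def __init__(self, child, min=0, max=None):
        self.child = child
        self.min = min
        self.max = max if max is not None else float("inf")

    def evaluate(self, features):
        if self.child not in features:
            return False
        count = len(features[self.child])
        return self.min <= count <= self.max
```

Note the asymmetry the first two assertions pin down: an unbounded range over an absent feature is False, while the same range over a present feature with zero recorded locations is True.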
def test_match_adds_matched_rule_feature():
'''show that using `match` adds a feature for matched rules.'''
rule = textwrap.dedent('''
rule:
meta:
name: test rule
features:
- number: 100
''')
r = capa.rules.Rule.from_yaml(rule)
features, matches = capa.engine.match([r], {capa.features.insn.Number(100): {1}}, 0x0)
assert capa.features.MatchedRule('test rule') in features
def test_match_matched_rules():
'''show that using `match` adds a feature for matched rules.'''
rules = [
capa.rules.Rule.from_yaml(textwrap.dedent('''
rule:
meta:
name: test rule1
features:
- number: 100
''')),
capa.rules.Rule.from_yaml(textwrap.dedent('''
rule:
meta:
name: test rule2
features:
- match: test rule1
''')),
]
features, matches = capa.engine.match(capa.engine.topologically_order_rules(rules),
{capa.features.insn.Number(100): {1}}, 0x0)
assert capa.features.MatchedRule('test rule1') in features
assert capa.features.MatchedRule('test rule2') in features
# the ordering of the rules must not matter,
# the engine should match rules in an appropriate order.
features, matches = capa.engine.match(capa.engine.topologically_order_rules(reversed(rules)),
{capa.features.insn.Number(100): {1}}, 0x0)
assert capa.features.MatchedRule('test rule1') in features
assert capa.features.MatchedRule('test rule2') in features
def test_regex():
rules = [
capa.rules.Rule.from_yaml(textwrap.dedent('''
rule:
meta:
name: test rule
features:
- and:
- string: /.*bbbb.*/
''')),
capa.rules.Rule.from_yaml(textwrap.dedent('''
rule:
meta:
name: rule with implied wildcards
features:
- and:
- string: /bbbb/
''')),
capa.rules.Rule.from_yaml(textwrap.dedent('''
rule:
meta:
name: rule with anchor
features:
- and:
- string: /^bbbb/
''')),
]
features, matches = capa.engine.match(capa.engine.topologically_order_rules(rules),
{capa.features.insn.Number(100): {1}}, 0x0)
assert capa.features.MatchedRule('test rule') not in features
features, matches = capa.engine.match(capa.engine.topologically_order_rules(rules),
{capa.features.String('aaaa'): {1}}, 0x0)
assert capa.features.MatchedRule('test rule') not in features
features, matches = capa.engine.match(capa.engine.topologically_order_rules(rules),
{capa.features.String('aBBBBa'): {1}}, 0x0)
assert capa.features.MatchedRule('test rule') not in features
features, matches = capa.engine.match(capa.engine.topologically_order_rules(rules),
{capa.features.String('abbbba'): {1}}, 0x0)
assert capa.features.MatchedRule('test rule') in features
assert capa.features.MatchedRule('rule with implied wildcards') in features
assert capa.features.MatchedRule('rule with anchor') not in features
def test_regex_ignorecase():
rules = [
capa.rules.Rule.from_yaml(textwrap.dedent('''
rule:
meta:
name: test rule
features:
- and:
- string: /.*bbbb.*/i
''')),
]
features, matches = capa.engine.match(capa.engine.topologically_order_rules(rules),
{capa.features.String('aBBBBa'): {1}}, 0x0)
assert capa.features.MatchedRule('test rule') in features
def test_regex_complex():
rules = [
capa.rules.Rule.from_yaml(textwrap.dedent(r'''
rule:
meta:
name: test rule
features:
- or:
- string: /.*HARDWARE\\Key\\key with spaces\\.*/i
''')),
]
features, matches = capa.engine.match(capa.engine.topologically_order_rules(rules),
{capa.features.String(r'Hardware\Key\key with spaces\some value'): {1}}, 0x0)
assert capa.features.MatchedRule('test rule') in features

tests/test_freeze.py (new file, 173 lines)
@@ -0,0 +1,173 @@
import textwrap
import capa.main
import capa.helpers
import capa.features
import capa.features.insn
import capa.features.extractors
import capa.features.freeze
from fixtures import *
EXTRACTOR = capa.features.extractors.NullFeatureExtractor({
'file features': [
(0x402345, capa.features.Characteristic('embedded pe', True)),
],
'functions': {
0x401000: {
'features': [
(0x401000, capa.features.Characteristic('switch', True)),
],
'basic blocks': {
0x401000: {
'features': [
(0x401000, capa.features.Characteristic('tight loop', True)),
],
'instructions': {
0x401000: {
'features': [
(0x401000, capa.features.insn.Mnemonic('xor')),
(0x401000, capa.features.Characteristic('nzxor', True)),
],
},
0x401002: {
'features': [
(0x401002, capa.features.insn.Mnemonic('mov')),
]
}
}
},
}
},
}
})
def test_null_feature_extractor():
assert list(EXTRACTOR.get_functions()) == [0x401000]
assert list(EXTRACTOR.get_basic_blocks(0x401000)) == [0x401000]
assert list(EXTRACTOR.get_instructions(0x401000, 0x0401000)) == [0x401000, 0x401002]
rules = capa.rules.RuleSet([
capa.rules.Rule.from_yaml(textwrap.dedent('''
rule:
meta:
name: xor loop
scope: basic block
features:
- and:
- characteristic(tight loop): true
- mnemonic: xor
- characteristic(nzxor): true
''')),
])
capabilities = capa.main.find_capabilities(rules, EXTRACTOR)
assert 'xor loop' in capabilities
def compare_extractors(a, b):
'''
args:
a (capa.features.extractors.NullFeatureExtractor)
b (capa.features.extractors.NullFeatureExtractor)
'''
# TODO: ordering of these things probably doesn't work yet
assert list(a.extract_file_features()) == list(b.extract_file_features())
assert list(a.get_functions()) == list(b.get_functions())
for f in a.get_functions():
assert list(a.get_basic_blocks(f)) == list(b.get_basic_blocks(f))
assert list(a.extract_function_features(f)) == list(b.extract_function_features(f))
for bb in a.get_basic_blocks(f):
assert list(a.get_instructions(f, bb)) == list(b.get_instructions(f, bb))
assert list(a.extract_basic_block_features(f, bb)) == list(b.extract_basic_block_features(f, bb))
for insn in a.get_instructions(f, bb):
assert list(a.extract_insn_features(f, bb, insn)) == list(b.extract_insn_features(f, bb, insn))
def compare_extractors_viv_null(viv_ext, null_ext):
'''
almost identical to compare_extractors but adds casts to ints since the VivisectFeatureExtractor returns objects
and NullFeatureExtractor returns ints
args:
viv_ext (capa.features.extractors.viv.VivisectFeatureExtractor)
null_ext (capa.features.extractors.NullFeatureExtractor)
'''
# TODO: ordering of these things probably doesn't work yet
assert list(viv_ext.extract_file_features()) == list(null_ext.extract_file_features())
assert to_int(list(viv_ext.get_functions())) == list(null_ext.get_functions())
for f in viv_ext.get_functions():
assert to_int(list(viv_ext.get_basic_blocks(f))) == list(null_ext.get_basic_blocks(to_int(f)))
assert list(viv_ext.extract_function_features(f)) == list(null_ext.extract_function_features(to_int(f)))
for bb in viv_ext.get_basic_blocks(f):
assert to_int(list(viv_ext.get_instructions(f, bb))) == list(null_ext.get_instructions(to_int(f), to_int(bb)))
assert list(viv_ext.extract_basic_block_features(f, bb)) == list(null_ext.extract_basic_block_features(to_int(f), to_int(bb)))
for insn in viv_ext.get_instructions(f, bb):
assert list(viv_ext.extract_insn_features(f, bb, insn)) == list(null_ext.extract_insn_features(to_int(f), to_int(bb), to_int(insn)))
def to_int(o):
'''helper to get int value of extractor items'''
if isinstance(o, list):
return [capa.helpers.oint(x) for x in o]
else:
return capa.helpers.oint(o)
def test_freeze_s_roundtrip():
load = capa.features.freeze.loads
dump = capa.features.freeze.dumps
reanimated = load(dump(EXTRACTOR))
compare_extractors(EXTRACTOR, reanimated)
def test_freeze_b_roundtrip():
load = capa.features.freeze.load
dump = capa.features.freeze.dump
reanimated = load(dump(EXTRACTOR))
compare_extractors(EXTRACTOR, reanimated)
def roundtrip_feature(feature):
serialize = capa.features.freeze.serialize_feature
deserialize = capa.features.freeze.deserialize_feature
assert feature == deserialize(serialize(feature))
def test_serialize_features():
roundtrip_feature(capa.features.insn.API('advapi32.CryptAcquireContextW'))
roundtrip_feature(capa.features.String('SCardControl'))
roundtrip_feature(capa.features.insn.Number(0xFF))
roundtrip_feature(capa.features.insn.Offset(0x0))
roundtrip_feature(capa.features.insn.Mnemonic('push'))
roundtrip_feature(capa.features.file.Section('.rsrc'))
roundtrip_feature(capa.features.Characteristic('tight loop', True))
roundtrip_feature(capa.features.basicblock.BasicBlock())
roundtrip_feature(capa.features.file.Export('BaseThreadInitThunk'))
roundtrip_feature(capa.features.file.Import('kernel32.IsWow64Process'))
roundtrip_feature(capa.features.file.Import('#11'))
def test_freeze_sample(tmpdir, sample_9324d1a8ae37a36ae560c37448c9705a):
# tmpdir fixture handles cleanup
o = tmpdir.mkdir("capa").join("test.frz").strpath
assert capa.features.freeze.main([sample_9324d1a8ae37a36ae560c37448c9705a.path, o, '-v']) == 0
def test_freeze_load_sample(tmpdir, sample_9324d1a8ae37a36ae560c37448c9705a):
o = tmpdir.mkdir("capa").join("test.frz")
viv_extractor = capa.features.extractors.viv.VivisectFeatureExtractor(sample_9324d1a8ae37a36ae560c37448c9705a.vw,
sample_9324d1a8ae37a36ae560c37448c9705a.path)
with open(o.strpath, 'wb') as f:
f.write(capa.features.freeze.dump(viv_extractor))
null_extractor = capa.features.freeze.load(o.open('rb').read())
compare_extractors_viv_null(viv_extractor, null_extractor)

tests/test_helpers.py (new file, 16 lines)
@@ -0,0 +1,16 @@
import codecs
from capa.features.extractors import helpers
def test_all_zeros():
# Python 2: <str>
# Python 3: <bytes>
a = b'\x00\x00\x00\x00'
b = codecs.decode('00000000', 'hex')
c = b'\x01\x00\x00\x00'
d = codecs.decode('01000000', 'hex')
assert helpers.all_zeros(a) is True
assert helpers.all_zeros(b) is True
assert helpers.all_zeros(c) is False
assert helpers.all_zeros(d) is False
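A minimal implementation consistent with these assertions (capa's actual helper may differ, particularly in how it handles Python 2 `str`; this sketch assumes Python 3, where iterating `bytes` yields ints):

```python
def all_zeros(buf):
    # True when every byte in the buffer is 0x00
    return all(b == 0 for b in buf)
```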

tests/test_main.py (new file, 188 lines)
@@ -0,0 +1,188 @@
import textwrap
import capa.main
import capa.rules
import capa.engine
from capa.engine import *
import capa.features
import capa.features.extractors.viv
from fixtures import *
def test_main(sample_9324d1a8ae37a36ae560c37448c9705a):
# tests rules can be loaded successfully
assert capa.main.main([sample_9324d1a8ae37a36ae560c37448c9705a.path, '-v']) == 0
def test_main_shellcode(sample_499c2a85f6e8142c3f48d4251c9c7cd6_raw32):
assert capa.main.main([sample_499c2a85f6e8142c3f48d4251c9c7cd6_raw32.path, '-v', '-f', 'sc32']) == 0
def test_ruleset():
rules = capa.rules.RuleSet([
capa.rules.Rule.from_yaml(textwrap.dedent('''
rule:
meta:
name: file rule
scope: file
features:
- characteristic(embedded pe): y
''')),
capa.rules.Rule.from_yaml(textwrap.dedent('''
rule:
meta:
name: function rule
scope: function
features:
- characteristic(switch): y
''')),
capa.rules.Rule.from_yaml(textwrap.dedent('''
rule:
meta:
name: basic block rule
scope: basic block
features:
- characteristic(nzxor): y
''')),
])
assert len(rules.file_rules) == 1
assert len(rules.function_rules) == 1
assert len(rules.basic_block_rules) == 1
def test_match_across_scopes_file_function(sample_9324d1a8ae37a36ae560c37448c9705a):
rules = capa.rules.RuleSet([
# this rule should match on a function (0x4073F0)
capa.rules.Rule.from_yaml(textwrap.dedent('''
rule:
meta:
name: install service
scope: function
examples:
- 9324d1a8ae37a36ae560c37448c9705a:0x4073F0
features:
- and:
- api: advapi32.OpenSCManagerA
- api: advapi32.CreateServiceA
- api: advapi32.StartServiceA
''')),
# this rule should match on a file feature
capa.rules.Rule.from_yaml(textwrap.dedent('''
rule:
meta:
name: .text section
scope: file
examples:
- 9324d1a8ae37a36ae560c37448c9705a
features:
- section: .text
''')),
# this rule should match on earlier rule matches:
# - install service, with function scope
# - .text section, with file scope
capa.rules.Rule.from_yaml(textwrap.dedent('''
rule:
meta:
name: .text section and install service
scope: file
examples:
- 9324d1a8ae37a36ae560c37448c9705a
features:
- and:
- match: install service
- match: .text section
''')),
])
extractor = capa.features.extractors.viv.VivisectFeatureExtractor(sample_9324d1a8ae37a36ae560c37448c9705a.vw, sample_9324d1a8ae37a36ae560c37448c9705a.path)
capabilities = capa.main.find_capabilities(rules, extractor)
assert 'install service' in capabilities
assert '.text section' in capabilities
assert '.text section and install service' in capabilities
def test_match_across_scopes(sample_9324d1a8ae37a36ae560c37448c9705a):
rules = capa.rules.RuleSet([
# this rule should match on a basic block (including at least 0x403685)
capa.rules.Rule.from_yaml(textwrap.dedent('''
rule:
meta:
name: tight loop
scope: basic block
examples:
- 9324d1a8ae37a36ae560c37448c9705a:0x403685
features:
- characteristic(tight loop): true
''')),
# this rule should match on a function (0x403660)
# based on API, as well as prior basic block rule match
capa.rules.Rule.from_yaml(textwrap.dedent('''
rule:
meta:
name: kill thread loop
scope: function
examples:
- 9324d1a8ae37a36ae560c37448c9705a:0x403660
features:
- and:
- api: kernel32.TerminateThread
- api: kernel32.CloseHandle
- match: tight loop
''')),
# this rule should match on a file feature and a prior function rule match
capa.rules.Rule.from_yaml(textwrap.dedent('''
rule:
meta:
name: kill thread program
scope: file
examples:
- 9324d1a8ae37a36ae560c37448c9705a
features:
- and:
- section: .text
- match: kill thread loop
''')),
])
extractor = capa.features.extractors.viv.VivisectFeatureExtractor(sample_9324d1a8ae37a36ae560c37448c9705a.vw, sample_9324d1a8ae37a36ae560c37448c9705a.path)
capabilities = capa.main.find_capabilities(rules, extractor)
assert 'tight loop' in capabilities
assert 'kill thread loop' in capabilities
assert 'kill thread program' in capabilities
def test_subscope_bb_rules(sample_9324d1a8ae37a36ae560c37448c9705a):
rules = capa.rules.RuleSet([
capa.rules.Rule.from_yaml(textwrap.dedent('''
rule:
meta:
name: test rule
scope: function
features:
- and:
- basic block:
- characteristic(tight loop): true
'''))
])
# tight loop at 0x403685
extractor = capa.features.extractors.viv.VivisectFeatureExtractor(sample_9324d1a8ae37a36ae560c37448c9705a.vw, sample_9324d1a8ae37a36ae560c37448c9705a.path)
capabilities = capa.main.find_capabilities(rules, extractor)
assert 'test rule' in capabilities
def test_byte_matching(sample_9324d1a8ae37a36ae560c37448c9705a):
rules = capa.rules.RuleSet([
capa.rules.Rule.from_yaml(textwrap.dedent('''
rule:
meta:
name: byte match test
scope: function
features:
- and:
- bytes: ED 24 9E F4 52 A9 07 47 55 8E E1 AB 30 8E 23 61
'''))
])
extractor = capa.features.extractors.viv.VivisectFeatureExtractor(sample_9324d1a8ae37a36ae560c37448c9705a.vw, sample_9324d1a8ae37a36ae560c37448c9705a.path)
capabilities = capa.main.find_capabilities(rules, extractor)
assert 'byte match test' in capabilities

tests/test_rules.py (new file, 455 lines)
@@ -0,0 +1,455 @@
import textwrap
import pytest
import capa.rules
from capa.engine import Element
from capa.features.insn import Number, Offset
def test_rule_ctor():
r = capa.rules.Rule('test rule', capa.rules.FUNCTION_SCOPE, Element(1), {})
assert r.evaluate(set([0])) == False
assert r.evaluate(set([1])) == True
def test_rule_yaml():
rule = textwrap.dedent('''
rule:
meta:
name: test rule
author: user@domain.com
scope: function
examples:
- foo1234
- bar5678
features:
- and:
- element: 1
- element: 2
''')
r = capa.rules.Rule.from_yaml(rule)
assert r.evaluate(set([0])) == False
assert r.evaluate(set([0, 1])) == False
assert r.evaluate(set([0, 1, 2])) == True
assert r.evaluate(set([0, 1, 2, 3])) == True
def test_rule_yaml_complex():
rule = textwrap.dedent('''
rule:
meta:
name: test rule
features:
- or:
- and:
- element: 1
- element: 2
- or:
- element: 3
- 2 or more:
- element: 4
- element: 5
- element: 6
''')
r = capa.rules.Rule.from_yaml(rule)
assert r.evaluate(set([5, 6, 7, 8])) == True
assert r.evaluate(set([6, 7, 8])) == False
def test_rule_yaml_not():
rule = textwrap.dedent('''
rule:
meta:
name: test rule
features:
- and:
- element: 1
- not:
- element: 2
''')
r = capa.rules.Rule.from_yaml(rule)
assert r.evaluate(set([1])) == True
assert r.evaluate(set([1, 2])) == False
def test_rule_yaml_count():
rule = textwrap.dedent('''
rule:
meta:
name: test rule
features:
- count(element(100)): 1
''')
r = capa.rules.Rule.from_yaml(rule)
assert r.evaluate({Element(100): {}}) == False
assert r.evaluate({Element(100): {1}}) == True
assert r.evaluate({Element(100): {1, 2}}) == False
def test_rule_yaml_count_range():
rule = textwrap.dedent('''
rule:
meta:
name: test rule
features:
- count(element(100)): (1, 2)
''')
r = capa.rules.Rule.from_yaml(rule)
assert r.evaluate({Element(100): {}}) == False
assert r.evaluate({Element(100): {1}}) == True
assert r.evaluate({Element(100): {1, 2}}) == True
assert r.evaluate({Element(100): {1, 2, 3}}) == False
def test_invalid_rule_feature():
with pytest.raises(capa.rules.InvalidRule):
capa.rules.Rule.from_yaml(textwrap.dedent('''
rule:
meta:
name: test rule
features:
- foo: true
'''))
with pytest.raises(capa.rules.InvalidRule):
capa.rules.Rule.from_yaml(textwrap.dedent('''
rule:
meta:
name: test rule
scope: file
features:
- characteristic(nzxor): true
'''))
with pytest.raises(capa.rules.InvalidRule):
capa.rules.Rule.from_yaml(textwrap.dedent('''
rule:
meta:
name: test rule
scope: function
features:
- characteristic(embedded pe): true
'''))
with pytest.raises(capa.rules.InvalidRule):
capa.rules.Rule.from_yaml(textwrap.dedent('''
rule:
meta:
name: test rule
scope: basic block
features:
- characteristic(embedded pe): true
'''))
def test_lib_rules():
rules = capa.rules.RuleSet([
capa.rules.Rule.from_yaml(textwrap.dedent('''
rule:
meta:
name: a lib rule
lib: true
features:
- api: CreateFileA
''')),
capa.rules.Rule.from_yaml(textwrap.dedent('''
rule:
meta:
name: a standard rule
lib: false
features:
- api: CreateFileW
''')),
])
assert len(rules.function_rules) == 1
def test_subscope_rules():
rules = capa.rules.RuleSet([
capa.rules.Rule.from_yaml(textwrap.dedent('''
rule:
meta:
name: test rule
scope: file
features:
- and:
- characteristic(embedded pe): true
- function:
- and:
- characteristic(nzxor): true
- characteristic(switch): true
'''))
])
# the file rule scope will have one rule:
# - `test rule`
assert len(rules.file_rules) == 1
# the function rule scope will have one rule:
# - the rule on which `test rule` depends
assert len(rules.function_rules) == 1
def test_duplicate_rules():
with pytest.raises(capa.rules.InvalidRule):
rules = capa.rules.RuleSet([
capa.rules.Rule.from_yaml(textwrap.dedent('''
rule:
meta:
name: rule-name
features:
- api: CreateFileA
''')),
capa.rules.Rule.from_yaml(textwrap.dedent('''
rule:
meta:
name: rule-name
features:
- api: CreateFileW
''')),
])
def test_missing_dependency():
with pytest.raises(capa.rules.InvalidRule):
rules = capa.rules.RuleSet([
capa.rules.Rule.from_yaml(textwrap.dedent('''
rule:
meta:
name: dependent rule
features:
- match: missing rule
''')),
])
def test_invalid_rules():
with pytest.raises(capa.rules.InvalidRule):
r = capa.rules.Rule.from_yaml(textwrap.dedent('''
rule:
meta:
name: test rule
features:
- characteristic(number(1)): True
'''))
with pytest.raises(capa.rules.InvalidRule):
r = capa.rules.Rule.from_yaml(textwrap.dedent('''
rule:
meta:
name: test rule
features:
- characteristic(count(element(100))): True
'''))
def test_number_symbol():
rule = textwrap.dedent('''
rule:
meta:
name: test rule
features:
- and:
- number: 1
- number: -1
- number: 2 = symbol name
- number: 3 = symbol name
- number: 4 = symbol name = another name
- number: 0x100 = symbol name
- number: 0x11 = (FLAG_A | FLAG_B)
''')
r = capa.rules.Rule.from_yaml(rule)
children = list(r.statement.get_children())
assert Number(1) in children
assert Number(-1) in children
assert Number(2, 'symbol name') in children
assert Number(3, 'symbol name') in children
assert Number(4, 'symbol name = another name') in children
assert Number(0x100, 'symbol name') in children
def test_count_number_symbol():
rule = textwrap.dedent('''
rule:
meta:
name: test rule
features:
- or:
- count(number(2 = symbol name)): 1
- count(number(0x100 = symbol name)): 2 or more
- count(number(0x11 = (FLAG_A | FLAG_B))): 2 or more
''')
r = capa.rules.Rule.from_yaml(rule)
assert r.evaluate({Number(2): set()}) == False
assert r.evaluate({Number(2): {1}}) == True
assert r.evaluate({Number(2): {1, 2}}) == False
assert r.evaluate({Number(0x100, 'symbol name'): {1}}) == False
assert r.evaluate({Number(0x100, 'symbol name'): {1, 2, 3}}) == True
def test_invalid_number():
with pytest.raises(capa.rules.InvalidRule):
r = capa.rules.Rule.from_yaml(textwrap.dedent('''
rule:
meta:
name: test rule
features:
- number: "this is a string"
'''))
with pytest.raises(capa.rules.InvalidRule):
r = capa.rules.Rule.from_yaml(textwrap.dedent('''
rule:
meta:
name: test rule
features:
- number: 2=
'''))
with pytest.raises(capa.rules.InvalidRule):
r = capa.rules.Rule.from_yaml(textwrap.dedent('''
rule:
meta:
name: test rule
features:
- number: symbol name = 2
'''))
def test_offset_symbol():
rule = textwrap.dedent('''
rule:
meta:
name: test rule
features:
- and:
- offset: 1
# what about negative offsets?
- offset: 2 = symbol name
- offset: 3 = symbol name
- offset: 4 = symbol name = another name
- offset: 0x100 = symbol name
''')
r = capa.rules.Rule.from_yaml(rule)
children = list(r.statement.get_children())
assert Offset(1) in children
assert Offset(2, 'symbol name') in children
assert Offset(3, 'symbol name') in children
assert Offset(4, 'symbol name = another name') in children
assert Offset(0x100, 'symbol name') in children
def test_count_offset_symbol():
rule = textwrap.dedent('''
rule:
meta:
name: test rule
features:
- or:
- count(offset(2 = symbol name)): 1
- count(offset(0x100 = symbol name)): 2 or more
- count(offset(0x11 = (FLAG_A | FLAG_B))): 2 or more
''')
r = capa.rules.Rule.from_yaml(rule)
assert r.evaluate({Offset(2): set()}) == False
assert r.evaluate({Offset(2): {1}}) == True
assert r.evaluate({Offset(2): {1, 2}}) == False
assert r.evaluate({Offset(0x100, 'symbol name'): {1}}) == False
assert r.evaluate({Offset(0x100, 'symbol name'): {1, 2, 3}}) == True
def test_invalid_offset():
with pytest.raises(capa.rules.InvalidRule):
r = capa.rules.Rule.from_yaml(textwrap.dedent('''
rule:
meta:
name: test rule
features:
- offset: "this is a string"
'''))
with pytest.raises(capa.rules.InvalidRule):
r = capa.rules.Rule.from_yaml(textwrap.dedent('''
rule:
meta:
name: test rule
features:
- offset: 2=
'''))
with pytest.raises(capa.rules.InvalidRule):
r = capa.rules.Rule.from_yaml(textwrap.dedent('''
rule:
meta:
name: test rule
features:
- offset: symbol name = 2
'''))
def test_filter_rules():
rules = capa.rules.RuleSet([
capa.rules.Rule.from_yaml(textwrap.dedent('''
rule:
meta:
name: rule 1
author: joe
features:
- api: CreateFile
''')),
capa.rules.Rule.from_yaml(textwrap.dedent('''
rule:
meta:
name: rule 2
features:
- string: joe
''')),
])
rules = rules.filter_rules_by_meta('joe')
assert len(rules) == 1
assert ('rule 1' in rules.rules)
def test_filter_rules_dependencies():
rules = capa.rules.RuleSet([
capa.rules.Rule.from_yaml(textwrap.dedent('''
rule:
meta:
name: rule 1
features:
- match: rule 2
''')),
capa.rules.Rule.from_yaml(textwrap.dedent('''
rule:
meta:
name: rule 2
features:
- match: rule 3
''')),
capa.rules.Rule.from_yaml(textwrap.dedent('''
rule:
meta:
name: rule 3
features:
- api: CreateFile
''')),
])
rules = rules.filter_rules_by_meta('rule 1')
assert len(rules.rules) == 3
assert 'rule 1' in rules.rules
assert 'rule 2' in rules.rules
assert 'rule 3' in rules.rules
def test_filter_rules_missing_dependency():
with pytest.raises(capa.rules.InvalidRule):
capa.rules.RuleSet([
capa.rules.Rule.from_yaml(textwrap.dedent('''
rule:
meta:
name: rule 1
author: joe
features:
- match: rule 2
''')),
])

tests/test_viv_features.py (new file, 297 lines)
import collections
import viv_utils
import capa.features
import capa.features.file
import capa.features.function
import capa.features.basicblock
import capa.features.insn
import capa.features.extractors.viv.file
import capa.features.extractors.viv.function
import capa.features.extractors.viv.basicblock
import capa.features.extractors.viv.insn
from fixtures import *
def extract_file_features(vw, path):
features = set([])
for feature, va in capa.features.extractors.viv.file.extract_features(vw, path):
features.add(feature)
return features
def extract_function_features(f):
features = collections.defaultdict(set)
for bb in f.basic_blocks:
for insn in bb.instructions:
for feature, va in capa.features.extractors.viv.insn.extract_features(f, bb, insn):
features[feature].add(va)
for feature, va in capa.features.extractors.viv.basicblock.extract_features(f, bb):
features[feature].add(va)
for feature, va in capa.features.extractors.viv.function.extract_features(f):
features[feature].add(va)
return features
def extract_basic_block_features(f, bb):
features = set()
for insn in bb.instructions:
for feature, _ in capa.features.extractors.viv.insn.extract_features(f, bb, insn):
features.add(feature)
for feature, _ in capa.features.extractors.viv.basicblock.extract_features(f, bb):
features.add(feature)
return features
def test_api_features(mimikatz):
features = extract_function_features(viv_utils.Function(mimikatz.vw, 0x403BAC))
assert capa.features.insn.API('advapi32.CryptAcquireContextW') in features
assert capa.features.insn.API('advapi32.CryptAcquireContext') in features
assert capa.features.insn.API('advapi32.CryptGenKey') in features
assert capa.features.insn.API('advapi32.CryptImportKey') in features
assert capa.features.insn.API('advapi32.CryptDestroyKey') in features
assert capa.features.insn.API('CryptAcquireContextW') in features
assert capa.features.insn.API('CryptAcquireContext') in features
assert capa.features.insn.API('CryptGenKey') in features
assert capa.features.insn.API('CryptImportKey') in features
assert capa.features.insn.API('CryptDestroyKey') in features
def test_api_features_64_bit(sample_a198216798ca38f280dc413f8c57f2c2):
features = extract_function_features(viv_utils.Function(sample_a198216798ca38f280dc413f8c57f2c2.vw, 0x4011B0))
assert capa.features.insn.API('kernel32.GetStringTypeA') in features
assert capa.features.insn.API('kernel32.GetStringType') in features
assert capa.features.insn.API('GetStringTypeA') in features
assert capa.features.insn.API('GetStringType') in features
# call via thunk in IDA Pro
features = extract_function_features(viv_utils.Function(sample_a198216798ca38f280dc413f8c57f2c2.vw, 0x401CB0))
assert capa.features.insn.API('msvcrt.vfprintf') in features
assert capa.features.insn.API('vfprintf') in features
def test_string_features(mimikatz):
features = extract_function_features(viv_utils.Function(mimikatz.vw, 0x40105D))
assert capa.features.String('SCardControl') in features
assert capa.features.String('SCardTransmit') in features
assert capa.features.String('ACR > ') in features
# other strings not in this function
assert capa.features.String('bcrypt.dll') not in features
def test_byte_features(sample_9324d1a8ae37a36ae560c37448c9705a):
features = extract_function_features(viv_utils.Function(sample_9324d1a8ae37a36ae560c37448c9705a.vw, 0x406F60))
wanted = capa.features.Bytes(b"\xED\x24\x9E\xF4\x52\xA9\x07\x47\x55\x8E\xE1\xAB\x30\x8E\x23\x61")
# use `==` rather than `is` because the result is not `True` but a truthy value.
assert wanted.evaluate(features) == True
def test_byte_features64(sample_lab21_01):
features = extract_function_features(viv_utils.Function(sample_lab21_01.vw, 0x1400010C0))
wanted = capa.features.Bytes(b"\x32\xA2\xDF\x2D\x99\x2B\x00\x00")
# use `==` rather than `is` because the result is not `True` but a truthy value.
assert wanted.evaluate(features) == True
def test_number_features(mimikatz):
features = extract_function_features(viv_utils.Function(mimikatz.vw, 0x40105D))
assert capa.features.insn.Number(0xFF) in features
assert capa.features.insn.Number(0x3136B0) in features
# the following are stack adjustments
assert capa.features.insn.Number(0xC) not in features
assert capa.features.insn.Number(0x10) not in features
def test_offset_features(mimikatz):
features = extract_function_features(viv_utils.Function(mimikatz.vw, 0x40105D))
assert capa.features.insn.Offset(0x0) in features
assert capa.features.insn.Offset(0x4) in features
assert capa.features.insn.Offset(0xC) in features
# the following are stack references
assert capa.features.insn.Offset(0x8) not in features
assert capa.features.insn.Offset(0x10) not in features
def test_nzxor_features(mimikatz):
features = extract_function_features(viv_utils.Function(mimikatz.vw, 0x410DFC))
assert capa.features.Characteristic('nzxor', True) in features # 0x0410F0B
def get_bb_insn(f, va):
'''fetch the BasicBlock and Instruction instances for the given VA in the given function.'''
for bb in f.basic_blocks:
for insn in bb.instructions:
if insn.va == va:
return (bb, insn)
raise KeyError(va)
def test_is_security_cookie(mimikatz):
# not a security cookie check
f = viv_utils.Function(mimikatz.vw, 0x410DFC)
for va in [0x0410F0B]:
bb, insn = get_bb_insn(f, va)
assert capa.features.extractors.viv.insn.is_security_cookie(f, bb, insn) == False
# security cookie initial set and final check
f = viv_utils.Function(mimikatz.vw, 0x46C54A)
for va in [0x46C557, 0x46C63A]:
bb, insn = get_bb_insn(f, va)
assert capa.features.extractors.viv.insn.is_security_cookie(f, bb, insn) == True
def test_mnemonic_features(mimikatz):
features = extract_function_features(viv_utils.Function(mimikatz.vw, 0x40105D))
assert capa.features.insn.Mnemonic('push') in features
assert capa.features.insn.Mnemonic('movzx') in features
assert capa.features.insn.Mnemonic('xor') in features
assert capa.features.insn.Mnemonic('in') not in features
assert capa.features.insn.Mnemonic('out') not in features
def test_peb_access_features(sample_a933a1a402775cfa94b6bee0963f4b46):
features = extract_function_features(viv_utils.Function(sample_a933a1a402775cfa94b6bee0963f4b46.vw, 0xABA6FEC))
assert capa.features.Characteristic('peb access', True) in features
def test_file_section_name_features(mimikatz):
features = extract_file_features(mimikatz.vw, mimikatz.path)
assert capa.features.file.Section('.rsrc') in features
assert capa.features.file.Section('.text') in features
assert capa.features.file.Section('.nope') not in features
def test_tight_loop_features(mimikatz):
f = viv_utils.Function(mimikatz.vw, 0x402EC4)
for bb in f.basic_blocks:
if bb.va != 0x402F8E:
continue
features = extract_basic_block_features(f, bb)
assert capa.features.Characteristic('tight loop', True) in features
assert capa.features.basicblock.BasicBlock() in features
def test_tight_loop_bb_features(mimikatz):
f = viv_utils.Function(mimikatz.vw, 0x402EC4)
for bb in f.basic_blocks:
if bb.va != 0x402F8E:
continue
features = extract_basic_block_features(f, bb)
assert capa.features.Characteristic('tight loop', True) in features
assert capa.features.basicblock.BasicBlock() in features
def test_file_export_name_features(kernel32):
features = extract_file_features(kernel32.vw, kernel32.path)
assert capa.features.file.Export('BaseThreadInitThunk') in features
assert capa.features.file.Export('lstrlenW') in features
def test_file_import_name_features(mimikatz):
features = extract_file_features(mimikatz.vw, mimikatz.path)
assert capa.features.file.Import('advapi32.CryptSetHashParam') in features
assert capa.features.file.Import('CryptSetHashParam') in features
assert capa.features.file.Import('kernel32.IsWow64Process') in features
assert capa.features.file.Import('msvcrt.exit') in features
assert capa.features.file.Import('cabinet.#11') in features
assert capa.features.file.Import('#11') not in features
def test_cross_section_flow_features(sample_a198216798ca38f280dc413f8c57f2c2):
features = extract_function_features(viv_utils.Function(sample_a198216798ca38f280dc413f8c57f2c2.vw, 0x4014D0))
assert capa.features.Characteristic('cross section flow', True) in features
# this function has calls to some imports,
# which should not trigger cross-section flow characteristic
features = extract_function_features(viv_utils.Function(sample_a198216798ca38f280dc413f8c57f2c2.vw, 0x401563))
assert capa.features.Characteristic('cross section flow', True) not in features
def test_segment_access_features(sample_a933a1a402775cfa94b6bee0963f4b46):
features = extract_function_features(viv_utils.Function(sample_a933a1a402775cfa94b6bee0963f4b46.vw, 0xABA6FEC))
assert capa.features.Characteristic('fs access', True) in features
def test_thunk_features(sample_9324d1a8ae37a36ae560c37448c9705a):
features = extract_function_features(viv_utils.Function(sample_9324d1a8ae37a36ae560c37448c9705a.vw, 0x407970))
assert capa.features.insn.API('kernel32.CreateToolhelp32Snapshot') in features
assert capa.features.insn.API('CreateToolhelp32Snapshot') in features
def test_file_embedded_pe(pma_lab_12_04):
features = extract_file_features(pma_lab_12_04.vw, pma_lab_12_04.path)
assert capa.features.Characteristic('embedded pe', True) in features
def test_stackstring_features(mimikatz):
features = extract_function_features(viv_utils.Function(mimikatz.vw, 0x4556E5))
assert capa.features.Characteristic('stack string', True) in features
def test_switch_features(mimikatz):
features = extract_function_features(viv_utils.Function(mimikatz.vw, 0x409411))
assert capa.features.Characteristic('switch', True) in features
features = extract_function_features(viv_utils.Function(mimikatz.vw, 0x409393))
assert capa.features.Characteristic('switch', True) not in features
def test_recursive_call_feature(sample_39c05b15e9834ac93f206bc114d0a00c357c888db567ba8f5345da0529cbed41):
features = extract_function_features(viv_utils.Function(sample_39c05b15e9834ac93f206bc114d0a00c357c888db567ba8f5345da0529cbed41.vw, 0x10003100))
assert capa.features.Characteristic('recursive call', True) in features
features = extract_function_features(viv_utils.Function(sample_39c05b15e9834ac93f206bc114d0a00c357c888db567ba8f5345da0529cbed41.vw, 0x10007B00))
assert capa.features.Characteristic('recursive call', True) not in features
def test_loop_feature(sample_39c05b15e9834ac93f206bc114d0a00c357c888db567ba8f5345da0529cbed41):
features = extract_function_features(viv_utils.Function(sample_39c05b15e9834ac93f206bc114d0a00c357c888db567ba8f5345da0529cbed41.vw, 0x10003D30))
assert capa.features.Characteristic('loop', True) in features
features = extract_function_features(viv_utils.Function(sample_39c05b15e9834ac93f206bc114d0a00c357c888db567ba8f5345da0529cbed41.vw, 0x10007250))
assert capa.features.Characteristic('loop', True) not in features
def test_file_string_features(sample_bfb9b5391a13d0afd787e87ab90f14f5):
features = extract_file_features(sample_bfb9b5391a13d0afd787e87ab90f14f5.vw, sample_bfb9b5391a13d0afd787e87ab90f14f5.path)
assert capa.features.String('WarStop') in features # ASCII, offset 0x40EC
assert capa.features.String('cimage/png') in features # UTF-16 LE, offset 0x350E
def test_function_calls_to(sample_9324d1a8ae37a36ae560c37448c9705a):
features = extract_function_features(viv_utils.Function(sample_9324d1a8ae37a36ae560c37448c9705a.vw, 0x406F60))
assert capa.features.Characteristic('calls to', True) in features
assert len(features[capa.features.Characteristic('calls to', True)]) == 1
def test_function_calls_to64(sample_lab21_01):
features = extract_function_features(viv_utils.Function(sample_lab21_01.vw, 0x1400052D0)) # memcpy
assert capa.features.Characteristic('calls to', True) in features
assert len(features[capa.features.Characteristic('calls to', True)]) == 8
def test_function_calls_from(sample_9324d1a8ae37a36ae560c37448c9705a):
features = extract_function_features(viv_utils.Function(sample_9324d1a8ae37a36ae560c37448c9705a.vw, 0x406F60))
assert capa.features.Characteristic('calls from', True) in features
assert len(features[capa.features.Characteristic('calls from', True)]) == 23
def test_basic_block_count(sample_9324d1a8ae37a36ae560c37448c9705a):
features = extract_function_features(viv_utils.Function(sample_9324d1a8ae37a36ae560c37448c9705a.vw, 0x406F60))
assert len(features[capa.features.basicblock.BasicBlock()]) == 26
def test_indirect_call_features(sample_a933a1a402775cfa94b6bee0963f4b46):
features = extract_function_features(viv_utils.Function(sample_a933a1a402775cfa94b6bee0963f4b46.vw, 0xABA68A0))
assert capa.features.Characteristic('indirect call', True) in features
assert len(features[capa.features.Characteristic('indirect call', True)]) == 3
def test_indirect_calls_resolved(sample_c91887d861d9bd4a5872249b641bc9f9):
features = extract_function_features(viv_utils.Function(sample_c91887d861d9bd4a5872249b641bc9f9.vw, 0x401A77))
assert capa.features.insn.API('kernel32.CreatePipe') in features
assert capa.features.insn.API('kernel32.SetHandleInformation') in features
assert capa.features.insn.API('kernel32.CloseHandle') in features
assert capa.features.insn.API('kernel32.WriteFile') in features
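The tests above count feature occurrences by mapping each feature to the set of addresses where it was observed (the `defaultdict(set)` shape built by `extract_function_features`). A minimal standalone sketch of that evaluation model, using hypothetical names rather than capa's actual API:

```python
from collections import defaultdict

def accumulate(observations):
    """Build a feature -> set-of-addresses map, as the extractor helpers above do."""
    features = defaultdict(set)
    for feature, va in observations:
        features[feature].add(va)
    return features

def count_in_range(features, feature, lo, hi):
    """True if `feature` was observed between lo and hi times, inclusive.

    This mirrors the count(...) range semantics exercised by the rule tests:
    an absent feature has zero observations.
    """
    return lo <= len(features.get(feature, set())) <= hi

fs = accumulate([("api:CreateFileA", 0x401000),
                 ("api:CreateFileA", 0x401020),
                 ("mnemonic:xor", 0x401010)])
assert count_in_range(fs, "api:CreateFileA", 1, 2)
assert not count_in_range(fs, "mnemonic:xor", 2, 2)
```

Because occurrences are stored as a set of virtual addresses, the same feature seen twice at one address counts once, which is why the tests assert on `len(features[...])`.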