Go to file
Willi Ballenthin ee17d75be9 implement BinExport2 backend (#1950)
* elf: os: detect Android via clang compiler .ident note

* elf: os: detect Android via dependency on liblog.so

* main: split main into a bunch of "main routines"

[wip] since there are a few references to BinExport2
that are in progress elsewhre. Next commit will remove them.

* features: add BinExport2 declarations

* BinExport2: initial skeleton of feature extraction

* main: remove references to wip BinExport2 code

* changelog

* main: rename first position argument "input_file"

closes #1946

* main: linters

* main: move rule-related routines to capa.rules

ref #1821

* main: extract routines to capa.loader module

closes #1821

* add loader module

* loader: learn to load freeze format

* freeze: use new cli arg handling

* Update capa/loader.py

Co-authored-by: Moritz <mr-tz@users.noreply.github.com>

* main: remove duplicate documentation

* main: add doc about where some functions live

* scripts: migrate to new main wrapper helper functions

* scripts: port to main routines

* main: better handle auto-detection of backend

* scripts: migrate bulk-process to main wrappers

* scripts: migrate scripts to main wrappers

* main: rename *_from_args to *_from_cli

* changelog

* cache-ruleset: remove duplication

* main: fix tag handling

* cache-ruleset: fix cli args

* cache-ruleset: fix special rule cli handling

* scripts: fix type bytes

* main: nicely format debug messages

* helpers: ensure log messages aren't very long

* flake8 config

* binexport2: formatting

* loader: learn to load BinExport2 files

* main: debug log the format and backend

* elf: add more arch constants

* binexport: parse global features

* binexport: extract file features

* binexport2: begin to enumerate function/bb/insns

* binexport: pass context to function/bb/insn extractors

* binexport: linters

* binexport: linters

* scripts: add script to inspect binexport2 file

* inspect-binexport: fix xref symbols

* inspect-binexport: factor out the index building

* binexport: move index to binexport extractor module

* binexport: implement ELF/aarch64 GOT/thunk analyzer

* binexport: implement API features

* binexport: record the full vertex for a thunk

* binexport: learn to extract numbers

* binexport: number: skipped mapped numbers

* binexport: fix basic block address indexing

* binexport: rename function

* binexport: extract operand numbers

* binexport: learn to extract calls from characteristics

* binexport: learn to extract mnemonics

* pre-commit: skip protobuf file

* binexport: better search for sample file

* loader: add file extractors for BinExport2

* binexport: remove extra parameter

* new black config

* binexport: index string xrefs

* binexport: learn to extract bytes and strings

* binexport: cache parsed PE/ELF

* binexport: handle Ghidra SYMBOL numbers

* binexport2: handle binexport#78 (Ghidra only uses SYMBOL expresssions)

* main: write error output to stderr, not stdout

* scripts: add example detect-binexport2-capabilities.py

* detect-binexport2-capabilities: more documentation/examples

* elffile: recognize more architectures

* binexport: handle read_memory errors

* binexport: index flow graphs by address

* binexport: cleanup logging

* binexport: learn to extract function names

* binexport: learn to extract all function features

* binexport: learn to extract bb tight loops

* elf: don't require vivisect just for type annotations

* main: remove unused imports

* rules: don't eagerly import ruamel until needed

* loader: avoid eager imports of some backend-related code

* changelog

* fmt

* binexport: better render optional fields

* fix merge conflicts

* fix formatting

* remove Ghidra data reference madness

* handle PermissionError when searching sample file for BinExport2 file

* handle PermissionError when searching sample file for BinExport2 file

* add Android as valid OS

* inspect-binexport: strip strings

* inspect-binexport: render operands

* fix lints

* ruff: update config layout

* inspect-binexport: better align comments/xrefs

* use explicit search paths to get sample for BinExport file

* add initial BinExport tests

* add/update BinExport tests and minor fixes

* inspect-binexport: add perf tracking

* inspect-binexport: cache rendered operands

* lints

* do not extract number features for ret instructions

* Fix BinExport's "tight loop" feature extraction.

`idx.target_edges_by_basic_block_index[basic_block_index]` is of type
`List[Edges]`. The index `basic_block_index` was definitely not an
element.

* inspect-binexport: better render data section

* linters

* main: accept --format=binexport2

* binexport: insn: add support for parsing bare immediate int operands

* binexport2: bb: fix tight loop detection

ref #2050

* binexport: api: generate variations of Win32 APIs

* lints

* binexport: index: don't assume instruction index is 1:1 with address

* be2: index instruction addresses

* be2: temp remove bytes feature processing

* binexport: read memory from an address space extracted from PE/ELF

closes #2061

* be2: resolve thunks to imported functions

* be2: check for be2 string reference before bytes/string extraction overhead

* be2: remove unneeded check

* be2: do not process thunks

* be2: insn: polish thunk handling a bit

* be2: pre-compute thunk targets

* parse negative numbers

* update tests to use Ghidra-generated BinExport file

* remove unused import

* black reformat

* run tests always (for now)

* binexport: tests: fix test case

* binexport: extractor: fix insn lint

* binexport: addressspace: use base address recovered from binexport file

* Add nzxor charecteristic in BinExport extractor.

by referencing vivisect implementation.

* add tests, fix stack cookie detection

* test BinExport feature PRs

* reformat and fix

* complete TODO descriptions

* wip tests

* binexport: add typing where applicable (#2106)

* binexport2: revert import names from BinExport2 proto

binexport2_pb.BinExport2 isnt a package so we can't import it like:

    from ...binexport2_pb.BinExport2 import CallGraph

* fix stack offset numbers and disable offset tests

* xfail OperandOffset

* generate symbol variants

* wip: read negative numbers

* update tight loop tests

* binexport: fix function loop feature detection

* binexport: update binexport function loop tests

* binexport: fix lints and imports

* binexport: add back assert statement to thunk calculation

* binexport: update tests to use Ghidra binexport file

* binexport: add additional debug info to thunk calculation assert

* binexport: update unit tests to focus on Ghidra

* binexport: fix lints

* binexport: remove Ghidra symbol madness and fix x86/amd64 stack offset number tests

* binexport: use masking for Number features

* binexport: ignore call/jmp immediates for intel architecture

* binexport: check if immediate is a mapped address

* binexport: emit offset features for immediates likely structure offsets

* binexport: add twos complement wrapper insn.py

* binexport: add support for x86 offset features

* binexport: code refactor

* binexport: init refactor for multi-arch instruction feature parsing

* binexport: intel: emit indirect call characteristic

* binexport: use helper method for instruction mnemonic

* binexport: arm: emit offset features from stp instruction

* binexport: arm: emit indirect call characteristic

* binexport: arm: improve offset feature extraction

* binexport: add workaroud for Ghidra bug that results in empty operands (no expressions)

* binexport: skip x86 stack string tests

* binexport: update mimikatz.exe_ feature count tests for Ghidra

* core: loader: update binja import

* core: loader: update binja imports

* binexport: arm: ignore number features for add instruction manipulating stack

* binexport: update unit tests

* binexport: arm: ignore number features for sub instruction manipulating stack

* binexport: arm: emit offset features for add instructions

* binexport: remove TODO from tests workflow

* binexport: update CHANGELOG

* binexport: remove outdated TODOs

* binexport: re-enable support for data references in inspect-binexport2.py

* binexport: skip data references to code

* binexport: remove outdated TODOs

* Update scripts/inspect-binexport2.py

* Update CHANGELOG.md

* Update capa/helpers.py

* Update capa/features/extractors/common.py

* Update capa/features/extractors/binexport2/extractor.py

* Update capa/features/extractors/binexport2/arch/arm/insn.py

Co-authored-by: Moritz <mr-tz@users.noreply.github.com>

* initial add

* test binexport scripts

* add tests using small ARM ELF

* add method to get instruction by address

* index instructions by address

* adjust and extend tests

* handle operator with no children bug

* binexport: use instruction address index

ref: https://github.com/mandiant/capa/pull/1950/files#r1728570811

* inspect binexport: handle lsl with no children

add pruning phase to expression tree building
to remove known-bad branches. This might address
some of the data we're seeing due to:
https://github.com/NationalSecurityAgency/ghidra/issues/6821

Also introduces a --instruction optional argument
to dump the details of a specific instruction.

* binexport: consolidate expression tree logic into helpers

* binexport: index instruction indices by address

* binexport: introduce instruction pattern matching

Introduce intruction pattern matching to declaratively
describe the instructions and operands that we want to
extract. While there's a bit more code, its much more
thoroughly tested, and is less brittle than the prior
if/else/if/else/if/else implementation.

* binexport: helpers: fix missing comment words

* binexport: update tests to reflect updated test files

* remove testing of feature branch

---------

Co-authored-by: Moritz <mr-tz@users.noreply.github.com>
Co-authored-by: Mike Hunhoff <mike.hunhoff@gmail.com>
Co-authored-by: mr-tz <moritz.raabe@mandiant.com>
Co-authored-by: Lin Chen <larch.lin.chen@gmail.com>
2024-09-12 10:09:05 -06:00
2024-09-11 20:28:04 +02:00
2024-09-12 10:09:05 -06:00
2024-09-11 15:42:34 +00:00
2021-09-29 12:55:16 +02:00
2024-09-11 20:28:04 +02:00
2021-03-19 09:40:44 +01:00
2024-09-11 20:28:04 +02:00



capa detects capabilities in executable files. You run it against a PE, ELF, .NET module, shellcode file, or a sandbox report and it tells you what it thinks the program can do. For example, it might suggest that the file is a backdoor, is capable of installing services, or relies on HTTP to communicate.

To interactively inspect capa results in your browser use the capa Explorer Web.

If you want to inspect or write capa rules, head on over to the capa-rules repository. Otherwise, keep reading.

Below you find a list of our capa blog posts with more details.

example capa output

$ capa.exe suspicious.exe

+------------------------+--------------------------------------------------------------------------------+
| ATT&CK Tactic          | ATT&CK Technique                                                               |
|------------------------+--------------------------------------------------------------------------------|
| DEFENSE EVASION        | Obfuscated Files or Information [T1027]                                        |
| DISCOVERY              | Query Registry [T1012]                                                         |
|                        | System Information Discovery [T1082]                                           |
| EXECUTION              | Command and Scripting Interpreter::Windows Command Shell [T1059.003]           |
|                        | Shared Modules [T1129]                                                         |
| EXFILTRATION           | Exfiltration Over C2 Channel [T1041]                                           |
| PERSISTENCE            | Create or Modify System Process::Windows Service [T1543.003]                   |
+------------------------+--------------------------------------------------------------------------------+

+-------------------------------------------------------+-------------------------------------------------+
| CAPABILITY                                            | NAMESPACE                                       |
|-------------------------------------------------------+-------------------------------------------------|
| check for OutputDebugString error                     | anti-analysis/anti-debugging/debugger-detection |
| read and send data from client to server              | c2/file-transfer                                |
| execute shell command and capture output              | c2/shell                                        |
| receive data (2 matches)                              | communication                                   |
| send data (6 matches)                                 | communication                                   |
| connect to HTTP server (3 matches)                    | communication/http/client                       |
| send HTTP request (3 matches)                         | communication/http/client                       |
| create pipe                                           | communication/named-pipe/create                 |
| get socket status (2 matches)                         | communication/socket                            |
| receive data on socket (2 matches)                    | communication/socket/receive                    |
| send data on socket (3 matches)                       | communication/socket/send                       |
| connect TCP socket                                    | communication/socket/tcp                        |
| encode data using Base64                              | data-manipulation/encoding/base64               |
| encode data using XOR (6 matches)                     | data-manipulation/encoding/xor                  |
| run as a service                                      | executable/pe                                   |
| get common file path (3 matches)                      | host-interaction/file-system                    |
| read file                                             | host-interaction/file-system/read               |
| write file (2 matches)                                | host-interaction/file-system/write              |
| print debug messages (2 matches)                      | host-interaction/log/debug/write-event          |
| resolve DNS                                           | host-interaction/network/dns/resolve            |
| get hostname                                          | host-interaction/os/hostname                    |
| create a process with modified I/O handles and window | host-interaction/process/create                 |
| create process                                        | host-interaction/process/create                 |
| create registry key                                   | host-interaction/registry/create                |
| create service                                        | host-interaction/service/create                 |
| create thread                                         | host-interaction/thread/create                  |
| persist via Windows service                           | persistence/service                             |
+-------------------------------------------------------+-------------------------------------------------+

download and usage

Download stable releases of the standalone capa binaries here. You can run the standalone binaries without installation. capa is a command line tool that should be run from the terminal.

To use capa as a library or integrate with another tool, see doc/installation.md for further setup instructions.

capa Explorer Web

The capa Explorer Web enables you to interactively explore capa results in your web browser. Besides the online version you can download a standalone HTML file for local offline usage.

capa Explorer Web screenshot

More details on the web UI is available in the capa Explorer Web README.

example

In the above sample output, we run capa against an unknown binary (suspicious.exe), and the tool reports that the program can send HTTP requests, decode data via XOR and Base64, install services, and spawn new processes. Taken together, this makes us think that suspicious.exe could be a persistent backdoor. Therefore, our next analysis step might be to run suspicious.exe in a sandbox and try to recover the command and control server.

detailed results

By passing the -vv flag (for very verbose), capa reports exactly where it found evidence of these capabilities. This is useful for at least two reasons:

  • it helps explain why we should trust the results, and enables us to verify the conclusions, and
  • it shows where within the binary an experienced analyst might study with IDA Pro
$ capa.exe suspicious.exe -vv
...
execute shell command and capture output
namespace   c2/shell
author      matthew.williams@mandiant.com
scope       function
att&ck      Execution::Command and Scripting Interpreter::Windows Command Shell [T1059.003]
references  https://docs.microsoft.com/en-us/windows/win32/api/processthreadsapi/ns-processthreadsapi-startupinfoa
function @ 0x4011C0
  and:
    match: create a process with modified I/O handles and window @ 0x4011C0
      and:
        number: 257 = STARTF_USESTDHANDLES | STARTF_USESHOWWINDOW @ 0x4012B8
        or:
          number: 68 = StartupInfo.cb (size) @ 0x401282
        or: = API functions that accept a pointer to a STARTUPINFO structure
          api: kernel32.CreateProcess @ 0x401343
    match: create pipe @ 0x4011C0
      or:
        api: kernel32.CreatePipe @ 0x40126F, 0x401280
    optional:
      match: create thread @ 0x40136A, 0x4013BA
        or:
          and:
            os: windows
            or:
              api: kernel32.CreateThread @ 0x4013D7
        or:
          and:
            os: windows
            or:
              api: kernel32.CreateThread @ 0x401395
    or:
      string: "cmd.exe" @ 0x4012FD
...

capa also supports dynamic capabilities detection for multiple sandboxes including:

  • CAPE (supported report formats: .json, .json_, .json.gz)
  • DRAKVUF (supported report formats: .log, .log.gz)
  • VMRay (supported report formats: analysis archive .zip)

To use this feature, submit your file to a supported sandbox and then download and run capa against the generated report file. This feature enables capa to match capabilities against dynamic and static features that the sandbox captured during execution.

Here's an example of running capa against a packed file, and then running capa against the CAPE report generated for the same packed file:

$ capa 05be49819139a3fdcdbddbdefd298398779521f3d68daa25275cc77508e42310.exe
WARNING:capa.capabilities.common:--------------------------------------------------------------------------------
WARNING:capa.capabilities.common: This sample appears to be packed.
WARNING:capa.capabilities.common: 
WARNING:capa.capabilities.common: Packed samples have often been obfuscated to hide their logic.
WARNING:capa.capabilities.common: capa cannot handle obfuscation well using static analysis. This means the results may be misleading or incomplete.
WARNING:capa.capabilities.common: If possible, you should try to unpack this input file before analyzing it with capa.
WARNING:capa.capabilities.common: Alternatively, run the sample in a supported sandbox and invoke capa against the report to obtain dynamic analysis results.
WARNING:capa.capabilities.common: 
WARNING:capa.capabilities.common: Identified via rule: (internal) packer file limitation
WARNING:capa.capabilities.common: 
WARNING:capa.capabilities.common: Use -v or -vv if you really want to see the capabilities identified by capa.
WARNING:capa.capabilities.common:--------------------------------------------------------------------------------

$ capa 05be49819139a3fdcdbddbdefd298398779521f3d68daa25275cc77508e42310.json

┍━━━━━━━━━━━━━━━━━━━━━━━━┯━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┑
│ ATT&CK Tactic          │ ATT&CK Technique                                                                   │
┝━━━━━━━━━━━━━━━━━━━━━━━━┿━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┥
│ CREDENTIAL ACCESS      │ Credentials from Password Stores T1555                                             │
├────────────────────────┼────────────────────────────────────────────────────────────────────────────────────┤
│ DEFENSE EVASION        │ File and Directory Permissions Modification T1222                                  │
│                        │ Modify Registry T1112                                                              │
│                        │ Obfuscated Files or Information T1027                                              │
│                        │ Virtualization/Sandbox Evasion::User Activity Based Checks T1497.002               │
├────────────────────────┼────────────────────────────────────────────────────────────────────────────────────┤
│ DISCOVERY              │ Account Discovery T1087                                                            │
│                        │ Application Window Discovery T1010                                                 │
│                        │ File and Directory Discovery T1083                                                 │
│                        │ Query Registry T1012                                                               │
│                        │ System Information Discovery T1082                                                 │
│                        │ System Location Discovery::System Language Discovery T1614.001                     │
│                        │ System Owner/User Discovery T1033                                                  │
├────────────────────────┼────────────────────────────────────────────────────────────────────────────────────┤
│ EXECUTION              │ System Services::Service Execution T1569.002                                       │
├────────────────────────┼────────────────────────────────────────────────────────────────────────────────────┤
│ PERSISTENCE            │ Boot or Logon Autostart Execution::Registry Run Keys / Startup Folder T1547.001    │
│                        │ Boot or Logon Autostart Execution::Winlogon Helper DLL T1547.004                   │
│                        │ Create or Modify System Process::Windows Service T1543.003                         │
┕━━━━━━━━━━━━━━━━━━━━━━━━┷━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┙

┍━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┯━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┑
│ Capability                                           │ Namespace                                            │
┝━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┿━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┥
│ check for unmoving mouse cursor (3 matches)          │ anti-analysis/anti-vm/vm-detection                   │
│ gather bitkinex information                          │ collection/file-managers                             │
│ gather classicftp information                        │ collection/file-managers                             │
│ gather filezilla information                         │ collection/file-managers                             │
│ gather total-commander information                   │ collection/file-managers                             │
│ gather ultrafxp information                          │ collection/file-managers                             │
│ resolve DNS (23 matches)                             │ communication/dns                                    │
│ initialize Winsock library (7 matches)               │ communication/socket                                 │
│ act as TCP client (3 matches)                        │ communication/tcp/client                             │
│ create new key via CryptAcquireContext               │ data-manipulation/encryption                         │
│ encrypt or decrypt via WinCrypt                      │ data-manipulation/encryption                         │
│ hash data via WinCrypt                               │ data-manipulation/hashing                            │
│ initialize hashing via WinCrypt                      │ data-manipulation/hashing                            │
│ hash data with MD5                                   │ data-manipulation/hashing/md5                        │
│ generate random numbers via WinAPI                   │ data-manipulation/prng                               │
│ extract resource via kernel32 functions (2 matches)  │ executable/resource                                  │
│ interact with driver via control codes (2 matches)   │ host-interaction/driver                              │
│ get Program Files directory (18 matches)             │ host-interaction/file-system                         │
│ get common file path (575 matches)                   │ host-interaction/file-system                         │
│ create directory (2 matches)                         │ host-interaction/file-system/create                  │
│ delete file                                          │ host-interaction/file-system/delete                  │
│ get file attributes (122 matches)                    │ host-interaction/file-system/meta                    │
│ set file attributes (8 matches)                      │ host-interaction/file-system/meta                    │
│ move file                                            │ host-interaction/file-system/move                    │
│ find taskbar (3 matches)                             │ host-interaction/gui/taskbar/find                    │
│ get keyboard layout (12 matches)                     │ host-interaction/hardware/keyboard                   │
│ get disk size                                        │ host-interaction/hardware/storage                    │
│ get hostname (4 matches)                             │ host-interaction/os/hostname                         │
│ allocate or change RWX memory (3 matches)            │ host-interaction/process/inject                      │
│ query or enumerate registry key (3 matches)          │ host-interaction/registry                            │
│ query or enumerate registry value (8 matches)        │ host-interaction/registry                            │
│ delete registry key                                  │ host-interaction/registry/delete                     │
│ start service                                        │ host-interaction/service/start                       │
│ get session user name                                │ host-interaction/session                             │
│ persist via Run registry key                         │ persistence/registry/run                             │
│ persist via Winlogon Helper DLL registry key         │ persistence/registry/winlogon-helper                 │
│ persist via Windows service (2 matches)              │ persistence/service                                  │
┕━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┷━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┙

capa rules

capa uses a collection of rules to identify capabilities within a program. These rules are easy to write, even for those new to reverse engineering. By authoring rules, you can extend the capabilities that capa recognizes. In some regards, capa rules are a mixture of the OpenIOC, Yara, and YAML formats.

Here's an example rule used by capa:

rule:
  meta:
    name: create TCP socket
    namespace: communication/socket/tcp
    authors:
      - william.ballenthin@mandiant.com
      - joakim@intezer.com
      - anushka.virgaonkar@mandiant.com
    scopes:
      static: basic block
      dynamic: call
    mbc:
      - Communication::Socket Communication::Create TCP Socket [C0001.011]
    examples:
      - Practical Malware Analysis Lab 01-01.dll_:0x10001010
  features:
    - or:
      - and:
        - number: 6 = IPPROTO_TCP
        - number: 1 = SOCK_STREAM
        - number: 2 = AF_INET
        - or:
          - api: ws2_32.socket
          - api: ws2_32.WSASocket
          - api: socket
      - property/read: System.Net.Sockets.TcpClient::Client

The github.com/mandiant/capa-rules repository contains hundreds of standard rules that are distributed with capa. Please learn to write rules and contribute new entries as you find interesting techniques in malware.

IDA Pro plugin: capa explorer

If you use IDA Pro, then you can use the capa explorer plugin. capa explorer helps you identify interesting areas of a program and build new capa rules using features extracted directly from your IDA Pro database. It also uses your local changes to the .idb to extract better features, such as when you rename a global variable that contains a dynamically resolved API address.

capa + IDA Pro integration

Ghidra integration

If you use Ghidra, then you can use the capa + Ghidra integration to run capa's analysis directly on your Ghidra database and render the results in Ghidra's user interface.

blog posts

further information

capa

capa rules

capa testfiles

The capa-testfiles repository contains the data we use to test capa's code and rules

Description
The FLARE team's open-source tool to identify capabilities in executable files.
Readme Cite this repository 51 MiB
Languages
Standard ML 77.7%
Python 21.6%
Vue 0.3%
JavaScript 0.2%
HTML 0.1%