Compare commits


7 Commits

Author          SHA1        Message                                                    Date
Moritz          ec1ddb506c  Merge pull request #1893 from mrexodia/dex-support         2024-01-31 12:03:23 +01:00
                            (Initial plumbing to support DEX files)
Duncan Ogilvie  e2f655428e  Differentiate between function-name and import for DEX     2023-12-08 01:12:48 +01:00
Duncan Ogilvie  b5a4d766d9  Add string features for DEX and clean up method handling   2023-12-08 00:15:20 +01:00
Duncan Ogilvie  b77103a646  Mark DEX methods without code as library functions         2023-12-08 00:15:20 +01:00
Duncan Ogilvie  036f147df8  Support function-name, class, namespace for DEX            2023-12-08 00:15:20 +01:00
Duncan Ogilvie  52d20d2f46  Combine DEX feature extraction into a single class         2023-12-08 00:15:19 +01:00
Duncan Ogilvie  e90be5a9bb  Initial plumbing to support DEX files                      2023-12-08 00:15:16 +01:00
64 changed files with 2128 additions and 34510 deletions

.github/flake8.ini (vendored)

@@ -10,8 +10,6 @@ extend-ignore =
F811,
# E501 line too long (prefer black)
E501,
# E701 multiple statements on one line (colon) (prefer black, see https://github.com/psf/black/issues/4173)
E701,
# B010 Do not call setattr with a constant attribute value
B010,
# G200 Logging statement uses exception in arguments


@@ -17,6 +17,7 @@ a = Analysis(
# when invoking pyinstaller from the project root,
# this gets invoked from the directory of the spec file,
# i.e. ./.github/pyinstaller
("../../assets", "assets"),
("../../rules", "rules"),
("../../sigs", "sigs"),
("../../cache", "cache"),


@@ -57,15 +57,15 @@ jobs:
- name: Build standalone executable
run: pyinstaller --log-level DEBUG .github/pyinstaller/pyinstaller.spec
- name: Does it run (PE)?
run: dist/capa -d "tests/data/Practical Malware Analysis Lab 01-01.dll_"
run: dist/capa "tests/data/Practical Malware Analysis Lab 01-01.dll_"
- name: Does it run (Shellcode)?
run: dist/capa -d "tests/data/499c2a85f6e8142c3f48d4251c9c7cd6.raw32"
run: dist/capa "tests/data/499c2a85f6e8142c3f48d4251c9c7cd6.raw32"
- name: Does it run (ELF)?
run: dist/capa -d "tests/data/7351f8a40c5450557b24622417fc478d.elf_"
run: dist/capa "tests/data/7351f8a40c5450557b24622417fc478d.elf_"
- name: Does it run (CAPE)?
run: |
7z e "tests/data/dynamic/cape/v2.2/d46900384c78863420fb3e297d0a2f743cd2b6b3f7f82bf64059a168e07aceb7.json.gz"
dist/capa -d "d46900384c78863420fb3e297d0a2f743cd2b6b3f7f82bf64059a168e07aceb7.json"
dist/capa "d46900384c78863420fb3e297d0a2f743cd2b6b3f7f82bf64059a168e07aceb7.json"
- uses: actions/upload-artifact@0b7f8abb1508181956e8e162db84b466c27e18ce # v3.1.2
with:
name: ${{ matrix.asset_name }}


@@ -3,35 +3,7 @@
## master (unreleased)
### New Features
### Breaking Changes
### New Rules (0)
### Bug Fixes
### capa explorer IDA Pro plugin
### Development
### Raw diffs
- [capa v7.0.0...master](https://github.com/mandiant/capa/compare/v7.0.0...master)
- [capa-rules v7.0.0...master](https://github.com/mandiant/capa-rules/compare/v7.0.0...master)
## v7.0.0
This is the v7.0.0 release of capa which was mainly worked on during the Google Summer of Code (GSoC) 2023. A huge
shoutout to our GSoC contributors @colton-gabertan and @yelhamer for their amazing work.
Also, a big thanks to the other contributors: @aaronatp, @Aayush-Goel-04, @bkojusner, @doomedraven, @ruppde, @larchchen, @JCoonradt, and @xusheng6.
### New Features
- add Ghidra backend #1770 #1767 @colton-gabertan @mike-hunhoff
- add Ghidra UI integration #1734 @colton-gabertan @mike-hunhoff
- add dynamic analysis via CAPE sandbox reports #48 #1535 @yelhamer
- add call scope #771 @yelhamer
- add thread scope #1517 @yelhamer
@@ -41,7 +13,6 @@ Also, a big thanks to the other contributors: @aaronatp, @Aayush-Goel-04, @bkoju
- binja: add support for forwarded exports #1646 @xusheng6
- binja: add support for symtab names #1504 @xusheng6
- add com class/interface features #322 @Aayush-goel-04
- dotnet: emit enclosing class information for nested classes #1780 #1913 @bkojusner @mike-hunhoff
### Breaking Changes
@@ -50,11 +21,8 @@ Also, a big thanks to the other contributors: @aaronatp, @Aayush-Goel-04, @bkoju
- protobuf: deprecate `Metadata.analysis` in favor of `Metadata.analysis2` that is dynamic analysis aware @williballenthin
- update freeze format to v3, adding support for dynamic analysis @williballenthin
- extractor: ignore DLL name for api features #1815 @mr-tz
- main: introduce wrapping routines within main for working with CLI args #1813 @williballenthin
- move functions from `capa.main` to new `capa.loader` namespace #1821 @williballenthin
- proto: add `package` declaration #1960 @larchchen
### New Rules (41)
### New Rules (34)
- nursery/get-ntoskrnl-base-address @mr-tz
- host-interaction/network/connectivity/set-tcp-connection-state @johnk3r
@@ -89,53 +57,21 @@ Also, a big thanks to the other contributors: @aaronatp, @Aayush-Goel-04, @bkoju
- data-manipulation/compression/create-cabinet-on-windows michael.hunhoff@mandiant.com jakub.jozwiak@mandiant.com
- data-manipulation/compression/extract-cabinet-on-windows jakub.jozwiak@mandiant.com
- lib/create-file-decompression-interface-context-on-windows jakub.jozwiak@mandiant.com
- nursery/enumerate-files-in-dotnet moritz.raabe@mandiant.com anushka.virgaonkar@mandiant.com
- nursery/get-mac-address-in-dotnet moritz.raabe@mandiant.com michael.hunhoff@mandiant.com echernofsky@google.com
- nursery/get-current-process-command-line william.ballenthin@mandiant.com
- nursery/get-current-process-file-path william.ballenthin@mandiant.com
- nursery/hook-routines-via-dlsym-rtld_next william.ballenthin@mandiant.com
- nursery/linked-against-hp-socket still@teamt5.org
- host-interaction/process/inject/process-ghostly-hollowing sara.rincon@mandiant.com
### Bug Fixes
- ghidra: fix `ints_to_bytes` performance #1761 @mike-hunhoff
- binja: improve function call site detection @xusheng6
- binja: use `binaryninja.load` to open files @xusheng6
- binja: bump binja version to 3.5 #1789 @xusheng6
- elf: better detect ELF OS via GCC .ident directives #1928 @williballenthin
- elf: better detect ELF OS via Android dependencies #1947 @williballenthin
- fix setuptools package discovery #1886 @gmacon @mr-tz
- remove unnecessary scripts/vivisect-py2-vs-py3.sh file #1949 @JCoonradt
### capa explorer IDA Pro plugin
- various integration updates and minor bug fixes
### Development
- update ATT&CK/MBC data for linting #1932 @mr-tz
#### Developer Notes
With this new release, many classes and concepts have been split up into static (mostly identical to the
prior implementations) and dynamic ones. For example, the legacy FeatureExtractor class has been renamed to
StaticFeatureExtractor and the DynamicFeatureExtractor has been added.
Starting from version 7.0, we have moved the component responsible for feature extraction from main to a new capabilities module. Users wishing to utilize capa's feature extraction should use that module instead of importing the relevant logic from the main file.
For sandbox-based feature extractors, we are using Pydantic models. Contributions of more models for other sandboxes
are very welcome!
With this release we've reorganized the logic found in `main()` to localize related logic and to ease readability, changes, and integrations. The new "main routines" are expected to be used only within main functions, either capa's own main or related scripts; these functions should not be invoked from library code.
Beyond moving code around, we've refined the handling of the input file/format/backend: the logic for picking the format and backend is now more consistent, and we've documented that the input file is not necessarily the sample itself (CAPE reports, freeze files, and similar inputs are derived from the sample rather than being the sample).
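As a rough illustration of the reorganized layout described in these notes, a minimal sketch follows; the module and function names here (`capa.rules.get_rules`, `capa.loader.get_extractor`, `capa.capabilities.common.find_capabilities`) are best-effort assumptions based on the notes above, not a documented interface:
```python
# Sketch only: assumes capa v7's capa.rules / capa.loader / capa.capabilities layout.
from pathlib import Path

import capa.rules
import capa.loader
import capa.capabilities.common

rules = capa.rules.get_rules([Path("rules/")])

# loader helpers replace the plumbing formerly housed in capa.main
extractor = capa.loader.get_extractor(
    Path("suspicious.exe_"),   # input file (not necessarily the sample itself)
    "auto",                    # input format
    "auto",                    # OS
    capa.loader.BACKEND_VIV,   # analysis backend (assumed constant name)
    sigpaths=[],
)

# feature extraction and matching now live in the capabilities module
capabilities, counts = capa.capabilities.common.find_capabilities(rules, extractor)
```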
### Raw diffs
- [capa v6.1.0...v7.0.0](https://github.com/mandiant/capa/compare/v6.1.0...v7.0.0)
- [capa-rules v6.1.0...v7.0.0](https://github.com/mandiant/capa-rules/compare/v6.1.0...v7.0.0)
- [capa v6.1.0...master](https://github.com/mandiant/capa/compare/v6.1.0...master)
- [capa-rules v6.1.0...master](https://github.com/mandiant/capa-rules/compare/v6.1.0...master)
## v6.1.0
@@ -1690,4 +1626,4 @@ Download a standalone binary below and checkout the readme [here on GitHub](http
### Raw diffs
- [capa v1.0.0...v1.1.0](https://github.com/mandiant/capa/compare/v1.0.0...v1.1.0)
- [capa-rules v1.0.0...v1.1.0](https://github.com/mandiant/capa-rules/compare/v1.0.0...v1.1.0)
- [capa-rules v1.0.0...v1.1.0](https://github.com/mandiant/capa-rules/compare/v1.0.0...v1.1.0)


@@ -2,7 +2,7 @@
[![PyPI - Python Version](https://img.shields.io/pypi/pyversions/flare-capa)](https://pypi.org/project/flare-capa)
[![Last release](https://img.shields.io/github/v/release/mandiant/capa)](https://github.com/mandiant/capa/releases)
[![Number of rules](https://img.shields.io/badge/rules-866-blue.svg)](https://github.com/mandiant/capa-rules)
[![Number of rules](https://img.shields.io/badge/rules-859-blue.svg)](https://github.com/mandiant/capa-rules)
[![CI status](https://github.com/mandiant/capa/workflows/CI/badge.svg)](https://github.com/mandiant/capa/actions?query=workflow%3ACI+event%3Apush+branch%3Amaster)
[![Downloads](https://img.shields.io/github/downloads/mandiant/capa/total)](https://github.com/mandiant/capa/releases)
[![License](https://img.shields.io/badge/license-Apache--2.0-green.svg)](LICENSE.txt)

BIN assets/classes.json.gz (new file; binary not shown)

BIN assets/interfaces.json.gz (new file; binary not shown)


@@ -10,7 +10,8 @@ import abc
class Address(abc.ABC):
@abc.abstractmethod
def __eq__(self, other): ...
def __eq__(self, other):
...
@abc.abstractmethod
def __lt__(self, other):
@@ -176,6 +177,34 @@ class DNTokenOffsetAddress(Address):
return self.token + self.offset
class DexMethodAddress(int, Address):
def __new__(cls, offset: int):
return int.__new__(cls, offset)
def __repr__(self):
return f"DexMethodAddress(offset={hex(self)})"
def __str__(self) -> str:
return repr(self)
def __hash__(self):
return int.__hash__(self)
class DexClassAddress(int, Address):
def __new__(cls, offset: int):
return int.__new__(cls, offset)
def __repr__(self):
return f"DexClassAddress(offset={hex(self)})"
def __str__(self) -> str:
return repr(self)
def __hash__(self):
return int.__hash__(self)
class _NoAddress(Address):
def __eq__(self, other):
return True
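A usage sketch for the int-backed DEX address classes above; because they subclass `int`, they equal, hash, and sort like plain file offsets:
```python
from capa.features.address import DexClassAddress, DexMethodAddress

addr = DexMethodAddress(0x1234)
assert addr == 0x1234                               # plain int equality
assert hash(addr) == hash(0x1234)                   # explicit int.__hash__
assert str(addr) == "DexMethodAddress(offset=0x1234)"
assert min(DexClassAddress(0x20), DexClassAddress(0x10)) == 0x10  # sortable
```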


@@ -1,36 +0,0 @@
# Copyright (C) 2023 Mandiant, Inc. All Rights Reserved.
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at: [package root]/LICENSE.txt
# Unless required by applicable law or agreed to in writing, software distributed under the License
# is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and limitations under the License.
from enum import Enum
from typing import Dict, List
from capa.helpers import assert_never
class ComType(Enum):
CLASS = "class"
INTERFACE = "interface"
COM_PREFIXES = {
ComType.CLASS: "CLSID_",
ComType.INTERFACE: "IID_",
}
def load_com_database(com_type: ComType) -> Dict[str, List[str]]:
# lazy load these python files since they are so large.
# that is, don't load them unless a COM feature is being handled.
import capa.features.com.classes
import capa.features.com.interfaces
if com_type == ComType.CLASS:
return capa.features.com.classes.COM_CLASSES
elif com_type == ComType.INTERFACE:
return capa.features.com.interfaces.COM_INTERFACES
else:
assert_never(com_type)
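A small usage sketch for the lazy COM-database loader above, assuming it is importable as `capa.features.com` (that path is implied by the `capa.features.com.classes` import); per the signature, each database maps a COM name to a list of GUID strings:
```python
from capa.features.com import ComType, load_com_database  # assumed import path

com_classes = load_com_database(ComType.CLASS)        # name -> list of CLSID strings
com_interfaces = load_com_database(ComType.INTERFACE)  # name -> list of IID strings
print(len(com_classes), len(com_interfaces))
```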

File diff suppressed because it is too large.

File diff suppressed because it is too large.


@@ -409,7 +409,9 @@ ARCH_I386 = "i386"
ARCH_AMD64 = "amd64"
# dotnet
ARCH_ANY = "any"
VALID_ARCH = (ARCH_I386, ARCH_AMD64, ARCH_ANY)
# dex
ARCH_DALVIK = "dalvik"
VALID_ARCH = (ARCH_I386, ARCH_AMD64, ARCH_ANY, ARCH_DALVIK)
class Arch(Feature):
@@ -421,10 +423,11 @@ class Arch(Feature):
OS_WINDOWS = "windows"
OS_LINUX = "linux"
OS_MACOS = "macos"
OS_ANDROID = "android"
# dotnet
OS_ANY = "any"
VALID_OS = {os.value for os in capa.features.extractors.elf.OS}
VALID_OS.update({OS_WINDOWS, OS_LINUX, OS_MACOS, OS_ANY})
VALID_OS.update({OS_WINDOWS, OS_LINUX, OS_MACOS, OS_ANY, OS_ANDROID})
# internal only, not to be used in rules
OS_AUTO = "auto"
@@ -452,28 +455,26 @@ class OS(Feature):
FORMAT_PE = "pe"
FORMAT_ELF = "elf"
FORMAT_DOTNET = "dotnet"
VALID_FORMAT = (FORMAT_PE, FORMAT_ELF, FORMAT_DOTNET)
FORMAT_DEX = "dex"
VALID_FORMAT = (FORMAT_PE, FORMAT_ELF, FORMAT_DOTNET, FORMAT_DEX)
# internal only, not to be used in rules
FORMAT_AUTO = "auto"
FORMAT_SC32 = "sc32"
FORMAT_SC64 = "sc64"
FORMAT_CAPE = "cape"
FORMAT_FREEZE = "freeze"
FORMAT_RESULT = "result"
STATIC_FORMATS = {
FORMAT_SC32,
FORMAT_SC64,
FORMAT_PE,
FORMAT_ELF,
FORMAT_DOTNET,
FORMAT_FREEZE,
FORMAT_RESULT,
FORMAT_DEX,
}
DYNAMIC_FORMATS = {
FORMAT_CAPE,
FORMAT_FREEZE,
FORMAT_RESULT,
}
FORMAT_FREEZE = "freeze"
FORMAT_RESULT = "result"
FORMAT_UNKNOWN = "unknown"


@@ -128,14 +128,6 @@ class CapeExtractor(DynamicFeatureExtractor):
if cr.info.version not in TESTED_VERSIONS:
logger.warning("CAPE version '%s' not tested/supported yet", cr.info.version)
# TODO(mr-tz): support more file types
# https://github.com/mandiant/capa/issues/1933
if "PE" not in cr.target.file.type:
logger.error(
"capa currently only supports PE target files, this target file's type is: '%s'.\nPlease report this at: https://github.com/mandiant/capa/issues/1933",
cr.target.file.type,
)
# observed in 2.4-CAPE reports from capesandbox.com
if cr.static is None and cr.target.file.pe is not None:
cr.static = Static()


@@ -24,8 +24,11 @@ from capa.features.common import (
OS_AUTO,
ARCH_ANY,
FORMAT_PE,
FORMAT_DEX,
FORMAT_ELF,
OS_ANDROID,
OS_WINDOWS,
ARCH_DALVIK,
FORMAT_FREEZE,
FORMAT_RESULT,
Arch,
@@ -41,11 +44,12 @@ logger = logging.getLogger(__name__)
# match strings for formats
MATCH_PE = b"MZ"
MATCH_ELF = b"\x7fELF"
MATCH_DEX = b"dex\n"
MATCH_RESULT = b'{"meta":'
MATCH_JSON_OBJECT = b'{"'
def extract_file_strings(buf: bytes, **kwargs) -> Iterator[Tuple[String, Address]]:
def extract_file_strings(buf, **kwargs) -> Iterator[Tuple[String, Address]]:
"""
extract ASCII and UTF-16 LE strings from file
"""
@@ -56,11 +60,13 @@ def extract_file_strings(buf: bytes, **kwargs) -> Iterator[Tuple[String, Address
yield String(s.s), FileOffsetAddress(s.offset)
def extract_format(buf: bytes) -> Iterator[Tuple[Feature, Address]]:
def extract_format(buf) -> Iterator[Tuple[Feature, Address]]:
if buf.startswith(MATCH_PE):
yield Format(FORMAT_PE), NO_ADDRESS
elif buf.startswith(MATCH_ELF):
yield Format(FORMAT_ELF), NO_ADDRESS
elif len(buf) > 8 and buf.startswith(MATCH_DEX) and buf[7] == 0x00:
yield Format(FORMAT_DEX), NO_ADDRESS
elif is_freeze(buf):
yield Format(FORMAT_FREEZE), NO_ADDRESS
elif buf.startswith(MATCH_RESULT):
@@ -96,6 +102,9 @@ def extract_arch(buf) -> Iterator[Tuple[Feature, Address]]:
yield Arch(arch), NO_ADDRESS
elif len(buf) > 8 and buf.startswith(MATCH_DEX) and buf[7] == 0x00:
yield Arch(ARCH_DALVIK), NO_ADDRESS
else:
# we likely end up here:
# 1. handling shellcode, or
@@ -129,6 +138,9 @@ def extract_os(buf, os=OS_AUTO) -> Iterator[Tuple[Feature, Address]]:
yield OS(os), NO_ADDRESS
elif len(buf) > 8 and buf.startswith(MATCH_DEX) and buf[7] == 0x00:
yield OS(OS_ANDROID), NO_ADDRESS
else:
# we likely end up here:
# 1. handling shellcode, or
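The sniffing logic above keys on the DEX magic: ASCII `dex\n`, a three-digit version string, and a NUL terminator at byte 7. A standalone sketch of the same check:
```python
MATCH_DEX = b"dex\n"

def looks_like_dex(buf: bytes) -> bool:
    # the full magic is e.g. b"dex\n035\x00": "dex\n", 3-byte version, NUL
    return len(buf) > 8 and buf.startswith(MATCH_DEX) and buf[7] == 0x00

assert looks_like_dex(b"dex\n035\x00" + b"\x00" * 0x70)
assert not looks_like_dex(b"\x7fELF" + b"\x00" * 0x70)
```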


@@ -0,0 +1,421 @@
# Copyright (C) 2023 Mandiant, Inc. All Rights Reserved.
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at: [package root]/LICENSE.txt
# Unless required by applicable law or agreed to in writing, software distributed under the License
# is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and limitations under the License.
import struct
import logging
from typing import Set, Dict, List, Tuple, Iterator, Optional, TypedDict
from pathlib import Path
from dataclasses import dataclass
import dexparser.disassembler as disassembler
from dexparser import DEXParser, uleb128_value
from capa.features.file import Import, FunctionName
from capa.features.common import (
OS,
FORMAT_DEX,
OS_ANDROID,
ARCH_DALVIK,
Arch,
Class,
Format,
String,
Feature,
Namespace,
)
from capa.features.address import NO_ADDRESS, Address, DexClassAddress, DexMethodAddress, FileOffsetAddress
from capa.features.extractors.base_extractor import (
BBHandle,
InsnHandle,
SampleHashes,
FunctionHandle,
StaticFeatureExtractor,
)
logger = logging.getLogger(__name__)
# Reference: https://source.android.com/docs/core/runtime/dex-format
class DexProtoId(TypedDict):
shorty_idx: int
return_type_idx: int
param_off: int
class DexMethodId(TypedDict):
class_idx: int
proto_idx: int
name_idx: int
@dataclass
class DexAnalyzedMethod:
class_type: str
name: str
shorty_descriptor: str
return_type: str
parameters: List[str]
id_offset: int = 0
code_offset: int = 0
access_flags: Optional[int] = None
@property
def address(self):
# NOTE: some methods do not have code; in that case we use the method_id offset
if self.has_code:
return self.code_offset
else:
return self.id_offset
@property
def has_code(self):
# NOTE: code_offset is zero if the method is abstract/native or not defined in a class
return self.code_offset != 0
@property
def has_definition(self):
# NOTE: access_flags is only known if the method is defined in a class
return self.access_flags is not None
@property
def qualified_name(self):
return f"{self.class_type}::{self.name}"
class DexFieldId(TypedDict):
class_idx: int
type_idx: int
name_idx: int
class DexClassDef(TypedDict):
class_idx: int
access_flags: int
superclass_idx: int
interfaces_off: int
source_file_idx: int
annotations_off: int
class_data_off: int
static_values_off: int
class DexFieldDef(TypedDict):
diff: int
access_flags: int
class DexMethodDef(TypedDict):
diff: int
access_flags: int
code_off: int
class DexClassData(TypedDict):
static_fields: List[DexFieldDef]
instance_fields: List[DexFieldDef]
direct_methods: List[DexMethodDef]
virtual_methods: List[DexMethodDef]
@dataclass
class DexAnalyzedClass:
offset: int
class_type: str
superclass_type: str
interfaces: List[str]
source_file: str
data: Optional[DexClassData]
class DexAnnotation(TypedDict):
visibility: int
type_idx_diff: int
size_diff: int
name_idx_diff: int
value_type: int
encoded_value: int
class DexAnalysis:
def get_strings(self):
# NOTE: Copied from dexparser, upstream later
strings: List[Tuple[int, bytes]] = []
string_ids_off = self.dex.header_data["string_ids_off"]
for i in range(self.dex.header_data["string_ids_size"]):
offset = struct.unpack("<L", self.dex.data[string_ids_off + (i * 4) : string_ids_off + (i * 4) + 4])[0]
c_size, size_offset = uleb128_value(self.dex.data, offset)
c_char = self.dex.data[offset + size_offset : offset + size_offset + c_size]
strings.append((offset, c_char))
return strings
def __init__(self, dex: DEXParser):
self.dex = dex
self.strings = self.get_strings()
self.strings_utf8: List[str] = []
for _, data in self.strings:
# NOTE: this is technically incorrect: DEX strings are MUTF-8, not UTF-8
# Reference: https://source.android.com/devices/tech/dalvik/dex-format#mutf-8
self.strings_utf8.append(data.decode("utf-8", errors="backslashreplace"))
self.type_ids: List[int] = dex.get_typeids()
self.method_ids: List[DexMethodId] = dex.get_methods()
self.proto_ids: List[DexProtoId] = dex.get_protoids()
self.field_ids: List[DexFieldId] = dex.get_fieldids()
self.class_defs: List[DexClassDef] = dex.get_classdef_data()
self._is_analyzing = True
self.used_classes: Set[str] = set()
self.classes = self._analyze_classes()
self.methods = self._analyze_methods()
self.methods_by_address: Dict[int, DexAnalyzedMethod] = {m.address: m for m in self.methods}
self.namespaces: Set[str] = set()
for class_type in self.used_classes:
idx = class_type.rfind(".")
if idx != -1:
self.namespaces.add(class_type[:idx])
for class_type in self.classes:
self.used_classes.remove(class_type)
# Only available after code analysis
self._is_analyzing = False
def analyze_code(self):
# Loop over the classes and analyze them
# self.classes: List[DexClass] = self.dex.get_class_data(offset=-1)
# self.annotations: List[DexAnnotation] = dex.get_annotations(offset=-1)
# self.static_values: List[int] = dex.get_static_values(offset=-1)
pass
def get_string(self, index: int) -> str:
return self.strings_utf8[index]
def _decode_descriptor(self, descriptor: str) -> str:
first = descriptor[0]
if first == "L":
pretty = descriptor[1:-1].replace("/", ".")
if self._is_analyzing:
self.used_classes.add(pretty)
elif first == "[":
pretty = self._decode_descriptor(descriptor[1:]) + "[]"
else:
pretty = disassembler.type_descriptor[first]
return pretty
def get_pretty_type(self, index: int) -> str:
if index == 0xFFFFFFFF:
return "<NO_INDEX>"
descriptor = self.get_string(self.type_ids[index])
return self._decode_descriptor(descriptor)
def _analyze_classes(self):
classes: Dict[str, DexAnalyzedClass] = {}
offset = self.dex.header_data["class_defs_off"]
for index, clazz in enumerate(self.class_defs):
class_type = self.get_pretty_type(clazz["class_idx"])
# Superclass
superclass_idx = clazz["superclass_idx"]
if superclass_idx != 0xFFFFFFFF:
superclass_type = self.get_pretty_type(superclass_idx)
else:
superclass_type = ""
# Interfaces
interfaces = []
interfaces_offset = clazz["interfaces_off"]
if interfaces_offset != 0:
size = struct.unpack("<L", self.dex.data[interfaces_offset : interfaces_offset + 4])[0]
for i in range(size):
type_idx = struct.unpack(
"<H", self.dex.data[interfaces_offset + 4 + i * 2 : interfaces_offset + 6 + i * 2]
)[0]
interface_type = self.get_pretty_type(type_idx)
interfaces.append(interface_type)
# Source file
source_file_idx = clazz["source_file_idx"]
if source_file_idx != 0xFFFFFFFF:
source_file = self.get_string(source_file_idx)
else:
source_file = ""
# Data
data_offset = clazz["class_data_off"]
if data_offset != 0:
data = self.dex.get_class_data(data_offset)
else:
data = None
classes[class_type] = DexAnalyzedClass(
offset=offset + index * 32,
class_type=class_type,
superclass_type=superclass_type,
interfaces=interfaces,
source_file=source_file,
data=data,
)
return classes
def _analyze_methods(self):
methods: List[DexAnalyzedMethod] = []
for method_id in self.method_ids:
proto = self.proto_ids[method_id["proto_idx"]]
parameters = []
param_off = proto["param_off"]
if param_off != 0:
size = struct.unpack("<L", self.dex.data[param_off : param_off + 4])[0]
for i in range(size):
type_idx = struct.unpack("<H", self.dex.data[param_off + 4 + i * 2 : param_off + 6 + i * 2])[0]
param_type = self.get_pretty_type(type_idx)
parameters.append(param_type)
methods.append(
DexAnalyzedMethod(
class_type=self.get_pretty_type(method_id["class_idx"]),
name=self.get_string(method_id["name_idx"]),
shorty_descriptor=self.get_string(proto["shorty_idx"]),
return_type=self.get_pretty_type(proto["return_type_idx"]),
parameters=parameters,
)
)
# Fill in the missing method data
for clazz in self.classes.values():
if clazz.data is None:
continue
for method_def in clazz.data["direct_methods"]:
diff = method_def["diff"]
methods[diff].access_flags = method_def["access_flags"]
methods[diff].code_offset = method_def["code_off"]
for method_def in clazz.data["virtual_methods"]:
diff = method_def["diff"]
methods[diff].access_flags = method_def["access_flags"]
methods[diff].code_offset = method_def["code_off"]
# Fill in the method_id table offsets (used as fallback addresses for methods without code)
offset = self.dex.header_data["method_ids_off"]
for index, method in enumerate(methods):
method.id_offset = offset + index * 8
return methods
def extract_file_features(self) -> Iterator[Tuple[Feature, Address]]:
yield Format(FORMAT_DEX), NO_ADDRESS
for i in range(len(self.strings)):
yield String(self.strings_utf8[i]), FileOffsetAddress(self.strings[i][0])
for method in self.methods:
if method.has_definition:
yield FunctionName(method.qualified_name), DexMethodAddress(method.address)
else:
yield Import(method.qualified_name), DexMethodAddress(method.address)
for namespace in self.namespaces:
yield Namespace(namespace), NO_ADDRESS
for clazz in self.classes.values():
yield Class(clazz.class_type), DexClassAddress(clazz.offset)
for class_type in self.used_classes:
yield Class(class_type), NO_ADDRESS
class DexFeatureExtractor(StaticFeatureExtractor):
def __init__(self, path: Path, *, code_analysis: bool):
super().__init__(hashes=SampleHashes.from_bytes(path.read_bytes()))
self.path: Path = path
self.code_analysis = code_analysis
self.dex = DEXParser(filedir=str(path))
self.analysis = DexAnalysis(self.dex)
# Perform more expensive code analysis only when requested
if self.code_analysis:
self.analysis.analyze_code()
def todo(self):
import inspect
message = "[DexparserFeatureExtractor:TODO] " + inspect.stack()[1].function
logger.debug(message)
def get_base_address(self):
return NO_ADDRESS
def extract_global_features(self) -> Iterator[Tuple[Feature, Address]]:
# These are hardcoded global features
yield Format(FORMAT_DEX), NO_ADDRESS
yield OS(OS_ANDROID), NO_ADDRESS
yield Arch(ARCH_DALVIK), NO_ADDRESS
def extract_file_features(self) -> Iterator[Tuple[Feature, Address]]:
yield from self.analysis.extract_file_features()
def is_library_function(self, addr: Address) -> bool:
assert isinstance(addr, DexMethodAddress)
method = self.analysis.methods_by_address[addr]
# exclude androidx/kotlin stuff?
return not method.has_definition
def get_function_name(self, addr: Address) -> str:
assert isinstance(addr, DexMethodAddress)
method = self.analysis.methods_by_address[addr]
return method.qualified_name
def get_functions(self) -> Iterator[FunctionHandle]:
if not self.code_analysis:
raise Exception("code analysis is disabled")
for method in self.analysis.methods:
yield FunctionHandle(DexMethodAddress(method.address), method)
def extract_function_features(self, f: FunctionHandle) -> Iterator[Tuple[Feature, Address]]:
if not self.code_analysis:
raise Exception("code analysis is disabled")
method: DexAnalyzedMethod = f.inner
if method.has_code:
return self.todo()
yield
def get_basic_blocks(self, f: FunctionHandle) -> Iterator[BBHandle]:
if not self.code_analysis:
raise Exception("code analysis is disabled")
method: DexAnalyzedMethod = f.inner
if method.has_code:
return self.todo()
yield
def extract_basic_block_features(self, f: FunctionHandle, bb: BBHandle) -> Iterator[Tuple[Feature, Address]]:
if not self.code_analysis:
raise Exception("code analysis is disabled")
return self.todo()
yield
def get_instructions(self, f: FunctionHandle, bb: BBHandle) -> Iterator[InsnHandle]:
if not self.code_analysis:
raise Exception("code analysis is disabled")
return self.todo()
yield
def extract_insn_features(
self, f: FunctionHandle, bb: BBHandle, insn: InsnHandle
) -> Iterator[Tuple[Feature, Address]]:
if not self.code_analysis:
raise Exception("code analysis is disabled")
return self.todo()
yield
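A hedged usage sketch for the extractor defined above, assuming it lives at `capa.features.extractors.dex` (the module path is not shown in this diff) and a `classes.dex` sample on disk; code analysis is still stubbed out, so only global and file features are meaningful:
```python
from pathlib import Path

from capa.features.extractors.dex import DexFeatureExtractor  # assumed module path

extractor = DexFeatureExtractor(Path("classes.dex"), code_analysis=False)
for feature, addr in extractor.extract_global_features():
    print(feature, addr)  # format(dex), os(android), arch(dalvik)
for feature, addr in extractor.extract_file_features():
    print(feature, addr)  # strings, function names, imports, namespaces, classes
```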


@@ -131,14 +131,10 @@ def get_dotnet_managed_imports(pe: dnfile.dnPE) -> Iterator[DnType]:
# remove get_/set_ from MemberRef name
member_ref_name = member_ref_name[4:]
typerefnamespace, typerefname = resolve_nested_typeref_name(
member_ref.Class.row_index, member_ref.Class.row, pe
)
yield DnType(
token,
typerefname,
namespace=typerefnamespace,
member_ref.Class.row.TypeName,
namespace=member_ref.Class.row.TypeNamespace,
member=member_ref_name,
access=access,
)
@@ -192,8 +188,6 @@ def get_dotnet_managed_methods(pe: dnfile.dnPE) -> Iterator[DnType]:
TypeNamespace (index into String heap)
MethodList (index into MethodDef table; it marks the first of a contiguous run of Methods owned by this Type)
"""
nested_class_table = get_dotnet_nested_class_table_index(pe)
accessor_map: Dict[int, str] = {}
for methoddef, methoddef_access in get_dotnet_methoddef_property_accessors(pe):
accessor_map[methoddef] = methoddef_access
@@ -217,9 +211,7 @@ def get_dotnet_managed_methods(pe: dnfile.dnPE) -> Iterator[DnType]:
# remove get_/set_
method_name = method_name[4:]
typedefnamespace, typedefname = resolve_nested_typedef_name(nested_class_table, rid, typedef, pe)
yield DnType(token, typedefname, namespace=typedefnamespace, member=method_name, access=access)
yield DnType(token, typedef.TypeName, namespace=typedef.TypeNamespace, member=method_name, access=access)
def get_dotnet_fields(pe: dnfile.dnPE) -> Iterator[DnType]:
@@ -233,8 +225,6 @@ def get_dotnet_fields(pe: dnfile.dnPE) -> Iterator[DnType]:
TypeNamespace (index into String heap)
FieldList (index into Field table; it marks the first of a contiguous run of Fields owned by this Type)
"""
nested_class_table = get_dotnet_nested_class_table_index(pe)
for rid, typedef in iter_dotnet_table(pe, dnfile.mdtable.TypeDef.number):
assert isinstance(typedef, dnfile.mdtable.TypeDefRow)
@@ -245,11 +235,8 @@ def get_dotnet_fields(pe: dnfile.dnPE) -> Iterator[DnType]:
if field.row is None:
logger.debug("TypeDef[0x%X] FieldList[0x%X] row is None", rid, idx)
continue
typedefnamespace, typedefname = resolve_nested_typedef_name(nested_class_table, rid, typedef, pe)
token: int = calculate_dotnet_token_value(field.table.number, field.row_index)
yield DnType(token, typedefname, namespace=typedefnamespace, member=field.row.Name)
yield DnType(token, typedef.TypeName, namespace=typedef.TypeNamespace, member=field.row.Name)
def get_dotnet_managed_method_bodies(pe: dnfile.dnPE) -> Iterator[Tuple[int, CilMethodBody]]:
@@ -313,119 +300,19 @@ def get_dotnet_unmanaged_imports(pe: dnfile.dnPE) -> Iterator[DnUnmanagedMethod]
yield DnUnmanagedMethod(token, module, method)
def get_dotnet_table_row(pe: dnfile.dnPE, table_index: int, row_index: int) -> Optional[dnfile.base.MDTableRow]:
assert pe.net is not None
assert pe.net.mdtables is not None
if row_index - 1 <= 0:
return None
try:
table = pe.net.mdtables.tables.get(table_index, [])
return table[row_index - 1]
except IndexError:
return None
def resolve_nested_typedef_name(
nested_class_table: dict, index: int, typedef: dnfile.mdtable.TypeDefRow, pe: dnfile.dnPE
) -> Tuple[str, Tuple[str, ...]]:
"""Resolves all nested TypeDef class names. Returns the namespace as a str and the nested TypeRef name as a tuple"""
if index in nested_class_table:
typedef_name = []
name = typedef.TypeName
# Append the current typedef name
typedef_name.append(name)
while nested_class_table[index] in nested_class_table:
# Iterate through the typedef table to resolve the nested name
table_row = get_dotnet_table_row(pe, dnfile.mdtable.TypeDef.number, nested_class_table[index])
if table_row is None:
return typedef.TypeNamespace, tuple(typedef_name[::-1])
name = table_row.TypeName
typedef_name.append(name)
index = nested_class_table[index]
# Document the root enclosing details
table_row = get_dotnet_table_row(pe, dnfile.mdtable.TypeDef.number, nested_class_table[index])
if table_row is None:
return typedef.TypeNamespace, tuple(typedef_name[::-1])
enclosing_name = table_row.TypeName
typedef_name.append(enclosing_name)
return table_row.TypeNamespace, tuple(typedef_name[::-1])
else:
return typedef.TypeNamespace, (typedef.TypeName,)
def resolve_nested_typeref_name(
index: int, typeref: dnfile.mdtable.TypeRefRow, pe: dnfile.dnPE
) -> Tuple[str, Tuple[str, ...]]:
"""Resolves all nested TypeRef class names. Returns the namespace as a str and the nested TypeRef name as a tuple"""
# If the ResolutionScope decodes to a typeRef type then it is nested
if isinstance(typeref.ResolutionScope.table, dnfile.mdtable.TypeRef):
typeref_name = []
name = typeref.TypeName
# Not appending the current typeref name to avoid potential duplicate
# Validate index
table_row = get_dotnet_table_row(pe, dnfile.mdtable.TypeRef.number, index)
if table_row is None:
return typeref.TypeNamespace, (typeref.TypeName,)
while isinstance(table_row.ResolutionScope.table, dnfile.mdtable.TypeRef):
# Iterate through the typeref table to resolve the nested name
typeref_name.append(name)
name = table_row.TypeName
table_row = get_dotnet_table_row(pe, dnfile.mdtable.TypeRef.number, table_row.ResolutionScope.row_index)
if table_row is None:
return typeref.TypeNamespace, tuple(typeref_name[::-1])
# Document the root enclosing details
typeref_name.append(table_row.TypeName)
return table_row.TypeNamespace, tuple(typeref_name[::-1])
else:
return typeref.TypeNamespace, (typeref.TypeName,)
def get_dotnet_nested_class_table_index(pe: dnfile.dnPE) -> Dict[int, int]:
"""Build index for EnclosingClass based off the NestedClass row index in the nestedclass table"""
nested_class_table = {}
# Used to find nested classes in typedef
for _, nestedclass in iter_dotnet_table(pe, dnfile.mdtable.NestedClass.number):
assert isinstance(nestedclass, dnfile.mdtable.NestedClassRow)
nested_class_table[nestedclass.NestedClass.row_index] = nestedclass.EnclosingClass.row_index
return nested_class_table
def get_dotnet_types(pe: dnfile.dnPE) -> Iterator[DnType]:
"""get .NET types from TypeDef and TypeRef tables"""
nested_class_table = get_dotnet_nested_class_table_index(pe)
for rid, typedef in iter_dotnet_table(pe, dnfile.mdtable.TypeDef.number):
assert isinstance(typedef, dnfile.mdtable.TypeDefRow)
typedefnamespace, typedefname = resolve_nested_typedef_name(nested_class_table, rid, typedef, pe)
typedef_token: int = calculate_dotnet_token_value(dnfile.mdtable.TypeDef.number, rid)
yield DnType(typedef_token, typedefname, namespace=typedefnamespace)
yield DnType(typedef_token, typedef.TypeName, namespace=typedef.TypeNamespace)
for rid, typeref in iter_dotnet_table(pe, dnfile.mdtable.TypeRef.number):
assert isinstance(typeref, dnfile.mdtable.TypeRefRow)
typerefnamespace, typerefname = resolve_nested_typeref_name(typeref.ResolutionScope.row_index, typeref, pe)
typeref_token: int = calculate_dotnet_token_value(dnfile.mdtable.TypeRef.number, rid)
yield DnType(typeref_token, typerefname, namespace=typerefnamespace)
yield DnType(typeref_token, typeref.TypeName, namespace=typeref.TypeNamespace)
def calculate_dotnet_token_value(table: int, rid: int) -> int:


@@ -6,17 +6,15 @@
# is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and limitations under the License.
from typing import Tuple, Optional
from typing import Optional
class DnType:
def __init__(
self, token: int, class_: Tuple[str, ...], namespace: str = "", member: str = "", access: Optional[str] = None
):
def __init__(self, token: int, class_: str, namespace: str = "", member: str = "", access: Optional[str] = None):
self.token: int = token
self.access: Optional[str] = access
self.namespace: str = namespace
self.class_: Tuple[str, ...] = class_
self.class_: str = class_
if member == ".ctor":
member = "ctor"
@@ -44,13 +42,9 @@ class DnType:
return str(self)
@staticmethod
def format_name(class_: Tuple[str, ...], namespace: str = "", member: str = ""):
if len(class_) > 1:
class_str = "/".join(class_) # Concat items in tuple, separated by a "/"
else:
class_str = "".join(class_) # Convert tuple to str
def format_name(class_: str, namespace: str = "", member: str = ""):
# like File::OpenRead
name: str = f"{class_str}::{member}" if member else class_str
name: str = f"{class_}::{member}" if member else class_
if namespace:
# like System.IO.File::OpenRead
name = f"{namespace}.{name}"
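A worked example of the two `format_name` variants in this hunk: the tuple form (the nested-class side of this diff) joins enclosing and nested class names with `/`, and both forms prepend the namespace with a dot. A self-contained sketch of the tuple form:
```python
from typing import Tuple

def format_name(class_: Tuple[str, ...], namespace: str = "", member: str = "") -> str:
    class_str = "/".join(class_)  # single-element tuples degrade to the plain name
    name = f"{class_str}::{member}" if member else class_str
    return f"{namespace}.{name}" if namespace else name

assert format_name(("File",), "System.IO", "OpenRead") == "System.IO.File::OpenRead"
assert format_name(("Enclosing", "Nested"), "NS", "Run") == "NS.Enclosing/Nested::Run"
```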


@@ -38,11 +38,8 @@ from capa.features.extractors.dnfile.helpers import (
is_dotnet_mixed_mode,
get_dotnet_managed_imports,
get_dotnet_managed_methods,
resolve_nested_typedef_name,
resolve_nested_typeref_name,
calculate_dotnet_token_value,
get_dotnet_unmanaged_imports,
get_dotnet_nested_class_table_index,
)
logger = logging.getLogger(__name__)
@@ -95,25 +92,19 @@ def extract_file_namespace_features(pe: dnfile.dnPE, **kwargs) -> Iterator[Tuple
def extract_file_class_features(pe: dnfile.dnPE, **kwargs) -> Iterator[Tuple[Class, Address]]:
"""emit class features from TypeRef and TypeDef tables"""
nested_class_table = get_dotnet_nested_class_table_index(pe)
for rid, typedef in iter_dotnet_table(pe, dnfile.mdtable.TypeDef.number):
# emit internal .NET classes
assert isinstance(typedef, dnfile.mdtable.TypeDefRow)
typedefnamespace, typedefname = resolve_nested_typedef_name(nested_class_table, rid, typedef, pe)
token = calculate_dotnet_token_value(dnfile.mdtable.TypeDef.number, rid)
yield Class(DnType.format_name(typedefname, namespace=typedefnamespace)), DNTokenAddress(token)
yield Class(DnType.format_name(typedef.TypeName, namespace=typedef.TypeNamespace)), DNTokenAddress(token)
for rid, typeref in iter_dotnet_table(pe, dnfile.mdtable.TypeRef.number):
# emit external .NET classes
assert isinstance(typeref, dnfile.mdtable.TypeRefRow)
typerefnamespace, typerefname = resolve_nested_typeref_name(typeref.ResolutionScope.row_index, typeref, pe)
token = calculate_dotnet_token_value(dnfile.mdtable.TypeRef.number, rid)
yield Class(DnType.format_name(typerefname, namespace=typerefnamespace)), DNTokenAddress(token)
yield Class(DnType.format_name(typeref.TypeName, namespace=typeref.TypeNamespace)), DNTokenAddress(token)
def extract_file_os(**kwargs) -> Iterator[Tuple[OS, Address]]:


@@ -108,9 +108,6 @@ class Shdr:
buf,
)
def get_name(self, elf: "ELF") -> str:
return elf.shstrtab.buf[self.name :].partition(b"\x00")[0].decode("ascii")
class ELF:
def __init__(self, f: BinaryIO):
@@ -123,7 +120,6 @@ class ELF:
self.e_phnum: int
self.e_shentsize: int
self.e_shnum: int
self.e_shstrndx: int
self.phbuf: bytes
self.shbuf: bytes
@@ -155,15 +151,11 @@ class ELF:
if self.bitness == 32:
e_phoff, e_shoff = struct.unpack_from(self.endian + "II", self.file_header, 0x1C)
self.e_phentsize, self.e_phnum = struct.unpack_from(self.endian + "HH", self.file_header, 0x2A)
self.e_shentsize, self.e_shnum, self.e_shstrndx = struct.unpack_from(
self.endian + "HHH", self.file_header, 0x2E
)
self.e_shentsize, self.e_shnum = struct.unpack_from(self.endian + "HH", self.file_header, 0x2E)
elif self.bitness == 64:
e_phoff, e_shoff = struct.unpack_from(self.endian + "QQ", self.file_header, 0x20)
self.e_phentsize, self.e_phnum = struct.unpack_from(self.endian + "HH", self.file_header, 0x36)
self.e_shentsize, self.e_shnum, self.e_shstrndx = struct.unpack_from(
self.endian + "HHH", self.file_header, 0x3A
)
self.e_shentsize, self.e_shnum = struct.unpack_from(self.endian + "HH", self.file_header, 0x3A)
else:
raise NotImplementedError()
@@ -373,10 +365,6 @@ class ELF:
except ValueError:
continue
@property
def shstrtab(self) -> Shdr:
return self.parse_section_header(self.e_shstrndx)
@property
def linker(self):
PT_INTERP = 0x3
@@ -828,50 +816,6 @@ def guess_os_from_sh_notes(elf: ELF) -> Optional[OS]:
return None
def guess_os_from_ident_directive(elf: ELF) -> Optional[OS]:
# GCC inserts the GNU version via an .ident directive
# that gets stored in a section named ".comment".
# look at the version and recognize common OSes.
#
# assume the GCC version matches the target OS version,
# which I guess could be wrong during cross-compilation?
# therefore, don't rely on this if possible.
#
# https://stackoverflow.com/q/6263425
# https://gcc.gnu.org/onlinedocs/cpp/Other-Directives.html
SHT_PROGBITS = 0x1
for shdr in elf.section_headers:
if shdr.type != SHT_PROGBITS:
continue
if shdr.get_name(elf) != ".comment":
continue
try:
comment = shdr.buf.decode("utf-8")
except ValueError:
continue
if "GCC:" not in comment:
continue
logger.debug(".ident: %s", comment)
# these values come from our testfiles, like:
# rg -a "GCC: " tests/data/
if "Debian" in comment:
return OS.LINUX
elif "Ubuntu" in comment:
return OS.LINUX
elif "Red Hat" in comment:
return OS.LINUX
elif "Android" in comment:
return OS.ANDROID
return None
def guess_os_from_linker(elf: ELF) -> Optional[OS]:
# search for recognizable dynamic linkers (interpreters)
# for example, on linux, we see file paths like: /lib64/ld-linux-x86-64.so.2
@@ -907,10 +851,8 @@ def guess_os_from_abi_versions_needed(elf: ELF) -> Optional[OS]:
return OS.HURD
else:
# in practice, Hurd isn't a common/viable OS,
# so this is almost certain to be Linux,
# so lets just make that guess.
return OS.LINUX
# we don't have any good guesses based on versions needed
pass
return None
@@ -923,8 +865,6 @@ def guess_os_from_needed_dependencies(elf: ELF) -> Optional[OS]:
return OS.HURD
if needed.startswith("libandroid.so"):
return OS.ANDROID
if needed.startswith("liblog.so"):
return OS.ANDROID
return None
@@ -987,13 +927,6 @@ def detect_elf_os(f) -> str:
logger.warning("Error guessing OS from section header notes: %s", e)
sh_notes_guess = None
try:
ident_guess = guess_os_from_ident_directive(elf)
logger.debug("guess: .ident: %s", ident_guess)
except Exception as e:
logger.warning("Error guessing OS from .ident directive: %s", e)
ident_guess = None
try:
linker_guess = guess_os_from_linker(elf)
logger.debug("guess: linker: %s", linker_guess)
@@ -1045,11 +978,6 @@ def detect_elf_os(f) -> str:
elif symtab_guess:
ret = symtab_guess
elif ident_guess:
# at the bottom because we don't trust this too much
# due to potential for bugs with cross-compilation.
ret = ident_guess
return ret.value if ret is not None else "unknown"


@@ -127,10 +127,8 @@ def extract_file_strings() -> Iterator[Tuple[Feature, Address]]:
"""extract ASCII and UTF-16 LE strings"""
for block in currentProgram().getMemory().getBlocks(): # type: ignore [name-defined] # noqa: F821
if not block.isInitialized():
continue
p_bytes = capa.features.extractors.ghidra.helpers.get_block_bytes(block)
if block.isInitialized():
p_bytes = capa.features.extractors.ghidra.helpers.get_block_bytes(block)
for s in capa.features.extractors.strings.extract_ascii_strings(p_bytes):
offset = block.getStart().getOffset() + s.offset


@@ -275,27 +275,3 @@ def dereference_ptr(insn: ghidra.program.database.code.InstructionDB):
return addr
else:
return to_deref
def find_data_references_from_insn(insn, max_depth: int = 10):
"""yield data references from given instruction"""
for reference in insn.getReferencesFrom():
if not reference.getReferenceType().isData():
# only care about data references
continue
to_addr = reference.getToAddress()
for _ in range(max_depth - 1):
data = getDataAt(to_addr) # type: ignore [name-defined] # noqa: F821
if data and data.isPointer():
ptr_value = data.getValue()
if ptr_value is None:
break
to_addr = ptr_value
else:
break
yield to_addr


@@ -23,9 +23,6 @@ from capa.features.extractors.base_extractor import BBHandle, InsnHandle, Functi
SECURITY_COOKIE_BYTES_DELTA = 0x40
OPERAND_TYPE_DYNAMIC_ADDRESS = OperandType.DYNAMIC | OperandType.ADDRESS
def get_imports(ctx: Dict[str, Any]) -> Dict[int, Any]:
"""Populate the import cache for this context"""
if "imports_cache" not in ctx:
@@ -85,7 +82,7 @@ def check_for_api_call(
if not capa.features.extractors.ghidra.helpers.check_addr_for_api(addr_ref, fakes, imports, externs):
return
ref = addr_ref.getOffset()
elif ref_type == OPERAND_TYPE_DYNAMIC_ADDRESS or ref_type == OperandType.DYNAMIC:
elif ref_type == OperandType.DYNAMIC | OperandType.ADDRESS or ref_type == OperandType.DYNAMIC:
return # cannot resolve dynamics statically
else:
# pure address does not need to get dereferenced/ handled
@@ -198,39 +195,46 @@ def extract_insn_offset_features(fh: FunctionHandle, bb: BBHandle, ih: InsnHandl
if insn.getMnemonicString().startswith("LEA"):
return
if capa.features.extractors.ghidra.helpers.is_stack_referenced(insn):
# ignore stack references
return
# Ghidra stores operands in 2D arrays if they contain offsets
for i in range(insn.getNumOperands()):
if insn.getOperandType(i) == OperandType.DYNAMIC: # e.g. [esi + 4]
# manual extraction, since the default api calls only work on the 1st dimension of the array
op_objs = insn.getOpObjects(i)
if not op_objs:
continue
if isinstance(op_objs[-1], ghidra.program.model.scalar.Scalar):
op_off = op_objs[-1].getValue()
else:
op_off = 0
yield Offset(op_off), ih.address
yield OperandOffset(i, op_off), ih.address
# ignore any stack references
if not capa.features.extractors.ghidra.helpers.is_stack_referenced(insn):
# Ghidra stores operands in 2D arrays if they contain offsets
for i in range(insn.getNumOperands()):
if insn.getOperandType(i) == OperandType.DYNAMIC: # e.g. [esi + 4]
# manual extraction, since the default api calls only work on the 1st dimension of the array
op_objs = insn.getOpObjects(i)
if isinstance(op_objs[-1], ghidra.program.model.scalar.Scalar):
op_off = op_objs[-1].getValue()
yield Offset(op_off), ih.address
yield OperandOffset(i, op_off), ih.address
else:
yield Offset(0), ih.address
yield OperandOffset(i, 0), ih.address
def extract_insn_bytes_features(fh: FunctionHandle, bb: BBHandle, ih: InsnHandle) -> Iterator[Tuple[Feature, Address]]:
"""
parse referenced byte sequences
example:
push offset iid_004118d4_IShellLinkA ; riid
"""
for addr in capa.features.extractors.ghidra.helpers.find_data_references_from_insn(ih.inner):
data = getDataAt(addr) # type: ignore [name-defined] # noqa: F821
if data and not data.hasStringValue():
extracted_bytes = capa.features.extractors.ghidra.helpers.get_bytes(addr, MAX_BYTES_FEATURE_SIZE)
insn: ghidra.program.database.code.InstructionDB = ih.inner
if capa.features.extractors.ghidra.helpers.is_call_or_jmp(insn):
return
ref = insn.getAddress() # init to insn addr
for i in range(insn.getNumOperands()):
if OperandType.isAddress(insn.getOperandType(i)):
ref = insn.getAddress(i) # pulls pointer if there is one
if ref != insn.getAddress(): # bail out if there's no pointer
ghidra_dat = getDataAt(ref) # type: ignore [name-defined] # noqa: F821
if (
ghidra_dat and not ghidra_dat.hasStringValue() and not ghidra_dat.isPointer()
): # avoid if the data itself is a pointer
extracted_bytes = capa.features.extractors.ghidra.helpers.get_bytes(ref, MAX_BYTES_FEATURE_SIZE)
if extracted_bytes and not capa.features.extractors.helpers.all_zeros(extracted_bytes):
# don't extract byte features for obvious strings
yield Bytes(extracted_bytes), ih.address
@@ -241,10 +245,24 @@ def extract_insn_string_features(fh: FunctionHandle, bb: BBHandle, ih: InsnHandl
example:
push offset aAcr ; "ACR > "
"""
for addr in capa.features.extractors.ghidra.helpers.find_data_references_from_insn(ih.inner):
data = getDataAt(addr) # type: ignore [name-defined] # noqa: F821
if data and data.hasStringValue():
yield String(data.getValue()), ih.address
insn: ghidra.program.database.code.InstructionDB = ih.inner
dyn_addr = OperandType.DYNAMIC | OperandType.ADDRESS
ref = insn.getAddress()
for i in range(insn.getNumOperands()):
if OperandType.isScalarAsAddress(insn.getOperandType(i)):
ref = insn.getAddress(i)
# strings are also referenced dynamically via pointers & arrays, so we need to deref them
if insn.getOperandType(i) == dyn_addr:
ref = insn.getAddress(i)
dat = getDataAt(ref) # type: ignore [name-defined] # noqa: F821
if dat and dat.isPointer():
ref = dat.getValue()
if ref != insn.getAddress():
ghidra_dat = getDataAt(ref) # type: ignore [name-defined] # noqa: F821
if ghidra_dat and ghidra_dat.hasStringValue():
yield String(ghidra_dat.getValue()), ih.address
def extract_insn_mnemonic_features(
@@ -341,7 +359,7 @@ def extract_insn_cross_section_cflow(
ref = capa.features.extractors.ghidra.helpers.dereference_ptr(insn)
if capa.features.extractors.ghidra.helpers.check_addr_for_api(ref, fakes, imports, externs):
return
elif ref_type == OPERAND_TYPE_DYNAMIC_ADDRESS or ref_type == OperandType.DYNAMIC:
elif ref_type == OperandType.DYNAMIC | OperandType.ADDRESS or ref_type == OperandType.DYNAMIC:
return # cannot resolve dynamics statically
else:
# pure address does not need to get dereferenced/ handled


@@ -9,7 +9,6 @@ Unless required by applicable law or agreed to in writing, software distributed
is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and limitations under the License.
"""
import json
import zlib
import logging
@@ -22,7 +21,6 @@ from pydantic import Field, BaseModel, ConfigDict
# https://github.com/mandiant/capa/issues/1699
from typing_extensions import TypeAlias
import capa.loader
import capa.helpers
import capa.version
import capa.features.file
@@ -55,6 +53,8 @@ class AddressType(str, Enum):
FILE = "file"
DN_TOKEN = "dn token"
DN_TOKEN_OFFSET = "dn token offset"
DEX_METHOD_INDEX = "dex method index"
DEX_CLASS_INDEX = "dex class index"
PROCESS = "process"
THREAD = "thread"
CALL = "call"
@@ -82,6 +82,12 @@ class Address(HashableModel):
elif isinstance(a, capa.features.address.DNTokenOffsetAddress):
return cls(type=AddressType.DN_TOKEN_OFFSET, value=(a.token, a.offset))
elif isinstance(a, capa.features.address.DexMethodAddress):
return cls(type=AddressType.DEX_METHOD_INDEX, value=int(a))
elif isinstance(a, capa.features.address.DexClassAddress):
return cls(type=AddressType.DEX_CLASS_INDEX, value=int(a))
elif isinstance(a, capa.features.address.ProcessAddress):
return cls(type=AddressType.PROCESS, value=(a.ppid, a.pid))
@@ -127,6 +133,14 @@ class Address(HashableModel):
assert isinstance(offset, int)
return capa.features.address.DNTokenOffsetAddress(token, offset)
elif self.type is AddressType.DEX_METHOD_INDEX:
assert isinstance(self.value, int)
return capa.features.address.DexMethodAddress(self.value)
elif self.type is AddressType.DEX_CLASS_INDEX:
assert isinstance(self.value, int)
return capa.features.address.DexClassAddress(self.value)
elif self.type is AddressType.PROCESS:
assert isinstance(self.value, tuple)
ppid, pid = self.value
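A round-trip sketch for the new DEX address variants, assuming this file is the freeze model (`capa.features.freeze`) and the usual `from_capa`/`to_capa` method names (the method names are not shown in this diff):
```python
import capa.features.address
from capa.features.freeze import Address, AddressType  # assumed module path

frozen = Address.from_capa(capa.features.address.DexMethodAddress(0x40))
assert frozen.type is AddressType.DEX_METHOD_INDEX and frozen.value == 0x40
thawed = frozen.to_capa()
assert isinstance(thawed, capa.features.address.DexMethodAddress)
```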
@@ -683,18 +697,14 @@ def main(argv=None):
argv = sys.argv[1:]
parser = argparse.ArgumentParser(description="save capa features to a file")
capa.main.install_common_args(parser, {"input_file", "format", "backend", "os", "signatures"})
capa.main.install_common_args(parser, {"sample", "format", "backend", "os", "signatures"})
parser.add_argument("output", type=str, help="Path to output file")
args = parser.parse_args(args=argv)
capa.main.handle_common_args(args)
try:
capa.main.handle_common_args(args)
capa.main.ensure_input_exists_from_cli(args)
input_format = capa.main.get_input_format_from_cli(args)
backend = capa.main.get_backend_from_cli(args, input_format)
extractor = capa.main.get_extractor_from_cli(args, input_format, backend)
except capa.main.ShouldExitError as e:
return e.status_code
sigpaths = capa.main.get_signatures(args.signatures)
extractor = capa.main.get_extractor(args.sample, args.format, args.os, args.backend, sigpaths, False)
Path(args.output).write_bytes(dump(extractor))


@@ -2,46 +2,23 @@
<img src="/doc/img/ghidra_backend_logo.png" width=300 height=175>
</div>
The Ghidra feature extractor is an application of the FLARE team's open-source project, Ghidrathon, to integrate capa with Ghidra using Python 3. capa is a framework that uses a well-defined collection of rules to identify capabilities in a program. You can run capa against a PE file, ELF file, or shellcode and it tells you what it thinks the program can do. For example, it might suggest that the program is a backdoor, can install services, or relies on HTTP to communicate. The Ghidra feature extractor can be used to run capa analysis on your Ghidra databases without needing access to the original binary file. As a part of this integration, we've developed two scripts, [capa_explorer.py](https://raw.githubusercontent.com/mandiant/capa/master/capa/ghidra/capa_explorer.py) and [capa_ghidra.py](https://raw.githubusercontent.com/mandiant/capa/master/capa/ghidra/capa_ghidra.py), to display capa results directly in Ghidra.
The Ghidra feature extractor is an application of the FLARE team's open-source project, Ghidrathon, to integrate capa with Ghidra using Python 3. capa is a framework that uses a well-defined collection of rules to identify capabilities in a program. You can run capa against a PE file, ELF file, or shellcode and it tells you what it thinks the program can do. For example, it might suggest that the program is a backdoor, can install services, or relies on HTTP to communicate. The Ghidra feature extractor can be used to run capa analysis on your Ghidra databases without needing access to the original binary file.
### Using `capa_explorer.py`
`capa_explorer.py` integrates capa results directly into Ghidra's UI. In the Symbol Tree Window, under the Namespaces section, you can find the matched rules as well as the corresponding functions that contain the matched features:
![image](https://github.com/mandiant/capa/assets/66766340/eeae33f4-99d4-42dc-a5e8-4c1b8c661492)
Labeled functions may be clicked in the Symbol Tree Window to navigate Ghidra's Disassembly Listing and Decompilation windows to the function locations. A comment listing each matched capa rule is inserted at the beginning of the function and a comment for each matched capa feature is added at the matched address within the function. These comments can be viewed using Ghidra's Disassembly Listing and Decompilation windows:
![image](https://github.com/mandiant/capa/assets/66766340/bb2b4170-7fd4-45fc-8c7b-ff8f2e2f101b)
The script also adds bookmarks for capa matches that are categorized under MITRE ATT&CK and Malware Behavior Catalog. These may be found and navigated using Ghidra's Bookmarks Window:
![image](https://github.com/mandiant/capa/assets/66766340/7f9a66a9-7be7-4223-91c6-4b8fc4651336)
### Using `capa_ghidra.py`
`capa_ghidra.py` displays capa results in Ghidra's Console window and can be executed using Ghidra's Headless Analyzer. The following is an example of running `capa_ghidra.py` using the Ghidra Script Manager:
Selecting capa rules:
<img src="/doc/img/ghidra_script_mngr_rules.png">
Choosing output format:
<img src="/doc/img/ghidra_script_mngr_verbosity.png">
Viewing results in Ghidra Console Window:
<img src="/doc/img/ghidra_script_mngr_output.png">
## Installation
## Getting Started
### Requirements
### Installation
| Tool | Version | Source |
Please ensure that you have the following dependencies installed before continuing:
| Dependency | Version | Source |
|------------|---------|--------|
| Ghidrathon | `>= 3.0.0` | https://github.com/mandiant/Ghidrathon/releases |
| Ghidra | `>= 10.3.2` | https://github.com/NationalSecurityAgency/ghidra/releases |
| Python | `>= 3.8.0` | https://www.python.org/downloads |
| Ghidrathon | `>= 3.0.0` | https://github.com/mandiant/Ghidrathon |
| Python | `>= 3.8` | https://www.python.org/downloads |
| Ghidra | `>= 10.2` | https://ghidra-sre.org |
You can run capa in Ghidra by completing the following steps using the Python 3 interpreter that you have configured for your Ghidrathon installation:
In order to run capa using Ghidra, you must install capa as a library, obtain the official capa rules that match the capa version you have installed, and configure the Python 3 script [capa_ghidra.py](/capa/ghidra/capa_ghidra.py). You can do this by completing the following steps using the Python 3 interpreter that you have configured for your Ghidrathon installation:
1. Install capa and its dependencies from PyPI using the following command:
```bash
@@ -55,52 +32,63 @@ OR
$ capa --version
```
3. Copy [capa_explorer.py](https://raw.githubusercontent.com/mandiant/capa/master/capa/ghidra/capa_explorer.py) and [capa_ghidra.py](https://raw.githubusercontent.com/mandiant/capa/master/capa/ghidra/capa_ghidra.py) to your `$USER_HOME/ghidra_scripts` directory or manually add the absolute path of each script to the Ghidra Script Manager.
3. Copy [capa_ghidra.py](/capa/ghidra/capa_ghidra.py) to your `$USER_HOME/ghidra_scripts` directory or manually add `</path/to/ghidra_capa.py/>` to the Ghidra Script Manager.
## Usage
After completing the installation steps you can execute [capa_explorer.py](https://raw.githubusercontent.com/mandiant/capa/master/capa/ghidra/capa_explorer.py) and [capa_ghidra.py](https://raw.githubusercontent.com/mandiant/capa/master/capa/ghidra/capa_ghidra.py) using the Ghidra Script Manager. You can also execute [capa_ghidra.py](https://raw.githubusercontent.com/mandiant/capa/master/capa/ghidra/capa_ghidra.py) using Ghidra's Headless Analyzer.
After completing the installation steps you can execute `capa_ghidra.py` using the Ghidra Script Manager or Headless Analyzer.
### Ghidra Script Manager
Use the following steps to execute `capa_explorer.py` and `capa_ghidra.py` using Ghidra's Script Manager:
1. Open the Ghidra Script Manager by navigating to `Window > Script Manager`
2. Locate `capa_explorer.py` and `capa_ghidra.py` by selecting the `Python 3 > capa` category or using the Ghidra Script Manager search functionality
3. Double-click `capa_explorer.py` or `capa_ghidra.py` to execute the script

If you don't see `capa_explorer.py` and `capa_ghidra.py`, make sure you have copied these scripts to your `$USER_HOME/ghidra_scripts` directory or manually added the absolute path of each script to the Ghidra Script Manager.

Both scripts ask you to provide the path of your capa rules directory. `capa_ghidra.py` also asks you to select the `default`, `verbose`, or `vverbose` output format used when writing output to the Ghidra Console Window.
#### Example
The following is an example of running `capa_ghidra.py` using the Ghidra Script Manager:
Selecting capa rules:
<img src="/doc/img/ghidra_script_mngr_rules.png">
Choosing output format:
<img src="/doc/img/ghidra_script_mngr_verbosity.png">
Viewing results in Ghidra Console Window:
<img src="/doc/img/ghidra_script_mngr_output.png">
### Ghidra Headless Analyzer
To execute `capa_ghidra.py` using the Ghidra Headless Analyzer, you can use the Ghidra `analyzeHeadless` script located in your `<ghidra_install_path>/support` directory. You will need to provide the following arguments to the Ghidra `analyzeHeadless` script:
1. `<ghidra_project_path>`: path to Ghidra project
2. `<ghidra_project_name>`: name of Ghidra project
3. `-process <sample_name>`: name of sample `<sample_name>`
4. `-ScriptPath <capa_ghidra_path>`: OPTIONAL argument specifying the absolute path of `capa_ghidra.py`
5. `-PostScript capa_ghidra.py`: execute `capa_ghidra.py` as a post-analysis script
6. `"<capa_args>"`: single, quoted string containing capa arguments that must specify the capa rules directory and output format, e.g. `"<capa_rules_path> --verbose"`. `capa_ghidra.py` supports `default`, `verbose`, `vverbose`, and `json` formats when executed using the Ghidra Headless Analyzer, and writes output to the console window used to execute the Ghidra `analyzeHeadless` script.
7. `-processor <languageID>`: required ONLY if sample `<sample_name>` is shellcode. More information on specifying the `<languageID>` can be found in the `<ghidra_install_path>/support/analyzeHeadlessREADME.html` documentation.
The following is an example of combining these arguments into a single `analyzeHeadless` script command:
```
<ghidra_install_path>/support/analyzeHeadless <ghidra_project_path> <ghidra_project_name> -process <sample_name> -PostScript capa_ghidra.py "<capa_rules_path> --verbose"
```
You may also want to run capa against a sample that you have not yet imported into your Ghidra project. The following is an example of importing a sample and running `capa_ghidra.py` using a single `analyzeHeadless` script command:
```
<ghidra_install_path>/support/analyzeHeadless <ghidra_project_path> <ghidra_project_name> -Import <sample_path> -PostScript capa_ghidra.py "<capa_rules_path> --verbose"
```
You can also provide `capa_ghidra.py` the single argument `"help"` to view supported arguments when running the script using the Ghidra Headless Analyzer:
```
<ghidra_install_path>/support/analyzeHeadless <ghidra_project_path> <ghidra_project_name> -process <sample_name> -PostScript capa_ghidra.py "help"
```
#### Example
The following is an example of running `capa_ghidra.py` against a shellcode sample using the Ghidra `analyzeHeadless` script:
```
$ analyzeHeadless /home/wumbo/Desktop/ghidra_projects/ capa_test -process 499c2a85f6e8142c3f48d4251c9c7cd6.raw32 -processor x86:LE:32:default -PostScript capa_ghidra.py "/home/wumbo/capa/rules -vv"
[...]

View File

@@ -1,378 +0,0 @@
# Integrate capa results with Ghidra UI
# @author Colton Gabertan (gabertan.colton@gmail.com)
# @category Python 3.capa
# Copyright (C) 2023 Mandiant, Inc. All Rights Reserved.
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at: [package root]/LICENSE.txt
# Unless required by applicable law or agreed to in writing, software distributed under the License
# is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and limitations under the License.
import sys
import json
import logging
import pathlib
from typing import Any, Dict, List
from ghidra.app.cmd.label import AddLabelCmd, CreateNamespacesCmd
from ghidra.program.model.symbol import Namespace, SourceType, SymbolType
import capa
import capa.main
import capa.rules
import capa.render.json
import capa.ghidra.helpers
import capa.capabilities.common
import capa.features.extractors.ghidra.extractor
logger = logging.getLogger("capa_explorer")
def add_bookmark(addr, txt, category="CapaExplorer"):
"""create bookmark at addr"""
currentProgram().getBookmarkManager().setBookmark(addr, "Info", category, txt) # type: ignore [name-defined] # noqa: F821
def create_namespace(namespace_str):
"""create new Ghidra namespace for each capa namespace"""
cmd = CreateNamespacesCmd(namespace_str, SourceType.USER_DEFINED)
cmd.applyTo(currentProgram()) # type: ignore [name-defined] # noqa: F821
return cmd.getNamespace()
def create_label(ghidra_addr, name, capa_namespace):
"""custom label cmd to overlay symbols under capa-generated namespaces"""
# prevent duplicate labels under the same capa-generated namespace
symbol_table = currentProgram().getSymbolTable() # type: ignore [name-defined] # noqa: F821
for sym in symbol_table.getSymbols(ghidra_addr):
if sym.getName(True) == capa_namespace.getName(True) + Namespace.DELIMITER + name:
return
# create SymbolType.LABEL at addr
# prioritize capa-generated namespace (duplicate match @ new addr), else put under global Ghidra one (new match)
cmd = AddLabelCmd(ghidra_addr, name, True, SourceType.USER_DEFINED)
cmd.applyTo(currentProgram()) # type: ignore [name-defined] # noqa: F821
# assign new match overlay label to capa-generated namespace
cmd.getSymbol().setNamespace(capa_namespace)
return
class CapaMatchData:
def __init__(
self,
namespace,
scope,
capability,
matches,
attack: List[Dict[Any, Any]],
mbc: List[Dict[Any, Any]],
):
self.namespace = namespace
self.scope = scope
self.capability = capability
self.matches = matches
self.attack = attack
self.mbc = mbc
def bookmark_functions(self):
"""create bookmarks for MITRE ATT&CK & MBC mappings"""
if self.attack == [] and self.mbc == []:
return
for key in self.matches.keys():
addr = toAddr(hex(key)) # type: ignore [name-defined] # noqa: F821
func = getFunctionContaining(addr) # type: ignore [name-defined] # noqa: F821
# bookmark & tag MITRE ATT&CK tactics & MBC @ function scope
if func is not None:
func_addr = func.getEntryPoint()
if self.attack != []:
for item in self.attack:
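# build one bookmark string per ATT&CK entry by joining its parts and ID with the Ghidra namespace delimiter, e.g. Defense Evasion::Obfuscated Files or Information::T1027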
attack_txt = ""
for part in item.get("parts", {}):
attack_txt = attack_txt + part + Namespace.DELIMITER
attack_txt = attack_txt + item.get("id", {})
add_bookmark(func_addr, attack_txt, "CapaExplorer::MITRE ATT&CK")
if self.mbc != []:
for item in self.mbc:
mbc_txt = ""
for part in item.get("parts", {}):
mbc_txt = mbc_txt + part + Namespace.DELIMITER
mbc_txt = mbc_txt + item.get("id", {})
add_bookmark(func_addr, mbc_txt, "CapaExplorer::MBC")
def set_plate_comment(self, ghidra_addr):
"""set plate comments at matched functions"""
comment = getPlateComment(ghidra_addr) # type: ignore [name-defined] # noqa: F821
rule_path = self.namespace.replace(Namespace.DELIMITER, "/")
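# e.g. capa::communication::named-pipe::create becomes capa/communication/named-pipe/create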
# 2 calls to avoid duplicate comments via subsequent script runs
if comment is None:
# first comment @ function
comment = rule_path + "\n"
setPlateComment(ghidra_addr, comment) # type: ignore [name-defined] # noqa: F821
elif rule_path not in comment:
comment = comment + rule_path + "\n"
setPlateComment(ghidra_addr, comment) # type: ignore [name-defined] # noqa: F821
else:
return
def set_pre_comment(self, ghidra_addr, sub_type, description):
"""set pre comments at subscoped matches of main rules"""
comment = getPreComment(ghidra_addr) # type: ignore [name-defined] # noqa: F821
if comment is None:
comment = "capa: " + sub_type + "(" + description + ")" + ' matched in "' + self.capability + '"\n'
setPreComment(ghidra_addr, comment) # type: ignore [name-defined] # noqa: F821
elif self.capability not in comment:
comment = (
comment + "capa: " + sub_type + "(" + description + ")" + ' matched in "' + self.capability + '"\n'
)
setPreComment(ghidra_addr, comment) # type: ignore [name-defined] # noqa: F821
else:
return
def label_matches(self):
"""label findings at function scopes and comment on subscope matches"""
capa_namespace = create_namespace(self.namespace)
symbol_table = currentProgram().getSymbolTable() # type: ignore [name-defined] # noqa: F821
# handle function main scope of matched rule
# these will typically contain further matches within
if self.scope == "function":
for addr in self.matches.keys():
ghidra_addr = toAddr(hex(addr)) # type: ignore [name-defined] # noqa: F821
# classify new function label under capa-generated namespace
sym = symbol_table.getPrimarySymbol(ghidra_addr)
if sym is not None:
if sym.getSymbolType() == SymbolType.FUNCTION:
create_label(ghidra_addr, sym.getName(), capa_namespace)
self.set_plate_comment(ghidra_addr)
# parse the corresponding nodes, and pre-comment subscope matched features
# under the encompassing function(s)
for sub_match in self.matches.get(addr):
for loc, node in sub_match.items():
sub_ghidra_addr = toAddr(hex(loc)) # type: ignore [name-defined] # noqa: F821
if sub_ghidra_addr == ghidra_addr:
# skip duplicates
continue
# precomment subscope matches under the function
if node != {}:
for sub_type, description in parse_node(node):
self.set_pre_comment(sub_ghidra_addr, sub_type, description)
else:
# resolve the encompassing function for the capa namespace
# of non-function scoped main matches
for addr in self.matches.keys():
ghidra_addr = toAddr(hex(addr)) # type: ignore [name-defined] # noqa: F821
# basic block / insn scoped main matches
# Ex. See "Create Process on Windows" Rule
func = getFunctionContaining(ghidra_addr) # type: ignore [name-defined] # noqa: F821
if func is not None:
func_addr = func.getEntryPoint()
create_label(func_addr, func.getName(), capa_namespace)
self.set_plate_comment(func_addr)
# create subscope match precomments
for sub_match in self.matches.get(addr):
for loc, node in sub_match.items():
sub_ghidra_addr = toAddr(hex(loc)) # type: ignore [name-defined] # noqa: F821
if node != {}:
if func is not None:
# basic block/ insn scope under resolved function
for sub_type, description in parse_node(node):
self.set_pre_comment(sub_ghidra_addr, sub_type, description)
else:
# this would be a global/file scoped main match
# try to resolve the encompassing function via the subscope match, instead
# Ex. "run as service" rule
sub_func = getFunctionContaining(sub_ghidra_addr) # type: ignore [name-defined] # noqa: F821
if sub_func is not None:
sub_func_addr = sub_func.getEntryPoint()
# place function in capa namespace & create the subscope match label in Ghidra's global namespace
create_label(sub_func_addr, sub_func.getName(), capa_namespace)
self.set_plate_comment(sub_func_addr)
for sub_type, description in parse_node(node):
self.set_pre_comment(sub_ghidra_addr, sub_type, description)
else:
# addr is in some other file section like .data
# represent this location with a label symbol under the capa namespace
# Ex. See "Reference Base64 String" rule
for sub_type, description in parse_node(node):
# in many cases, these will be ghidra-labeled data, so just add the existing
# label symbol to the capa namespace
for sym in symbol_table.getSymbols(sub_ghidra_addr):
if sym.getSymbolType() == SymbolType.LABEL:
sym.setNamespace(capa_namespace)
self.set_pre_comment(sub_ghidra_addr, sub_type, description)
def get_capabilities():
rules_dir: str = ""
try:
selected_dir = askDirectory("Choose capa rules directory", "Ok") # type: ignore [name-defined] # noqa: F821
if selected_dir:
rules_dir = selected_dir.getPath()
except RuntimeError:
# RuntimeError thrown when user selects "Cancel"
pass
if not rules_dir:
logger.info("You must choose a capa rules directory before running capa.")
return "" # return empty str to avoid handling both int and str types
rules_path: pathlib.Path = pathlib.Path(rules_dir)
logger.info("running capa using rules from %s", str(rules_path))
rules = capa.rules.get_rules([rules_path])
meta = capa.ghidra.helpers.collect_metadata([rules_path])
extractor = capa.features.extractors.ghidra.extractor.GhidraFeatureExtractor()
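# the third argument (True) disables capa's progress indicator while matching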
capabilities, counts = capa.capabilities.common.find_capabilities(rules, extractor, True)
if capa.capabilities.common.has_file_limitation(rules, capabilities, is_standalone=False):
popup("capa explorer encountered warnings during analysis. Please check the console output for more information.") # type: ignore [name-defined] # noqa: F821
logger.info("capa encountered warnings during analysis")
return capa.render.json.render(meta, rules, capabilities)
def get_locations(match_dict):
"""recursively collect match addresses and associated nodes"""
for loc in match_dict.get("locations", {}):
# either an rva (absolute)
# or an offset into a file (file)
if loc.get("type", "") in ("absolute", "file"):
yield loc.get("value"), match_dict.get("node")
for child in match_dict.get("children", {}):
yield from get_locations(child)
def parse_node(node_data):
"""pull match descriptions and sub features by parsing node dicts"""
node = node_data.get(node_data.get("type"))
if "description" in node:
yield "description", node.get("description")
data = node.get(node.get("type"))
if isinstance(data, (str, int)):
feat_type = node.get("type")
if isinstance(data, int):
data = hex(data)
yield feat_type, data
def parse_json(capa_data):
"""Parse json produced by capa"""
for rule, capability in capa_data.get("rules", {}).items():
# structure to contain rule match address & supporting feature data
# {rule match addr:[{feature addr:{node_data}}]}
rule_matches: Dict[Any, List[Any]] = {}
for i in range(len(capability.get("matches"))):
# grab rule match location
match_loc = capability.get("matches")[i][0].get("value")
if match_loc is None:
# Ex. See "Reference Base64 string"
# {'type':'no address'}
match_loc = i
rule_matches[match_loc] = []
# grab extracted feature locations & corresponding node data
# feature[0]: location
# feature[1]: node
features = capability.get("matches")[i][1]
feat_dict = {}
for feature in get_locations(features):
feat_dict[feature[0]] = feature[1]
rule_matches[match_loc].append(feat_dict)
# dict data of currently matched rule
meta = capability["meta"]
# get MITRE ATT&CK and MBC
attack = meta.get("attack")
if attack is None:
attack = []
mbc = meta.get("mbc")
if mbc is None:
mbc = []
# scope match for the rule
scope = meta["scopes"].get("static")
fmt_rule = Namespace.DELIMITER + rule.replace(" ", "-")
if "namespace" in meta:
# split into list to help define child namespaces
# this requires the correct delimiter used by Ghidra
# Ex. 'communication/named-pipe/create/create pipe' -> capa::communication::named-pipe::create::create-pipe
namespace_str = Namespace.DELIMITER.join(meta["namespace"].split("/"))
namespace = "capa" + Namespace.DELIMITER + namespace_str + fmt_rule
else:
# lib rules via the official rules repo will not contain data
# for the "namespaces" key, so format using rule itself
# Ex. 'contain loop' -> capa::lib::contain-loop
namespace = "capa" + Namespace.DELIMITER + "lib" + fmt_rule
yield CapaMatchData(namespace, scope, rule, rule_matches, attack, mbc)
def main():
logging.basicConfig(level=logging.INFO)
logging.getLogger().setLevel(logging.INFO)
if isRunningHeadless(): # type: ignore [name-defined] # noqa: F821
logger.error("unsupported Ghidra execution mode")
return capa.main.E_UNSUPPORTED_GHIDRA_EXECUTION_MODE
if not capa.ghidra.helpers.is_supported_ghidra_version():
logger.error("unsupported Ghidra version")
return capa.main.E_UNSUPPORTED_GHIDRA_VERSION
if not capa.ghidra.helpers.is_supported_file_type():
logger.error("unsupported file type")
return capa.main.E_INVALID_FILE_TYPE
if not capa.ghidra.helpers.is_supported_arch_type():
logger.error("unsupported file architecture")
return capa.main.E_INVALID_FILE_ARCH
# capa_data will always contain {'meta':..., 'rules':...}
# if the 'rules' key contains no values, then there were no matches
capa_data = json.loads(get_capabilities())
if capa_data.get("rules") is None:
logger.info("capa explorer found no matches")
popup("capa explorer found no matches.") # type: ignore [name-defined] # noqa: F821
return capa.main.E_EMPTY_REPORT
for item in parse_json(capa_data):
item.bookmark_functions()
item.label_matches()
logger.info("capa explorer analysis complete")
popup("capa explorer analysis complete.\nPlease see results in the Bookmarks Window and Namespaces section of the Symbol Tree Window.") # type: ignore [name-defined] # noqa: F821
return 0
if __name__ == "__main__":
if sys.version_info < (3, 8):
from capa.exceptions import UnsupportedRuntimeError
raise UnsupportedRuntimeError("This version of capa can only be used with Python 3.8+")
exit_code = main()
if exit_code != 0:
popup("capa explorer encountered errors during analysis. Please check the console output for more information.") # type: ignore [name-defined] # noqa: F821
sys.exit(exit_code)

View File

@@ -69,7 +69,7 @@ def run_headless():
rules_path = pathlib.Path(args.rules)
logger.debug("rule path: %s", rules_path)
rules = capa.rules.get_rules([rules_path])
rules = capa.main.get_rules([rules_path])
meta = capa.ghidra.helpers.collect_metadata([rules_path])
extractor = capa.features.extractors.ghidra.extractor.GhidraFeatureExtractor()
@@ -78,7 +78,7 @@ def run_headless():
meta.analysis.feature_counts = counts["feature_counts"]
meta.analysis.library_functions = counts["library_functions"]
meta.analysis.layout = capa.loader.compute_layout(rules, extractor, capabilities)
meta.analysis.layout = capa.main.compute_layout(rules, extractor, capabilities)
if capa.capabilities.common.has_file_limitation(rules, capabilities, is_standalone=True):
logger.info("capa encountered warnings during analysis")
@@ -119,7 +119,7 @@ def run_ui():
rules_path: pathlib.Path = pathlib.Path(rules_dir)
logger.info("running capa using rules from %s", str(rules_path))
rules = capa.rules.get_rules([rules_path])
rules = capa.main.get_rules([rules_path])
meta = capa.ghidra.helpers.collect_metadata([rules_path])
extractor = capa.features.extractors.ghidra.extractor.GhidraFeatureExtractor()
@@ -128,7 +128,7 @@ def run_ui():
meta.analysis.feature_counts = counts["feature_counts"]
meta.analysis.library_functions = counts["library_functions"]
meta.analysis.layout = capa.loader.compute_layout(rules, extractor, capabilities)
meta.analysis.layout = capa.main.compute_layout(rules, extractor, capabilities)
if capa.capabilities.common.has_file_limitation(rules, capabilities, is_standalone=False):
logger.info("capa encountered warnings during analysis")

View File

@@ -5,7 +5,6 @@
# Unless required by applicable law or agreed to in writing, software distributed under the License
# is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and limitations under the License.
import sys
import json
import inspect
import logging
@@ -17,22 +16,12 @@ from pathlib import Path
import tqdm
from capa.exceptions import UnsupportedFormatError
from capa.features.common import (
FORMAT_PE,
FORMAT_CAPE,
FORMAT_SC32,
FORMAT_SC64,
FORMAT_DOTNET,
FORMAT_FREEZE,
FORMAT_UNKNOWN,
Format,
)
from capa.features.common import FORMAT_PE, FORMAT_CAPE, FORMAT_SC32, FORMAT_SC64, FORMAT_DOTNET, FORMAT_UNKNOWN, Format
EXTENSIONS_SHELLCODE_32 = ("sc32", "raw32")
EXTENSIONS_SHELLCODE_64 = ("sc64", "raw64")
EXTENSIONS_DYNAMIC = ("json", "json_")
EXTENSIONS_ELF = "elf_"
EXTENSIONS_FREEZE = "frz"
logger = logging.getLogger("capa")
@@ -92,8 +81,6 @@ def get_format_from_extension(sample: Path) -> str:
format_ = FORMAT_SC64
elif sample.name.endswith(EXTENSIONS_DYNAMIC):
format_ = get_format_from_report(sample)
elif sample.name.endswith(EXTENSIONS_FREEZE):
format_ = FORMAT_FREEZE
return format_
@@ -169,7 +156,7 @@ def log_unsupported_format_error():
def log_unsupported_cape_report_error(error: str):
logger.error("-" * 80)
logger.error(" Input file is not a valid CAPE report: %s", error)
logger.error("Input file is not a valid CAPE report: %s", error)
logger.error(" ")
logger.error(" capa currently only supports analyzing standard CAPE reports in JSON format.")
logger.error(
@@ -214,16 +201,3 @@ def log_unsupported_runtime_error():
" If you're seeing this message on the command line, please ensure you're running a supported Python version."
)
logger.error("-" * 80)
def is_running_standalone() -> bool:
"""
are we running from a PyInstaller'd executable?
if so, then we'll be able to access `sys._MEIPASS` for the packaged resources.
"""
# typically we only expect capa.main to be packaged via PyInstaller.
# therefore, this *should* be in capa.main; however,
# the Binary Ninja extractor uses this to resolve the BN API code,
# so we keep this in a common area.
# generally, other library code should not use this function.
return hasattr(sys, "frozen") and hasattr(sys, "_MEIPASS")

View File

@@ -636,7 +636,7 @@ class CapaExplorerForm(idaapi.PluginForm):
if ida_kernwin.user_cancelled():
raise UserCancelledError("user cancelled")
return capa.rules.get_rules([rule_path], on_load_rule=on_load_rule)
return capa.main.get_rules([rule_path], on_load_rule=on_load_rule)
except UserCancelledError:
logger.info("User cancelled analysis.")
return None
@@ -775,7 +775,7 @@ class CapaExplorerForm(idaapi.PluginForm):
meta.analysis.feature_counts = counts["feature_counts"]
meta.analysis.library_functions = counts["library_functions"]
meta.analysis.layout = capa.loader.compute_layout(ruleset, self.feature_extractor, capabilities)
meta.analysis.layout = capa.main.compute_layout(ruleset, self.feature_extractor, capabilities)
except UserCancelledError:
logger.info("User cancelled analysis.")
return False
@@ -932,9 +932,9 @@ class CapaExplorerForm(idaapi.PluginForm):
update_wait_box("verifying cached results")
try:
results: Optional[capa.render.result_document.ResultDocument] = (
capa.ida.helpers.load_and_verify_cached_results()
)
results: Optional[
capa.render.result_document.ResultDocument
] = capa.ida.helpers.load_and_verify_cached_results()
except Exception as e:
capa.ida.helpers.inform_user_ida_ui("Failed to verify cached results, reanalyzing program")
logger.exception("Failed to verify cached results (error: %s)", e)
@@ -1073,7 +1073,9 @@ class CapaExplorerForm(idaapi.PluginForm):
self.view_rulegen_features.load_features(all_file_features, all_function_features)
self.set_view_status_label(f"capa rules: {settings.user[CAPA_SETTINGS_RULE_PATH]}")
self.set_view_status_label(
f"capa rules: {settings.user[CAPA_SETTINGS_RULE_PATH]} ({settings.user[CAPA_SETTINGS_RULE_PATH]} rules)"
)
except Exception as e:
logger.exception("Failed to render views (error: %s)", e)
return False
@@ -1322,17 +1324,10 @@ class CapaExplorerForm(idaapi.PluginForm):
idaapi.info("No rule to save.")
return
rule_file_path = self.ask_user_capa_rule_file()
if not rule_file_path:
# dialog canceled
path = Path(self.ask_user_capa_rule_file())
if not path.exists():
return
path = Path(rule_file_path)
if not path.parent.exists():
logger.warning("Failed to save file: parent directory '%s' does not exist.", path.parent)
return
logger.info("Saving rule to %s.", path)
write_file(path, s)
def slot_checkbox_limit_by_changed(self, state):

View File

@@ -194,17 +194,13 @@ class CapaExplorerRulegenPreview(QtWidgets.QTextEdit):
" namespace: <insert_namespace>",
" authors:",
f" - {author}",
" scopes:",
f" static: {scope}",
" dynamic: unspecified",
f" scope: {scope}",
" references:",
" - <insert_references>",
" examples:",
(
f" - {capa.ida.helpers.get_file_md5().upper()}:{hex(ea)}"
if ea
else f" - {capa.ida.helpers.get_file_md5().upper()}"
),
f" - {capa.ida.helpers.get_file_md5().upper()}:{hex(ea)}"
if ea
else f" - {capa.ida.helpers.get_file_md5().upper()}",
" features:",
]
self.setText("\n".join(metadata_default))

View File

@@ -1,544 +0,0 @@
# Copyright (C) 2023 Mandiant, Inc. All Rights Reserved.
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at: [package root]/LICENSE.txt
# Unless required by applicable law or agreed to in writing, software distributed under the License
# is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and limitations under the License.
import sys
import json
import logging
import datetime
from typing import Set, Dict, List, Optional
from pathlib import Path
import halo
from typing_extensions import assert_never
import capa.perf
import capa.rules
import capa.engine
import capa.helpers
import capa.version
import capa.render.json
import capa.rules.cache
import capa.render.default
import capa.render.verbose
import capa.features.common
import capa.features.freeze as frz
import capa.render.vverbose
import capa.features.extractors
import capa.render.result_document
import capa.render.result_document as rdoc
import capa.features.extractors.common
import capa.features.extractors.pefile
import capa.features.extractors.elffile
import capa.features.extractors.dotnetfile
import capa.features.extractors.base_extractor
import capa.features.extractors.cape.extractor
from capa.rules import RuleSet
from capa.engine import MatchResults
from capa.exceptions import UnsupportedOSError, UnsupportedArchError, UnsupportedFormatError
from capa.features.common import (
OS_AUTO,
FORMAT_PE,
FORMAT_ELF,
FORMAT_AUTO,
FORMAT_CAPE,
FORMAT_SC32,
FORMAT_SC64,
FORMAT_DOTNET,
)
from capa.features.address import Address
from capa.features.extractors.base_extractor import (
SampleHashes,
FeatureExtractor,
StaticFeatureExtractor,
DynamicFeatureExtractor,
)
logger = logging.getLogger(__name__)
BACKEND_VIV = "vivisect"
BACKEND_DOTNET = "dotnet"
BACKEND_BINJA = "binja"
BACKEND_PEFILE = "pefile"
BACKEND_CAPE = "cape"
BACKEND_FREEZE = "freeze"
def is_supported_format(sample: Path) -> bool:
"""
Return whether this is a supported file based on magic header values
"""
taste = sample.open("rb").read(0x100)
return len(list(capa.features.extractors.common.extract_format(taste))) == 1
def is_supported_arch(sample: Path) -> bool:
buf = sample.read_bytes()
return len(list(capa.features.extractors.common.extract_arch(buf))) == 1
def get_arch(sample: Path) -> str:
buf = sample.read_bytes()
for feature, _ in capa.features.extractors.common.extract_arch(buf):
assert isinstance(feature.value, str)
return feature.value
return "unknown"
def is_supported_os(sample: Path) -> bool:
buf = sample.read_bytes()
return len(list(capa.features.extractors.common.extract_os(buf))) == 1
def get_os(sample: Path) -> str:
buf = sample.read_bytes()
for feature, _ in capa.features.extractors.common.extract_os(buf):
assert isinstance(feature.value, str)
return feature.value
return "unknown"
def get_meta_str(vw):
"""
Return workspace meta information string
"""
meta = []
for k in ["Format", "Platform", "Architecture"]:
if k in vw.metadata:
meta.append(f"{k.lower()}: {vw.metadata[k]}")
return f"{', '.join(meta)}, number of functions: {len(vw.getFunctions())}"
def get_workspace(path: Path, input_format: str, sigpaths: List[Path]):
"""
load the program at the given path into a vivisect workspace using the given format.
also apply the given FLIRT signatures.
supported formats:
- pe
- elf
- shellcode 32-bit
- shellcode 64-bit
- auto
this creates and analyzes the workspace; however, it does *not* save the workspace.
this is the responsibility of the caller.
"""
# lazy import enables us to not require viv if user wants another backend.
import viv_utils
import viv_utils.flirt
logger.debug("generating vivisect workspace for: %s", path)
if input_format == FORMAT_AUTO:
if not is_supported_format(path):
raise UnsupportedFormatError()
# don't analyze, so that we can add our Flirt function analyzer first.
vw = viv_utils.getWorkspace(str(path), analyze=False, should_save=False)
elif input_format in {FORMAT_PE, FORMAT_ELF}:
vw = viv_utils.getWorkspace(str(path), analyze=False, should_save=False)
elif input_format == FORMAT_SC32:
# these are not analyzed nor saved.
vw = viv_utils.getShellcodeWorkspaceFromFile(str(path), arch="i386", analyze=False)
elif input_format == FORMAT_SC64:
vw = viv_utils.getShellcodeWorkspaceFromFile(str(path), arch="amd64", analyze=False)
else:
raise ValueError("unexpected format: " + input_format)
viv_utils.flirt.register_flirt_signature_analyzers(vw, [str(s) for s in sigpaths])
vw.analyze()
logger.debug("%s", get_meta_str(vw))
return vw
def get_extractor(
input_path: Path,
input_format: str,
os_: str,
backend: str,
sigpaths: List[Path],
should_save_workspace=False,
disable_progress=False,
sample_path: Optional[Path] = None,
) -> FeatureExtractor:
"""
raises:
UnsupportedFormatError
UnsupportedArchError
UnsupportedOSError
"""
if backend == BACKEND_CAPE:
import capa.features.extractors.cape.extractor
report = json.loads(input_path.read_text(encoding="utf-8"))
return capa.features.extractors.cape.extractor.CapeExtractor.from_report(report)
elif backend == BACKEND_DOTNET:
import capa.features.extractors.dnfile.extractor
if input_format not in (FORMAT_PE, FORMAT_DOTNET):
raise UnsupportedFormatError()
return capa.features.extractors.dnfile.extractor.DnfileFeatureExtractor(input_path)
elif backend == BACKEND_BINJA:
import capa.helpers
from capa.features.extractors.binja.find_binja_api import find_binja_path
# When we are running as a standalone executable, we cannot directly import binaryninja
# We need to first find the binja API installation path and add it to sys.path
if capa.helpers.is_running_standalone():
bn_api = find_binja_path()
if bn_api.exists():
sys.path.append(str(bn_api))
try:
import binaryninja
from binaryninja import BinaryView
except ImportError:
raise RuntimeError(
"Cannot import binaryninja module. Please install the Binary Ninja Python API first: "
+ "https://docs.binary.ninja/dev/batch.html#install-the-api)."
)
import capa.features.extractors.binja.extractor
if input_format not in (FORMAT_SC32, FORMAT_SC64):
if not is_supported_format(input_path):
raise UnsupportedFormatError()
if not is_supported_arch(input_path):
raise UnsupportedArchError()
if os_ == OS_AUTO and not is_supported_os(input_path):
raise UnsupportedOSError()
with halo.Halo(text="analyzing program", spinner="simpleDots", stream=sys.stderr, enabled=not disable_progress):
bv: BinaryView = binaryninja.load(str(input_path))
if bv is None:
raise RuntimeError(f"Binary Ninja cannot open file {input_path}")
return capa.features.extractors.binja.extractor.BinjaFeatureExtractor(bv)
elif backend == BACKEND_PEFILE:
import capa.features.extractors.pefile
return capa.features.extractors.pefile.PefileFeatureExtractor(input_path)
elif backend == BACKEND_VIV:
import capa.features.extractors.viv.extractor
if input_format not in (FORMAT_SC32, FORMAT_SC64):
if not is_supported_format(input_path):
raise UnsupportedFormatError()
if not is_supported_arch(input_path):
raise UnsupportedArchError()
if os_ == OS_AUTO and not is_supported_os(input_path):
raise UnsupportedOSError()
with halo.Halo(text="analyzing program", spinner="simpleDots", stream=sys.stderr, enabled=not disable_progress):
vw = get_workspace(input_path, input_format, sigpaths)
if should_save_workspace:
logger.debug("saving workspace")
try:
vw.saveWorkspace()
except IOError:
# see #168 for discussion around how to handle non-writable directories
logger.info("source directory is not writable, won't save intermediate workspace")
else:
logger.debug("CAPA_SAVE_WORKSPACE unset, not saving workspace")
return capa.features.extractors.viv.extractor.VivisectFeatureExtractor(vw, input_path, os_)
elif backend == BACKEND_FREEZE:
return frz.load(input_path.read_bytes())
else:
raise ValueError("unexpected backend: " + backend)
def get_file_extractors(input_file: Path, input_format: str) -> List[FeatureExtractor]:
file_extractors: List[FeatureExtractor] = []
if input_format == FORMAT_PE:
file_extractors.append(capa.features.extractors.pefile.PefileFeatureExtractor(input_file))
elif input_format == FORMAT_DOTNET:
file_extractors.append(capa.features.extractors.pefile.PefileFeatureExtractor(input_file))
file_extractors.append(capa.features.extractors.dotnetfile.DotnetFileFeatureExtractor(input_file))
elif input_format == FORMAT_ELF:
file_extractors.append(capa.features.extractors.elffile.ElfFeatureExtractor(input_file))
elif input_format == FORMAT_CAPE:
report = json.loads(input_file.read_text(encoding="utf-8"))
file_extractors.append(capa.features.extractors.cape.extractor.CapeExtractor.from_report(report))
return file_extractors
def get_signatures(sigs_path: Path) -> List[Path]:
if not sigs_path.exists():
raise IOError(f"signatures path {sigs_path} does not exist or cannot be accessed")
paths: List[Path] = []
if sigs_path.is_file():
paths.append(sigs_path)
elif sigs_path.is_dir():
logger.debug("reading signatures from directory %s", sigs_path.resolve())
for file in sigs_path.rglob("*"):
if file.is_file() and file.suffix.lower() in (".pat", ".pat.gz", ".sig"):
paths.append(file)
# Convert paths to their absolute and normalized forms
paths = [path.resolve().absolute() for path in paths]
# load signatures in deterministic order: the alphabetic sorting of filename.
# this means that `0_sigs.pat` loads before `1_sigs.pat`.
paths = sorted(paths, key=lambda path: path.name)
for path in paths:
logger.debug("found signature file: %s", path)
return paths
def get_sample_analysis(format_, arch, os_, extractor, rules_path, counts):
if isinstance(extractor, StaticFeatureExtractor):
return rdoc.StaticAnalysis(
format=format_,
arch=arch,
os=os_,
extractor=extractor.__class__.__name__,
rules=tuple(rules_path),
base_address=frz.Address.from_capa(extractor.get_base_address()),
layout=rdoc.StaticLayout(
functions=(),
# this is updated after capabilities have been collected.
# will look like:
#
# "functions": { 0x401000: { "matched_basic_blocks": [ 0x401000, 0x401005, ... ] }, ... }
),
feature_counts=counts["feature_counts"],
library_functions=counts["library_functions"],
)
elif isinstance(extractor, DynamicFeatureExtractor):
return rdoc.DynamicAnalysis(
format=format_,
arch=arch,
os=os_,
extractor=extractor.__class__.__name__,
rules=tuple(rules_path),
layout=rdoc.DynamicLayout(
processes=(),
),
feature_counts=counts["feature_counts"],
)
else:
raise ValueError("invalid extractor type")
def collect_metadata(
argv: List[str],
input_path: Path,
input_format: str,
os_: str,
rules_path: List[Path],
extractor: FeatureExtractor,
counts: dict,
) -> rdoc.Metadata:
# if it's a binary sample we hash it, if it's a report
# we fetch the hashes from the report
sample_hashes: SampleHashes = extractor.get_sample_hashes()
md5, sha1, sha256 = sample_hashes.md5, sample_hashes.sha1, sample_hashes.sha256
global_feats = list(extractor.extract_global_features())
extractor_format = [f.value for (f, _) in global_feats if isinstance(f, capa.features.common.Format)]
extractor_arch = [f.value for (f, _) in global_feats if isinstance(f, capa.features.common.Arch)]
extractor_os = [f.value for (f, _) in global_feats if isinstance(f, capa.features.common.OS)]
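# prefer the format/arch/os values reported by the extractor; otherwise fall back to the CLI-provided values, mapping the auto sentinels to "unknown"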
input_format = (
str(extractor_format[0]) if extractor_format else "unknown" if input_format == FORMAT_AUTO else input_format
)
arch = str(extractor_arch[0]) if extractor_arch else "unknown"
os_ = str(extractor_os[0]) if extractor_os else "unknown" if os_ == OS_AUTO else os_
if isinstance(extractor, StaticFeatureExtractor):
meta_class: type = rdoc.StaticMetadata
elif isinstance(extractor, DynamicFeatureExtractor):
meta_class = rdoc.DynamicMetadata
else:
assert_never(extractor)
rules = tuple(r.resolve().absolute().as_posix() for r in rules_path)
return meta_class(
timestamp=datetime.datetime.now(),
version=capa.version.__version__,
argv=tuple(argv) if argv else None,
sample=rdoc.Sample(
md5=md5,
sha1=sha1,
sha256=sha256,
path=input_path.resolve().as_posix(),
),
analysis=get_sample_analysis(
input_format,
arch,
os_,
extractor,
rules,
counts,
),
)
def compute_dynamic_layout(
rules: RuleSet, extractor: DynamicFeatureExtractor, capabilities: MatchResults
) -> rdoc.DynamicLayout:
"""
compute a metadata structure that links threads
to the processes in which they're found.
only collect the threads at which some rule matched.
otherwise, we may pollute the json document with
a large amount of un-referenced data.
"""
assert isinstance(extractor, DynamicFeatureExtractor)
matched_calls: Set[Address] = set()
def result_rec(result: capa.features.common.Result):
for loc in result.locations:
if isinstance(loc, capa.features.address.DynamicCallAddress):
matched_calls.add(loc)
for child in result.children:
result_rec(child)
for matches in capabilities.values():
for _, result in matches:
result_rec(result)
names_by_process: Dict[Address, str] = {}
names_by_call: Dict[Address, str] = {}
matched_processes: Set[Address] = set()
matched_threads: Set[Address] = set()
threads_by_process: Dict[Address, List[Address]] = {}
calls_by_thread: Dict[Address, List[Address]] = {}
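# walk processes -> threads -> calls, recording names and keeping only containers that hold a matched call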
for p in extractor.get_processes():
threads_by_process[p.address] = []
for t in extractor.get_threads(p):
calls_by_thread[t.address] = []
for c in extractor.get_calls(p, t):
if c.address in matched_calls:
names_by_call[c.address] = extractor.get_call_name(p, t, c)
calls_by_thread[t.address].append(c.address)
if calls_by_thread[t.address]:
matched_threads.add(t.address)
threads_by_process[p.address].append(t.address)
if threads_by_process[p.address]:
matched_processes.add(p.address)
names_by_process[p.address] = extractor.get_process_name(p)
layout = rdoc.DynamicLayout(
processes=tuple(
rdoc.ProcessLayout(
address=frz.Address.from_capa(p),
name=names_by_process[p],
matched_threads=tuple(
rdoc.ThreadLayout(
address=frz.Address.from_capa(t),
matched_calls=tuple(
rdoc.CallLayout(
address=frz.Address.from_capa(c),
name=names_by_call[c],
)
for c in calls_by_thread[t]
if c in matched_calls
),
)
for t in threads
if t in matched_threads
), # this object is open to extension in the future,
# such as with the function name, etc.
)
for p, threads in threads_by_process.items()
if p in matched_processes
)
)
return layout
def compute_static_layout(rules: RuleSet, extractor: StaticFeatureExtractor, capabilities) -> rdoc.StaticLayout:
"""
compute a metadata structure that links basic blocks
to the functions in which they're found.
only collect the basic blocks at which some rule matched.
otherwise, we may pollute the json document with
a large amount of un-referenced data.
"""
functions_by_bb: Dict[Address, Address] = {}
bbs_by_function: Dict[Address, List[Address]] = {}
for f in extractor.get_functions():
bbs_by_function[f.address] = []
for bb in extractor.get_basic_blocks(f):
functions_by_bb[bb.address] = f.address
bbs_by_function[f.address].append(bb.address)
matched_bbs = set()
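# collect the addresses of basic blocks at which a basic-block-scoped rule matched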
for rule_name, matches in capabilities.items():
rule = rules[rule_name]
if capa.rules.Scope.BASIC_BLOCK in rule.scopes:
for addr, _ in matches:
assert addr in functions_by_bb
matched_bbs.add(addr)
layout = rdoc.StaticLayout(
functions=tuple(
rdoc.FunctionLayout(
address=frz.Address.from_capa(f),
matched_basic_blocks=tuple(
rdoc.BasicBlockLayout(address=frz.Address.from_capa(bb)) for bb in bbs if bb in matched_bbs
), # this object is open to extension in the future,
# such as with the function name, etc.
)
for f, bbs in bbs_by_function.items()
if len([bb for bb in bbs if bb in matched_bbs]) > 0
)
)
return layout
def compute_layout(rules: RuleSet, extractor, capabilities) -> rdoc.Layout:
if isinstance(extractor, StaticFeatureExtractor):
return compute_static_layout(rules, extractor, capabilities)
elif isinstance(extractor, DynamicFeatureExtractor):
return compute_dynamic_layout(rules, extractor, capabilities)
else:
raise ValueError("extractor must be either a static or dynamic extracotr")

File diff suppressed because it is too large

View File

@@ -33,7 +33,7 @@ def render_meta(doc: rd.ResultDocument, ostream: StringIO):
(width("md5", 22), width(doc.meta.sample.md5, 82)),
("sha1", doc.meta.sample.sha1),
("sha256", doc.meta.sample.sha256),
("analysis", doc.meta.flavor.value),
("analysis", doc.meta.flavor),
("os", doc.meta.analysis.os),
("format", doc.meta.analysis.format),
("arch", doc.meta.analysis.arch),

View File

@@ -1,7 +1,5 @@
syntax = "proto3";
package mandiant.capa;
message APIFeature {
string type = 1;
string api = 2;

File diff suppressed because one or more lines are too long

View File

@@ -160,7 +160,8 @@ class CompoundStatementType:
OPTIONAL = "optional"
class StatementModel(FrozenModel): ...
class StatementModel(FrozenModel):
...
class CompoundStatement(StatementModel):
@@ -649,9 +650,9 @@ class ResultDocument(FrozenModel):
return ResultDocument(meta=meta, rules=rule_matches)
def to_capa(self) -> Tuple[Metadata, Dict]:
capabilities: Dict[str, List[Tuple[capa.features.address.Address, capa.features.common.Result]]] = (
collections.defaultdict(list)
)
capabilities: Dict[
str, List[Tuple[capa.features.address.Address, capa.features.common.Result]]
] = collections.defaultdict(list)
# this doesn't quite work because we don't have the rule source for rules that aren't matched.
rules_by_name = {

View File

@@ -22,7 +22,6 @@ Unless required by applicable law or agreed to in writing, software distributed
is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and limitations under the License.
"""
from typing import cast
import tabulate
@@ -55,6 +54,12 @@ def format_address(address: frz.Address) -> str:
assert isinstance(token, int)
assert isinstance(offset, int)
return f"token({capa.helpers.hex(token)})+{capa.helpers.hex(offset)}"
elif address.type == frz.AddressType.DEX_METHOD_INDEX:
assert isinstance(address.value, int)
return f"method({capa.helpers.hex(address.value)})"
elif address.type == frz.AddressType.DEX_CLASS_INDEX:
assert isinstance(address.value, int)
return f"class({capa.helpers.hex(address.value)})"
elif address.type == frz.AddressType.PROCESS:
assert isinstance(address.value, tuple)
ppid, pid = address.value

View File

@@ -7,8 +7,9 @@
# See the License for the specific language governing permissions and limitations under the License.
import io
import os
import re
import gzip
import json
import uuid
import codecs
import logging
@@ -26,7 +27,7 @@ except ImportError:
# https://github.com/python/mypy/issues/1153
from backports.functools_lru_cache import lru_cache # type: ignore
from typing import Any, Set, Dict, List, Tuple, Union, Callable, Iterator, Optional
from typing import Any, Set, Dict, List, Tuple, Union, Iterator, Optional
from dataclasses import asdict, dataclass
import yaml
@@ -38,13 +39,11 @@ import capa.perf
import capa.engine as ceng
import capa.features
import capa.optimizer
import capa.features.com
import capa.features.file
import capa.features.insn
import capa.features.common
import capa.features.basicblock
from capa.engine import Statement, FeatureSet
from capa.features.com import ComType
from capa.features.common import MAX_BYTES_FEATURE_SIZE, Feature
from capa.features.address import Address
@@ -329,16 +328,42 @@ def ensure_feature_valid_for_scopes(scopes: Scopes, feature: Union[Feature, Stat
raise InvalidRule(f"feature {feature} not supported for scopes {scopes}")
def translate_com_feature(com_name: str, com_type: ComType) -> ceng.Statement:
com_db = capa.features.com.load_com_database(com_type)
guids: Optional[List[str]] = com_db.get(com_name)
if not guids:
class ComType(Enum):
CLASS = "class"
INTERFACE = "interface"
# COM data source https://github.com/stevemk14ebr/COM-Code-Helper/tree/master
VALID_COM_TYPES = {
ComType.CLASS: {"db_path": "assets/classes.json.gz", "prefix": "CLSID_"},
ComType.INTERFACE: {"db_path": "assets/interfaces.json.gz", "prefix": "IID_"},
}
@lru_cache(maxsize=None)
def load_com_database(com_type: ComType) -> Dict[str, List[str]]:
com_db_path: Path = capa.main.get_default_root() / VALID_COM_TYPES[com_type]["db_path"]
if not com_db_path.exists():
raise IOError(f"COM database path '{com_db_path}' does not exist or cannot be accessed")
try:
with gzip.open(com_db_path, "rb") as gzfile:
return json.loads(gzfile.read().decode("utf-8"))
except Exception as e:
raise IOError(f"Error loading COM database from '{com_db_path}'") from e
def translate_com_feature(com_name: str, com_type: ComType) -> ceng.Or:
com_db = load_com_database(com_type)
guid_strings: Optional[List[str]] = com_db.get(com_name)
if guid_strings is None or len(guid_strings) == 0:
logger.error(" %s doesn't exist in COM %s database", com_name, com_type)
raise InvalidRule(f"'{com_name}' doesn't exist in COM {com_type} database")
com_features: List[Feature] = []
for guid in guids:
hex_chars = guid.replace("-", "")
com_features: List = []
for guid_string in guid_strings:
hex_chars = guid_string.replace("-", "")
h = [hex_chars[i : i + 2] for i in range(0, len(hex_chars), 2)]
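# reorder the hex pairs to match the in-memory GUID layout: the first three fields are little-endian, the trailing eight bytes keep their order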
reordered_hex_pairs = [
h[3],
@@ -359,10 +384,9 @@ def translate_com_feature(com_name: str, com_type: ComType) -> ceng.Statement:
h[15],
]
guid_bytes = bytes.fromhex("".join(reordered_hex_pairs))
prefix = capa.features.com.COM_PREFIXES[com_type]
symbol = prefix + com_name
com_features.append(capa.features.common.String(guid, f"{symbol} as GUID string"))
com_features.append(capa.features.common.Bytes(guid_bytes, f"{symbol} as bytes"))
prefix = VALID_COM_TYPES[com_type]["prefix"]
com_features.append(capa.features.common.StringFactory(guid_string, f"{prefix+com_name} as GUID string"))
com_features.append(capa.features.common.Bytes(guid_bytes, f"{prefix+com_name} as bytes"))
return ceng.Or(com_features)
@@ -578,9 +602,7 @@ def trim_dll_part(api: str) -> str:
# kernel32.CreateFileA
if api.count(".") == 1:
if "::" not in api:
# skip System.Convert::FromBase64String
api = api.split(".")[1]
api = api.split(".")[1]
return api
@@ -800,13 +822,11 @@ def build_statements(d, scopes: Scopes):
return feature
elif key.startswith("com/"):
com_type_name = str(key[len("com/") :])
try:
com_type = ComType(com_type_name)
except ValueError:
raise InvalidRule(f"unexpected COM type: {com_type_name}")
com_type = str(key[len("com/") :]).upper()
if com_type not in [item.name for item in ComType]:
raise InvalidRule(f"unexpected COM type: {com_type}")
value, description = parse_description(d[key], key, d.get("description"))
return translate_com_feature(value, com_type)
return translate_com_feature(value, ComType[com_type])
else:
Feature = parse_feature(key)
@@ -1692,105 +1712,3 @@ class RuleSet:
matches.update(hard_matches)
return (features3, matches)
def is_nursery_rule_path(path: Path) -> bool:
"""
The nursery is a spot for rules that have not yet been fully polished.
For example, they may not have references to public example of a technique.
Yet, we still want to capture and report on their matches.
The nursery is currently a subdirectory of the rules directory with that name.
When nursery rules are loaded, their metadata section should be updated with:
`nursery=True`.
"""
return "nursery" in path.parts
def collect_rule_file_paths(rule_paths: List[Path]) -> List[Path]:
"""
collect all rule file paths, including those in subdirectories.
"""
rule_file_paths = []
for rule_path in rule_paths:
if not rule_path.exists():
raise IOError(f"rule path {rule_path} does not exist or cannot be accessed")
if rule_path.is_file():
rule_file_paths.append(rule_path)
elif rule_path.is_dir():
logger.debug("reading rules from directory %s", rule_path)
for root, _, files in os.walk(rule_path):
if ".git" in root:
# the .github directory contains CI config in capa-rules
# this includes some .yml files
# these are not rules
# additionally, .git has files that are not .yml and generate the warning
# skip those too
continue
for file in files:
if not file.endswith(".yml"):
if not (file.startswith(".git") or file.endswith((".git", ".md", ".txt"))):
# expect to see .git* files, readme.md, format.md, and maybe a .git directory
# other things maybe are rules, but are mis-named.
logger.warning("skipping non-.yml file: %s", file)
continue
rule_file_paths.append(Path(root) / file)
return rule_file_paths
# TypeAlias. note: using `foo: TypeAlias = bar` is Python 3.10+
RulePath = Path
def on_load_rule_default(_path: RulePath, i: int, _total: int) -> None:
return
def get_rules(
rule_paths: List[RulePath],
cache_dir=None,
on_load_rule: Callable[[RulePath, int, int], None] = on_load_rule_default,
) -> RuleSet:
"""
args:
rule_paths: list of paths to rules files or directories containing rules files
cache_dir: directory to use for caching rules, or will use the default detected cache directory if None
on_load_rule: callback to invoke before a rule is loaded, use for progress or cancellation
"""
if cache_dir is None:
cache_dir = capa.rules.cache.get_default_cache_directory()
# rule_paths may contain directory paths,
# so search for file paths recursively.
rule_file_paths = collect_rule_file_paths(rule_paths)
# this list is parallel to `rule_file_paths`:
# rule_file_paths[i] corresponds to rule_contents[i].
rule_contents = [file_path.read_bytes() for file_path in rule_file_paths]
ruleset = capa.rules.cache.load_cached_ruleset(cache_dir, rule_contents)
if ruleset is not None:
return ruleset
rules: List[Rule] = []
total_rule_count = len(rule_file_paths)
for i, (path, content) in enumerate(zip(rule_file_paths, rule_contents)):
on_load_rule(path, i, total_rule_count)
try:
rule = capa.rules.Rule.from_yaml(content.decode("utf-8"))
except capa.rules.InvalidRule:
raise
else:
rule.meta["capa/path"] = path.as_posix()
rule.meta["capa/nursery"] = is_nursery_rule_path(path)
rules.append(rule)
logger.debug("loaded rule: '%s' with scope: %s", rule.name, rule.scopes)
ruleset = capa.rules.RuleSet(rules)
capa.rules.cache.cache_ruleset(cache_dir, ruleset)
return ruleset

View File

@@ -5,7 +5,7 @@
# Unless required by applicable law or agreed to in writing, software distributed under the License
# is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and limitations under the License.
__version__ = "7.0.0"
__version__ = "6.1.0"
def get_major_version():

View File

@@ -36,8 +36,8 @@ dependencies = [
"pyyaml==6.0.1",
"tabulate==0.9.0",
"colorama==0.4.6",
"termcolor==2.4.0",
"wcwidth==0.2.13",
"termcolor==2.3.0",
"wcwidth==0.2.12",
"ida-settings==2.1.0",
"viv-utils[flirt]==0.7.9",
"halo==0.0.31",
@@ -50,25 +50,25 @@ dependencies = [
"dncil==1.0.2",
"pydantic==2.4.0",
"protobuf==4.23.4",
"dexparser==1.2.0",
]
dynamic = ["version"]
[tool.setuptools.dynamic]
version = {attr = "capa.version.__version__"}
[tool.setuptools.packages.find]
include = ["capa*"]
namespaces = false
[tool.setuptools]
packages = ["capa"]
[project.optional-dependencies]
dev = [
"pre-commit==3.5.0",
"pytest==8.0.0",
"pytest==7.4.3",
"pytest-sugar==0.9.7",
"pytest-instafail==0.5.0",
"pytest-cov==4.1.0",
"flake8==7.0.0",
"flake8-bugbear==24.1.17",
"flake8==6.1.0",
"flake8-bugbear==23.11.26",
"flake8-encodings==0.5.1",
"flake8-comprehensions==3.14.0",
"flake8-logging-format==0.9.0",
@@ -78,10 +78,10 @@ dev = [
"flake8-simplify==0.21.0",
"flake8-use-pathlib==0.3.0",
"flake8-copyright==0.2.4",
"ruff==0.1.14",
"black==24.1.1",
"isort==5.13.2",
"mypy==1.8.0",
"ruff==0.1.6",
"black==23.11.0",
"isort==5.11.4",
"mypy==1.7.1",
"psutil==5.9.2",
"stix2==3.0.1",
"requests==2.31.0",
@@ -90,15 +90,15 @@ dev = [
"types-backports==0.1.3",
"types-colorama==0.4.15.11",
"types-PyYAML==6.0.8",
"types-tabulate==0.9.0.20240106",
"types-tabulate==0.9.0.3",
"types-termcolor==1.1.4",
"types-psutil==5.8.23",
"types_requests==2.31.0.20240125",
"types_requests==2.31.0.10",
"types-protobuf==4.23.0.3",
]
build = [
"pyinstaller==6.3.0",
"setuptools==69.0.3",
"pyinstaller==6.2.0",
"setuptools==69.0.2",
"build==1.0.3"
]

2
rules

Submodule rules updated: 48dfd001d8...57b3911a72

View File

@@ -36,7 +36,7 @@ example:
usage:
usage: bulk-process.py [-h] [-r RULES] [-d] [-q] [-n PARALLELISM] [--no-mp]
input_directory
input
detect capabilities in programs.
@@ -62,6 +62,7 @@ Unless required by applicable law or agreed to in writing, software distributed
is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and limitations under the License.
"""
import os
import sys
import json
import logging
@@ -73,10 +74,10 @@ from pathlib import Path
import capa
import capa.main
import capa.rules
import capa.loader
import capa.render.json
import capa.capabilities.common
import capa.render.result_document as rd
from capa.features.common import OS_AUTO
logger = logging.getLogger("capa")
@@ -86,8 +87,11 @@ def get_capa_results(args):
run capa against the file at the given path, using the given rules.
args is a tuple, containing:
rules, signatures, format, backend, os, input_file
as provided via the CLI arguments.
rules (capa.rules.RuleSet): the rules to match
signatures (List[str]): list of file system paths to signature files
format (str): the name of the sample file format
os (str): the name of the operating system
path (str): the file system path to the sample to process
args is a tuple because i'm not quite sure how to unpack multiple arguments using `map`.
@@ -102,58 +106,44 @@ def get_capa_results(args):
meta (dict): the meta analysis results
capabilities (dict): the matched capabilities and their result objects
"""
rules, signatures, format_, backend, os_, input_file = args
parser = argparse.ArgumentParser(description="detect capabilities in programs.")
capa.main.install_common_args(parser, wanted={"rules", "signatures", "format", "os", "backend", "input_file"})
argv = [
"--signatures",
signatures,
"--format",
format_,
"--backend",
backend,
"--os",
os_,
input_file,
]
if rules:
argv += ["--rules", rules]
args = parser.parse_args(args=argv)
rules, sigpaths, format, os_, path = args
should_save_workspace = os.environ.get("CAPA_SAVE_WORKSPACE") not in ("0", "no", "NO", "n", None)
logger.info("computing capa results for: %s", path)
try:
capa.main.handle_common_args(args)
capa.main.ensure_input_exists_from_cli(args)
input_format = capa.main.get_input_format_from_cli(args)
rules = capa.main.get_rules_from_cli(args)
backend = capa.main.get_backend_from_cli(args, input_format)
sample_path = capa.main.get_sample_path_from_cli(args, backend)
if sample_path is None:
os_ = "unknown"
else:
os_ = capa.loader.get_os(sample_path)
extractor = capa.main.get_extractor_from_cli(args, input_format, backend)
except capa.main.ShouldExitError as e:
# i'm not 100% sure if multiprocessing will reliably raise exceptions across process boundaries.
extractor = capa.main.get_extractor(
path, format, os_, capa.main.BACKEND_VIV, sigpaths, should_save_workspace, disable_progress=True
)
except capa.exceptions.UnsupportedFormatError:
# i'm not 100% sure if multiprocessing will reliably raise exceptions across process boundaries.
# so instead, return an object with explicit success/failure status.
#
# if success, then status=ok, and results found in property "ok"
# if error, then status=error, and human readable message in property "error"
return {"path": input_file, "status": "error", "error": str(e), "status_code": e.status_code}
return {
"path": path,
"status": "error",
"error": f"input file does not appear to be a PE file: {path}",
}
except capa.exceptions.UnsupportedRuntimeError:
return {
"path": path,
"status": "error",
"error": "unsupported runtime or Python interpreter",
}
except Exception as e:
return {
"path": input_file,
"path": path,
"status": "error",
"error": f"unexpected error: {e}",
}
capabilities, counts = capa.capabilities.common.find_capabilities(rules, extractor, disable_progress=True)
meta = capa.loader.collect_metadata(argv, args.input_file, format_, os_, [], extractor, counts)
meta.analysis.layout = capa.loader.compute_layout(rules, extractor, capabilities)
meta = capa.main.collect_metadata([], path, format, os_, [], extractor, counts)
meta.analysis.layout = capa.main.compute_layout(rules, extractor, capabilities)
doc = rd.ResultDocument.from_capa(meta, rules, capabilities)
return {"path": input_file, "status": "ok", "ok": doc.model_dump()}
return {"path": path, "status": "ok", "ok": doc.model_dump()}
def main(argv=None):
@@ -161,16 +151,30 @@ def main(argv=None):
argv = sys.argv[1:]
parser = argparse.ArgumentParser(description="detect capabilities in programs.")
capa.main.install_common_args(parser, wanted={"rules", "signatures", "format", "os", "backend"})
parser.add_argument("input_directory", type=str, help="Path to directory of files to recursively analyze")
capa.main.install_common_args(parser, wanted={"rules", "signatures", "format", "os"})
parser.add_argument("input", type=str, help="Path to directory of files to recursively analyze")
parser.add_argument(
"-n", "--parallelism", type=int, default=multiprocessing.cpu_count(), help="parallelism factor"
)
parser.add_argument("--no-mp", action="store_true", help="disable subprocesses")
args = parser.parse_args(args=argv)
capa.main.handle_common_args(args)
try:
rules = capa.main.get_rules(args.rules)
logger.info("successfully loaded %s rules", len(rules))
except (IOError, capa.rules.InvalidRule, capa.rules.InvalidRuleSet) as e:
logger.error("%s", str(e))
return -1
try:
sig_paths = capa.main.get_signatures(args.signatures)
except IOError as e:
logger.error("%s", str(e))
return -1
samples = []
for file in Path(args.input_directory).rglob("*"):
for file in Path(args.input).rglob("*"):
samples.append(file)
cpu_count = multiprocessing.cpu_count()
@@ -199,22 +203,18 @@ def main(argv=None):
logger.debug("using process mapper")
mapper = pmap
rules = args.rules
if rules == [capa.main.RULES_PATH_DEFAULT_STRING]:
rules = None
results = {}
for result in mapper(
get_capa_results,
[(rules, args.signatures, args.format, args.backend, args.os, str(sample)) for sample in samples],
[(rules, sig_paths, "pe", OS_AUTO, sample) for sample in samples],
parallelism=args.parallelism,
):
if result["status"] == "error":
logger.warning(result["error"])
elif result["status"] == "ok":
doc = rd.ResultDocument.model_validate(result["ok"]).model_dump_json(exclude_none=True)
results[result["path"]] = json.loads(doc)
results[result["path"].as_posix()] = rd.ResultDocument.model_validate(result["ok"]).model_dump_json(
exclude_none=True
)
else:
raise ValueError(f"unexpected status: {result['status']}")

View File

@@ -15,7 +15,6 @@ Unless required by applicable law or agreed to in writing, software distributed
is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and limitations under the License.
"""
import sys
import logging
import argparse
@@ -37,27 +36,20 @@ def main(argv=None):
parser = argparse.ArgumentParser(description="Cache ruleset.")
capa.main.install_common_args(parser)
parser.add_argument("rules", type=str, help="Path to rules directory")
parser.add_argument("rules", type=str, action="append", help="Path to rules")
parser.add_argument("cache", type=str, help="Path to cache directory")
args = parser.parse_args(args=argv)
capa.main.handle_common_args(args)
# don't use capa.main.handle_common_args
# because it expects a different format for the --rules argument
if args.quiet:
logging.basicConfig(level=logging.WARNING)
logging.getLogger().setLevel(logging.WARNING)
elif args.debug:
logging.basicConfig(level=logging.DEBUG)
logging.getLogger().setLevel(logging.DEBUG)
if args.debug:
logging.getLogger("capa").setLevel(logging.DEBUG)
else:
logging.basicConfig(level=logging.INFO)
logging.getLogger().setLevel(logging.INFO)
logging.getLogger("capa").setLevel(logging.ERROR)
try:
cache_dir = Path(args.cache)
cache_dir.mkdir(parents=True, exist_ok=True)
rules = capa.rules.get_rules([Path(args.rules)], cache_dir)
rules = capa.main.get_rules(args.rules, cache_dir)
logger.info("successfully loaded %s rules", len(rules))
except (IOError, capa.rules.InvalidRule, capa.rules.InvalidRuleSet) as e:
logger.error("%s", str(e))

View File

@@ -61,22 +61,7 @@ var_names = ["".join(letters) for letters in itertools.product(string.ascii_lowe
# these have to be the internal names used by capa.py, which are sometimes different from the ones written out in the rules, e.g. "2 or more" is "Some", count is Range
unsupported = [
"characteristic",
"mnemonic",
"offset",
"subscope",
"Range",
"os",
"property",
"format",
"class",
"operand[0].number",
"operand[1].number",
"substring",
"arch",
"namespace",
]
unsupported = ["characteristic", "mnemonic", "offset", "subscope", "Range"]
# further idea: shorten this list, possible stuff:
# - 2 or more strings: e.g.
# -- https://github.com/mandiant/capa-rules/blob/master/collection/file-managers/gather-direct-ftp-information.yml
@@ -105,7 +90,8 @@ condition_header = """
condition_rule = """
private rule capa_pe_file : CAPA {
meta:
description = "Match in PE files. Used by other CAPA rules"
description = "match in PE files. used by all further CAPA rules"
author = "Arnim Rupp"
condition:
uint16be(0) == 0x4d5a
or uint16be(0) == 0x558b
@@ -723,33 +709,36 @@ def main(argv=None):
argv = sys.argv[1:]
parser = argparse.ArgumentParser(description="Capa to YARA rule converter")
capa.main.install_common_args(parser, wanted={"tag"})
parser.add_argument("rules", type=str, help="Path to rules")
parser.add_argument("--private", "-p", action="store_true", help="Create private rules", default=False)
parser.add_argument("rules", type=str, help="Path to rules directory")
capa.main.install_common_args(parser, wanted={"tag"})
args = parser.parse_args(args=argv)
make_priv = args.private
# don't use capa.main.handle_common_args
# because it expects a different format for the --rules argument
if args.quiet:
logging.basicConfig(level=logging.WARNING)
logging.getLogger().setLevel(logging.WARNING)
elif args.debug:
logging.basicConfig(level=logging.DEBUG)
logging.getLogger().setLevel(logging.DEBUG)
if args.verbose:
level = logging.DEBUG
elif args.quiet:
level = logging.ERROR
else:
logging.basicConfig(level=logging.INFO)
logging.getLogger().setLevel(logging.INFO)
level = logging.INFO
logging.basicConfig(level=level)
logging.getLogger("capa2yara").setLevel(level)
try:
rules = capa.rules.get_rules([Path(args.rules)])
logger.info("successfully loaded %s rules", len(rules))
rules = capa.main.get_rules([Path(args.rules)])
namespaces = capa.rules.index_rules_by_namespace(list(rules.rules.values()))
logger.info("successfully loaded %d rules (including subscope rules which will be ignored)", len(rules))
if args.tag:
rules = rules.filter_rules_by_meta(args.tag)
logger.debug("selected %d rules", len(rules))
for i, r in enumerate(rules.rules, 1):
logger.debug(" %d. %s", i, r)
except (IOError, capa.rules.InvalidRule, capa.rules.InvalidRuleSet) as e:
logger.error("%s", str(e))
return -1
namespaces = capa.rules.index_rules_by_namespace(list(rules.rules.values()))
output_yar(
"// Rules from Mandiant's https://github.com/mandiant/capa-rules converted to YARA using https://github.com/mandiant/capa/blob/master/scripts/capa2yara.py by Arnim Rupp"
)
@@ -777,10 +766,10 @@ def main(argv=None):
cround += 1
logger.info("doing convert_rules(), round: %d", cround)
num_rules = len(converted_rules)
count_incomplete += convert_rules(rules, namespaces, cround, args.private)
count_incomplete += convert_rules(rules, namespaces, cround, make_priv)
# one last round to collect all unconverted rules
count_incomplete += convert_rules(rules, namespaces, 9000, args.private)
count_incomplete += convert_rules(rules, namespaces, 9000, make_priv)
stats = "\n// converted rules : " + str(len(converted_rules))
stats += "\n// among those are incomplete : " + str(count_incomplete)
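A small sketch of the tag filtering used here, assuming the RuleSet API shown above; the tag value is hypothetical:

from pathlib import Path

import capa.rules

rules = capa.rules.get_rules([Path("rules")])
# keep only rules whose meta contains the given tag
tagged = rules.filter_rules_by_meta("persistence")
for i, name in enumerate(tagged.rules, 1):
    print(f"{i}. {name}")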

View File

@@ -15,7 +15,6 @@ from pathlib import Path
import capa.main
import capa.rules
import capa.engine
import capa.loader
import capa.features
import capa.render.json
import capa.render.utils as rutils
@@ -169,19 +168,19 @@ def render_dictionary(doc: rd.ResultDocument) -> Dict[str, Any]:
# ==== render dictionary helpers
def capa_details(rules_path: Path, input_file: Path, output_format="dictionary"):
def capa_details(rules_path: Path, file_path: Path, output_format="dictionary"):
# load rules from disk
rules = capa.rules.get_rules([rules_path])
rules = capa.main.get_rules([rules_path])
# extract features and find capabilities
extractor = capa.loader.get_extractor(
input_file, FORMAT_AUTO, OS_AUTO, capa.main.BACKEND_VIV, [], should_save_workspace=False, disable_progress=True
extractor = capa.main.get_extractor(
file_path, FORMAT_AUTO, OS_AUTO, capa.main.BACKEND_VIV, [], False, disable_progress=True
)
capabilities, counts = capa.capabilities.common.find_capabilities(rules, extractor, disable_progress=True)
# collect metadata (used only to make rendering more complete)
meta = capa.loader.collect_metadata([], input_file, FORMAT_AUTO, OS_AUTO, [rules_path], extractor, counts)
meta.analysis.layout = capa.loader.compute_layout(rules, extractor, capabilities)
meta = capa.main.collect_metadata([], file_path, FORMAT_AUTO, OS_AUTO, [rules_path], extractor, counts)
meta.analysis.layout = capa.main.compute_layout(rules, extractor, capabilities)
capa_output: Any = False
@@ -207,7 +206,7 @@ if __name__ == "__main__":
RULES_PATH = capa.main.get_default_root() / "rules"
parser = argparse.ArgumentParser(description="Extract capabilities from a file")
parser.add_argument("input_file", help="file to extract capabilities from")
parser.add_argument("file", help="file to extract capabilities from")
parser.add_argument("--rules", help="path to rules directory", default=RULES_PATH)
parser.add_argument(
"--output", help="output format", choices=["dictionary", "json", "texttable"], default="dictionary"
@@ -215,5 +214,5 @@ if __name__ == "__main__":
args = parser.parse_args()
if args.rules != RULES_PATH:
args.rules = Path(args.rules)
print(capa_details(args.rules, Path(args.input_file), args.output))
print(capa_details(args.rules, Path(args.file), args.output))
sys.exit(0)

View File

@@ -14,13 +14,11 @@ Unless required by applicable law or agreed to in writing, software distributed
is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and limitations under the License.
"""
import sys
import logging
import argparse
from pathlib import Path
import capa.main
import capa.rules
logger = logging.getLogger("capafmt")
@@ -31,7 +29,6 @@ def main(argv=None):
argv = sys.argv[1:]
parser = argparse.ArgumentParser(description="Capa rule formatter.")
capa.main.install_common_args(parser)
parser.add_argument("path", type=str, help="Path to rule to format")
parser.add_argument(
"-i",
@@ -40,6 +37,8 @@ def main(argv=None):
dest="in_place",
help="Format the rule in place, otherwise, write formatted rule to STDOUT",
)
parser.add_argument("-v", "--verbose", action="store_true", help="Enable debug logging")
parser.add_argument("-q", "--quiet", action="store_true", help="Disable all output but errors")
parser.add_argument(
"-c",
"--check",
@@ -48,10 +47,15 @@ def main(argv=None):
)
args = parser.parse_args(args=argv)
try:
capa.main.handle_common_args(args)
except capa.main.ShouldExitError as e:
return e.status_code
if args.verbose:
level = logging.DEBUG
elif args.quiet:
level = logging.ERROR
else:
level = logging.INFO
logging.basicConfig(level=level)
logging.getLogger("capafmt").setLevel(level)
rule = capa.rules.Rule.from_yaml_file(args.path, use_ruamel=True)
reformatted_rule = rule.to_yaml()
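The reformatting round-trip above reduces to a pair of calls; a sketch, with a hypothetical rule path:

import capa.rules

# ruamel preserves comments and key order, so the reformatted output stays reviewable
rule = capa.rules.Rule.from_yaml_file("rules/example.yml", use_ruamel=True)
print(rule.to_yaml())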

View File

@@ -17,8 +17,8 @@ import logging
import argparse
import contextlib
from typing import BinaryIO
from pathlib import Path
import capa.main
import capa.helpers
import capa.features.extractors.elf
@@ -36,16 +36,28 @@ def main(argv=None):
argv = sys.argv[1:]
parser = argparse.ArgumentParser(description="Detect the underlying OS for the given ELF file")
capa.main.install_common_args(parser, wanted={"input_file"})
parser.add_argument("sample", type=str, help="path to ELF file")
logging_group = parser.add_argument_group("logging arguments")
logging_group.add_argument("-d", "--debug", action="store_true", help="enable debugging output on STDERR")
logging_group.add_argument(
"-q", "--quiet", action="store_true", help="disable all status output except fatal errors"
)
args = parser.parse_args(args=argv)
try:
capa.main.handle_common_args(args)
capa.main.ensure_input_exists_from_cli(args)
except capa.main.ShouldExitError as e:
return e.status_code
if args.quiet:
logging.basicConfig(level=logging.WARNING)
logging.getLogger().setLevel(logging.WARNING)
elif args.debug:
logging.basicConfig(level=logging.DEBUG)
logging.getLogger().setLevel(logging.DEBUG)
else:
logging.basicConfig(level=logging.INFO)
logging.getLogger().setLevel(logging.INFO)
f = args.input_file.open("rb")
f = Path(args.sample).open("rb")
with contextlib.closing(f):
try:

View File

@@ -48,7 +48,7 @@ def find_overlapping_rules(new_rule_path, rules_path):
overlapping_rules = []
# capa.rules.RuleSet stores all rules in given paths
ruleset = capa.rules.get_rules(rules_path)
ruleset = capa.main.get_rules(rules_path)
for rule_name, rule in ruleset.rules.items():
rule_features = rule.extract_all_features()

View File

@@ -28,7 +28,6 @@ Unless required by applicable law or agreed to in writing, software distributed
is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and limitations under the License.
"""
import logging
import binascii
from pathlib import Path

View File

@@ -13,7 +13,6 @@ Unless required by applicable law or agreed to in writing, software distributed
is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and limitations under the License.
"""
import gc
import os
import re
@@ -40,7 +39,6 @@ import tqdm.contrib.logging
import capa.main
import capa.rules
import capa.engine
import capa.loader
import capa.helpers
import capa.features.insn
import capa.capabilities.common
@@ -309,8 +307,9 @@ class InvalidAttckOrMbcTechnique(Lint):
with data_path.open("rb") as fd:
self.data = json.load(fd)
self.enabled_frameworks = self.data.keys()
except (FileNotFoundError, json.decoder.JSONDecodeError):
# linter-data.json missing, or JSON error: log an error and skip this lint
except BaseException:
# if linter-data.json is not present, or if an error happens,
# we log an error and lint nothing.
logger.warning(
"Could not load 'scripts/linter-data.json'. The att&ck and mbc information will not be linted."
)
@@ -356,20 +355,16 @@ def get_sample_capabilities(ctx: Context, path: Path) -> Set[str]:
logger.debug("found cached results: %s: %d capabilities", nice_path, len(ctx.capabilities_by_sample[path]))
return ctx.capabilities_by_sample[path]
if nice_path.name.endswith(capa.helpers.EXTENSIONS_SHELLCODE_32):
format_ = "sc32"
elif nice_path.name.endswith(capa.helpers.EXTENSIONS_SHELLCODE_64):
format_ = "sc64"
else:
format_ = capa.helpers.get_auto_format(nice_path)
logger.debug("analyzing sample: %s", nice_path)
args = argparse.Namespace(input_file=nice_path, format=capa.main.FORMAT_AUTO, backend=capa.main.BACKEND_AUTO)
format_ = capa.main.get_input_format_from_cli(args)
backend = capa.main.get_backend_from_cli(args, format_)
extractor = capa.loader.get_extractor(
nice_path,
format_,
OS_AUTO,
backend,
DEFAULT_SIGNATURES,
should_save_workspace=False,
disable_progress=True,
extractor = capa.main.get_extractor(
nice_path, format_, OS_AUTO, capa.main.BACKEND_VIV, DEFAULT_SIGNATURES, False, disable_progress=True
)
capabilities, _ = capa.capabilities.common.find_capabilities(ctx.rules, extractor, disable_progress=True)
@@ -654,6 +649,16 @@ class FeatureNtdllNtoskrnlApi(Lint):
return False
class FormatLineFeedEOL(Lint):
name = "line(s) end with CRLF (\\r\\n)"
recommendation = "convert line endings to LF (\\n) for example using dos2unix"
def check_rule(self, ctx: Context, rule: Rule):
# report a problem (True) when the rule definition contains CRLF line endings
if "\r\n" in rule.definition:
return True
return False
class FormatSingleEmptyLineEOF(Lint):
name = "EOF format"
recommendation = "end file with a single empty line"
@@ -669,14 +674,16 @@ class FormatIncorrect(Lint):
recommendation_template = "use scripts/capafmt.py or adjust as follows\n{:s}"
def check_rule(self, ctx: Context, rule: Rule):
# EOL depends on Git, and our .gitattributes defines text=auto (Git handles files as it thinks best)
# we prefer LF only, but enforcing across OSs seems tedious and unnecessary
actual = rule.definition.replace("\r\n", "\n")
actual = rule.definition
expected = capa.rules.Rule.from_yaml(rule.definition, use_ruamel=True).to_yaml()
if actual != expected:
diff = difflib.ndiff(actual.splitlines(True), expected.splitlines(True))
recommendation_template = self.recommendation_template
if "\r\n" in actual:
recommendation_template = (
self.recommendation_template + "\nplease make sure that the file uses LF (\\n) line endings only"
)
self.recommendation = recommendation_template.format("".join(diff))
return True
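The recommendation string above is built with difflib; a self-contained sketch of the same ndiff call:

import difflib

actual = "rule:\n  meta:\n    name: demo\n"
expected = "rule:\n    meta:\n        name: demo\n"

# keepends=True so each diff line carries its own newline
diff = difflib.ndiff(actual.splitlines(True), expected.splitlines(True))
print("".join(diff))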
@@ -790,6 +797,7 @@ def lint_features(ctx: Context, rule: Rule):
FORMAT_LINTS = (
FormatLineFeedEOL(),
FormatSingleEmptyLineEOF(),
FormatStringQuotesIncorrect(),
FormatIncorrect(),
@@ -982,11 +990,7 @@ def main(argv=None):
help="Enable thorough linting - takes more time, but does a better job",
)
args = parser.parse_args(args=argv)
try:
capa.main.handle_common_args(args)
except capa.main.ShouldExitError as e:
return e.status_code
capa.main.handle_common_args(args)
if args.debug:
logging.getLogger("capa").setLevel(logging.DEBUG)
@@ -998,9 +1002,16 @@ def main(argv=None):
time0 = time.time()
try:
rules = capa.main.get_rules_from_cli(args)
except capa.main.ShouldExitError as e:
return e.status_code
rules = capa.main.get_rules(args.rules)
logger.info("successfully loaded %s rules", rules.source_rule_count)
if args.tag:
rules = rules.filter_rules_by_meta(args.tag)
logger.debug("selected %s rules", len(rules))
for i, r in enumerate(rules.rules, 1):
logger.debug(" %d. %s", i, r)
except (IOError, capa.rules.InvalidRule, capa.rules.InvalidRuleSet) as e:
logger.error("%s", str(e))
return -1
logger.info("collecting potentially referenced samples")
samples_path = Path(args.samples)

View File

@@ -43,8 +43,7 @@
"T1598": "Phishing for Information",
"T1598.001": "Phishing for Information::Spearphishing Service",
"T1598.002": "Phishing for Information::Spearphishing Attachment",
"T1598.003": "Phishing for Information::Spearphishing Link",
"T1598.004": "Phishing for Information::Spearphishing Voice"
"T1598.003": "Phishing for Information::Spearphishing Link"
},
"Resource Development": {
"T1583": "Acquire Infrastructure",
@@ -112,9 +111,7 @@
"T1566": "Phishing",
"T1566.001": "Phishing::Spearphishing Attachment",
"T1566.002": "Phishing::Spearphishing Link",
"T1566.003": "Phishing::Spearphishing via Service",
"T1566.004": "Phishing::Spearphishing Voice",
"T1659": "Content Injection"
"T1566.003": "Phishing::Spearphishing via Service"
},
"Execution": {
"T1047": "Windows Management Instrumentation",
@@ -178,7 +175,6 @@
"T1098.003": "Account Manipulation::Additional Cloud Roles",
"T1098.004": "Account Manipulation::SSH Authorized Keys",
"T1098.005": "Account Manipulation::Device Registration",
"T1098.006": "Account Manipulation::Additional Container Cluster Roles",
"T1133": "External Remote Services",
"T1136": "Create Account",
"T1136.001": "Create Account::Local Account",
@@ -268,8 +264,7 @@
"T1574.010": "Hijack Execution Flow::Services File Permissions Weakness",
"T1574.011": "Hijack Execution Flow::Services Registry Permissions Weakness",
"T1574.012": "Hijack Execution Flow::COR_PROFILER",
"T1574.013": "Hijack Execution Flow::KernelCallbackTable",
"T1653": "Power Settings"
"T1574.013": "Hijack Execution Flow::KernelCallbackTable"
},
"Privilege Escalation": {
"T1037": "Boot or Logon Initialization Scripts",
@@ -303,13 +298,6 @@
"T1078.002": "Valid Accounts::Domain Accounts",
"T1078.003": "Valid Accounts::Local Accounts",
"T1078.004": "Valid Accounts::Cloud Accounts",
"T1098": "Account Manipulation",
"T1098.001": "Account Manipulation::Additional Cloud Credentials",
"T1098.002": "Account Manipulation::Additional Email Delegate Permissions",
"T1098.003": "Account Manipulation::Additional Cloud Roles",
"T1098.004": "Account Manipulation::SSH Authorized Keys",
"T1098.005": "Account Manipulation::Device Registration",
"T1098.006": "Account Manipulation::Additional Container Cluster Roles",
"T1134": "Access Token Manipulation",
"T1134.001": "Access Token Manipulation::Token Impersonation/Theft",
"T1134.002": "Access Token Manipulation::Create Process with Token",
@@ -361,7 +349,6 @@
"T1548.002": "Abuse Elevation Control Mechanism::Bypass User Account Control",
"T1548.003": "Abuse Elevation Control Mechanism::Sudo and Sudo Caching",
"T1548.004": "Abuse Elevation Control Mechanism::Elevated Execution with Prompt",
"T1548.005": "Abuse Elevation Control Mechanism::Temporary Elevated Cloud Access",
"T1574": "Hijack Execution Flow",
"T1574.001": "Hijack Execution Flow::DLL Search Order Hijacking",
"T1574.002": "Hijack Execution Flow::DLL Side-Loading",
@@ -392,7 +379,6 @@
"T1027.009": "Obfuscated Files or Information::Embedded Payloads",
"T1027.010": "Obfuscated Files or Information::Command Obfuscation",
"T1027.011": "Obfuscated Files or Information::Fileless Storage",
"T1027.012": "Obfuscated Files or Information::LNK Icon Smuggling",
"T1036": "Masquerading",
"T1036.001": "Masquerading::Invalid Code Signature",
"T1036.002": "Masquerading::Right-to-Left Override",
@@ -402,7 +388,6 @@
"T1036.006": "Masquerading::Space after Filename",
"T1036.007": "Masquerading::Double File Extension",
"T1036.008": "Masquerading::Masquerade File Type",
"T1036.009": "Masquerading::Break Process Trees",
"T1055": "Process Injection",
"T1055.001": "Process Injection::Dynamic-link Library Injection",
"T1055.002": "Process Injection::Portable Executable Injection",
@@ -490,7 +475,6 @@
"T1548.002": "Abuse Elevation Control Mechanism::Bypass User Account Control",
"T1548.003": "Abuse Elevation Control Mechanism::Sudo and Sudo Caching",
"T1548.004": "Abuse Elevation Control Mechanism::Elevated Execution with Prompt",
"T1548.005": "Abuse Elevation Control Mechanism::Temporary Elevated Cloud Access",
"T1550": "Use Alternate Authentication Material",
"T1550.001": "Use Alternate Authentication Material::Application Access Token",
"T1550.002": "Use Alternate Authentication Material::Pass the Hash",
@@ -519,11 +503,10 @@
"T1562.004": "Impair Defenses::Disable or Modify System Firewall",
"T1562.006": "Impair Defenses::Indicator Blocking",
"T1562.007": "Impair Defenses::Disable or Modify Cloud Firewall",
"T1562.008": "Impair Defenses::Disable or Modify Cloud Logs",
"T1562.008": "Impair Defenses::Disable Cloud Logs",
"T1562.009": "Impair Defenses::Safe Mode Boot",
"T1562.010": "Impair Defenses::Downgrade Attack",
"T1562.011": "Impair Defenses::Spoof Security Alerting",
"T1562.012": "Impair Defenses::Disable or Modify Linux Audit System",
"T1564": "Hide Artifacts",
"T1564.001": "Hide Artifacts::Hidden Files and Directories",
"T1564.002": "Hide Artifacts::Hidden Users",
@@ -535,7 +518,6 @@
"T1564.008": "Hide Artifacts::Email Hiding Rules",
"T1564.009": "Hide Artifacts::Resource Forking",
"T1564.010": "Hide Artifacts::Process Argument Spoofing",
"T1564.011": "Hide Artifacts::Ignore Process Interrupts",
"T1574": "Hijack Execution Flow",
"T1574.001": "Hijack Execution Flow::DLL Search Order Hijacking",
"T1574.002": "Hijack Execution Flow::DLL Side-Loading",
@@ -554,7 +536,6 @@
"T1578.002": "Modify Cloud Compute Infrastructure::Create Cloud Instance",
"T1578.003": "Modify Cloud Compute Infrastructure::Delete Cloud Instance",
"T1578.004": "Modify Cloud Compute Infrastructure::Revert Cloud Instance",
"T1578.005": "Modify Cloud Compute Infrastructure::Modify Cloud Compute Configurations",
"T1599": "Network Boundary Bridging",
"T1599.001": "Network Boundary Bridging::Network Address Translation Traversal",
"T1600": "Weaken Encryption",
@@ -567,8 +548,7 @@
"T1612": "Build Image on Host",
"T1620": "Reflective Code Loading",
"T1622": "Debugger Evasion",
"T1647": "Plist File Modification",
"T1656": "Impersonation"
"T1647": "Plist File Modification"
},
"Credential Access": {
"T1003": "OS Credential Dumping",
@@ -611,7 +591,6 @@
"T1555.003": "Credentials from Password Stores::Credentials from Web Browsers",
"T1555.004": "Credentials from Password Stores::Windows Credential Manager",
"T1555.005": "Credentials from Password Stores::Password Managers",
"T1555.006": "Credentials from Password Stores::Cloud Secrets Management Stores",
"T1556": "Modify Authentication Process",
"T1556.001": "Modify Authentication Process::Domain Controller Authentication",
"T1556.002": "Modify Authentication Process::Password Filter DLL",
@@ -642,7 +621,6 @@
"T1012": "Query Registry",
"T1016": "System Network Configuration Discovery",
"T1016.001": "System Network Configuration Discovery::Internet Connection Discovery",
"T1016.002": "System Network Configuration Discovery::Wi-Fi Discovery",
"T1018": "Remote System Discovery",
"T1033": "System Owner/User Discovery",
"T1040": "Network Sniffing",
@@ -681,8 +659,7 @@
"T1615": "Group Policy Discovery",
"T1619": "Cloud Storage Object Discovery",
"T1622": "Debugger Evasion",
"T1652": "Device Driver Discovery",
"T1654": "Log Enumeration"
"T1652": "Device Driver Discovery"
},
"Lateral Movement": {
"T1021": "Remote Services",
@@ -693,7 +670,6 @@
"T1021.005": "Remote Services::VNC",
"T1021.006": "Remote Services::Windows Remote Management",
"T1021.007": "Remote Services::Cloud Services",
"T1021.008": "Remote Services::Direct Cloud VM Connections",
"T1072": "Software Deployment Tools",
"T1080": "Taint Shared Content",
"T1091": "Replication Through Removable Media",
@@ -787,8 +763,7 @@
"T1572": "Protocol Tunneling",
"T1573": "Encrypted Channel",
"T1573.001": "Encrypted Channel::Symmetric Cryptography",
"T1573.002": "Encrypted Channel::Asymmetric Cryptography",
"T1659": "Content Injection"
"T1573.002": "Encrypted Channel::Asymmetric Cryptography"
},
"Exfiltration": {
"T1011": "Exfiltration Over Other Network Medium",
@@ -808,8 +783,7 @@
"T1567": "Exfiltration Over Web Service",
"T1567.001": "Exfiltration Over Web Service::Exfiltration to Code Repository",
"T1567.002": "Exfiltration Over Web Service::Exfiltration to Cloud Storage",
"T1567.003": "Exfiltration Over Web Service::Exfiltration to Text Storage Sites",
"T1567.004": "Exfiltration Over Web Service::Exfiltration Over Webhook"
"T1567.003": "Exfiltration Over Web Service::Exfiltration to Text Storage Sites"
},
"Impact": {
"T1485": "Data Destruction",
@@ -837,8 +811,7 @@
"T1565": "Data Manipulation",
"T1565.001": "Data Manipulation::Stored Data Manipulation",
"T1565.002": "Data Manipulation::Transmitted Data Manipulation",
"T1565.003": "Data Manipulation::Runtime Data Manipulation",
"T1657": "Financial Theft"
"T1565.003": "Data Manipulation::Runtime Data Manipulation"
}
},
"mbc": {

View File

@@ -62,7 +62,6 @@ import capa.engine
import capa.helpers
import capa.features
import capa.features.freeze
from capa.loader import BACKEND_VIV
logger = logging.getLogger("capa.match-function-id")
@@ -72,53 +71,61 @@ def main(argv=None):
argv = sys.argv[1:]
parser = argparse.ArgumentParser(description="FLIRT match each function")
capa.main.install_common_args(parser, wanted={"input_file", "signatures", "format"})
parser.add_argument("sample", type=str, help="Path to sample to analyze")
parser.add_argument(
"-F",
"--function",
type=lambda x: int(x, 0x10),
help="match a specific function by VA, rather than add functions",
)
parser.add_argument(
"--signature",
action="append",
dest="signatures",
type=str,
default=[],
help="use the given signatures to identify library functions, file system paths to .sig/.pat files.",
)
parser.add_argument("-d", "--debug", action="store_true", help="Enable debugging output on STDERR")
parser.add_argument("-q", "--quiet", action="store_true", help="Disable all output but errors")
args = parser.parse_args(args=argv)
try:
capa.main.handle_common_args(args)
capa.main.ensure_input_exists_from_cli(args)
input_format = capa.main.get_input_format_from_cli(args)
sig_paths = capa.main.get_signatures_from_cli(args, input_format, BACKEND_VIV)
except capa.main.ShouldExitError as e:
return e.status_code
if args.quiet:
logging.basicConfig(level=logging.ERROR)
logging.getLogger().setLevel(logging.ERROR)
elif args.debug:
logging.basicConfig(level=logging.DEBUG)
logging.getLogger().setLevel(logging.DEBUG)
else:
logging.basicConfig(level=logging.INFO)
logging.getLogger().setLevel(logging.INFO)
# disable vivisect-related logging, it's verbose and not relevant for capa users
capa.main.set_vivisect_log_level(logging.CRITICAL)
analyzers = []
for sigpath in sig_paths:
sigs = viv_utils.flirt.load_flirt_signature(str(sigpath))
for sigpath in args.signatures:
sigs = viv_utils.flirt.load_flirt_signature(sigpath)
with capa.main.timing("flirt: compiling sigs"):
matcher = flirt.compile(sigs)
analyzer = viv_utils.flirt.FlirtFunctionAnalyzer(matcher, str(sigpath))
analyzer = viv_utils.flirt.FlirtFunctionAnalyzer(matcher, sigpath)
logger.debug("registering viv function analyzer: %s", repr(analyzer))
analyzers.append(analyzer)
vw = viv_utils.getWorkspace(str(args.input_file), analyze=True, should_save=False)
vw = viv_utils.getWorkspace(args.sample, analyze=True, should_save=False)
functions = vw.getFunctions()
if args.function:
functions = [args.function]
seen = set()
for function in functions:
logger.debug("matching function: 0x%04x", function)
for analyzer in analyzers:
viv_utils.flirt.match_function_flirt_signatures(analyzer.matcher, vw, function)
name = viv_utils.get_function_name(vw, function)
name = viv_utils.flirt.match_function_flirt_signatures(analyzer.matcher, vw, function)
if name:
key = (function, name)
if key in seen:
continue
else:
print(f"0x{function:04x}: {name}")
seen.add(key)
print(f"0x{function:04x}: {name}")
return 0
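A trimmed sketch of the FLIRT flow above, using only the calls shown in this script; the signature and sample paths are hypothetical:

import flirt
import viv_utils
import viv_utils.flirt

sigs = viv_utils.flirt.load_flirt_signature("sigs/msvcrt.sig")
matcher = flirt.compile(sigs)

vw = viv_utils.getWorkspace("sample.exe_", analyze=True, should_save=False)
for function in vw.getFunctions():
    # annotate the function with a library name when a signature matches
    viv_utils.flirt.match_function_flirt_signatures(matcher, vw, function)
    name = viv_utils.get_function_name(vw, function)
    if name:
        print(f"0x{function:04x}: {name}")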

View File

@@ -41,6 +41,7 @@ import timeit
import logging
import argparse
import subprocess
from pathlib import Path
import tqdm
import tabulate
@@ -49,7 +50,6 @@ import capa.main
import capa.perf
import capa.rules
import capa.engine
import capa.loader
import capa.helpers
import capa.features
import capa.features.common
@@ -74,22 +74,42 @@ def main(argv=None):
label += " (dirty)"
parser = argparse.ArgumentParser(description="Profile capa performance")
capa.main.install_common_args(parser, wanted={"format", "os", "input_file", "signatures", "rules"})
capa.main.install_common_args(parser, wanted={"format", "os", "sample", "signatures", "rules"})
parser.add_argument("--number", type=int, default=3, help="batch size of profile collection")
parser.add_argument("--repeat", type=int, default=30, help="batch count of profile collection")
parser.add_argument("--label", type=str, default=label, help="description of the profile collection")
args = parser.parse_args(args=argv)
capa.main.handle_common_args(args)
try:
taste = capa.helpers.get_file_taste(Path(args.sample))
except IOError as e:
logger.error("%s", str(e))
return -1
try:
capa.main.handle_common_args(args)
capa.main.ensure_input_exists_from_cli(args)
input_format = capa.main.get_input_format_from_cli(args)
backend = capa.main.get_backend_from_cli(args, input_format)
with capa.main.timing("load rules"):
rules = capa.main.get_rules_from_cli(args)
extractor = capa.main.get_extractor_from_cli(args, input_format, backend)
except capa.main.ShouldExitError as e:
return e.status_code
rules = capa.main.get_rules(args.rules)
except IOError as e:
logger.error("%s", str(e))
return -1
try:
sig_paths = capa.main.get_signatures(args.signatures)
except IOError as e:
logger.error("%s", str(e))
return -1
if (args.format == "freeze") or (
args.format == capa.features.common.FORMAT_AUTO and capa.features.freeze.is_freeze(taste)
):
extractor = capa.features.freeze.load(Path(args.sample).read_bytes())
else:
extractor = capa.main.get_extractor(
args.sample, args.format, args.os, capa.main.BACKEND_VIV, sig_paths, should_save_workspace=False
)
with tqdm.tqdm(total=args.number * args.repeat, leave=False) as pbar:
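The --number/--repeat options map onto stdlib timeit semantics, mirroring the batching above; a minimal sketch with a stand-in workload:

import timeit

# number: iterations per batch; repeat: batch count, matching the CLI defaults above
times = timeit.repeat("sum(i * i for i in range(10_000))", number=3, repeat=30)
print(f"best batch: {min(times):.3f}s")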

View File

@@ -33,7 +33,6 @@ import logging
import argparse
from pathlib import Path
import capa.main
import capa.render.proto
import capa.render.result_document
@@ -45,14 +44,26 @@ def main(argv=None):
argv = sys.argv[1:]
parser = argparse.ArgumentParser(description="Convert a capa JSON result document into the protobuf format")
capa.main.install_common_args(parser)
parser.add_argument("json", type=str, help="path to JSON result document file, produced by `capa --json`")
logging_group = parser.add_argument_group("logging arguments")
logging_group.add_argument("-d", "--debug", action="store_true", help="enable debugging output on STDERR")
logging_group.add_argument(
"-q", "--quiet", action="store_true", help="disable all status output except fatal errors"
)
args = parser.parse_args(args=argv)
try:
capa.main.handle_common_args(args)
except capa.main.ShouldExitError as e:
return e.status_code
if args.quiet:
logging.basicConfig(level=logging.WARNING)
logging.getLogger().setLevel(logging.WARNING)
elif args.debug:
logging.basicConfig(level=logging.DEBUG)
logging.getLogger().setLevel(logging.DEBUG)
else:
logging.basicConfig(level=logging.INFO)
logging.getLogger().setLevel(logging.INFO)
rd = capa.render.result_document.ResultDocument.from_file(Path(args.json))
pb = capa.render.proto.doc_to_pb2(rd)
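A sketch of the JSON-to-protobuf conversion, using the two calls shown above; paths are hypothetical:

from pathlib import Path

import capa.render.proto
import capa.render.result_document

rd = capa.render.result_document.ResultDocument.from_file(Path("results.json"))
pb = capa.render.proto.doc_to_pb2(rd)
Path("results.pb").write_bytes(pb.SerializeToString())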

View File

@@ -36,7 +36,6 @@ import logging
import argparse
from pathlib import Path
import capa.main
import capa.render.json
import capa.render.proto
import capa.render.proto.capa_pb2
@@ -50,16 +49,28 @@ def main(argv=None):
argv = sys.argv[1:]
parser = argparse.ArgumentParser(description="Convert a capa protobuf result document into the JSON format")
capa.main.install_common_args(parser)
parser.add_argument(
"pb", type=str, help="path to protobuf result document file, produced by `proto-from-results.py`"
)
logging_group = parser.add_argument_group("logging arguments")
logging_group.add_argument("-d", "--debug", action="store_true", help="enable debugging output on STDERR")
logging_group.add_argument(
"-q", "--quiet", action="store_true", help="disable all status output except fatal errors"
)
args = parser.parse_args(args=argv)
try:
capa.main.handle_common_args(args)
except capa.main.ShouldExitError as e:
return e.status_code
if args.quiet:
logging.basicConfig(level=logging.WARNING)
logging.getLogger().setLevel(logging.WARNING)
elif args.debug:
logging.basicConfig(level=logging.DEBUG)
logging.getLogger().setLevel(logging.DEBUG)
else:
logging.basicConfig(level=logging.INFO)
logging.getLogger().setLevel(logging.INFO)
pb = Path(args.pb).read_bytes()
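And the reverse direction, assuming capa.render.proto also exposes a doc_from_pb2 helper; the input path is hypothetical:

from pathlib import Path

import capa.render.proto
import capa.render.proto.capa_pb2

pb = capa.render.proto.capa_pb2.ResultDocument()
pb.ParseFromString(Path("results.pb").read_bytes())
doc = capa.render.proto.doc_from_pb2(pb)
print(doc.model_dump_json(exclude_none=True))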

View File

@@ -178,8 +178,11 @@ def main(args: argparse.Namespace) -> None:
data["mbc"] = MbcExtractor().run()
logging.info("Writing results to %s", args.output)
with Path(args.output).open("w", encoding="utf-8") as jf:
json.dump(data, jf, indent=2)
try:
with Path(args.output).open("w", encoding="utf-8") as jf:
json.dump(data, jf, indent=2)
except BaseException as e:
logging.error("Exception encountered when writing results: %s", e)
if __name__ == "__main__":

View File

@@ -55,11 +55,13 @@ Unless required by applicable law or agreed to in writing, software distributed
is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and limitations under the License.
"""
import os
import sys
import logging
import argparse
import collections
from typing import Dict
from pathlib import Path
import colorama
@@ -74,7 +76,10 @@ import capa.render.verbose
import capa.features.freeze
import capa.capabilities.common
import capa.render.result_document as rd
from capa.helpers import get_file_taste
from capa.features.common import FORMAT_AUTO
from capa.features.freeze import Address
from capa.features.extractors.base_extractor import FeatureExtractor, StaticFeatureExtractor
logger = logging.getLogger("capa.show-capabilities-by-function")
@@ -137,37 +142,67 @@ def main(argv=None):
argv = sys.argv[1:]
parser = argparse.ArgumentParser(description="detect capabilities in programs.")
capa.main.install_common_args(
parser, wanted={"format", "os", "backend", "input_file", "signatures", "rules", "tag"}
)
capa.main.install_common_args(parser, wanted={"format", "os", "backend", "sample", "signatures", "rules", "tag"})
args = parser.parse_args(args=argv)
capa.main.handle_common_args(args)
try:
capa.main.handle_common_args(args)
capa.main.ensure_input_exists_from_cli(args)
input_format = capa.main.get_input_format_from_cli(args)
rules = capa.main.get_rules_from_cli(args)
backend = capa.main.get_backend_from_cli(args, input_format)
sample_path = capa.main.get_sample_path_from_cli(args, backend)
if sample_path is None:
os_ = "unknown"
else:
os_ = capa.loader.get_os(sample_path)
extractor = capa.main.get_extractor_from_cli(args, input_format, backend)
except capa.main.ShouldExitError as e:
return e.status_code
taste = get_file_taste(Path(args.sample))
except IOError as e:
logger.error("%s", str(e))
return -1
try:
rules = capa.main.get_rules(args.rules)
logger.info("successfully loaded %s rules", len(rules))
if args.tag:
rules = rules.filter_rules_by_meta(args.tag)
logger.info("selected %s rules", len(rules))
except (IOError, capa.rules.InvalidRule, capa.rules.InvalidRuleSet) as e:
logger.error("%s", str(e))
return -1
try:
sig_paths = capa.main.get_signatures(args.signatures)
except IOError as e:
logger.error("%s", str(e))
return -1
if (args.format == "freeze") or (args.format == FORMAT_AUTO and capa.features.freeze.is_freeze(taste)):
format_ = "freeze"
extractor: FeatureExtractor = capa.features.freeze.load(Path(args.sample).read_bytes())
else:
format_ = args.format
should_save_workspace = os.environ.get("CAPA_SAVE_WORKSPACE") not in ("0", "no", "NO", "n", None)
try:
extractor = capa.main.get_extractor(
args.sample, args.format, args.os, args.backend, sig_paths, should_save_workspace
)
assert isinstance(extractor, StaticFeatureExtractor)
except capa.exceptions.UnsupportedFormatError:
capa.helpers.log_unsupported_format_error()
return -1
except capa.exceptions.UnsupportedRuntimeError:
capa.helpers.log_unsupported_runtime_error()
return -1
capabilities, counts = capa.capabilities.common.find_capabilities(rules, extractor)
meta = capa.loader.collect_metadata(argv, args.input_file, input_format, os_, args.rules, extractor, counts)
meta.analysis.layout = capa.loader.compute_layout(rules, extractor, capabilities)
meta = capa.main.collect_metadata(argv, args.sample, format_, args.os, args.rules, extractor, counts)
meta.analysis.layout = capa.main.compute_layout(rules, extractor, capabilities)
if capa.capabilities.common.has_file_limitation(rules, capabilities):
# bail if capa encountered file limitation e.g. a packed binary
# do show the output in verbose mode, though.
if not (args.verbose or args.vverbose or args.json):
return capa.main.E_FILE_LIMITATION
return -1
# colorama will detect:
# - when on Windows console, and fixup coloring, and
# - when not an interactive session, and disable coloring
# renderers should use coloring and assume it will be stripped out if necessary.
colorama.init()
doc = rd.ResultDocument.from_capa(meta, rules, capabilities)
print(render_matches_by_function(doc))
colorama.deinit()
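The colorama bracketing at the end follows a simple pattern; a self-contained sketch:

import colorama

colorama.init()  # fix up Windows consoles; strip colors when output is not a TTY
try:
    print("rendering with ANSI colors is safe here")
finally:
    colorama.deinit()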

View File

@@ -64,15 +64,16 @@ Example::
insn: 0x10001027: mnemonic(shl)
...
"""
import os
import sys
import logging
import argparse
from typing import Tuple
from pathlib import Path
import capa.main
import capa.rules
import capa.engine
import capa.loader
import capa.helpers
import capa.features
import capa.exceptions
@@ -80,9 +81,17 @@ import capa.render.verbose as v
import capa.features.freeze
import capa.features.address
import capa.features.extractors.pefile
from capa.helpers import assert_never
from capa.helpers import get_auto_format, log_unsupported_runtime_error
from capa.features.insn import API, Number
from capa.features.common import String, Feature, is_global_feature
from capa.features.common import (
FORMAT_AUTO,
FORMAT_CAPE,
FORMAT_FREEZE,
DYNAMIC_FORMATS,
String,
Feature,
is_global_feature,
)
from capa.features.extractors.base_extractor import FunctionHandle, StaticFeatureExtractor, DynamicFeatureExtractor
logger = logging.getLogger("capa.show-features")
@@ -97,33 +106,56 @@ def main(argv=None):
argv = sys.argv[1:]
parser = argparse.ArgumentParser(description="Show the features that capa extracts from the given sample")
capa.main.install_common_args(parser, wanted={"input_file", "format", "os", "signatures", "backend"})
capa.main.install_common_args(parser, wanted={"format", "os", "sample", "signatures", "backend"})
parser.add_argument("-F", "--function", type=str, help="Show features for specific function")
parser.add_argument("-P", "--process", type=str, help="Show features for specific process name")
args = parser.parse_args(args=argv)
capa.main.handle_common_args(args)
if args.function and args.backend == "pefile":
print("pefile backend does not support extracting function features")
return -1
try:
capa.main.handle_common_args(args)
capa.main.ensure_input_exists_from_cli(args)
_ = capa.helpers.get_file_taste(Path(args.sample))
except IOError as e:
logger.error("%s", str(e))
return -1
if args.function and args.backend == "pefile":
print("pefile backend does not support extracting function features")
try:
sig_paths = capa.main.get_signatures(args.signatures)
except IOError as e:
logger.error("%s", str(e))
return -1
format_ = args.format if args.format != FORMAT_AUTO else get_auto_format(args.sample)
if format_ == FORMAT_FREEZE:
# this should be moved above the previous if clause after implementing
# feature freeze for the dynamic analysis flavor
extractor = capa.features.freeze.load(Path(args.sample).read_bytes())
else:
should_save_workspace = os.environ.get("CAPA_SAVE_WORKSPACE") not in ("0", "no", "NO", "n", None)
try:
extractor = capa.main.get_extractor(
args.sample, format_, args.os, args.backend, sig_paths, should_save_workspace
)
except capa.exceptions.UnsupportedFormatError as e:
if format_ == FORMAT_CAPE:
capa.helpers.log_unsupported_cape_report_error(str(e))
else:
capa.helpers.log_unsupported_format_error()
return -1
except capa.exceptions.UnsupportedRuntimeError:
log_unsupported_runtime_error()
return -1
input_format = capa.main.get_input_format_from_cli(args)
backend = capa.main.get_backend_from_cli(args, input_format)
extractor = capa.main.get_extractor_from_cli(args, input_format, backend)
except capa.main.ShouldExitError as e:
return e.status_code
if isinstance(extractor, DynamicFeatureExtractor):
if format_ in DYNAMIC_FORMATS:
assert isinstance(extractor, DynamicFeatureExtractor)
print_dynamic_analysis(extractor, args)
elif isinstance(extractor, StaticFeatureExtractor):
print_static_analysis(extractor, args)
else:
assert_never(extractor)
assert isinstance(extractor, StaticFeatureExtractor)
print_static_analysis(extractor, args)
return 0
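A sketch of the flavor dispatch above: isinstance checks select the rendering path, and assert_never keeps the branching exhaustive for type checkers:

from capa.helpers import assert_never
from capa.features.extractors.base_extractor import (
    StaticFeatureExtractor,
    DynamicFeatureExtractor,
)

def describe(extractor) -> str:
    if isinstance(extractor, DynamicFeatureExtractor):
        return "dynamic flavor: processes, threads, and calls"
    elif isinstance(extractor, StaticFeatureExtractor):
        return "static flavor: functions, basic blocks, and instructions"
    else:
        # unreachable for known extractor types; fails loudly otherwise
        assert_never(extractor)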

View File

@@ -8,11 +8,13 @@ Unless required by applicable law or agreed to in writing, software distributed
is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and limitations under the License.
"""
import os
import sys
import typing
import logging
import argparse
from typing import Set, Tuple
from pathlib import Path
from collections import Counter
import tabulate
@@ -29,7 +31,8 @@ import capa.features.freeze
import capa.features.address
import capa.features.extractors.pefile
import capa.features.extractors.base_extractor
from capa.features.common import FORMAT_FREEZE, Feature
from capa.helpers import log_unsupported_runtime_error
from capa.features.common import Feature
from capa.features.extractors.base_extractor import FunctionHandle, StaticFeatureExtractor
logger = logging.getLogger("show-unused-features")
@@ -39,9 +42,10 @@ def format_address(addr: capa.features.address.Address) -> str:
return v.format_address(capa.features.freeze.Address.from_capa((addr)))
def get_rules_feature_set(rules: capa.rules.RuleSet) -> Set[Feature]:
def get_rules_feature_set(rules_path) -> Set[Feature]:
ruleset = capa.main.get_rules(rules_path)
rules_feature_set: Set[Feature] = set()
for _, rule in rules.rules.items():
for _, rule in ruleset.rules.items():
rules_feature_set.update(rule.extract_all_features())
return rules_feature_set
@@ -102,23 +106,44 @@ def main(argv=None):
argv = sys.argv[1:]
parser = argparse.ArgumentParser(description="Show the features that capa doesn't have rules for yet")
capa.main.install_common_args(parser, wanted={"format", "os", "input_file", "signatures", "backend", "rules"})
capa.main.install_common_args(parser, wanted={"format", "os", "sample", "signatures", "backend", "rules"})
parser.add_argument("-F", "--function", type=str, help="Show features for specific function")
args = parser.parse_args(args=argv)
capa.main.handle_common_args(args)
if args.function and args.backend == "pefile":
print("pefile backend does not support extracting function features")
return -1
try:
capa.main.handle_common_args(args)
capa.main.ensure_input_exists_from_cli(args)
rules = capa.main.get_rules_from_cli(args)
input_format = capa.main.get_input_format_from_cli(args)
backend = capa.main.get_backend_from_cli(args, input_format)
extractor = capa.main.get_extractor_from_cli(args, input_format, backend)
except capa.main.ShouldExitError as e:
return e.status_code
taste = capa.helpers.get_file_taste(Path(args.sample))
except IOError as e:
logger.error("%s", str(e))
return -1
try:
sig_paths = capa.main.get_signatures(args.signatures)
except IOError as e:
logger.error("%s", str(e))
return -1
if (args.format == "freeze") or (
args.format == capa.features.common.FORMAT_AUTO and capa.features.freeze.is_freeze(taste)
):
extractor = capa.features.freeze.load(Path(args.sample).read_bytes())
else:
should_save_workspace = os.environ.get("CAPA_SAVE_WORKSPACE") not in ("0", "no", "NO", "n", None)
try:
extractor = capa.main.get_extractor(
args.sample, args.format, args.os, args.backend, sig_paths, should_save_workspace
)
except capa.exceptions.UnsupportedFormatError:
capa.helpers.log_unsupported_format_error()
return -1
except capa.exceptions.UnsupportedRuntimeError:
log_unsupported_runtime_error()
return -1
assert isinstance(extractor, StaticFeatureExtractor), "only static analysis supported today"
@@ -134,7 +159,7 @@ def main(argv=None):
function_handles = tuple(extractor.get_functions())
if args.function:
if input_format == FORMAT_FREEZE:
if args.format == "freeze":
function_handles = tuple(filter(lambda fh: fh.address == args.function, function_handles))
else:
function_handles = tuple(filter(lambda fh: format_address(fh.address) == args.function, function_handles))
@@ -149,7 +174,7 @@ def main(argv=None):
feature_map.update(get_file_features(function_handles, extractor))
rules_feature_set = get_rules_feature_set(rules)
rules_feature_set = get_rules_feature_set(args.rules)
print_unused_features(feature_map, rules_feature_set)
return 0
@@ -181,8 +206,7 @@ def ida_main():
feature_map.update(get_file_features(function_handles, extractor))
rules_path = capa.main.get_default_root() / "rules"
rules = capa.rules.get_rules([rules_path])
rules_feature_set = get_rules_feature_set(rules)
rules_feature_set = get_rules_feature_set([rules_path])
print_unused_features(feature_map, rules_feature_set)
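The helper above boils down to a set union over extract_all_features(); a sketch with a hypothetical rules path:

from pathlib import Path
from typing import Set

import capa.rules
from capa.features.common import Feature

rules = capa.rules.get_rules([Path("rules")])
rules_feature_set: Set[Feature] = set()
for _, rule in rules.rules.items():
    rules_feature_set.update(rule.extract_all_features())
print(f"rules reference {len(rules_feature_set)} distinct features")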

69
scripts/vivisect-py2-vs-py3.sh Executable file
View File

@@ -0,0 +1,69 @@
#!/usr/bin/env bash
int() {
int=$(bc <<< "scale=0; ($1 + 0.5)/1")
}
export TIMEFORMAT='%3R'
threshold_time=90
threshold_py3_time=60 # Do not warn if it doesn't take at least 1 minute to run
rm tests/data/*.viv 2>/dev/null
mkdir results
for file in tests/data/*
do
file=$(printf %q "$file") # Handle names with white spaces
file_name=$(basename $file)
echo $file_name
rm "$file.viv" 2>/dev/null
py3_time=$(sh -c "time python3 scripts/show-features.py $file >> results/p3-$file_name.out 2>/dev/null" 2>&1)
rm "$file.viv" 2>/dev/null
py2_time=$(sh -c "time python2 scripts/show-features.py $file >> results/p2-$file_name.out 2>/dev/null" 2>&1)
int $py3_time
if (($int > $threshold_py3_time))
then
percentage=$(bc <<< "scale=3; $py2_time/$py3_time*100 + 0.5")
int $percentage
if (($int < $threshold_time))
then
echo -n " SLOWER ($percentage): "
fi
fi
echo " PY2($py2_time) PY3($py3_time)"
done
threshold_features=98
count=0
average=0
results_for() {
py3=$(cat "results/p3-$file_name.out" | grep "$1" | wc -l)
py2=$(cat "results/p2-$file_name.out" | grep "$1" | wc -l)
if (($py2 > 0))
then
percentage=$(bc <<< "scale=2; 100*$py3/$py2")
average=$(bc <<< "scale=2; $percentage + $average")
count=$(($count + 1))
int $percentage
if (($int < $threshold_features))
then
echo -e "$1: py2($py2) py3($py3) $percentage% - $file_name"
fi
fi
}
rm tests/data/*.viv 2>/dev/null
echo -e '\nRESULTS:'
for file in tests/data/*
do
file_name=$(basename $file)
if test -f "results/p2-$file_name.out"; then
results_for 'insn'
results_for 'file'
results_for 'func'
results_for 'bb'
fi
done
average=$(bc <<< "scale=2; $average/$count")
echo "TOTAL: $average"

View File

@@ -106,11 +106,11 @@ def get_viv_extractor(path: Path):
]
if "raw32" in path.name:
vw = capa.loader.get_workspace(path, "sc32", sigpaths=sigpaths)
vw = capa.main.get_workspace(path, "sc32", sigpaths=sigpaths)
elif "raw64" in path.name:
vw = capa.loader.get_workspace(path, "sc64", sigpaths=sigpaths)
vw = capa.main.get_workspace(path, "sc64", sigpaths=sigpaths)
else:
vw = capa.loader.get_workspace(path, FORMAT_AUTO, sigpaths=sigpaths)
vw = capa.main.get_workspace(path, FORMAT_AUTO, sigpaths=sigpaths)
vw.saveWorkspace()
extractor = capa.features.extractors.viv.extractor.VivisectFeatureExtractor(vw, path, OS_AUTO)
fixup_viv(path, extractor)
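A sketch of the extractor construction this fixture performs, using the capa.loader calls shown above; the shellcode path is hypothetical:

from pathlib import Path

import capa.loader
import capa.features.extractors.viv.extractor
from capa.features.common import OS_AUTO

path = Path("shellcode.raw32")
# "sc32" tells vivisect to treat the input as 32-bit shellcode
vw = capa.loader.get_workspace(path, "sc32", sigpaths=[])
extractor = capa.features.extractors.viv.extractor.VivisectFeatureExtractor(vw, path, OS_AUTO)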
@@ -393,10 +393,6 @@ def get_data_path_by_name(name) -> Path:
return CD / "data" / "ea2876e9175410b6f6719f80ee44b9553960758c7d0f7bed73c0fe9a78d8e669.dll_"
elif name.startswith("1038a2"):
return CD / "data" / "1038a23daad86042c66bfe6c9d052d27048de9653bde5750dc0f240c792d9ac8.elf_"
elif name.startswith("nested_typedef"):
return CD / "data" / "dotnet" / "dd9098ff91717f4906afe9dafdfa2f52.exe_"
elif name.startswith("nested_typeref"):
return CD / "data" / "dotnet" / "2c7d60f77812607dec5085973ff76cea.dll_"
else:
raise ValueError(f"unexpected sample fixture: {name}")
@@ -1278,114 +1274,6 @@ FEATURE_PRESENCE_TESTS_DOTNET = sorted(
), # MemberRef method
False,
),
(
"nested_typedef",
"file",
capa.features.common.Class("mynamespace.myclass_outer0"),
True,
),
(
"nested_typedef",
"file",
capa.features.common.Class("mynamespace.myclass_outer1"),
True,
),
(
"nested_typedef",
"file",
capa.features.common.Class("mynamespace.myclass_outer0/myclass_inner0_0"),
True,
),
(
"nested_typedef",
"file",
capa.features.common.Class("mynamespace.myclass_outer0/myclass_inner0_1"),
True,
),
(
"nested_typedef",
"file",
capa.features.common.Class("mynamespace.myclass_outer1/myclass_inner1_0"),
True,
),
(
"nested_typedef",
"file",
capa.features.common.Class("mynamespace.myclass_outer1/myclass_inner1_1"),
True,
),
(
"nested_typedef",
"file",
capa.features.common.Class("mynamespace.myclass_outer1/myclass_inner1_0/myclass_inner_inner"),
True,
),
(
"nested_typedef",
"file",
capa.features.common.Class("myclass_inner_inner"),
False,
),
(
"nested_typedef",
"file",
capa.features.common.Class("myclass_inner1_0"),
False,
),
(
"nested_typedef",
"file",
capa.features.common.Class("myclass_inner1_1"),
False,
),
(
"nested_typedef",
"file",
capa.features.common.Class("myclass_inner0_0"),
False,
),
(
"nested_typedef",
"file",
capa.features.common.Class("myclass_inner0_1"),
False,
),
(
"nested_typeref",
"file",
capa.features.file.Import("Android.OS.Build/VERSION::SdkInt"),
True,
),
(
"nested_typeref",
"file",
capa.features.file.Import("Android.Media.Image/Plane::Buffer"),
True,
),
(
"nested_typeref",
"file",
capa.features.file.Import("Android.Provider.Telephony/Sent/Sent::ContentUri"),
True,
),
(
"nested_typeref",
"file",
capa.features.file.Import("Android.OS.Build::SdkInt"),
False,
),
(
"nested_typeref",
"file",
capa.features.file.Import("Plane::Buffer"),
False,
),
(
"nested_typeref",
"file",
capa.features.file.Import("Sent::ContentUri"),
False,
),
],
# order tests by (file, item)
# so that our LRU cache is most effective.

View File

@@ -6,13 +6,10 @@
# is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and limitations under the License.
import gzip
from typing import Type
from pathlib import Path
import pytest
import fixtures
from capa.exceptions import EmptyReportError, UnsupportedFormatError
from capa.features.extractors.cape.models import Call, CapeReport
CD = Path(__file__).resolve().parent
@@ -44,35 +41,6 @@ def test_cape_model_can_load(version: str, filename: str):
assert report is not None
@fixtures.parametrize(
"version,filename,exception",
[
("v2.2", "0000a65749f5902c4d82ffa701198038f0b4870b00a27cfca109f8f933476d82.json.gz", None),
("v2.2", "55dcd38773f4104b95589acc87d93bf8b4a264b4a6d823b73fb6a7ab8144c08b.json.gz", None),
("v2.2", "77c961050aa252d6d595ec5120981abf02068c968f4a5be5958d10e87aa6f0e8.json.gz", EmptyReportError),
("v2.2", "d46900384c78863420fb3e297d0a2f743cd2b6b3f7f82bf64059a168e07aceb7.json.gz", None),
("v2.4", "36d218f384010cce9f58b8193b7d8cc855d1dff23f80d16e13a883e152d07921.json.gz", UnsupportedFormatError),
("v2.4", "41ce492f04accef7931b84b8548a6ca717ffabb9bedc4f624de2d37a5345036c.json.gz", UnsupportedFormatError),
("v2.4", "515a6269965ccdf1005008e017ec87fafb97fd2464af1c393ad93b438f6f33fe.json.gz", UnsupportedFormatError),
("v2.4", "5d61700feabba201e1ba98df3c8210a3090c8c9f9adbf16cb3d1da3aaa2a9d96.json.gz", UnsupportedFormatError),
("v2.4", "5effaf6795932d8b36755f89f99ce7436421ea2bd1ed5bc55476530c1a22009f.json.gz", UnsupportedFormatError),
("v2.4", "873275144af88e9b95ea2c59ece39b8ce5a9d7fe09774b683050098ac965054d.json.gz", UnsupportedFormatError),
("v2.4", "8b9aaf4fad227cde7a7dabce7ba187b0b923301718d9d40de04bdd15c9b22905.json.gz", UnsupportedFormatError),
("v2.4", "b1c4aa078880c579961dc5ec899b2c2e08ae5db80b4263e4ca9607a68e2faef9.json.gz", UnsupportedFormatError),
("v2.4", "fb7ade52dc5a1d6128b9c217114a46d0089147610f99f5122face29e429a1e74.json.gz", None),
],
)
def test_cape_extractor(version: str, filename: str, exception: Type[BaseException]):
path = CAPE_DIR / version / filename
if exception:
with pytest.raises(exception):
_ = fixtures.get_cape_extractor(path)
else:
cr = fixtures.get_cape_extractor(path)
assert cr is not None
def test_cape_model_argument():
call = Call.model_validate_json(
"""

View File

@@ -949,7 +949,6 @@ def test_count_api():
features:
- or:
- count(api(kernel32.CreateFileA)): 1
- count(api(System.Convert::FromBase64String)): 1
"""
)
r = capa.rules.Rule.from_yaml(rule)
@@ -958,7 +957,6 @@ def test_count_api():
assert bool(r.evaluate({API("kernel32.CreateFile"): set()})) is False
assert bool(r.evaluate({API("CreateFile"): {ADDR1}})) is False
assert bool(r.evaluate({API("CreateFileA"): {ADDR1}})) is True
assert bool(r.evaluate({API("System.Convert::FromBase64String"): {ADDR1}})) is True
def test_invalid_number():

View File

@@ -40,10 +40,7 @@ def get_rule_path():
[
pytest.param("capa2yara.py", [get_rules_path()]),
pytest.param("capafmt.py", [get_rule_path()]),
# testing some variations of linter script
pytest.param("lint.py", ["-t", "create directory", get_rules_path()]),
# `create directory` rule has native and .NET example PEs
pytest.param("lint.py", ["--thorough", "-t", "create directory", get_rules_path()]),
# not testing lint.py as it runs regularly anyway
pytest.param("match-function-id.py", [get_file_path()]),
pytest.param("show-capabilities-by-function.py", [get_file_path()]),
pytest.param("show-features.py", [get_file_path()]),