Compare commits


44 Commits

Author SHA1 Message Date
mr-tz
481ae685e1 move sigs to capa directory 2024-01-18 12:31:55 +01:00
Moritz
12b628318d Merge pull request #1930 from mandiant/dependabot/pip/pytest-7.4.4
build(deps-dev): bump pytest from 7.4.3 to 7.4.4
2024-01-18 10:17:21 +01:00
Moritz
be30117030 Merge pull request #1931 from mandiant/dependabot/pip/ruff-0.1.13
build(deps-dev): bump ruff from 0.1.9 to 0.1.13
2024-01-18 10:17:05 +01:00
Capa Bot
6b41e02d63 Sync capa rules submodule 2024-01-17 08:22:01 +00:00
Capa Bot
d2ca130060 Sync capa rules submodule 2024-01-17 08:10:13 +00:00
Moritz
50dcf7ca20 Merge pull request #1932 from mandiant/update-lint-data-20241
update lint data
2024-01-17 09:07:48 +01:00
mr-tz
9bc04ec612 update data via script 2024-01-16 15:29:25 +01:00
dependabot[bot]
966976d97c build(deps-dev): bump ruff from 0.1.9 to 0.1.13
Bumps [ruff](https://github.com/astral-sh/ruff) from 0.1.9 to 0.1.13.
- [Release notes](https://github.com/astral-sh/ruff/releases)
- [Changelog](https://github.com/astral-sh/ruff/blob/main/CHANGELOG.md)
- [Commits](https://github.com/astral-sh/ruff/compare/v0.1.9...v0.1.13)

---
updated-dependencies:
- dependency-name: ruff
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-01-15 14:08:54 +00:00
dependabot[bot]
05d7083890 build(deps-dev): bump pytest from 7.4.3 to 7.4.4
Bumps [pytest](https://github.com/pytest-dev/pytest) from 7.4.3 to 7.4.4.
- [Release notes](https://github.com/pytest-dev/pytest/releases)
- [Changelog](https://github.com/pytest-dev/pytest/blob/main/CHANGELOG.rst)
- [Commits](https://github.com/pytest-dev/pytest/compare/7.4.3...7.4.4)

---
updated-dependencies:
- dependency-name: pytest
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-01-15 14:08:38 +00:00
Willi Ballenthin
1dc72a3183 elf: detect linux via GCC .ident directives (#1928)
* elf: detect linux via GCC .ident directives

* changelog

* pep8
2024-01-11 16:15:26 +01:00
Capa Bot
efc26be196 Sync capa rules submodule 2024-01-11 14:20:33 +00:00
Willi Ballenthin
f3bc132565 render: show human readable flavor name (#1925) 2024-01-11 14:06:39 +01:00
Willi Ballenthin
ad46b33bb7 com: move database into python files (#1924)
* com: move database into python files

* com: pep8 and lints

* com: fix generated string feature type

* pyinstaller: remove reference to old assets directory
2024-01-11 14:06:24 +01:00
dependabot[bot]
9e5cc07a48 build(deps-dev): bump types-tabulate from 0.9.0.3 to 0.9.0.20240106 (#1923)
Bumps [types-tabulate](https://github.com/python/typeshed) from 0.9.0.3 to 0.9.0.20240106.
- [Commits](https://github.com/python/typeshed/commits)

---
updated-dependencies:
- dependency-name: types-tabulate
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-01-09 12:51:03 -07:00
Moritz
f4fecf43bf Merge pull request #1922 from mandiant/dependabot/pip/types-requests-2.31.0.20240106
build(deps-dev): bump types-requests from 2.31.0.10 to 2.31.0.20240106
2024-01-09 16:20:10 +01:00
Moritz
7426574741 Merge pull request #1921 from mandiant/dependabot/pip/flake8-7.0.0
build(deps-dev): bump flake8 from 6.1.0 to 7.0.0
2024-01-09 16:19:57 +01:00
Moritz
9ab7a24153 Merge pull request #1920 from mandiant/dependabot/pip/wcwidth-0.2.13
build(deps-dev): bump wcwidth from 0.2.12 to 0.2.13
2024-01-09 16:19:42 +01:00
Mike Hunhoff
f37b598010 fix: do not trim api names that include :: (#1897) 2024-01-08 10:59:24 -07:00
dependabot[bot]
5ca59634f3 build(deps-dev): bump types-requests from 2.31.0.10 to 2.31.0.20240106
Bumps [types-requests](https://github.com/python/typeshed) from 2.31.0.10 to 2.31.0.20240106.
- [Commits](https://github.com/python/typeshed/commits)

---
updated-dependencies:
- dependency-name: types-requests
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-01-08 14:46:29 +00:00
dependabot[bot]
42c1a307f3 build(deps-dev): bump flake8 from 6.1.0 to 7.0.0
Bumps [flake8](https://github.com/pycqa/flake8) from 6.1.0 to 7.0.0.
- [Commits](https://github.com/pycqa/flake8/compare/6.1.0...7.0.0)

---
updated-dependencies:
- dependency-name: flake8
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-01-08 14:46:23 +00:00
dependabot[bot]
ef5063171b build(deps-dev): bump wcwidth from 0.2.12 to 0.2.13
Bumps [wcwidth](https://github.com/jquast/wcwidth) from 0.2.12 to 0.2.13.
- [Release notes](https://github.com/jquast/wcwidth/releases)
- [Commits](https://github.com/jquast/wcwidth/compare/0.2.12...0.2.13)

---
updated-dependencies:
- dependency-name: wcwidth
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-01-08 14:46:19 +00:00
Blas
7584e4a5e6 dotnet: emit enclosing class information for nested classes (#1913)
* Update helpers.py

* Update helpers.py

* TypeRef correction in helpers.py

* Fixed TypeRef to proper functionality

* Accounts for TypeRef updated tuple

* Corrected TypeDef tuple creation in helpers.py

* Update types.py

* Update types.py

* Create helpers_draft.py

* Update capa/features/extractors/dnfile/helpers.py

Co-authored-by: Mike Hunhoff <mike.hunhoff@gmail.com>

* Update helper functions, variables, and draft further implementations

* Update helpers.py

* Update types.py

* Directly access TypeDef and TypeRef tables

* Update helpers.py

* Update helpers.py

* Delete capa/features/extractors/dnfile/helpers_draft.py

* Update types.py

* Update dotnetfile.py

* Update types.py comment

* Clean extract_file_class_features in dotnetfile.py

* Cleaned up callers, var names, and other small items

* Update dotnetfile.py

* Clean up caller logic in dotnetfile.py

* Clean up callers and update helper logic in helpers.py

* Linter corrections for types.py

* Linter corrections for dotnetfile.py

* Linter corrections and caller functions cleanup for helpers.py

* Update capa/features/extractors/dnfile/helpers.py

Co-authored-by: Mike Hunhoff <mike.hunhoff@gmail.com>

* Update capa/features/extractors/dnfile/helpers.py

Co-authored-by: Mike Hunhoff <mike.hunhoff@gmail.com>

* Update capa/features/extractors/dnfile/helpers.py

Co-authored-by: Mike Hunhoff <mike.hunhoff@gmail.com>

* Update capa/features/extractors/dnfile/helpers.py

Co-authored-by: Mike Hunhoff <mike.hunhoff@gmail.com>

* Update capa/features/extractors/dnfile/helpers.py

Co-authored-by: Mike Hunhoff <mike.hunhoff@gmail.com>

* Update capa/features/extractors/dnfile/helpers.py

Co-authored-by: Mike Hunhoff <mike.hunhoff@gmail.com>

* Update capa/features/extractors/dnfile/helpers.py

Co-authored-by: Mike Hunhoff <mike.hunhoff@gmail.com>

* Update capa/features/extractors/dnfile/helpers.py

Co-authored-by: Mike Hunhoff <mike.hunhoff@gmail.com>

* Update capa/features/extractors/dnfile/helpers.py

Co-authored-by: Mike Hunhoff <mike.hunhoff@gmail.com>

* Update helpers.py

* Update dotnetfile.py

* Update tuple type in types.py

* Update dotnetfile.py

* Update return value annotations in helpers.py

* Linting update types.py

* Linting update dotnetfile.py

* Added unit tests to fixtures.py

* Update types.py

* Linting fix for types.py

* Update CHANGELOG.md

* Small changes to return types in helpers.py

---------

Co-authored-by: Mike Hunhoff <mike.hunhoff@gmail.com>
2024-01-05 10:09:38 -07:00
Capa Bot
62474c764a Sync capa-testfiles submodule 2024-01-05 14:24:40 +00:00
Capa Bot
1fc26b4f27 Sync capa rules submodule 2024-01-04 13:07:27 +00:00
Capa Bot
037a97381c Sync capa-testfiles submodule 2024-01-04 08:16:43 +00:00
Capa Bot
ef65f14260 Sync capa-testfiles submodule 2024-01-03 16:36:36 +00:00
Capa Bot
3214ecf0ee Sync capa rules submodule 2024-01-03 16:32:40 +00:00
dependabot[bot]
23c5e6797f build(deps-dev): bump ruff from 0.1.7 to 0.1.9 (#1915)
Bumps [ruff](https://github.com/astral-sh/ruff) from 0.1.7 to 0.1.9.
- [Release notes](https://github.com/astral-sh/ruff/releases)
- [Changelog](https://github.com/astral-sh/ruff/blob/main/CHANGELOG.md)
- [Commits](https://github.com/astral-sh/ruff/compare/v0.1.7...v0.1.9)

---
updated-dependencies:
- dependency-name: ruff
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-01-02 10:31:29 -07:00
dependabot[bot]
e940890c29 build(deps-dev): bump mypy from 1.7.1 to 1.8.0 (#1916)
Bumps [mypy](https://github.com/python/mypy) from 1.7.1 to 1.8.0.
- [Changelog](https://github.com/python/mypy/blob/master/CHANGELOG.md)
- [Commits](https://github.com/python/mypy/compare/v1.7.1...v1.8.0)

---
updated-dependencies:
- dependency-name: mypy
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-01-02 09:05:49 -07:00
dependabot[bot]
21b76fc91e build(deps-dev): bump setuptools from 69.0.2 to 69.0.3 (#1917)
Bumps [setuptools](https://github.com/pypa/setuptools) from 69.0.2 to 69.0.3.
- [Release notes](https://github.com/pypa/setuptools/releases)
- [Changelog](https://github.com/pypa/setuptools/blob/main/NEWS.rst)
- [Commits](https://github.com/pypa/setuptools/compare/v69.0.2...v69.0.3)

---
updated-dependencies:
- dependency-name: setuptools
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-01-02 09:05:27 -07:00
dependabot[bot]
05ef952129 build(deps-dev): bump black from 23.12.0 to 23.12.1 (#1918)
Bumps [black](https://github.com/psf/black) from 23.12.0 to 23.12.1.
- [Release notes](https://github.com/psf/black/releases)
- [Changelog](https://github.com/psf/black/blob/main/CHANGES.md)
- [Commits](https://github.com/psf/black/compare/23.12.0...23.12.1)

---
updated-dependencies:
- dependency-name: black
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-01-02 09:05:09 -07:00
Mike Hunhoff
22f4251ad6 ghidra: improve instruction string and bytes feature extraction (#1885)
* ghidra: improve instruction string and bytes feature extraction

* focus on data references only

* remove unneeded check
2023-12-24 18:24:54 -08:00
dependabot[bot]
92478d2469 build(deps-dev): bump black from 23.11.0 to 23.12.0 (#1911)
Bumps [black](https://github.com/psf/black) from 23.11.0 to 23.12.0.
- [Release notes](https://github.com/psf/black/releases)
- [Changelog](https://github.com/psf/black/blob/main/CHANGES.md)
- [Commits](https://github.com/psf/black/compare/23.11.0...23.12.0)

---
updated-dependencies:
- dependency-name: black
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-12-19 12:29:40 -07:00
dependabot[bot]
2aaba6ef16 build(deps-dev): bump isort from 5.13.0 to 5.13.2 (#1910)
Bumps [isort](https://github.com/pycqa/isort) from 5.13.0 to 5.13.2.
- [Release notes](https://github.com/pycqa/isort/releases)
- [Changelog](https://github.com/PyCQA/isort/blob/main/CHANGELOG.md)
- [Commits](https://github.com/pycqa/isort/compare/5.13.0...5.13.2)

---
updated-dependencies:
- dependency-name: isort
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-12-19 11:04:49 -07:00
dependabot[bot]
8120fb796e build(deps-dev): bump flake8-bugbear from 23.11.26 to 23.12.2 (#1892)
Bumps [flake8-bugbear](https://github.com/PyCQA/flake8-bugbear) from 23.11.26 to 23.12.2.
- [Release notes](https://github.com/PyCQA/flake8-bugbear/releases)
- [Commits](https://github.com/PyCQA/flake8-bugbear/compare/23.11.26...23.12.2)

---
updated-dependencies:
- dependency-name: flake8-bugbear
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-12-19 11:03:51 -07:00
dependabot[bot]
f3c38ae300 build(deps-dev): bump termcolor from 2.3.0 to 2.4.0 (#1891)
Bumps [termcolor](https://github.com/termcolor/termcolor) from 2.3.0 to 2.4.0.
- [Release notes](https://github.com/termcolor/termcolor/releases)
- [Changelog](https://github.com/termcolor/termcolor/blob/main/CHANGES.md)
- [Commits](https://github.com/termcolor/termcolor/compare/2.3.0...2.4.0)

---
updated-dependencies:
- dependency-name: termcolor
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-12-19 11:03:39 -07:00
Capa Bot
bf56ee0311 Sync capa rules submodule 2023-12-18 06:54:41 +00:00
Capa Bot
4a84660e76 Sync capa rules submodule 2023-12-18 06:54:07 +00:00
Mike Hunhoff
382c20cd58 ghidra: fix UnboundLocalError exception (#1881) 2023-12-15 17:03:43 -08:00
Mike Hunhoff
2dbac05716 ghidra: fix IndexError exception (#1879)
* ghidra: fix IndexError exception
2023-12-15 16:23:19 -08:00
dependabot[bot]
3f449f3c0f build(deps-dev): bump isort from 5.11.4 to 5.13.0 (#1900)
Bumps [isort](https://github.com/pycqa/isort) from 5.11.4 to 5.13.0.
- [Release notes](https://github.com/pycqa/isort/releases)
- [Changelog](https://github.com/PyCQA/isort/blob/main/CHANGELOG.md)
- [Commits](https://github.com/pycqa/isort/compare/5.11.4...5.13.0)

---
updated-dependencies:
- dependency-name: isort
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-12-13 15:56:24 +01:00
dependabot[bot]
51b63b465b build(deps-dev): bump ruff from 0.1.6 to 0.1.7 (#1902)
Bumps [ruff](https://github.com/astral-sh/ruff) from 0.1.6 to 0.1.7.
- [Release notes](https://github.com/astral-sh/ruff/releases)
- [Changelog](https://github.com/astral-sh/ruff/blob/main/CHANGELOG.md)
- [Commits](https://github.com/astral-sh/ruff/compare/v0.1.6...v0.1.7)

---
updated-dependencies:
- dependency-name: ruff
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-12-13 15:56:16 +01:00
dependabot[bot]
afb3426e96 build(deps-dev): bump pyinstaller from 6.2.0 to 6.3.0 (#1901)
Bumps [pyinstaller](https://github.com/pyinstaller/pyinstaller) from 6.2.0 to 6.3.0.
- [Release notes](https://github.com/pyinstaller/pyinstaller/releases)
- [Changelog](https://github.com/pyinstaller/pyinstaller/blob/develop/doc/CHANGES.rst)
- [Commits](https://github.com/pyinstaller/pyinstaller/compare/v6.2.0...v6.3.0)

---
updated-dependencies:
- dependency-name: pyinstaller
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-12-13 15:56:04 +01:00
Arnim Rupp
1d3ae1f216 Update capa2yara.py (#1904)
Extend unhandled strings to allow capa2yara to run through
2023-12-13 15:51:56 +01:00
36 changed files with 32462 additions and 661 deletions

View File

@@ -17,9 +17,8 @@ a = Analysis(
# when invoking pyinstaller from the project root,
# this gets invoked from the directory of the spec file,
# i.e. ./.github/pyinstaller
("../../assets", "assets"),
("../../rules", "rules"),
("../../sigs", "sigs"),
("../../capa/sigs", "sigs"),
("../../cache", "cache"),
# capa.render.default uses tabulate that depends on wcwidth.
# it seems wcwidth uses a json file `version.json`
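The hunk above relocates the bundled signatures. In a PyInstaller spec, `datas` entries are `(source, destination)` pairs, so only the source side changes while the bundle still unpacks a `sigs` directory. A minimal sketch (surrounding spec contents assumed, paths as in the diff):

```python
# in the spec under .github/pyinstaller/, each pair copies a source path
# (relative to the spec file, hence ../../) into the bundle destination;
# moving the repo directory to capa/sigs only changes the source side.
datas = [
    ("../../rules", "rules"),
    ("../../capa/sigs", "sigs"),  # was ("../../sigs", "sigs")
    ("../../cache", "cache"),
]
```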

View File

@@ -13,6 +13,7 @@
- binja: add support for forwarded exports #1646 @xusheng6
- binja: add support for symtab names #1504 @xusheng6
- add com class/interface features #322 @Aayush-goel-04
- dotnet: emit enclosing class information for nested classes #1780 #1913 @bkojusner @mike-hunhoff
### Breaking Changes
@@ -22,7 +23,7 @@
- update freeze format to v3, adding support for dynamic analysis @williballenthin
- extractor: ignore DLL name for api features #1815 @mr-tz
### New Rules (34)
### New Rules (39)
- nursery/get-ntoskrnl-base-address @mr-tz
- host-interaction/network/connectivity/set-tcp-connection-state @johnk3r
@@ -57,6 +58,11 @@
- data-manipulation/compression/create-cabinet-on-windows michael.hunhoff@mandiant.com jakub.jozwiak@mandiant.com
- data-manipulation/compression/extract-cabinet-on-windows jakub.jozwiak@mandiant.com
- lib/create-file-decompression-interface-context-on-windows jakub.jozwiak@mandiant.com
- nursery/enumerate-files-in-dotnet moritz.raabe@mandiant.com anushka.virgaonkar@mandiant.com
- nursery/get-mac-address-in-dotnet moritz.raabe@mandiant.com michael.hunhoff@mandiant.com echernofsky@google.com
- nursery/get-current-process-command-line william.ballenthin@mandiant.com
- nursery/get-current-process-file-path william.ballenthin@mandiant.com
- nursery/hook-routines-via-dlsym-rtld_next william.ballenthin@mandiant.com
-
### Bug Fixes
@@ -64,10 +70,12 @@
- binja: improve function call site detection @xusheng6
- binja: use `binaryninja.load` to open files @xusheng6
- binja: bump binja version to 3.5 #1789 @xusheng6
- elf: better detect ELF OS via GCC .ident directives #1928 @williballenthin
### capa explorer IDA Pro plugin
### Development
- update ATT&CK/MBC data for linting #1932 @mr-tz
### Raw diffs
- [capa v6.1.0...master](https://github.com/mandiant/capa/compare/v6.1.0...master)
@@ -1626,4 +1634,4 @@ Download a standalone binary below and checkout the readme [here on GitHub](http
### Raw diffs
- [capa v1.0.0...v1.1.0](https://github.com/mandiant/capa/compare/v1.0.0...v1.1.0)
- [capa-rules v1.0.0...v1.1.0](https://github.com/mandiant/capa-rules/compare/v1.0.0...v1.1.0)
- [capa-rules v1.0.0...v1.1.0](https://github.com/mandiant/capa-rules/compare/v1.0.0...v1.1.0)

View File

@@ -2,7 +2,7 @@
[![PyPI - Python Version](https://img.shields.io/pypi/pyversions/flare-capa)](https://pypi.org/project/flare-capa)
[![Last release](https://img.shields.io/github/v/release/mandiant/capa)](https://github.com/mandiant/capa/releases)
[![Number of rules](https://img.shields.io/badge/rules-859-blue.svg)](https://github.com/mandiant/capa-rules)
[![Number of rules](https://img.shields.io/badge/rules-864-blue.svg)](https://github.com/mandiant/capa-rules)
[![CI status](https://github.com/mandiant/capa/workflows/CI/badge.svg)](https://github.com/mandiant/capa/actions?query=workflow%3ACI+event%3Apush+branch%3Amaster)
[![Downloads](https://img.shields.io/github/downloads/mandiant/capa/total)](https://github.com/mandiant/capa/releases)
[![License](https://img.shields.io/badge/license-Apache--2.0-green.svg)](LICENSE.txt)

Binary file not shown.

Binary file not shown.

View File

@@ -177,34 +177,6 @@ class DNTokenOffsetAddress(Address):
return self.token + self.offset
class DexMethodAddress(int, Address):
    def __new__(cls, offset: int):
        return int.__new__(cls, offset)

    def __repr__(self):
        return f"DexMethodAddress(offset={hex(self)})"

    def __str__(self) -> str:
        return repr(self)

    def __hash__(self):
        return int.__hash__(self)


class DexClassAddress(int, Address):
    def __new__(cls, offset: int):
        return int.__new__(cls, offset)

    def __repr__(self):
        return f"DexClassAddress(offset={hex(self)})"

    def __str__(self) -> str:
        return repr(self)

    def __hash__(self):
        return int.__hash__(self)


class _NoAddress(Address):
    def __eq__(self, other):
        return True
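Both removed classes follow the same `int` mixin pattern: an instance *is* its offset, so it hashes, compares, and does arithmetic like an integer while still typing as an `Address`. A standalone sketch (with a stub `Address` base standing in for capa.features.address.Address):

```python
class Address:  # stub standing in for capa.features.address.Address
    pass

class DexMethodAddress(int, Address):
    def __new__(cls, offset: int):
        return int.__new__(cls, offset)

    def __repr__(self):
        return f"DexMethodAddress(offset={hex(self)})"

addr = DexMethodAddress(0x1000)
assert addr == 0x1000                                # compares like an int
assert addr + 8 == 0x1008                            # arithmetic still works
assert repr(addr) == "DexMethodAddress(offset=0x1000)"
```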

View File

@@ -0,0 +1,36 @@
# Copyright (C) 2023 Mandiant, Inc. All Rights Reserved.
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at: [package root]/LICENSE.txt
# Unless required by applicable law or agreed to in writing, software distributed under the License
# is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and limitations under the License.
from enum import Enum
from typing import Dict, List

from capa.helpers import assert_never


class ComType(Enum):
    CLASS = "class"
    INTERFACE = "interface"


COM_PREFIXES = {
    ComType.CLASS: "CLSID_",
    ComType.INTERFACE: "IID_",
}


def load_com_database(com_type: ComType) -> Dict[str, List[str]]:
    # lazy load these python files since they are so large.
    # that is, don't load them unless a COM feature is being handled.
    import capa.features.com.classes
    import capa.features.com.interfaces

    if com_type == ComType.CLASS:
        return capa.features.com.classes.COM_CLASSES
    elif com_type == ComType.INTERFACE:
        return capa.features.com.interfaces.COM_INTERFACES
    else:
        assert_never(com_type)
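A hypothetical caller of the new loader might look like the following, assuming the module above is `capa/features/com/__init__.py`; the `"IUnknown"` key is illustrative, not taken from the generated database:

```python
from capa.features.com import ComType, COM_PREFIXES, load_com_database

# the database maps a COM name to one or more GUID strings;
# the first call pays the import cost of the large generated module.
interfaces = load_com_database(ComType.INTERFACE)
for guid in interfaces.get("IUnknown", []):  # illustrative key
    print(f"{COM_PREFIXES[ComType.INTERFACE]}IUnknown = {guid}")
```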

capa/features/com/classes.py (new file, 3696 lines)
File diff suppressed because it is too large

File diff suppressed because it is too large

View File

@@ -409,9 +409,7 @@ ARCH_I386 = "i386"
ARCH_AMD64 = "amd64"
# dotnet
ARCH_ANY = "any"
# dex
ARCH_DALVIK = "dalvik"
VALID_ARCH = (ARCH_I386, ARCH_AMD64, ARCH_ANY, ARCH_DALVIK)
VALID_ARCH = (ARCH_I386, ARCH_AMD64, ARCH_ANY)
class Arch(Feature):
@@ -423,11 +421,10 @@ class Arch(Feature):
OS_WINDOWS = "windows"
OS_LINUX = "linux"
OS_MACOS = "macos"
OS_ANDROID = "android"
# dotnet
OS_ANY = "any"
VALID_OS = {os.value for os in capa.features.extractors.elf.OS}
VALID_OS.update({OS_WINDOWS, OS_LINUX, OS_MACOS, OS_ANY, OS_ANDROID})
VALID_OS.update({OS_WINDOWS, OS_LINUX, OS_MACOS, OS_ANY})
# internal only, not to be used in rules
OS_AUTO = "auto"
@@ -455,8 +452,7 @@ class OS(Feature):
FORMAT_PE = "pe"
FORMAT_ELF = "elf"
FORMAT_DOTNET = "dotnet"
FORMAT_DEX = "dex"
VALID_FORMAT = (FORMAT_PE, FORMAT_ELF, FORMAT_DOTNET, FORMAT_DEX)
VALID_FORMAT = (FORMAT_PE, FORMAT_ELF, FORMAT_DOTNET)
# internal only, not to be used in rules
FORMAT_AUTO = "auto"
FORMAT_SC32 = "sc32"
@@ -468,7 +464,6 @@ STATIC_FORMATS = {
FORMAT_PE,
FORMAT_ELF,
FORMAT_DOTNET,
FORMAT_DEX,
}
DYNAMIC_FORMATS = {
FORMAT_CAPE,

View File

@@ -24,11 +24,8 @@ from capa.features.common import (
OS_AUTO,
ARCH_ANY,
FORMAT_PE,
FORMAT_DEX,
FORMAT_ELF,
OS_ANDROID,
OS_WINDOWS,
ARCH_DALVIK,
FORMAT_FREEZE,
FORMAT_RESULT,
Arch,
@@ -44,7 +41,6 @@ logger = logging.getLogger(__name__)
# match strings for formats
MATCH_PE = b"MZ"
MATCH_ELF = b"\x7fELF"
MATCH_DEX = b"dex\n"
MATCH_RESULT = b'{"meta":'
MATCH_JSON_OBJECT = b'{"'
@@ -65,8 +61,6 @@ def extract_format(buf) -> Iterator[Tuple[Feature, Address]]:
yield Format(FORMAT_PE), NO_ADDRESS
elif buf.startswith(MATCH_ELF):
yield Format(FORMAT_ELF), NO_ADDRESS
elif len(buf) > 8 and buf.startswith(MATCH_DEX) and buf[7] == 0x00:
yield Format(FORMAT_DEX), NO_ADDRESS
elif is_freeze(buf):
yield Format(FORMAT_FREEZE), NO_ADDRESS
elif buf.startswith(MATCH_RESULT):
@@ -102,9 +96,6 @@ def extract_arch(buf) -> Iterator[Tuple[Feature, Address]]:
yield Arch(arch), NO_ADDRESS
elif len(buf) > 8 and buf.startswith(MATCH_DEX) and buf[7] == 0x00:
yield Arch(ARCH_DALVIK), NO_ADDRESS
else:
# we likely end up here:
# 1. handling shellcode, or
@@ -138,9 +129,6 @@ def extract_os(buf, os=OS_AUTO) -> Iterator[Tuple[Feature, Address]]:
yield OS(os), NO_ADDRESS
elif len(buf) > 8 and buf.startswith(MATCH_DEX) and buf[7] == 0x00:
yield OS(OS_ANDROID), NO_ADDRESS
else:
# we likely end up here:
# 1. handling shellcode, or
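The format checks in this file boil down to prefix matches on the file's first bytes. A self-contained sketch of the same idea, with constants copied from the diff and the DEX branch shown as it existed before this removal:

```python
MATCH_PE = b"MZ"
MATCH_ELF = b"\x7fELF"
MATCH_DEX = b"dex\n"  # followed by a 3-digit version and a NUL, hence buf[7]

def sniff(buf: bytes) -> str:
    if buf.startswith(MATCH_PE):
        return "pe"
    elif buf.startswith(MATCH_ELF):
        return "elf"
    elif len(buf) > 8 and buf.startswith(MATCH_DEX) and buf[7] == 0x00:
        return "dex"
    return "unknown"

assert sniff(b"MZ\x90\x00") == "pe"
assert sniff(b"dex\n039\x00" + b"\x00" * 4) == "dex"
```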

View File

@@ -1,421 +0,0 @@
# Copyright (C) 2023 Mandiant, Inc. All Rights Reserved.
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at: [package root]/LICENSE.txt
# Unless required by applicable law or agreed to in writing, software distributed under the License
# is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and limitations under the License.
import struct
import logging
from typing import Set, Dict, List, Tuple, Iterator, Optional, TypedDict
from pathlib import Path
from dataclasses import dataclass

import dexparser.disassembler as disassembler
from dexparser import DEXParser, uleb128_value

from capa.features.file import Import, FunctionName
from capa.features.common import (
    OS,
    FORMAT_DEX,
    OS_ANDROID,
    ARCH_DALVIK,
    Arch,
    Class,
    Format,
    String,
    Feature,
    Namespace,
)
from capa.features.address import NO_ADDRESS, Address, DexClassAddress, DexMethodAddress, FileOffsetAddress
from capa.features.extractors.base_extractor import (
    BBHandle,
    InsnHandle,
    SampleHashes,
    FunctionHandle,
    StaticFeatureExtractor,
)

logger = logging.getLogger(__name__)

# Reference: https://source.android.com/docs/core/runtime/dex-format


class DexProtoId(TypedDict):
    shorty_idx: int
    return_type_idx: int
    param_off: int


class DexMethodId(TypedDict):
    class_idx: int
    proto_idx: int
    name_idx: int


@dataclass
class DexAnalyzedMethod:
    class_type: str
    name: str
    shorty_descriptor: str
    return_type: str
    parameters: List[str]
    id_offset: int = 0
    code_offset: int = 0
    access_flags: Optional[int] = None

    @property
    def address(self):
        # NOTE: Some methods do not have code, in that case we use the method_id offset
        if self.has_code:
            return self.code_offset
        else:
            return self.id_offset

    @property
    def has_code(self):
        # NOTE: code_offset is zero if the method is abstract/native or not defined in a class
        return self.code_offset != 0

    @property
    def has_definition(self):
        # NOTE: access_flags is only known if the method is defined in a class
        return self.access_flags is not None

    @property
    def qualified_name(self):
        return f"{self.class_type}::{self.name}"


class DexFieldId(TypedDict):
    class_idx: int
    type_idx: int
    name_idx: int


class DexClassDef(TypedDict):
    class_idx: int
    access_flags: int
    superclass_idx: int
    interfaces_off: int
    source_file_idx: int
    annotations_off: int
    class_data_off: int
    static_values_off: int


class DexFieldDef(TypedDict):
    diff: int
    access_flags: int


class DexMethodDef(TypedDict):
    diff: int
    access_flags: int
    code_off: int


class DexClassData(TypedDict):
    static_fields: List[DexFieldDef]
    instance_fields: List[DexFieldDef]
    direct_methods: List[DexMethodDef]
    virtual_methods: List[DexMethodDef]


@dataclass
class DexAnalyzedClass:
    offset: int
    class_type: str
    superclass_type: str
    interfaces: List[str]
    source_file: str
    data: Optional[DexClassData]


class DexAnnotation(TypedDict):
    visibility: int
    type_idx_diff: int
    size_diff: int
    name_idx_diff: int
    value_type: int
    encoded_value: int


class DexAnalysis:
    def get_strings(self):
        # NOTE: Copied from dexparser, upstream later
        strings: List[Tuple[int, bytes]] = []
        string_ids_off = self.dex.header_data["string_ids_off"]
        for i in range(self.dex.header_data["string_ids_size"]):
            offset = struct.unpack("<L", self.dex.data[string_ids_off + (i * 4) : string_ids_off + (i * 4) + 4])[0]
            c_size, size_offset = uleb128_value(self.dex.data, offset)
            c_char = self.dex.data[offset + size_offset : offset + size_offset + c_size]
            strings.append((offset, c_char))
        return strings

    def __init__(self, dex: DEXParser):
        self.dex = dex

        self.strings = self.get_strings()
        self.strings_utf8: List[str] = []
        for _, data in self.strings:
            # NOTE: This is technically incorrect
            # Reference: https://source.android.com/devices/tech/dalvik/dex-format#mutf-8
            self.strings_utf8.append(data.decode("utf-8", errors="backslashreplace"))
        self.type_ids: List[int] = dex.get_typeids()
        self.method_ids: List[DexMethodId] = dex.get_methods()
        self.proto_ids: List[DexProtoId] = dex.get_protoids()
        self.field_ids: List[DexFieldId] = dex.get_fieldids()
        self.class_defs: List[DexClassDef] = dex.get_classdef_data()

        self._is_analyzing = True
        self.used_classes: Set[str] = set()
        self.classes = self._analyze_classes()
        self.methods = self._analyze_methods()
        self.methods_by_address: Dict[int, DexAnalyzedMethod] = {m.address: m for m in self.methods}
        self.namespaces: Set[str] = set()
        for class_type in self.used_classes:
            idx = class_type.rfind(".")
            if idx != -1:
                self.namespaces.add(class_type[:idx])
        for class_type in self.classes:
            self.used_classes.remove(class_type)
        # Only available after code analysis
        self._is_analyzing = False

    def analyze_code(self):
        # Loop over the classes and analyze them
        # self.classes: List[DexClass] = self.dex.get_class_data(offset=-1)
        # self.annotations: List[DexAnnotation] = dex.get_annotations(offset=-1)
        # self.static_values: List[int] = dex.get_static_values(offset=-1)
        pass

    def get_string(self, index: int) -> str:
        return self.strings_utf8[index]

    def _decode_descriptor(self, descriptor: str) -> str:
        first = descriptor[0]
        if first == "L":
            pretty = descriptor[1:-1].replace("/", ".")
            if self._is_analyzing:
                self.used_classes.add(pretty)
        elif first == "[":
            pretty = self._decode_descriptor(descriptor[1:]) + "[]"
        else:
            pretty = disassembler.type_descriptor[first]
        return pretty

    def get_pretty_type(self, index: int) -> str:
        if index == 0xFFFFFFFF:
            return "<NO_INDEX>"
        descriptor = self.get_string(self.type_ids[index])
        return self._decode_descriptor(descriptor)

    def _analyze_classes(self):
        classes: Dict[str, DexAnalyzedClass] = {}
        offset = self.dex.header_data["class_defs_off"]
        for index, clazz in enumerate(self.class_defs):
            class_type = self.get_pretty_type(clazz["class_idx"])

            # Superclass
            superclass_idx = clazz["superclass_idx"]
            if superclass_idx != 0xFFFFFFFF:
                superclass_type = self.get_pretty_type(superclass_idx)
            else:
                superclass_type = ""

            # Interfaces
            interfaces = []
            interfaces_offset = clazz["interfaces_off"]
            if interfaces_offset != 0:
                size = struct.unpack("<L", self.dex.data[interfaces_offset : interfaces_offset + 4])[0]
                for i in range(size):
                    type_idx = struct.unpack(
                        "<H", self.dex.data[interfaces_offset + 4 + i * 2 : interfaces_offset + 6 + i * 2]
                    )[0]
                    interface_type = self.get_pretty_type(type_idx)
                    interfaces.append(interface_type)

            # Source file
            source_file_idx = clazz["source_file_idx"]
            if source_file_idx != 0xFFFFFFFF:
                source_file = self.get_string(source_file_idx)
            else:
                source_file = ""

            # Data
            data_offset = clazz["class_data_off"]
            if data_offset != 0:
                data = self.dex.get_class_data(data_offset)
            else:
                data = None

            classes[class_type] = DexAnalyzedClass(
                offset=offset + index * 32,
                class_type=class_type,
                superclass_type=superclass_type,
                interfaces=interfaces,
                source_file=source_file,
                data=data,
            )
        return classes

    def _analyze_methods(self):
        methods: List[DexAnalyzedMethod] = []
        for method_id in self.method_ids:
            proto = self.proto_ids[method_id["proto_idx"]]

            parameters = []
            param_off = proto["param_off"]
            if param_off != 0:
                size = struct.unpack("<L", self.dex.data[param_off : param_off + 4])[0]
                for i in range(size):
                    type_idx = struct.unpack("<H", self.dex.data[param_off + 4 + i * 2 : param_off + 6 + i * 2])[0]
                    param_type = self.get_pretty_type(type_idx)
                    parameters.append(param_type)

            methods.append(
                DexAnalyzedMethod(
                    class_type=self.get_pretty_type(method_id["class_idx"]),
                    name=self.get_string(method_id["name_idx"]),
                    shorty_descriptor=self.get_string(proto["shorty_idx"]),
                    return_type=self.get_pretty_type(proto["return_type_idx"]),
                    parameters=parameters,
                )
            )

        # Fill in the missing method data
        for clazz in self.classes.values():
            if clazz.data is None:
                continue
            for method_def in clazz.data["direct_methods"]:
                diff = method_def["diff"]
                methods[diff].access_flags = method_def["access_flags"]
                methods[diff].code_offset = method_def["code_off"]
            for method_def in clazz.data["virtual_methods"]:
                diff = method_def["diff"]
                methods[diff].access_flags = method_def["access_flags"]
                methods[diff].code_offset = method_def["code_off"]

        # Fill in the missing code offsets with fake data
        offset = self.dex.header_data["method_ids_off"]
        for index, method in enumerate(methods):
            method.id_offset = offset + index * 8
        return methods

    def extract_file_features(self) -> Iterator[Tuple[Feature, Address]]:
        yield Format(FORMAT_DEX), NO_ADDRESS

        for i in range(len(self.strings)):
            yield String(self.strings_utf8[i]), FileOffsetAddress(self.strings[i][0])

        for method in self.methods:
            if method.has_definition:
                yield FunctionName(method.qualified_name), DexMethodAddress(method.address)
            else:
                yield Import(method.qualified_name), DexMethodAddress(method.address)

        for namespace in self.namespaces:
            yield Namespace(namespace), NO_ADDRESS

        for clazz in self.classes.values():
            yield Class(clazz.class_type), DexClassAddress(clazz.offset)
        for class_type in self.used_classes:
            yield Class(class_type), NO_ADDRESS


class DexFeatureExtractor(StaticFeatureExtractor):
    def __init__(self, path: Path, *, code_analysis: bool):
        super().__init__(hashes=SampleHashes.from_bytes(path.read_bytes()))
        self.path: Path = path
        self.code_analysis = code_analysis
        self.dex = DEXParser(filedir=str(path))
        self.analysis = DexAnalysis(self.dex)

        # Perform more expensive code analysis only when requested
        if self.code_analysis:
            self.analysis.analyze_code()

    def todo(self):
        import inspect

        message = "[DexparserFeatureExtractor:TODO] " + inspect.stack()[1].function
        logger.debug(message)

    def get_base_address(self):
        return NO_ADDRESS

    def extract_global_features(self) -> Iterator[Tuple[Feature, Address]]:
        # These are hardcoded global features
        yield Format(FORMAT_DEX), NO_ADDRESS
        yield OS(OS_ANDROID), NO_ADDRESS
        yield Arch(ARCH_DALVIK), NO_ADDRESS

    def extract_file_features(self) -> Iterator[Tuple[Feature, Address]]:
        yield from self.analysis.extract_file_features()

    def is_library_function(self, addr: Address) -> bool:
        assert isinstance(addr, DexMethodAddress)
        method = self.analysis.methods_by_address[addr]
        # exclude androidx/kotlin stuff?
        return not method.has_definition

    def get_function_name(self, addr: Address) -> str:
        assert isinstance(addr, DexMethodAddress)
        method = self.analysis.methods_by_address[addr]
        return method.qualified_name

    def get_functions(self) -> Iterator[FunctionHandle]:
        if not self.code_analysis:
            raise Exception("code analysis is disabled")
        for method in self.analysis.methods:
            yield FunctionHandle(DexMethodAddress(method.address), method)

    def extract_function_features(self, f: FunctionHandle) -> Iterator[Tuple[Feature, Address]]:
        if not self.code_analysis:
            raise Exception("code analysis is disabled")
        method: DexAnalyzedMethod = f.inner
        if method.has_code:
            return self.todo()
        yield

    def get_basic_blocks(self, f: FunctionHandle) -> Iterator[BBHandle]:
        if not self.code_analysis:
            raise Exception("code analysis is disabled")
        method: DexAnalyzedMethod = f.inner
        if method.has_code:
            return self.todo()
        yield

    def extract_basic_block_features(self, f: FunctionHandle, bb: BBHandle) -> Iterator[Tuple[Feature, Address]]:
        if not self.code_analysis:
            raise Exception("code analysis is disabled")
        return self.todo()
        yield

    def get_instructions(self, f: FunctionHandle, bb: BBHandle) -> Iterator[InsnHandle]:
        if not self.code_analysis:
            raise Exception("code analysis is disabled")
        return self.todo()
        yield

    def extract_insn_features(
        self, f: FunctionHandle, bb: BBHandle, insn: InsnHandle
    ) -> Iterator[Tuple[Feature, Address]]:
        if not self.code_analysis:
            raise Exception("code analysis is disabled")
        return self.todo()
        yield
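The deleted extractor's string walk depends on dexparser's `uleb128_value` to read each string's length prefix. For background, here is a minimal ULEB128 decoder matching the (value, size) shape unpacked above; this is my sketch, not dexparser's implementation:

```python
def uleb128_value(data: bytes, offset: int) -> tuple[int, int]:
    """decode an unsigned LEB128 at offset; return (value, bytes consumed)"""
    value = shift = size = 0
    while True:
        byte = data[offset + size]
        value |= (byte & 0x7F) << shift  # low 7 bits carry payload
        size += 1
        if not byte & 0x80:              # high bit clear: last byte
            return value, size
        shift += 7

# classic example from the LEB128 spec: 0xE5 0x8E 0x26 -> 624485
assert uleb128_value(b"\xe5\x8e\x26", 0) == (624485, 3)
```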

View File

@@ -131,10 +131,14 @@ def get_dotnet_managed_imports(pe: dnfile.dnPE) -> Iterator[DnType]:
# remove get_/set_ from MemberRef name
member_ref_name = member_ref_name[4:]
typerefnamespace, typerefname = resolve_nested_typeref_name(
member_ref.Class.row_index, member_ref.Class.row, pe
)
yield DnType(
token,
member_ref.Class.row.TypeName,
namespace=member_ref.Class.row.TypeNamespace,
typerefname,
namespace=typerefnamespace,
member=member_ref_name,
access=access,
)
@@ -188,6 +192,8 @@ def get_dotnet_managed_methods(pe: dnfile.dnPE) -> Iterator[DnType]:
TypeNamespace (index into String heap)
MethodList (index into MethodDef table; it marks the first of a contiguous run of Methods owned by this Type)
"""
nested_class_table = get_dotnet_nested_class_table_index(pe)
accessor_map: Dict[int, str] = {}
for methoddef, methoddef_access in get_dotnet_methoddef_property_accessors(pe):
accessor_map[methoddef] = methoddef_access
@@ -211,7 +217,9 @@ def get_dotnet_managed_methods(pe: dnfile.dnPE) -> Iterator[DnType]:
# remove get_/set_
method_name = method_name[4:]
yield DnType(token, typedef.TypeName, namespace=typedef.TypeNamespace, member=method_name, access=access)
typedefnamespace, typedefname = resolve_nested_typedef_name(nested_class_table, rid, typedef, pe)
yield DnType(token, typedefname, namespace=typedefnamespace, member=method_name, access=access)
def get_dotnet_fields(pe: dnfile.dnPE) -> Iterator[DnType]:
@@ -225,6 +233,8 @@ def get_dotnet_fields(pe: dnfile.dnPE) -> Iterator[DnType]:
TypeNamespace (index into String heap)
FieldList (index into Field table; it marks the first of a contiguous run of Fields owned by this Type)
"""
nested_class_table = get_dotnet_nested_class_table_index(pe)
for rid, typedef in iter_dotnet_table(pe, dnfile.mdtable.TypeDef.number):
assert isinstance(typedef, dnfile.mdtable.TypeDefRow)
@@ -235,8 +245,11 @@ def get_dotnet_fields(pe: dnfile.dnPE) -> Iterator[DnType]:
if field.row is None:
logger.debug("TypeDef[0x%X] FieldList[0x%X] row is None", rid, idx)
continue
typedefnamespace, typedefname = resolve_nested_typedef_name(nested_class_table, rid, typedef, pe)
token: int = calculate_dotnet_token_value(field.table.number, field.row_index)
yield DnType(token, typedef.TypeName, namespace=typedef.TypeNamespace, member=field.row.Name)
yield DnType(token, typedefname, namespace=typedefnamespace, member=field.row.Name)
def get_dotnet_managed_method_bodies(pe: dnfile.dnPE) -> Iterator[Tuple[int, CilMethodBody]]:
@@ -300,19 +313,119 @@ def get_dotnet_unmanaged_imports(pe: dnfile.dnPE) -> Iterator[DnUnmanagedMethod]
yield DnUnmanagedMethod(token, module, method)
def get_dotnet_table_row(pe: dnfile.dnPE, table_index: int, row_index: int) -> Optional[dnfile.base.MDTableRow]:
assert pe.net is not None
assert pe.net.mdtables is not None
if row_index - 1 <= 0:
return None
try:
table = pe.net.mdtables.tables.get(table_index, [])
return table[row_index - 1]
except IndexError:
return None
def resolve_nested_typedef_name(
nested_class_table: dict, index: int, typedef: dnfile.mdtable.TypeDefRow, pe: dnfile.dnPE
) -> Tuple[str, Tuple[str, ...]]:
"""Resolves all nested TypeDef class names. Returns the namespace as a str and the nested TypeRef name as a tuple"""
if index in nested_class_table:
typedef_name = []
name = typedef.TypeName
# Append the current typedef name
typedef_name.append(name)
while nested_class_table[index] in nested_class_table:
# Iterate through the typedef table to resolve the nested name
table_row = get_dotnet_table_row(pe, dnfile.mdtable.TypeDef.number, nested_class_table[index])
if table_row is None:
return typedef.TypeNamespace, tuple(typedef_name[::-1])
name = table_row.TypeName
typedef_name.append(name)
index = nested_class_table[index]
# Document the root enclosing details
table_row = get_dotnet_table_row(pe, dnfile.mdtable.TypeDef.number, nested_class_table[index])
if table_row is None:
return typedef.TypeNamespace, tuple(typedef_name[::-1])
enclosing_name = table_row.TypeName
typedef_name.append(enclosing_name)
return table_row.TypeNamespace, tuple(typedef_name[::-1])
else:
return typedef.TypeNamespace, (typedef.TypeName,)
def resolve_nested_typeref_name(
index: int, typeref: dnfile.mdtable.TypeRefRow, pe: dnfile.dnPE
) -> Tuple[str, Tuple[str, ...]]:
"""Resolves all nested TypeRef class names. Returns the namespace as a str and the nested TypeRef name as a tuple"""
# If the ResolutionScope decodes to a typeRef type then it is nested
if isinstance(typeref.ResolutionScope.table, dnfile.mdtable.TypeRef):
typeref_name = []
name = typeref.TypeName
# Not appending the current typeref name to avoid potential duplicate
# Validate index
table_row = get_dotnet_table_row(pe, dnfile.mdtable.TypeRef.number, index)
if table_row is None:
return typeref.TypeNamespace, (typeref.TypeName,)
while isinstance(table_row.ResolutionScope.table, dnfile.mdtable.TypeRef):
# Iterate through the typeref table to resolve the nested name
typeref_name.append(name)
name = table_row.TypeName
table_row = get_dotnet_table_row(pe, dnfile.mdtable.TypeRef.number, table_row.ResolutionScope.row_index)
if table_row is None:
return typeref.TypeNamespace, tuple(typeref_name[::-1])
# Document the root enclosing details
typeref_name.append(table_row.TypeName)
return table_row.TypeNamespace, tuple(typeref_name[::-1])
else:
return typeref.TypeNamespace, (typeref.TypeName,)
def get_dotnet_nested_class_table_index(pe: dnfile.dnPE) -> Dict[int, int]:
"""Build index for EnclosingClass based off the NestedClass row index in the nestedclass table"""
nested_class_table = {}
# Used to find nested classes in typedef
for _, nestedclass in iter_dotnet_table(pe, dnfile.mdtable.NestedClass.number):
assert isinstance(nestedclass, dnfile.mdtable.NestedClassRow)
nested_class_table[nestedclass.NestedClass.row_index] = nestedclass.EnclosingClass.row_index
return nested_class_table
def get_dotnet_types(pe: dnfile.dnPE) -> Iterator[DnType]:
"""get .NET types from TypeDef and TypeRef tables"""
nested_class_table = get_dotnet_nested_class_table_index(pe)
for rid, typedef in iter_dotnet_table(pe, dnfile.mdtable.TypeDef.number):
assert isinstance(typedef, dnfile.mdtable.TypeDefRow)
typedefnamespace, typedefname = resolve_nested_typedef_name(nested_class_table, rid, typedef, pe)
typedef_token: int = calculate_dotnet_token_value(dnfile.mdtable.TypeDef.number, rid)
yield DnType(typedef_token, typedef.TypeName, namespace=typedef.TypeNamespace)
yield DnType(typedef_token, typedefname, namespace=typedefnamespace)
for rid, typeref in iter_dotnet_table(pe, dnfile.mdtable.TypeRef.number):
assert isinstance(typeref, dnfile.mdtable.TypeRefRow)
typerefnamespace, typerefname = resolve_nested_typeref_name(typeref.ResolutionScope.row_index, typeref, pe)
typeref_token: int = calculate_dotnet_token_value(dnfile.mdtable.TypeRef.number, rid)
yield DnType(typeref_token, typeref.TypeName, namespace=typeref.TypeNamespace)
yield DnType(typeref_token, typerefname, namespace=typerefnamespace)
def calculate_dotnet_token_value(table: int, rid: int) -> int:
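The NestedClass index built above is just a child-row → enclosing-row map that the resolvers walk until they reach a type with no enclosure. A toy illustration with plain dicts (names and row indices invented):

```python
# row index -> enclosing row index, as built by
# get_dotnet_nested_class_table_index; names keyed the same way
nested = {3: 2, 2: 1}  # Inner nested in Middle, Middle in Outer
names = {1: "Outer", 2: "Middle", 3: "Inner"}

def resolve(index: int) -> tuple[str, ...]:
    parts = [names[index]]
    while index in nested:        # climb to the enclosing class
        index = nested[index]
        parts.append(names[index])
    return tuple(reversed(parts))  # root-first, like the helpers above

assert resolve(3) == ("Outer", "Middle", "Inner")
assert resolve(1) == ("Outer",)
```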

View File

@@ -6,15 +6,17 @@
# is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and limitations under the License.
from typing import Optional
from typing import Tuple, Optional
class DnType:
def __init__(self, token: int, class_: str, namespace: str = "", member: str = "", access: Optional[str] = None):
def __init__(
self, token: int, class_: Tuple[str, ...], namespace: str = "", member: str = "", access: Optional[str] = None
):
self.token: int = token
self.access: Optional[str] = access
self.namespace: str = namespace
self.class_: str = class_
self.class_: Tuple[str, ...] = class_
if member == ".ctor":
member = "ctor"
@@ -42,9 +44,13 @@ class DnType:
return str(self)
@staticmethod
def format_name(class_: str, namespace: str = "", member: str = ""):
def format_name(class_: Tuple[str, ...], namespace: str = "", member: str = ""):
if len(class_) > 1:
class_str = "/".join(class_) # Concat items in tuple, separated by a "/"
else:
class_str = "".join(class_) # Convert tuple to str
# like File::OpenRead
name: str = f"{class_}::{member}" if member else class_
name: str = f"{class_str}::{member}" if member else class_str
if namespace:
# like System.IO.File::OpenRead
name = f"{namespace}.{name}"

View File

@@ -38,8 +38,11 @@ from capa.features.extractors.dnfile.helpers import (
is_dotnet_mixed_mode,
get_dotnet_managed_imports,
get_dotnet_managed_methods,
resolve_nested_typedef_name,
resolve_nested_typeref_name,
calculate_dotnet_token_value,
get_dotnet_unmanaged_imports,
get_dotnet_nested_class_table_index,
)
logger = logging.getLogger(__name__)
@@ -92,19 +95,25 @@ def extract_file_namespace_features(pe: dnfile.dnPE, **kwargs) -> Iterator[Tuple
def extract_file_class_features(pe: dnfile.dnPE, **kwargs) -> Iterator[Tuple[Class, Address]]:
"""emit class features from TypeRef and TypeDef tables"""
nested_class_table = get_dotnet_nested_class_table_index(pe)
for rid, typedef in iter_dotnet_table(pe, dnfile.mdtable.TypeDef.number):
# emit internal .NET classes
assert isinstance(typedef, dnfile.mdtable.TypeDefRow)
typedefnamespace, typedefname = resolve_nested_typedef_name(nested_class_table, rid, typedef, pe)
token = calculate_dotnet_token_value(dnfile.mdtable.TypeDef.number, rid)
yield Class(DnType.format_name(typedef.TypeName, namespace=typedef.TypeNamespace)), DNTokenAddress(token)
yield Class(DnType.format_name(typedefname, namespace=typedefnamespace)), DNTokenAddress(token)
for rid, typeref in iter_dotnet_table(pe, dnfile.mdtable.TypeRef.number):
# emit external .NET classes
assert isinstance(typeref, dnfile.mdtable.TypeRefRow)
typerefnamespace, typerefname = resolve_nested_typeref_name(typeref.ResolutionScope.row_index, typeref, pe)
token = calculate_dotnet_token_value(dnfile.mdtable.TypeRef.number, rid)
yield Class(DnType.format_name(typeref.TypeName, namespace=typeref.TypeNamespace)), DNTokenAddress(token)
yield Class(DnType.format_name(typerefname, namespace=typerefnamespace)), DNTokenAddress(token)
def extract_file_os(**kwargs) -> Iterator[Tuple[OS, Address]]:

View File

@@ -108,6 +108,9 @@ class Shdr:
buf,
)
def get_name(self, elf: "ELF") -> str:
return elf.shstrtab.buf[self.name :].partition(b"\x00")[0].decode("ascii")
class ELF:
def __init__(self, f: BinaryIO):
@@ -120,6 +123,7 @@ class ELF:
self.e_phnum: int
self.e_shentsize: int
self.e_shnum: int
self.e_shstrndx: int
self.phbuf: bytes
self.shbuf: bytes
@@ -151,11 +155,15 @@ class ELF:
if self.bitness == 32:
e_phoff, e_shoff = struct.unpack_from(self.endian + "II", self.file_header, 0x1C)
self.e_phentsize, self.e_phnum = struct.unpack_from(self.endian + "HH", self.file_header, 0x2A)
self.e_shentsize, self.e_shnum = struct.unpack_from(self.endian + "HH", self.file_header, 0x2E)
self.e_shentsize, self.e_shnum, self.e_shstrndx = struct.unpack_from(
self.endian + "HHH", self.file_header, 0x2E
)
elif self.bitness == 64:
e_phoff, e_shoff = struct.unpack_from(self.endian + "QQ", self.file_header, 0x20)
self.e_phentsize, self.e_phnum = struct.unpack_from(self.endian + "HH", self.file_header, 0x36)
self.e_shentsize, self.e_shnum = struct.unpack_from(self.endian + "HH", self.file_header, 0x3A)
self.e_shentsize, self.e_shnum, self.e_shstrndx = struct.unpack_from(
self.endian + "HHH", self.file_header, 0x3A
)
else:
raise NotImplementedError()
@@ -365,6 +373,10 @@ class ELF:
except ValueError:
continue
@property
def shstrtab(self) -> Shdr:
return self.parse_section_header(self.e_shstrndx)
@property
def linker(self):
PT_INTERP = 0x3
@@ -816,6 +828,48 @@ def guess_os_from_sh_notes(elf: ELF) -> Optional[OS]:
return None
def guess_os_from_ident_directive(elf: ELF) -> Optional[OS]:
    # GCC inserts the GNU version via an .ident directive
    # that gets stored in a section named ".comment".
    # look at the version and recognize common OSes.
    #
    # assume the GCC version matches the target OS version,
    # which I guess could be wrong during cross-compilation?
    # therefore, don't rely on this if possible.
    #
    # https://stackoverflow.com/q/6263425
    # https://gcc.gnu.org/onlinedocs/cpp/Other-Directives.html

    SHT_PROGBITS = 0x1

    for shdr in elf.section_headers:
        if shdr.type != SHT_PROGBITS:
            continue

        if shdr.get_name(elf) != ".comment":
            continue

        try:
            comment = shdr.buf.decode("utf-8")
        except ValueError:
            continue

        if "GCC:" not in comment:
            continue

        logger.debug(".ident: %s", comment)

        # these values come from our testfiles, like:
        # rg -a "GCC: " tests/data/
        if "Debian" in comment:
            return OS.LINUX
        elif "Ubuntu" in comment:
            return OS.LINUX
        elif "Red Hat" in comment:
            return OS.LINUX

    return None
def guess_os_from_linker(elf: ELF) -> Optional[OS]:
# search for recognizable dynamic linkers (interpreters)
# for example, on linux, we see file paths like: /lib64/ld-linux-x86-64.so.2
@@ -851,8 +905,10 @@ def guess_os_from_abi_versions_needed(elf: ELF) -> Optional[OS]:
return OS.HURD
else:
# we don't have any good guesses based on versions needed
pass
# in practice, Hurd isn't a common/viable OS,
# so this is almost certain to be Linux,
# so lets just make that guess.
return OS.LINUX
return None
@@ -927,6 +983,13 @@ def detect_elf_os(f) -> str:
logger.warning("Error guessing OS from section header notes: %s", e)
sh_notes_guess = None
try:
ident_guess = guess_os_from_ident_directive(elf)
logger.debug("guess: .ident: %s", ident_guess)
except Exception as e:
logger.warning("Error guessing OS from .ident directive: %s", e)
ident_guess = None
try:
linker_guess = guess_os_from_linker(elf)
logger.debug("guess: linker: %s", linker_guess)
@@ -960,6 +1023,10 @@ def detect_elf_os(f) -> str:
if osabi_guess:
ret = osabi_guess
elif ident_guess:
# we don't trust this too much due to non-cross-compilation assumptions
ret = ident_guess
elif ph_notes_guess:
ret = ph_notes_guess
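To eyeball the section this new heuristic reads, `readelf -p .comment sample.elf` dumps the same strings. A short Python sketch using pyelftools (an external library, not capa's own parser; file name hypothetical):

```python
from elftools.elf.elffile import ELFFile  # pip install pyelftools

with open("sample.elf", "rb") as f:  # hypothetical input
    section = ELFFile(f).get_section_by_name(".comment")
    if section is not None:
        comment = section.data().decode("utf-8", errors="replace")
        if "GCC:" in comment:
            # e.g. "GCC: (Ubuntu 9.4.0-1ubuntu1~20.04) 9.4.0" -> Linux
            print(comment)
```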

View File

@@ -127,8 +127,10 @@ def extract_file_strings() -> Iterator[Tuple[Feature, Address]]:
"""extract ASCII and UTF-16 LE strings"""
for block in currentProgram().getMemory().getBlocks(): # type: ignore [name-defined] # noqa: F821
if block.isInitialized():
p_bytes = capa.features.extractors.ghidra.helpers.get_block_bytes(block)
if not block.isInitialized():
continue
p_bytes = capa.features.extractors.ghidra.helpers.get_block_bytes(block)
for s in capa.features.extractors.strings.extract_ascii_strings(p_bytes):
offset = block.getStart().getOffset() + s.offset

View File

@@ -275,3 +275,27 @@ def dereference_ptr(insn: ghidra.program.database.code.InstructionDB):
return addr
else:
return to_deref
def find_data_references_from_insn(insn, max_depth: int = 10):
    """yield data references from given instruction"""
    for reference in insn.getReferencesFrom():
        if not reference.getReferenceType().isData():
            # only care about data references
            continue

        to_addr = reference.getToAddress()

        for _ in range(max_depth - 1):
            data = getDataAt(to_addr)  # type: ignore [name-defined] # noqa: F821
            if data and data.isPointer():
                ptr_value = data.getValue()
                if ptr_value is None:
                    break
                to_addr = ptr_value
            else:
                break

        yield to_addr
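Stripped of the Ghidra API, the inner loop above is a bounded pointer chase: follow at most `max_depth - 1` pointer hops, then yield wherever the chain stopped. A generic sketch (the dict-backed `memory` is invented):

```python
def chase(read_pointer, addr: int, max_depth: int = 10) -> int:
    """follow pointers until a non-pointer value or the depth limit"""
    for _ in range(max_depth - 1):
        target = read_pointer(addr)  # None when addr holds no pointer
        if target is None:
            break
        addr = target
    return addr

memory = {0x1000: 0x2000, 0x2000: 0x3000}  # ptr -> ptr -> data
assert chase(memory.get, 0x1000) == 0x3000
```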

View File

@@ -23,6 +23,9 @@ from capa.features.extractors.base_extractor import BBHandle, InsnHandle, Functi
SECURITY_COOKIE_BYTES_DELTA = 0x40
OPERAND_TYPE_DYNAMIC_ADDRESS = OperandType.DYNAMIC | OperandType.ADDRESS
def get_imports(ctx: Dict[str, Any]) -> Dict[int, Any]:
"""Populate the import cache for this context"""
if "imports_cache" not in ctx:
@@ -82,7 +85,7 @@ def check_for_api_call(
if not capa.features.extractors.ghidra.helpers.check_addr_for_api(addr_ref, fakes, imports, externs):
return
ref = addr_ref.getOffset()
elif ref_type == OperandType.DYNAMIC | OperandType.ADDRESS or ref_type == OperandType.DYNAMIC:
elif ref_type == OPERAND_TYPE_DYNAMIC_ADDRESS or ref_type == OperandType.DYNAMIC:
return # cannot resolve dynamics statically
else:
# pure address does not need to get dereferenced/ handled
@@ -195,46 +198,39 @@ def extract_insn_offset_features(fh: FunctionHandle, bb: BBHandle, ih: InsnHandl
if insn.getMnemonicString().startswith("LEA"):
return
# ignore any stack references
if not capa.features.extractors.ghidra.helpers.is_stack_referenced(insn):
# Ghidra stores operands in 2D arrays if they contain offsets
for i in range(insn.getNumOperands()):
if insn.getOperandType(i) == OperandType.DYNAMIC: # e.g. [esi + 4]
# manual extraction, since the default api calls only work on the 1st dimension of the array
op_objs = insn.getOpObjects(i)
if isinstance(op_objs[-1], ghidra.program.model.scalar.Scalar):
op_off = op_objs[-1].getValue()
yield Offset(op_off), ih.address
yield OperandOffset(i, op_off), ih.address
else:
yield Offset(0), ih.address
yield OperandOffset(i, 0), ih.address
if capa.features.extractors.ghidra.helpers.is_stack_referenced(insn):
# ignore stack references
return
# Ghidra stores operands in 2D arrays if they contain offsets
for i in range(insn.getNumOperands()):
if insn.getOperandType(i) == OperandType.DYNAMIC: # e.g. [esi + 4]
# manual extraction, since the default api calls only work on the 1st dimension of the array
op_objs = insn.getOpObjects(i)
if not op_objs:
continue
if isinstance(op_objs[-1], ghidra.program.model.scalar.Scalar):
op_off = op_objs[-1].getValue()
else:
op_off = 0
yield Offset(op_off), ih.address
yield OperandOffset(i, op_off), ih.address
def extract_insn_bytes_features(fh: FunctionHandle, bb: BBHandle, ih: InsnHandle) -> Iterator[Tuple[Feature, Address]]:
"""
parse referenced byte sequences
example:
push offset iid_004118d4_IShellLinkA ; riid
"""
insn: ghidra.program.database.code.InstructionDB = ih.inner
if capa.features.extractors.ghidra.helpers.is_call_or_jmp(insn):
return
ref = insn.getAddress() # init to insn addr
for i in range(insn.getNumOperands()):
if OperandType.isAddress(insn.getOperandType(i)):
ref = insn.getAddress(i) # pulls pointer if there is one
if ref != insn.getAddress(): # bail out if there's no pointer
ghidra_dat = getDataAt(ref) # type: ignore [name-defined] # noqa: F821
if (
ghidra_dat and not ghidra_dat.hasStringValue() and not ghidra_dat.isPointer()
): # avoid if the data itself is a pointer
extracted_bytes = capa.features.extractors.ghidra.helpers.get_bytes(ref, MAX_BYTES_FEATURE_SIZE)
for addr in capa.features.extractors.ghidra.helpers.find_data_references_from_insn(ih.inner):
data = getDataAt(addr) # type: ignore [name-defined] # noqa: F821
if data and not data.hasStringValue():
extracted_bytes = capa.features.extractors.ghidra.helpers.get_bytes(addr, MAX_BYTES_FEATURE_SIZE)
if extracted_bytes and not capa.features.extractors.helpers.all_zeros(extracted_bytes):
# don't extract byte features for obvious strings
yield Bytes(extracted_bytes), ih.address
@@ -245,24 +241,10 @@ def extract_insn_string_features(fh: FunctionHandle, bb: BBHandle, ih: InsnHandl
example:
push offset aAcr ; "ACR > "
"""
insn: ghidra.program.database.code.InstructionDB = ih.inner
dyn_addr = OperandType.DYNAMIC | OperandType.ADDRESS
ref = insn.getAddress()
for i in range(insn.getNumOperands()):
if OperandType.isScalarAsAddress(insn.getOperandType(i)):
ref = insn.getAddress(i)
# strings are also referenced dynamically via pointers & arrays, so we need to deref them
if insn.getOperandType(i) == dyn_addr:
ref = insn.getAddress(i)
dat = getDataAt(ref) # type: ignore [name-defined] # noqa: F821
if dat and dat.isPointer():
ref = dat.getValue()
if ref != insn.getAddress():
ghidra_dat = getDataAt(ref) # type: ignore [name-defined] # noqa: F821
if ghidra_dat and ghidra_dat.hasStringValue():
yield String(ghidra_dat.getValue()), ih.address
for addr in capa.features.extractors.ghidra.helpers.find_data_references_from_insn(ih.inner):
data = getDataAt(addr) # type: ignore [name-defined] # noqa: F821
if data and data.hasStringValue():
yield String(data.getValue()), ih.address
def extract_insn_mnemonic_features(
@@ -359,7 +341,7 @@ def extract_insn_cross_section_cflow(
ref = capa.features.extractors.ghidra.helpers.dereference_ptr(insn)
if capa.features.extractors.ghidra.helpers.check_addr_for_api(ref, fakes, imports, externs):
return
elif ref_type == OperandType.DYNAMIC | OperandType.ADDRESS or ref_type == OperandType.DYNAMIC:
elif ref_type == OPERAND_TYPE_DYNAMIC_ADDRESS or ref_type == OperandType.DYNAMIC:
return # cannot resolve dynamics statically
else:
# pure address does not need to get dereferenced/ handled

View File

@@ -53,8 +53,6 @@ class AddressType(str, Enum):
FILE = "file"
DN_TOKEN = "dn token"
DN_TOKEN_OFFSET = "dn token offset"
DEX_METHOD_INDEX = "dex method index"
DEX_CLASS_INDEX = "dex class index"
PROCESS = "process"
THREAD = "thread"
CALL = "call"
@@ -82,12 +80,6 @@ class Address(HashableModel):
elif isinstance(a, capa.features.address.DNTokenOffsetAddress):
return cls(type=AddressType.DN_TOKEN_OFFSET, value=(a.token, a.offset))
elif isinstance(a, capa.features.address.DexMethodAddress):
return cls(type=AddressType.DEX_METHOD_INDEX, value=int(a))
elif isinstance(a, capa.features.address.DexClassAddress):
return cls(type=AddressType.DEX_CLASS_INDEX, value=int(a))
elif isinstance(a, capa.features.address.ProcessAddress):
return cls(type=AddressType.PROCESS, value=(a.ppid, a.pid))
@@ -133,14 +125,6 @@ class Address(HashableModel):
assert isinstance(offset, int)
return capa.features.address.DNTokenOffsetAddress(token, offset)
elif self.type is AddressType.DEX_METHOD_INDEX:
assert isinstance(self.value, int)
return capa.features.address.DexMethodAddress(self.value)
elif self.type is AddressType.DEX_CLASS_INDEX:
assert isinstance(self.value, int)
return capa.features.address.DexClassAddress(self.value)
elif self.type is AddressType.PROCESS:
assert isinstance(self.value, tuple)
ppid, pid = self.value


@@ -45,7 +45,6 @@ import capa.render.result_document
import capa.render.result_document as rdoc
import capa.features.extractors.common
import capa.features.extractors.pefile
import capa.features.extractors.dexfile
import capa.features.extractors.elffile
import capa.features.extractors.dotnetfile
import capa.features.extractors.base_extractor
@@ -73,7 +72,6 @@ from capa.features.common import (
OS_LINUX,
OS_MACOS,
FORMAT_PE,
FORMAT_DEX,
FORMAT_ELF,
OS_WINDOWS,
FORMAT_AUTO,
@@ -216,7 +214,7 @@ def get_default_signatures() -> List[Path]:
"""
compute a list of file system paths to the default FLIRT signatures.
"""
sigs_path = get_default_root() / "sigs"
sigs_path = get_default_root() / "capa" / "sigs"
logger.debug("signatures path: %s", sigs_path)
ret = []
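With the signatures relocated, the default path now resolves under the package directory. A quick sanity check of the new layout, assuming a source checkout at an arbitrary location:
```python
from pathlib import Path

root = Path("/opt/capa")  # hypothetical checkout location
sigs_path = root / "capa" / "sigs"
print(sorted(p.name for p in sigs_path.glob("*.sig")))
# e.g. ['1_flare_msvc_rtf_32_64.sig', '2_flare_msvc_atlmfc_32_64.sig', ...]
```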
@@ -309,11 +307,6 @@ def get_extractor(
return capa.features.extractors.dnfile.extractor.DnfileFeatureExtractor(path)
elif format_ == FORMAT_DEX:
import capa.features.extractors.dexfile
return capa.features.extractors.dexfile.DexFeatureExtractor(path, code_analysis=True)
elif backend == BACKEND_BINJA:
from capa.features.extractors.binja.find_binja_api import find_binja_path
@@ -382,9 +375,6 @@ def get_file_extractors(sample: Path, format_: str) -> List[FeatureExtractor]:
elif format_ == capa.features.common.FORMAT_ELF:
file_extractors.append(capa.features.extractors.elffile.ElfFeatureExtractor(sample))
elif format_ == capa.features.common.FORMAT_DEX:
file_extractors.append(capa.features.extractors.dexfile.DexFeatureExtractor(sample, code_analysis=False))
elif format_ == FORMAT_CAPE:
report = json.load(Path(sample).open(encoding="utf-8"))
file_extractors.append(capa.features.extractors.cape.extractor.CapeExtractor.from_report(report))
@@ -807,7 +797,6 @@ def install_common_args(parser, wanted=None):
(FORMAT_PE, "Windows PE file"),
(FORMAT_DOTNET, ".NET PE file"),
(FORMAT_ELF, "Executable and Linkable Format"),
(FORMAT_DEX, "Android DEX file"),
(FORMAT_SC32, "32-bit shellcode"),
(FORMAT_SC64, "64-bit shellcode"),
(FORMAT_CAPE, "CAPE sandbox report"),
@@ -973,7 +962,7 @@ def handle_common_args(args):
)
logger.debug("-" * 80)
sigs_path = get_default_root() / "sigs"
sigs_path = get_default_root() / "capa" / "sigs"
if not sigs_path.exists():
logger.error(


@@ -33,7 +33,7 @@ def render_meta(doc: rd.ResultDocument, ostream: StringIO):
(width("md5", 22), width(doc.meta.sample.md5, 82)),
("sha1", doc.meta.sample.sha1),
("sha256", doc.meta.sample.sha256),
("analysis", doc.meta.flavor),
("analysis", doc.meta.flavor.value),
("os", doc.meta.analysis.os),
("format", doc.meta.analysis.format),
("arch", doc.meta.analysis.arch),


@@ -54,12 +54,6 @@ def format_address(address: frz.Address) -> str:
assert isinstance(token, int)
assert isinstance(offset, int)
return f"token({capa.helpers.hex(token)})+{capa.helpers.hex(offset)}"
elif address.type == frz.AddressType.DEX_METHOD_INDEX:
assert isinstance(address.value, int)
return f"method({capa.helpers.hex(address.value)})"
elif address.type == frz.AddressType.DEX_CLASS_INDEX:
assert isinstance(address.value, int)
return f"class({capa.helpers.hex(address.value)})"
elif address.type == frz.AddressType.PROCESS:
assert isinstance(address.value, tuple)
ppid, pid = address.value


@@ -8,8 +8,6 @@
import io
import re
import gzip
import json
import uuid
import codecs
import logging
@@ -39,11 +37,13 @@ import capa.perf
import capa.engine as ceng
import capa.features
import capa.optimizer
import capa.features.com
import capa.features.file
import capa.features.insn
import capa.features.common
import capa.features.basicblock
from capa.engine import Statement, FeatureSet
from capa.features.com import ComType
from capa.features.common import MAX_BYTES_FEATURE_SIZE, Feature
from capa.features.address import Address
@@ -328,42 +328,16 @@ def ensure_feature_valid_for_scopes(scopes: Scopes, feature: Union[Feature, Stat
raise InvalidRule(f"feature {feature} not supported for scopes {scopes}")
class ComType(Enum):
CLASS = "class"
INTERFACE = "interface"
# COM data source https://github.com/stevemk14ebr/COM-Code-Helper/tree/master
VALID_COM_TYPES = {
ComType.CLASS: {"db_path": "assets/classes.json.gz", "prefix": "CLSID_"},
ComType.INTERFACE: {"db_path": "assets/interfaces.json.gz", "prefix": "IID_"},
}
@lru_cache(maxsize=None)
def load_com_database(com_type: ComType) -> Dict[str, List[str]]:
com_db_path: Path = capa.main.get_default_root() / VALID_COM_TYPES[com_type]["db_path"]
if not com_db_path.exists():
raise IOError(f"COM database path '{com_db_path}' does not exist or cannot be accessed")
try:
with gzip.open(com_db_path, "rb") as gzfile:
return json.loads(gzfile.read().decode("utf-8"))
except Exception as e:
raise IOError(f"Error loading COM database from '{com_db_path}'") from e
def translate_com_feature(com_name: str, com_type: ComType) -> ceng.Or:
com_db = load_com_database(com_type)
guid_strings: Optional[List[str]] = com_db.get(com_name)
if guid_strings is None or len(guid_strings) == 0:
def translate_com_feature(com_name: str, com_type: ComType) -> ceng.Statement:
com_db = capa.features.com.load_com_database(com_type)
guids: Optional[List[str]] = com_db.get(com_name)
if not guids:
logger.error(" %s doesn't exist in COM %s database", com_name, com_type)
raise InvalidRule(f"'{com_name}' doesn't exist in COM {com_type} database")
com_features: List = []
for guid_string in guid_strings:
hex_chars = guid_string.replace("-", "")
com_features: List[Feature] = []
for guid in guids:
hex_chars = guid.replace("-", "")
h = [hex_chars[i : i + 2] for i in range(0, len(hex_chars), 2)]
reordered_hex_pairs = [
h[3],
@@ -384,9 +358,10 @@ def translate_com_feature(com_name: str, com_type: ComType) -> ceng.Or:
h[15],
]
guid_bytes = bytes.fromhex("".join(reordered_hex_pairs))
prefix = VALID_COM_TYPES[com_type]["prefix"]
com_features.append(capa.features.common.StringFactory(guid_string, f"{prefix+com_name} as GUID string"))
com_features.append(capa.features.common.Bytes(guid_bytes, f"{prefix+com_name} as bytes"))
prefix = capa.features.com.COM_PREFIXES[com_type]
symbol = prefix + com_name
com_features.append(capa.features.common.String(guid, f"{symbol} as GUID string"))
com_features.append(capa.features.common.Bytes(guid_bytes, f"{symbol} as bytes"))
return ceng.Or(com_features)
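The manual pair reordering encodes the first three GUID fields little-endian, exactly as they appear in memory; it is equivalent to the standard library's `bytes_le`. A small cross-check (the GUID is IID_IShellLinkA, matching the docstring example earlier; this demonstrates equivalence, it is not the repository's code):
```python
import uuid

guid = "000214ee-0000-0000-c000-000000000046"  # IID_IShellLinkA

# reorder as in translate_com_feature: fields 1-3 byte-swapped, remainder in order
hex_chars = guid.replace("-", "")
h = [hex_chars[i : i + 2] for i in range(0, len(hex_chars), 2)]
reordered = [h[3], h[2], h[1], h[0], h[5], h[4], h[7], h[6]] + h[8:]
guid_bytes = bytes.fromhex("".join(reordered))

assert guid_bytes == uuid.UUID(guid).bytes_le  # the stdlib agrees
```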
@@ -602,7 +577,9 @@ def trim_dll_part(api: str) -> str:
# kernel32.CreateFileA
if api.count(".") == 1:
api = api.split(".")[1]
if "::" not in api:
# don't trim qualified .NET names like System.Convert::FromBase64String
api = api.split(".")[1]
return api
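The added guard keeps qualified .NET names intact while still trimming classic `module.export` pairs. Expected behavior, as exercised by the new test near the end of this changeset:
```python
assert trim_dll_part("kernel32.CreateFileA") == "CreateFileA"
# .NET-style names contain "::"; they must pass through untouched:
assert trim_dll_part("System.Convert::FromBase64String") == "System.Convert::FromBase64String"
```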
@@ -822,11 +799,13 @@ def build_statements(d, scopes: Scopes):
return feature
elif key.startswith("com/"):
com_type = str(key[len("com/") :]).upper()
if com_type not in [item.name for item in ComType]:
raise InvalidRule(f"unexpected COM type: {com_type}")
com_type_name = str(key[len("com/") :])
try:
com_type = ComType(com_type_name)
except ValueError:
raise InvalidRule(f"unexpected COM type: {com_type_name}")
value, description = parse_description(d[key], key, d.get("description"))
return translate_com_feature(value, ComType[com_type])
return translate_com_feature(value, com_type)
else:
Feature = parse_feature(key)
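Constructing the enum by value replaces the old name-based lookup and keeps the rule syntax lowercase. Behavior under the new code, using the `ComType` values now defined in `capa.features.com`:
```python
from capa.features.com import ComType

assert ComType("class") is ComType.CLASS        # value lookup, as in build_statements
assert ComType("interface") is ComType.INTERFACE
try:
    ComType("typelib")  # an unknown value raises ValueError...
except ValueError:
    pass  # ...which build_statements reports as InvalidRule("unexpected COM type: typelib")
```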


@@ -1,4 +1,4 @@
# capa/sigs
# capa FLIRT signatures
This directory contains FLIRT signatures that capa uses to identify library functions.
Typically, capa ignores library functions; this reduces false positives and speeds up analysis.


@@ -35,12 +35,6 @@ $ unzip v4.0.0.zip
$ capa -r /path/to/capa-rules suspicious.exe
```
This technique also doesn't set up the default library identification [signatures](https://github.com/mandiant/capa/tree/master/sigs). You can pass the signature directory using the `-s` argument.
For example, to run capa with both a rule path and a signature path:
```console
$ capa -s /path/to/capa-sigs suspicious.exe
```
Alternatively, see Method 3 below.
### 2. Use capa


@@ -36,8 +36,8 @@ dependencies = [
"pyyaml==6.0.1",
"tabulate==0.9.0",
"colorama==0.4.6",
"termcolor==2.3.0",
"wcwidth==0.2.12",
"termcolor==2.4.0",
"wcwidth==0.2.13",
"ida-settings==2.1.0",
"viv-utils[flirt]==0.7.9",
"halo==0.0.31",
@@ -50,7 +50,6 @@ dependencies = [
"dncil==1.0.2",
"pydantic==2.4.0",
"protobuf==4.23.4",
"dexparser==1.2.0",
]
dynamic = ["version"]
@@ -63,12 +62,12 @@ packages = ["capa"]
[project.optional-dependencies]
dev = [
"pre-commit==3.5.0",
"pytest==7.4.3",
"pytest==7.4.4",
"pytest-sugar==0.9.7",
"pytest-instafail==0.5.0",
"pytest-cov==4.1.0",
"flake8==6.1.0",
"flake8-bugbear==23.11.26",
"flake8==7.0.0",
"flake8-bugbear==23.12.2",
"flake8-encodings==0.5.1",
"flake8-comprehensions==3.14.0",
"flake8-logging-format==0.9.0",
@@ -78,10 +77,10 @@ dev = [
"flake8-simplify==0.21.0",
"flake8-use-pathlib==0.3.0",
"flake8-copyright==0.2.4",
"ruff==0.1.6",
"black==23.11.0",
"isort==5.11.4",
"mypy==1.7.1",
"ruff==0.1.13",
"black==23.12.1",
"isort==5.13.2",
"mypy==1.8.0",
"psutil==5.9.2",
"stix2==3.0.1",
"requests==2.31.0",
@@ -90,15 +89,15 @@ dev = [
"types-backports==0.1.3",
"types-colorama==0.4.15.11",
"types-PyYAML==6.0.8",
"types-tabulate==0.9.0.3",
"types-tabulate==0.9.0.20240106",
"types-termcolor==1.1.4",
"types-psutil==5.8.23",
"types_requests==2.31.0.10",
"types_requests==2.31.0.20240106",
"types-protobuf==4.23.0.3",
]
build = [
"pyinstaller==6.2.0",
"setuptools==69.0.2",
"pyinstaller==6.3.0",
"setuptools==69.0.3",
"build==1.0.3"
]


Submodule rules updated: 57b3911a72...9161f73a78


@@ -61,7 +61,22 @@ var_names = ["".join(letters) for letters in itertools.product(string.ascii_lowe
# these have to be the internal names used by capa.py, which sometimes differ from the names written in the rules, e.g. "2 or more" is "Some" and count is "Range"
unsupported = ["characteristic", "mnemonic", "offset", "subscope", "Range"]
unsupported = [
"characteristic",
"mnemonic",
"offset",
"subscope",
"Range",
"os",
"property",
"format",
"class",
"operand[0].number",
"operand[1].number",
"substring",
"arch",
"namespace",
]
# further idea: shorten this list, possible stuff:
# - 2 or more strings: e.g.
# -- https://github.com/mandiant/capa-rules/blob/master/collection/file-managers/gather-direct-ftp-information.yml
@@ -90,8 +105,7 @@ condition_header = """
condition_rule = """
private rule capa_pe_file : CAPA {
meta:
description = "match in PE files. used by all further CAPA rules"
author = "Arnim Rupp"
description = "Match in PE files. Used by other CAPA rules"
condition:
uint16be(0) == 0x4d5a
or uint16be(0) == 0x558b


@@ -43,7 +43,8 @@
"T1598": "Phishing for Information",
"T1598.001": "Phishing for Information::Spearphishing Service",
"T1598.002": "Phishing for Information::Spearphishing Attachment",
"T1598.003": "Phishing for Information::Spearphishing Link"
"T1598.003": "Phishing for Information::Spearphishing Link",
"T1598.004": "Phishing for Information::Spearphishing Voice"
},
"Resource Development": {
"T1583": "Acquire Infrastructure",
@@ -111,7 +112,9 @@
"T1566": "Phishing",
"T1566.001": "Phishing::Spearphishing Attachment",
"T1566.002": "Phishing::Spearphishing Link",
"T1566.003": "Phishing::Spearphishing via Service"
"T1566.003": "Phishing::Spearphishing via Service",
"T1566.004": "Phishing::Spearphishing Voice",
"T1659": "Content Injection"
},
"Execution": {
"T1047": "Windows Management Instrumentation",
@@ -175,6 +178,7 @@
"T1098.003": "Account Manipulation::Additional Cloud Roles",
"T1098.004": "Account Manipulation::SSH Authorized Keys",
"T1098.005": "Account Manipulation::Device Registration",
"T1098.006": "Account Manipulation::Additional Container Cluster Roles",
"T1133": "External Remote Services",
"T1136": "Create Account",
"T1136.001": "Create Account::Local Account",
@@ -264,7 +268,8 @@
"T1574.010": "Hijack Execution Flow::Services File Permissions Weakness",
"T1574.011": "Hijack Execution Flow::Services Registry Permissions Weakness",
"T1574.012": "Hijack Execution Flow::COR_PROFILER",
"T1574.013": "Hijack Execution Flow::KernelCallbackTable"
"T1574.013": "Hijack Execution Flow::KernelCallbackTable",
"T1653": "Power Settings"
},
"Privilege Escalation": {
"T1037": "Boot or Logon Initialization Scripts",
@@ -298,6 +303,13 @@
"T1078.002": "Valid Accounts::Domain Accounts",
"T1078.003": "Valid Accounts::Local Accounts",
"T1078.004": "Valid Accounts::Cloud Accounts",
"T1098": "Account Manipulation",
"T1098.001": "Account Manipulation::Additional Cloud Credentials",
"T1098.002": "Account Manipulation::Additional Email Delegate Permissions",
"T1098.003": "Account Manipulation::Additional Cloud Roles",
"T1098.004": "Account Manipulation::SSH Authorized Keys",
"T1098.005": "Account Manipulation::Device Registration",
"T1098.006": "Account Manipulation::Additional Container Cluster Roles",
"T1134": "Access Token Manipulation",
"T1134.001": "Access Token Manipulation::Token Impersonation/Theft",
"T1134.002": "Access Token Manipulation::Create Process with Token",
@@ -349,6 +361,7 @@
"T1548.002": "Abuse Elevation Control Mechanism::Bypass User Account Control",
"T1548.003": "Abuse Elevation Control Mechanism::Sudo and Sudo Caching",
"T1548.004": "Abuse Elevation Control Mechanism::Elevated Execution with Prompt",
"T1548.005": "Abuse Elevation Control Mechanism::Temporary Elevated Cloud Access",
"T1574": "Hijack Execution Flow",
"T1574.001": "Hijack Execution Flow::DLL Search Order Hijacking",
"T1574.002": "Hijack Execution Flow::DLL Side-Loading",
@@ -379,6 +392,7 @@
"T1027.009": "Obfuscated Files or Information::Embedded Payloads",
"T1027.010": "Obfuscated Files or Information::Command Obfuscation",
"T1027.011": "Obfuscated Files or Information::Fileless Storage",
"T1027.012": "Obfuscated Files or Information::LNK Icon Smuggling",
"T1036": "Masquerading",
"T1036.001": "Masquerading::Invalid Code Signature",
"T1036.002": "Masquerading::Right-to-Left Override",
@@ -388,6 +402,7 @@
"T1036.006": "Masquerading::Space after Filename",
"T1036.007": "Masquerading::Double File Extension",
"T1036.008": "Masquerading::Masquerade File Type",
"T1036.009": "Masquerading::Break Process Trees",
"T1055": "Process Injection",
"T1055.001": "Process Injection::Dynamic-link Library Injection",
"T1055.002": "Process Injection::Portable Executable Injection",
@@ -475,6 +490,7 @@
"T1548.002": "Abuse Elevation Control Mechanism::Bypass User Account Control",
"T1548.003": "Abuse Elevation Control Mechanism::Sudo and Sudo Caching",
"T1548.004": "Abuse Elevation Control Mechanism::Elevated Execution with Prompt",
"T1548.005": "Abuse Elevation Control Mechanism::Temporary Elevated Cloud Access",
"T1550": "Use Alternate Authentication Material",
"T1550.001": "Use Alternate Authentication Material::Application Access Token",
"T1550.002": "Use Alternate Authentication Material::Pass the Hash",
@@ -503,10 +519,11 @@
"T1562.004": "Impair Defenses::Disable or Modify System Firewall",
"T1562.006": "Impair Defenses::Indicator Blocking",
"T1562.007": "Impair Defenses::Disable or Modify Cloud Firewall",
"T1562.008": "Impair Defenses::Disable Cloud Logs",
"T1562.008": "Impair Defenses::Disable or Modify Cloud Logs",
"T1562.009": "Impair Defenses::Safe Mode Boot",
"T1562.010": "Impair Defenses::Downgrade Attack",
"T1562.011": "Impair Defenses::Spoof Security Alerting",
"T1562.012": "Impair Defenses::Disable or Modify Linux Audit System",
"T1564": "Hide Artifacts",
"T1564.001": "Hide Artifacts::Hidden Files and Directories",
"T1564.002": "Hide Artifacts::Hidden Users",
@@ -518,6 +535,7 @@
"T1564.008": "Hide Artifacts::Email Hiding Rules",
"T1564.009": "Hide Artifacts::Resource Forking",
"T1564.010": "Hide Artifacts::Process Argument Spoofing",
"T1564.011": "Hide Artifacts::Ignore Process Interrupts",
"T1574": "Hijack Execution Flow",
"T1574.001": "Hijack Execution Flow::DLL Search Order Hijacking",
"T1574.002": "Hijack Execution Flow::DLL Side-Loading",
@@ -536,6 +554,7 @@
"T1578.002": "Modify Cloud Compute Infrastructure::Create Cloud Instance",
"T1578.003": "Modify Cloud Compute Infrastructure::Delete Cloud Instance",
"T1578.004": "Modify Cloud Compute Infrastructure::Revert Cloud Instance",
"T1578.005": "Modify Cloud Compute Infrastructure::Modify Cloud Compute Configurations",
"T1599": "Network Boundary Bridging",
"T1599.001": "Network Boundary Bridging::Network Address Translation Traversal",
"T1600": "Weaken Encryption",
@@ -548,7 +567,8 @@
"T1612": "Build Image on Host",
"T1620": "Reflective Code Loading",
"T1622": "Debugger Evasion",
"T1647": "Plist File Modification"
"T1647": "Plist File Modification",
"T1656": "Impersonation"
},
"Credential Access": {
"T1003": "OS Credential Dumping",
@@ -591,6 +611,7 @@
"T1555.003": "Credentials from Password Stores::Credentials from Web Browsers",
"T1555.004": "Credentials from Password Stores::Windows Credential Manager",
"T1555.005": "Credentials from Password Stores::Password Managers",
"T1555.006": "Credentials from Password Stores::Cloud Secrets Management Stores",
"T1556": "Modify Authentication Process",
"T1556.001": "Modify Authentication Process::Domain Controller Authentication",
"T1556.002": "Modify Authentication Process::Password Filter DLL",
@@ -621,6 +642,7 @@
"T1012": "Query Registry",
"T1016": "System Network Configuration Discovery",
"T1016.001": "System Network Configuration Discovery::Internet Connection Discovery",
"T1016.002": "System Network Configuration Discovery::Wi-Fi Discovery",
"T1018": "Remote System Discovery",
"T1033": "System Owner/User Discovery",
"T1040": "Network Sniffing",
@@ -659,7 +681,8 @@
"T1615": "Group Policy Discovery",
"T1619": "Cloud Storage Object Discovery",
"T1622": "Debugger Evasion",
"T1652": "Device Driver Discovery"
"T1652": "Device Driver Discovery",
"T1654": "Log Enumeration"
},
"Lateral Movement": {
"T1021": "Remote Services",
@@ -670,6 +693,7 @@
"T1021.005": "Remote Services::VNC",
"T1021.006": "Remote Services::Windows Remote Management",
"T1021.007": "Remote Services::Cloud Services",
"T1021.008": "Remote Services::Direct Cloud VM Connections",
"T1072": "Software Deployment Tools",
"T1080": "Taint Shared Content",
"T1091": "Replication Through Removable Media",
@@ -763,7 +787,8 @@
"T1572": "Protocol Tunneling",
"T1573": "Encrypted Channel",
"T1573.001": "Encrypted Channel::Symmetric Cryptography",
"T1573.002": "Encrypted Channel::Asymmetric Cryptography"
"T1573.002": "Encrypted Channel::Asymmetric Cryptography",
"T1659": "Content Injection"
},
"Exfiltration": {
"T1011": "Exfiltration Over Other Network Medium",
@@ -783,7 +808,8 @@
"T1567": "Exfiltration Over Web Service",
"T1567.001": "Exfiltration Over Web Service::Exfiltration to Code Repository",
"T1567.002": "Exfiltration Over Web Service::Exfiltration to Cloud Storage",
"T1567.003": "Exfiltration Over Web Service::Exfiltration to Text Storage Sites"
"T1567.003": "Exfiltration Over Web Service::Exfiltration to Text Storage Sites",
"T1567.004": "Exfiltration Over Web Service::Exfiltration Over Webhook"
},
"Impact": {
"T1485": "Data Destruction",
@@ -811,7 +837,8 @@
"T1565": "Data Manipulation",
"T1565.001": "Data Manipulation::Stored Data Manipulation",
"T1565.002": "Data Manipulation::Transmitted Data Manipulation",
"T1565.003": "Data Manipulation::Runtime Data Manipulation"
"T1565.003": "Data Manipulation::Runtime Data Manipulation",
"T1657": "Financial Theft"
}
},
"mbc": {


@@ -100,9 +100,9 @@ def get_viv_extractor(path: Path):
sigpaths = [
CD / "data" / "sigs" / "test_aulldiv.pat",
CD / "data" / "sigs" / "test_aullrem.pat.gz",
CD.parent / "sigs" / "1_flare_msvc_rtf_32_64.sig",
CD.parent / "sigs" / "2_flare_msvc_atlmfc_32_64.sig",
CD.parent / "sigs" / "3_flare_common_libs.sig",
CD.parent / "capa" / "sigs" / "1_flare_msvc_rtf_32_64.sig",
CD.parent / "capa" / "sigs" / "2_flare_msvc_atlmfc_32_64.sig",
CD.parent / "capa" / "sigs" / "3_flare_common_libs.sig",
]
if "raw32" in path.name:
@@ -393,6 +393,10 @@ def get_data_path_by_name(name) -> Path:
return CD / "data" / "ea2876e9175410b6f6719f80ee44b9553960758c7d0f7bed73c0fe9a78d8e669.dll_"
elif name.startswith("1038a2"):
return CD / "data" / "1038a23daad86042c66bfe6c9d052d27048de9653bde5750dc0f240c792d9ac8.elf_"
elif name.startswith("nested_typedef"):
return CD / "data" / "dotnet" / "dd9098ff91717f4906afe9dafdfa2f52.exe_"
elif name.startswith("nested_typeref"):
return CD / "data" / "dotnet" / "2c7d60f77812607dec5085973ff76cea.dll_"
else:
raise ValueError(f"unexpected sample fixture: {name}")
@@ -1274,6 +1278,114 @@ FEATURE_PRESENCE_TESTS_DOTNET = sorted(
), # MemberRef method
False,
),
(
"nested_typedef",
"file",
capa.features.common.Class("mynamespace.myclass_outer0"),
True,
),
(
"nested_typedef",
"file",
capa.features.common.Class("mynamespace.myclass_outer1"),
True,
),
(
"nested_typedef",
"file",
capa.features.common.Class("mynamespace.myclass_outer0/myclass_inner0_0"),
True,
),
(
"nested_typedef",
"file",
capa.features.common.Class("mynamespace.myclass_outer0/myclass_inner0_1"),
True,
),
(
"nested_typedef",
"file",
capa.features.common.Class("mynamespace.myclass_outer1/myclass_inner1_0"),
True,
),
(
"nested_typedef",
"file",
capa.features.common.Class("mynamespace.myclass_outer1/myclass_inner1_1"),
True,
),
(
"nested_typedef",
"file",
capa.features.common.Class("mynamespace.myclass_outer1/myclass_inner1_0/myclass_inner_inner"),
True,
),
(
"nested_typedef",
"file",
capa.features.common.Class("myclass_inner_inner"),
False,
),
(
"nested_typedef",
"file",
capa.features.common.Class("myclass_inner1_0"),
False,
),
(
"nested_typedef",
"file",
capa.features.common.Class("myclass_inner1_1"),
False,
),
(
"nested_typedef",
"file",
capa.features.common.Class("myclass_inner0_0"),
False,
),
(
"nested_typedef",
"file",
capa.features.common.Class("myclass_inner0_1"),
False,
),
(
"nested_typeref",
"file",
capa.features.file.Import("Android.OS.Build/VERSION::SdkInt"),
True,
),
(
"nested_typeref",
"file",
capa.features.file.Import("Android.Media.Image/Plane::Buffer"),
True,
),
(
"nested_typeref",
"file",
capa.features.file.Import("Android.Provider.Telephony/Sent/Sent::ContentUri"),
True,
),
(
"nested_typeref",
"file",
capa.features.file.Import("Android.OS.Build::SdkInt"),
False,
),
(
"nested_typeref",
"file",
capa.features.file.Import("Plane::Buffer"),
False,
),
(
"nested_typeref",
"file",
capa.features.file.Import("Sent::ContentUri"),
False,
),
],
# order tests by (file, item)
# so that our LRU cache is most effective.


@@ -949,6 +949,7 @@ def test_count_api():
features:
- or:
- count(api(kernel32.CreateFileA)): 1
- count(api(System.Convert::FromBase64String)): 1
"""
)
r = capa.rules.Rule.from_yaml(rule)
@@ -957,6 +958,7 @@ def test_count_api():
assert bool(r.evaluate({API("kernel32.CreateFile"): set()})) is False
assert bool(r.evaluate({API("CreateFile"): {ADDR1}})) is False
assert bool(r.evaluate({API("CreateFileA"): {ADDR1}})) is True
assert bool(r.evaluate({API("System.Convert::FromBase64String"): {ADDR1}})) is True
def test_invalid_number():