Commit Graph

46 Commits

Author SHA1 Message Date
Willi Ballenthin
ee17d75be9 implement BinExport2 backend (#1950)
* elf: os: detect Android via clang compiler .ident note

* elf: os: detect Android via dependency on liblog.so

* main: split main into a bunch of "main routines"

[wip] since there are a few references to BinExport2
that are in progress elsewhre. Next commit will remove them.

* features: add BinExport2 declarations

* BinExport2: initial skeleton of feature extraction

* main: remove references to wip BinExport2 code

* changelog

* main: rename first position argument "input_file"

closes #1946

* main: linters

* main: move rule-related routines to capa.rules

ref #1821

* main: extract routines to capa.loader module

closes #1821

* add loader module

* loader: learn to load freeze format

* freeze: use new cli arg handling

* Update capa/loader.py

Co-authored-by: Moritz <mr-tz@users.noreply.github.com>

* main: remove duplicate documentation

* main: add doc about where some functions live

* scripts: migrate to new main wrapper helper functions

* scripts: port to main routines

* main: better handle auto-detection of backend

* scripts: migrate bulk-process to main wrappers

* scripts: migrate scripts to main wrappers

* main: rename *_from_args to *_from_cli

* changelog

* cache-ruleset: remove duplication

* main: fix tag handling

* cache-ruleset: fix cli args

* cache-ruleset: fix special rule cli handling

* scripts: fix type bytes

* main: nicely format debug messages

* helpers: ensure log messages aren't very long

* flake8 config

* binexport2: formatting

* loader: learn to load BinExport2 files

* main: debug log the format and backend

* elf: add more arch constants

* binexport: parse global features

* binexport: extract file features

* binexport2: begin to enumerate function/bb/insns

* binexport: pass context to function/bb/insn extractors

* binexport: linters

* binexport: linters

* scripts: add script to inspect binexport2 file

* inspect-binexport: fix xref symbols

* inspect-binexport: factor out the index building

* binexport: move index to binexport extractor module

* binexport: implement ELF/aarch64 GOT/thunk analyzer

* binexport: implement API features

* binexport: record the full vertex for a thunk

* binexport: learn to extract numbers

* binexport: number: skipped mapped numbers

* binexport: fix basic block address indexing

* binexport: rename function

* binexport: extract operand numbers

* binexport: learn to extract calls from characteristics

* binexport: learn to extract mnemonics

* pre-commit: skip protobuf file

* binexport: better search for sample file

* loader: add file extractors for BinExport2

* binexport: remove extra parameter

* new black config

* binexport: index string xrefs

* binexport: learn to extract bytes and strings

* binexport: cache parsed PE/ELF

* binexport: handle Ghidra SYMBOL numbers

* binexport2: handle binexport#78 (Ghidra only uses SYMBOL expresssions)

* main: write error output to stderr, not stdout

* scripts: add example detect-binexport2-capabilities.py

* detect-binexport2-capabilities: more documentation/examples

* elffile: recognize more architectures

* binexport: handle read_memory errors

* binexport: index flow graphs by address

* binexport: cleanup logging

* binexport: learn to extract function names

* binexport: learn to extract all function features

* binexport: learn to extract bb tight loops

* elf: don't require vivisect just for type annotations

* main: remove unused imports

* rules: don't eagerly import ruamel until needed

* loader: avoid eager imports of some backend-related code

* changelog

* fmt

* binexport: better render optional fields

* fix merge conflicts

* fix formatting

* remove Ghidra data reference madness

* handle PermissionError when searching sample file for BinExport2 file

* handle PermissionError when searching sample file for BinExport2 file

* add Android as valid OS

* inspect-binexport: strip strings

* inspect-binexport: render operands

* fix lints

* ruff: update config layout

* inspect-binexport: better align comments/xrefs

* use explicit search paths to get sample for BinExport file

* add initial BinExport tests

* add/update BinExport tests and minor fixes

* inspect-binexport: add perf tracking

* inspect-binexport: cache rendered operands

* lints

* do not extract number features for ret instructions

* Fix BinExport's "tight loop" feature extraction.

`idx.target_edges_by_basic_block_index[basic_block_index]` is of type
`List[Edges]`. The index `basic_block_index` was definitely not an
element.

* inspect-binexport: better render data section

* linters

* main: accept --format=binexport2

* binexport: insn: add support for parsing bare immediate int operands

* binexport2: bb: fix tight loop detection

ref #2050

* binexport: api: generate variations of Win32 APIs

* lints

* binexport: index: don't assume instruction index is 1:1 with address

* be2: index instruction addresses

* be2: temp remove bytes feature processing

* binexport: read memory from an address space extracted from PE/ELF

closes #2061

* be2: resolve thunks to imported functions

* be2: check for be2 string reference before bytes/string extraction overhead

* be2: remove unneeded check

* be2: do not process thunks

* be2: insn: polish thunk handling a bit

* be2: pre-compute thunk targets

* parse negative numbers

* update tests to use Ghidra-generated BinExport file

* remove unused import

* black reformat

* run tests always (for now)

* binexport: tests: fix test case

* binexport: extractor: fix insn lint

* binexport: addressspace: use base address recovered from binexport file

* Add nzxor charecteristic in BinExport extractor.

by referencing vivisect implementation.

* add tests, fix stack cookie detection

* test BinExport feature PRs

* reformat and fix

* complete TODO descriptions

* wip tests

* binexport: add typing where applicable (#2106)

* binexport2: revert import names from BinExport2 proto

binexport2_pb.BinExport2 isnt a package so we can't import it like:

    from ...binexport2_pb.BinExport2 import CallGraph

* fix stack offset numbers and disable offset tests

* xfail OperandOffset

* generate symbol variants

* wip: read negative numbers

* update tight loop tests

* binexport: fix function loop feature detection

* binexport: update binexport function loop tests

* binexport: fix lints and imports

* binexport: add back assert statement to thunk calculation

* binexport: update tests to use Ghidra binexport file

* binexport: add additional debug info to thunk calculation assert

* binexport: update unit tests to focus on Ghidra

* binexport: fix lints

* binexport: remove Ghidra symbol madness and fix x86/amd64 stack offset number tests

* binexport: use masking for Number features

* binexport: ignore call/jmp immediates for intel architecture

* binexport: check if immediate is a mapped address

* binexport: emit offset features for immediates likely structure offsets

* binexport: add twos complement wrapper insn.py

* binexport: add support for x86 offset features

* binexport: code refactor

* binexport: init refactor for multi-arch instruction feature parsing

* binexport: intel: emit indirect call characteristic

* binexport: use helper method for instruction mnemonic

* binexport: arm: emit offset features from stp instruction

* binexport: arm: emit indirect call characteristic

* binexport: arm: improve offset feature extraction

* binexport: add workaroud for Ghidra bug that results in empty operands (no expressions)

* binexport: skip x86 stack string tests

* binexport: update mimikatz.exe_ feature count tests for Ghidra

* core: loader: update binja import

* core: loader: update binja imports

* binexport: arm: ignore number features for add instruction manipulating stack

* binexport: update unit tests

* binexport: arm: ignore number features for sub instruction manipulating stack

* binexport: arm: emit offset features for add instructions

* binexport: remove TODO from tests workflow

* binexport: update CHANGELOG

* binexport: remove outdated TODOs

* binexport: re-enable support for data references in inspect-binexport2.py

* binexport: skip data references to code

* binexport: remove outdated TODOs

* Update scripts/inspect-binexport2.py

* Update CHANGELOG.md

* Update capa/helpers.py

* Update capa/features/extractors/common.py

* Update capa/features/extractors/binexport2/extractor.py

* Update capa/features/extractors/binexport2/arch/arm/insn.py

Co-authored-by: Moritz <mr-tz@users.noreply.github.com>

* initial add

* test binexport scripts

* add tests using small ARM ELF

* add method to get instruction by address

* index instructions by address

* adjust and extend tests

* handle operator with no children bug

* binexport: use instruction address index

ref: https://github.com/mandiant/capa/pull/1950/files#r1728570811

* inspect binexport: handle lsl with no children

add pruning phase to expression tree building
to remove known-bad branches. This might address
some of the data we're seeing due to:
https://github.com/NationalSecurityAgency/ghidra/issues/6821

Also introduces a --instruction optional argument
to dump the details of a specific instruction.

* binexport: consolidate expression tree logic into helpers

* binexport: index instruction indices by address

* binexport: introduce instruction pattern matching

Introduce intruction pattern matching to declaratively
describe the instructions and operands that we want to
extract. While there's a bit more code, its much more
thoroughly tested, and is less brittle than the prior
if/else/if/else/if/else implementation.

* binexport: helpers: fix missing comment words

* binexport: update tests to reflect updated test files

* remove testing of feature branch

---------

Co-authored-by: Moritz <mr-tz@users.noreply.github.com>
Co-authored-by: Mike Hunhoff <mike.hunhoff@gmail.com>
Co-authored-by: mr-tz <moritz.raabe@mandiant.com>
Co-authored-by: Lin Chen <larch.lin.chen@gmail.com>
2024-09-12 10:09:05 -06:00
mr-tz
9eab7eb143 update names 2024-08-26 10:11:51 +00:00
mr-tz
e8550f242c rename using dashes for consistency 2024-08-26 09:55:00 +00:00
Yacine Elhamer
fccb533841 test/scripts.py: bugfix 2024-07-01 21:59:28 +01:00
Yacine Elhamer
3b165c3d8e test:scripts.py: add tests for show-features.py process filtering 2024-07-01 21:41:46 +01:00
mr-tz
97a3fba2c9 fix black 2024-06-12 09:24:16 +00:00
ReWithMe
52e24e560b FEAT(capa2sarif) Add SARIF conversion script from json output (#2093)
* feat(capa2sarif): add new sarif conversion script converting json output to sarif schema, update dependencies, and update changelog

* fix(capa2sarif): removing copy and paste transcription errors

* fix(capa2sarif): remove dependencies from pyproject toml to guarded import statements

* chore(capa2sarif): adding node in readme specifying dependency and applied auto formatter for styling

* style(capa2sarif): applied import sorting and fixed typo in invocations function

* test(capa2sarif): adding simple test for capa to sarif conversion script using existing result document

* style(capa2sarif): fixing typo in version string in usage

* style(capa2sarif): isort failing due to reordering of typehint imports

* style(capa2sarif): fixing import order as isort on local machine was not updating code

---------

Co-authored-by: ReversingWithMe <ryanv@rewith.me>
Co-authored-by: Willi Ballenthin <wballenthin@google.com>
2024-06-11 15:01:26 +02:00
Willi Ballenthin
76a4a5899f test_scripts: avoid unsupported logic combinations 2024-06-07 05:54:49 +02:00
N0stalgikow
0eb4291b25 Updating copyright across all files based on when it was first introduced. (#2027)
* updating copyright, back to the date of origin of file

* updating regex to account for linter violation
2024-03-13 14:04:53 +01:00
Moritz
2c93c5fc83 lint: get backend from format (#1964)
* get backend from format

* add lint.py script test

* create FakeArgs object

* adjust EOL handling in lints

---------

Co-authored-by: Willi Ballenthin <wballenthin@google.com>
2024-02-01 11:33:16 +01:00
Yacine
d66f834e54 Update tests/test_scripts.py
Co-authored-by: Moritz <mr-tz@users.noreply.github.com>
2023-08-24 13:48:32 +02:00
Yacine Elhamer
9eb1255b29 cape2yara.py: update for use of scopes, and fix bug 2023-08-24 14:32:49 +02:00
Willi Ballenthin
4978aa74e7 tests: temporarily xfail script test
closes #1717
2023-08-15 08:13:14 +00:00
Willi Ballenthin
c1fbb27d73 Merge branch 'master' into dynamic-feature-extraction 2023-08-10 13:21:49 +00:00
Aayush Goel
232c9ce35c Add test for script & output rendered 2023-08-07 22:43:25 +05:30
Yacine Elhamer
462024ad03 update tests to explicitely specify scopes 2023-08-01 07:41:47 +01:00
Yacine Elhamer
16e32f8441 add tests 2023-07-27 10:31:45 +01:00
Yacine Elhamer
e38e56ccf6 Merge remote-tracking branch 'parentrepo/dynamic-feature-extraction' into sync-1657 2023-07-20 09:33:48 +01:00
Willi Ballenthin
c86ab51210 fix copyright headers everywhere 2023-07-13 05:03:33 +02:00
Yacine Elhamer
f86ecfe446 Merge remote-tracking branch 'parentrepo/dynamic-feature-extraction' into analysis-flavor 2023-07-11 10:43:31 +01:00
Aayush Goel
ef39bc3c3a Merged Changes from PR #1591 2023-07-11 01:14:38 +05:30
Aayush Goel
8e346cb411 Merge branch 'Aayush-Goel-04/Issue#1534' of https://github.com/Aayush-Goel-04/capa into Aayush-Goel-04/Issue#1534 2023-07-11 00:59:21 +05:30
Willi Ballenthin
4a49543d12 introduce flake8-print linter 2023-07-09 22:44:47 +02:00
Aayush Goel
673af45c55 Update args.sample type to Path and str vs as_posix comparisons 2023-07-09 16:02:28 +05:30
Aayush Goel
a8f1067f8a Fixed Path issue in cache-ruleset.py 2023-07-07 12:39:18 +05:30
Yacine Elhamer
a8f722c4de xfail tests that require the old ruleset 2023-07-06 18:15:02 +01:00
Yacine Elhamer
32f936ce8c address review comments 2023-07-06 17:17:18 +01:00
Willi Ballenthin
90e607fe9a flake8 2023-07-06 18:11:48 +02:00
Willi Ballenthin
47074fd129 fix ruff issues 2023-07-06 17:49:40 +02:00
Aayush Goel
c0d712acea Changes os.path to pathlib.Path usage
changed args.rules , args.signatures types in handle_common_args.
2023-07-06 05:12:50 +05:30
Aayush Goel
52c3ea733b Update tests/test_scripts.py
Co-authored-by: Moritz <mr-tz@users.noreply.github.com>
2023-05-24 15:39:24 +05:30
Aayush Goel
acdaeb26d3 Update test_scripts.py 2023-05-20 13:09:48 +05:30
Aayush Goel
0afc16fd02 Update test rules to test script 2023-05-17 23:31:37 +05:30
Aayush Goel
931dcb1dc5 Update test_scripts.py 2023-05-15 23:35:11 +05:30
Aayush Goel
12c191582f Update tests/test_scripts.py
Co-authored-by: Moritz <mr-tz@users.noreply.github.com>
2023-05-15 22:58:19 +05:30
Aayush Goel
807efec40f Create RuleSet to test overlap script 2023-05-12 22:44:26 +05:30
Aayush Goel
41ff457d65 Update tests/test_scripts.py
Co-authored-by: Moritz <mr-tz@users.noreply.github.com>
2023-05-12 16:53:44 +05:30
Aayush Goel
eca86470c6 Update test_scripts.py
RULE_CONTENT can be modified as required
2023-05-11 14:12:52 +05:30
Aayush Goel
187a4712cb Update test_scripts.py
Here new_rule_path and expected_overlaps will be changed based on the new test rule designed.
Adding tests to check if the code works fine
2023-05-10 20:55:22 +05:30
Willi Ballenthin
1ccd2c4d0f tests: fix proto tests on windows (#1417)
closes  #1416
2023-03-30 11:45:03 +02:00
Willi Ballenthin
e8082173ad tests: add test demonstrating to/from proto scripts 2023-03-23 15:42:43 +01:00
manasghandat
1336796c0c code style : update remaining files (#1353)
* code style: update string formatting using fstrings

---------

Co-authored-by: Willi Ballenthin <willi.ballenthin@gmail.com>
Co-authored-by: Moritz <mr-tz@users.noreply.github.com>
2023-03-16 11:16:18 +01:00
Harsh Mehta
74009eb4a4 Updated Copyright (#1383)
* Updated Copyright
2023-03-14 17:58:43 +01:00
Willi Ballenthin
91818a116d scripts/capa_as_library: use new ResultDocument
closes #1071
2022-06-28 15:53:37 -06:00
Moritz Raabe
6860b9a040 address Willi's feedback 2021-06-29 21:16:31 +02:00
Moritz Raabe
5c8a4aafd7 test scripts and fix show-features 2021-06-29 21:16:31 +02:00