Commit Graph

945 Commits

Author SHA1 Message Date
Capa Bot
14e076864c Sync capa-testfiles submodule 2025-02-22 19:13:14 +00:00
Capa Bot
06fad4a89e Sync capa-testfiles submodule 2025-02-21 12:17:50 +00:00
vibhatsu
a8e8935212 Replace binascii and struct with native Python methods (#2582)
* refactor: replace binascii with bytes for hex conversions

Signed-off-by: vibhatsu <maulikbarot2915@gmail.com>

* refactor: replace struct unpacking with bytes conversion

Signed-off-by: vibhatsu <maulikbarot2915@gmail.com>

* simplify byte extraction for ELF header

Signed-off-by: vibhatsu <maulikbarot2915@gmail.com>

* Revert "refactor: replace struct unpacking with bytes conversion"

This reverts commit 483f8c9a85.

* update CHANGELOG

Signed-off-by: vibhatsu <maulikbarot2915@gmail.com>

---------

Signed-off-by: vibhatsu <maulikbarot2915@gmail.com>
Co-authored-by: Willi Ballenthin <wballenthin@google.com>
2025-02-04 09:53:36 +01:00
Willi Ballenthin
6d19226ee9 rules: scopes can now have subscope blocks with same scope (#2584) 2025-02-03 19:54:05 +01:00
Dhruva Kumar Kaushal
923e5e1130 use _yield from []_ to create empty generator when needed #2572 (#2581)
* use _yield from []_ to create empty generator when needed #2572

* Update PR with fixes

* solved CI code style error

* Fixed formatting with black

* Fixed formatting with black

* code styles error

* code styles error

* code styles error

* code style error

* Update capa-rules submodule to master

* Similar changes to other files

---------

Co-authored-by: Willi Ballenthin <wballenthin@google.com>
2025-02-03 16:25:59 +01:00
Willi Ballenthin
990fd20757 update submodules 2025-01-29 02:25:06 -07:00
Willi Ballenthin
cdc1cb7afd rename "sequence" scope to "span of calls" scope
pep8

fix ref

update submodules

update testfiles submodule

duplicate variable
2025-01-29 02:25:06 -07:00
Willi Ballenthin
a1d46bc3c0 sequence: don't update feature locations in place
pep8
2025-01-29 02:25:06 -07:00
Willi Ballenthin
f55086c212 sequence: refactor into SequenceMatcher
contains the call ids for all the calls within the sequence, so we know
where to look for related matched.

sequence: refactor SequenceMatcher

sequence: don't use sequence addresses

sequence: remove sequence address
2025-01-29 02:25:06 -07:00
Willi Ballenthin
39319c57a4 sequence: documentation and tests
sequence: add more tests
2025-01-29 02:25:06 -07:00
Willi Ballenthin
294ff34a30 sequence: only match first overlapping sequence
also, for repeating behavior, match only the first instance.
2025-01-29 02:25:06 -07:00
Willi Ballenthin
b06fea130c dynamic: add sequence scope
addresses discussion in
https://github.com/mandiant/capa-rules/discussions/951

pep8

sequence: add test showing multiple sequences overlapping a single event
2025-01-29 02:25:06 -07:00
Willi Ballenthin
8d17319128 capabilities: use dataclasses to represent complicated return types
foo
2025-01-29 02:25:06 -07:00
Mike Hunhoff
160ce73a35 vmray: loosen file checks to enable processing of additional file types (#2571)
* vmray: loosen file checks to enable addtional file types

* additional refactor to loosen file checks

* update CHANGELOG

* cleanup comments and small code refactor

* fix lints

* use NO_ADDRESS for submissions that don't have a base address

* update comments

* add test for ps1 trace
2025-01-23 12:47:36 -07:00
Capa Bot
3702baf9a9 Sync capa-testfiles submodule 2025-01-23 18:36:54 +00:00
Capa Bot
23cf2799ca Sync capa-testfiles submodule 2025-01-21 16:47:14 +00:00
Capa Bot
726c89794f Sync capa-testfiles submodule 2025-01-17 12:59:22 +00:00
Willi Ballenthin
72fe291742 strings: fix type hints and uncovered bugs (#2555)
* strings: fix type hints and uncovered bugs

changelog

add strings tests

strings: fix buf_filled_with

fix strings tests

refactor: optimize and document buf_filled_with function in strings.py

docs: add docstring to buf_filled_with function

doc

strings: add typing

* strings: more validation and testing

thanks @fariss

* copyright
2025-01-16 01:59:16 -07:00
Ana Maria Martinez Gomez
3cd97ae9f2 [copyright + license] Fix headers
Replace the header from source code files using the following script:
```Python
for dir_path, dir_names, file_names in os.walk("capa"):
    for file_name in file_names:
        # header are only in `.py` and `.toml` files
        if file_name[-3:] not in (".py", "oml"):
            continue
        file_path = f"{dir_path}/{file_name}"
        f = open(file_path, "rb+")
        content = f.read()
        m = re.search(OLD_HEADER, content)
        if not m:
            continue
        print(f"{file_path}: {m.group('year')}")
        content = content.replace(m.group(0), NEW_HEADER % m.group("year"))
        f.seek(0)
        f.write(content)
```

Some files had the copyright headers inside a `"""` comment and needed
manual changes before applying the script. `hook-vivisect.py` and
`pyinstaller.spec` didn't include the license in the header and also
needed manual changes.

The old header had the confusing sentence `All rights reserved`, which
does not make sense for an open source license. Replace the header by
the default Google header that corrects this issue and keep capa
consistent with other Google projects.

Adapt the linter to work with the new header.

Replace also the copyright text in the `web/public/index.html` file for
consistency.
2025-01-15 08:52:42 -07:00
Xusheng
4448d612f1 binja: fix up the analysis for the al-khaser_x64.exe_ file. Fix https://github.com/mandiant/capa/issues/2507 2024-12-04 09:36:08 +01:00
Xusheng
d7cf8d1251 Revert "skip test where BN misses the function"
This reverts commit 9ad3f06e1d.
2024-12-04 09:36:08 +01:00
Moritz
e1c786466a Merge pull request #2518 from mandiant/bn/skip-test
skip test where BN misses the function
2024-12-03 14:05:24 +01:00
mr-tz
9ad3f06e1d skip test where BN misses the function 2024-12-03 11:09:38 +00:00
Capa Bot
201ec07b58 Sync capa-testfiles submodule 2024-12-03 08:34:05 +00:00
Capa Bot
c85be8dc72 Sync capa-testfiles submodule 2024-12-03 08:26:34 +00:00
Xusheng
28fcd10d2e Add a unit test for Binary Ninja database 2024-12-02 23:34:07 +08:00
Capa Bot
8cfccbcb44 Sync capa-testfiles submodule 2024-11-28 10:25:40 +00:00
Xusheng
ca7580d417 Update Binary Ninja version to 4.2 (#2499) 2024-11-25 21:50:53 +01:00
Capa Bot
4398b8ac31 Sync capa-testfiles submodule 2024-11-11 15:21:56 +00:00
Capa Bot
35767e6c6a Sync capa-testfiles submodule 2024-10-22 15:29:01 +00:00
mr-tz
2987eeb0ac update type annotations
tmp
2024-10-22 09:38:25 +00:00
Moritz
030954d556 Merge pull request #2433 from mandiant/fix/vmray-string-call-args
fix backslash handling in string call arguments
2024-10-03 11:28:34 +02:00
Capa Bot
389a5eb84f Sync capa-testfiles submodule 2024-10-02 16:56:11 +00:00
mr-tz
6d3b96f0b0 fix backslash handling in string call arguments 2024-10-02 16:54:38 +00:00
Capa Bot
108bd7f224 Sync capa-testfiles submodule 2024-09-30 12:08:25 +00:00
Capa Bot
80899f3f70 Sync capa-testfiles submodule 2024-09-27 09:53:30 +00:00
Moritz
ff1043e976 Merge branch 'master' into fix/2408 2024-09-27 09:35:24 +02:00
Fariss
51a4eb46b8 replace tqdm, termcolor, tabulate with rich (#2374)
* logging: use rich handler for logging

* tqdm: remove unneeded redirecting_print_to_tqdm function

* tqdm: introduce `CapaProgressBar` rich `Progress` bar

* tqdm: replace tqdm with rich Progress bar

* tqdm: remove tqdm dependency

* termcolor: replace termcolor and update `scripts/`

* tests: update `test_render.py` to use rich.console.Console

* termcolor: remove termcolor dependency

* capa.render.utils: add `write` & `writeln` methods to subclass `Console`

* update markup util functions to use fmt strings

* tests: update `test_render.py` to use `capa.render.utils.Console`

* replace kwarg `end=""` with `write` and `writeln` methods

* tabulate: replace tabulate with `rich.table`

* tabulate: remove `tabulate` and its dependency `wcwidth`

* logging: handle logging in `capa.main`

* logging: set up logging in `capa.main`

this commit sets up logging in `capa.main` and uses a shared
`log_console` in `capa.helpers` for logging purposes

* changelog: replace packages with rich

* remove entry from pyinstaller and unneeded progress.update call

* update requirements.txt

* scripts: use `capa.helpers.log_console` in `CapaProgressBar`

* logging: configure root logger to use `RichHandler`

* remove unused import `inspect`
2024-09-27 09:34:21 +02:00
Mike Hunhoff
bfcc705117 dynamic: vmray: remove redundant test 2024-09-26 14:42:08 -06:00
Mike Hunhoff
834150ad1d dynamic: drakvuf: fix A/W API detection 2024-09-26 14:36:16 -06:00
Mike Hunhoff
31ec208a9b dynamic: cape: fix A/W API detection 2024-09-26 14:27:45 -06:00
Mike Hunhoff
a5d9459c42 dynamic: vmray: fix A/W API detection 2024-09-26 14:15:21 -06:00
Moritz
06271a88d4 Fix VMRay missing process data (#2396)
* get all processes, see #2394

* add tests for process recording

* rename symbols for clarification

* handle single and list entries

* update changelog

* dynamic: vmray: use monitor IDs to track processes and threads

* dynamic: vmray: code refactor

* dynamic: vmray: add sanity checks when processing monitor processes

* dynamic: vmray: remove unnecessary keys() access

* dynamic: vmray: clarify comments

* Update CHANGELOG.md

Co-authored-by: Willi Ballenthin <wballenthin@google.com>

* dynamic: vmray: update CHANGELOG

---------

Co-authored-by: Mike Hunhoff <mike.hunhoff@gmail.com>
Co-authored-by: Willi Ballenthin <wballenthin@google.com>
2024-09-26 13:57:30 -06:00
Capa Bot
9975f769f9 Sync capa-testfiles submodule 2024-09-26 17:34:51 +00:00
Capa Bot
12337be2b7 Sync capa-testfiles submodule 2024-09-25 09:17:50 +00:00
Capa Bot
6eda8c9713 Sync capa-testfiles submodule 2024-09-24 11:29:53 +00:00
Capa Bot
22e88c860f Sync capa-testfiles submodule 2024-09-24 11:25:28 +00:00
Capa Bot
25ca29573c Sync capa-testfiles submodule 2024-09-16 12:18:40 +00:00
Capa Bot
8b22a7fca2 Sync capa-testfiles submodule 2024-09-13 12:59:45 +00:00
Willi Ballenthin
ee17d75be9 implement BinExport2 backend (#1950)
* elf: os: detect Android via clang compiler .ident note

* elf: os: detect Android via dependency on liblog.so

* main: split main into a bunch of "main routines"

[wip] since there are a few references to BinExport2
that are in progress elsewhre. Next commit will remove them.

* features: add BinExport2 declarations

* BinExport2: initial skeleton of feature extraction

* main: remove references to wip BinExport2 code

* changelog

* main: rename first position argument "input_file"

closes #1946

* main: linters

* main: move rule-related routines to capa.rules

ref #1821

* main: extract routines to capa.loader module

closes #1821

* add loader module

* loader: learn to load freeze format

* freeze: use new cli arg handling

* Update capa/loader.py

Co-authored-by: Moritz <mr-tz@users.noreply.github.com>

* main: remove duplicate documentation

* main: add doc about where some functions live

* scripts: migrate to new main wrapper helper functions

* scripts: port to main routines

* main: better handle auto-detection of backend

* scripts: migrate bulk-process to main wrappers

* scripts: migrate scripts to main wrappers

* main: rename *_from_args to *_from_cli

* changelog

* cache-ruleset: remove duplication

* main: fix tag handling

* cache-ruleset: fix cli args

* cache-ruleset: fix special rule cli handling

* scripts: fix type bytes

* main: nicely format debug messages

* helpers: ensure log messages aren't very long

* flake8 config

* binexport2: formatting

* loader: learn to load BinExport2 files

* main: debug log the format and backend

* elf: add more arch constants

* binexport: parse global features

* binexport: extract file features

* binexport2: begin to enumerate function/bb/insns

* binexport: pass context to function/bb/insn extractors

* binexport: linters

* binexport: linters

* scripts: add script to inspect binexport2 file

* inspect-binexport: fix xref symbols

* inspect-binexport: factor out the index building

* binexport: move index to binexport extractor module

* binexport: implement ELF/aarch64 GOT/thunk analyzer

* binexport: implement API features

* binexport: record the full vertex for a thunk

* binexport: learn to extract numbers

* binexport: number: skipped mapped numbers

* binexport: fix basic block address indexing

* binexport: rename function

* binexport: extract operand numbers

* binexport: learn to extract calls from characteristics

* binexport: learn to extract mnemonics

* pre-commit: skip protobuf file

* binexport: better search for sample file

* loader: add file extractors for BinExport2

* binexport: remove extra parameter

* new black config

* binexport: index string xrefs

* binexport: learn to extract bytes and strings

* binexport: cache parsed PE/ELF

* binexport: handle Ghidra SYMBOL numbers

* binexport2: handle binexport#78 (Ghidra only uses SYMBOL expresssions)

* main: write error output to stderr, not stdout

* scripts: add example detect-binexport2-capabilities.py

* detect-binexport2-capabilities: more documentation/examples

* elffile: recognize more architectures

* binexport: handle read_memory errors

* binexport: index flow graphs by address

* binexport: cleanup logging

* binexport: learn to extract function names

* binexport: learn to extract all function features

* binexport: learn to extract bb tight loops

* elf: don't require vivisect just for type annotations

* main: remove unused imports

* rules: don't eagerly import ruamel until needed

* loader: avoid eager imports of some backend-related code

* changelog

* fmt

* binexport: better render optional fields

* fix merge conflicts

* fix formatting

* remove Ghidra data reference madness

* handle PermissionError when searching sample file for BinExport2 file

* handle PermissionError when searching sample file for BinExport2 file

* add Android as valid OS

* inspect-binexport: strip strings

* inspect-binexport: render operands

* fix lints

* ruff: update config layout

* inspect-binexport: better align comments/xrefs

* use explicit search paths to get sample for BinExport file

* add initial BinExport tests

* add/update BinExport tests and minor fixes

* inspect-binexport: add perf tracking

* inspect-binexport: cache rendered operands

* lints

* do not extract number features for ret instructions

* Fix BinExport's "tight loop" feature extraction.

`idx.target_edges_by_basic_block_index[basic_block_index]` is of type
`List[Edges]`. The index `basic_block_index` was definitely not an
element.

* inspect-binexport: better render data section

* linters

* main: accept --format=binexport2

* binexport: insn: add support for parsing bare immediate int operands

* binexport2: bb: fix tight loop detection

ref #2050

* binexport: api: generate variations of Win32 APIs

* lints

* binexport: index: don't assume instruction index is 1:1 with address

* be2: index instruction addresses

* be2: temp remove bytes feature processing

* binexport: read memory from an address space extracted from PE/ELF

closes #2061

* be2: resolve thunks to imported functions

* be2: check for be2 string reference before bytes/string extraction overhead

* be2: remove unneeded check

* be2: do not process thunks

* be2: insn: polish thunk handling a bit

* be2: pre-compute thunk targets

* parse negative numbers

* update tests to use Ghidra-generated BinExport file

* remove unused import

* black reformat

* run tests always (for now)

* binexport: tests: fix test case

* binexport: extractor: fix insn lint

* binexport: addressspace: use base address recovered from binexport file

* Add nzxor charecteristic in BinExport extractor.

by referencing vivisect implementation.

* add tests, fix stack cookie detection

* test BinExport feature PRs

* reformat and fix

* complete TODO descriptions

* wip tests

* binexport: add typing where applicable (#2106)

* binexport2: revert import names from BinExport2 proto

binexport2_pb.BinExport2 isnt a package so we can't import it like:

    from ...binexport2_pb.BinExport2 import CallGraph

* fix stack offset numbers and disable offset tests

* xfail OperandOffset

* generate symbol variants

* wip: read negative numbers

* update tight loop tests

* binexport: fix function loop feature detection

* binexport: update binexport function loop tests

* binexport: fix lints and imports

* binexport: add back assert statement to thunk calculation

* binexport: update tests to use Ghidra binexport file

* binexport: add additional debug info to thunk calculation assert

* binexport: update unit tests to focus on Ghidra

* binexport: fix lints

* binexport: remove Ghidra symbol madness and fix x86/amd64 stack offset number tests

* binexport: use masking for Number features

* binexport: ignore call/jmp immediates for intel architecture

* binexport: check if immediate is a mapped address

* binexport: emit offset features for immediates likely structure offsets

* binexport: add twos complement wrapper insn.py

* binexport: add support for x86 offset features

* binexport: code refactor

* binexport: init refactor for multi-arch instruction feature parsing

* binexport: intel: emit indirect call characteristic

* binexport: use helper method for instruction mnemonic

* binexport: arm: emit offset features from stp instruction

* binexport: arm: emit indirect call characteristic

* binexport: arm: improve offset feature extraction

* binexport: add workaroud for Ghidra bug that results in empty operands (no expressions)

* binexport: skip x86 stack string tests

* binexport: update mimikatz.exe_ feature count tests for Ghidra

* core: loader: update binja import

* core: loader: update binja imports

* binexport: arm: ignore number features for add instruction manipulating stack

* binexport: update unit tests

* binexport: arm: ignore number features for sub instruction manipulating stack

* binexport: arm: emit offset features for add instructions

* binexport: remove TODO from tests workflow

* binexport: update CHANGELOG

* binexport: remove outdated TODOs

* binexport: re-enable support for data references in inspect-binexport2.py

* binexport: skip data references to code

* binexport: remove outdated TODOs

* Update scripts/inspect-binexport2.py

* Update CHANGELOG.md

* Update capa/helpers.py

* Update capa/features/extractors/common.py

* Update capa/features/extractors/binexport2/extractor.py

* Update capa/features/extractors/binexport2/arch/arm/insn.py

Co-authored-by: Moritz <mr-tz@users.noreply.github.com>

* initial add

* test binexport scripts

* add tests using small ARM ELF

* add method to get instruction by address

* index instructions by address

* adjust and extend tests

* handle operator with no children bug

* binexport: use instruction address index

ref: https://github.com/mandiant/capa/pull/1950/files#r1728570811

* inspect binexport: handle lsl with no children

add pruning phase to expression tree building
to remove known-bad branches. This might address
some of the data we're seeing due to:
https://github.com/NationalSecurityAgency/ghidra/issues/6821

Also introduces a --instruction optional argument
to dump the details of a specific instruction.

* binexport: consolidate expression tree logic into helpers

* binexport: index instruction indices by address

* binexport: introduce instruction pattern matching

Introduce intruction pattern matching to declaratively
describe the instructions and operands that we want to
extract. While there's a bit more code, its much more
thoroughly tested, and is less brittle than the prior
if/else/if/else/if/else implementation.

* binexport: helpers: fix missing comment words

* binexport: update tests to reflect updated test files

* remove testing of feature branch

---------

Co-authored-by: Moritz <mr-tz@users.noreply.github.com>
Co-authored-by: Mike Hunhoff <mike.hunhoff@gmail.com>
Co-authored-by: mr-tz <moritz.raabe@mandiant.com>
Co-authored-by: Lin Chen <larch.lin.chen@gmail.com>
2024-09-12 10:09:05 -06:00