* elf: read segment memory size
* elf: add routine to read mapped memory
* elf: better detect OS for binaries compiled by Go
* elf: guess OS from Go source filenames
* changelog
* elf: mypy
* merge
* elf: add OS detection based on vDSO strings
* elf: document VTGrep searches
* elf: describe further technique to identify Go binaries
* elf: search for `.go.buildinfo` section via @yelhamer
* black
* elf: detect Alpine Linux ident
* elf: log interest symtab entries
* tests: add test for OS detection by Go buildinfo
* loader: handle missing viv modules
* pre-commit: run deptry before tests (which are slow)
* loader: describe removing viv symbolic switch solver
* pyproject: add PyGithub for deptry
* black
* feat(capa2sarif): add new sarif conversion script converting json output to sarif schema, update dependencies, and update changelog
* fix(capa2sarif): removing copy and paste transcription errors
* fix(capa2sarif): remove dependencies from pyproject toml to guarded import statements
* chore(capa2sarif): adding node in readme specifying dependency and applied auto formatter for styling
* style(capa2sarif): applied import sorting and fixed typo in invocations function
* test(capa2sarif): adding simple test for capa to sarif conversion script using existing result document
* style(capa2sarif): fixing typo in version string in usage
* style(capa2sarif): isort failing due to reordering of typehint imports
* style(capa2sarif): fixing import order as isort on local machine was not updating code
---------
Co-authored-by: ReversingWithMe <ryanv@rewith.me>
Co-authored-by: Willi Ballenthin <wballenthin@google.com>
Implement the "tighten rule pre-selection" algorithm described here:
https://github.com/mandiant/capa/issues/2063#issuecomment-2100498720
In summary:
> Rather than indexing all features from all rules,
> we should pick and index the minimal set (ideally, one) of
> features from each rule that must be present for the rule to match.
> When we have multiple candidates, pick the feature that is
> probably most uncommon and therefore "selective".
This seems to work pretty well. Total evaluations when running against
mimikatz drop from 19M to 1.1M (wow!) and capa seems to match around
3x more functions per second (wow wow).
When doing large scale runs, capa is about 25% faster when using the
vivisect backend (analysis heavy) or 3x faster when using the
upcoming BinExport2 backend (minimal analysis).
It was used in some places already, but now used everywhere consistently.
This should make it easier to refactor the FeatureSet type, if necessary,
because its easier to see all the places its used.
* Load .json.gz files directly
* Add helper function to load .json and replace json.load references
* add test and update change log
* add .json.gz in EXTENSIONS_DYNAMIC
Co-authored-by: Moritz <mr-tz@users.noreply.github.com>
---------
Co-authored-by: Moritz <mr-tz@users.noreply.github.com>
* get backend from format
* add lint.py script test
* create FakeArgs object
* adjust EOL handling in lints
---------
Co-authored-by: Willi Ballenthin <wballenthin@google.com>
* main: split main into a bunch of "main routines"
[wip] since there are a few references to BinExport2
that are in progress elsewhre. Next commit will remove them.
* main: remove references to wip BinExport2 code
* changelog
* main: rename first position argument "input_file"
closes#1946
* main: linters
* main: move rule-related routines to capa.rules
ref #1821
* main: extract routines to capa.loader module
closes#1821
* add loader module
* loader: learn to load freeze format
* freeze: use new cli arg handling
* Update capa/loader.py
Co-authored-by: Moritz <mr-tz@users.noreply.github.com>
* main: remove duplicate documentation
* main: add doc about where some functions live
* scripts: migrate to new main wrapper helper functions
* scripts: port to main routines
* main: better handle auto-detection of backend
* scripts: migrate bulk-process to main wrappers
* scripts: migrate scripts to main wrappers
* main: rename *_from_args to *_from_cli
* changelog
* cache-ruleset: remove duplication
* main: fix tag handling
* cache-ruleset: fix cli args
* cache-ruleset: fix special rule cli handling
* scripts: fix type bytes
* main: remove old TODO message
* loader: fix references to binja extractor
---------
Co-authored-by: Moritz <mr-tz@users.noreply.github.com>