Compare commits

277 Commits

Author SHA1 Message Date
Ana Maria Martinez Gomez
3831f1c104 extractors: Do not use generate_api_features
`generate_api_features` was merged with the implementation of
`generate_import_features` and replaced by `generate_symbols`:
2b2656c2a3
Use the new function in the miasm backend implementation.
2021-02-05 15:41:13 +01:00
Ana Maria Martinez Gomez
dc828e82b3 extractors: add required loc_db
Since the following PR, miasm requires LocationDB in the object's
constructor instead of creating a new LocationDB:
https://github.com/cea-sec/miasm/pull/1274

This was not the case at the point I started the miasm backend
implementation. Adapt the code to work with this change, which also
means interacting with miasm in a better way.
2021-02-05 15:41:04 +01:00
Ana María Martínez Gómez
2e98ba990c tests: enable tests for miasm
Everything is red :( Some tests are failing due to the not yet
implemented features. In addition, it looks like miasm has problems
disassembling some of the used files.
2021-02-03 15:07:31 +01:00
Ana María Martínez Gómez
d008fef23f extractors: enable miasm in Python3
Do not make miasm the default until we have ensured everything works as
it should.
2021-02-03 15:07:31 +01:00
Ana María Martínez Gómez
fe458c387a extractors: use block and feature offset function
`f` and `bb` in miasm are not integers. Introduce `block_offset()` and
`feature_offset()` in the extractors and use them in main to solve this.

Related to https://github.com/cea-sec/miasm/pull/1277
2021-02-03 12:50:56 +01:00
Ana María Martínez Gómez
3e52c7de23 features: store mnemonics lower case
miasm extracts mnemonics in upper case, while the other backends extract
them in lower case. To ensure capa works with all of them, use lower case
in the Mnemonic constructor.
2021-02-03 12:50:56 +01:00
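A minimal sketch of the normalization this commit describes. The `Mnemonic` class below is a simplified stand-in written for illustration, not capa's actual feature class; only the lower-casing in the constructor comes from the commit message.

```python
class Mnemonic:
    """Simplified stand-in for a mnemonic feature (hypothetical, not capa's class)."""

    def __init__(self, value):
        # Normalize case so that "MOV" (as miasm reports it) and "mov"
        # (as other backends report it) produce the same feature.
        self.value = value.lower()

    def __eq__(self, other):
        return isinstance(other, Mnemonic) and self.value == other.value

    def __hash__(self):
        # Hash on the normalized value so equal features collapse in sets.
        return hash(self.value)
```

Normalizing in the constructor means every backend gets the same behavior without touching the individual extractors.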
Ana María Martínez Gómez
2d1e7946e3 extractors: Implement extract_insn_mnemonic_features
Extract insn mnemonic features in miasm.
2021-02-03 12:50:56 +01:00
Ana María Martínez Gómez
f2fe173ef3 extractors: Implement extract_insn_api_features
Extract insn API features in miasm.
2021-02-03 12:50:56 +01:00
Ana María Martínez Gómez
b2fc52d390 extractors: implement miasm insn features template
Add a template for insn features. These features need some work and
there are many of them, so I'll introduce them independently, each in
its own commit.
2021-02-03 12:50:56 +01:00
Ana María Martínez Gómez
5ba4629c3c extractors: implement miasm function features
Add function features.
2021-02-03 12:50:56 +01:00
Ana María Martínez Gómez
4fc9c77791 extractors: implement miasm basic block features
Add basic block features.
2021-02-03 12:50:55 +01:00
Ana María Martínez Gómez
31ba9ee1b3 extractors: Implement get_basic_blocks in miasm
Implement `get_basic_blocks` in `MiasmFeatureExtractor`.
2021-02-03 12:50:55 +01:00
Ana María Martínez Gómez
b4a808ac76 extractors: Implement get_functions in miasm
Implement `get_functions` in `MiasmFeatureExtractor`. It is a proof of
concept, which just treats every loc_key that is the target of a call as
a function.
This is enough to test feature extraction against the functions. A final
version should include other function recognition techniques and be
ported to miasm.
2021-02-03 12:50:55 +01:00
Ana María Martínez Gómez
0f030115d1 extractors: Implement cfg in miasm
Implement `_build_cfg()` in `MiasmFeatureExtractor`.

Co-authored-by: William Ballenthin <william.ballenthin@fireeye.com>
2021-02-03 12:50:55 +01:00
Ana María Martínez Gómez
42573d8df2 extractors: implement miasm file features
Begin to implement miasm backend. Add file features.

This implementation needs:
- https://github.com/cea-sec/miasm/pull/1273

Co-authored-by: William Ballenthin <william.ballenthin@fireeye.com>
2021-02-03 12:50:51 +01:00
Moritz
073c2b5754 Merge pull request #412 from fireeye/ida/meta-add-baseaddr
add imagebase to IDA meta data
2021-02-02 16:48:22 +01:00
mike-hunhoff
ef41d74b82 Merge pull request #411 from fireeye/fix/410
fixes #410
2021-02-02 08:38:23 -07:00
Moritz Raabe
84b3f38810 add imagebase to IDA meta data 2021-02-02 13:54:46 +01:00
mike-hunhoff
2288f38a11 Update capa/main.py
Co-authored-by: Willi Ballenthin <willi.ballenthin@gmail.com>
2021-02-01 12:45:36 -07:00
mike-hunhoff
dbc4e06657 Update capa/main.py
Co-authored-by: Willi Ballenthin <willi.ballenthin@gmail.com>
2021-02-01 12:45:29 -07:00
Michael Hunhoff
2433777a76 fixes #410 2021-02-01 11:43:24 -07:00
Moritz
bb7001f5f2 Merge pull request #409 from fireeye/fix/extract-bytes
improve bytes feature extraction
2021-02-01 17:38:40 +01:00
Moritz Raabe
9b5aaa40de improve bytes feature extraction 2021-02-01 17:17:22 +01:00
Capa Bot
96d74f48f4 Sync capa rules submodule 2021-02-01 11:55:33 +00:00
Capa Bot
f07af25a6a Sync capa rules submodule 2021-01-28 16:52:21 +00:00
Willi Ballenthin
14e65c4601 Merge pull request #401 from fireeye/linter-format
Lint rule formatting and improved rule dump
2021-01-28 09:18:20 -07:00
Capa Bot
b5c2fb0259 Sync capa rules submodule 2021-01-28 16:06:09 +00:00
Capa Bot
92d98db7bb Sync capa-testfiles submodule 2021-01-28 15:25:17 +00:00
Moritz
e6f7ef604a Merge pull request #404 from fireeye/bugfix/403
fixing #403
2021-01-28 11:17:39 +01:00
Moritz Raabe
0eb8d3e47c fix time debug output 2021-01-28 11:09:25 +01:00
Moritz Raabe
072e30498b adjust negative hex numbers in to_yaml 2021-01-28 10:54:17 +01:00
Moritz Raabe
d6e73577af dont change quotes when dumping 2021-01-28 10:54:17 +01:00
Moritz Raabe
a81f98be8e manual adjust negative numbers 2021-01-28 10:54:17 +01:00
Moritz Raabe
0980e35c29 simplify string comparison 2021-01-28 10:54:17 +01:00
Moritz Raabe
336c2a3aff add option to only check reformat status 2021-01-28 10:54:17 +01:00
Moritz Raabe
e3055bc740 check rule format consistency 2021-01-28 10:54:17 +01:00
Capa Bot
9406e3dbfb Sync capa rules submodule 2021-01-28 09:52:43 +00:00
Moritz
5307b7e1b1 Merge pull request #408 from fireeye/fix/lint-lib-path
adjust expected lib path and log time
2021-01-28 10:28:30 +01:00
Moritz Raabe
f18a8f5b31 adjust expected lib path and log time 2021-01-28 10:18:03 +01:00
Moritz
cfe99c4b72 Merge pull request #407 from fireeye/fix/lint-logging
disable extractor progress
2021-01-28 09:25:07 +01:00
Moritz Raabe
0d439c0f55 disable extractor progress 2021-01-28 09:22:15 +01:00
Moritz
6288a96a8b Merge pull request #406 from fireeye/ci/disable-python36
Disable Python 3.6 tests
2021-01-28 08:35:42 +01:00
Moritz
819b6f6ccf Merge pull request #402 from fireeye/lib-rules-subscoped
potential fix for #398
2021-01-28 08:35:28 +01:00
Moritz Raabe
4bc06aa8cd closes #405 2021-01-28 08:23:15 +01:00
Moritz Raabe
7b64425c24 update doc and test case 2021-01-28 08:18:23 +01:00
Michael Hunhoff
44c9d6a22b fixing #403 2021-01-27 18:29:53 -07:00
Moritz Raabe
c750447d62 potential fix for #398 2021-01-27 17:59:56 +01:00
Willi Ballenthin
059ec8f3f2 Merge pull request #400 from fireeye/ci/enable-py39-2
bump smda, enable Python 3.9
2021-01-22 07:18:54 -07:00
Moritz Raabe
2c5508febd bump smda, enable Python 3.9 2021-01-22 10:00:25 +01:00
Capa Bot
905fff041b Sync capa rules submodule 2021-01-21 21:32:42 +00:00
Willi Ballenthin
20ce29b033 Merge pull request #396 from fireeye/dependabot/pip/smda-1.5.11
Bump smda from 1.5.10 to 1.5.11
2021-01-19 08:21:00 -07:00
Capa Bot
4bd93a680e Sync capa-testfiles submodule 2021-01-18 08:02:29 +00:00
dependabot[bot]
c9bf7f424d Bump smda from 1.5.10 to 1.5.11
Bumps [smda](https://github.com/danielplohmann/smda) from 1.5.10 to 1.5.11.
- [Release notes](https://github.com/danielplohmann/smda/releases)
- [Commits](https://github.com/danielplohmann/smda/commits)

Signed-off-by: dependabot[bot] <support@github.com>
2021-01-18 06:44:33 +00:00
Capa Bot
4cde2e1a78 Sync capa rules submodule 2021-01-16 15:39:09 +00:00
Capa Bot
48c045d381 Sync capa rules submodule 2021-01-12 18:30:44 +00:00
Capa Bot
2b385ead7f Sync capa rules submodule 2021-01-12 18:30:11 +00:00
Capa Bot
0fcc9f3df6 Sync capa-testfiles submodule 2021-01-12 18:27:32 +00:00
Capa Bot
b251202804 Sync capa-testfiles submodule 2021-01-12 18:27:11 +00:00
Capa Bot
6967010281 Sync capa-testfiles submodule 2021-01-12 18:26:12 +00:00
Capa Bot
7e0846e66a Sync capa rules submodule 2021-01-12 17:55:13 +00:00
Moritz
4e3daad96d Merge pull request #391 from fireeye/fix/freeze-base-addr
add base address to freeze
2021-01-11 11:30:29 +01:00
Capa Bot
37fb3da5db Sync capa rules submodule 2021-01-08 16:36:36 +00:00
Capa Bot
762f48957c Sync capa rules submodule 2021-01-08 15:16:32 +00:00
Capa Bot
c1af7b8783 Sync capa-testfiles submodule 2021-01-08 15:14:26 +00:00
Moritz Raabe
f89084677d add base address to freeze 2021-01-08 14:48:26 +01:00
Capa Bot
0716084bbb Sync capa-testfiles submodule 2021-01-08 08:46:53 +00:00
Capa Bot
a6c946e6c9 Sync capa rules submodule 2021-01-07 13:59:20 +00:00
Capa Bot
3f6e088faa Sync capa-testfiles submodule 2021-01-07 11:53:24 +00:00
Capa Bot
9abdd5813b Sync capa rules submodule 2021-01-07 07:47:28 +00:00
Capa Bot
f33ea36e6f Sync capa rules submodule 2021-01-05 15:49:04 +00:00
Moritz
8788e0a9c9 Merge pull request #388 from fireeye/ci/linter-update
lint with tags
2021-01-05 16:37:21 +01:00
Moritz Raabe
b1c1cb4b9b lint with --tag 2021-01-05 16:16:35 +01:00
Capa Bot
982d4ac472 Sync capa-testfiles submodule 2021-01-04 14:42:43 +00:00
Capa Bot
b7a8d667b9 Sync capa rules submodule 2021-01-04 12:51:43 +00:00
Capa Bot
8f8729df05 Sync capa-testfiles submodule 2020-12-30 19:06:28 +00:00
Capa Bot
e928d281dd Sync capa-testfiles submodule 2020-12-30 15:21:36 +00:00
Capa Bot
625583f5ab Sync capa rules submodule 2020-12-23 12:44:25 +00:00
Capa Bot
ab54553dd2 Sync capa rules submodule 2020-12-22 17:16:54 +00:00
Moritz
47bf7b1325 Merge pull request #375 from doomedraven/return_dict
add render to dict, is the same as default but just in dictionary so …
2020-12-22 15:52:50 +01:00
Moritz
145d75f579 Merge pull request #381 from fireeye/fix/viv-set-logger-levels
set level of more viv loggers explicitly
2020-12-22 15:52:05 +01:00
Capa Bot
01d976d7f7 Sync capa rules submodule 2020-12-22 13:17:37 +00:00
Capa Bot
095e3720ab Sync capa-testfiles submodule 2020-12-22 12:00:35 +00:00
Capa Bot
d62a37fe1f Sync capa-testfiles submodule 2020-12-21 16:17:33 +00:00
Capa Bot
5323f2fc31 Sync capa rules submodule 2020-12-17 17:14:43 +00:00
Capa Bot
5539cb0d08 Sync capa rules submodule 2020-12-17 17:12:21 +00:00
Capa Bot
76e80106d6 Sync capa-testfiles submodule 2020-12-17 09:29:56 +00:00
Capa Bot
9ab7b9a033 Sync capa rules submodule 2020-12-16 20:47:34 +00:00
Capa Bot
fe97d6a349 Sync capa-testfiles submodule 2020-12-15 19:23:15 +00:00
Capa Bot
2242c2afe8 Sync capa-testfiles submodule 2020-12-15 19:19:09 +00:00
Willi Ballenthin
ec25fb5c36 Merge pull request #384 from fireeye/dependabot/pip/smda-1.5.10
Bump smda from 1.5.9 to 1.5.10
2020-12-14 10:32:31 -07:00
dependabot[bot]
ce25f5cadd Bump smda from 1.5.9 to 1.5.10
Bumps [smda](https://github.com/danielplohmann/smda) from 1.5.9 to 1.5.10.
- [Release notes](https://github.com/danielplohmann/smda/releases)
- [Commits](https://github.com/danielplohmann/smda/commits)

Signed-off-by: dependabot[bot] <support@github.com>
2020-12-14 07:15:58 +00:00
Capa Bot
1099f40f19 Sync capa rules submodule 2020-12-12 05:43:31 +00:00
Capa Bot
70368b3f1e Sync capa rules submodule 2020-12-11 10:42:16 +00:00
Capa Bot
0181ebad45 Sync capa-testfiles submodule 2020-12-10 17:38:00 +00:00
DoomedRaven
e158e3f13c remove type hint to make CI happy 2020-12-08 21:46:39 +01:00
DoomedRaven
b1bbded23c black -l 120 . 2020-12-08 21:39:50 +01:00
DoomedRaven
b77d9d3738 isort --profile black --length-sort --line-width 120 capa_as_library.py 2020-12-08 21:34:42 +01:00
DoomedRaven
d0b2421752 isort capa_as_library.py 2020-12-08 20:53:26 +01:00
DoomedRaven
96b65a7c60 add example how to render it as library
```
>>> from capa_as_library import capa_details
>>> details = capa_details("/opt/CAPEv2/storage/analyses/83/binary", "dictionary")
>>> from pprint import pprint as pp
>>> pp(details)
{'ATTCK': {'DEFENSE EVASION': ['Obfuscated Files or Information [T1027]',
                               'Virtualization/Sandbox Evasion::System Checks '
                               '[T1497.001]'],
           'EXECUTION': ['Shared Modules [T1129]']},
 'CAPABILITY': {'anti-analysis/anti-vm/vm-detection': ['execute anti-VM '
                                                       'instructions (3 '
                                                       'matches)'],
                'anti-analysis/obfuscation/string/stackstring': ['contain '
                                                                 'obfuscated '
                                                                 'stackstrings'],
                'data-manipulation/encryption/rc4': ['encrypt data using RC4 '
                                                     'PRGA'],
                'executable/pe/section/rsrc': ['contain a resource (.rsrc) '
                                               'section'],
                'host-interaction/cli': ['accept command line arguments'],
                'host-interaction/environment-variable': ['query environment '
                                                          'variable'],
                'host-interaction/file-system/read': ['read .ini file',
                                                      'read file'],
                'host-interaction/file-system/write': ['write file (3 '
                                                       'matches)'],
                'host-interaction/process': ['get thread local storage value '
                                             '(3 matches)',
                                             'set thread local storage value '
                                             '(2 matches)'],
                'host-interaction/process/terminate': ['terminate process (3 '
                                                       'matches)'],
                'host-interaction/thread/terminate': ['terminate thread'],
                'linking/runtime-linking': ['link function at runtime (7 '
                                            'matches)',
                                            'link many functions at runtime'],
                'load-code/pe': ['parse PE header (3 matches)']},
 'MBC': {'ANTI-BEHAVIORAL ANALYSIS': ['Virtual Machine Detection::Instruction '
                                      'Testing [B0009.029]'],
         'ANTI-STATIC ANALYSIS': ['Disassembler Evasion::Argument Obfuscation '
                                  '[B0012.001]'],
         'CRYPTOGRAPHY': ['Encrypt Data::RC4 [C0027.009]',
                          'Generate Pseudo-random Sequence::RC4 PRGA '
                          '[C0021.004]']},
 'md5': 'ad56c384476a81faef9aebd60b2f4623',
 'path': '/opt/CAPEv2/storage/analyses/83/binary',
 'sha1': 'aa027d89f5d3f991ad3e14ffb681616a77621836',
 'sha256': '16995e059eb47de0b58a95ce2c3d863d964a7a16064d4298cee9db1de266e68d'}
>>>
```
2020-12-08 20:00:24 +01:00
Willi Ballenthin
177c90093e Merge pull request #380 from doomedraven/patch-1
fix is_ordinal IndexError
2020-12-08 09:21:53 -07:00
Moritz Raabe
28ee091107 set level of more viv loggers explicitly 2020-12-08 16:30:23 +01:00
doomedraven
64c71d8e6d fix is_ordinal IndexError
```
 Traceback (most recent call last):
   File "/opt/CAPE/utils/../lib/cuckoo/common/cape_utils.py", line 223, in flare_capa_details
     capabilities, counts = capa.main.find_capabilities(rules, extractor, disable_progress=True)
   File "/usr/local/lib/python2.7/dist-packages/capa/main.py", line 116, in find_capabilities
     function_matches, bb_matches, feature_count = find_function_capabilities(ruleset, extractor, f)
   File "/usr/local/lib/python2.7/dist-packages/capa/main.py", line 68, in find_function_capabilities
     for feature, va in extractor.extract_insn_features(f, bb, insn):
   File "/usr/local/lib/python2.7/dist-packages/capa/features/extractors/viv/__init__.py", line 84, in extract_insn_features
     for feature, va in capa.features.extractors.viv.insn.extract_features(f, bb, insn):
   File "/usr/local/lib/python2.7/dist-packages/capa/features/extractors/viv/insn.py", line 599, in extract_features
     for feature, va in insn_handler(f, bb, insn):
   File "/usr/local/lib/python2.7/dist-packages/capa/features/extractors/viv/insn.py", line 93, in extract_insn_api_features
     for name in capa.features.extractors.helpers.generate_symbols(dll, symbol):
   File "/usr/local/lib/python2.7/dist-packages/capa/features/extractors/helpers.py", line 61, in generate_symbols
     if not is_ordinal(symbol):
   File "/usr/local/lib/python2.7/dist-packages/capa/features/extractors/helpers.py", line 45, in is_ordinal
     return symbol[0] == "#"
 IndexError: string index out of range
```
2020-12-08 09:50:00 +01:00
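The traceback above ends in `symbol[0] == "#"`, which raises `IndexError` when `symbol` is an empty string. One possible guard is sketched below; it is an assumption about how such a check could be written, not necessarily the change made in this commit.

```python
def is_ordinal(symbol):
    """Return True if the import symbol looks like an ordinal such as "#1"."""
    # str.startswith() returns False for an empty string,
    # whereas symbol[0] would raise IndexError.
    return symbol.startswith("#")
```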
Moritz
9ce0c94e17 Merge pull request #379 from fireeye/fix/nzxor-xor-instructions
add more xor instructions
2020-12-08 09:37:35 +01:00
Moritz Raabe
08c3372635 add more xor instructions 2020-12-08 09:21:50 +01:00
Capa Bot
2fafc70b69 Sync capa-testfiles submodule 2020-12-07 18:06:53 +00:00
Capa Bot
0e62ebe3a2 Sync capa-testfiles submodule 2020-12-07 17:16:01 +00:00
Moritz
1cc4d20b89 Merge pull request #373 from fireeye/ci/setup-dependabot
add dependabot config
2020-12-07 18:03:57 +01:00
Capa Bot
af4889894a Sync capa rules submodule 2020-12-04 08:31:42 +00:00
Moritz
429a5e1ea3 Merge pull request #378 from fireeye/fix/viv-string-extractor
fix: add viv extract strings for i386ImmMemOper operands
2020-12-04 08:55:23 +01:00
Moritz Raabe
4ef860eb07 fix: add viv extract strings for i386ImmMemOper operands 2020-12-03 20:24:29 +01:00
Capa Bot
b59ebf30c6 Sync capa-testfiles submodule 2020-12-03 18:57:45 +00:00
Capa Bot
a1ae8d54a6 Sync capa rules submodule 2020-12-02 15:24:15 +00:00
Capa Bot
8155207bea Sync capa rules submodule 2020-12-02 15:13:30 +00:00
Capa Bot
337d2cfa6d Sync capa rules submodule 2020-12-02 15:12:27 +00:00
Capa Bot
df2229782b Sync capa rules submodule 2020-12-02 15:08:55 +00:00
doomedraven
5920552649 small improvements 2020-12-01 20:31:56 +01:00
doomedraven
b4827fcb00 add render to dict, is the same as default but just in dictionary so simplifies the integrations 2020-12-01 19:43:54 +01:00
Willi Ballenthin
63983ccb65 Merge pull request #372 from doomedraven/patch-1
Simple example how to use capa as library
2020-12-01 06:56:44 -07:00
Willi Ballenthin
eac7e2b749 capa_as_library: style and comments 2020-12-01 06:54:55 -07:00
Moritz Raabe
65a365bca1 update halo requirements py2/3 2020-12-01 11:46:53 +01:00
Moritz Raabe
fecd0e11eb add dependabot config 2020-12-01 11:46:14 +01:00
doomedraven
51ad526cfc Simple example how to use capa as library
Just a quick example of how to use capa as a library, to save someone the time of reading the code and scripts
2020-12-01 11:20:49 +01:00
Moritz
10a062017d Merge pull request #370 from fireeye/pin-smda
pin smda
2020-12-01 11:10:23 +01:00
Moritz Raabe
0d351794db pin smda
addresses #369
2020-12-01 11:02:36 +01:00
Capa Bot
067e3ffced Sync capa-testfiles submodule 2020-11-30 19:36:59 +00:00
Capa Bot
50d55fae56 Sync capa-testfiles submodule 2020-11-23 17:55:56 +00:00
Capa Bot
ce63628d3d Sync capa rules submodule 2020-11-19 15:43:59 +00:00
Capa Bot
13df7f90f6 Sync capa rules submodule 2020-11-19 15:09:24 +00:00
Capa Bot
f5099b873d Sync capa rules submodule 2020-11-19 11:40:38 +00:00
Capa Bot
70eb38895d Sync capa-testfiles submodule 2020-11-18 16:28:34 +00:00
Capa Bot
7aea9fa1d2 Sync capa rules submodule 2020-11-16 19:38:02 +00:00
Capa Bot
5d30be31e0 Sync capa rules submodule 2020-11-16 09:44:08 +00:00
Capa Bot
7abe66e3de Sync capa rules submodule 2020-11-16 06:40:23 +00:00
mike-hunhoff
49ef5e5e64 Merge pull request #364 from fireeye/viv/fix-353
improve viv extractor unicode string detection
2020-11-10 17:56:47 -07:00
Michael Hunhoff
c2266bc105 improve viv extractor unicode string detection with supporting unit test 2020-11-10 12:23:07 -07:00
Moritz
a813e219e6 Merge pull request #363 from fireeye/williballenthin-patch-1
ci: disable py3.9 testing
2020-11-09 21:14:36 +01:00
Moritz
1c1fb20546 Merge pull request #355 from danielplohmann/backend-smda
initial commit for backend-smda
2020-11-09 21:13:51 +01:00
Willi Ballenthin
65feb60bb8 ci: disable py3.9 testing 2020-11-09 13:06:37 -07:00
Daniel Plohmann (jupiter)
f7492c7dc7 throw UnsupportedRuntimeError if SmdaFeatureExtractor is used with a Python version < 3.0 2020-11-09 16:20:08 +01:00
Moritz Raabe
dfc805b89b improvements for PR #355 2020-11-09 13:39:19 +01:00
Moritz Raabe
75defc13a0 disable fail-fast for tests job 2020-11-09 13:22:23 +01:00
Daniel Plohmann (jupiter)
7d4888bb77 addressing the comments in the PR discussion 2020-11-06 10:09:06 +01:00
Daniel Plohmann (jupiter)
1a34029171 Merge branch 'master' of github.com:fireeye/capa into backend-smda 2020-11-06 09:50:09 +01:00
Willi Ballenthin
f6ad4652e4 Merge pull request #358 from fireeye/doc/pyinstaller
document PyInstaller build process
2020-11-05 09:19:51 -07:00
pnx@pyrite
1e25604b0b replacement test for nested x64 thunks - still needs to be verified for vivisect 2020-11-05 16:31:47 +01:00
pnx@pyrite
3a43ffa641 adjusted identification of thunks via SMDA. 2020-11-05 12:58:07 +01:00
Capa Bot
8f6bcf3d98 Sync capa rules submodule 2020-11-03 14:23:36 +00:00
Moritz Raabe
0fd9753681 document PyInstaller build process
closes #357
2020-11-03 15:03:32 +01:00
Capa Bot
76a04dfe25 Sync capa rules submodule 2020-11-03 13:20:30 +00:00
Capa Bot
16317182e3 Sync capa-testfiles submodule 2020-11-03 13:14:45 +00:00
Daniel Plohmann (jupiter)
6bcdf64f67 formatting 2020-10-30 15:34:02 +01:00
Daniel Plohmann (jupiter)
d276a07a71 comments on a test where disassembly differs among backends 2020-10-30 15:29:38 +01:00
Daniel Plohmann (jupiter)
f3b59b342a Merge branch 'backend-smda' of github.com:danielplohmann/capa into backend-smda 2020-10-30 15:25:45 +01:00
Daniel Plohmann (jupiter)
4a0f1f22ba test fixes 2020-10-30 15:25:42 +01:00
Jon Crussell
0c85e7604c use magical derefs
Found derefs in viv/insn.py, does exactly what we need!
2020-10-30 07:23:24 -07:00
Jon Crussell
8f6a46e2d8 add check for pointer to string
Check if memory referenced is a pointer to a string. Fixes mimikatz
string test.
2020-10-30 07:01:07 -07:00
Daniel Plohmann (jupiter)
74b2c18296 down to 14 failed 2020-10-29 20:05:50 +01:00
Jon Crussell
b12d0b6424 tests: add smda backend test
40 failed, 73 passed.
2020-10-29 09:56:28 -07:00
Daniel Plohmann (jupiter)
60ddf0400e addressing review 2020-10-29 17:47:10 +01:00
Daniel Plohmann (jupiter)
669d3484c0 Merge remote-tracking branch 'origin/master' into backend-smda 2020-10-29 17:38:21 +01:00
William Ballenthin
5420ad97a3 sync submodules 2020-10-29 09:42:56 -06:00
Daniel Plohmann (jupiter)
36822926af initial commit for backend-smda 2020-10-29 11:28:22 +01:00
Capa Bot
eef8f2e781 Sync capa rules submodule 2020-10-29 03:50:40 +00:00
Capa Bot
31ac667623 Sync capa rules submodule 2020-10-27 15:16:07 +00:00
Capa Bot
868ceb25bf Sync capa rules submodule 2020-10-27 15:15:30 +00:00
Capa Bot
ee3ab94774 Sync capa rules submodule 2020-10-27 15:15:04 +00:00
Capa Bot
1c47877a8c Sync capa rules submodule 2020-10-27 15:14:22 +00:00
Capa Bot
84698462f3 Sync capa rules submodule 2020-10-27 15:13:25 +00:00
Capa Bot
da7dc793e7 Sync capa rules submodule 2020-10-27 15:12:51 +00:00
Capa Bot
044ee83fbc Sync capa-testfiles submodule 2020-10-26 16:48:15 +00:00
Capa Bot
aea324c4a8 Sync capa rules submodule 2020-10-26 16:47:44 +00:00
Capa Bot
4d05b20830 Sync capa rules submodule 2020-10-26 16:46:53 +00:00
Willi Ballenthin
276928951c build: event published/edited, not created 2020-10-23 15:17:32 -06:00
Willi Ballenthin
9486654e77 changelog: v1.4.1 2020-10-23 15:13:22 -06:00
Willi Ballenthin
2a2b4cbb06 Merge pull request #351 from fireeye/ci-build-windows-vcpython27
fix build on windows-latest
2020-10-23 15:10:56 -06:00
Willi Ballenthin
3ba4a8cdd8 Update build.yml 2020-10-23 15:07:13 -06:00
Willi Ballenthin
8820dabab9 Update build.yml 2020-10-23 14:59:34 -06:00
Willi Ballenthin
f9d89301df Update build.yml 2020-10-23 14:58:44 -06:00
Willi Ballenthin
7edb93d3ad Update build.yml 2020-10-23 14:57:14 -06:00
Moritz
5c5d9974e1 Merge pull request #350 from fireeye/release-1.4.0
release v1.4.0
2020-10-23 22:31:00 +02:00
Moritz Raabe
b0bf4f8f8e prepare new release 2020-10-23 22:24:50 +02:00
Capa Bot
04ea03caf6 Sync capa rules submodule 2020-10-23 18:50:52 +00:00
Capa Bot
cf0841bdcc Sync capa-testfiles submodule 2020-10-23 18:49:05 +00:00
Capa Bot
cc4f5f66d8 Sync capa-testfiles submodule 2020-10-23 18:42:54 +00:00
Capa Bot
e6d75ee7c4 Sync capa rules submodule 2020-10-23 16:46:53 +00:00
Moritz
61986fc98c Merge pull request #333 from fireeye/improve-packaging-setup
add long description and other improvements
2020-10-23 13:16:13 +02:00
Moritz
0e009c7c12 Merge pull request #347 from fireeye/fix/non-ascii-char-filename
get decoded sample path
2020-10-23 13:15:36 +02:00
Moritz
425613ee42 Merge pull request #346 from fireeye/extract/api-jmps
Extract/api jmps
2020-10-23 13:15:10 +02:00
Moritz Raabe
679316946e addressing Willi's feedback 2020-10-22 20:10:47 +02:00
Moritz
8bb305038b Merge pull request #343 from fireeye/fix/file-imports-ordinal-name
extract ordinal and name imports
2020-10-22 20:07:42 +02:00
Moritz Raabe
fbe104d254 get decoded sample path
closes #328
2020-10-22 19:56:41 +02:00
Capa Bot
cb44cb0ee2 Sync capa-testfiles submodule 2020-10-22 17:49:54 +00:00
Capa Bot
2163f64877 Sync capa-testfiles submodule 2020-10-22 17:49:18 +00:00
Capa Bot
a14d958ef0 Sync capa-testfiles submodule 2020-10-22 13:17:55 +00:00
Capa Bot
c65ef12783 Sync capa rules submodule 2020-10-22 04:02:25 +00:00
Capa Bot
8eb1727c76 Sync capa rules submodule 2020-10-21 15:54:41 +00:00
William Ballenthin
fafe24295a Merge branch 'master' of github.com:fireeye/capa 2020-10-21 09:53:09 -06:00
William Ballenthin
d900a6c145 render: default: sanity check MBC 2020-10-21 09:52:40 -06:00
Capa Bot
03df2fa3e9 Sync capa rules submodule 2020-10-21 15:43:31 +00:00
Moritz Raabe
69a4b99d70 extract apis called via jmp
closes #337
2020-10-21 12:39:45 +02:00
Capa Bot
39d95b2fd2 Sync capa rules submodule 2020-10-21 10:21:54 +00:00
Moritz Raabe
1e3b29de2e add IDA specific test 2020-10-21 12:16:50 +02:00
Moritz
d5186f160d Merge pull request #342 from fireeye/viv/extractor/api-thunk-chains
extract api features for thunk chains
2020-10-21 11:37:58 +02:00
Capa Bot
5d7dbd15c7 Sync capa-testfiles submodule 2020-10-21 09:35:22 +00:00
Moritz Raabe
12d5fe0afe addressing feedback 2020-10-21 11:25:08 +02:00
Capa Bot
3df1cc9038 Sync capa rules submodule 2020-10-20 21:04:10 +00:00
Willi Ballenthin
d46152b73e Merge pull request #345 from fireeye/fix/build-workflow-set-env-var
set env var via environment file
2020-10-20 09:55:26 -06:00
Moritz Raabe
9fc6e0d6a2 Merge branch 'enhance/show-features' into viv/extractor/api-thunk-chains 2020-10-20 15:26:51 +02:00
Moritz Raabe
4994d0597f set env var via environment file 2020-10-20 15:14:36 +02:00
Moritz Raabe
76b46d7957 ensure function is defined in vivisect (or do so)
and show features in IDA
2020-10-20 15:09:07 +02:00
Moritz Raabe
0a369c548b extract ordinal and name imports 2020-10-20 14:56:38 +02:00
Moritz Raabe
9a738ba413 extract api features for thunk chains
closes #341
2020-10-20 14:49:09 +02:00
Moritz
a442536246 Merge pull request #340 from fireeye/ida/extractor/improve-api-thunk-detection
ida/extractor: improve detection of APIs called via two or more chained thunks
2020-10-19 20:51:16 +02:00
Capa Bot
f85b6fde7b Sync capa rules submodule 2020-10-16 16:05:56 +00:00
Capa Bot
8dc6a5109a Sync capa-testfiles submodule 2020-10-15 21:00:58 +00:00
Michael Hunhoff
235d9d4ab5 improve detection of APIs called via two or more chained thunks 2020-10-15 14:31:23 -06:00
Capa Bot
3572de058b Sync capa rules submodule 2020-10-08 18:16:59 +00:00
Capa Bot
93068aff1b Sync capa-testfiles submodule 2020-10-08 18:16:15 +00:00
Capa Bot
49e7d75ce5 Sync capa rules submodule 2020-10-08 15:53:20 +00:00
Capa Bot
6aa1ecd1a8 Sync capa-testfiles submodule 2020-10-08 15:52:23 +00:00
Capa Bot
b442fbb19c Sync capa rules submodule 2020-10-07 20:58:02 +00:00
Capa Bot
46fc4f0c25 Sync capa-testfiles submodule 2020-10-07 20:57:34 +00:00
Capa Bot
155de6f2b9 Sync capa rules submodule 2020-10-06 16:30:56 +00:00
Capa Bot
459af7ab1b Sync capa rules submodule 2020-10-06 02:36:03 +00:00
Willi Ballenthin
2bd408a274 Merge pull request #338 from fireeye/fix/feature-str
fix feature display
2020-10-05 14:19:54 -06:00
Moritz Raabe
bc1c5a59f8 display value including 0 2020-10-05 22:10:04 +02:00
Willi Ballenthin
49cecdc75d Merge pull request #336 from fireeye/fix-335
modify find_byte_sequence to yield all locations
2020-10-05 11:02:36 -06:00
Capa Bot
2a6aeae763 Sync capa rules submodule 2020-10-05 17:02:21 +00:00
Michael Hunhoff
f295e1da31 modify find_byte_sequence to yield all locations, instead of only first 2020-10-05 10:27:45 -06:00
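A rough sketch of the "yield all locations" behavior named in this commit, using only `bytes.find`. The function name `find_all_offsets` and its signature are hypothetical; the actual `find_byte_sequence` operates on the backend's own data and may look quite different.

```python
def find_all_offsets(buf, seq):
    """Yield every offset in buf at which seq occurs, including overlaps."""
    start = 0
    while True:
        offset = buf.find(seq, start)
        if offset == -1:
            # No further occurrence: stop the generator.
            return
        yield offset
        # Resume one byte past the last hit so overlapping matches are found.
        start = offset + 1
```

Returning after the first hit would report only one location per pattern; the loop keeps searching until `find` reports no further match.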
Capa Bot
1981859343 Sync capa rules submodule 2020-10-05 16:11:30 +00:00
Capa Bot
9de237e1a3 Sync capa-testfiles submodule 2020-10-05 14:18:32 +00:00
Moritz Raabe
77b412c1e8 add long description and other improvements 2020-10-02 17:08:03 +02:00
Moritz
a31529bb79 Merge pull request #332 from fireeye/render-mbc
render mbc table
2020-10-02 11:09:39 +02:00
Moritz Raabe
00bc1a169e render mbc table 2020-10-01 11:10:03 +02:00
Capa Bot
3e98cac397 Sync capa rules submodule 2020-10-01 09:00:31 +00:00
Capa Bot
8cd0777683 Sync capa rules submodule 2020-10-01 08:32:39 +00:00
Capa Bot
8bac77c2ab Sync capa rules submodule 2020-10-01 07:57:13 +00:00
Capa Bot
3312e1b20b Sync capa rules submodule 2020-09-30 17:27:42 +00:00
Capa Bot
d55e2a2647 Sync capa rules submodule 2020-09-28 15:03:30 +00:00
Willi Ballenthin
e87d9cd1b5 Merge pull request #330 from fireeye/fix-329
fix 329
2020-09-28 09:01:34 -06:00
Michael Hunhoff
5dda95385d use rpartition in capa.features.insn.API to handle API name w/ multiple . 2020-09-28 08:33:08 -06:00
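To illustrate why `rpartition` helps with names containing several dots, here is a small sketch. The helper name `split_api_name` is hypothetical and the sample strings are made up for illustration; only the use of `str.rpartition` comes from the commit subject.

```python
def split_api_name(name):
    """Split "module.symbol" at the LAST dot, returning (module, symbol)."""
    # rpartition splits on the right-most separator, so any dots inside
    # the module part (e.g. "kernel32.dll") stay with the module.
    module, _, symbol = name.rpartition(".")
    return module, symbol
```

When the name contains no dot at all, `rpartition` returns an empty module and the whole string as the symbol.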
Willi Ballenthin
d60bdb561e Merge pull request #327 from fireeye/fix/312-statement-descriptions
parse descriptions for statements
2020-09-25 11:50:47 -06:00
Capa Bot
fab89beba0 Sync capa rules submodule 2020-09-25 17:49:24 +00:00
Moritz Raabe
1cb9ed9c01 addressing final comments 2020-09-25 18:38:46 +02:00
Moritz Raabe
00b7f2e02f addressing Willi's feedback 2020-09-24 20:23:15 +02:00
Moritz Raabe
4691302a78 parse descriptions for statements 2020-09-24 15:35:30 +02:00
Willi Ballenthin
d8a32630fb Merge pull request #326 from fireeye/fix-325
main: fix reported total rule count
2020-09-23 16:07:22 -06:00
Willi Ballenthin
29b6bd8aad Merge pull request #324 from fireeye/fix-307
scripts: add script demonstrating bulk processing
2020-09-23 14:45:56 -06:00
William Ballenthin
c2516e7453 main: fix reported total rule count
closes #325
2020-09-23 11:19:01 -06:00
Willi Ballenthin
1fd8c3c068 Merge pull request #323 from fireeye/fix-306
use PyYAML CLoader to parse rules when available
2020-09-23 10:01:15 -06:00
William Ballenthin
314757a235 scripts: add script demonstrating bulk processing
closes #307
2020-09-23 09:13:49 -06:00
William Ballenthin
5b613903e5 rules: fix ordering of meta under py2 2020-09-23 06:32:22 -06:00
Capa Bot
b2caad9b4b Sync capa rules submodule 2020-09-22 18:49:29 +00:00
William Ballenthin
4b066e908c ci: use sudo to apt 2020-09-22 11:20:15 -06:00
William Ballenthin
041e443619 ci: install libyaml when appropriate 2020-09-22 11:18:15 -06:00
William Ballenthin
999bd84a86 rules: fall back to python pyyaml when libyaml not present 2020-09-22 11:06:48 -06:00
William Ballenthin
2a894fb5f6 rules: fall back to python based yaml parser when libyaml not present 2020-09-22 10:54:53 -06:00
William Ballenthin
79bf5c2d6b rules: use yaml.CLoader for better performance 2020-09-22 10:46:05 -06:00
Capa Bot
98298a3b2d Sync capa rules submodule 2020-09-21 18:03:51 +00:00
Capa Bot
71454c6400 Sync capa-testfiles submodule 2020-09-21 09:33:08 +00:00
Capa Bot
5e2e316474 Sync capa rules submodule 2020-09-18 20:47:00 +00:00
Capa Bot
6bca211267 Sync capa rules submodule 2020-09-18 18:37:14 +00:00
Moritz
f8cbc0a12d Merge pull request #321 from fireeye/ida/explorer-update-documentation
explorer: documentation updates, logo
2020-09-18 17:03:19 +02:00
Capa Bot
9708c89772 Sync capa rules submodule 2020-09-18 14:26:29 +00:00
Michael Hunhoff
29492bfdc8 fixing feature count for explorer progress indicator 2020-09-17 14:50:14 -06:00
Capa Bot
d2e05f03cc Sync capa rules submodule 2020-09-17 18:34:36 +00:00
Capa Bot
01bf7b3bd3 Sync capa rules submodule 2020-09-17 18:07:50 +00:00
Capa Bot
db790ab20c Sync capa-testfiles submodule 2020-09-17 18:01:18 +00:00
Capa Bot
71c19a1fbc Sync capa rules submodule 2020-09-17 15:02:03 +00:00
Capa Bot
73e9b6e804 Sync capa rules submodule 2020-09-17 15:01:25 +00:00
Michael Hunhoff
199e9fc81d Merge branch 'master' into ida/explorer-update-documentation 2020-09-16 13:55:24 -06:00
Michael Hunhoff
a9591aad1b updating explorer documentation link 2020-09-16 13:53:47 -06:00
Michael Hunhoff
0168f444d9 removing old .jpg, adding explorer logo, updating explorer readme 2020-09-16 13:33:11 -06:00
mike-hunhoff
4659ab0649 Merge pull request #316 from fireeye/fix-315
explorer: add additional check for invalid model index
2020-09-16 08:40:59 -06:00
Michael Hunhoff
49700ffb9f add check for invalid model index, fix 315 2020-09-16 08:27:38 -06:00
Moritz
6c6062d5a8 Update usage.md 2020-09-15 10:31:08 +02:00
Moritz
01e8b198c0 Update installation.md 2020-09-15 10:13:41 +02:00
53 changed files with 2742 additions and 227 deletions

BIN
.github/capa-explorer-logo.png vendored Normal file

Binary file not shown (image added, 37 KiB).

BIN
.github/capa-ida.jpg vendored

Binary file not shown (image removed, 453 KiB).

6
.github/dependabot.yml vendored Normal file
View File

@@ -0,0 +1,6 @@
version: 2
updates:
- package-ecosystem: "pip"
directory: "/"
schedule:
interval: "weekly"

View File

@@ -2,7 +2,7 @@ name: build
on:
release:
types: [created, edited]
types: [edited, published]
jobs:
build:
@@ -30,6 +30,12 @@ jobs:
uses: actions/setup-python@v2
with:
python-version: 2.7
- if: matrix.os == 'ubuntu-latest'
run: sudo apt-get install -y libyaml-dev
- if: matrix.os == 'windows-latest'
run: |
choco install vcredist2008
choco install --ignore-dependencies vcpython27
- name: Install PyInstaller
# pyinstaller 4 doesn't support Python 2.7
run: pip install 'pyinstaller==3.*'
@@ -65,7 +71,7 @@ jobs:
- name: Set executable flag
run: chmod +x ${{ matrix.artifact_name }}
- name: Set zip name
run: echo ::set-env name=zip_name::capa-${GITHUB_REF#refs/tags/}-${{ matrix.asset_name }}.zip
run: echo "zip_name=capa-${GITHUB_REF#refs/tags/}-${{ matrix.asset_name }}.zip" >> $GITHUB_ENV
- name: Zip ${{ matrix.artifact_name }} into ${{ env.zip_name }}
run: zip ${{ env.zip_name }} ${{ matrix.artifact_name }}
- name: Upload ${{ env.zip_name }} to GH Release
@@ -74,4 +80,3 @@ jobs:
repo_token: ${{ secrets.GITHUB_TOKEN}}
file: ${{ env.zip_name }}
tag: ${{ github.ref }}

View File

@@ -45,13 +45,13 @@ jobs:
runs-on: ubuntu-latest
needs: [code_style, rule_linter]
strategy:
fail-fast: false
matrix:
include:
- python: 2.7
- python: 3.6
- python: 3.7
- python: 3.8
- python: '3.9.0-rc.1' # Python latest
- python: 3.9.1
steps:
- name: Checkout capa with submodules
uses: actions/checkout@v2
@@ -61,6 +61,8 @@ jobs:
uses: actions/setup-python@v2
with:
python-version: ${{ matrix.python }}
- name: Install pyyaml
run: sudo apt-get install -y libyaml-dev
- name: Install capa
run: pip install -e .[dev]
- name: Run tests

View File

@@ -1,5 +1,133 @@
# Change Log
## v1.4.1 (2020-10-23)
This release fixes an issue building capa on our CI server, which prevented us from building standalone binaries for v1.4.1.
### Bug Fixes
- install VC dependencies for Python 2.7 during Windows build
### Raw diffs
- [capa v1.4.0...v1.4.1](https://github.com/fireeye/capa/compare/v1.4.0...v1.4.1)
- [capa-rules v1.4.0...v1.4.1](https://github.com/fireeye/capa-rules/compare/v1.4.0...v1.4.1)
## v1.4.0 (2020-10-23)
This capa release includes changes to the rule parsing, enhanced feature extraction, various bug fixes, and improved capa scripts. Everyone should benefit from the improved functionality and performance. The community helped to add 69 new rules. We appreciate everyone who opened issues, provided feedback, and contributed code and rules. A special shout out to the following new project contributors:
- @mwilliams31
- @yt0ng
@dzbeck added [Malware Behavior Catalog](https://github.com/MBCProject/mbc-markdown) (MBC) and ATT&CK mappings for 86 rules.
Download a standalone binary below and check out the readme [here on GitHub](https://github.com/fireeye/capa/). Report issues on our [issue tracker](https://github.com/fireeye/capa/issues) and contribute new rules at [capa-rules](https://github.com/fireeye/capa-rules/).
### New features
- script that demonstrates bulk processing @williballenthin #307
- main: render MBC table @mr-tz #332
- ida backend: improve detection of APIs called via two or more chained thunks @mike-hunhoff #340
- viv backend: improve detection of APIs called via two or more chained thunks @mr-tz #341
- features: extract APIs called via jmp instruction @mr-tz #337
### New rules
- clear the Windows event log @mike-hunhoff
- crash the Windows event logging service @mike-hunhoff
- packed with kkrunchy @re-fox
- packed with nspack @re-fox
- packed with pebundle @re-fox
- packed with pelocknt @re-fox
- packed with peshield @re-fox
- packed with petite @re-fox
- packed with rlpack @re-fox
- packed with upack @re-fox
- packed with y0da crypter @re-fox
- compiled with rust @re-fox
- compute adler32 checksum @mwilliams31
- encrypt-data-using-hc-128 @recvfrom
- manipulate console @williballenthin
- references logon banner @re-fox
- terminate process via fastfail @re-fox
- delete volume shadow copies @mr-tz
- authenticate HMAC @mr-tz
- compiled from EPL @williballenthin
- compiled with Go @williballenthin
- create Restart Manager session @mike-hunhoff
- decode data using Base64 via WinAPI @mike-hunhoff
- empty recycle bin quietly @mwilliams31
- enumerate network shares @mike-hunhoff
- hook routines via microsoft detours @williballenthin
- hooked by API Override @williballenthin
- impersonate user @mike-hunhoff
- the @williballenthin packer detection package, thanks to Hexacorn for the data, see https://www.hexacorn.com/blog/2016/12/15/pe-section-names-re-visited/
- packed with CCG
- packed with Crunch
- packed with Dragon Armor
- packed with enigma
- packed with Epack
- packed with MaskPE
- packed with MEW
- packed with Mpress
- packed with Neolite
- packed with PECompact
- packed with Pepack
- packed with Perplex
- packed with ProCrypt
- packed with RPCrypt
- packed with SeauSFX
- packed with Shrinker
- packed with Simple Pack
- packed with StarForce
- packed with SVKP
- packed with Themida
- packed with TSULoader
- packed with VProtect
- packed with WWPACK
- rebuilt by ImpRec
- packaged as a Pintool
- packaged as a CreateInstall installer
- packaged as a WinZip self-extracting archive
- reference 114DNS DNS server @williballenthin
- reference AliDNS DNS server @williballenthin
- reference Cloudflare DNS server @williballenthin
- reference Comodo Secure DNS server @williballenthin
- reference Google Public DNS server @williballenthin
- reference Hurricane Electric DNS server @williballenthin
- reference kornet DNS server @williballenthin
- reference L3 DNS server @williballenthin
- reference OpenDNS DNS server @williballenthin
- reference Quad9 DNS server @williballenthin
- reference Verisign DNS server @williballenthin
- run as service @mike-hunhoff
- schedule task via ITaskService @mike-hunhoff
- references DNS over HTTPS endpoints @yt0ng
### Bug fixes
- ida plugin: fix tree-view exception @mike-hunhoff #315
- ida plugin: fix feature count @mike-hunhoff
- main: fix reported total rule count @williballenthin #325
- features: fix handling of API names with multiple periods @mike-hunhoff #329
- ida backend: find all byte sequences instead of only first @mike-hunhoff #335
- features: display 0 value @mr-tz #338
- ida backend: extract ordinal and name imports @mr-tz #343
- show-features: improvements and support within IDA @mr-tz #342
- main: sanity check MBC rendering @williballenthin
- main: handle sample path that contains non-ASCII characters @mr-tz #328
### Changes
- rules: use yaml.CLoader for better performance @williballenthin #306
- rules: parse descriptions for statements @mr-tz #312
### Raw diffs
- [capa v1.3.0...v1.4.0](https://github.com/fireeye/capa/compare/v1.3.0...v1.4.0)
- [capa-rules v1.3.0...v1.4.0](https://github.com/fireeye/capa-rules/compare/v1.3.0...v1.4.0)
## v1.3.0 (2020-09-14)
This release brings newly updated mappings to the [Malware Behavior Catalog version 2.0](https://github.com/MBCProject/mbc-markdown), many enhancements to the IDA Pro plugin, [flare-capa on PyPI](https://pypi.org/project/flare-capa/), a bunch of bug fixes to improve feature extraction, and four new rules. We received contributions from ten reverse engineers, including seven new ones:

View File

@@ -1,7 +1,7 @@
![capa](.github/logo.png)
[![CI status](https://github.com/fireeye/capa/workflows/CI/badge.svg)](https://github.com/fireeye/capa/actions?query=workflow%3ACI+event%3Apush+branch%3Amaster)
[![Number of rules](https://img.shields.io/badge/rules-345-blue.svg)](https://github.com/fireeye/capa-rules)
[![Number of rules](https://img.shields.io/badge/rules-455-blue.svg)](https://github.com/fireeye/capa-rules)
[![License](https://img.shields.io/badge/license-Apache--2.0-green.svg)](LICENSE.txt)
capa detects capabilities in executable files.

View File

@@ -6,7 +6,6 @@
# is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and limitations under the License.
import sys
import copy
import collections

View File

@@ -16,6 +16,9 @@ import capa.engine
logger = logging.getLogger(__name__)
MAX_BYTES_FEATURE_SIZE = 0x100
# thunks may be chained so we specify a delta to control the depth to which these chains are explored
THUNK_CHAIN_DEPTH_DELTA = 5
# identifiers for supported architecture names that tweak a feature
# for example, offset/x32
ARCH_X32 = "x32"
@@ -74,7 +77,7 @@ class Feature(object):
return self.value
def __str__(self):
if self.value:
if self.value is not None:
if self.description:
return "%s(%s = %s)" % (self.name, self.get_value_str(), self.description)
else:
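
The `__str__` change in this hunk matters for features whose value is falsy but still meaningful, such as the number 0 (see "features: display 0 value" in the changelog). A minimal sketch of the difference, assuming a hypothetical `DemoFeature` class that stands in for capa's real `Feature`:

```python
class DemoFeature(object):
    """Hypothetical stand-in for a feature with an optional value."""

    def __init__(self, name, value=None):
        self.name = name
        self.value = value

    def str_truthy(self):
        # old check: a value of 0 is falsy, so it is silently dropped
        if self.value:
            return "%s(%s)" % (self.name, self.value)
        return self.name

    def str_not_none(self):
        # new check: only a missing value (None) is dropped
        if self.value is not None:
            return "%s(%s)" % (self.name, self.value)
        return self.name
```

With `DemoFeature("number", 0)`, the truthiness check renders only the name, while the `is not None` check keeps the value.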

View File

@@ -8,6 +8,8 @@
import abc
from capa.helpers import oint
class FeatureExtractor(object):
"""
@@ -35,6 +37,12 @@ class FeatureExtractor(object):
#
super(FeatureExtractor, self).__init__()
def block_offset(self, bb):
return oint(bb)
def function_offset(self, f):
return oint(f)
@abc.abstractmethod
def get_base_address(self):
"""

View File

@@ -42,7 +42,9 @@ def is_ordinal(symbol):
"""
is the given symbol an ordinal that is prefixed by "#"?
"""
return symbol[0] == "#"
if symbol:
return symbol[0] == "#"
return False
def generate_symbols(dll, symbol):

View File

@@ -37,11 +37,11 @@ def check_segment_for_pe(seg):
)
for i in range(256)
]
todo = [
(capa.features.extractors.ida.helpers.find_byte_sequence(seg.start_ea, seg.end_ea, mzx), mzx, pex, i)
for mzx, pex, i in mz_xor
]
todo = [(off, mzx, pex, i) for (off, mzx, pex, i) in todo if off != idaapi.BADADDR]
todo = []
for (mzx, pex, i) in mz_xor:
for off in capa.features.extractors.ida.helpers.find_byte_sequence(seg.start_ea, seg.end_ea, mzx):
todo.append((off, mzx, pex, i))
while len(todo):
off, mzx, pex, i = todo.pop()
@@ -61,8 +61,7 @@ def check_segment_for_pe(seg):
if idc.get_bytes(peoff, 2) == pex:
yield (off, i)
nextres = capa.features.extractors.ida.helpers.find_byte_sequence(off + 1, seg.end_ea, mzx)
if nextres != -1:
for nextres in capa.features.extractors.ida.helpers.find_byte_sequence(off + 1, seg.end_ea, mzx):
todo.append((nextres, mzx, pex, i))
@@ -96,7 +95,14 @@ def extract_file_import_names():
- importname
"""
for (ea, info) in capa.features.extractors.ida.helpers.get_file_imports().items():
if info[1]:
if info[1] and info[2]:
# e.g. in mimikatz: ('cabinet', 'FCIAddFile', 11L)
# extract by name here and by ordinal below
for name in capa.features.extractors.helpers.generate_symbols(info[0], info[1]):
yield Import(name), ea
dll = info[0]
symbol = "#%d" % (info[2])
elif info[1]:
dll = info[0]
symbol = info[1]
elif info[2]:

View File

@@ -16,17 +16,24 @@ import ida_bytes
def find_byte_sequence(start, end, seq):
"""find byte sequence
"""yield all ea of a given byte sequence
args:
start: min virtual address
end: max virtual address
seq: bytes to search e.g. b'\x01\x03'
seq: bytes to search e.g. b"\x01\x03"
"""
if sys.version_info[0] >= 3:
return idaapi.find_binary(start, end, " ".join(["%02x" % b for b in seq]), 0, idaapi.SEARCH_DOWN)
seq = " ".join(["%02x" % b for b in seq])
else:
return idaapi.find_binary(start, end, " ".join(["%02x" % ord(b) for b in seq]), 0, idaapi.SEARCH_DOWN)
seq = " ".join(["%02x" % ord(b) for b in seq])
while True:
ea = idaapi.find_binary(start, end, seq, 0, idaapi.SEARCH_DOWN)
if ea == idaapi.BADADDR:
break
start = ea + 1
yield ea
def get_functions(start=None, end=None, skip_thunks=False, skip_libs=False):
@@ -159,6 +166,10 @@ def basic_block_size(bb):
def read_bytes_at(ea, count):
""" """
# check if byte has a value, see get_wide_byte doc
if not idc.is_loaded(ea):
return b""
segm_end = idc.get_segm_end(ea)
if ea + count > segm_end:
return idc.get_bytes(ea, segm_end - ea)
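
The rewritten `find_byte_sequence` turns a single search into a generator that resumes one byte past each hit. The same pattern can be sketched without IDA, assuming `bytes.find` as a stand-in for `idaapi.find_binary` and `-1` as a stand-in for `idaapi.BADADDR` (`find_all` is an illustrative name, not part of capa):

```python
def find_all(buf, seq, start=0, end=None):
    """Yield every offset of seq in buf[start:end], including overlapping ones."""
    if end is None:
        end = len(buf)
    while True:
        off = buf.find(seq, start, end)
        if off == -1:
            # no further match: the stand-in for idaapi.BADADDR
            break
        # resume one byte past the hit so overlapping matches are found too
        start = off + 1
        yield off
```

Because the function is a generator, callers can loop over it directly, which is what lets `check_segment_for_pe` drop its explicit `BADADDR` and `-1` checks.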

View File

@@ -12,7 +12,15 @@ import idautils
import capa.features.extractors.helpers
import capa.features.extractors.ida.helpers
from capa.features import ARCH_X32, ARCH_X64, MAX_BYTES_FEATURE_SIZE, Bytes, String, Characteristic
from capa.features import (
ARCH_X32,
ARCH_X64,
MAX_BYTES_FEATURE_SIZE,
THUNK_CHAIN_DEPTH_DELTA,
Bytes,
String,
Characteristic,
)
from capa.features.insn import API, Number, Offset, Mnemonic
# security cookie checks may perform non-zeroing XORs, these are expected within a certain
@@ -46,23 +54,34 @@ def get_imports(ctx):
def check_for_api_call(ctx, insn):
""" check instruction for API call """
if not idaapi.is_call_insn(insn):
if not insn.get_canon_mnem() in ("call", "jmp"):
return
for ref in idautils.CodeRefsFrom(insn.ea, False):
info = ()
ref = insn.ea
# attempt to resolve API calls by following chained thunks to a reasonable depth
for _ in range(THUNK_CHAIN_DEPTH_DELTA):
# assume only one code/data ref when resolving "call" or "jmp"
try:
ref = tuple(idautils.CodeRefsFrom(ref, False))[0]
except IndexError:
try:
# thunks may be marked as data refs
ref = tuple(idautils.DataRefsFrom(ref))[0]
except IndexError:
break
info = get_imports(ctx).get(ref, ())
if info:
yield "%s.%s" % (info[0], info[1])
else:
f = idaapi.get_func(ref)
# check if call to thunk
# TODO: first instruction might not always be the thunk
if f and (f.flags & idaapi.FUNC_THUNK):
for thunk_ref in idautils.DataRefsFrom(ref):
# TODO: always data ref for thunk??
info = get_imports(ctx).get(thunk_ref, ())
if info:
yield "%s.%s" % (info[0], info[1])
break
f = idaapi.get_func(ref)
if not f or not (f.flags & idaapi.FUNC_THUNK):
break
if info:
yield "%s.%s" % (info[0], info[1])
def extract_insn_api_features(f, bb, insn):
@@ -129,6 +148,9 @@ def extract_insn_bytes_features(f, bb, insn):
example:
push offset iid_004118d4_IShellLinkA ; riid
"""
if idaapi.is_call_insn(insn):
return
ref = capa.features.extractors.ida.helpers.find_data_reference_from_insn(insn)
if ref != insn.ea:
extracted_bytes = capa.features.extractors.ida.helpers.read_bytes_at(ref, MAX_BYTES_FEATURE_SIZE)
@@ -283,7 +305,7 @@ def extract_insn_nzxor_characteristic_features(f, bb, insn):
bb (IDA BasicBlock)
insn (IDA insn_t)
"""
if insn.itype != idaapi.NN_xor:
if insn.itype not in (idaapi.NN_xor, idaapi.NN_xorpd, idaapi.NN_xorps, idaapi.NN_pxor):
return
if capa.features.extractors.ida.helpers.is_operand_equal(insn.Op1, insn.Op2):
return
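
The new loop in `check_for_api_call` follows at most `THUNK_CHAIN_DEPTH_DELTA` references before giving up. A rough model of that bounded walk, assuming plain dictionaries and sets in place of IDA's cross-reference and import tables (`refs`, `imports`, `thunks`, and `resolve_api` are illustrative names, not capa or IDA APIs):

```python
THUNK_CHAIN_DEPTH_DELTA = 5


def resolve_api(ea, refs, imports, thunks, max_depth=THUNK_CHAIN_DEPTH_DELTA):
    """Follow a chain of thunks from ea and return "dll.name", or None."""
    ref = ea
    for _ in range(max_depth):
        if ref not in refs:
            # no outgoing reference: nothing left to follow
            return None
        ref = refs[ref]
        info = imports.get(ref)
        if info:
            return "%s.%s" % info
        if ref not in thunks:
            # the target is an ordinary function, not a thunk
            return None
    # chain is longer than max_depth (or cyclic)
    return None
```

The depth limit is what keeps a cyclic or very long chain from looping indefinitely; the value 5 is taken from the constant added in this diff.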

View File

@@ -0,0 +1,107 @@
# Copyright (C) 2020 FireEye, Inc.
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at: https://github.com/fireeye/capa/blob/master/LICENSE.txt
# Unless required by applicable law or agreed to in writing, software distributed under the License
# is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and limitations under the License.
import miasm.analysis.binary
import miasm.analysis.machine
from miasm.core.locationdb import LocationDB
import capa.features.extractors.miasm.file
import capa.features.extractors.miasm.insn
import capa.features.extractors.miasm.function
import capa.features.extractors.miasm.basicblock
from capa.features.extractors import FeatureExtractor
class MiasmFeatureExtractor(FeatureExtractor):
def __init__(self, buf):
super(MiasmFeatureExtractor, self).__init__()
self.buf = buf
self.loc_db = LocationDB()
self.container = miasm.analysis.binary.Container.from_string(buf, self.loc_db)
self.pe = self.container.executable
self.machine = miasm.analysis.machine.Machine(self.container.arch)
self.cfg = self._build_cfg()
def get_base_address(self):
return self.container.entry_point
def extract_file_features(self):
for feature, va in capa.features.extractors.miasm.file.extract_file_features(self):
yield feature, va
# TODO: Improve this function (it treats every loc_key that is the target of a call as a function), port to miasm
def get_functions(self):
"""
yields every loc_key that is the target of a call instruction
"""
functions = set()
for block in self.cfg.blocks:
for line in block.lines:
if line.is_subcall() and line.args[0].is_loc():
loc_key = line.args[0].loc_key
if loc_key not in functions:
functions.add(loc_key)
yield loc_key
def extract_function_features(self, loc_key):
for feature, va in capa.features.extractors.miasm.function.extract_features(self, loc_key):
yield feature, va
def block_offset(self, bb):
return bb.lines[0].offset
def function_offset(self, f):
return self.cfg.loc_key_to_block(f).lines[0].offset
def get_basic_blocks(self, loc_key):
"""
get the basic blocks of the function represented by loc_key
"""
block = self.cfg.loc_key_to_block(loc_key)
disassembler = self.machine.dis_engine(self.container.bin_stream, loc_db=self.loc_db, follow_call=False)
cfg = disassembler.dis_multiblock(self.block_offset(block))
return cfg.blocks
def extract_basic_block_features(self, _, bb):
for feature, va in capa.features.extractors.miasm.basicblock.extract_features(bb):
yield feature, va
def get_instructions(self, _, bb):
return bb.lines
def extract_insn_features(self, f, bb, insn):
for feature, va in capa.features.extractors.miasm.insn.extract_features(self, f, bb, insn):
yield feature, va
def _get_entry_points(self):
entry_points = {self.get_base_address()}
for _, va in miasm.jitter.loader.pe.get_export_name_addr_list(self.pe):
entry_points.add(va)
return entry_points
# This is more efficient than using the `blocks` argument in `dis_multiblock`
# See http://www.williballenthin.com/post/2020-01-12-miasm-part-2
# TODO: port this efficiency improvement to miasm
def _build_cfg(self):
loc_db = self.container.loc_db
disassembler = self.machine.dis_engine(self.container.bin_stream, follow_call=True, loc_db=loc_db)
job_done = set()
cfgs = {}
for va in self._get_entry_points():
cfgs[va] = disassembler.dis_multiblock(va, job_done=job_done)
complete_cfs = miasm.core.asmblock.AsmCFG(loc_db)
for cfg in cfgs.values():
complete_cfs.merge(cfg)
disassembler.apply_splitting(complete_cfs)
return complete_cfs

View File

@@ -0,0 +1,134 @@
# Copyright (C) 2020 FireEye, Inc.
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at: https://github.com/fireeye/capa/blob/master/LICENSE.txt
# Unless required by applicable law or agreed to in writing, software distributed under the License
# is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and limitations under the License.
import sys
import string
import struct
from capa.features import Characteristic
from capa.features.basicblock import BasicBlock
from capa.features.extractors.helpers import MIN_STACKSTRING_LEN
# TODO: Avoid this duplication (this code is in __init__ as well)
def block_offset(bb):
return bb.lines[0].offset
def extract_bb_tight_loop(bb):
""" check basic block for tight loop indicators """
if any(c.loc_key == bb.loc_key for c in bb.bto):
yield Characteristic("tight loop"), block_offset(bb)
def is_mov_imm_to_stack(instr):
"""
Return if instruction moves immediate onto stack
"""
if not instr.name.startswith("MOV"):
return False
try:
dst, src = instr.args
except ValueError:
# not two operands
return False
if not src.is_int():
return False
if not dst.is_mem():
return False
# should detect things like `@8[ESP + 0x8]` and `EBP` and not fail in other cases
if any(register in str(dst) for register in ["EBP", "RBP", "ESP", "RSP"]):
return True
return False
def is_printable_ascii(chars):
if sys.version_info >= (3, 0):
return all(c < 127 and chr(c) in string.printable for c in chars)
else:
return all(ord(c) < 127 and c in string.printable for c in chars)
def is_printable_utf16le(chars):
# iterating bytes yields ints on Python 3 and one-character strings on Python 2
if all(c in (0, b"\x00") for c in chars[1::2]):
return is_printable_ascii(chars[::2])
return False
def get_printable_len(insn):
"""
Return string length if all operand bytes are ascii or utf16-le printable
"""
dst, src = insn.args
if not src.is_int():
raise ValueError("unexpected operand type")
if not dst.is_mem():
raise ValueError("unexpected operand type")
if isinstance(src.arg, int):
val = src.arg
else:
val = src.arg.arg
size = (val.bit_length() + 7) // 8
if size == 0:
return 0
elif size == 1:
chars = struct.pack("<B", val)
elif size == 2:
chars = struct.pack("<H", val)
elif size == 4:
chars = struct.pack("<I", val)
elif size == 8:
chars = struct.pack("<Q", val)
else:
# other widths (e.g. 3, 5, 6, or 7 bytes) have no matching struct format here
return 0
if is_printable_ascii(chars):
return size
if is_printable_utf16le(chars):
return size // 2
return 0
def extract_stackstring(bb):
""" check basic block for stackstring indicators """
count = 0
for line in bb.lines:
if is_mov_imm_to_stack(line):
count += get_printable_len(line)
if count > MIN_STACKSTRING_LEN:
yield Characteristic("stack string"), block_offset(bb)
return
def extract_features(bb):
"""
extract features from the given basic block.
args:
bb (miasm.core.asmblock.AsmBlock): the basic block to process.
yields:
Feature, set[VA]: the features and their location found in this basic block.
"""
yield BasicBlock(), block_offset(bb)
for bb_handler in BASIC_BLOCK_HANDLERS:
for feature, va in bb_handler(bb):
yield feature, va
BASIC_BLOCK_HANDLERS = (
extract_bb_tight_loop,
extract_stackstring,
)
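
The stack string check above counts how many printable characters each immediate contributes. A simplified Python 3 sketch of that idea, assuming a non-negative integer immediate and using `int.to_bytes` instead of the fixed-width `struct.pack` calls in the diff (`printable_len` is an illustrative helper, not the capa function):

```python
import string


def is_printable_ascii(chars):
    # iterating bytes yields ints on Python 3
    return all(c < 127 and chr(c) in string.printable for c in chars)


def printable_len(val):
    """Return how many printable characters an immediate encodes, else 0."""
    size = (val.bit_length() + 7) // 8
    if size == 0:
        return 0
    chars = val.to_bytes(size, "little")
    if is_printable_ascii(chars):
        return size
    # UTF-16 LE: every second byte is zero, the others are printable
    low, high = chars[::2], chars[1::2]
    if all(c == 0 for c in high) and is_printable_ascii(low):
        return len(low)
    return 0
```

Unlike the `struct.pack` version, this sketch accepts any byte width, so an immediate such as `0x414243` (3 bytes) is handled rather than falling through the size checks.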

View File

@@ -0,0 +1,102 @@
# Copyright (C) 2020 FireEye, Inc.
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at: https://github.com/fireeye/capa/blob/master/LICENSE.txt
# Unless required by applicable law or agreed to in writing, software distributed under the License
# is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and limitations under the License.
import re
import miasm.analysis.binary
import capa.features.extractors.strings
from capa.features import String, Characteristic
from capa.features.file import Export, Import, Section
def extract_file_embedded_pe(extractor):
"""
extract embedded PE features
"""
buf = extractor.buf
for match in re.finditer(b"MZ", buf):
offset = match.start()
subcontainer = miasm.analysis.binary.ContainerPE.from_string(buf[offset:], loc_db=extractor.loc_db)
if isinstance(subcontainer, miasm.analysis.binary.ContainerPE):
yield Characteristic("embedded pe"), offset
def extract_file_export_names(extractor):
"""
extract file exports and their addresses
"""
for symbol, va in miasm.jitter.loader.pe.get_export_name_addr_list(extractor.pe):
# Only use func names and not ordinals
if isinstance(symbol, str):
yield Export(symbol), va
def extract_file_import_names(extractor):
"""
extract imported function names and their addresses
1. imports by ordinal:
- modulename.#ordinal
2. imports by name, results in two features to support importname-only matching:
- modulename.importname
- importname
"""
for ((dll, symbol), va_set) in miasm.jitter.loader.pe.get_import_address_pe(extractor.pe).items():
dll_name = dll[:-4] # Remove .dll
for va in va_set:
if isinstance(symbol, int):
yield Import("%s.#%s" % (dll_name, symbol)), va
else:
yield Import("%s.%s" % (dll_name, symbol)), va
yield Import(symbol), va
def extract_file_section_names(extractor):
"""
extract file sections and their addresses
"""
for section in extractor.pe.SHList.shlist:
name = section.name.partition(b"\x00")[0].decode("ascii")
va = section.addr
yield Section(name), va
def extract_file_strings(extractor):
"""
extract ASCII and UTF-16 LE strings from file
"""
for s in capa.features.extractors.strings.extract_ascii_strings(extractor.buf):
yield String(s.s), s.offset
for s in capa.features.extractors.strings.extract_unicode_strings(extractor.buf):
yield String(s.s), s.offset
def extract_file_features(extractor):
"""
extract file features from given buffer and parsed binary
args:
extractor (MiasmFeatureExtractor): extractor holding the raw buffer (`buf`) and the parsed binary (`pe`)
yields:
Tuple[Feature, VA]: a feature and its location.
"""
for file_handler in FILE_HANDLERS:
for feature, va in file_handler(extractor):
yield feature, va
FILE_HANDLERS = (
extract_file_embedded_pe,
extract_file_export_names,
extract_file_import_names,
extract_file_section_names,
extract_file_strings,
)
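
Each miasm extractor module in this diff uses the same dispatch pattern: a tuple of generator functions and one outer generator that chains them. A self-contained sketch of that pattern, assuming two hypothetical handlers that operate on a raw buffer (none of these names come from capa):

```python
def extract_magic(buf):
    # hypothetical handler: report every "MZ" marker and its offset
    start = 0
    while True:
        off = buf.find(b"MZ", start)
        if off == -1:
            return
        yield "mz marker", off
        start = off + 1


def extract_size(buf):
    # hypothetical handler: report the buffer size at offset 0
    yield "size=%d" % len(buf), 0


def extract_features(buf):
    # HANDLERS is looked up when the generator runs, so it may be defined below
    for handler in HANDLERS:
        for feature, offset in handler(buf):
            yield feature, offset


HANDLERS = (extract_magic, extract_size)
```

Adding a new feature type then only requires writing one more generator and listing it in the tuple, which is why the unimplemented handlers in the instruction module can simply be commented out.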

View File

@@ -0,0 +1,50 @@
# Copyright (C) 2020 FireEye, Inc.
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at: https://github.com/fireeye/capa/blob/master/LICENSE.txt
# Unless required by applicable law or agreed to in writing, software distributed under the License
# is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and limitations under the License.
from capa.features import Characteristic
def extract_function_calls_to(extractor, loc_key):
for pred_key in extractor.cfg.predecessors(loc_key):
pred_block = extractor.cfg.loc_key_to_block(pred_key)
pred_insn = pred_block.get_subcall_instr()
if pred_insn and pred_insn.is_subcall():
dst = pred_insn.args[0]
if dst.is_loc() and dst.loc_key == loc_key:
yield Characteristic("calls to"), pred_insn.offset
def extract_function_loop(extractor, loc_key):
"""
yields a loop characteristic if the function's CFG contains a loop
"""
block = extractor.cfg.loc_key_to_block(loc_key)
disassembler = extractor.machine.dis_engine(
extractor.container.bin_stream, loc_db=extractor.loc_db, follow_call=False
)
offset = extractor.block_offset(block)
cfg = disassembler.dis_multiblock(offset)
if cfg.has_loop():
yield Characteristic("loop"), offset
def extract_features(extractor, loc_key):
"""
extract features from the given function.
args:
extractor (MiasmFeatureExtractor): the extractor that holds the CFG and disassembler state
loc_key (LocKey): LocKey which represents the beginning of the function
yields:
Feature, set[VA]: the features and their location found in this function.
"""
for func_handler in FUNCTION_HANDLERS:
for feature, va in func_handler(extractor, loc_key):
yield feature, va
FUNCTION_HANDLERS = (extract_function_calls_to, extract_function_loop)

View File

@@ -0,0 +1,126 @@
# Copyright (C) 2020 FireEye, Inc.
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at: https://github.com/fireeye/capa/blob/master/LICENSE.txt
# Unless required by applicable law or agreed to in writing, software distributed under the License
# is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and limitations under the License.
import miasm.expression.expression
import capa.features.extractors.helpers
from capa.features.insn import Mnemonic
# TODO: remove duplication (similar code in file.py)
# TODO: this function should be cached
def get_imports(pe):
imports = {}
for ((dll, symbol), va_set) in miasm.jitter.loader.pe.get_import_address_pe(pe).items():
dll_name = dll[:-4]
for va in va_set:
if isinstance(symbol, int):
imports[va] = "%s.#%s" % (dll_name, symbol)
else:
imports[va] = "%s.%s" % (dll_name, symbol)
return imports
def extract_insn_api_features(extractor, _f, _bb, insn):
"""parse API features from the given instruction."""
if insn.is_subcall():
arg = insn.args[0]
if isinstance(arg, miasm.expression.expression.ExprMem) and isinstance(
arg.ptr, miasm.expression.expression.ExprInt
):
target = int(arg.ptr)
imports = get_imports(extractor.pe)
if target in imports:
dll, _, symbol = imports[target].rpartition(".")
for feature in capa.features.extractors.helpers.generate_symbols(dll, symbol):
yield feature, insn.offset
def extract_insn_number_features(extractor, f, bb, insn):
"""parse number features from the given instruction."""
raise NotImplementedError()
def extract_insn_string_features(extractor, f, bb, insn):
"""parse string features from the given instruction."""
raise NotImplementedError()
def extract_insn_offset_features(extractor, f, bb, insn):
"""parse structure offset features from the given instruction."""
raise NotImplementedError()
def extract_insn_nzxor_characteristic_features(extractor, f, bb, insn):
"""
parse non-zeroing XOR instruction from the given instruction.
ignore expected non-zeroing XORs, e.g. security cookies.
"""
raise NotImplementedError()
def extract_insn_mnemonic_features(extractor, f, bb, insn):
"""parse mnemonic features from the given instruction."""
yield Mnemonic(insn.name), insn.offset
def extract_insn_peb_access_characteristic_features(extractor, f, bb, insn):
"""
parse peb access from the given function. fs:[0x30] on x86, gs:[0x60] on x64
"""
raise NotImplementedError()
def extract_insn_segment_access_features(extractor, f, bb, insn):
""" parse the instruction for access to fs or gs """
raise NotImplementedError()
def extract_insn_cross_section_cflow(extractor, f, bb, insn):
"""
inspect the instruction for a CALL or JMP that crosses section boundaries.
"""
raise NotImplementedError()
# this is a feature that's most relevant at the function scope,
# however, it's most efficient to extract at the instruction scope.
def extract_function_calls_from(f, bb, insn):
raise NotImplementedError()
def extract_features(extractor, f, bb, insn):
"""
extract features from the given insn.
args:
extractor (MiasmFeatureExtractor)
f (miasm.expression.expression.LocKey): the function from which to extract features
bb (miasm.core.asmblock.AsmBlock): the basic block to process.
insn (Instruction): the instruction to process.
yields:
Feature, set[VA]: the features and their location found in this insn.
"""
for insn_handler in INSTRUCTION_HANDLERS:
for feature, va in insn_handler(extractor, f, bb, insn):
yield feature, va
INSTRUCTION_HANDLERS = (
extract_insn_api_features,
# extract_insn_number_features,
# extract_insn_string_features,
# extract_insn_bytes_features,
# extract_insn_offset_features,
# extract_insn_nzxor_characteristic_features,
extract_insn_mnemonic_features,
# extract_insn_peb_access_characteristic_features,
# extract_insn_cross_section_cflow,
# extract_insn_segment_access_features,
# extract_function_calls_from,
# extract_function_indirect_call_characteristic_features,
)
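
`extract_insn_api_features` splits the stored import name with `rpartition(".")`, which splits on the last period only. A small sketch of why that matters when the module part itself contains periods; the sample names are made up for illustration and `split_api` is a hypothetical helper:

```python
def split_api(name):
    """Split "module.symbol" on the last period only."""
    dll, _, symbol = name.rpartition(".")
    return dll, symbol
```

A plain `name.split(".")` would return three or more parts for a name with several periods, whereas `rpartition` always returns exactly one module part and one symbol part.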

View File

@@ -0,0 +1,52 @@
import sys
import types
from smda.common.SmdaReport import SmdaReport
from smda.common.SmdaInstruction import SmdaInstruction
import capa.features.extractors.smda.file
import capa.features.extractors.smda.insn
import capa.features.extractors.smda.function
import capa.features.extractors.smda.basicblock
from capa.main import UnsupportedRuntimeError
from capa.features.extractors import FeatureExtractor
class SmdaFeatureExtractor(FeatureExtractor):
def __init__(self, smda_report: SmdaReport, path):
super(SmdaFeatureExtractor, self).__init__()
if sys.version_info < (3, 0):
raise UnsupportedRuntimeError("SMDA should only be used with Python 3.")
self.smda_report = smda_report
self.path = path
def get_base_address(self):
return self.smda_report.base_addr
def extract_file_features(self):
for feature, va in capa.features.extractors.smda.file.extract_features(self.smda_report, self.path):
yield feature, va
def get_functions(self):
for function in self.smda_report.getFunctions():
yield function
def extract_function_features(self, f):
for feature, va in capa.features.extractors.smda.function.extract_features(f):
yield feature, va
def get_basic_blocks(self, f):
for bb in f.getBlocks():
yield bb
def extract_basic_block_features(self, f, bb):
for feature, va in capa.features.extractors.smda.basicblock.extract_features(f, bb):
yield feature, va
def get_instructions(self, f, bb):
for smda_ins in bb.getInstructions():
yield smda_ins
def extract_insn_features(self, f, bb, insn):
for feature, va in capa.features.extractors.smda.insn.extract_features(f, bb, insn):
yield feature, va

@@ -0,0 +1,131 @@
import sys
import string
import struct
from capa.features import Characteristic
from capa.features.basicblock import BasicBlock
from capa.features.extractors.helpers import MIN_STACKSTRING_LEN
def _bb_has_tight_loop(f, bb):
"""
parse tight loops, true if last instruction in basic block branches to bb start
"""
return bb.offset in f.blockrefs[bb.offset] if bb.offset in f.blockrefs else False
def extract_bb_tight_loop(f, bb):
""" check basic block for tight loop indicators """
if _bb_has_tight_loop(f, bb):
yield Characteristic("tight loop"), bb.offset
def _bb_has_stackstring(f, bb):
"""
extract potential stackstring creation, using the following heuristics:
- basic block contains enough moves of constant bytes to the stack
"""
count = 0
for instr in bb.getInstructions():
if is_mov_imm_to_stack(instr):
count += get_printable_len(instr.getDetailed())
if count > MIN_STACKSTRING_LEN:
return True
return False
def get_operands(smda_ins):
return [o.strip() for o in smda_ins.operands.split(",")]
def extract_stackstring(f, bb):
""" check basic block for stackstring indicators """
if _bb_has_stackstring(f, bb):
yield Characteristic("stack string"), bb.offset
def is_mov_imm_to_stack(smda_ins):
"""
Return True if the instruction moves an immediate onto the stack
"""
if not smda_ins.mnemonic.startswith("mov"):
return False
try:
dst, src = get_operands(smda_ins)
except ValueError:
# not two operands
return False
try:
int(src, 16)
except ValueError:
return False
if not any(regname in dst for regname in ["ebp", "rbp", "esp", "rsp"]):
return False
return True
def is_printable_ascii(chars):
return all(c < 127 and chr(c) in string.printable for c in chars)
def is_printable_utf16le(chars):
if all(c == 0x00 for c in chars[1::2]):
return is_printable_ascii(chars[::2])
def get_printable_len(instr):
"""
Return string length if all operand bytes are ascii or utf16-le printable
Works on a capstone instruction
"""
# should have exactly two operands for mov immediate
if len(instr.operands) != 2:
return 0
op_value = instr.operands[1].value.imm
if instr.imm_size == 1:
chars = struct.pack("<B", op_value & 0xFF)
elif instr.imm_size == 2:
chars = struct.pack("<H", op_value & 0xFFFF)
elif instr.imm_size == 4:
chars = struct.pack("<I", op_value & 0xFFFFFFFF)
elif instr.imm_size == 8:
chars = struct.pack("<Q", op_value & 0xFFFFFFFFFFFFFFFF)
else:
raise ValueError("Unhandled operand data type 0x%x." % instr.imm_size)
if is_printable_ascii(chars):
return instr.imm_size
if is_printable_utf16le(chars):
return instr.imm_size // 2
return 0
def extract_features(f, bb):
"""
extract features from the given basic block.
args:
f (smda.common.SmdaFunction): the function from which to extract features
bb (smda.common.SmdaBasicBlock): the basic block to process.
yields:
Feature, set[VA]: the features and their location found in this basic block.
"""
yield BasicBlock(), bb.offset
for bb_handler in BASIC_BLOCK_HANDLERS:
for feature, va in bb_handler(f, bb):
yield feature, va
BASIC_BLOCK_HANDLERS = (
extract_bb_tight_loop,
extract_stackstring,
)
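
The stackstring heuristic above counts how many printable bytes each `mov` immediate contributes. The following is a minimal, self-contained sketch of that counting step. It takes the immediate value and its size directly instead of a capstone instruction, and `printable_len` is a hypothetical name used only for this illustration.

```python
import string
import struct

# struct format characters for little-endian unsigned integers, keyed by size in bytes
_FORMATS = {1: "<B", 2: "<H", 4: "<I", 8: "<Q"}


def is_printable_ascii(chars: bytes) -> bool:
    return all(c < 127 and chr(c) in string.printable for c in chars)


def is_printable_utf16le(chars: bytes) -> bool:
    # every second byte must be NUL and the remaining bytes printable ASCII
    return all(c == 0x00 for c in chars[1::2]) and is_printable_ascii(chars[::2])


def printable_len(imm: int, imm_size: int) -> int:
    """Return the number of characters encoded by an immediate, or 0 if not printable."""
    if imm_size not in _FORMATS:
        raise ValueError("Unhandled operand data type 0x%x." % imm_size)
    mask = (1 << (8 * imm_size)) - 1
    chars = struct.pack(_FORMATS[imm_size], imm & mask)
    if is_printable_ascii(chars):
        return imm_size
    if is_printable_utf16le(chars):
        return imm_size // 2
    return 0
```

For example, `printable_len(0x64636261, 4)` packs to `b"abcd"` and counts four characters, while `printable_len(0x00620061, 4)` packs to the UTF-16 LE encoding of `"ab"` and counts two.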

@@ -0,0 +1,139 @@
import struct
# if we have SMDA we definitely have lief
import lief
import capa.features.extractors.helpers
import capa.features.extractors.strings
from capa.features import String, Characteristic
from capa.features.file import Export, Import, Section
def carve(pbytes, offset=0):
"""
Return a list of (offset, size, xor) tuples of embedded PEs
Based on the version from vivisect:
https://github.com/vivisect/vivisect/blob/7be4037b1cecc4551b397f840405a1fc606f9b53/PE/carve.py#L19
And its IDA adaptation:
capa/features/extractors/ida/file.py
"""
mz_xor = [
(
capa.features.extractors.helpers.xor_static(b"MZ", i),
capa.features.extractors.helpers.xor_static(b"PE", i),
i,
)
for i in range(256)
]
pblen = len(pbytes)
todo = [(pbytes.find(mzx, offset), mzx, pex, i) for mzx, pex, i in mz_xor]
todo = [(off, mzx, pex, i) for (off, mzx, pex, i) in todo if off != -1]
while len(todo):
off, mzx, pex, i = todo.pop()
# The MZ header has one field we will check
# e_lfanew is at 0x3c
e_lfanew = off + 0x3C
if pblen < (e_lfanew + 4):
continue
newoff = struct.unpack("<I", capa.features.extractors.helpers.xor_static(pbytes[e_lfanew : e_lfanew + 4], i))[0]
nextres = pbytes.find(mzx, off + 1)
if nextres != -1:
todo.append((nextres, mzx, pex, i))
peoff = off + newoff
if pblen < (peoff + 2):
continue
if pbytes[peoff : peoff + 2] == pex:
yield (off, i)
def extract_file_embedded_pe(smda_report, file_path):
with open(file_path, "rb") as f:
fbytes = f.read()
for offset, i in carve(fbytes, 1):
yield Characteristic("embedded pe"), offset
def extract_file_export_names(smda_report, file_path):
lief_binary = lief.parse(file_path)
if lief_binary is not None:
for function in lief_binary.exported_functions:
yield Export(function.name), function.address
def extract_file_import_names(smda_report, file_path):
# extract import table info via LIEF
lief_binary = lief.parse(file_path)
if not isinstance(lief_binary, lief.PE.Binary):
return
for imported_library in lief_binary.imports:
library_name = imported_library.name.lower()
library_name = library_name[:-4] if library_name.endswith(".dll") else library_name
for func in imported_library.entries:
# compute the address up front so that ordinal-only imports can use it, too
va = func.iat_address + smda_report.base_addr
if func.name:
for name in capa.features.extractors.helpers.generate_symbols(library_name, func.name):
yield Import(name), va
elif func.is_ordinal:
for name in capa.features.extractors.helpers.generate_symbols(library_name, "#%s" % func.ordinal):
yield Import(name), va
def extract_file_section_names(smda_report, file_path):
lief_binary = lief.parse(file_path)
if not isinstance(lief_binary, lief.PE.Binary):
return
if lief_binary and lief_binary.sections:
base_address = lief_binary.optional_header.imagebase
for section in lief_binary.sections:
yield Section(section.name), base_address + section.virtual_address
def extract_file_strings(smda_report, file_path):
"""
extract ASCII and UTF-16 LE strings from file
"""
with open(file_path, "rb") as f:
b = f.read()
for s in capa.features.extractors.strings.extract_ascii_strings(b):
yield String(s.s), s.offset
for s in capa.features.extractors.strings.extract_unicode_strings(b):
yield String(s.s), s.offset
def extract_features(smda_report, file_path):
"""
extract file features from given workspace
args:
smda_report (smda.common.SmdaReport): a SmdaReport
file_path: path to the input file
yields:
Tuple[Feature, VA]: a feature and its location.
"""
for file_handler in FILE_HANDLERS:
for feature, va in file_handler(smda_report, file_path):
yield feature, va
FILE_HANDLERS = (
extract_file_embedded_pe,
extract_file_export_names,
extract_file_import_names,
extract_file_section_names,
extract_file_strings,
)
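
The `carve` routine above looks for PE files that were encoded with a single-byte XOR key. The sketch below shows the same idea in a self-contained form. `xor_static` is re-implemented locally here as an assumption about what the capa helper does, and `carve_pe` is a hypothetical name; the loop is simplified and is not the code from this change.

```python
import struct


def xor_static(data: bytes, key: int) -> bytes:
    # assumed behaviour of the capa helper: XOR every byte with the same key
    return bytes(b ^ key for b in data)


def carve_pe(buf: bytes, start: int = 0):
    """Yield (offset, key) for each single-byte-XOR-encoded PE header found in buf."""
    for key in range(256):
        mz = xor_static(b"MZ", key)
        pe = xor_static(b"PE", key)
        off = buf.find(mz, start)
        while off != -1:
            # e_lfanew lives at 0x3C in the MZ header and points to the PE signature
            pos = off + 0x3C
            if pos + 4 <= len(buf):
                (e_lfanew,) = struct.unpack("<I", xor_static(buf[pos : pos + 4], key))
                pe_off = off + e_lfanew
                if buf[pe_off : pe_off + 2] == pe:
                    yield off, key
            off = buf.find(mz, off + 1)
```

Passing `start=1`, as `extract_file_embedded_pe` does with `carve(fbytes, 1)`, skips the header of the analyzed file itself so that only embedded PE files are reported.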

@@ -0,0 +1,38 @@
from capa.features import Characteristic
from capa.features.extractors import loops
def extract_function_calls_to(f):
for inref in f.inrefs:
yield Characteristic("calls to"), inref
def extract_function_loop(f):
"""
parse if a function has a loop
"""
edges = []
for bb_from, bb_tos in f.blockrefs.items():
for bb_to in bb_tos:
edges.append((bb_from, bb_to))
if edges and loops.has_loop(edges):
yield Characteristic("loop"), f.offset
def extract_features(f):
"""
extract features from the given function.
args:
f (smda.common.SmdaFunction): the function from which to extract features
yields:
Feature, set[VA]: the features and their location found in this function.
"""
for func_handler in FUNCTION_HANDLERS:
for feature, va in func_handler(f):
yield feature, va
FUNCTION_HANDLERS = (extract_function_calls_to, extract_function_loop)
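
`extract_function_loop` flattens `f.blockrefs` into an edge list and hands it to `loops.has_loop`. As an illustration of what a cycle check over such an edge list can look like, here is a standard-library sketch. It is a generic depth-first search, not capa's implementation; capa's helper may differ, for instance in how it treats a block that only branches to itself. The name `edges_have_cycle` is hypothetical.

```python
from collections import defaultdict

WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current DFS path, finished


def edges_have_cycle(edges):
    """Return True if the directed graph described by (src, dst) pairs has a cycle."""
    graph = defaultdict(list)
    for src, dst in edges:
        graph[src].append(dst)

    color = defaultdict(int)
    for root in list(graph):
        if color[root] != WHITE:
            continue
        color[root] = GREY
        stack = [(root, iter(graph[root]))]
        while stack:
            node, children = stack[-1]
            for child in children:
                if color[child] == GREY:
                    # back edge to a node on the current path: cycle found
                    return True
                if color[child] == WHITE:
                    color[child] = GREY
                    stack.append((child, iter(graph.get(child, ()))))
                    break
            else:
                # all successors handled, node is finished
                color[node] = BLACK
                stack.pop()
    return False
```

The explicit stack avoids Python's recursion limit on functions with very long chains of basic blocks.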

@@ -0,0 +1,393 @@
import re
import string
import struct
from smda.common.SmdaReport import SmdaReport
import capa.features.extractors.helpers
from capa.features import (
ARCH_X32,
ARCH_X64,
MAX_BYTES_FEATURE_SIZE,
THUNK_CHAIN_DEPTH_DELTA,
Bytes,
String,
Characteristic,
)
from capa.features.insn import API, Number, Offset, Mnemonic
# security cookie checks may perform non-zeroing XORs, these are expected within a certain
# byte range within the first and returning basic blocks, this helps to reduce FP features
SECURITY_COOKIE_BYTES_DELTA = 0x40
PATTERN_HEXNUM = re.compile(r"[+\-] (?P<num>0x[a-fA-F0-9]+)")
PATTERN_SINGLENUM = re.compile(r"[+\-] (?P<num>[0-9])")
def get_arch(smda_report):
if smda_report.architecture == "intel":
if smda_report.bitness == 32:
return ARCH_X32
elif smda_report.bitness == 64:
return ARCH_X64
else:
raise NotImplementedError
def extract_insn_api_features(f, bb, insn):
"""parse API features from the given instruction."""
if insn.offset in f.apirefs:
api_entry = f.apirefs[insn.offset]
# reformat
dll_name, api_name = api_entry.split("!")
dll_name = dll_name.split(".")[0]
dll_name = dll_name.lower()
for name in capa.features.extractors.helpers.generate_symbols(dll_name, api_name):
yield API(name), insn.offset
elif insn.offset in f.outrefs:
current_function = f
current_instruction = insn
for index in range(THUNK_CHAIN_DEPTH_DELTA):
if current_function and len(current_function.outrefs[current_instruction.offset]) == 1:
target = current_function.outrefs[current_instruction.offset][0]
referenced_function = current_function.smda_report.getFunction(target)
if referenced_function:
# TODO SMDA: implement this function for both jmp and call, checking if function has 1 instruction which refs an API
if referenced_function.isApiThunk():
api_entry = (
referenced_function.apirefs[target] if target in referenced_function.apirefs else None
)
if api_entry:
# reformat
dll_name, api_name = api_entry.split("!")
dll_name = dll_name.split(".")[0]
dll_name = dll_name.lower()
for name in capa.features.extractors.helpers.generate_symbols(dll_name, api_name):
yield API(name), insn.offset
elif referenced_function.num_instructions == 1 and referenced_function.num_outrefs == 1:
current_function = referenced_function
current_instruction = [i for i in referenced_function.getInstructions()][0]
else:
return
def extract_insn_number_features(f, bb, insn):
"""parse number features from the given instruction."""
# example:
#
# push 3136B0h ; dwControlCode
operands = [o.strip() for o in insn.operands.split(",")]
if insn.mnemonic == "add" and operands[0] in ["esp", "rsp"]:
# skip things like:
#
# .text:00401140 call sub_407E2B
# .text:00401145 add esp, 0Ch
return
for operand in operands:
try:
yield Number(int(operand, 16)), insn.offset
yield Number(int(operand, 16), arch=get_arch(f.smda_report)), insn.offset
except ValueError:
# operand is not a plain hexadecimal immediate (e.g. a register or memory reference)
continue
def read_bytes(smda_report, va, num_bytes=None):
"""
read up to MAX_BYTES_FEATURE_SIZE from the given address.
"""
rva = va - smda_report.base_addr
if smda_report.buffer is None:
return
buffer_end = len(smda_report.buffer)
max_bytes = num_bytes if num_bytes is not None else MAX_BYTES_FEATURE_SIZE
if rva + max_bytes > buffer_end:
return smda_report.buffer[rva:]
else:
return smda_report.buffer[rva : rva + max_bytes]
def derefs(smda_report, p):
"""
recursively follow the given pointer, yielding the valid memory addresses along the way.
useful when you may have a pointer to string, or pointer to pointer to string, etc.
this is a "do what i mean" type of helper function.
based on the implementation in viv/insn.py
"""
depth = 0
while True:
if not smda_report.isAddrWithinMemoryImage(p):
return
yield p
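bytes_ = read_bytes(smda_report, p, num_bytes=4)
if bytes_ is None or len(bytes_) < 4:
# not enough data left in the buffer to hold a pointer
return
val = struct.unpack("<I", bytes_)[0]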
# sanity: pointer points to self
if val == p:
return
# sanity: avoid chains of pointers that are unreasonably deep
depth += 1
if depth > 10:
return
p = val
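
To make the pointer-following logic of `derefs` easier to try out, here is a self-contained sketch that works on a plain `bytes` image instead of an `SmdaReport`. The names `follow_pointers`, `image`, and `base` are assumptions made for this example, and like the code above it reads 32-bit little-endian pointers only.

```python
import struct


def follow_pointers(image: bytes, base: int, p: int, max_depth: int = 10):
    """Yield each valid address along a chain of 32-bit pointers starting at p."""
    depth = 0
    while base <= p < base + len(image):
        yield p
        raw = image[p - base : p - base + 4]
        if len(raw) < 4:
            # too close to the end of the image to hold a pointer
            return
        (val,) = struct.unpack("<I", raw)
        if val == p:
            # sanity: pointer points to itself
            return
        depth += 1
        if depth > max_depth:
            # sanity: avoid unreasonably deep chains
            return
        p = val
```

With an image loaded at `0x1000` whose first dword holds `0x1008`, the generator yields `0x1000` and then `0x1008`, and stops once the next value falls outside the image.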
def extract_insn_bytes_features(f, bb, insn):
"""
parse byte sequence features from the given instruction.
example:
# push offset iid_004118d4_IShellLinkA ; riid
"""
for data_ref in insn.getDataRefs():
for v in derefs(f.smda_report, data_ref):
bytes_read = read_bytes(f.smda_report, v)
if bytes_read is None:
continue
if capa.features.extractors.helpers.all_zeros(bytes_read):
continue
yield Bytes(bytes_read), insn.offset
def detect_ascii_len(smda_report, offset):
if smda_report.buffer is None:
return 0
ascii_len = 0
rva = offset - smda_report.base_addr
char = smda_report.buffer[rva]
while char < 127 and chr(char) in string.printable:
ascii_len += 1
rva += 1
char = smda_report.buffer[rva]
if char == 0:
return ascii_len
return 0
def detect_unicode_len(smda_report, offset):
if smda_report.buffer is None:
return 0
unicode_len = 0
rva = offset - smda_report.base_addr
char = smda_report.buffer[rva]
second_char = smda_report.buffer[rva + 1]
while char < 127 and chr(char) in string.printable and second_char == 0:
unicode_len += 2
rva += 2
char = smda_report.buffer[rva]
second_char = smda_report.buffer[rva + 1]
if char == 0 and second_char == 0:
return unicode_len
return 0
def read_string(smda_report, offset):
alen = detect_ascii_len(smda_report, offset)
if alen > 1:
return read_bytes(smda_report, offset, alen).decode("utf-8")
ulen = detect_unicode_len(smda_report, offset)
if ulen > 2:
return read_bytes(smda_report, offset, ulen).decode("utf-16")
def extract_insn_string_features(f, bb, insn):
"""parse string features from the given instruction."""
# example:
#
# push offset aAcr ; "ACR > "
for data_ref in insn.getDataRefs():
for v in derefs(f.smda_report, data_ref):
string_read = read_string(f.smda_report, v)
if string_read:
yield String(string_read.rstrip("\x00")), insn.offset
def extract_insn_offset_features(f, bb, insn):
"""parse structure offset features from the given instruction."""
# examples:
#
# mov eax, [esi + 4]
# mov eax, [esi + ecx + 16384]
operands = [o.strip() for o in insn.operands.split(",")]
for operand in operands:
if "ptr" not in operand:
continue
if any(reg in operand for reg in ("esp", "ebp", "rsp", "rbp")):
continue
number = 0
number_hex = re.search(PATTERN_HEXNUM, operand)
number_int = re.search(PATTERN_SINGLENUM, operand)
if number_hex:
number = int(number_hex.group("num"), 16)
number = -1 * number if number_hex.group().startswith("-") else number
elif number_int:
number = int(number_int.group("num"))
number = -1 * number if number_int.group().startswith("-") else number
yield Offset(number), insn.offset
yield Offset(number, arch=get_arch(f.smda_report)), insn.offset
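
The offset extraction above relies on two regular expressions applied to the operand text. The sketch below isolates that parsing step so it can be tested on its own. It assumes operands formatted like `dword ptr [esi + 4]` (Intel syntax with spaces around the sign), and `parse_offset` is a hypothetical helper name.

```python
import re

PATTERN_HEXNUM = re.compile(r"[+\-] (?P<num>0x[a-fA-F0-9]+)")
PATTERN_SINGLENUM = re.compile(r"[+\-] (?P<num>[0-9])")


def parse_offset(operand: str) -> int:
    """Return the signed displacement found in a memory operand, or 0 if there is none."""
    match = PATTERN_HEXNUM.search(operand)
    if match:
        value = int(match.group("num"), 16)
    else:
        match = PATTERN_SINGLENUM.search(operand)
        if not match:
            return 0
        value = int(match.group("num"))
    # the matched text starts with the sign character
    return -value if match.group().startswith("-") else value
```

The hexadecimal pattern is tried first because the single-digit pattern would otherwise match the leading `0` of a value such as `0x10`.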
def is_security_cookie(f, bb, insn):
"""
check if an instruction is related to security cookie checks
"""
# security cookie check should use SP or BP
operands = [o.strip() for o in insn.operands.split(",")]
if operands[1] not in ["esp", "ebp", "rsp", "rbp"]:
return False
for index, block in enumerate(f.getBlocks()):
# expect security cookie init in first basic block within first bytes (instructions)
block_instructions = [i for i in block.getInstructions()]
if index == 0 and insn.offset < (block_instructions[0].offset + SECURITY_COOKIE_BYTES_DELTA):
return True
# ... or within last bytes (instructions) before a return
if block_instructions[-1].mnemonic.startswith("ret") and insn.offset > (
block_instructions[-1].offset - SECURITY_COOKIE_BYTES_DELTA
):
return True
return False
def extract_insn_nzxor_characteristic_features(f, bb, insn):
"""
parse non-zeroing XOR instruction from the given instruction.
ignore expected non-zeroing XORs, e.g. security cookies.
"""
if insn.mnemonic not in ("xor", "xorpd", "xorps", "pxor"):
return
operands = [o.strip() for o in insn.operands.split(",")]
if operands[0] == operands[1]:
return
if is_security_cookie(f, bb, insn):
return
yield Characteristic("nzxor"), insn.offset
def extract_insn_mnemonic_features(f, bb, insn):
"""parse mnemonic features from the given instruction."""
yield Mnemonic(insn.mnemonic), insn.offset
def extract_insn_peb_access_characteristic_features(f, bb, insn):
"""
parse peb access from the given function. fs:[0x30] on x86, gs:[0x60] on x64
"""
if insn.mnemonic not in ["push", "mov"]:
return
operands = [o.strip() for o in insn.operands.split(",")]
for operand in operands:
if "fs:" in operand and "0x30" in operand:
yield Characteristic("peb access"), insn.offset
elif "gs:" in operand and "0x60" in operand:
yield Characteristic("peb access"), insn.offset
def extract_insn_segment_access_features(f, bb, insn):
""" parse the instruction for access to fs or gs """
operands = [o.strip() for o in insn.operands.split(",")]
for operand in operands:
if "fs:" in operand:
yield Characteristic("fs access"), insn.offset
elif "gs:" in operand:
yield Characteristic("gs access"), insn.offset
def extract_insn_cross_section_cflow(f, bb, insn):
"""
inspect the instruction for a CALL or JMP that crosses section boundaries.
"""
if insn.mnemonic in ["call", "jmp"]:
if insn.offset in f.apirefs:
return
smda_report = insn.smda_function.smda_report
if insn.offset in f.outrefs:
for target in f.outrefs[insn.offset]:
if smda_report.getSection(insn.offset) != smda_report.getSection(target):
yield Characteristic("cross section flow"), insn.offset
elif insn.operands.startswith("0x"):
target = int(insn.operands, 16)
if smda_report.getSection(insn.offset) != smda_report.getSection(target):
yield Characteristic("cross section flow"), insn.offset
# this is a feature that's most relevant at the function scope,
# however, it's most efficient to extract at the instruction scope.
def extract_function_calls_from(f, bb, insn):
if insn.mnemonic != "call":
return
if insn.offset in f.outrefs:
for outref in f.outrefs[insn.offset]:
yield Characteristic("calls from"), outref
if outref == f.offset:
# if we found a jump target and it's the function address
# mark as recursive
yield Characteristic("recursive call"), outref
if insn.offset in f.apirefs:
yield Characteristic("calls from"), f.apirefs[insn.offset]
# this is a feature that's most relevant at the function or basic block scope,
# however, it's most efficient to extract at the instruction scope.
def extract_function_indirect_call_characteristic_features(f, bb, insn):
"""
extract indirect function call characteristic (e.g., call eax or call dword ptr [edx+4])
does not include calls like => call ds:dword_ABD4974
"""
if insn.mnemonic != "call":
return
if insn.operands.startswith("0x"):
return
if "qword ptr" in insn.operands and "rip" in insn.operands:
return
if insn.operands.startswith("dword ptr [0x"):
return
# call edx
# call dword ptr [eax+50h]
# call qword ptr [rsp+78h]
yield Characteristic("indirect call"), insn.offset
def extract_features(f, bb, insn):
"""
extract features from the given insn.
args:
f (smda.common.SmdaFunction): the function to process.
bb (smda.common.SmdaBasicBlock): the basic block to process.
insn (smda.common.SmdaInstruction): the instruction to process.
yields:
Feature, set[VA]: the features and their location found in this insn.
"""
for insn_handler in INSTRUCTION_HANDLERS:
for feature, va in insn_handler(f, bb, insn):
yield feature, va
INSTRUCTION_HANDLERS = (
extract_insn_api_features,
extract_insn_number_features,
extract_insn_string_features,
extract_insn_bytes_features,
extract_insn_offset_features,
extract_insn_nzxor_characteristic_features,
extract_insn_mnemonic_features,
extract_insn_peb_access_characteristic_features,
extract_insn_cross_section_cflow,
extract_insn_segment_access_features,
extract_function_calls_from,
extract_function_indirect_call_characteristic_features,
)

@@ -0,0 +1,20 @@
# Copyright (C) 2020 FireEye, Inc. All Rights Reserved.
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at: [package root]/LICENSE.txt
# Unless required by applicable law or agreed to in writing, software distributed under the License
# is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and limitations under the License.
from vivisect.const import XR_TO, REF_CODE
def get_coderef_from(vw, va):
"""
return first code `tova` whose origin is the specified va
return None if no code reference is found
"""
xrefs = vw.getXrefsFrom(va, REF_CODE)
if len(xrefs) > 0:
return xrefs[0][XR_TO]
else:
return None

@@ -132,7 +132,7 @@ def is_indirect_call(vw, va, insn=None):
if insn is None:
insn = vw.parseOpcode(va)
return insn.mnem == "call" and isinstance(insn.opers[0], envi.archs.i386.disasm.i386RegOper)
return insn.mnem in ("call", "jmp") and isinstance(insn.opers[0], envi.archs.i386.disasm.i386RegOper)
def resolve_indirect_call(vw, va, insn=None):

@@ -7,11 +7,19 @@
# See the License for the specific language governing permissions and limitations under the License.
import envi.memory
import vivisect.const
import envi.archs.i386.disasm
import capa.features.extractors.helpers
from capa.features import ARCH_X32, ARCH_X64, MAX_BYTES_FEATURE_SIZE, Bytes, String, Characteristic
import capa.features.extractors.viv.helpers
from capa.features import (
ARCH_X32,
ARCH_X64,
MAX_BYTES_FEATURE_SIZE,
THUNK_CHAIN_DEPTH_DELTA,
Bytes,
String,
Characteristic,
)
from capa.features.insn import API, Number, Offset, Mnemonic
from capa.features.extractors.viv.indirect_calls import NotFoundError, resolve_indirect_call
@@ -67,9 +75,13 @@ def extract_insn_api_features(f, bb, insn):
#
# call dword [0x00473038]
if insn.mnem != "call":
if insn.mnem not in ("call", "jmp"):
return
if insn.mnem == "jmp":
if f.vw.getFunctionMeta(f.va, "Thunk"):
return
# traditional call via IAT
if isinstance(insn.opers[0], envi.archs.i386.disasm.i386ImmMemOper):
oper = insn.opers[0]
@@ -86,21 +98,29 @@ def extract_insn_api_features(f, bb, insn):
#
# this is also how calls to internal functions may be decoded on x64.
# see Lab21-01.exe_:0x140001178
elif isinstance(insn.opers[0], envi.archs.i386.disasm.i386PcRelOper):
target = insn.opers[0].getOperValue(insn)
#
# follow chained thunks, e.g. in 82bf6347acf15e5d883715dc289d8a2b at 0x14005E0FF in
# 0x140059342 (viv) / 0x14005E0C0 (IDA)
# 14005E0FF call j_ElfClearEventLogFileW (14005AAF8)
# 14005AAF8 jmp ElfClearEventLogFileW (14005E196)
# 14005E196 jmp cs:__imp_ElfClearEventLogFileW
try:
thunk = f.vw.getFunctionMeta(target, "Thunk")
except vivisect.exc.InvalidFunction:
elif isinstance(insn.opers[0], envi.archs.i386.disasm.i386PcRelOper):
imports = get_imports(f.vw)
target = capa.features.extractors.viv.helpers.get_coderef_from(f.vw, insn.va)
if not target:
return
else:
if thunk:
dll, _, symbol = thunk.rpartition(".")
if symbol.startswith("ord"):
symbol = "#" + symbol[len("ord") :]
for _ in range(THUNK_CHAIN_DEPTH_DELTA):
if target in imports:
dll, symbol = imports[target]
for name in capa.features.extractors.helpers.generate_symbols(dll, symbol):
yield API(name), insn.va
target = capa.features.extractors.viv.helpers.get_coderef_from(f.vw, target)
if not target:
return
# call via import on x64
# see Lab21-01.exe_:0x14000118C
elif isinstance(insn.opers[0], envi.archs.amd64.disasm.Amd64RipRelOper):
@@ -238,10 +258,10 @@ def extract_insn_bytes_features(f, bb, insn):
example:
# push offset iid_004118d4_IShellLinkA ; riid
"""
for oper in insn.opers:
if insn.mnem == "call":
continue
if insn.mnem == "call":
return
for oper in insn.opers:
if isinstance(oper, envi.archs.i386.disasm.i386ImmOper):
v = oper.getOperValue(oper)
elif isinstance(oper, envi.archs.i386.disasm.i386RegMemOper):
@@ -291,6 +311,10 @@ def read_string(vw, offset):
# vivisect seems to mis-detect the end of unicode strings
# off by one, too short
ulen += 1
else:
# vivisect seems to mis-detect the end of unicode strings
# off by two, too short
ulen += 2
return read_memory(vw, offset, ulen).decode("utf-16")
raise ValueError("not a string", offset)
@@ -305,6 +329,9 @@ def extract_insn_string_features(f, bb, insn):
for oper in insn.opers:
if isinstance(oper, envi.archs.i386.disasm.i386ImmOper):
v = oper.getOperValue(oper)
elif isinstance(oper, envi.archs.i386.disasm.i386ImmMemOper):
# like 0x10056CB4 in `lea eax, dword [0x10056CB4]`
v = oper.imm
elif isinstance(oper, envi.archs.i386.disasm.i386SibOper):
# like 0x401000 in `mov eax, 0x401000[2 * ebx]`
v = oper.imm
@@ -395,7 +422,7 @@ def extract_insn_nzxor_characteristic_features(f, bb, insn):
parse non-zeroing XOR instruction from the given instruction.
ignore expected non-zeroing XORs, e.g. security cookies.
"""
if insn.mnem != "xor":
if insn.mnem not in ("xor", "xorpd", "xorps", "pxor"):
return
if insn.opers[0] == insn.opers[1]:

@@ -5,6 +5,7 @@ json format:
{
'version': 1,
'base address': int(base address),
'functions': {
int(function va): {
'basic blocks': {
@@ -86,6 +87,7 @@ def dumps(extractor):
"""
ret = {
"version": 1,
"base address": extractor.get_base_address(),
"functions": {},
"scopes": {
"file": [],
@@ -147,6 +149,7 @@ def loads(s):
raise ValueError("unsupported freeze format version: %d" % (doc.get("version")))
features = {
"base address": doc.get("base address"),
"file features": [],
"functions": {},
}
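
The three hunks above add a `base address` key to the freeze format on both the writing and the reading side. A reduced round trip can be sketched as follows. `dumps_sketch` and `loads_sketch` are hypothetical stand-ins that keep only the keys visible in this diff and do not reproduce the real `dumps`/`loads` functions.

```python
import json


def dumps_sketch(base_address):
    """Serialize a minimal freeze-like document that records the base address."""
    return json.dumps({"version": 1, "base address": base_address, "functions": {}})


def loads_sketch(s):
    """Parse a freeze-like document, tolerating documents without a base address."""
    doc = json.loads(s)
    if doc.get("version") != 1:
        raise ValueError("unsupported freeze format version: %s" % doc.get("version"))
    return {
        # doc.get() yields None for documents written before the key existed
        "base address": doc.get("base address"),
        "file features": [],
        "functions": {},
    }
```

Because the reader uses `doc.get`, a document without the key still loads, and its base address is simply `None`.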

@@ -13,10 +13,10 @@ class API(Feature):
def __init__(self, name, description=None):
# Downcase library name if given
if "." in name:
modname, impname = name.split(".")
modname, _, impname = name.rpartition(".")
name = modname.lower() + "." + impname
super(API, self).__init__(name, description)
super(API, self).__init__(name, description=description)
class Number(Feature):
@@ -37,4 +37,4 @@ class Offset(Feature):
class Mnemonic(Feature):
def __init__(self, value, description=None):
super(Mnemonic, self).__init__(value, description=description)
super(Mnemonic, self).__init__(value.lower(), description=description)
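
The two normalizations changed in this file can be shown in isolation. Unpacking `name.split(".")` into two variables fails as soon as a name contains more than one dot, whereas `rpartition(".")` always splits at the last dot. The function names below and the dotted example name are hypothetical and only serve the illustration.

```python
def normalize_api_name(name: str) -> str:
    """Lowercase the module part of an API name, keeping the symbol's case."""
    if "." in name:
        # split at the last dot so module names that contain dots stay intact
        modname, _, impname = name.rpartition(".")
        name = modname.lower() + "." + impname
    return name


def normalize_mnemonic(value: str) -> str:
    """Lowercase a mnemonic so backends that emit upper case match the same rules."""
    return value.lower()
```

With this, `normalize_api_name("KERNEL32.CreateFileA")` gives `"kernel32.CreateFileA"`, and a backend that reports `MOV` produces the same mnemonic feature as one that reports `mov`.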

@@ -103,6 +103,7 @@ def collect_metadata():
"analysis": {
"format": idaapi.get_file_type_name(),
"extractor": "ida",
"base_address": idaapi.get_imagebase(),
},
"version": capa.version.__version__,
}

@@ -1,15 +1,15 @@
# capa explorer
![capa explorer](../../../.github/capa-explorer-logo.png)
capa explorer is an IDA Pro plugin that integrates the FLARE team's open-source framework, capa, with IDA. capa is a framework that uses a well-defined collection of rules to
capa explorer is an IDA Pro plugin written in Python that integrates the FLARE team's open-source framework, capa, with IDA. capa is a framework that uses a well-defined collection of rules to
identify capabilities in a program. You can run capa against a PE file or shellcode and it tells you what it thinks the program can do. For example, it might suggest that
the program is a backdoor, can install services, or relies on HTTP to communicate.
The capa explorer IDA plugin brings capa's detection capabilities to IDA. You can use capa explorer to run capa directly on an IDA database without needing access
the program is a backdoor, can install services, or relies on HTTP to communicate. You can use capa explorer to run capa directly on an IDA database without requiring access
to the source binary. Once a database has been analyzed, capa explorer can be used to quickly identify and navigate to interesting areas of a program
and dissect capa rule matches at the assembly level.
To illustrate, we use capa explorer to analyze Lab 14-02 from [Practical Malware Analysis](https://nostarch.com/malware) (PMA) available [here](https://practicalmalwareanalysis.com/labs/). Our
goal is to understand the program's functionality.
We love using capa explorer during malware analysis because it teaches us what parts of a program suggest a behavior. As we click on rows, capa explorer jumps directly
to important addresses in the IDA Pro database and highlights key features in the Disassembly view so they stand out visually. To illustrate, we use capa explorer to
analyze Lab 14-02 from [Practical Malware Analysis](https://nostarch.com/malware) (PMA) available [here](https://practicalmalwareanalysis.com/labs/). Our goal is to understand
the program's functionality.
After loading Lab 14-02 into IDA and analyzing the database with capa explorer, we see that capa detected a rule match for `self delete via COMSPEC environment variable`:
@@ -65,7 +65,7 @@ capa explorer is limited to the file types supported by capa, which includes:
You can install capa explorer using the following steps:
1. Install capa for the Python interpreter used by your IDA installation:
1. Install capa and its dependencies from PyPI for the Python interpreter used by your IDA installation:
```
$ pip install flare-capa
```
@@ -74,9 +74,9 @@ You can install capa explorer using the following steps:
### Usage
1. Run IDA and analyze a supported file type (select `Manual Load` and `Load Resources` for best results)
1. Run IDA and analyze a supported file type (select the `Manual Load` and `Load Resources` options in IDA for best results)
2. Open capa explorer in IDA by navigating to `Edit > Plugins > FLARE capa explorer` or using the keyboard shortcut `Alt+F5`
3. Click `Analyze`
3. Click the `Analyze` button
When running capa explorer for the first time you are prompted to select a file directory containing capa rules. The plugin conveniently
remembers your selection for future runs; you can change this selection by navigating to `Rules > Change rules directory...`. We recommend
@@ -84,9 +84,9 @@ downloading and using the [standard collection of capa rules](https://github.com
#### Tips
* Start analysis by clicking `Analyze`
* Reset the plugin user interface and remove highlighting from IDA disassembly view by clicking `Reset`
* Change your capa rules directory by navigating to `Rules > Change rules directory...`
* Start analysis by clicking the `Analyze` button
* Reset the plugin user interface and remove highlighting from IDA disassembly view by clicking the `Reset` button
* Change your capa rules directory by navigating to `Rules > Change rules directory...` from the plugin menu
* Hover your cursor over a rule match to view the source content of the rule
* Double-click the `Address` column to navigate the IDA Disassembly view to the associated feature
* Double-click a result in the `Rule Information` column to expand its children
@@ -97,7 +97,7 @@ downloading and using the [standard collection of capa rules](https://github.com
Because capa explorer is packaged with capa you will need to install capa locally for development.
You can install capa locally by following the steps outlined in `Method 3: Inspecting the capa source code` of the [capa
installation guide](https://github.com/fireeye/capa/blob/ida_plugin_documentation/doc/installation.md#method-3-inspecting-the-capa-source-code). Once installed, copy [capa_explorer.py](https://raw.githubusercontent.com/fireeye/capa/master/capa/ida/plugin/capa_explorer.py)
installation guide](https://github.com/fireeye/capa/blob/master/doc/installation.md#method-3-inspecting-the-capa-source-code). Once installed, copy [capa_explorer.py](https://raw.githubusercontent.com/fireeye/capa/master/capa/ida/plugin/capa_explorer.py)
to your IDA plugins directory to run the plugin in IDA.
### Components
@@ -107,5 +107,5 @@ capa explorer consists of two main components:
* An IDA [feature extractor](https://github.com/fireeye/capa/tree/master/capa/features/extractors/ida) built on top of IDA's binary analysis engine
* This component uses IDAPython to extract [capa features](https://github.com/fireeye/capa-rules/blob/master/doc/format.md#extracted-features) from the IDA database such as strings,
disassembly, and control flow; these extracted features are used by capa to find feature combinations that result in a rule match
* An [interactive plugin](https://github.com/fireeye/capa/tree/master/capa/ida/plugin) for displaying and exploring capa rule matches
* An [interactive user interface](https://github.com/fireeye/capa/tree/master/capa/ida/plugin) for displaying and exploring capa rule matches
* This component integrates the IDA feature extractor and capa, providing an interactive user interface to dissect rule matches found by capa using features extracted by the IDA feature extractor


@@ -413,7 +413,7 @@ class CapaExplorerForm(idaapi.PluginForm):
# new analysis, new doc
self.doc = None
self.process_total = 0
self.process_count = 0
self.process_count = 1
def update_wait_box(text):
"""update the IDA wait box"""


@@ -108,6 +108,7 @@ class CapaExplorerQtreeView(QtWidgets.QTreeView):
if not model_index.isValid():
raise ValueError("invalid index")
return model_index.internalPointer()
def send_data_to_clipboard(self, data):
@@ -254,6 +255,10 @@ class CapaExplorerQtreeView(QtWidgets.QTreeView):
@param pos: cursor position
"""
model_index = self.indexAt(pos)
if not model_index.isValid():
return
item = self.map_index_to_source_item(model_index)
column = model_index.column()


@@ -29,7 +29,7 @@ import capa.version
import capa.features
import capa.features.freeze
import capa.features.extractors
from capa.helpers import oint, get_file_taste
from capa.helpers import get_file_taste
RULES_PATH_DEFAULT_STRING = "(embedded rules)"
SUPPORTED_FILE_MAGIC = set(["MZ"])
@@ -40,8 +40,11 @@ logger = logging.getLogger("capa")
def set_vivisect_log_level(level):
logging.getLogger("vivisect").setLevel(level)
logging.getLogger("vivisect.base").setLevel(level)
logging.getLogger("vivisect.impemu").setLevel(level)
logging.getLogger("vtrace").setLevel(level)
logging.getLogger("envi").setLevel(level)
logging.getLogger("envi.codeflow").setLevel(level)
def find_function_capabilities(ruleset, extractor, f):
@@ -69,14 +72,14 @@ def find_function_capabilities(ruleset, extractor, f):
bb_features[feature].add(va)
function_features[feature].add(va)
_, matches = capa.engine.match(ruleset.basic_block_rules, bb_features, oint(bb))
_, matches = capa.engine.match(ruleset.basic_block_rules, bb_features, extractor.block_offset(bb))
for rule_name, res in matches.items():
bb_matches[rule_name].extend(res)
for va, _ in res:
function_features[capa.features.MatchedRule(rule_name)].add(va)
_, function_matches = capa.engine.match(ruleset.function_rules, function_features, oint(f))
_, function_matches = capa.engine.match(ruleset.function_rules, function_features, extractor.function_offset(f))
return function_matches, bb_matches, len(function_features)
@@ -112,10 +115,16 @@ def find_capabilities(ruleset, extractor, disable_progress=None):
}
}
for f in tqdm.tqdm(list(extractor.get_functions()), disable=disable_progress, desc="matching", unit=" functions"):
pbar = tqdm.tqdm
if disable_progress:
# do not use tqdm to avoid unnecessary side effects when caller intends
# to disable progress completely
pbar = lambda s, *args, **kwargs: s
for f in pbar(list(extractor.get_functions()), desc="matching", unit=" functions"):
function_matches, bb_matches, feature_count = find_function_capabilities(ruleset, extractor, f)
meta["feature_counts"]["functions"][f.__int__()] = feature_count
logger.debug("analyzed function 0x%x and extracted %d features", f.__int__(), feature_count)
meta["feature_counts"]["functions"][extractor.function_offset(f)] = feature_count
logger.debug("analyzed function 0x%x and extracted %d features", extractor.function_offset(f), feature_count)
for rule_name, res in function_matches.items():
all_function_matches[rule_name].extend(res)
@@ -295,7 +304,27 @@ class UnsupportedRuntimeError(RuntimeError):
def get_extractor_py3(path, format, disable_progress=False):
raise UnsupportedRuntimeError()
if False: # TODO: How to decide which backend to use?
from smda.SmdaConfig import SmdaConfig
from smda.Disassembler import Disassembler
import capa.features.extractors.smda
smda_report = None
with halo.Halo(text="analyzing program", spinner="simpleDots", stream=sys.stderr, enabled=not disable_progress):
config = SmdaConfig()
config.STORE_BUFFER = True
smda_disasm = Disassembler(config)
smda_report = smda_disasm.disassembleFile(path)
return capa.features.extractors.smda.SmdaFeatureExtractor(smda_report, path)
else:
import capa.features.extractors.miasm
with open(path, "rb") as f:
buf = f.read()
return capa.features.extractors.miasm.MiasmFeatureExtractor(buf)
def get_extractor(path, format, disable_progress=False):
@@ -351,7 +380,13 @@ def get_rules(rule_path, disable_progress=False):
rules = []
for rule_path in tqdm.tqdm(list(rule_paths), disable=disable_progress, desc="loading ", unit=" rules"):
pbar = tqdm.tqdm
if disable_progress:
# do not use tqdm to avoid unnecessary side effects when caller intends
# to disable progress completely
pbar = lambda s, *args, **kwargs: s
for rule_path in pbar(list(rule_paths), desc="loading ", unit=" rules"):
try:
rule = capa.rules.Rule.from_yaml_file(rule_path)
except capa.rules.InvalidRule:
@@ -446,7 +481,23 @@ def main(argv=None):
parser = argparse.ArgumentParser(
description=desc, epilog=epilog, formatter_class=argparse.RawDescriptionHelpFormatter
)
parser.add_argument("sample", type=str, help="path to sample to analyze")
if sys.version_info >= (3, 0):
parser.add_argument(
# Python 3 str handles non-ASCII arguments correctly
"sample",
type=str,
help="path to sample to analyze",
)
else:
parser.add_argument(
# in #328 we noticed that the sample path is not handled correctly if it contains non-ASCII characters
# https://stackoverflow.com/a/22947334/ offers a solution and decoding using getfilesystemencoding works
# in our testing, however other sources suggest `sys.stdin.encoding` (https://stackoverflow.com/q/4012571/)
"sample",
type=lambda s: s.decode(sys.getfilesystemencoding()),
help="path to sample to analyze",
)
parser.add_argument("--version", action="version", version="%(prog)s {:s}".format(capa.version.__version__))
parser.add_argument(
"-r",
@@ -493,7 +544,9 @@ def main(argv=None):
try:
taste = get_file_taste(args.sample)
except IOError as e:
logger.error("%s", str(e))
# per our research there's not a programmatic way to render the IOError with non-ASCII filename unless we
# handle the IOError separately and reach into the args
logger.error("%s", e.args[0])
return -1
# py2 doesn't know about cp65001, which is a variant of utf-8 on windows
@@ -536,7 +589,13 @@ def main(argv=None):
try:
rules = get_rules(rules_path, disable_progress=args.quiet)
rules = capa.rules.RuleSet(rules)
logger.debug("successfully loaded %s rules", len(rules))
logger.debug(
"successfully loaded %s rules",
# during the load of the RuleSet, we extract subscope statements into their own rules
# that are subsequently `match`ed upon. this inflates the total rule count.
# so, filter out the subscope rules when reporting total number of loaded rules.
len([i for i in filter(lambda r: "capa/subscope-rule" not in r.meta, rules.rules.values())]),
)
if args.tag:
rules = rules.filter_rules_by_meta(args.tag)
logger.debug("selected %s rules", len(rules))
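Note on the `pbar` change in the hunks above: when progress is disabled, the loop swaps the progress-bar wrapper for a function that hands the iterable back untouched, so the wrapper is never called at all. Below is a minimal, dependency-free sketch of that pattern. The names `passthrough`, `pick_progress_wrapper`, and `process_all` are hypothetical and not part of capa; in capa the real wrapper is `tqdm.tqdm`.

```python
def passthrough(iterable, *args, **kwargs):
    """Stand-in for a progress-bar wrapper: return the iterable untouched."""
    return iterable


def pick_progress_wrapper(progress_wrapper, disable_progress):
    """Hypothetical helper: choose the real wrapper or the no-op one."""
    return passthrough if disable_progress else progress_wrapper


def process_all(items, progress_wrapper, disable_progress=False):
    """Iterate over items through whichever wrapper was selected."""
    wrap = pick_progress_wrapper(progress_wrapper, disable_progress)
    # the keyword arguments mirror the ones used in the diff; the no-op ignores them
    return [item * 2 for item in wrap(list(items), desc="matching", unit=" functions")]
```

Because the no-op accepts and ignores the same keyword arguments, the call site stays identical in both modes.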


@@ -161,6 +161,65 @@ def render_attack(doc, ostream):
ostream.write("\n")
def render_mbc(doc, ostream):
"""
example::
+--------------------------+------------------------------------------------------------+
| MBC Objective | MBC Behavior |
|--------------------------+------------------------------------------------------------|
| ANTI-BEHAVIORAL ANALYSIS | Virtual Machine Detection::Instruction Testing [B0009.029] |
| COLLECTION | Keylogging::Polling [F0002.002] |
| COMMUNICATION | Interprocess Communication::Create Pipe [C0003.001] |
| | Interprocess Communication::Write Pipe [C0003.004] |
| IMPACT | Remote Access::Reverse Shell [B0022.001] |
+--------------------------+------------------------------------------------------------+
"""
objectives = collections.defaultdict(set)
for rule in rutils.capability_rules(doc):
if not rule["meta"].get("mbc"):
continue
mbcs = rule["meta"]["mbc"]
if not isinstance(mbcs, list):
raise ValueError("invalid rule: MBC mapping is not a list")
for mbc in mbcs:
objective, _, rest = mbc.partition("::")
if "::" in rest:
behavior, _, rest = rest.partition("::")
method, _, id = rest.rpartition(" ")
objectives[objective].add((behavior, method, id))
else:
behavior, _, id = rest.rpartition(" ")
objectives[objective].add((behavior, id))
rows = []
for objective, behaviors in sorted(objectives.items()):
inner_rows = []
for spec in sorted(behaviors):
if len(spec) == 2:
behavior, id = spec
inner_rows.append("%s %s" % (rutils.bold(behavior), id))
elif len(spec) == 3:
behavior, method, id = spec
inner_rows.append("%s::%s %s" % (rutils.bold(behavior), method, id))
else:
raise RuntimeError("unexpected MBC spec format")
rows.append(
(
rutils.bold(objective.upper()),
"\n".join(inner_rows),
)
)
if rows:
ostream.write(
tabulate.tabulate(rows, headers=[width("MBC Objective", 25), width("MBC Behavior", 75)], tablefmt="psql")
)
ostream.write("\n")
def render_default(doc):
ostream = rutils.StringIO()
@@ -168,6 +227,8 @@ def render_default(doc):
ostream.write("\n")
render_attack(doc, ostream)
ostream.write("\n")
render_mbc(doc, ostream)
ostream.write("\n")
render_capabilities(doc, ostream)
return ostream.getvalue()
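Note on `render_mbc` above: each MBC mapping is split with `partition("::")`, and the trailing ID with `rpartition(" ")`, yielding either a two-part or a three-part spec. The sketch below isolates just that parsing step. `parse_mbc` is a hypothetical name, and the input strings are illustrative, assuming the `Objective::Behavior[::Method] [ID]` shape implied by the docstring table.

```python
def parse_mbc(mbc):
    """Split an MBC mapping string into (objective, spec).

    spec is (behavior, id) or (behavior, method, id), matching the two
    tuple shapes that render_mbc stores per objective.
    """
    objective, _, rest = mbc.partition("::")
    if "::" in rest:
        behavior, _, rest = rest.partition("::")
        # the ID is the last space-separated token
        method, _, id_ = rest.rpartition(" ")
        return objective, (behavior, method, id_)
    behavior, _, id_ = rest.rpartition(" ")
    return objective, (behavior, id_)
```

The differing tuple lengths are what the later `len(spec) == 2` / `len(spec) == 3` branches in the renderer rely on.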


@@ -6,13 +6,20 @@
# is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and limitations under the License.
import re
import uuid
import codecs
import logging
import binascii
import functools
try:
from functools import lru_cache
except ImportError:
from backports.functools_lru_cache import lru_cache
import six
import yaml
import ruamel.yaml
import capa.engine
@@ -25,7 +32,6 @@ from capa.features import MAX_BYTES_FEATURE_SIZE
logger = logging.getLogger(__name__)
# these are the standard metadata fields, in the preferred order.
# when reformatted, any custom keys will come after these.
META_KEYS = (
@@ -271,27 +277,63 @@ def parse_description(s, value_type, description=None):
return value, description
def pop_statement_description_entry(d):
"""
extracts the description for statements and removes the description entry from the document
a statement can only have one description
example:
the features definition
- or:
- description: statement description
- number: 1
description: feature description
becomes
<statement>: [
{ "description": "statement description" }, <-- extracted here
{ "number": 1, "description": "feature description" }
]
"""
if not isinstance(d, list):
return None
# identify child of form '{ "description": <description> }'
descriptions = list(filter(lambda c: isinstance(c, dict) and len(c) == 1 and "description" in c, d))
if len(descriptions) > 1:
raise InvalidRule("statements can only have one description")
if not descriptions:
return None
description = descriptions[0]
d.remove(description)
return description["description"]
def build_statements(d, scope):
if len(d.keys()) > 2:
raise InvalidRule("too many statements")
key = list(d.keys())[0]
description = pop_statement_description_entry(d[key])
if key == "and":
return And([build_statements(dd, scope) for dd in d[key]], description=d.get("description"))
return And([build_statements(dd, scope) for dd in d[key]], description=description)
elif key == "or":
return Or([build_statements(dd, scope) for dd in d[key]], description=d.get("description"))
return Or([build_statements(dd, scope) for dd in d[key]], description=description)
elif key == "not":
if len(d[key]) != 1:
raise InvalidRule("not statement must have exactly one child statement")
return Not(build_statements(d[key][0], scope), description=d.get("description"))
return Not(build_statements(d[key][0], scope), description=description)
elif key.endswith(" or more"):
count = int(key[: -len("or more")])
return Some(count, [build_statements(dd, scope) for dd in d[key]], description=d.get("description"))
return Some(count, [build_statements(dd, scope) for dd in d[key]], description=description)
elif key == "optional":
# `optional` is an alias for `0 or more`
# which is useful for documenting behaviors,
# like with `write file`, we might say that `WriteFile` is optionally found alongside `CreateFileA`.
return Some(0, [build_statements(dd, scope) for dd in d[key]], description=d.get("description"))
return Some(0, [build_statements(dd, scope) for dd in d[key]], description=description)
elif key == "function":
if scope != FILE_SCOPE:
@@ -350,18 +392,18 @@ def build_statements(d, scope):
count = d[key]
if isinstance(count, int):
return Range(feature, min=count, max=count, description=d.get("description"))
return Range(feature, min=count, max=count, description=description)
elif count.endswith(" or more"):
min = parse_int(count[: -len(" or more")])
max = None
return Range(feature, min=min, max=max, description=d.get("description"))
return Range(feature, min=min, max=max, description=description)
elif count.endswith(" or fewer"):
min = None
max = parse_int(count[: -len(" or fewer")])
return Range(feature, min=min, max=max, description=d.get("description"))
return Range(feature, min=min, max=max, description=description)
elif count.startswith("("):
min, max = parse_range(count)
return Range(feature, min=min, max=max, description=d.get("description"))
return Range(feature, min=min, max=max, description=description)
else:
raise InvalidRule("unexpected range: %s" % (count))
elif key == "string" and not isinstance(d[key], six.string_types):
@@ -385,26 +427,6 @@ def second(s):
return s[1]
# we use the ruamel.yaml parser because it supports roundtripping of documents with comments.
yaml = ruamel.yaml.YAML(typ="rt")
# use block mode, not inline json-like mode
yaml.default_flow_style = False
# indent lists by two spaces below their parent
#
# features:
# - or:
# - mnemonic: aesdec
# - mnemonic: vaesdec
yaml.indent(sequence=2, offset=2)
# avoid word wrapping
yaml.width = 4096
class Rule(object):
def __init__(self, name, scope, statement, meta, definition=""):
super(Rule, self).__init__()
@@ -533,7 +555,7 @@ class Rule(object):
return self.statement.evaluate(features)
@classmethod
def from_dict(cls, d, s):
def from_dict(cls, d, definition):
name = d["rule"]["meta"]["name"]
# if scope is not specified, default to function scope.
# this is probably the mode that rule authors will start with.
@@ -551,17 +573,65 @@ class Rule(object):
if scope not in SUPPORTED_FEATURES.keys():
raise InvalidRule("{:s} is not a supported scope".format(scope))
return cls(name, scope, build_statements(statements[0], scope), d["rule"]["meta"], s)
return cls(name, scope, build_statements(statements[0], scope), d["rule"]["meta"], definition)
@staticmethod
@lru_cache()
def _get_yaml_loader():
try:
# prefer to use CLoader to be fast, see #306
# on Linux, make sure you install libyaml-dev or similar
# on Windows, get WHLs from pyyaml.org/pypi
loader = yaml.CLoader
logger.debug("using libyaml CLoader.")
except:
loader = yaml.Loader
logger.debug("unable to import libyaml CLoader, falling back to Python yaml parser.")
logger.debug("this will be slower to load rules.")
return loader
@staticmethod
def _get_ruamel_yaml_parser():
# use ruamel to enable nice formatting
# we use the ruamel.yaml parser because it supports roundtripping of documents with comments.
y = ruamel.yaml.YAML(typ="rt")
# use block mode, not inline json-like mode
y.default_flow_style = False
# leave quotes unchanged
y.preserve_quotes = True
# indent lists by two spaces below their parent
#
# features:
# - or:
# - mnemonic: aesdec
# - mnemonic: vaesdec
y.indent(sequence=2, offset=2)
# avoid word wrapping
y.width = 4096
return y
@classmethod
def from_yaml(cls, s):
return cls.from_dict(yaml.load(s), s)
def from_yaml(cls, s, use_ruamel=False):
if use_ruamel:
# ruamel enables nice formatting and doc roundtripping with comments
doc = cls._get_ruamel_yaml_parser().load(s)
else:
# use pyyaml because it can be much faster than ruamel (pure python)
doc = yaml.load(s, Loader=cls._get_yaml_loader())
return cls.from_dict(doc, s)
@classmethod
def from_yaml_file(cls, path):
def from_yaml_file(cls, path, use_ruamel=False):
with open(path, "rb") as f:
try:
return cls.from_yaml(f.read().decode("utf-8"))
return cls.from_yaml(f.read().decode("utf-8"), use_ruamel=use_ruamel)
except InvalidRule as e:
raise InvalidRuleWithPath(path, str(e))
@@ -575,12 +645,25 @@ class Rule(object):
# but not for rule logic.
# programmatic generation of rules is not yet supported.
definition = yaml.load(self.definition)
# definition retains a reference to `meta`,
# so we're updating that in place.
definition["rule"]["meta"] = self.meta
meta = self.meta
# use ruamel because it supports round tripping.
# pyyaml will lose the existing ordering of rule statements.
definition = self._get_ruamel_yaml_parser().load(self.definition)
# we want to apply any updates that have been made to `meta`.
# so we would like to assign it like this:
#
# definition["rule"]["meta"] = self.meta
#
# however, `self.meta` is not ordered, it's just a dict, so subsequent formatting doesn't work.
# so, we'll manually copy the keys over, re-using the existing ordereddict/CommentedMap
meta = definition["rule"]["meta"]
for k in list(meta.keys()):
if k not in self.meta:
del meta[k]
for k, v in self.meta.items():
meta[k] = v
# the name and scope of the rule instance overrides anything in meta.
meta["name"] = self.name
meta["scope"] = self.scope
@@ -617,7 +700,7 @@ class Rule(object):
del meta[key]
ostream = six.BytesIO()
yaml.dump(definition, ostream)
self._get_ruamel_yaml_parser().dump(definition, ostream)
for key, value in hidden_meta.items():
if value is None:
@@ -641,7 +724,18 @@ class Rule(object):
# tweaking `ruamel.indent()` doesn't quite give us the control we want.
# so, add the two extra spaces that we've determined we need through experimentation.
# see #263
doc = doc.replace(" description:", " description:")
# only do this for the features section, so the meta description doesn't get reformatted
# assumes features section always exists
features_offset = doc.find("features")
doc = doc[:features_offset] + doc[features_offset:].replace(" description:", " description:")
# for negative hex numbers, yaml dump outputs:
# - offset: !!int '0x-30'
# we prefer:
# - offset: -0x30
# the below regex makes these adjustments and while ugly, we don't have to explore the ruamel.yaml insides
doc = re.sub(r"!!int '0x-([0-9a-fA-F]+)'", r"-0x\1", doc)
return doc
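Note on the negative-hex cleanup in the hunk above: the substitution rewrites the tagged form that the YAML dumper emits into the plain negative literal. The sketch below runs the same regular expression in isolation; `prettify_negative_hex` is a hypothetical wrapper name used only for illustration.

```python
import re


def prettify_negative_hex(doc):
    """Rewrite `!!int '0x-30'` style values into `-0x30`."""
    return re.sub(r"!!int '0x-([0-9a-fA-F]+)'", r"-0x\1", doc)
```

Positive hex values and ordinary integers do not match the pattern and pass through unchanged.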
@@ -791,7 +885,8 @@ class RuleSet(object):
given a collection of rules, collect the rules that are needed at the given scope.
these rules are ordered topologically.
don't include "lib" rules, unless they are dependencies of other rules.
don't include auto-generated "subscope" rules.
we want to include general "lib" rules here - even if they are not dependencies of other rules, see #398
"""
scope_rules = set([])
@@ -800,7 +895,7 @@ class RuleSet(object):
# at lower scope, e.g. function scope.
# so, we find all dependencies of all rules, and later will filter them down.
for rule in rules:
if rule.meta.get("lib", False):
if rule.meta.get("capa/subscope-rule", False):
continue
scope_rules.update(get_rules_and_dependencies(rules, rule.name))
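Note on `pop_statement_description_entry` in the rules diff above: a statement's description is the single child of the form `{"description": ...}`, and it is removed from the child list before the remaining children are built into statements. The sketch below reproduces that behavior in a standalone form. `pop_description` is a hypothetical name, and it raises `ValueError` where capa raises `InvalidRule`.

```python
def pop_description(children):
    """Remove and return the statement-level description, if present."""
    if not isinstance(children, list):
        return None
    # a statement description is a dict whose only key is "description"
    found = [c for c in children if isinstance(c, dict) and len(c) == 1 and "description" in c]
    if len(found) > 1:
        raise ValueError("statements can only have one description")
    if not found:
        return None
    children.remove(found[0])
    return found[0]["description"]
```

A feature entry such as `{"number": 1, "description": "..."}` has two keys, so it is left in place and keeps its own description.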


@@ -1 +1 @@
__version__ = "1.3.0"
__version__ = "1.4.0"


@@ -22,15 +22,15 @@ By default, on MacOS Catalina or greater, Gatekeeper will block execution of the
![approve dialog](img/approve.png)
## Method 2: Using capa as a Python library
To install capa as a Python library, you'll need to install a few dependencies, and then use `pip` to fetch the capa module.
To install capa as a Python library use `pip` to fetch the `flare-capa` module.
#### *Note*:
This method is appropriate for integrating capa in an existing project. It is not the right choice for local tool usage, such as within IDA Pro - see Method 3, instead.
That's because this technique doesn't pull the default rule set, so you should check it out separately from [capa-rules](https://github.com/fireeye/capa-rules/) and pass the directory to the entrypoint using `-r`.
This method is appropriate for integrating capa in an existing project.
This technique doesn't pull the default rule set, so you should check it out separately from [capa-rules](https://github.com/fireeye/capa-rules/) and pass the directory to the entrypoint using `-r` or set the rules path in the IDA Pro plugin.
Alternatively, see Method 3 below.
### 1. Install capa module
Second, use `pip` to install the capa module to your local Python environment. This fetches the library code to your computer but does not keep editable source files around for you to hack on. If you'd like to edit the source files, see below.
`$ pip install https://github.com/fireeye/capa/archive/master.zip`
Use `pip` to install the capa module to your local Python environment. This fetches the library code to your computer but does not keep editable source files around for you to hack on. If you'd like to edit the source files, see below.
`$ pip install flare-capa`
### 2. Use capa
You can now import the `capa` module from a Python script or use the IDA Pro plugins from the `capa/ida` directory. For more information please see the [usage](usage.md) documentation.
@@ -74,8 +74,20 @@ Note that some development dependencies (including the black code formatter) req
To check the code style, formatting and run the tests you can run the script `scripts/ci.sh`.
You can run it with the argument `no_tests` to skip the tests and only run the code style and formatting: `scripts/ci.sh no_tests`
### 3. Setup hooks [optional]
### 3. Compile binary using PyInstaller
We compile capa standalone binaries using PyInstaller. To reproduce the build process check out the source code as described above and follow these steps.
#### Install PyInstaller:
For Python 2.7: `$ pip install 'pyinstaller==3.*'` (PyInstaller 4 doesn't support Python 2.7)
For Python 3: `$ pip install pyinstaller`
#### Run PyInstaller
`$ pyinstaller .github/pyinstaller/pyinstaller.spec`
You can find the compiled binary in the created directory `dist/`.
### 4. Setup hooks [optional]
If you plan to contribute to capa, you may want to set up the hooks.
Run `scripts/setup-hooks.sh` to set the following hooks up:
- The `pre-commit` hook runs checks before every `git commit`.
@@ -84,4 +96,3 @@ Run `scripts/setup-hooks.sh` to set the following hooks up:
- The `pre-push` hook runs checks before every `git push`.
It runs `scripts/ci.sh` aborting the push if there are code style or rule linter offenses or if the tests fail.
This way you can ensure everything is alright before sending a pull request.


@@ -4,11 +4,10 @@ See `capa -h` for all supported arguments and usage examples.
## tips and tricks
- [match only rules by given author or namespace](#only-run-selected-rules)
- [IDA Pro capa explorer](#capa-explorer)
- [IDA Pro rule generator](#rule-generator)
### only run selected rules
Use the `-t` option to run rules with the given metadata value (see the rule fields `rule.meta.*`).
For example, `capa -t william.ballenthin@mandiant.com` runs rules that reference Willi's email address (probably as the author), or
`capa -t communication` runs rules with the namespace `communication`.
### IDA Pro plugin: capa explorer
Please check out the [capa explorer documentation](/capa/ida/plugin/README.md).

2
rules

Submodule rules updated: fa77d81370...faa670ac38

247
scripts/bulk-process.py Normal file

@@ -0,0 +1,247 @@
#!/usr/bin/env python
"""
bulk-process
Invoke capa recursively against a directory of samples
and emit a JSON document mapping the file paths to their results.
By default, this will use subprocesses for parallelism.
Use `-n/--parallelism` to change the subprocess count from
the default of current CPU count.
Use `--no-mp` to use threads instead of processes,
which is probably not useful unless you set `--parallelism=1`.
example:
$ python scripts/bulk-process.py /tmp/suspicious
{
"/tmp/suspicious/suspicious.dll_": {
"rules": {
"encode data using XOR": {
"matches": {
"268440358": {
[...]
"/tmp/suspicious/1.dll_": { ... }
"/tmp/suspicious/2.dll_": { ... }
}
usage:
usage: bulk-process.py [-h] [-r RULES] [-d] [-q] [-n PARALLELISM] [--no-mp]
input
detect capabilities in programs.
positional arguments:
input Path to directory of files to recursively analyze
optional arguments:
-h, --help show this help message and exit
-r RULES, --rules RULES
Path to rule file or directory, use embedded rules by
default
-d, --debug Enable debugging output on STDERR
-q, --quiet Disable all output but errors
-n PARALLELISM, --parallelism PARALLELISM
parallelism factor
--no-mp disable subprocesses
Copyright (C) 2020 FireEye, Inc. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at: [package root]/LICENSE.txt
Unless required by applicable law or agreed to in writing, software distributed under the License
is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and limitations under the License.
"""
import sys
import json
import logging
import os.path
import argparse
import multiprocessing
import multiprocessing.pool
import capa
import capa.main
import capa.render
logger = logging.getLogger("capa")
def get_capa_results(args):
"""
run capa against the file at the given path, using the given rules.
args is a tuple, containing:
rules (capa.rules.RuleSet): the rules to match
format (str): the name of the sample file format
path (str): the file system path to the sample to process
args is a tuple because i'm not quite sure how to unpack multiple arguments using `map`.
returns a dict with two required keys:
path (str): the file system path of the sample to process
status (str): either "error" or "ok"
when status == "error", then a human readable message is found in property "error".
when status == "ok", then the capa results are found in the property "ok".
the capa results are a dictionary with the following keys:
meta (dict): the meta analysis results
capabilities (dict): the matched capabilities and their result objects
"""
rules, format, path = args
logger.info("computing capa results for: %s", path)
try:
extractor = capa.main.get_extractor(path, format, disable_progress=True)
except capa.main.UnsupportedFormatError:
# i'm not 100% sure if multiprocessing will reliably raise exceptions across process boundaries.
# so instead, return an object with explicit success/failure status.
#
# if success, then status=ok, and results found in property "ok"
# if error, then status=error, and human readable message in property "error"
return {
"path": path,
"status": "error",
"error": "input file does not appear to be a PE file: %s" % path,
}
except capa.main.UnsupportedRuntimeError:
return {
"path": path,
"status": "error",
"error": "unsupported runtime or Python interpreter",
}
except Exception as e:
return {
"path": path,
"status": "error",
"error": "unexpected error: %s" % (e),
}
meta = capa.main.collect_metadata("", path, "", format, extractor)
capabilities, counts = capa.main.find_capabilities(rules, extractor, disable_progress=True)
meta["analysis"].update(counts)
return {
"path": path,
"status": "ok",
"ok": {
"meta": meta,
"capabilities": capabilities,
},
}
def main(argv=None):
if argv is None:
argv = sys.argv[1:]
parser = argparse.ArgumentParser(description="detect capabilities in programs.")
parser.add_argument("input", type=str, help="Path to directory of files to recursively analyze")
parser.add_argument(
"-r",
"--rules",
type=str,
default="(embedded rules)",
help="Path to rule file or directory, use embedded rules by default",
)
parser.add_argument("-d", "--debug", action="store_true", help="Enable debugging output on STDERR")
parser.add_argument("-q", "--quiet", action="store_true", help="Disable all output but errors")
parser.add_argument(
"-n", "--parallelism", type=int, default=multiprocessing.cpu_count(), help="parallelism factor"
)
parser.add_argument("--no-mp", action="store_true", help="disable subprocesses")
args = parser.parse_args(args=argv)
if args.quiet:
logging.basicConfig(level=logging.ERROR)
logging.getLogger().setLevel(logging.ERROR)
elif args.debug:
logging.basicConfig(level=logging.DEBUG)
logging.getLogger().setLevel(logging.DEBUG)
else:
logging.basicConfig(level=logging.INFO)
logging.getLogger().setLevel(logging.INFO)
# disable vivisect-related logging, it's verbose and not relevant for capa users
capa.main.set_vivisect_log_level(logging.CRITICAL)
# py2 doesn't know about cp65001, which is a variant of utf-8 on windows
# tqdm bails when trying to render the progress bar in this setup.
# because cp65001 is utf-8, we just map that codepage to the utf-8 codec.
# see #380 and: https://stackoverflow.com/a/3259271/87207
import codecs
codecs.register(lambda name: codecs.lookup("utf-8") if name == "cp65001" else None)
if args.rules == "(embedded rules)":
logger.info("using default embedded rules")
logger.debug("detected running from source")
args.rules = os.path.join(os.path.dirname(__file__), "..", "rules")
logger.debug("default rule path (source method): %s", args.rules)
else:
logger.info("using rules path: %s", args.rules)
try:
rules = capa.main.get_rules(args.rules)
rules = capa.rules.RuleSet(rules)
logger.info("successfully loaded %s rules", len(rules))
except (IOError, capa.rules.InvalidRule, capa.rules.InvalidRuleSet) as e:
logger.error("%s", str(e))
return -1
samples = []
for (base, directories, files) in os.walk(args.input):
for file in files:
samples.append(os.path.join(base, file))
def pmap(f, args, parallelism=multiprocessing.cpu_count()):
"""apply the given function f to the given args using subprocesses"""
return multiprocessing.Pool(parallelism).imap(f, args)
def tmap(f, args, parallelism=multiprocessing.cpu_count()):
"""apply the given function f to the given args using threads"""
return multiprocessing.pool.ThreadPool(parallelism).imap(f, args)
def map(f, args, parallelism=None):
"""apply the given function f to the given args in the current thread"""
for arg in args:
yield f(arg)
if args.no_mp:
if args.parallelism == 1:
logger.debug("using current thread mapper")
mapper = map
else:
logger.debug("using threading mapper")
mapper = tmap
else:
logger.debug("using process mapper")
mapper = pmap
results = {}
for result in mapper(
get_capa_results, [(rules, "pe", sample) for sample in samples], parallelism=args.parallelism
):
if result["status"] == "error":
logger.warning(result["error"])
elif result["status"] == "ok":
meta = result["ok"]["meta"]
capabilities = result["ok"]["capabilities"]
# our renderer expects to emit a json document for a single sample
# so we deserialize the json document, store it in a larger dict, and we'll subsequently re-encode.
results[result["path"]] = json.loads(capa.render.render_json(meta, rules, capabilities))
else:
raise ValueError("unexpected status: %s" % (result["status"]))
print(json.dumps(results))
logger.info("done.")
return 0
if __name__ == "__main__":
sys.exit(main())
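Note on the worker design in `bulk-process.py` above: rather than relying on exceptions crossing pool boundaries, each worker returns a dict with an explicit `status`, and the caller branches on it. The sketch below shows the same idea with a thread pool. `analyze` and `run_all` are hypothetical names, and the ".dll_" suffix check is only a stand-in for real analysis.

```python
import multiprocessing.pool


def analyze(path):
    """Hypothetical worker: report failures as data rather than raising."""
    try:
        if not path.endswith(".dll_"):
            raise ValueError("unsupported sample: %s" % path)
        return {"path": path, "status": "ok", "ok": {"length": len(path)}}
    except Exception as e:
        return {"path": path, "status": "error", "error": "unexpected error: %s" % (e)}


def run_all(paths, parallelism=2):
    """Apply analyze to every path using threads; results keep input order."""
    pool = multiprocessing.pool.ThreadPool(parallelism)
    try:
        return list(pool.imap(analyze, paths))
    finally:
        pool.close()
        pool.join()
```

Since `imap` yields results in input order, the caller can pair each result with its sample, although the `path` key makes that pairing explicit regardless of ordering.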

214
scripts/capa_as_library.py Normal file

@@ -0,0 +1,214 @@
#!/usr/bin/env python3
import json
import collections
import capa.main
import capa.rules
import capa.engine
import capa.features
import capa.render.utils as rutils
from capa.engine import *
from capa.render import convert_capabilities_to_result_document
# edit this to set the path for file to analyze and rule directory
RULES_PATH = "/tmp/capa/rules/"
# load rules from disk
rules = capa.main.get_rules(RULES_PATH, disable_progress=True)
rules = capa.rules.RuleSet(rules)
# == Render dictionary helpers
def render_meta(doc, ostream):
ostream["md5"] = doc["meta"]["sample"]["md5"]
ostream["sha1"] = doc["meta"]["sample"]["sha1"]
ostream["sha256"] = doc["meta"]["sample"]["sha256"]
ostream["path"] = doc["meta"]["sample"]["path"]
def find_subrule_matches(doc):
"""
collect the rule names that have been matched as a subrule match.
this way we can avoid displaying entries for things that are too specific.
"""
matches = set([])
def rec(node):
if not node["success"]:
# there's probably a bug here for rules that do `not: match: ...`
# but we don't have any examples of this yet
return
elif node["node"]["type"] == "statement":
for child in node["children"]:
rec(child)
elif node["node"]["type"] == "feature":
if node["node"]["feature"]["type"] == "match":
matches.add(node["node"]["feature"]["match"])
for rule in rutils.capability_rules(doc):
for node in rule["matches"].values():
rec(node)
return matches
def render_capabilities(doc, ostream):
"""
example::
{'CAPABILITY': {'host-interaction/cli': ['accept command line arguments'],
'host-interaction/process': ['allocate thread local storage (2 matches)'],
'anti-analysis/anti-debugging/debugger-detection': ['check for time delay via GetTickCount'],
'anti-analysis/anti-emulation/wine': ['check if process is running under wine'],
'executable/pe/section/rsrc': ['contain a resource (.rsrc) section'],
'host-interaction/file-system/write': ['write file (3 matches)']}
}
"""
subrule_matches = find_subrule_matches(doc)
ostream["CAPABILITY"] = dict()
for rule in rutils.capability_rules(doc):
if rule["meta"]["name"] in subrule_matches:
# rules that are also matched by other rules should not get rendered by default.
# this cuts down on the amount of output while giving approx the same detail.
# see #224
continue
count = len(rule["matches"])
if count == 1:
capability = rule["meta"]["name"]
else:
capability = "%s (%d matches)" % (rule["meta"]["name"], count)
ostream["CAPABILITY"].setdefault(rule["meta"]["namespace"], list())
ostream["CAPABILITY"][rule["meta"]["namespace"]].append(capability)
def render_attack(doc, ostream):
"""
example::
{'ATTCK': {'COLLECTION': ['Input Capture::Keylogging [T1056.001]'],
'DEFENSE EVASION': ['Obfuscated Files or Information [T1027]',
'Virtualization/Sandbox Evasion::System Checks '
'[T1497.001]'],
'DISCOVERY': ['File and Directory Discovery [T1083]',
'Query Registry [T1012]',
'System Information Discovery [T1082]'],
'EXECUTION': ['Shared Modules [T1129]']}
}
"""
ostream["ATTCK"] = dict()
tactics = collections.defaultdict(set)
for rule in rutils.capability_rules(doc):
if not rule["meta"].get("att&ck"):
continue
for attack in rule["meta"]["att&ck"]:
tactic, _, rest = attack.partition("::")
if "::" in rest:
technique, _, rest = rest.partition("::")
subtechnique, _, id = rest.rpartition(" ")
tactics[tactic].add((technique, subtechnique, id))
else:
technique, _, id = rest.rpartition(" ")
tactics[tactic].add((technique, id))
for tactic, techniques in sorted(tactics.items()):
inner_rows = []
for spec in sorted(techniques):
if len(spec) == 2:
technique, id = spec
inner_rows.append("%s %s" % (technique, id))
elif len(spec) == 3:
technique, subtechnique, id = spec
inner_rows.append("%s::%s %s" % (technique, subtechnique, id))
else:
raise RuntimeError("unexpected ATT&CK spec format")
ostream["ATTCK"].setdefault(tactic.upper(), inner_rows)
def render_mbc(doc, ostream):
"""
example::
{'MBC': {'ANTI-BEHAVIORAL ANALYSIS': ['Debugger Detection::Timing/Delay Check '
'GetTickCount [B0001.032]',
'Emulator Detection [B0004]',
'Virtual Machine Detection::Instruction '
'Testing [B0009.029]',
'Virtual Machine Detection [B0009]'],
'COLLECTION': ['Keylogging::Polling [F0002.002]'],
'CRYPTOGRAPHY': ['Encrypt Data::RC4 [C0027.009]',
'Generate Pseudo-random Sequence::RC4 PRGA '
'[C0021.004]']}
}
"""
ostream["MBC"] = dict()
objectives = collections.defaultdict(set)
for rule in rutils.capability_rules(doc):
if not rule["meta"].get("mbc"):
continue
mbcs = rule["meta"]["mbc"]
if not isinstance(mbcs, list):
raise ValueError("invalid rule: MBC mapping is not a list")
for mbc in mbcs:
objective, _, rest = mbc.partition("::")
if "::" in rest:
behavior, _, rest = rest.partition("::")
method, _, id = rest.rpartition(" ")
objectives[objective].add((behavior, method, id))
else:
behavior, _, id = rest.rpartition(" ")
objectives[objective].add((behavior, id))
for objective, behaviors in sorted(objectives.items()):
inner_rows = []
for spec in sorted(behaviors):
if len(spec) == 2:
behavior, id = spec
inner_rows.append("%s %s" % (behavior, id))
elif len(spec) == 3:
behavior, method, id = spec
inner_rows.append("%s::%s %s" % (behavior, method, id))
else:
raise RuntimeError("unexpected MBC spec format")
ostream["MBC"].setdefault(objective.upper(), inner_rows)
def render_dictionary(doc):
ostream = dict()
render_meta(doc, ostream)
render_attack(doc, ostream)
render_mbc(doc, ostream)
render_capabilities(doc, ostream)
return ostream
# ==== render dictionary helpers
def capa_details(file_path, output_format="dictionary"):
# extract features and find capabilities
extractor = capa.main.get_extractor(file_path, "auto", disable_progress=True)
capabilities, counts = capa.main.find_capabilities(rules, extractor, disable_progress=True)
# collect metadata (used only to make rendering more complete)
meta = capa.main.collect_metadata("", file_path, RULES_PATH, "auto", extractor)
meta["analysis"].update(counts)
capa_output = False
if output_format == "dictionary":
# ...as a python dictionary: the same simplified content as the text table, but as a dictionary
doc = convert_capabilities_to_result_document(meta, rules, capabilities)
capa_output = render_dictionary(doc)
elif output_format == "json":
# render results
# ...as json
capa_output = json.loads(capa.render.render_json(meta, rules, capabilities))
elif output_format == "texttable":
# ...as human readable text table
capa_output = capa.render.render_default(meta, rules, capabilities)
return capa_output
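
Review note: `render_attack` and `render_mbc` above both split strings of the form `Tactic::Technique[::Subtechnique] [ID]` with `partition` and `rpartition`, then group the pieces per tactic. A self-contained sketch of that parsing, assuming hypothetical helper names `parse_mapping` and `group_by_tactic` (the script inlines this logic instead), could look like this:

```python
import collections


def parse_mapping(entry):
    """split 'Tactic::Technique[::Subtechnique] [ID]' into (tactic, spec)"""
    tactic, _, rest = entry.partition("::")
    if "::" in rest:
        technique, _, rest = rest.partition("::")
        # the ID is the last space-separated token
        subtechnique, _, id_ = rest.rpartition(" ")
        return tactic, (technique, subtechnique, id_)
    technique, _, id_ = rest.rpartition(" ")
    return tactic, (technique, id_)


def group_by_tactic(entries):
    """group entries into {TACTIC: [rendered rows]}, sorted and de-duplicated"""
    tactics = collections.defaultdict(set)
    for entry in entries:
        tactic, spec = parse_mapping(entry)
        tactics[tactic].add(spec)

    rendered = {}
    for tactic, specs in sorted(tactics.items()):
        rows = []
        for spec in sorted(specs):
            if len(spec) == 2:
                rows.append("%s %s" % spec)
            else:
                rows.append("%s::%s %s" % spec)
        rendered[tactic.upper()] = rows
    return rendered
```

Collecting the specs in a set first is what removes duplicates when several rules map to the same technique.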


@@ -38,6 +38,12 @@ def main(argv=None):
)
parser.add_argument("-v", "--verbose", action="store_true", help="Enable debug logging")
parser.add_argument("-q", "--quiet", action="store_true", help="Disable all output but errors")
parser.add_argument(
"-c",
"--check",
action="store_true",
help="Don't output (reformatted) rule, only return status. 0 = no changes, 1 = would reformat",
)
args = parser.parse_args(args=argv)
if args.verbose:
@@ -50,12 +56,22 @@ def main(argv=None):
logging.basicConfig(level=level)
logging.getLogger("capafmt").setLevel(level)
rule = capa.rules.Rule.from_yaml_file(args.path)
rule = capa.rules.Rule.from_yaml_file(args.path, use_ruamel=True)
reformatted_rule = rule.to_yaml()
if args.check:
if rule.definition == reformatted_rule:
logger.info("rule is formatted correctly, nice! (%s)", rule.name)
return 0
else:
logger.info("rule requires reformatting (%s)", rule.name)
return 1
if args.in_place:
with open(args.path, "wb") as f:
f.write(rule.to_yaml().encode("utf-8"))
f.write(reformatted_rule.encode("utf-8"))
else:
print(rule.to_yaml().rstrip("\n"))
print(reformatted_rule)
return 0
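
Review note: the new `--check` flag compares the rule text on disk with the formatter's output and reports only an exit status (0 = no changes, 1 = would reformat). The sketch below shows the same pattern in isolation. `normalize` is a stand-in formatter invented for this example, not capa's `Rule.to_yaml()`, and `check` is a hypothetical name.

```python
def normalize(text):
    """stand-in formatter: strip trailing whitespace, end with one newline"""
    lines = [line.rstrip() for line in text.splitlines()]
    return "\n".join(lines).rstrip("\n") + "\n"


def check(text):
    """return 0 when text is already formatted, 1 when it would be rewritten"""
    return 0 if text == normalize(text) else 1
```

Returning the status instead of printing makes the check easy to wire into CI or a pre-commit hook.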


@@ -15,7 +15,9 @@ See the License for the specific language governing permissions and limitations
"""
import os
import sys
import time
import string
import difflib
import hashlib
import logging
import os.path
@@ -24,6 +26,7 @@ import itertools
import posixpath
import capa.main
import capa.rules
import capa.engine
import capa.features
import capa.features.insn
@@ -194,7 +197,7 @@ class DoesntMatchExample(Lint):
continue
try:
extractor = capa.main.get_extractor(path, "auto")
extractor = capa.main.get_extractor(path, "auto", disable_progress=True)
capabilities, meta = capa.main.find_capabilities(ctx["rules"], extractor, disable_progress=True)
except Exception as e:
logger.error("failed to extract capabilities: %s %s %s", rule.name, path, e)
@@ -232,7 +235,7 @@ class LibRuleNotInLibDirectory(Lint):
if "lib" not in rule.meta:
return False
return "/lib/" not in get_normpath(rule.meta["capa/path"])
return "lib/" not in get_normpath(rule.meta["capa/path"])
class LibRuleHasNamespace(Lint):
@@ -276,6 +279,32 @@ class FeatureNegativeNumber(Lint):
return False
class FormatSingleEmptyLineEOF(Lint):
name = "EOF format"
recommendation = "end file with a single empty line"
def check_rule(self, ctx, rule):
if rule.definition.endswith("\n") and not rule.definition.endswith("\n\n"):
return False
return True
class FormatIncorrect(Lint):
name = "rule format incorrect"
recommendation_template = "use scripts/capafmt.py or adjust as follows\n{:s}"
def check_rule(self, ctx, rule):
actual = rule.definition
expected = capa.rules.Rule.from_yaml(rule.definition, use_ruamel=True).to_yaml()
if actual != expected:
diff = difflib.ndiff(actual.splitlines(1), expected.splitlines(1))
self.recommendation = self.recommendation_template.format("".join(diff))
return True
return False
def run_lints(lints, ctx, rule):
for lint in lints:
if lint.check_rule(ctx, rule):
@@ -331,15 +360,25 @@ FEATURE_LINTS = (
)
def get_normpath(path):
return posixpath.normpath(path).replace(os.sep, "/")
def lint_features(ctx, rule):
features = get_features(ctx, rule)
return run_feature_lints(FEATURE_LINTS, ctx, features)
FORMAT_LINTS = (
FormatSingleEmptyLineEOF(),
FormatIncorrect(),
)
def lint_format(ctx, rule):
return run_lints(FORMAT_LINTS, ctx, rule)
def get_normpath(path):
return posixpath.normpath(path).replace(os.sep, "/")
def get_features(ctx, rule):
# get features from rule and all dependencies including subscopes and matched rules
features = []
@@ -390,6 +429,7 @@ def lint_rule(ctx, rule):
lint_meta(ctx, rule),
lint_logic(ctx, rule),
lint_features(ctx, rule),
lint_format(ctx, rule),
)
)
@@ -500,6 +540,7 @@ def main(argv=None):
action="store_true",
help="Enable thorough linting - takes more time, but does a better job",
)
parser.add_argument("-t", "--tag", type=str, help="filter on rule meta field values")
parser.add_argument("-v", "--verbose", action="store_true", help="Enable debug logging")
parser.add_argument("-q", "--quiet", action="store_true", help="Disable all output but errors")
args = parser.parse_args(args=argv)
@@ -516,15 +557,20 @@ def main(argv=None):
capa.main.set_vivisect_log_level(logging.CRITICAL)
logging.getLogger("capa").setLevel(logging.CRITICAL)
logging.getLogger("viv_utils").setLevel(logging.CRITICAL)
time0 = time.time()
try:
rules = capa.main.get_rules(args.rules)
rules = capa.main.get_rules(args.rules, disable_progress=True)
rules = capa.rules.RuleSet(rules)
logger.info("successfully loaded %s rules", len(rules))
except IOError as e:
logger.error("%s", str(e))
return -1
except capa.rules.InvalidRule as e:
if args.tag:
rules = rules.filter_rules_by_meta(args.tag)
logger.debug("selected %s rules", len(rules))
for i, r in enumerate(rules.rules, 1):
logger.debug(" %d. %s", i, r)
except (IOError, capa.rules.InvalidRule, capa.rules.InvalidRuleSet) as e:
logger.error("%s", str(e))
return -1
@@ -542,6 +588,10 @@ def main(argv=None):
}
did_violate = lint(ctx, rules)
min, sec = divmod(time.time() - time0, 60)
logger.debug("lints ran for ~ %02d:%02dm", min, sec)
if not did_violate:
logger.info("no suggestions, nice!")
return 0
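
Review note: the two format lints added above rest on two small checks: whether the rule text ends with exactly one newline, and a `difflib.ndiff` between the text on disk and the reformatted text. A standalone sketch of both, with hypothetical function names and plain strings in place of capa rule objects:

```python
import difflib


def ends_with_single_newline(text):
    """True when text ends with exactly one trailing newline"""
    return text.endswith("\n") and not text.endswith("\n\n")


def format_diff(actual, expected):
    """line-oriented diff between the text on disk and the formatter's output"""
    # keepends=True so that each diff line carries its own newline
    diff = difflib.ndiff(actual.splitlines(True), expected.splitlines(True))
    return "".join(diff)
```

In `ndiff` output, lines prefixed with `- ` exist only in the first input and lines prefixed with `+ ` only in the second, which is what makes the lint's recommendation readable.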


@@ -1,5 +1,13 @@
#!/usr/bin/env python2
"""
Copyright (C) 2020 FireEye, Inc. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at: [package root]/LICENSE.txt
Unless required by applicable law or agreed to in writing, software distributed under the License
is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and limitations under the License.
show-features
Show the features that capa extracts from the given sample,
@@ -55,14 +63,6 @@ Example::
insn: 0x10001027: number(0x1)
insn: 0x10001027: mnemonic(shl)
...
Copyright (C) 2020 FireEye, Inc. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at: [package root]/LICENSE.txt
Unless required by applicable law or agreed to in writing, software distributed under the License
is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and limitations under the License.
"""
import sys
import logging
@@ -89,12 +89,12 @@ def main(argv=None):
]
format_help = ", ".join(["%s: %s" % (f[0], f[1]) for f in formats])
parser = argparse.ArgumentParser(description="detect capabilities in programs.")
parser = argparse.ArgumentParser(description="Show the features that capa extracts from the given sample")
parser.add_argument("sample", type=str, help="Path to sample to analyze")
parser.add_argument(
"-f", "--format", choices=[f[0] for f in formats], default="auto", help="Select sample format, %s" % format_help
)
parser.add_argument("-F", "--function", type=lambda x: int(x, 0), help="Show features for specific function")
parser.add_argument("-F", "--function", type=lambda x: int(x, 0x10), help="Show features for specific function")
args = parser.parse_args(args=argv)
logging.basicConfig(level=logging.INFO)
@@ -122,6 +122,50 @@ def main(argv=None):
else:
functions = filter(lambda f: f.va == args.function, functions)
if args.function not in [f.va for f in functions]:
print("0x%X not a function, creating it" % args.function)
vw.makeFunction(args.function)
functions = extractor.get_functions()
functions = filter(lambda f: f.va == args.function, functions)
if len(functions) == 0:
print("0x%X not a function" % args.function)
return -1
print_features(functions, extractor)
return 0
def ida_main():
function = idc.get_func_attr(idc.here(), idc.FUNCATTR_START)
print("getting features for current function 0x%X" % function)
extractor = capa.features.extractors.ida.IdaFeatureExtractor()
if not function:
for feature, va in extractor.extract_file_features():
if va:
print("file: 0x%08x: %s" % (va, feature))
else:
print("file: 0x00000000: %s" % (feature))
return
functions = extractor.get_functions()
if function:
functions = filter(lambda f: f.start_ea == function, functions)
if len(functions) == 0:
print("0x%X not a function" % function)
return -1
print_features(functions, extractor)
return 0
def print_features(functions, extractor):
for f in functions:
for feature, va in extractor.extract_function_features(f):
print("func: 0x%08x: %s" % (va, feature))
@@ -138,8 +182,9 @@ def main(argv=None):
# may be an issue while piping to less and encountering non-ascii characters
continue
return 0
if __name__ == "__main__":
sys.exit(main())
if capa.main.is_runtime_ida():
ida_main()
else:
sys.exit(main())
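
Review note: the hunk for `-F/--function` shows the address parser with base `0` and with base `0x10`. The two behave differently for input without a `0x` prefix, as this small sketch illustrates (the function names are hypothetical):

```python
def parse_address_auto(text):
    """base 0: infer the base from the prefix (0x.. hex, 0o.. octal, else decimal)"""
    return int(text, 0)


def parse_address_hex(text):
    """base 16: always hexadecimal; an optional 0x prefix is accepted"""
    return int(text, 0x10)
```

With base 16, `-F 401000` and `-F 0x401000` name the same address, but a decimal address can no longer be passed.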


@@ -11,7 +11,6 @@ import sys
import setuptools
# halo==0.0.30 is the last version to support py2.7
requirements = [
"six",
"tqdm",
@@ -21,16 +20,18 @@ requirements = [
"termcolor",
"ruamel.yaml",
"wcwidth",
"halo==0.0.30",
"ida-settings==2.1.0",
]
if sys.version_info >= (3, 0):
# py3
requirements.append("halo")
requirements.append("networkx")
requirements.append("smda==1.5.13")
else:
# py2
requirements.append("enum34==1.1.6") # v1.1.6 is needed by halo 0.0.30 / spinners 0.0.24
requirements.append("halo==0.0.30") # halo==0.0.30 is the last version to support py2.7
requirements.append("vivisect==0.1.0")
requirements.append("viv-utils")
requirements.append("networkx==2.2") # v2.2 is last version supported by Python 2.7
@@ -39,18 +40,30 @@ else:
# this sets __version__
# via: http://stackoverflow.com/a/7071358/87207
# and: http://stackoverflow.com/a/2073599/87207
with open(os.path.join("capa", "version.py"), "rb") as f:
with open(os.path.join("capa", "version.py"), "r") as f:
exec(f.read())
# via: https://packaging.python.org/guides/making-a-pypi-friendly-readme/
this_directory = os.path.abspath(os.path.dirname(__file__))
with open(os.path.join(this_directory, "README.md"), "r") as f:
long_description = f.read()
setuptools.setup(
name="flare-capa",
version=__version__,
description="The FLARE team's open-source tool to identify capabilities in executable files.",
long_description="",
long_description=long_description,
long_description_content_type="text/markdown",
author="Willi Ballenthin, Moritz Raabe",
author_email="william.ballenthin@mandiant.com, moritz.raabe@mandiant.com",
url="https://www.github.com/fireeye/capa",
project_urls={
"Documentation": "https://github.com/fireeye/capa/tree/master/doc",
"Rules": "https://github.com/fireeye/capa-rules",
"Rules Documentation": "https://github.com/fireeye/capa-rules/tree/master/doc",
},
packages=setuptools.find_packages(exclude=["tests"]),
package_dir={"capa": "capa"},
entry_points={
@@ -72,12 +85,15 @@ setuptools.setup(
]
},
zip_safe=False,
keywords="capa",
keywords="capa malware analysis capability detection FLARE",
classifiers=[
"Development Status :: 3 - Alpha",
"Development Status :: 5 - Production/Stable",
"Intended Audience :: Developers",
"Intended Audience :: Information Technology",
"License :: OSI Approved :: Apache Software License",
"Natural Language :: English",
"Programming Language :: Python :: 2",
"Programming Language :: Python :: 2.7",
"Programming Language :: Python :: 3",
"Topic :: Security",
],
)
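
Review note: `setup.py` obtains `__version__` by executing the contents of `capa/version.py`. A variant that keeps the executed names out of the module globals is sketched below. `load_version` is a hypothetical helper, and the sketch assumes the version file is a plain assignment.

```python
def load_version(source):
    """execute version-file source in a private namespace and return __version__"""
    namespace = {}
    exec(source, namespace)
    return namespace["__version__"]
```

A caller would pass it the text read from the version file, for example `load_version(f.read())`.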


@@ -1,3 +1,4 @@
# -*- coding: utf-8 -*-
# Copyright (C) 2020 FireEye, Inc. All Rights Reserved.
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
@@ -9,6 +10,7 @@
import os
import sys
import os.path
import binascii
import contextlib
import collections
@@ -77,7 +79,44 @@ def get_viv_extractor(path):
vw = capa.main.get_workspace(path, "sc64", should_save=False)
else:
vw = capa.main.get_workspace(path, "auto", should_save=True)
return capa.features.extractors.viv.VivisectFeatureExtractor(vw, path)
extractor = capa.features.extractors.viv.VivisectFeatureExtractor(vw, path)
fixup_viv(path, extractor)
return extractor
def fixup_viv(path, extractor):
"""
vivisect fixups to overcome differences between backends
"""
if "3b13b" in path:
# vivisect only recognizes calling thunk function at 0x10001573
extractor.vw.makeFunction(0x10006860)
@lru_cache()
def get_smda_extractor(path):
from smda.SmdaConfig import SmdaConfig
from smda.Disassembler import Disassembler
import capa.features.extractors.smda
config = SmdaConfig()
config.STORE_BUFFER = True
disasm = Disassembler(config)
report = disasm.disassembleFile(path)
return capa.features.extractors.smda.SmdaFeatureExtractor(report, path)
@lru_cache()
def get_miasm_extractor(path):
import capa.features.extractors.miasm
with open(path, "rb") as f:
buf = f.read()
print("Using miasm!!!!")
return capa.features.extractors.miasm.MiasmFeatureExtractor(buf)
@lru_cache()
@@ -128,6 +167,8 @@ def get_data_path_by_name(name):
return os.path.join(CD, "data", "Practical Malware Analysis Lab 21-01.exe_")
elif name == "al-khaser x86":
return os.path.join(CD, "data", "al-khaser_x86.exe_")
elif name == "al-khaser x64":
return os.path.join(CD, "data", "al-khaser_x64.exe_")
elif name.startswith("39c05"):
return os.path.join(CD, "data", "39c05b15e9834ac93f206bc114d0a00c357c888db567ba8f5345da0529cbed41.dll_")
elif name.startswith("499c2"):
@@ -144,8 +185,16 @@ def get_data_path_by_name(name):
return os.path.join(CD, "data", "c91887d861d9bd4a5872249b641bc9f9.exe_")
elif name.startswith("64d9f"):
return os.path.join(CD, "data", "64d9f7d96b99467f36e22fada623c3bb.dll_")
elif name.startswith("82bf6"):
return os.path.join(CD, "data", "82BF6347ACF15E5D883715DC289D8A2B.exe_")
elif name.startswith("pingtaest"):
return os.path.join(CD, "data", "ping_täst.exe_")
elif name.startswith("77329"):
return os.path.join(CD, "data", "773290480d5445f11d3dc1b800728966.exe_")
elif name.startswith("3b13b"):
return os.path.join(CD, "data", "3b13b6f1d7cd14dc4a097a12e2e505c0a4cff495262261e2bfc991df238b9b04.dll_")
else:
raise ValueError("unexpected sample fixture")
raise ValueError("unexpected sample fixture: %s" % name)
def get_sample_md5_by_name(name):
@@ -164,6 +213,8 @@ def get_sample_md5_by_name(name):
return "c8403fb05244e23a7931c766409b5e22"
elif name == "al-khaser x86":
return "db648cd247281954344f1d810c6fd590"
elif name == "al-khaser x64":
return "3cb21ae76ff3da4b7e02d77ff76e82be"
elif name.startswith("39c05"):
return "b7841b9d5dc1f511a93cc7576672ec0c"
elif name.startswith("499c2"):
@@ -180,8 +231,15 @@ def get_sample_md5_by_name(name):
return "c91887d861d9bd4a5872249b641bc9f9"
elif name.startswith("64d9f"):
return "64d9f7d96b99467f36e22fada623c3bb"
elif name.startswith("82bf6"):
return "82bf6347acf15e5d883715dc289d8a2b"
elif name.startswith("77329"):
return "773290480d5445f11d3dc1b800728966"
elif name.startswith("3b13b"):
# file name is SHA256 hash
return "56a6ffe6a02941028cc8235204eef31d"
else:
raise ValueError("unexpected sample fixture")
raise ValueError("unexpected sample fixture: %s" % name)
def resolve_sample(sample):
@@ -195,14 +253,14 @@ def sample(request):
def get_function(extractor, fva):
for f in extractor.get_functions():
if f.__int__() == fva:
if extractor.function_offset(f) == fva:
return f
raise ValueError("function not found")
def get_basic_block(extractor, f, va):
for bb in extractor.get_basic_blocks(f):
if bb.__int__() == va:
if extractor.block_offset(bb) == va:
return bb
raise ValueError("basic block not found")
@@ -369,6 +427,12 @@ FEATURE_PRESENCE_TESTS = [
True,
),
("kernel32-64", "function=0x1800202B0", capa.features.insn.API("RtlCaptureContext"), True),
# insn/api: x64 nested thunk
("al-khaser x64", "function=0x14004B4F0", capa.features.insn.API("__vcrt_GetModuleHandle"), True),
# insn/api: call via jmp
("mimikatz", "function=0x40B3C6", capa.features.insn.API("LocalFree"), True),
("c91887...", "function=0x40156F", capa.features.insn.API("CloseClipboard"), True),
# TODO ignore thunk functions that call via jmp?
# insn/api: resolve indirect calls
("c91887...", "function=0x401A77", capa.features.insn.API("kernel32.CreatePipe"), True),
("c91887...", "function=0x401A77", capa.features.insn.API("kernel32.SetHandleInformation"), True),
@@ -379,16 +443,21 @@ FEATURE_PRESENCE_TESTS = [
("mimikatz", "function=0x40105D", capa.features.String("SCardTransmit"), True),
("mimikatz", "function=0x40105D", capa.features.String("ACR > "), True),
("mimikatz", "function=0x40105D", capa.features.String("nope"), False),
("773290...", "function=0x140001140", capa.features.String(r"%s:\\OfficePackagesForWDAG"), True),
# insn/regex, issue #262
("pma16-01", "function=0x4021B0", capa.features.Regex("HTTP/1.0"), True),
("pma16-01", "function=0x4021B0", capa.features.Regex("www.practicalmalwareanalysis.com"), False),
# insn/string, pointer to string
("mimikatz", "function=0x44EDEF", capa.features.String("INPUTEVENT"), True),
# insn/string, direct memory reference
("mimikatz", "function=0x46D6CE", capa.features.String("(null)"), True),
# insn/bytes
("mimikatz", "function=0x40105D", capa.features.Bytes("SCardControl".encode("utf-16le")), True),
("mimikatz", "function=0x40105D", capa.features.Bytes("SCardTransmit".encode("utf-16le")), True),
("mimikatz", "function=0x40105D", capa.features.Bytes("ACR > ".encode("utf-16le")), True),
("mimikatz", "function=0x40105D", capa.features.Bytes("nope".encode("ascii")), False),
# IDA features included byte sequences read from invalid memory, fixed in #409
("mimikatz", "function=0x44570F", capa.features.Bytes(binascii.unhexlify("FF" * 256)), False),
# insn/bytes, pointer to bytes
("mimikatz", "function=0x44EDEF", capa.features.Bytes("INPUTEVENT".encode("utf-16le")), True),
# insn/characteristic(nzxor)
@@ -396,6 +465,9 @@ FEATURE_PRESENCE_TESTS = [
("mimikatz", "function=0x40105D", capa.features.Characteristic("nzxor"), False),
# insn/characteristic(nzxor): no security cookies
("mimikatz", "function=0x46D534", capa.features.Characteristic("nzxor"), False),
# insn/characteristic(nzxor): xorps
# viv needs fixup to recognize function, see above
("3b13b...", "function=0x10006860", capa.features.Characteristic("nzxor"), True),
# insn/characteristic(peb access)
("kernel32-64", "function=0x1800017D0", capa.features.Characteristic("peb access"), True),
("mimikatz", "function=0x4556E5", capa.features.Characteristic("peb access"), False),
@@ -421,6 +493,12 @@ FEATURE_PRESENCE_TESTS = [
("mimikatz", "function=0x4556E5", capa.features.Characteristic("calls to"), False),
]
FEATURE_PRESENCE_TESTS_IDA = [
# file/imports
# IDA can recover more names of APIs imported by ordinal
("mimikatz", "file", capa.features.file.Import("cabinet.FCIAddFile"), True),
]
FEATURE_COUNT_TESTS = [
("mimikatz", "function=0x40E5C2", capa.features.basicblock.BasicBlock(), 7),
("mimikatz", "function=0x4702FD", capa.features.Characteristic("calls from"), 0),
@@ -454,7 +532,10 @@ def do_test_feature_count(get_extractor, sample, scope, feature, expected):
def get_extractor(path):
if sys.version_info >= (3, 0):
raise RuntimeError("no supported py3 backends yet")
if False: # TODO: How to decide which backend to use?
extractor = get_smda_extractor(path)
else:
extractor = get_miasm_extractor(path)
else:
extractor = get_viv_extractor(path)
@@ -526,3 +607,8 @@ def z499c2_extractor():
@pytest.fixture
def al_khaser_x86_extractor():
return get_extractor(get_data_path_by_name("al-khaser x86"))
@pytest.fixture
def pingtaest_extractor():
return get_extractor(get_data_path_by_name("pingtaest"))
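
Review note: `get_function` and `get_basic_block` now ask the extractor for an offset instead of calling `__int__()` on the handle, because miasm handles are not integers. The sketch below shows why the indirection helps. Both extractor classes and the `Handle` class are invented for illustration and are not capa or miasm types.

```python
class Handle(object):
    """hypothetical non-integer function handle"""

    def __init__(self, offset):
        self.offset = offset


class IntHandleExtractor(object):
    """backend whose function handles are plain integers"""

    def get_functions(self):
        return [0x401000, 0x402000]

    def function_offset(self, f):
        return int(f)


class ObjectHandleExtractor(object):
    """backend whose function handles are objects carrying an offset"""

    def get_functions(self):
        return [Handle(0x401000), Handle(0x402000)]

    def function_offset(self, f):
        return f.offset


def get_function(extractor, fva):
    """find a function by address without assuming the handle type"""
    for f in extractor.get_functions():
        if extractor.function_offset(f) == fva:
            return f
    raise ValueError("function not found")
```

The lookup code stays identical for both backends; only the extractor knows how to turn its handle into an address.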


@@ -44,7 +44,7 @@ def get_ida_extractor(_path):
@pytest.mark.skip(reason="IDA Pro tests must be run within IDA")
def test_ida_features():
for (sample, scope, feature, expected) in FEATURE_PRESENCE_TESTS:
for (sample, scope, feature, expected) in FEATURE_PRESENCE_TESTS + FEATURE_PRESENCE_TESTS_IDA:
id = make_test_id((sample, scope, feature, expected))
try:


@@ -1,3 +1,4 @@
# -*- coding: utf-8 -*-
# Copyright (C) 2020 FireEye, Inc. All Rights Reserved.
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
@@ -18,7 +19,6 @@ import capa.features
from capa.engine import *
@pytest.mark.xfail(sys.version_info >= (3, 0), reason="vivisect only works on py2")
def test_main(z9324d_extractor):
# tests rules can be loaded successfully and all output modes
path = z9324d_extractor.path
@@ -28,7 +28,6 @@ def test_main(z9324d_extractor):
assert capa.main.main([path]) == 0
@pytest.mark.xfail(sys.version_info >= (3, 0), reason="vivisect only works on py2")
def test_main_single_rule(z9324d_extractor, tmpdir):
# tests a single rule can be loaded successfully
RULE_CONTENT = textwrap.dedent(
@@ -57,7 +56,34 @@ def test_main_single_rule(z9324d_extractor, tmpdir):
)
@pytest.mark.xfail(sys.version_info >= (3, 0), reason="vivisect only works on py2")
def test_main_non_ascii_filename(pingtaest_extractor, tmpdir, capsys):
# on py2.7, need to be careful about str (which can hold bytes)
# vs unicode (which is only unicode characters).
# on py3, this should not be needed.
#
# here we print a string with unicode characters in it
# (specifically, a byte string with utf-8 bytes in it, see file encoding)
assert capa.main.main(["-q", pingtaest_extractor.path]) == 0
std = capsys.readouterr()
# but here, we have to use a unicode instance,
# because capsys has decoded the output for us.
if sys.version_info >= (3, 0):
assert pingtaest_extractor.path in std.out
else:
assert pingtaest_extractor.path.decode("utf-8") in std.out
def test_main_non_ascii_filename_nonexistent(tmpdir, caplog):
NON_ASCII_FILENAME = "täst_not_there.exe"
assert capa.main.main(["-q", NON_ASCII_FILENAME]) == -1
if sys.version_info >= (3, 0):
assert NON_ASCII_FILENAME in caplog.text
else:
assert NON_ASCII_FILENAME.decode("utf-8") in caplog.text
def test_main_shellcode(z499c2_extractor):
path = z499c2_extractor.path
assert capa.main.main([path, "-vv", "-f", "sc32"]) == 0
@@ -112,7 +138,6 @@ def test_ruleset():
assert len(rules.basic_block_rules) == 1
@pytest.mark.xfail(sys.version_info >= (3, 0), reason="vivisect only works on py2")
def test_match_across_scopes_file_function(z9324d_extractor):
rules = capa.rules.RuleSet(
[
@@ -176,7 +201,6 @@ def test_match_across_scopes_file_function(z9324d_extractor):
assert ".text section and install service" in capabilities
@pytest.mark.xfail(sys.version_info >= (3, 0), reason="vivisect only works on py2")
def test_match_across_scopes(z9324d_extractor):
rules = capa.rules.RuleSet(
[
@@ -239,7 +263,6 @@ def test_match_across_scopes(z9324d_extractor):
assert "kill thread program" in capabilities
@pytest.mark.xfail(sys.version_info >= (3, 0), reason="vivisect only works on py2")
def test_subscope_bb_rules(z9324d_extractor):
rules = capa.rules.RuleSet(
[
@@ -264,7 +287,6 @@ def test_subscope_bb_rules(z9324d_extractor):
assert "test rule" in capabilities
@pytest.mark.xfail(sys.version_info >= (3, 0), reason="vivisect only works on py2")
def test_byte_matching(z9324d_extractor):
rules = capa.rules.RuleSet(
[
@@ -287,7 +309,6 @@ def test_byte_matching(z9324d_extractor):
assert "byte match test" in capabilities
@pytest.mark.xfail(sys.version_info >= (3, 0), reason="vivisect only works on py2")
def test_count_bb(z9324d_extractor):
rules = capa.rules.RuleSet(
[
@@ -311,7 +332,6 @@ def test_count_bb(z9324d_extractor):
assert "count bb" in capabilities
@pytest.mark.xfail(sys.version_info >= (3, 0), reason="vivisect only works on py2")
def test_fix262(pma16_01_extractor, capsys):
# tests rules can be loaded successfully and all output modes
path = pma16_01_extractor.path
@@ -322,7 +342,6 @@ def test_fix262(pma16_01_extractor, capsys):
assert "www.practicalmalwareanalysis.com" not in std.out
@pytest.mark.xfail(sys.version_info >= (3, 0), reason="vivisect only works on py2")
def test_not_render_rules_also_matched(z9324d_extractor, capsys):
# rules that are also matched by other rules should not get rendered by default.
# this cuts down on the amount of output while giving approx the same detail.
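
Review note: the non-ASCII filename tests earlier in this file branch on the interpreter version because, on Python 2.7, `str` holds bytes and has to be decoded before it can be compared with captured (already decoded) output. A small helper that hides that branch might look like the following. `to_text` is a hypothetical name and is not part of the test suite.

```python
def to_text(value, encoding="utf-8"):
    """return a text string whether given bytes or text"""
    if isinstance(value, bytes):
        # on py2, str is bytes, so plain string literals take this branch
        return value.decode(encoding)
    return value
```

With such a helper, an assertion like `assert to_text(path) in std.out` would read the same on both interpreters.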


@@ -0,0 +1,29 @@
# Copyright (C) 2020 FireEye, Inc.
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at: https://github.com/fireeye/capa/blob/master/LICENSE.txt
# Unless required by applicable law or agreed to in writing, software distributed under the License
# is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and limitations under the License.
import sys
from fixtures import *
@parametrize(
"sample,scope,feature,expected",
FEATURE_PRESENCE_TESTS,
indirect=["sample", "scope"],
)
def test_miasm_features(sample, scope, feature, expected):
do_test_feature_presence(get_miasm_extractor, sample, scope, feature, expected)
@parametrize(
"sample,scope,feature,expected",
FEATURE_COUNT_TESTS,
indirect=["sample", "scope"],
)
def test_miasm_feature_counts(sample, scope, feature, expected):
do_test_feature_count(get_miasm_extractor, sample, scope, feature, expected)


@@ -69,46 +69,63 @@ def test_rule_yaml_complex():
assert r.evaluate({Number(6): {1}, Number(7): {1}, Number(8): {1}}) == False
def test_rule_yaml_descriptions():
def test_rule_descriptions():
rule = textwrap.dedent(
"""
rule:
meta:
name: test rule
features:
meta:
name: test rule
features:
- and:
- description: and description
- number: 1 = number description
- string: mystring
description: string description
- string: '/myregex/'
description: regex description
# TODO - count(number(2 = number description)): 2
- or:
- description: or description
- and:
- number: 1 = This is the number 1
- string: This program cannot be run in DOS mode.
description: MS-DOS stub message
- string: '/SELECT.*FROM.*WHERE/i'
description: SQL WHERE Clause
- count(number(2 = AF_INET/SOCK_DGRAM)): 2
- or:
- and:
- offset: 0x50 = IMAGE_NT_HEADERS.OptionalHeader.SizeOfImage
- offset: 0x34 = IMAGE_NT_HEADERS.OptionalHeader.ImageBase
description: 32-bits
- and:
- offset: 0x50 = IMAGE_NT_HEADERS64.OptionalHeader.SizeOfImage
- offset: 0x30 = IMAGE_NT_HEADERS64.OptionalHeader.ImageBase
description: 64-bits
description: PE headers offsets
- offset: 0x50 = offset description
- offset: 0x34 = offset description
- description: and description
- and:
- description: and description
- offset/x64: 0x50 = offset/x64 description
- offset/x64: 0x30 = offset/x64 description
"""
)
r = capa.rules.Rule.from_yaml(rule)
assert (
r.evaluate(
{
Number(1): {1},
Number(2): {2, 3},
String("This program cannot be run in DOS mode."): {4},
String("SELECT password FROM hidden_table WHERE user == admin"): {5},
Offset(0x50): {6},
Offset(0x30): {7},
}
    def rec(statement):
        if isinstance(statement, capa.engine.Statement):
            assert statement.description == statement.name.lower() + " description"
            for child in statement.get_children():
                rec(child)
        else:
            assert statement.description == statement.name + " description"

    rec(r.statement)


def test_invalid_rule_statement_descriptions():
    # statements can only have one description
    with pytest.raises(capa.rules.InvalidRule):
        capa.rules.Rule.from_yaml(
            textwrap.dedent(
                """
                rule:
                    meta:
                        name: test rule
                    features:
                        - or:
                            - number: 1 = This is the number 1
                            - description: description
                            - description: another description (invalid)
                """
            )
        )
== True
)
def test_rule_yaml_not():
@@ -265,7 +282,8 @@ def test_lib_rules():
),
]
)
assert len(rules.function_rules) == 1
# lib rules are added to the rule set
assert len(rules.function_rules) == 2
def test_subscope_rules():
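
The `rec` helper in the hunk above walks the parsed rule tree and checks that every node carries a description derived from its own name. As a rough, self-contained illustration of that traversal, here is a sketch that uses stand-in classes. `Statement` and `Feature` below are simplified, hypothetical substitutes for the capa types, and `find_bad_descriptions` is an invented name that returns the offending nodes instead of asserting.

```python
class Feature:
    """Leaf node, e.g. a number or string feature (simplified stand-in)."""

    def __init__(self, name, description=None):
        self.name = name
        self.description = description


class Statement:
    """Inner node such as And/Or that owns child nodes (simplified stand-in)."""

    def __init__(self, name, children, description=None):
        self.name = name
        self.children = list(children)
        self.description = description

    def get_children(self):
        return iter(self.children)


def find_bad_descriptions(node):
    """Return the names of nodes whose description is not '<name> description'."""
    bad = []
    if isinstance(node, Statement):
        # statement names are capitalized ("And"), descriptions are lower case
        if node.description != node.name.lower() + " description":
            bad.append(node.name)
        for child in node.get_children():
            bad.extend(find_bad_descriptions(child))
    elif node.description != node.name + " description":
        bad.append(node.name)
    return bad


tree = Statement(
    "And",
    [
        Feature("number", "number description"),
        Statement("Or", [Feature("offset", "offset description")], "or description"),
    ],
    "and description",
)
broken = Statement("And", [Feature("string", None)], "and description")
```

With these stand-ins, `find_bad_descriptions(tree)` returns an empty list and `find_bad_descriptions(broken)` reports the `string` leaf.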

@@ -0,0 +1,30 @@
# Copyright (C) 2020 FireEye, Inc. All Rights Reserved.
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at: [package root]/LICENSE.txt
# Unless required by applicable law or agreed to in writing, software distributed under the License
# is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and limitations under the License.
import sys

from fixtures import *


@parametrize(
    "sample,scope,feature,expected",
    FEATURE_PRESENCE_TESTS,
    indirect=["sample", "scope"],
)
def test_smda_features(sample, scope, feature, expected):
    with xfail(sys.version_info < (3, 0), reason="SMDA only works on py3"):
        do_test_feature_presence(get_smda_extractor, sample, scope, feature, expected)


@parametrize(
    "sample,scope,feature,expected",
    FEATURE_COUNT_TESTS,
    indirect=["sample", "scope"],
)
def test_smda_feature_counts(sample, scope, feature, expected):
    with xfail(sys.version_info < (3, 0), reason="SMDA only works on py3"):
        do_test_feature_count(get_smda_extractor, sample, scope, feature, expected)
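
The SMDA tests wrap their body in `with xfail(condition, reason=...)`, a helper imported from `fixtures` whose definition is not part of this diff. One plausible way such a context manager could behave is sketched below. This is an assumption, not the project's implementation: `ExpectedFailure` is a hypothetical stand-in for the outcome a test runner would report as an expected failure, and `outcome` is a small demo function added only for illustration.

```python
import contextlib


class ExpectedFailure(Exception):
    """Hypothetical stand-in for a test runner's 'expected failure' outcome."""


@contextlib.contextmanager
def xfail(condition, reason=""):
    """Expect the wrapped block to fail when `condition` is true."""
    try:
        yield
    except Exception:
        if condition:
            # the failure was anticipated: report it as expected
            raise ExpectedFailure(reason)
        # the failure was not anticipated: let it propagate unchanged
        raise
    else:
        if condition:
            # the block passed although a failure was expected
            raise RuntimeError("expected to fail, but passed: " + reason)


def outcome(condition, fails):
    """Run a block that optionally fails and classify what happened."""
    try:
        with xfail(condition, reason="demo"):
            if fails:
                raise ValueError("boom")
    except ExpectedFailure:
        return "xfail"
    except RuntimeError:
        return "unexpected pass"
    except ValueError:
        return "error"
    return "pass"
```

Compared with a decorator whose condition applies to the whole test, a context manager of this kind limits the expectation to the wrapped call, so setup failures outside the `with` block would still surface as ordinary errors.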