Five extractors (ghidra, dnfile, viv, binja, ida) stored Format in
global_features during __init__ and also included extract_file_format
in FILE_HANDLERS. This caused find_file_capabilities to emit the Format
feature twice, inflating feature counts. Removing extract_file_format
from FILE_HANDLERS in all five extractors ensures Format is emitted
once via global_features only.
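
As a rough illustration of the duplication, the sketch below uses a hypothetical `ToyExtractor` and tuple-based features (not capa's actual extractor or feature classes). It assumes a caller that collects both file-handler features and global features, so a format handler registered in both places is counted twice.

```python
from collections import Counter


# Hypothetical stand-ins; capa's real handlers and feature types differ.
def extract_file_format():
    yield ("format", "pe")


def extract_file_strings():
    yield ("string", "hello")


class ToyExtractor:
    def __init__(self, file_handlers):
        self.file_handlers = file_handlers
        # Format is computed once up front and stored as a global feature.
        self.global_features = list(extract_file_format())

    def extract_global_features(self):
        yield from self.global_features

    def extract_file_features(self):
        for handler in self.file_handlers:
            yield from handler()


def collect_file_scope_features(extractor):
    # Mimics a caller that gathers file features and then global features.
    counts = Counter()
    for feature in extractor.extract_file_features():
        counts[feature] += 1
    for feature in extractor.extract_global_features():
        counts[feature] += 1
    return counts


# Handler still registered: Format is seen via both paths.
before = collect_file_scope_features(
    ToyExtractor((extract_file_format, extract_file_strings))
)
# Handler removed: Format is seen via global_features only.
after = collect_file_scope_features(ToyExtractor((extract_file_strings,)))
```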

When resolve_dotnet_token returned an InvalidToken (e.g. a malformed or
out-of-range MethodSpec table/row index), the assert on line 51 raised
AssertionError instead of gracefully returning None. Replaced the assert
with the isinstance guard pattern already used elsewhere in the same file.
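
A minimal sketch of the guard pattern, assuming simplified stand-in types: `InvalidToken`, `MethodSpecRow`, `resolve_token`, and `get_method_name` below are illustrative definitions, not capa's or dnfile's real signatures.

```python
from dataclasses import dataclass
from typing import Optional, Union


# Hypothetical stand-ins for the token-resolution result types.
@dataclass
class InvalidToken:
    value: int


@dataclass
class MethodSpecRow:
    method_name: str


def resolve_token(token: int) -> Union[MethodSpecRow, InvalidToken]:
    # Toy resolver: only token 1 maps to a row.
    if token == 1:
        return MethodSpecRow("Foo::Bar")
    return InvalidToken(token)


def get_method_name(token: int) -> Optional[str]:
    row = resolve_token(token)
    # Guard instead of `assert isinstance(row, MethodSpecRow)`:
    # an invalid token yields None rather than an AssertionError.
    if not isinstance(row, MethodSpecRow):
        return None
    return row.method_name
```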

`get_dotnet_table_row` used `if row_index - 1 <= 0` to guard against invalid
indices. Because .NET metadata tables are 1-indexed, row_index=1 is the first
valid row, but the condition is equivalent to `row_index <= 1`, silently
rejecting it and making the first row of every table unreachable.
Changed to `if row_index <= 0`, which correctly rejects only the zero/null
token and leaves all valid rows accessible. Added four unit tests against the
real dd9098ff91717f4906afe9dafdfa2f52.exe_ sample to verify the guard
boundary: row_index=1 returns the first row, row_index=0 returns None, all
row indices 1..N succeed, and an out-of-bounds index returns None.
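
The boundary behaviour can be sketched with a plain Python list standing in for a 1-indexed metadata table. The helpers `get_row_buggy` and `get_row_fixed` are hypothetical and only mirror the two guard conditions; they are not the real `get_dotnet_table_row`.

```python
from typing import List, Optional


def get_row_buggy(rows: List[str], row_index: int) -> Optional[str]:
    # Old guard: equivalent to `row_index <= 1`, so row 1 is rejected too.
    if row_index - 1 <= 0:
        return None
    try:
        return rows[row_index - 1]
    except IndexError:
        return None


def get_row_fixed(rows: List[str], row_index: int) -> Optional[str]:
    # New guard: only the zero/null index (and negatives) are rejected.
    if row_index <= 0:
        return None
    try:
        return rows[row_index - 1]
    except IndexError:
        return None


# A toy table with three rows, addressed as 1..3.
table = ["row1", "row2", "row3"]
```

With this toy table, `get_row_buggy(table, 1)` returns None while `get_row_fixed(table, 1)` returns the first row; both return None for index 0 and for an index past the end.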

Replace the header in source code files using the following script:
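```Python
import os
import re

# OLD_HEADER (a bytes regex with a named group `year`) and NEW_HEADER (a bytes
# template with one `%s` placeholder) must be defined before running this.
for dir_path, dir_names, file_names in os.walk("capa"):
    for file_name in file_names:
        # headers are only in `.py` and `.toml` files
        if not file_name.endswith((".py", ".toml")):
            continue
        file_path = f"{dir_path}/{file_name}"
        with open(file_path, "rb+") as f:
            content = f.read()
            m = re.search(OLD_HEADER, content)
            if not m:
                continue
            print(f"{file_path}: {m.group('year')}")
            content = content.replace(m.group(0), NEW_HEADER % m.group("year"))
            f.seek(0)
            f.write(content)
            # drop leftover bytes if the new header is shorter than the old one
            f.truncate()
```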
Some files had the copyright headers inside a `"""` comment and needed
manual changes before applying the script. `hook-vivisect.py` and
`pyinstaller.spec` didn't include the license in the header and also
needed manual changes.
The old header had the confusing phrase `All rights reserved`, which
does not make sense for an open source license. Replace the header with
the default Google header, which corrects this issue and keeps capa
consistent with other Google projects.
Adapt the linter to work with the new header.
Also replace the copyright text in the `web/public/index.html` file for
consistency.