fix: register all data-ref addresses for imports in Ghidra helpers

The original code stored only one IAT address per import (an addr=0
fallback on master; the first data ref followed by a break in the prior
fix). When an import has multiple data references, instruction-level
lookups could miss the address an instruction actually references,
which broke API feature extraction and produced spurious
cross-section-flow characteristics.
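To make the failure mode concrete, here is a minimal, Ghidra-free sketch of the prior fix's behavior. It assumes a hypothetical import whose external symbol has two data references (the made-up offsets 0x401000 and 0x402000). `Ref`, `register_first_ref`, and the import names are illustrative stand-ins, not Ghidra or capa APIs.

```python
from dataclasses import dataclass


@dataclass
class Ref:
    # Hypothetical stand-in for a Ghidra Reference: origin offset plus a data/non-data flag.
    from_offset: int
    is_data: bool


def register_first_ref(import_dict: dict[int, list[str]], refs: list[Ref], names: list[str]) -> None:
    # Mimics the prior fix: keep only the first data ref, then stop looking.
    addr = None
    for r in refs:
        if r.is_data:
            addr = r.from_offset
            break
    if addr is None:
        return  # the import is skipped entirely, like the old `continue`
    for name in names:
        import_dict.setdefault(addr, []).append(name)


import_dict: dict[int, list[str]] = {}
refs = [Ref(0x401000, True), Ref(0x402000, True)]
register_first_ref(import_dict, refs, ["kernel32.CreateFileA", "CreateFileA"])

# An instruction that references the second address finds nothing.
print(import_dict.get(0x402000))  # None
```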

Collect all data-ref addresses into a list and register the import under
each, matching how map_fake_import_addrs already stores all refs. This
also preserves ex_loc registration for imports with no data refs, which
the prior fix skipped entirely via continue.
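A minimal, Ghidra-free sketch of the new registration logic follows. `Ref`, `register_all_refs`, the offsets, and the import names are hypothetical stand-ins, not Ghidra or capa APIs; ex_loc is modeled as a plain integer offset rather than a Ghidra Address, and `names` stands in for whatever generate_symbols yields.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Ref:
    # Hypothetical stand-in for a Ghidra Reference: origin offset plus a data/non-data flag.
    from_offset: int
    is_data: bool


def register_all_refs(
    import_dict: dict[int, list[str]],
    refs: list[Ref],
    ex_loc: Optional[int],
    names: list[str],
) -> None:
    # Collect every data-ref address instead of stopping at the first one.
    addrs = [r.from_offset for r in refs if r.is_data]
    for name in names:
        # Register the name under each data-ref address (possibly none).
        for addr in addrs:
            import_dict.setdefault(addr, []).append(name)
        # The external location is registered even when addrs is empty.
        # The diff checks `if ex_loc:` on a Ghidra Address; here ex_loc is an
        # int offset, so compare against None to keep offset 0 valid.
        if ex_loc is not None:
            import_dict.setdefault(ex_loc, []).append(name)


import_dict: dict[int, list[str]] = {}
refs = [Ref(0x401000, True), Ref(0x402000, True), Ref(0x403000, False)]
register_all_refs(import_dict, refs, None, ["CreateFileA"])
print(sorted(hex(a) for a in import_dict))  # ['0x401000', '0x402000']

# An import with no data refs still gets its external location registered.
no_refs: dict[int, list[str]] = {}
register_all_refs(no_refs, [], 0x10, ["Sleep"])
print(no_refs)  # {16: ['Sleep']}
```

Building addrs up front keeps the per-name loop simple, and the no-data-refs case falls through to the ex_loc registration without a separate early exit.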
Willi Ballenthin
2026-05-08 09:52:26 +02:00
committed by Willi Ballenthin
parent 99b3cfe096
commit 5a60f3a0f8
+4 -7
@@ -123,14 +123,10 @@ def get_file_imports() -> dict[int, list[str]]:
     import_dict: dict[int, list[str]] = {}
     for f in get_current_program().getFunctionManager().getExternalFunctions():
-        addr = None
+        addrs: list[int] = []
         for r in f.getSymbol().getReferences():
             if r.getReferenceType().isData():
-                addr = r.getFromAddress().getOffset() # gets pointer to fake external addr
-                break
-        if addr is None:
-            continue
+                addrs.append(r.getFromAddress().getOffset())
         ex_loc = f.getExternalLocation().getAddress() # map external locations as well (offset into module files)
@@ -142,7 +138,8 @@ def get_file_imports() -> dict[int, list[str]]:
         fstr[0] = "*" if "<EXTERNAL>" in fstr[0] else fstr[0][:-4]
         for name in capa.features.extractors.helpers.generate_symbols(fstr[0], fstr[1]):
-            import_dict.setdefault(addr, []).append(name)
+            for addr in addrs:
+                import_dict.setdefault(addr, []).append(name)
             if ex_loc:
                 import_dict.setdefault(ex_loc.getOffset(), []).append(name)