The ImpHash was introduced in 2014 by FireEye. It has since been used by many malware analysts and implemented in tools like VirusTotal to identify similar malware samples by their imports. The underlying assumption is that programs with the same imports are likely built from similar source code.
.NET samples usually import only mscoree.dll, so there are only a handful of different ImpHashes across all .NET binaries. The ImpHash is therefore of little use for these files. This motivated us to find an alternative, the TypeRefHash (TRH). To show the imported DLLs, functions and the TypeRef table, we used the online tool penet.io.
.NET files store the names and namespaces of their referenced types in a Metadata table called TypeRef. We can use these to construct an identifier like the ImpHash. Similar to the combination of DLL and function name in the Import table, the TypeRef table contains a list of type names and their corresponding namespaces. For example, a .NET binary may import the type DebuggerBrowsableState from the namespace System.Diagnostics.
To calculate the TRH, we extract the TypeRef table and resolve the indices to the corresponding strings.
Order the entries by TypeNamespace and then by TypeName. Concatenate each TypeNamespace with its TypeName, separated by a dash. If the namespace is empty, the concatenated string starts with the dash. Join all strings with commas and calculate the SHA256 hashsum of the resulting UTF-8 byte string.
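The steps above can be sketched in a few lines of Python. This is a minimal illustration, not the reference implementation: the function name type_ref_hash and the input format (a list of already resolved namespace/name pairs) are assumptions made for this example, and so is the use of plain code-point ordering for the sort.

```python
import hashlib


def type_ref_hash(type_refs):
    """Compute a TypeRefHash-style digest from (namespace, name) pairs.

    `type_refs` is a hypothetical input format: an iterable of tuples
    holding the already resolved TypeNamespace and TypeName strings.
    """
    # Order by namespace first, then by type name.
    # Assumption: plain code-point comparison is used for ordering.
    ordered = sorted(type_refs, key=lambda ref: (ref[0], ref[1]))
    # Join namespace and name with a dash; an empty namespace
    # produces a string that starts with the dash.
    parts = [f"{namespace}-{name}" for namespace, name in ordered]
    # Join all entries with commas and hash the UTF-8 bytes.
    joined = ",".join(parts)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()
```

Because the entries are sorted before hashing, two inputs that contain the same type references in a different table order produce the same digest.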
We use SHA256 instead of MD5, which is used for the ImpHash, as we already see MD5 collisions on our data sets. We order the entries in the table to prevent attacks where a different TypeRefHash could be created for a sample by just reordering the table. A similar attack was shown for the ImpHash by Balles and Sharfuddin. We chose a dash and a comma as the separators, as they are not valid in namespaces and type names in .NET.
Imagine we have a .NET sample with the following simplified TypeRef table:
#   TypeName (Resolved)               TypeNamespace (Resolved)
0   CompilationRelaxationsAttribute   System.Runtime.CompilerServices
1   RuntimeCompatibilityAttribute     System.Runtime.CompilerServices
2   TargetFrameworkAttribute          System.Runtime.Versioning
3   DebuggingModes                    (empty)
4   AssemblyFileVersionAttribute      System.Reflection
This results in the following ordered and concatenated strings. Note that TypeRefs with an empty namespace are sorted to the beginning of the list.
-DebuggingModes
System.Reflection-AssemblyFileVersionAttribute
System.Runtime.CompilerServices-CompilationRelaxationsAttribute
System.Runtime.CompilerServices-RuntimeCompatibilityAttribute
System.Runtime.Versioning-TargetFrameworkAttribute
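These strings are joined with commas into the following final string:
-DebuggingModes,System.Reflection-AssemblyFileVersionAttribute,System.Runtime.CompilerServices-CompilationRelaxationsAttribute,System.Runtime.CompilerServices-RuntimeCompatibilityAttribute,System.Runtime.Versioning-TargetFrameworkAttribute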
The resulting TRH is the SHA256 hashsum of the above string.
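To follow the example step by step, the snippet below rebuilds the intermediate strings from the simplified TypeRef table and hashes the result. It is only a sketch: the variable names are chosen for illustration, the rows are typed in by hand rather than parsed from a binary, and Python's default string ordering is assumed to match the intended sort order.

```python
import hashlib

# (TypeNamespace, TypeName) pairs from the example table;
# "" stands for the empty namespace of DebuggingModes.
rows = [
    ("System.Runtime.CompilerServices", "CompilationRelaxationsAttribute"),
    ("System.Runtime.CompilerServices", "RuntimeCompatibilityAttribute"),
    ("System.Runtime.Versioning", "TargetFrameworkAttribute"),
    ("", "DebuggingModes"),
    ("System.Reflection", "AssemblyFileVersionAttribute"),
]

# Sorting the tuples orders by namespace first, then by type name.
strings = [f"{namespace}-{name}" for namespace, name in sorted(rows)]

# Join with commas and hash the UTF-8 encoded result.
final_string = ",".join(strings)
trh = hashlib.sha256(final_string.encode("utf-8")).hexdigest()

print(final_string)
print(trh)
```

The first printed line is the comma-joined string, the second its SHA256 digest in hexadecimal form.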
You can find the TRH reference implementation in the PeNet library.