No, they can't exist.
And it doesn't matter if it's written in C or not.
from Reversing: Secrets of Reverse Engineering
Published by Wiley Publishing, Inc.
Native Code Decompilation: An Unsolvable Problem?Compilation is a more or less well-defined task. A program source file is analyzed
and is checked for syntactic validity based on (hopefully) very strict
language specifications. From this high-level representation, the compiler generates
an intermediate representation of the source program that attempts to
classify exactly what the program does, in compiler-readable form. The program
is then analyzed and optimized to improve its efficiency as much as possible,
and it is then converted into the target platform’s assembly language. There
are rarely question marks with regard to what the program is trying to do
because the language specifications were built so that compilers can easily
read and “understand” the program source code.
This is the key difference between compilers and decompilers that often
makes decompilation a far more indefinite process. Decompilers read machine
language code as input, and such input can often be very difficult to analyze.
With certain higher-level representations such as Java bytecode or .NET MSIL
the task is far more manageable, because the input representation of the program
includes highly detailed information regarding the program, particularly
regarding the data it deals with (think of metadata in .NET). The real
challenge for decompilation is to accurately generate a high-level language
representation from native binaries such as IA-32 binaries where no explicit
information regarding program data (such as data structure definitions and
other data type information) is available.
There is often debate on whether this is even possible or not. Some have compared
the native decompilation process to an attempt to bring back the cow
from the hamburger or the eggs from the omelet. The argument is that highlevel
information is completely lost during the compilation process and that the
executable binary simply doesn’t contain the information that would be necessary
in order to bring back anything similar to the original source code.
The primary counterargument that decompiler writers use is that detailed
information regarding the program must be present in the executable—otherwise,
the processor wouldn’t be able to execute it. I believe that this is true, but
only up to a point. CPUs can perform quite a few operations without understanding
the exact details of the underlying data. This means that you don’t
really have a guarantee that every relevant detail regarding a source program is
bound to be present in the binary just because the processor can correctly execute
it. Some details are just irretrievably lost during the compilation process.
Still, this doesn’t make decompilation impossible, it simply makes it more difficult,
and it means that the result is always going to be somewhat limited.
It is important to point out that (assuming that the decompiler operates correctly)
the result is never going to be semantically incorrect. It would usually be
correct to the point where recompiling the decompiler-generated output
would produce a functionally identical version of the original program. The
problem with the generated high-level language code is that the code is almost
always going to be far less readable than the original program source code.
Besides the obvious limitations such as the lack of comments, variable names,
and so on. the generated code might lack certain details regarding the program,
such as accurate data structure declarations or accurate basic type identification.
Additionally, the decompiled output might be structured somewhat differently
from the original source code because of compiler optimizations. In this
chapter, I will demonstrate several limitations imposed on native code decompilers
by modern compilers and show how precious information is often eliminated
from executable binaries.