The actual files resulting from compile process?
January 11, 2009 11:36 AM

What low-level files does a Java or C++ compiler typically generate?

Hello, I am back in OO programming, and while I've worked with classes and such, I've recently started to wonder exactly what happens during compiling.

I am aware and understand the basics of scanner, lexer, preprocessing, parsing, semantic analysis, code generation, and code optimization. This all generally makes good practical sense.

What I am looking for is: in the end, what set of files do you actually end up with after compiling? Does the compile process generate purely .dll files, or all sorts of other types?

I am looking for a general overview that doesn't require me to know lower-level programming to understand what is going on. I have not been able to find much general info on this topic (probably because I don't know the important keywords). For example, I would love to find a simple program written in Java where the author then shows exactly what new files were generated by compiling the code. Any comments or good links would be greatly appreciated!
posted by figTree to Computers & Internet (9 answers total)
 
Java is very simple. Each Java class, generally one per .java file, compiles into a .class file. A jar file, which is what you distribute, is just a zip of those class files with an extra manifest file.
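As a rough illustration of the "one .class file per class" rule, here is a minimal sketch. The names Hello and Greeter are made up for this example. Assuming it is saved as Hello.java with no package declaration, running javac Hello.java should leave three new files next to the source: Hello.class, Hello$Greeter.class (the nested class) and Hello$1.class (the anonymous class).

```java
// Save as Hello.java and compile with: javac Hello.java
class Hello {
    // A nested class gets its own file: Hello$Greeter.class
    static class Greeter {
        String greet(String who) {
            return "Hello, " + who;
        }
    }

    // An anonymous class also gets its own file: Hello$1.class
    static Runnable anonymous() {
        return new Runnable() {
            @Override
            public void run() {
                System.out.println(new Greeter().greet("world"));
            }
        };
    }

    public static void main(String[] args) {
        anonymous().run();
        // The runtime ("binary") class names mirror the generated file names.
        System.out.println(Greeter.class.getName());
        System.out.println(anonymous().getClass().getName());
    }
}
```

Running java Hello afterwards loads those .class files on demand; no separate link step is involved.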

C/C++ is more complicated, since the compiler generates intermediate object files (.o or .obj) and then the linker links them into DLLs or EXEs.
posted by smackfu at 11:44 AM on January 11, 2009


Response by poster: Thanks smackfu. Bit simpler than I expected. In the case of Java, what actually uses the .class files? (How do these break down even further? How do these relate to .dlls?) In the case of C++, things don't get much lower level than .dll files, no?
posted by figTree at 12:00 PM on January 11, 2009


To go into a little more detail:

A .class file contains bytecode for all the methods in the class, and a bit of extra goo. Any "linking" is done at runtime.
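To get a feel for the "extra goo", here is a small sketch that reads only the fixed header at the start of a class file: a 4-byte magic number (0xCAFEBABE) followed by the minor and major format versions. The class name ClassFilePeek and its helper are invented for this example, and it peeks at the JDK's own java/lang/String.class purely because that resource is always available; any .class file has the same header layout.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

class ClassFilePeek {
    // Returns {magic, minorVersion, majorVersion} read from the start of a class file.
    static int[] readHeader(String resource) {
        try (InputStream raw = ClassLoader.getSystemResourceAsStream(resource)) {
            if (raw == null) {
                throw new IOException("class file not found: " + resource);
            }
            DataInputStream in = new DataInputStream(raw);
            int magic = in.readInt();            // always 0xCAFEBABE
            int minor = in.readUnsignedShort();  // format minor version
            int major = in.readUnsignedShort();  // format major version (depends on the JDK)
            return new int[] { magic, minor, major };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int[] header = readHeader("java/lang/String.class");
        System.out.printf("magic=%08X minor=%d major=%d%n", header[0], header[1], header[2]);
    }
}
```

Everything after the header (the constant pool, fields, methods and their bytecode) is described in the class file specification.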

A .o or .obj file (they're equivalent; different compilers use different extensions) contains machine code for every method/function, and a bit of extra goo. In the object file, all external symbols, like "printf", are still recorded by name (the string "printf"); they aren't resolved to actual addresses until you link your .obj files (and .lib files, which are just bundles of .obj files) together into a DLL or EXE. Linking in C++ can be pretty complex, what with templates and all.
posted by aubilenon at 12:01 PM on January 11, 2009


Every compilation unit in C++ (the .cpp files you feed into the compiler) gets fully compiled into machine code during compilation (usually through some intermediate form, to keep separation between the compiler's front end and code-generating back end and to let optimizations be applied more generically). Then you end up with a lot of small files (the .o files), each of which has a ton of references to functions/variables that don't actually exist in the .o file itself. The linker links one or more .o files together and makes sure the final binary either has every function/variable it references or knows where to find it. This may somewhat help explain it.
posted by devilsbrigade at 12:05 PM on January 11, 2009


The compiler translates the source files into object files, which are (usually) an intermediary step to the final executable, shared library, etc.

In Java, .java source files become .class files that contain bytecode. This bytecode then gets run by the Java virtual machine, which does a bunch of stuff. It can pull in other .class files to resolve external dependencies, it can interpret the bytecode, and it can compile the bytecode into native machine instructions. Usually it does some combination of all of the above.
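One way to see that dependencies are resolved at run time rather than at compile time is to ask the JVM for a class by name. In this sketch the helper tryLoad and the name com.example.DoesNotExist are made up; the point is only that the compiler accepts both calls, and the missing class is noticed when the program runs.

```java
class RuntimeLoading {
    // Asks the JVM to locate and load a class by name; returns null if no
    // matching class can be found at run time.
    static Class<?> tryLoad(String className) {
        try {
            return Class.forName(className);
        } catch (ClassNotFoundException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        // Found: the JVM pulls in the class that ships with the JDK.
        System.out.println(tryLoad("java.util.ArrayList"));
        // Not found: a hypothetical name with no .class file behind it.
        System.out.println(tryLoad("com.example.DoesNotExist"));
    }
}
```

The same lookup happens implicitly for ordinary references between your own classes, which is why a missing .class file typically shows up as an error only when the code that needs it runs.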

In C++, you get machine native instructions in the object file. Your classes, in essence, are flattened into a struct and a bunch of function calls. Generating the new function names is called mangling. Each compiler has its own way. Here's the Wiki article on how MSVC++ does it. For example, from the wiki article, this member function declaration:
void __cdecl abc<def<int>,void*>::xyz(void);
becomes this mangled name:
xyz@?$abc@V?$def@H@@PAX@@
You need to do this because processors and i386 assembly don't really have a concept of objects. So this function name contains all the information needed by the compiler to figure out the original function signature! Java actually does something similar if you've ever coded JNI methods. The "this" pointer, which is a pointer to the struct of member variables (plus other bookkeeping like RTTI and the virtual function table), usually comes either as the first parameter, or more commonly in a machine register.

Anyway, these intermediate object files aren't usually runnable, except in the most basic of cases. Once you have a collection of them you need to link them all together with their external dependencies (shared libraries: DLLs, .so, .dylib, etc.) using the aptly named "linker". It makes sure that every function call you make has a function definition available somewhere. This is called "resolving the symbol". It builds a single executable from the object files that has resolved all internal symbols and has load commands to load the external, shared libraries at runtime.
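The symbol-resolution step can be pictured with a toy model. This is not how a real linker is implemented; it is a sketch in which each pretend object file is just a list of the symbols it defines and the symbols it references, and "linking" means checking that every reference has a definition somewhere. All names here (ToyLinker, ObjectFile, main.o, helper.o, libc) are made up for the illustration.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

class ToyLinker {
    // A pretend object file: symbols it defines and symbols it only references.
    static final class ObjectFile {
        final String name;
        final List<String> defined;
        final List<String> referenced;

        ObjectFile(String name, List<String> defined, List<String> referenced) {
            this.name = name;
            this.defined = defined;
            this.referenced = referenced;
        }
    }

    // Returns every referenced symbol that no input defines, sorted by name.
    static Set<String> unresolved(List<ObjectFile> inputs) {
        Set<String> defined = new HashSet<>();
        for (ObjectFile file : inputs) {
            defined.addAll(file.defined);
        }
        Set<String> missing = new TreeSet<>();
        for (ObjectFile file : inputs) {
            for (String symbol : file.referenced) {
                if (!defined.contains(symbol)) {
                    missing.add(symbol);
                }
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        ObjectFile mainO = new ObjectFile("main.o",
                Arrays.asList("main"), Arrays.asList("helper", "printf"));
        ObjectFile helperO = new ObjectFile("helper.o",
                Arrays.asList("helper"), Arrays.asList("printf"));
        ObjectFile libc = new ObjectFile("libc",
                Arrays.asList("printf"), Collections.<String>emptyList());

        // Without the library, printf has no definition anywhere.
        System.out.println(unresolved(Arrays.asList(mainO, helperO)));        // [printf]
        // Adding the library resolves it.
        System.out.println(unresolved(Arrays.asList(mainO, helperO, libc))); // []
    }
}
```

A non-empty result here corresponds roughly to the "undefined reference" or "unresolved external symbol" errors a real linker reports.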
posted by sbutler at 12:09 PM on January 11, 2009


Here's the specification of .class files in java, if you want to know exactly what's in 'em. Jar files are (as someone else already mentioned) just .zip files with extension changed from .zip to .jar, and usually they have a META-INF folder with some information about what class file to run if they are an executable jar.
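The "jar is just a zip" point can be checked with the standard library alone. The sketch below builds a tiny jar in memory with java.util.jar, then reads it back with the plain zip reader from java.util.zip. The class name JarPeek, the entry Hello.class and its four placeholder bytes are assumptions for the demo; a real jar would contain actual compiler output.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.jar.Attributes;
import java.util.jar.JarEntry;
import java.util.jar.JarOutputStream;
import java.util.jar.Manifest;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

class JarPeek {
    // Builds an in-memory jar whose manifest names mainClass as the entry point.
    static byte[] buildJar(String mainClass) {
        Manifest manifest = new Manifest();
        manifest.getMainAttributes().put(Attributes.Name.MANIFEST_VERSION, "1.0");
        manifest.getMainAttributes().put(Attributes.Name.MAIN_CLASS, mainClass);

        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (JarOutputStream jar = new JarOutputStream(buffer, manifest)) {
            jar.putNextEntry(new JarEntry(mainClass + ".class"));
            // Placeholder bytes only; a real entry would hold the full class file.
            jar.write(new byte[] { (byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE });
            jar.closeEntry();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Lists entry names using the generic zip reader, not the jar-specific one.
    static List<String> entryNames(byte[] archive) {
        List<String> names = new ArrayList<>();
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(archive))) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                names.add(entry.getName());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(entryNames(buildJar("Hello")));
        // prints [META-INF/MANIFEST.MF, Hello.class]
    }
}
```

The Main-Class attribute in META-INF/MANIFEST.MF is what java -jar consults to decide which class to run.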
posted by delmoi at 12:42 PM on January 11, 2009


Java actually does something similar if you've ever coded JNI methods.

Java only "does" anything like the above when it's interacting with C using JNI. JNI doesn't represent the internal reality of the JVM, it's an interface for C.
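For reference, the JNI naming scheme being discussed is simple enough to sketch. When a native method is not overloaded, the C function is looked up under a name built from "Java_", the escaped fully qualified class name, an underscore, and the escaped method name. The helper names below (JniNames, escape, shortName) and the sample class com.example.Foo are made up; overloaded methods additionally get an escaped argument signature appended, which this sketch leaves out.

```java
class JniNames {
    // Escapes one name component following the JNI escape rules.
    static String escape(String name) {
        StringBuilder out = new StringBuilder();
        for (char c : name.toCharArray()) {
            if (c == '.' || c == '/') {
                out.append('_');                 // package separator
            } else if (c == '_') {
                out.append("_1");                // literal underscore
            } else if (c == ';') {
                out.append("_2");                // only occurs in signatures
            } else if (c == '[') {
                out.append("_3");                // only occurs in signatures
            } else if (c < 128 && Character.isLetterOrDigit(c)) {
                out.append(c);                   // plain ASCII letters and digits
            } else {
                out.append(String.format("_0%04x", (int) c)); // everything else
            }
        }
        return out.toString();
    }

    // Short (non-overloaded) native function name for className.methodName.
    static String shortName(String className, String methodName) {
        return "Java_" + escape(className) + "_" + escape(methodName);
    }

    public static void main(String[] args) {
        System.out.println(shortName("com.example.Foo", "bar_baz"));
        // prints Java_com_example_Foo_bar_1baz
    }
}
```

Unlike C++ mangling, this only matters at the boundary with native code; calls between ordinary Java classes are resolved from names and descriptors stored in the .class files.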
posted by delmoi at 12:43 PM on January 11, 2009


Java only "does" anything like the above when it's interacting with C using JNI. JNI doesn't represent the internal reality of the JVM, it's an interface for C.

Oops. Didn't mean to imply anything about the internals of Java. Sorry for being ambiguous!

Actually, I've never bothered to figure out what the Java bytecode and JVM look like. The way Java works I'm not sure how helpful it would be. But knowing stuff about C/C++ object files and the linker is pretty important to figuring out wtf some errors mean.
posted by sbutler at 12:54 PM on January 11, 2009


This is really not all that complicated, or even interesting. What files a compiler generates depends on what you ask for. As people have mentioned, Java just translates code into .class files, each of which stores everything about a class. C++ will take a set of .o files and link them into a single binary (more detail).

DLLs are more of an advanced topic, and certainly not the "lowest level". If by that you mean "least information and processing left", the executable itself qualifies. But other interpretations could also qualify .s files, which allow you to write assembler and muck around with name mangling.
posted by pwnguin at 3:39 PM on January 11, 2009

