How do I write a .Net compiler
August 18, 2007 10:22 AM

Can anyone recommend a book (or online equivalent) that teaches the theory and practice of writing compilers, specifically those targeting the Microsoft CLR (ie .Net)?

I'm trying to build a compiler for an evil custom language used at my workplace (hideous details available on request.) Lexing and parsing isn't a problem, but I don't really know where to go from there. .Net seems to have a whole suite of libraries to help with this stuff but, again, I don't really know where to start.

Any reading suggestions would be welcome. (I located the source for the compiler, which I suspect is about as simple as it'll get, and which convinced me I should really try and learn the theory rather than just hack my way blindly through it.)
posted by Luddite to Computers & Internet (18 answers total) 3 users marked this as a favorite
Best answer: You probably want "The Dragon Book".
posted by david1230 at 10:37 AM on August 18, 2007


The Appel books have been criticised by some as being too much like a technical manual but I found them succinct and clear. They may help you. Pick the one closest to the language you're using (probably Java in this case).
posted by gadha at 10:46 AM on August 18, 2007

I was going to suggest the dragon book too. Just keep in mind that it's not geared towards .net.
posted by Deathalicious at 11:17 AM on August 18, 2007

do you really need to implement a full-blown compiler? "domain specific languages" are very much in fashion these days, but they're typically embedded rather than standalone.

personally, if i were doing this, i would look for a good functional language implementation built on .net and leverage that (possibly F#), because typically higher order functions make this kind of thing a lot easier (in effect you provide syntax that looks to the user as though they are writing direct commands but which, in practice, return higher order functions that then call the language itself to do the work - for example, combinator libraries).
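to make the higher order function idea concrete, here is a minimal sketch. it is in python rather than a .net language purely for illustration, and every name in it (set_var, add_to, seq, when, the dict used as state) is made up for the example - none of it comes from a real library or from your language.

```python
from typing import Callable, Dict

# A "command" is just a function from state to state.
# All names below are hypothetical, invented for this sketch.
State = Dict[str, int]
Command = Callable[[State], State]


def set_var(name: str, value: int) -> Command:
    """Return a command that stores `value` under `name`."""
    def run(state: State) -> State:
        new_state = dict(state)  # copy so commands never mutate their input
        new_state[name] = value
        return new_state
    return run


def add_to(name: str, amount: int) -> Command:
    """Return a command that adds `amount` to `name` (missing names count as 0)."""
    def run(state: State) -> State:
        new_state = dict(state)
        new_state[name] = new_state.get(name, 0) + amount
        return new_state
    return run


def seq(*commands: Command) -> Command:
    """Combinator: build one command that runs the given commands left to right."""
    def run(state: State) -> State:
        for command in commands:
            state = command(state)
        return state
    return run


def when(predicate: Callable[[State], bool], command: Command) -> Command:
    """Combinator: run `command` only if `predicate` holds for the current state."""
    def run(state: State) -> State:
        return command(state) if predicate(state) else state
    return run


# Reads like a list of direct commands, but each call only builds a function.
program = seq(
    set_var("volume", 3),
    add_to("volume", 4),
    when(lambda s: s["volume"] > 5, set_var("muted", 1)),
)

result = program({})
print(result)  # prints {'volume': 7, 'muted': 1}
```

the user-facing part is the `program = seq(...)` block. nothing actually runs until `program` is called, which is also why stepping through it in a debugger shows the nested functions rather than the "commands".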

however if you're not used to functional programming it's probably going to cost you as much time as writing a compiler anyway. so perhaps that's not so good.

another approach would be similar, but using a language with macro support (so you can define new syntax, which is then transformed into the base language - something like templates). i'm sure there must be a .net scheme or lisp (looks like bigloo will target .net), but perhaps the syntax is not to your liking.

you might find reading through these search results useful.

searching for .net and dsl turned up this toolkit - it seems to be very graphics oriented, but might be useful?
posted by andrew cooke at 11:29 AM on August 18, 2007

another article on the .net dsl toolkit.
posted by andrew cooke at 11:30 AM on August 18, 2007

Response by poster: Andrew: interesting ideas. I've never done functional programming in anger (just the basics.)

The reason I wanted to do the full-blown compiler was in order to be able to generate .Net assemblies and use all the whizziness of the Visual Studio IDE for debugging. The not-very-sophisticated programmers who'll be using the end product would be scared witless by any hints of more depth that accidentally stepping into functions would cause; do you think it would be possible to make a native-like experience building on top of a functional language?

(sorry for any delay in further replies; I must needs go to work for a while...)
posted by Luddite at 11:50 AM on August 18, 2007

Good question. First, it sounds like you know how and why compilers work, but need specific information on implementation. Second, it sounds like there are two actual problems here. The first is compiling your language down to .net assemblies. The second is the visual studio integration.

For the compilation part of it, check out some of the languages the mono project has implemented, since they'll be open source, and can probably give you a nice base to go from. Also, just talking with the mono guys through email or IRC could probably be helpful.

I can't help you with the visual studio integration part, although talking with ironpython people might help with that one, since I'm sure they've dealt with it.
posted by cschneid at 1:06 PM on August 18, 2007

ah. so they're used to running, say, vb code in the ide and then stopping and editing it? (i have seen this done with production code at a bank - it terrified me...).

the higher order function approach probably would be confusing.

but i guess i don't really understand what you are doing. it sounds like you need a "full blown" language. i would say that a dsl is more generally used for specifying something (perhaps "business rules") in a "user friendly" way. so it's not really intended to be run in a debugger. if you really need to implement a "full" language then that is a huge amount of work...

and wouldn't they also be confused by a new language - one that is not what they normally use?

i don't know how the macro approach would appear in a debugger. since a macro rewrites code that could also be confusing.

i know nothing about the toolkit i linked (described in the articles), but that is a completely different approach and - being a ms visual studio related product - would give the kind of integration you want. but it may be too graphical? of all the things i mentioned that seems most likely to be useful.

writing a compiler is a significant amount of work - especially if, say, you've never even written a parser (maybe you have?). an interpreter is easier but, again, i am not sure you'd get the level of integration you want with the ide (watching an interpreter run is not the same as watching the program it is interpreting run!).

unfortunately i don't use .net myself so can't give more detailed help. you might try lambda or perhaps langsmiths (although the former may treat you more as a troll and the latter may be dominated these days by christopher diggins).
posted by andrew cooke at 1:09 PM on August 18, 2007

(also, i put a fair amount of effort into finding an easy way to write a language just a month or two ago. i didn't look at ms products, but at things like llvm, c-- and parrot. i finally decided that targeting parrot would be simplest, but finally gave up in frustration. i mention this just in case you wonder if there's anything that would make things much easier - i don't know of any easy route).
posted by andrew cooke at 1:14 PM on August 18, 2007

I was first going to suggest that you create a compiler to output C# or C++ and then compile that with the Microsoft compilers. But that probably wouldn't work if you want to debug natively in your new language.
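For what it's worth, the "output C#" route can be sketched roughly as below. This is Python for brevity, and the three-instruction input language (SET, ADD, PRINT) is entirely made up for the example, since I don't know what the real one looks like. The function name `to_csharp` and the `Generated` class are likewise just placeholders. The output is ordinary C# source text that would then be handed to the C# compiler.

```python
from typing import List, Set


def to_csharp(source: str) -> str:
    """Translate a made-up line-based language into C# source text.

    Supported (hypothetical) instructions, one per line:
        SET name value    -> declares/assigns an int variable
        ADD name value    -> adds a constant to an existing variable
        PRINT name        -> writes the variable to the console
    Variable names are passed through unchecked in this sketch.
    """
    body: List[str] = []
    declared: Set[str] = set()

    for line_no, line in enumerate(source.splitlines(), start=1):
        parts = line.split()
        if not parts:
            continue  # tolerate blank lines and excess whitespace
        op, args = parts[0].upper(), parts[1:]

        if op == "SET" and len(args) == 2:
            name, value = args
            prefix = "" if name in declared else "int "  # declare on first use
            declared.add(name)
            body.append(f"{prefix}{name} = {int(value)};")
        elif op == "ADD" and len(args) == 2 and args[0] in declared:
            name, value = args
            body.append(f"{name} += {int(value)};")
        elif op == "PRINT" and len(args) == 1 and args[0] in declared:
            body.append(f"Console.WriteLine({args[0]});")
        else:
            raise ValueError(f"line {line_no}: cannot translate {line!r}")

    indented = "\n".join("        " + statement for statement in body)
    return (
        "using System;\n\n"
        "static class Generated\n{\n"
        "    static void Main()\n    {\n"
        f"{indented}\n"
        "    }\n}\n"
    )


print(to_csharp("SET x 3\n\nADD x 4\nPRINT x\n"))
```

The catch is the one already mentioned: a debugger attached to the result steps through the generated C#, not the original source lines.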

So is CIL, formerly MSIL - microsoft's intermediate language what you need to learn about?
posted by DarkForest at 3:58 PM on August 18, 2007

The CIL/CLR/whatever is a stack machine, isn't it, much like the Java VM (and most other bytecode VMs)? I don't think the dragon book has much to say about code generation for stack-based architectures, since they're pretty rare in the real world (as opposed to the world of virtual machines where they're really common).

On the other hand, the advantage of stack machines is they're a lot easier to generate code for, since you don't need to worry as much about register allocation and spilling/loading (and since it's JIT-compiled you don't need to get into stuff like instruction scheduling, either). If you don't need to worry about generating highly optimized code, then the dragon book plus perhaps some papers on code generation for Java or other stack machines ought to give you enough information that you can traverse your parse tree and emit correct code.
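As a rough illustration of "traverse your parse tree and emit correct code" for a stack machine, here is a sketch in Python (not .NET, just to keep it short). The `Num`/`BinOp` tree and the tuple instruction format are my own assumptions. The mnemonics `ldc.i4`, `add`, `sub` and `mul` are borrowed from CIL, but this emits plain tuples, not a real assembly, and the little `run` function exists only to check that the emitted order is right.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass
class Num:
    value: int


@dataclass
class BinOp:
    op: str  # one of "+", "-", "*"
    left: "Expr"
    right: "Expr"


Expr = Union[Num, BinOp]

# Operator -> mnemonic (names borrowed from CIL; the encoding is illustrative).
OPCODES = {"+": "add", "-": "sub", "*": "mul"}


def emit(node: Expr, out: List[Tuple]) -> None:
    """Post-order walk: emit both operands first, then the operator.

    No register allocation is needed; operands simply end up on the stack.
    """
    if isinstance(node, Num):
        out.append(("ldc.i4", node.value))  # push a constant
    else:
        emit(node.left, out)
        emit(node.right, out)
        out.append((OPCODES[node.op],))  # pops two values, pushes one


def run(code: List[Tuple]) -> int:
    """Tiny stack evaluator, used only to sanity-check the emitted code."""
    stack: List[int] = []
    for instr in code:
        if instr[0] == "ldc.i4":
            stack.append(instr[1])
            continue
        right = stack.pop()
        left = stack.pop()
        if instr[0] == "add":
            stack.append(left + right)
        elif instr[0] == "sub":
            stack.append(left - right)
        elif instr[0] == "mul":
            stack.append(left * right)
        else:
            raise ValueError(f"unknown instruction {instr[0]!r}")
    return stack.pop()


# 2 + 3 * 4
tree = BinOp("+", Num(2), BinOp("*", Num(3), Num(4)))
code: List[Tuple] = []
emit(tree, code)

for instr in code:
    print(*instr)
print(run(code))  # prints 14
```

The whole code generator is the ten-line `emit` function, which is the point: for unoptimized output, a post-order tree walk is most of the job.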

If it weren't for the debugging requirement, I would definitely suggest compiling to a HLL target (C# or the like). It might actually be easier to do that and create your own lightweight debugger for your (apparently non-geek) end-users, rather than trying to get the Visual Studio debugger to do the right thing for them.
posted by hattifattener at 5:19 PM on August 18, 2007

I don't think the dragon book has much to say about code generation for stack-based architectures

a traditional compiler book should still make sense - you just have to generate the correct intermediate representation (cil/msil) and then you stop (you don't need the chapters on code generation etc since .net will do that for you). but a book specifically about .net would be better, obviously.
posted by andrew cooke at 6:00 PM on August 18, 2007

posted by andrew cooke at 6:01 PM on August 18, 2007

Best answer: no, this
posted by andrew cooke at 6:02 PM on August 18, 2007

sorry for posting so much, but one last comment. reading some of the reviews it sounds like you might be best with the book linked just above (Compiling for the .NET Common Language Runtime) and a more traditional book for the parsing part.
posted by andrew cooke at 6:05 PM on August 18, 2007

Response by poster: Thanks for responses. To clarify... what I've got at the moment is a hideous dsl used for writing user interfaces for industrial control (actually, control of broadcasting systems.) It's a painfully retarded pseudo-assembly language that's stored as resource strings in the (Win16!) interpreter executable, and is currently edited with the Borland C++ resource editor (v4, c.1993) and currently has no debugging facilities. I've written code to read the resource strings and dialog templates from an executable, and I'd like to compile the whole shebang into a new assembly that will then actually be debuggable, rather than the current approach of spattering a broken cprintf equivalent all over the place (and, you know, run in 32 bit protected mode, and not crash on excess whitespace.)

The 'language' itself is fairly straightforward to parse, so, if I fork out the £50 for the Dragon book on compilers then I suspect what I'm really after in addition to that is a resource that sets out the gooey innards of the .Net VM, assemblies and how to use the various libraries to spit these things out.

(Incidentally Java is out because our vile environment relies on some really kinky Win16/Win32s interop in order to work.)
posted by Luddite at 6:18 PM on August 18, 2007

Response by poster: ...and I see that while I've been concocting a reply you've actually summoned such a book from the bowels of the Amazon database. Why couldn't I do that? (*fume*). I shall looksee, thankyou. Although if anyone has any personal recommendations that would still be highly welcome, obviously.
posted by Luddite at 6:24 PM on August 18, 2007

I would take a look at Soot, the Java optimization framework - if only because it has a lot of optimizations built in. Register spilling and filling, etc, and the end result is stack-based, as is your .Net.

I, too, am a bit unclear on what you are trying to do. Or, more specifically, I don't understand what part you don't understand. If it is just the process of designing the compiler that seems confusing, try searching google for graduate-level compiler courses. I'm sure you could find some useful lecture slides that will point you in the right direction.

And I'll second Appel. I never read it for the enjoyment of it, but it was useful reference material.
posted by mbatch at 11:44 AM on August 20, 2007
