CLR metadata myth in a managed module?

This post answers these questions:
  • What is CLR metadata in a managed module?
  • How does a .NET assembly work?
  • Life cycle of a managed module
  • Win32 executable vs .NET executable
MythBusters certainly is contagious.

While browsing to understand metadata in .NET terms, I wanted to get to the bottom of the .NET metadata myth; it is great to grasp what metadata is and how a managed module is built, loaded, and run. I couldn't find much through Bing/Google, except for a few blogs on how the CLR loads objects, and those were a bit challenging to understand.

Earlier compilers and assemblers produced code targeted at a specific CPU architecture, such as x86 (our IBM-compatible PC), Alpha, or PowerPC. CLR (Common Language Runtime) compliant compilers produce IL (Intermediate Language) code instead. Our good old VC++ compiler builds unmanaged modules (exe/dll), whereas the .NET languages produce managed modules that require the CLR to execute.

I used to wonder what exactly a managed module is, and what IL is. After going through several articles, I thought of sharing my understanding with the other side of the planet, so that someone may prove their existence by leaving comments and suggestions in the form of corrections or discussion.

I knew that IL is, as the name says, an "intermediate" language, and that the CLR is what compiles IL into native code. Still, I was amazed to see that a managed executable or dll is not really an exe or dll in the usual sense. I just didn't know what exactly IL is in terms of .NET technology.

IL is actually referred to as "managed code", because the CLR manages its lifetime and execution.

A managed module, which really is "not an exe" in the native sense because its code is intermediate language, has the following six sections defined in the file.

1. PE (Portable Executable) header - this tells the OS that it's an exe.
2. .text section - contains the code.
3. .idata section - contains the list of referenced/imported files.
4. CLR header
5. IL
6. Metadata

These sections are also present in a Win32 exe, except for the CLR header, IL, and metadata; which makes sense, because a native module (now called an unmanaged module) does not require the .NET CLR. A typical PE layout of an unmanaged module (imagine a Win32 exe) has been laid out here.
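To make the Win32-vs-.NET difference concrete, here is a rough Python sketch that looks for the CLR header entry in a PE file. It relies on the documented PE layout, where data directory index 14 is the CLR runtime header; the function names clr_header_directory and is_managed_module are my own and are only for illustration. Treat it as a sketch under those assumptions, not a full PE parser.

```python
import struct

CLR_DIRECTORY_INDEX = 14  # data directory slot reserved for the CLR runtime header


def clr_header_directory(image):
    """Return (rva, size) of the CLR header directory, or None if not a PE image."""
    try:
        if len(image) < 0x40 or image[:2] != b"MZ":
            return None
        # The DOS header stores the offset of the PE signature at 0x3C.
        (pe_offset,) = struct.unpack_from("<I", image, 0x3C)
        if image[pe_offset:pe_offset + 4] != b"PE\0\0":
            return None
        optional = pe_offset + 4 + 20  # skip signature and 20-byte COFF header
        (magic,) = struct.unpack_from("<H", image, optional)
        if magic == 0x10B:      # PE32
            count_offset = 92
        elif magic == 0x20B:    # PE32+
            count_offset = 108
        else:
            return None
        (count,) = struct.unpack_from("<I", image, optional + count_offset)
        if count <= CLR_DIRECTORY_INDEX:
            return (0, 0)       # the file has no slot for a CLR header at all
        entry = optional + count_offset + 4 + 8 * CLR_DIRECTORY_INDEX
        return struct.unpack_from("<II", image, entry)
    except struct.error:
        return None             # truncated or malformed file


def is_managed_module(image):
    """True when the PE image carries a non-empty CLR header directory."""
    directory = clr_header_directory(image)
    return directory is not None and directory[0] != 0 and directory[1] != 0
```

A possible usage would be is_managed_module(open(path, "rb").read()), where path is whatever exe or dll you want to inspect; a native Win32 module should come back False and a managed one True.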

Metadata is part of a managed module, so what exactly is metadata? In layman's terms, I would say metadata is data that describes other data. In IL terms, metadata is simply a set of data tables that describe what is defined in a module: for instance, its types and their members, plus the referenced types and members.
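To picture the "set of data tables" idea, here is a toy Python model. It is not the real binary metadata format; the class names (TypeDef, TypeRef, FieldDef, ModuleMetadata) loosely echo the kinds of tables a managed module carries, and the Demo.Customer type is made up purely for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FieldDef:
    name: str
    type_name: str


@dataclass
class TypeDef:
    """One row describing a type defined inside this module."""
    full_name: str
    fields: List[FieldDef] = field(default_factory=list)
    methods: List[str] = field(default_factory=list)


@dataclass
class TypeRef:
    """One row describing a type that lives in another module."""
    full_name: str
    defined_in: str


@dataclass
class ModuleMetadata:
    type_defs: List[TypeDef] = field(default_factory=list)
    type_refs: List[TypeRef] = field(default_factory=list)

    def referenced_modules(self):
        """Modules this one depends on, derived purely from the tables."""
        return sorted({ref.defined_in for ref in self.type_refs})


# Hypothetical module: defines one type and references two library types.
metadata = ModuleMetadata(
    type_defs=[TypeDef("Demo.Customer",
                       fields=[FieldDef("name", "System.String")],
                       methods=["GetName"])],
    type_refs=[TypeRef("System.String", "mscorlib"),
               TypeRef("System.Object", "mscorlib")],
)
```

The point of the toy is that everything a consumer needs to know about the module (what it defines, what it depends on) can be answered by reading tables, without running any of its code.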

To put it in terms of older technology, a type library (.tlb) file can be called a metadata file, and so can a COM interface definition (*.idl) file. The tlb and idl files were separate files defining an object's metadata, whereas in a managed module these interface definitions are embedded in the module itself (exe/dll/etc.).

When a managed module is compiled, mscoree.dll is referenced in the file's import (.idata) section. MSCorEE.dll (Microsoft Component Object Runtime Execution Engine) plays a very important role in executing a managed module.

When a managed module is invoked, the OS (Windows) "thinks" that it's just another "normal" (Win32) executable and tells the Windows loader to load the file. The loader reads the .idata section, finds the reference to mscoree.dll, and loads it; MSCorEE.dll then takes over and loads the managed module into the CLR. Eventually the managed module's IL is JIT-compiled (just-in-time compilation) into native code; that is, the CLR started through MSCorEE.dll compiles IL to native CPU instructions.
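The just-in-time part can be pictured with a small Python toy. It assumes the usual CLR behavior of compiling a method the first time it is called and reusing the result afterwards. The JitStub class, the AddThenDouble method, and the tiny instruction set are all hypothetical; the opcode names are borrowed loosely from IL, and the "native code" here is just a Python closure, so this only illustrates the compile-once idea.

```python
class JitStub:
    """Stands in for a method until its first call, then caches the compiled form."""

    def __init__(self, name, il):
        self.name = name
        self.il = list(il)
        self.native = None       # filled in on first call
        self.compile_count = 0   # how many times "compilation" ran

    def _compile(self):
        il = self.il

        def native(*args):
            stack = []
            for op, operand in il:
                if op == "ldarg":        # push an argument
                    stack.append(args[operand])
                elif op == "ldc":        # push a constant
                    stack.append(operand)
                elif op == "add":
                    right, left = stack.pop(), stack.pop()
                    stack.append(left + right)
                elif op == "mul":
                    right, left = stack.pop(), stack.pop()
                    stack.append(left * right)
                elif op == "ret":
                    return stack.pop()
                else:
                    raise ValueError("unknown toy opcode: " + op)

        self.compile_count += 1
        return native

    def __call__(self, *args):
        if self.native is None:          # first call: compile, then cache
            self.native = self._compile()
        return self.native(*args)


# Toy method: (a + b) * 2
add_then_double = JitStub("AddThenDouble", [
    ("ldarg", 0), ("ldarg", 1), ("add", None),
    ("ldc", 2), ("mul", None), ("ret", None),
])
```

Calling add_then_double twice triggers the "compile" step only once; every later call goes straight to the cached callable.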


The metadata section also allows the CLR garbage collector to track the lifetime of objects. All managed modules run under the CLR, and the CLR owns the garbage collector; for any object, the metadata tells the collector the type of that object and which fields within that object refer to other objects.
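Here is a toy Python sketch of why that field information matters. It assumes a simple reachability walk from a set of roots, which is a simplification of what a real collector does. The REFERENCE_FIELDS table plays the role of metadata, and the Order and Customer types are invented for the example.

```python
# Toy "metadata": for each type, the fields that hold references to other objects.
REFERENCE_FIELDS = {
    "Order": ["customer", "next_order"],
    "Customer": [],
}


class ManagedObject:
    def __init__(self, type_name, **fields):
        self.type_name = type_name
        self.fields = fields


def reachable(roots, reference_fields=REFERENCE_FIELDS):
    """Return the set of objects reachable from roots, following only reference fields."""
    seen = set()
    pending = list(roots)
    while pending:
        obj = pending.pop()
        if obj is None or obj in seen:
            continue
        seen.add(obj)
        # The table tells us which fields to follow; plain values are skipped.
        for name in reference_fields[obj.type_name]:
            pending.append(obj.fields.get(name))
    return seen


alice = ManagedObject("Customer", name="Alice")
bob = ManagedObject("Customer", name="Bob")   # nothing refers to bob
order = ManagedObject("Order", customer=alice, next_order=None, total=10)

live = reachable([order])
```

In this toy run, order and alice end up in live while bob does not, so bob is the one a collector would be free to reclaim. Without the per-type table, the walk would have no way to tell that customer is a reference and total is just a number.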