March 12, 2009 2:00 PM by Daniel Chambers
There are times when I am surprised because I come across some basic principle or feature in a programming language that I just didn't know about but really should have (see the "Generics and Type-Safety" blog for an example). The most recent example of this was in my Enterprise .NET lecture where they asked us to define what boxing and unboxing was. I'd heard of it in relation to Java, because Java has non-object value types that need to be converted to objects sometimes (the process of "boxing") so they can be used with Java's crappy generics system. But since, in C#, even an int is an object with methods, I assumed that boxing and unboxing was not done in C#.
I was wrong. C# indeed does boxing and unboxing! At first, this didn't make sense. My incomplete understanding of boxing (in relation to Java) was that value types were stored only on the stack (yes, this is a little inaccurate) and when you needed to put them on the heap, you had to box them. In C#, I thought everything was an object, so this process would have been redundant.
Wrong. .NET (and therefore C#) has value types, which are boxed and unboxed transparently by the CLR. Value types in C# derive from the ValueType class which itself derives from Object. Structs in C# are automatically derived from ValueType for you (therefore you cannot do inheritance with structs). Unlike in Java, value types are still objects: they can have methods, fields, properties, events, etc.
Why are value types good? When .NET deals with a value type, it stores the object's data inline in memory. This means when the variable is on the stack, the data is stored directly in stack-space. When the variable is inside a heap object, the data is stored directly inside the heap object. This is different to reference types, where instead of the data being stored inline, a pointer to the data which is somewhere on the heap is stored inline. This means it takes longer to access a reference type than a value type as you have to read the pointer, then read the location the pointer points to.
Boxing kills this performance increase you get when you use value types. When you box (or more accurately, the CLR boxes) a value type, it essentially wraps it in a reference type object that is stored on the heap and then uses a reference to point to it. Your value type is now a reference type. So not only do you need to look up a reference to get to the final data, you have to spend time creating the wrapper object at runtime.
When does boxing happen? The main place to watch out for is when you pass a value type around as Object. A common place this might happen is if you use ArrayList. If you do, it's time to move on. :) .NET 2.0 introduced generics and you should use them. Generics play nice with value types, so try using a List<T> instead.
So what do I mean when I say "generics play nice with value types"? Unlike Java, whose generics system sucks (it does type erasure, which is half-arsed generics), .NET understands generics at runtime. This means when you define, for example, a List<int>, .NET realises that int is a value type and then will allocate ints inline inside the List as per the "inline storage" explanation above. This is lots better than Java or ArrayList's behaviour, where each element in the array is a pointer to a location on the heap and because the value type that had been added has been boxed.
In hindsight, especially when I think about it all from a C++ perspective, I should have known C# did value type boxing. How could it have value types and not? But I guess I just didn't join the dots.