DigitallyCreated
Blog

Only showing posts tagged with "Type-Safety"

Value Type Boxing in C#

There are times when I am surprised because I come across some basic principle or feature in a programming language that I just didn't know about but really should have (see the "Generics and Type-Safety" blog for an example). The most recent example of this was in my Enterprise .NET lecture where they asked us to define what boxing and unboxing was. I'd heard of it in relation to Java, because Java has non-object value types that need to be converted to objects sometimes (the process of "boxing") so they can be used with Java's crappy generics system. But since, in C#, even an int is an object with methods, I assumed that boxing and unboxing was not done in C#.

I was wrong. C# indeed does boxing and unboxing! At first, this didn't make sense. My incomplete understanding of boxing (in relation to Java) was that value types were stored only on the stack (yes, this is a little inaccurate) and when you needed to put them on the heap, you had to box them. In C#, I thought everything was an object, so this process would have been redundant.

Wrong. .NET (and therefore C#) has value types, which are boxed and unboxed transparently by the CLR. Value types in C# derive from the ValueType class which itself derives from Object. Structs in C# are automatically derived from ValueType for you (therefore you cannot do inheritance with structs). Unlike in Java, value types are still objects: they can have methods, fields, properties, events, etc.

Why are value types good? When .NET deals with a value type, it stores the object's data inline in memory. This means when the variable is on the stack, the data is stored directly in stack-space. When the variable is inside a heap object, the data is stored directly inside the heap object. This is different to reference types, where instead of the data being stored inline, a pointer to the data which is somewhere on the heap is stored inline. This means it takes longer to access a reference type than a value type as you have to read the pointer, then read the location the pointer points to.

Boxing kills this performance increase you get when you use value types. When you box (or more accurately, the CLR boxes) a value type, it essentially wraps it in a reference type object that is stored on the heap and then uses a reference to point to it. Your value type is now a reference type. So not only do you need to look up a reference to get to the final data, you have to spend time creating the wrapper object at runtime.

When does boxing happen? The main place to watch out for is when you pass a value type around as Object. A common place this might happen is if you use ArrayList. If you do, it's time to move on. :) .NET 2.0 introduced generics and you should use them. Generics play nice with value types, so try using a List<T> instead.

So what do I mean when I say "generics play nice with value types"? Unlike Java, whose generics system sucks (it does type erasure, which is half-arsed generics), .NET understands generics at runtime. This means when you define, for example, a List<int>, .NET realises that int is a value type and then will allocate ints inline inside the List as per the "inline storage" explanation above. This is lots better than Java or ArrayList's behaviour, where each element in the array is a pointer to a location on the heap and because the value type that had been added has been boxed.

In hindsight, especially when I think about it all from a C++ perspective, I should have known C# did value type boxing. How could it have value types and not? But I guess I just didn't join the dots.

Generics and Type-Safety

I ran into a little issue at work to do with generics, inheritance and type-safety. Normally, I am an absolute supporter of type-safety in programming languages; I have always found that type safety catches bugs at compile-time rather than at run-time, which is always a good thing. However, in this particular instance (which I have never encountered before), type-safety plain got in my way.

Imagine you have a class Fruit, from which you derive a number of child classes such as Apple and Banana. There are methods that return Collection<Apple> and Collection<Banana>. Here's a code example (Java), can you see anything wrong with it (excluding the horrible wantApples/wantBananas code... I'm trying to keep the example small!)?

Collection<Fruit> fruit;

if (wantApples)
    fruit = getCollectionOfApple();
else if (wantBananas)
    fruit = getCollectionOfBanana();
else
    throw new BabyOutOfPramException();

for (Fruit aFruit : fruit)
    aFruit.eat();

Look OK? It's not. The error is that you cannot hold a Collection<Apple> or Collection<Banana> as a Collection<Fruit>! Why not, eh? Both Bananas and Apples are subclasses of Fruit, so why isn't a Collection of Banana a Collection of Fruit? All Bananas in the Collection are Fruit!

At first I blamed this on Java's crappy implementation of generics. In Java, generics are a compiler feature and not natively supported in the JVM. This is the concept of "type-erasure", where all generic type information is erased during a compile. All your Collections of type T are actually just Collections of no type. The most frustrating place where this bites you is when you want to do this:

interface MyInterface
{
    private void myMethod(Collection<String> strings);
    private void myMethod(Collection<Integer> numbers);
}

Java will not allow that, as the two methods are indistinguishable after a compile, thanks to type-erasure. Those methods actually are:

interface MyInterface
{
    private void myMethod(Collection strings);
    private void myMethod(Collection numbers);
}

and you get a redefinition error. Of course, .NET since v2.0 has treated generics as a first-class construct inside the CLR. So the equivalent to the above example in C# would work fine since a Collection<String> is not the same as a Collection<Integer>.

Anyway, enough ranting about Java. I insisted to my co-worker that I was sure C# with its non-crappy generics would have allowed us to assign a Collection<Apple> to a Collection<Fruit>. However, I was totally wrong. A quick Google search told me that you absolutely cannot allow a Collection<Apple> to be assigned to a Collection<Fruit> or it will break programming. This is why:

Collection<Fruit> fruit;
Collection<Apple> apples = new ArrayList<Apple>();
fruit = apples; //Assume this line works OK
fruit.add(new Banana());

for (Apple apple : apples)
   apple.eat();

Can you see the problem? By storing a Collection<Apple> as a Collection<Fruit> we suddenly make it OK to add any type of Fruit to the Collection, such as the Banana on line 4. Then, when we foreach through the apples Collection (which now contains a Banana, thanks to line 4) we would get a ClassCastException because, holy crap, a Banana is not an Apple! We just broke programming.

So how can we make this work? In Java, we can use wildcards:

Collection<? extends Fruit> fruit;

if (wantApples)
    fruit = getCollectionOfApple();
else if (wantBananas)
    fruit = getCollectionOfBanana();
else
    throw new BabyOutOfPramException();

for (Fruit aFruit : fruit)
    aFruit.eat();

Disappointingly, C# does not support the concept of wildcards. The best I could do was this:

private void MyMethod()
{
    if (_WantApples)
        EatFruit(GetEnumerableOfApples());
    else if (_WantBananas)
        EatFruit(GetEnumerableOfBananas());
    else
        throw new BabyOutOfPramException();
}

private void EatFruit<T>(IEnumerable<T> fruit) where T : Fruit
{
    foreach (T aFruit in fruit)
        aFruit.eat();
}

Basically, we're declaring a generic method that takes any type of Fruit, and then the compiler is inferring the type to be used for the EatFruit method by looking at the return type of the two getter methods. This code is not as nice as the Java code.

You must be wondering, however, what if we added this line to the bottom of the above Java code:

fruit.add(new Banana());

What would happen is that Java would issue an error. This is because the generic type "? extends Fruit" actually means "unknown type that extends Fruit". The "unknown type" part is crucial. Because the type is unknown, the add method for Collection<? extends Fruit> actually looks like this:

public boolean add(null o);

Yes! The only thing that the add method can take is null, because a reference to anything can always be set to null. So even though we don't know the type, we know that no matter what type it is, it can always be null. Therefore, trying to pass a Banana into add() would fail.

The foreach loop works okay because the iterator that works inside the foreach loop must always return something that is at least a Fruit thanks to the "extends Fruit" part of the type definition. So it's okay to set a Fruit reference using a method that returns "? extends Fruit" because the thing that is returned must be at least a Fruit.

Although obviously wrong now, the assignment of Collection<Apple> to Collection<Fruit> seemed to make sense when I first encountered it. This has enlightened me to the fact that there are nooks and crannies in both C# and Java that I have yet to explore.