Thursday, May 19, 2011

Some LINQ to Objects Internals – How does the C# compiler mix Expression Trees with the LINQ operator extension methods?

I’m building a lot of code on top of LINQ that uses Queryables and Expression Trees, and I have to admit it has given me a hard time. This post shares some of what I have learned by looking at the LINQ source code, and I hope some of you will share your own understanding of the topic here.

For the sake of simplicity, let’s stick with LINQ to Objects. I will refer to the extension methods and LINQ providers that deal with IEnumerable as Enumerables, and to the ones that deal with IQueryable as Queryables. If you look at their source code you will see that the code that actually runs behind the scenes is quite simple, and most of the magic is triggered by the method signatures.

Queryables receive Expression<Delegate> arguments, while Enumerables receive plain delegates. The documentation explains why: an Enumerable is designed to run locally, in process, while a Queryable is designed so that the query can be inspected, translated or serialized and then executed somewhere else.
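
To make the difference between the two signatures concrete, here is a minimal sketch that assigns the same lambda text to a delegate and to an expression tree. The class and member names (DelegateVersusExpression, AsDelegate, AsTree, Describe) are made up for illustration and are not part of LINQ.

```csharp
using System;
using System.Linq.Expressions;

public static class DelegateVersusExpression
{
    // Same lambda text, two different results at compile time:
    // executable code here...
    public static readonly Func<int, bool> AsDelegate = r => r > 10;

    // ...and a data structure describing the code here.
    public static readonly Expression<Func<int, bool>> AsTree = r => r > 10;

    // The tree can be inspected node by node, which a delegate does not allow.
    public static string Describe()
    {
        var body = (BinaryExpression)AsTree.Body;
        return body.NodeType + " " + body.Left + " " + body.Right;
    }
}
```

The delegate can only be invoked, for example AsDelegate(11). The tree exposes its structure, so a provider can walk it and decide what to do with each node.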

The first example here will start with a very simple query:

var root = new int[] { }.AsQueryable();
var query = from r in root
    where r > 10
    orderby r descending
    select r;

In the background the compiler emits something equivalent to the following snippet (I have changed the code a bit to make it easier to read):

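// Requires: using System; using System.Linq; using System.Linq.Expressions;

// Parameter of the where lambda: r => r > 10
var whereParameter = Expression.Parameter(typeof(int), "r");
// Parameter of the orderby lambda: r => r (each lambda gets its own parameter object)
var orderParameter = Expression.Parameter(typeof(int), "r");

var q = new int[0].AsQueryable<int>()
    .Where<int>(
        Expression.Lambda<Func<int, bool>>(
            Expression.GreaterThan(whereParameter, Expression.Constant(10, typeof(int))),
            new ParameterExpression[] { whereParameter }))
    .OrderByDescending<int, int>(
        Expression.Lambda<Func<int, int>>(
            orderParameter,
            new ParameterExpression[] { orderParameter }));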
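
One way to check this translation without a decompiler is to walk the Expression property of the query and list the calls it contains. The sketch below does that for the query above; QueryShape and OperatorChain are hypothetical names used only for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

public static class QueryShape
{
    // Returns the chained operator calls, outermost first, followed by
    // the node type found at the root of the query.
    public static string[] OperatorChain()
    {
        var root = new int[] { }.AsQueryable();
        var query = from r in root
                    where r > 10
                    orderby r descending
                    select r;

        var names = new List<string>();
        Expression current = query.Expression;
        while (current is MethodCallExpression)
        {
            var call = (MethodCallExpression)current;
            names.Add(call.Method.DeclaringType.Name + "." + call.Method.Name);
            // For extension methods the first argument is the source sequence.
            current = call.Arguments[0];
        }
        names.Add(current.NodeType.ToString());
        return names.ToArray();
    }
}
```

For this query the chain should be Queryable.OrderByDescending, then Queryable.Where, then a Constant node at the root. There is no Select call because the trailing "select r" only returns the range variable.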

When the AsQueryable extension method is called it receives an empty array (zero elements). The where clause in C# translates into a call to the Where extension method (in this case the one defined on Queryable). Because that extension method receives an Expression<Delegate> and not the delegate itself, the compiler translates “r > 10” into the equivalent expression tree. That expression tree can then be serialized and sent to be executed elsewhere, or it can be turned into an executable delegate by calling Compile on the lambda expression.
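
As a small illustration of that last step, the sketch below builds the “r > 10” tree by hand and compiles it into a delegate. CompileDemo and Run are hypothetical names chosen for this example.

```csharp
using System;
using System.Linq.Expressions;

public static class CompileDemo
{
    // Builds r => r > 10 as an expression tree, compiles it,
    // and evaluates the resulting delegate for 5 and 11.
    public static bool[] Run()
    {
        var r = Expression.Parameter(typeof(int), "r");
        var tree = Expression.Lambda<Func<int, bool>>(
            Expression.GreaterThan(r, Expression.Constant(10, typeof(int))),
            r);

        // Compile turns the tree into ordinary executable code.
        Func<int, bool> predicate = tree.Compile();
        return new[] { predicate(5), predicate(11) };
    }
}
```

Run should return false for 5 and true for 11, the same result as invoking the lambda r => r > 10 directly.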

I was trying to understand why taking the Expression from an IQueryable and calling the provider’s CreateQuery on another queryable over the same data type did not work. Looking at the code generated by the compiler, the answer seems obvious. Although part of the Expression is built from expression trees, the most important part (the Where) is still a call to the generic Where on top of an array that contains zero elements, so the original data source is baked into the expression. The only way to change this is to create a visitor that emits an equivalent call chain over another data source. If I were using a LINQ to Blah Blah Blah provider that serialized the query, it might work, because it would depend on how the provider interpreted the root of the query, that is, the int[0]; maybe a customers[] would be interpreted as the customers table :).
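
Here is a minimal sketch of such a visitor for the LINQ to Objects case. It assumes the root of the query is a constant node holding an IQueryable<int>, which is what AsQueryable produces for an array. RootSwapDemo, RootReplacer and Run are hypothetical names, and a real visitor would probably need to be more selective about which constants it replaces.

```csharp
using System;
using System.Linq;
using System.Linq.Expressions;

public static class RootSwapDemo
{
    // Replaces every constant that holds an IQueryable<int> with a new root.
    private sealed class RootReplacer : ExpressionVisitor
    {
        private readonly IQueryable<int> _newRoot;

        public RootReplacer(IQueryable<int> newRoot)
        {
            _newRoot = newRoot;
        }

        protected override Expression VisitConstant(ConstantExpression node)
        {
            return node.Value is IQueryable<int>
                ? Expression.Constant(_newRoot)
                : base.VisitConstant(node);
        }
    }

    // Returns two result sets: the expression reused as is, and the
    // expression reused after swapping its root for the other source.
    public static int[][] Run()
    {
        var root = new int[] { }.AsQueryable();
        var query = from r in root
                    where r > 10
                    orderby r descending
                    select r;

        var other = new[] { 5, 20, 15 }.AsQueryable();

        // Still points at the empty array embedded in the expression.
        var asIs = other.Provider.CreateQuery<int>(query.Expression).ToArray();

        // Same call chain, but rooted on the other data source.
        var rewritten = new RootReplacer(other).Visit(query.Expression);
        var swapped = other.Provider.CreateQuery<int>(rewritten).ToArray();

        return new[] { asIs, swapped };
    }
}
```

The first result should be empty, because the reused expression still enumerates the original zero-element array. The second should contain 20 and 15, in that order.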

Have fun,
