Thursday, May 19, 2011

Some LINQ to Objects Internals – How does the C# compiler mix Expression Trees with the LINQ operator extension methods?

I’m building a lot of code on top of LINQ that uses Queryables and Expression Trees, and I have to admit it gave me a hard time. This post shares some of what I have learned by looking at the LINQ source code, and I hope that some of you will also share your understanding of this topic here.

For the sake of simplicity let’s stick with LINQ to Objects and group all extension methods and LINQ providers that deal with IEnumerable as Enumerables, and the ones that deal with IQueryable as Queryables. If you look at their source code you will see that the code that runs behind the scenes is quite simple and that most of the magic is triggered by the method signatures.

Queryable methods receive Expression<TDelegate> arguments, while Enumerable methods receive plain delegates. The documentation explains why: an Enumerable is designed to work locally, while a Queryable is designed so that the query can be serialized and executed somewhere else.
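A minimal sketch of that difference, assuming nothing beyond the base class library. The class name DelegateVsTree is made up for illustration. The same lambda is assigned once to a delegate type and once to an expression type; only the second one can be inspected as data.

```csharp
using System;
using System.Linq.Expressions;

public static class DelegateVsTree
{
    // As a delegate the lambda is compiled code: it can only be invoked.
    public static readonly Func<int, bool> AsDelegate = r => r > 10;

    // As an Expression<T> the compiler emits a tree that describes the lambda.
    public static readonly Expression<Func<int, bool>> AsTree = r => r > 10;

    public static void Main()
    {
        Console.WriteLine(AsDelegate(11));             // True
        Console.WriteLine(AsTree.Body.NodeType);       // GreaterThan
        Console.WriteLine(AsTree.Parameters[0].Name);  // r

        // The tree can be turned back into runnable code when needed.
        Func<int, bool> compiled = AsTree.Compile();
        Console.WriteLine(compiled(5));                // False
    }
}
```

The overload the compiler picks (Enumerable.Where or Queryable.Where) decides which of the two forms it emits for the lambda.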

The first example here will start with a very simple query:

var root = new int[] { }.AsQueryable();
var query = from r in root
    where r > 10
    orderby r descending
    select r;

In the background the compiler emits something equivalent to the following snippet (I have changed the code a bit to make it easier to read):

var parameter = Expression.Parameter(typeof(int), "item");
var q = new int[0].AsQueryable<int>()
    .Where<int>(
        Expression.Lambda<Func<int, bool>>(
            Expression.GreaterThan(parameter, Expression.Constant(10, typeof(int))),
            new ParameterExpression[] { parameter }))
    .OrderByDescending<int, int>(
        Expression.Lambda<Func<int, int>>(
            parameter = Expression.Parameter(typeof(int), "r"),
            new ParameterExpression[] { parameter }));

When the AsQueryable extension method is called it receives an array with 0 elements. The where clause in C# translates into a call to the Where extension method (in this case the one that applies to Queryable). Because that extension method receives an Expression<TDelegate> and not the delegate itself, the compiler translates “r > 10” into the equivalent expression tree. That expression tree can then be serialized and sent to be executed elsewhere, or it can be translated to code by calling Compile on the lambda expression.
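The shape of that call chain can be checked at runtime. The following is a small sketch (the class name InspectQuery is invented for the example) that walks query.Expression from the outermost call down to the constant that holds the root queryable.

```csharp
using System;
using System.Linq;
using System.Linq.Expressions;

public static class InspectQuery
{
    public static string Describe()
    {
        var root = new int[0].AsQueryable();
        var query = from r in root
                    where r > 10
                    orderby r descending
                    select r;

        // Outermost node: the call to Queryable.OrderByDescending.
        var orderBy = (MethodCallExpression)query.Expression;
        // Its first argument: the call to Queryable.Where.
        var where = (MethodCallExpression)orderBy.Arguments[0];
        // The first argument of Where: a constant wrapping the root queryable.
        var source = (ConstantExpression)where.Arguments[0];

        return orderBy.Method.Name + " <- " + where.Method.Name
            + " <- root:" + ReferenceEquals(source.Value, root);
    }

    public static void Main()
    {
        Console.WriteLine(Describe()); // OrderByDescending <- Where <- root:True
    }
}
```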

I was trying to understand why taking an Expression from an IQueryable and calling Provider.CreateQuery on another queryable over the same data type did not work. Looking at the code generated by the compiler, the answer seems obvious. Although part of the Expression is built from expression trees, the most important part (the Where) is still a call to the generic Where on top of an array that contains zero elements. The only way to change this is to create a visitor that emits an equivalent call chain on another data source. If I were using a LINQ to Blah Blah Blah provider that serialized the query it might work, because it would depend on how the provider interpreted the root of the query, that is int[0]; maybe a customers[] would be interpreted as the table of customers :).
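One way such a visitor could look, as a sketch rather than the definitive approach: it assumes .NET 4, where System.Linq.Expressions.ExpressionVisitor is public, and it only handles the simple case where the root appears as a constant in the tree. The names RootReplacer and RebaseDemo are hypothetical.

```csharp
using System;
using System.Linq;
using System.Linq.Expressions;

// Hypothetical visitor: swaps the constant that holds the original root
// queryable for a constant that holds another queryable.
public sealed class RootReplacer : ExpressionVisitor
{
    private readonly IQueryable oldRoot;
    private readonly IQueryable newRoot;

    public RootReplacer(IQueryable oldRoot, IQueryable newRoot)
    {
        this.oldRoot = oldRoot;
        this.newRoot = newRoot;
    }

    protected override Expression VisitConstant(ConstantExpression node)
    {
        // Only the root constant is replaced; constants such as 10 are kept.
        return ReferenceEquals(node.Value, this.oldRoot)
            ? Expression.Constant(this.newRoot)
            : base.VisitConstant(node);
    }
}

public static class RebaseDemo
{
    public static int[] Run()
    {
        var root = new int[0].AsQueryable();
        var query = from r in root
                    where r > 10
                    orderby r descending
                    select r;

        // Re-target the same call chain at a different data source.
        var other = new[] { 5, 20, 15 }.AsQueryable();
        var rebased = new RootReplacer(root, other).Visit(query.Expression);
        return other.Provider.CreateQuery<int>(rebased).ToArray();
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", Run())); // 20, 15
    }
}
```

The base visitor rebuilds the Where and OrderByDescending calls around the new constant, so the filter and ordering are preserved while the data comes from the second array.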

Have fun,

Thursday, May 5, 2011

Beautiful Code Part 2 – Combining RIA Services with Reactive Extensions to build a push-based search

A common scenario in most applications is looking up values based on user input in an auto-complete fashion. If you want to do this with a RIA service you have to (let’s ignore MVVM for the sake of simplicity) subscribe to the TextChanged event, grab the Text value, get the query from the DomainContext, apply a Where clause using the given text (let’s say StartsWith), create a LoadOperation, wait for the Completed event and, when it fires, clear the contents of the previous search and write the new search result. It’s really messy!

It is complex because you are converting a push-based data source (the textbox pushes text to your application) into pull-based logic, and you have to handle all those async problems yourself. But imagine that your text box is pushing a projection of the data, filtered by the typed value, into the search result control. You can think of it as “hey C#, just put the xxxs that StartWith ‘text’ into the data grid!” How do we write this?

I use a very simple user interface:

<sdk:DataGrid AutoGenerateColumns="True" Height="296" HorizontalAlignment="Left" Margin="26,178,0,0" Name="dataGrid1" VerticalAlignment="Top" Width="598" />
<TextBox Height="23" HorizontalAlignment="Left" Margin="26,139,0,0" Name="textBox1" VerticalAlignment="Top" Width="278" />
<sdk:Label Height="28" HorizontalAlignment="Left" Margin="26,105,0,0" Name="label1" VerticalAlignment="Top" Width="120" Content="Name" />

And this is the code that pushes the TextChanged event into the service and back to the datagrid:

  var observableText = Observable.FromEvent<TextChangedEventArgs>(this.textBox1, "TextChanged");

  var text = (from change in observableText
             select ((TextBox)change.Sender).Text)
             .DistinctUntilChanged()
             .Throttle(TimeSpan.FromSeconds(1));

  var queryStream = from input in text
                    select context.GetCustomersQuery().Match(i => i.FirstName.StartsWith(input));

  var loadStream = from query in queryStream
                   select CreateLoad(query);

  var resultEventStream = from load in loadStream
                          select
                               Observable.FromEvent<EventArgs>(load, "Completed");

  var resultStream = from ev in resultEventStream
                     select ev
                         .ObserveOnDispatcher()
                         .Subscribe(current =>
                         {
                             var completedOperation = (LoadOperation<IndividualCustomer>)current.Sender;
                             this.customers.Clear();
                             foreach (var c in completedOperation.Entities)
                             {
                                 this.customers.Add(c);
                             }
                         });


  this.topmost = resultStream.Subscribe(); 

  this.dataGrid1.ItemsSource = this.customers;

CreateLoad is just a helper function that leaves room for customizing the query:

private LoadOperation<IndividualCustomer> CreateLoad(EntityQuery<IndividualCustomer> q)
{
    return this.context.Load(q);
}

The algorithm is quite simple. I take the event from the TextBox and convert it to an Observable of text inputs. Then I turn that Observable into an Observable of queries, then into an Observable of load operations, and finally into an Observable of the results of loading those queries. I subscribe to this final one and, conceptually, I have the text that the user typed pushing loaded data back to me :).
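The same chain of projections can be sketched without Rx or RIA Services, using only the IObservable<T> and IObserver<T> interfaces from the base class library. Everything here is a stand-in made up for illustration: PushSource plays the TextBox, the names array plays the service, the shown list plays the data grid, and the Select and Subscribe helpers are simplified versions of what Rx provides. Throttling, threading and error handling are left out.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical push source: forwards every pushed value to its observers.
public sealed class PushSource<T> : IObservable<T>
{
    private readonly List<IObserver<T>> observers = new List<IObserver<T>>();

    public IDisposable Subscribe(IObserver<T> observer)
    {
        this.observers.Add(observer);
        return new Unsubscriber(() => this.observers.Remove(observer));
    }

    public void Push(T value)
    {
        foreach (var observer in this.observers.ToArray()) observer.OnNext(value);
    }

    private sealed class Unsubscriber : IDisposable
    {
        private readonly Action dispose;
        public Unsubscriber(Action dispose) { this.dispose = dispose; }
        public void Dispose() { this.dispose(); }
    }
}

// Observer that only reacts to values.
public sealed class ActionObserver<T> : IObserver<T>
{
    private readonly Action<T> onNext;
    public ActionObserver(Action<T> onNext) { this.onNext = onNext; }
    public void OnNext(T value) { this.onNext(value); }
    public void OnError(Exception error) { }
    public void OnCompleted() { }
}

public static class PushExtensions
{
    // Projects every pushed value into a new push source.
    public static IObservable<R> Select<T, R>(this IObservable<T> source, Func<T, R> map)
    {
        var target = new PushSource<R>();
        source.Subscribe(new ActionObserver<T>(value => target.Push(map(value))));
        return target;
    }

    public static IDisposable Subscribe<T>(this IObservable<T> source, Action<T> onNext)
    {
        return source.Subscribe(new ActionObserver<T>(onNext));
    }
}

public static class PipelineDemo
{
    public static List<string> Run()
    {
        var names = new[] { "Ana", "Andre", "Bruno" }; // stands in for the service
        var shown = new List<string>();                // stands in for the data grid

        var text = new PushSource<string>();
        // Text inputs become queries (here: predicates over names).
        var queries = text.Select(input =>
            new Func<string, bool>(n => n.StartsWith(input, StringComparison.Ordinal)));
        // Queries become loaded results.
        var results = queries.Select(query => names.Where(query).ToArray());
        // Results are pushed into the "grid".
        results.Subscribe(found => { shown.Clear(); shown.AddRange(found); });

        text.Push("B");   // shown: Bruno
        text.Push("An");  // shown: Ana, Andre
        return shown;
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", PipelineDemo.Run())); // Ana, Andre
    }
}
```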

Have fun,

Pedro

Wednesday, May 4, 2011

Beautiful Code Part 1 – Implement the repository pattern asynchronously with RIA Services

 

I bought the book “Beautiful Code”, published by O’Reilly and edited by Andy Oram and Greg Wilson, and it has inspired my work over the last few days. It doesn’t actually teach you how to write beautiful code, or it would just be another GoF or Fowler book; it shows you how the best think and gets you thinking like them. Hopefully thinking like the best will make you one of the best :).

Ten years ago, when I left school, if someone had asked me where the best programmers spend their time I guess I would have answered: writing very complex code very fast. Yet today I spent hours thinking about an interface and how people would use it. I did that because I realized (through the book) that the best delete code instead of writing it. Code is bad! Code is ugly! Code is money spent! The less the better! What I really want is to reuse the same code over and over to solve equivalent problems by hiding the details.

What I was trying to do was bridge the gap between the way you change collections of objects and the way you use RIA Services to change tables. I looked at IEnumerable<T>, ICollection<T>, IList<T>, EntitySet<T>, all the change tracking interfaces and the paged collection interfaces, and at the end the only phrase in my head was: what a mess!

It is amazing how a mess can be created from a very simple mistake: the idea that operations that act on sets are synchronous. Have you ever asked yourself why Add returns void? Why would it return “immediately”? Imagine you are adding one element to a collection that has 10^9 elements and must always be sorted; it can take quite some time until Add returns control to the caller. Until that happens the caller is blocked, wasting resources doing nothing. But if Add cannot be synchronous, what would it look like? Well, if I were targeting the desktop I would say Task<T>, but since I am on Silverlight I must come up with a different name for the same idea, because I don’t want to clash with that name in the future. So I came up with IResult:

 

/// <summary>
/// Represents the result of a method that cannot be completed synchronously.
/// </summary>
/// <remarks>
/// For void methods we consider that if no Exception was thrown the method executed successfully.
/// </remarks>
public interface IResult
{
    /// <summary>
    /// Occurs when all the activities initiated by the method that returned this instance are completed.
    /// </summary>
    event EventHandler<EventArgs> Completed;

    /// <summary>
    /// Gets the error that occurred during the execution of the activities triggered by the method that 
    /// returned this instance or null if no error occurred.
    /// </summary>
    Exception Error
    {
        get;
    }

    /// <summary>
    /// Gets a value indicating whether the method that returned this instance terminated in error.
    /// </summary>
    bool HasError
    {
        get;
    }

    /// <summary>
    /// Gets a value indicating whether the method that returned this instance finished its execution.
    /// </summary>
    bool IsComplete
    {
        get;
    }

    /// <summary>
    /// Gets a tag that identifies the method that returned this instance.
    /// </summary>
    string Action
    {
        get;
    }

    /// <summary>
    /// Gets a value representing the input of the method that returned this instance.
    /// </summary>
    IEnumerable Request
    {
        get;
    }

    /// <summary>
    /// Blocks the current thread until the underlying activities complete.
    /// </summary>
    void Wait();

    /// <summary>
    /// Blocks the current thread until the underlying activities complete or the timeout elapses.
    /// </summary>
    /// <param name="milliseconds">The amount of time in milliseconds that the current thread will wait for a result.</param>
    void Wait(int milliseconds);
}

IResult is basically a pull-based mechanism: the caller pulls (polls) the state of an asynchronous method call. We will get to a push-based mechanism in a future article. IResult is inspired by OperationBase from RIA, but it attempts to hide the semantics linked with web requests.
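To make the pull side concrete, here is a rough sketch of how a result object of this kind could be implemented and consumed. It does not use the full IResult interface above: IMiniResult, MiniResult and MiniResultDemo are hypothetical, trimmed-down stand-ins, and the work runs on the thread pool purely for illustration.

```csharp
using System;
using System.Threading;

// Hypothetical, trimmed-down stand-in for the IResult interface.
public interface IMiniResult
{
    event EventHandler<EventArgs> Completed;
    Exception Error { get; }
    bool HasError { get; }
    bool IsComplete { get; }
    void Wait();
}

public sealed class MiniResult : IMiniResult
{
    private readonly ManualResetEvent done = new ManualResetEvent(false);

    public event EventHandler<EventArgs> Completed;
    public Exception Error { get; private set; }
    public bool HasError { get { return this.Error != null; } }
    public bool IsComplete { get; private set; }

    // Called by the producer when the work ends; pass null on success.
    public void Complete(Exception error)
    {
        this.Error = error;
        this.IsComplete = true;
        this.done.Set();                 // releases callers blocked in Wait
        var handler = this.Completed;
        if (handler != null) handler(this, EventArgs.Empty);
    }

    public void Wait() { this.done.WaitOne(); }
}

public static class MiniResultDemo
{
    // Starts the work in the background and returns immediately.
    public static MiniResult Start(Action work)
    {
        var result = new MiniResult();
        ThreadPool.QueueUserWorkItem(_ =>
        {
            try { work(); result.Complete(null); }
            catch (Exception ex) { result.Complete(ex); }
        });
        return result;
    }

    public static void Main()
    {
        var result = Start(() => Thread.Sleep(50));
        // The caller is free to do other things here, then pulls the state.
        result.Wait();
        Console.WriteLine(result.IsComplete + " " + result.HasError); // True False
    }
}
```

In this sketch a handler attached to Completed after the work has already finished is never called, so a caller that subscribes late would still need to check IsComplete.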

OK, now that we have IResult, what would a Repository contract look like?

/// <summary>
/// A set of elements that cannot be changed synchronously and that interfaces with a persistent storage
/// using a collection like interface.
/// </summary>
/// <typeparam name="T">The Type of elements in the repository.</typeparam>
public interface IRepository<in T>
{
    /// <summary>
    /// Gets a value indicating whether the repository can be modified.
    /// </summary>
    bool IsReadOnly
    {
        get;
    }

    /// <summary>
    /// Gets a value indicating whether the repository was modified.
    /// </summary>
    bool IsChanged
    {
        get;
    }

    /// <summary>
    /// Gets a value indicating whether the repository allows adding elements.
    /// </summary>
    bool CanAdd
    {
        get;
    }

    /// <summary>
    /// Gets a value indicating whether the repository allows removing elements.
    /// </summary>
    bool CanRemove
    {
        get;
    }

    /// <summary>
    /// Gets the number of elements in the set asynchronously.
    /// </summary>
    IResult<int> Count
    {
        get;
    }

    /// <summary>
    /// Adds the given element to the set asynchronously.
    /// </summary>
    /// <param name="item">The element to add to the set.</param>
    /// <returns>A representation of the current state of the activities initiated by the method call.</returns>
    IResult Add(T item);

    /// <summary>
    /// Determines asynchronously whether the given element is in the set or not.
    /// </summary>
    /// <param name="item">The element to check whether it is on the set or not.</param>
    /// <returns>A representation of the current state of the activities initiated by the method call.</returns>
    IResult<bool> Contains(T item);

    /// <summary>
    /// Removes asynchronously the element from the set.
    /// </summary>
    /// <param name="item">The element to remove from the set.</param>
    /// <returns>A representation of the current state of the activities initiated by the method call.</returns>
    IResult Remove(T item);

    /// <summary>
    /// Accepts the changes made to the set making them permanent.
    /// </summary>
    /// <returns>A representation of the current state of the activities initiated by the method call.</returns>
    IResult AcceptChanges();

    /// <summary>
    /// Discards the changes made to the set reverting to the values that were previously on the persistent media.
    /// </summary>
    /// <returns>A representation of the current state of the activities initiated by the method call.</returns>
    IResult RejectChanges();
}

I recommend spending a couple of minutes thinking about Count. In a standard collection Count returns the number of elements in the data structure holding the data, but when we are counting records in a table through a service call that can take quite some time. It makes sense to return an IResult<int>. When the Count operation completes, the Completed event is raised and the program can use the value. Until that happens it can do other stuff. In the next article we will see how we can tweak this interface to take advantage of push-based mechanisms and Reactive Extensions. I’m building a framework based on these ideas and I will release the source code if someone is interested in it.