LINQ4SP Query Evaluation

I am sure you have already heard about the Linq4SP project. We started this project in order to create a useful and practical layer above the SharePoint object model.

Conceptually, we first set out to develop a query engine that let us query SharePoint list items as objects. This version was to be an abstraction layer above CAML queries, meant to simplify working with them. During development we realized that what we had actually opened was a Pandora's box. Endless possibilities unfolded as we went deeper down the rabbit hole, so we decided to implement as many features as we could for the first beta version.

You can find the complete list of features and planned releases here, and download the most recent version of the product here.

Fortunately, SharePoint, as a very complex and diverse system, provides lots of features to implement :) Since the possibilities are almost endless, we had to create a very clear and easily extensible design. During the planning phase we had to make several conceptual decisions. While making them, however, we could not be sure how many real-world use cases we were ruling out.

In this blog I would like to share these design decisions and give examples of the correct and intended usage of Linq4SP.

Let me start with the basics…

Query evaluation
The evaluation of a LINQ query is quite straightforward. The provider of the query source creates a new query object based on the given query expression and returns it. When the client code iterates through the results, the query object triggers the provider to evaluate the expression tree and turn it into one or more CAML queries. At this point we should pause for a moment. How can the provider translate everything to CAML queries?
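The deferred evaluation described above is the same behavior that plain LINQ to Objects shows. The following is only a minimal sketch of that behavior, not Linq4SP code: the Source method is a hypothetical in-memory stand-in for a SharePoint list, and the log exists only to show when the source is actually read.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DeferredExecutionDemo
{
    // Hypothetical in-memory source standing in for a SharePoint list.
    // It records in the log whenever it is actually enumerated.
    public static IEnumerable<int> Source(List<string> log)
    {
        log.Add("source enumerated");
        yield return 1;
        yield return 2;
        yield return 3;
    }

    public static void Main()
    {
        var log = new List<string>();

        // Building the query only describes the work; nothing runs yet.
        var q = from n in Source(log)
                where n > 1
                select n;
        Console.WriteLine("Before iteration: " + log.Count + " enumeration(s)");

        // Iterating through the results triggers the evaluation.
        foreach (int n in q)
        {
            Console.WriteLine(n);
        }
        Console.WriteLine("After iteration: " + log.Count + " enumeration(s)");
    }
}
```

The first line printed reports 0 enumerations and the last one reports 1: the source is touched only when the foreach starts.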

Distinguishing the place of execution
Actually, it cannot. There are several operations that simply cannot be translated to CAML. Here we faced an architectural design decision: what should we do if some parts of the query are not supported by the CAML syntax?

We could choose between two approaches. One was to treat developers as fools and throw an exception whenever any part of the query expression was not supported by the CAML syntax. This would have been the easy way: we could have put the blame entirely on CAML. The other approach was to trust developers' knowledge and let the query parser evaluate the whole query intelligently. We chose the latter. But how can something be evaluated that SharePoint does not support?

We simply split the query into two parts. One part is translated to a CAML query and thus executed remotely on the server. The other part cannot be translated to CAML. What can we do with that part? We simply leave it to be executed locally on the client side, once the results have arrived.

As for the terminology: we call the part that can be translated to CAML the remote query, and the rest the local query. In other words, whatever cannot be put into the remote query is executed locally.
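The split into a remote and a local part can be illustrated with standard .NET expression trees. This is a rough sketch under stated assumptions, not the Linq4SP implementation: QuerySplitter, MethodCallFinder and the simplified Person class are hypothetical names, the sketch treats every method call as untranslatable, and the remote part is only simulated in memory instead of being turned into CAML.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

// Simplified, hypothetical entity; not the generated Linq4SP class.
public class Person
{
    public string Name { get; set; }
    public int Age { get; set; }
}

// Assumption of this sketch: any method call is untranslatable to CAML.
public class MethodCallFinder : ExpressionVisitor
{
    public bool Found { get; private set; }

    protected override Expression VisitMethodCall(MethodCallExpression node)
    {
        Found = true;
        return node;
    }
}

public static class QuerySplitter
{
    // Breaks "a && b && c" into its individual conditions.
    public static IEnumerable<Expression> Conjuncts(Expression e)
    {
        var b = e as BinaryExpression;
        if (b != null && b.NodeType == ExpressionType.AndAlso)
            return Conjuncts(b.Left).Concat(Conjuncts(b.Right));
        return new[] { e };
    }

    // Sorts each condition into the remote or the local part.
    public static void Split(Expression<Func<Person, bool>> filter,
        List<Expression> remote, List<Expression> local)
    {
        foreach (Expression part in Conjuncts(filter.Body))
        {
            var finder = new MethodCallFinder();
            finder.Visit(part);
            (finder.Found ? local : remote).Add(part);
        }
    }

    // Recombines a list of conditions into one compiled in-memory predicate.
    public static Func<Person, bool> Compile(List<Expression> parts, ParameterExpression p)
    {
        Expression body = parts.Count == 0
            ? (Expression)Expression.Constant(true)
            : parts.Aggregate((a, b) => Expression.AndAlso(a, b));
        return Expression.Lambda<Func<Person, bool>>(body, p).Compile();
    }

    public static List<string> Run()
    {
        // Split is called with an explicit char array so the lambda
        // can be converted to an expression tree on current .NET versions too.
        Expression<Func<Person, bool>> filter =
            p => p.Age > 30 && p.Name.Split(new[] { ' ' }).Length > 2;

        var remote = new List<Expression>();
        var local = new List<Expression>();
        Split(filter, remote, local);

        var people = new List<Person>
        {
            new Person { Name = "Anna Maria Kovacs", Age = 40 },
            new Person { Name = "John Smith", Age = 50 },
            new Person { Name = "Eva Lea Nagy", Age = 20 }
        };

        // Simulated server side: a real provider would emit CAML for this part.
        var fetched = people.Where(Compile(remote, filter.Parameters[0]));
        // Client side: the rest runs in memory on the fetched items.
        return fetched.Where(Compile(local, filter.Parameters[0]))
                      .Select(x => x.Name)
                      .ToList();
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", Run()));
    }
}
```

In this sketch the age comparison lands in the remote part and the Split condition in the local part, so only the items that pass the age filter are handed to the in-memory filter.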

Examples   
In my examples I will use a simplified partner store with two lists, Person and Company, with the following columns:

Person
  • Age: Number
  • Gender: Choice { Male, Female }
  • Interests: MultiChoice { Movies, Games, Sports, Work, Studies }
  • Name: Text
  • Workplace: Lookup field to Company

Company
  • CompanyName: Text
  • WebSite: Url
  • Moderators: MultiUser

For both choice fields the Linq4SP class generator generates an enumeration type.

Remote query
Consider the following code block.
using (Context context = new Context())
{
    var q = from c in context.Company
            where c.WebSite == "http://www.lmsolutions.hu"
            select c;

    foreach (Company c in q)
    {
        ObjectDumper.DumpToTrace(c);
    }
}

Even this tiny block of code carries a lot of information.

First, notice that the context object is created inside a using statement. This means that the generated context class implements the IDisposable interface and should be disposed after use. I strongly suggest using the context this way. The context should be instantiated on a per-operation basis; it should be neither stored nor cached. It is designed to store and cache operation-wide objects and is responsible for their lifecycles, just like SPWeb or SPSite.

The second thing to notice (from top to bottom) is that the WebSite property of the queried Company is simply compared to a string. This convenience is provided by the UrlValue class: when a string is compared against a UrlValue, it is parsed as an absolute or relative URI.

The third thing to notice is the ObjectDumper class, which writes all the available properties and nested objects of an entity instance to the trace console.

In this case the whole query evaluates to a remote query and is translated to CAML. The resulting CAML query follows (the ViewFields part is omitted):

<Where><Eq><FieldRef Name="WebSite" /><Value Type="URL">http://www.lmsolutions.hu</Value></Eq></Where>
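
The using statement in the block above guarantees that the context is disposed even when the query throws. The sketch below shows only this standard C# behavior: DemoContext is a hypothetical stand-in for the generated context class, and the log is there just to make the order of events visible.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the generated context class.
public sealed class DemoContext : IDisposable
{
    private readonly List<string> log;

    public DemoContext(List<string> log)
    {
        this.log = log;
        log.Add("created");
    }

    public void Dispose()
    {
        log.Add("disposed");
    }
}

public static class UsingDemo
{
    public static List<string> Run()
    {
        var log = new List<string>();
        try
        {
            // One context per operation, released at the end of the block.
            using (var context = new DemoContext(log))
            {
                log.Add("querying");
                throw new InvalidOperationException("query failed");
            }
        }
        catch (InvalidOperationException)
        {
            log.Add("handled");
        }
        return log;
    }

    public static void Main()
    {
        // Prints: created, querying, disposed, handled
        Console.WriteLine(string.Join(", ", Run()));
    }
}
```

Dispose runs before the exception reaches the catch block, which is why a context created this way does not outlive the operation it was created for.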
Local query
The following code block is a bit trickier:
using (Context context = new Context())
{
    var q = from p in context.Person
            where p.Name.Split(' ').Length > 2
            select p;

    foreach (Person p in q)
    {
        ObjectDumper.DumpToTrace(p);
    }
}

We search for persons who have more than two names (assuming that the names are separated by a space). In this example none of the query operations can be translated to CAML, because the Split method has no corresponding operator in the CAML syntax. Hence the remote query becomes an empty select, and the local query contains the whole filter expression.

The bottom line: developers should be aware of what remains in the local query. Local query execution is slower because the whole result set must be loaded into memory first. The executed CAML queries are always written to the trace console in debug mode, so developers should check the final CAML against their own LINQ query to see what is left to be executed on the client side.
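
One way to inspect trace output from code is a standard .NET trace listener. The sketch below assumes that the CAML is written through System.Diagnostics.Trace, which this post does not confirm; TraceCapture is a hypothetical helper, and the Trace.WriteLine call merely stands in for running a Linq4SP query.

```csharp
#define TRACE // Trace.WriteLine calls are compiled only when TRACE is defined

using System;
using System.Diagnostics;
using System.IO;

public static class TraceCapture
{
    // Runs an action and returns everything written to Trace meanwhile.
    public static string Capture(Action action)
    {
        var buffer = new StringWriter();
        var listener = new TextWriterTraceListener(buffer);
        Trace.Listeners.Add(listener);
        try
        {
            action();
            Trace.Flush();
        }
        finally
        {
            Trace.Listeners.Remove(listener);
        }
        return buffer.ToString();
    }

    public static string Demo()
    {
        // Stand-in for executing a query whose CAML gets traced.
        return Capture(() => Trace.WriteLine(
            "<Where><Eq><FieldRef Name=\"WebSite\" /><Value Type=\"URL\">http://www.lmsolutions.hu</Value></Eq></Where>"));
    }

    public static void Main()
    {
        Console.WriteLine(Demo());
    }
}
```

If the captured CAML lacks a condition that the LINQ query contains, that condition is part of the local query.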

... to be continued with query execution and lookup queries.

Daniel Leiszen, L&M Solutions
Tuesday, July 22, 2008 02:04:41 PM