June 06, 2009 2:00 PM by Daniel Chambers (last modified on January 13, 2011 12:24 PM)
Most of the queries you do in your application are probably static queries. The parameters you set on the query probably change, but the actual query itself doesn't. That's why compiled queries are so cool, because you can pre-compile and reuse a query over and over again and just vary the parameters (see my last blog for more information).
But sometimes you might need to construct a query at runtime. By this I mean not just changing the parameter values, but actually changing the query structure. A good example of this would be a filter, where, depending on what the user wants, you dynamically create a query that culls a set down to what the user is looking for. If you've only got a couple of filter options, you can probably get away with writing multiple compiled queries to cover the permutations, but it only takes a few filter options before you've got a lot of permutations and it becomes unmanageable.
A good example of this is file searching. You can filter a list of files by name, type, size, date modified, etc. The user may only want to filter by one of these filters, for example with "Awesome" as the filename. But the user may also want to filter by multiple filters, for example, "Awesome" as the filename, but modified after 2009/07/07 and more than 20MB in size. To create a static query for each permutation would result in 16 queries (4 squared)!
My first foray into creating dynamic queries is a bit less ambitious than the above example, however. I have a scenario where I need to pull out a number of Tag objects from the database by their IDs. However, the number of the Tag objects needed is determined by the user. They may select 3 Tags, or they may select 6 Tags, or they may select 4 tags; it's up to them.
The most boring approach is, of course, to get each Tag out of the database individually with its own query (the "get each Tag individually" approach):
IList<Tag> list = new List<Tag>(); foreach (int tagId in WantedTagIds) { int localTagId = tagId; Tag theTag = (from tag in context.Tag where tag.Account.ID == AccountId && tag.ID == localTagId select tag).First(); list.Add(theTag); }
You could compile that query to make it run faster, but it's still a slow operation. If the user wants to get 6 Tags, you need to query the database 6 times. Not very cool.
This is where dynamic queries can step in. If the user asks for 3 Tags, you can generate a where clause that gets all three Tags in one go; essentially: tag.ID == 10 || tag.ID == 12 || tag.ID == 14. That way you get all three Tags in one query to the database. So, I wrote some generic type-safe code to perform exactly that: generating a where clause expression from a list of IDs so that a Tag with any of those IDs is retrieved.
To understand how I did this, you need to understand how the where clause in an LINQ expression works. It is easiest to understand if you look at the method-chain form of LINQ rather than the special C# syntax. It looks like this:
IQueryable<Tag> tags = context.Tag.AsQueryable() .Where(tag => tag.ID == 10);
The Where method takes a parameter that looks like this: Expression<Func<Tag, bool>>. Notice how the Func delegate is wrapped in an Expression? This means that instead of creating an actual anonymous method for the Func delegate, the compiler will instead convert your lambda expression into an Expression Tree.
An Expression Tree is a representation of your expression in an object tree. Here is an object tree that shows the main objects in the expression tree generated by the compiler for the lambda expression in the above example's Where method:
The LambdaExpression has a collection of ParameterExpressions, which are the parameters on the left side of the => symbol in the code. The actual Body of the lambda is made up of a BinaryExpression of type Equals, whose Right side is a ConstantExpression that contains the value of 10, and whose Left side is a MemberExpression. A MemberExpression represents the access of the ID property on the tag parameter.
So if we wanted to represent a more complex expression such as:
tag => tag.ID == 10 || tag.ID == 12 || tag.ID == 14
this is what the expression tree would look like. It looks a bit daunting, but computers are very good at trees, so writing code to generate such a tree is not too difficult with the help of a little recursion.
I defined a special utility method that allows you to create an expression tree like the one above that results in a Where expression that accepts a particular tag so long as its ID is in a certain set of IDs. The method is generic and reusable across anywhere where you need a Where filter that gets "this value, or this value, or this value... etc". The public method looks like this:
public static Expression<Func<TValue, bool>> BuildOrExpressionTree<TValue, TCompareAgainst>( IEnumerable<TCompareAgainst> wantedItems, Expression<Func<TValue, TCompareAgainst>> convertBetweenTypes) { ParameterExpression inputParam = convertBetweenTypes.Parameters[0]; Expression binaryExpressionTree = BuildBinaryOrTree(wantedItems.GetEnumerator(), convertBetweenTypes.Body, null); return Expression.Lambda<Func<TValue, bool>>(binaryExpressionTree, new[] { inputParam }); }
It's stuffed full of generics which makes it look more complicated than it really is. Here's how you call it:
List<int> ids = new List<int> { 10, 12, 14 }; Expression<Func<Tag, bool>> whereClause = BuildOrExpressionTree<Tag, int>(wantedTagIds, tag => tag.ID);
As I explain how it works, I suggest you keep an eye on the last expression tree diagram. The method defines two generic types, one called TValue which represents the value you are comparing, in this case the Tag class. The other generic type is called TCompareAgainst and is the type of the value you are comparing against, in this case int (because the Tag.ID property is an int).
You pass the method an IEnumerable<TCompareAgainst>, which in our case is an IEnumerable<int>, because we have a list of IDs we are comparing against.
The second parameter ("convertBetweenTypes") can be a bit confusing; let me explain. The expression we are defining for the Where clause takes a Tag and returns a bool (hence the Func<Tag, bool> typed expression). Since the set of values we are comparing against are ints, we can't just do an == between the Tag and an int. To be able to do this comparison, we need to somehow "convert" the Tag we receive into an int for comparison. This is where the second parameter comes in. It defines an Expression that takes a Tag and returns an int (or in generic terms takes a TValue and returns a TCompareAgainst). When you write tag => tag.ID, the compiler generates an Expression Tree that contains a MemberExpression that accesses ID on the tag ParameterExpression. This means wherever we need to do a Tag == int, we instead do a Tag.ID == int by substituting the Tag.ID MemberExpression generated in the place of the Tag. Here's a diagram that explains what I'm ranting about.
The main purpose of this method is to create the final LambdaExpression that the method returns. It does this by attaching the expression tree built by the BuildBinaryOrTree method (we'll get into this in a second) and the ParameterExpression from the convertBetweenTypes to the final LambdaExpression object.
The BuildBinaryOrTree method looks like this:
private static Expression BuildBinaryOrTree<T>( IEnumerator<T> itemEnumerator, Expression expressionToCompareTo, Expression expression) { if (itemEnumerator.MoveNext() == false) return expression; ConstantExpression constant = Expression.Constant(itemEnumerator.Current, typeof(T)); BinaryExpression comparison = Expression.Equal(expressionToCompareTo, constant); BinaryExpression newExpression; if (expression == null) newExpression = comparison; else newExpression = Expression.OrElse(expression, comparison); return BuildBinaryOrTree(itemEnumerator, expressionToCompareTo, newExpression); }
It takes an IEnumerator that enumerates over the wantedItems list (from the BuildOrExpressionTree method), an expression to compare each of these wanted items to (which is the compiler-generated MemberExpression from BuildOrExpressionTreeMethod), and an expression from a previous recursion (starts off as null).
The method creates an Equals BinaryExpression that compares the expressionToCompareTo and the current itemEnumerator value. It then joins this in an OrElse BinaryExpression comparison with the expression from previous recursions. It then takes this new expression and passes it down to the next recursive call. This process continues until itemEnumerator is exhausted at which point the final expression tree is returned.
Once this returned expression tree is placed in its LambdaExpression by the BuildOrExpressionTree method, you end up with a pretty expression tree like this one shown previously. We can then use this expression tree in the where clause of a LINQ method chain query.
Here's the final "generated where clause" query in action:
using (DHEntities context = new DHEntities()) { int[] wantedTagIds = new[] {12, 24, 1, 4, 32, 19}; Expression<Func<Tag, bool>> whereClause = ExpressionTreeUtil.BuildOrExpressionTree<Tag, int>(wantedTagIds, tag => tag.ID); IQueryable<Tag> tags = context.Tag.Where(whereClause); IList<Tag> list = tags.ToList(); }
So how much better is this approach, which is decidedly more complex than the simple "get each Tag at a time" approach? Is it worth the effort? I performed some benchmarks similar to the ones I did in the last blog to find out.
In one benchmark run, I ran these queries, each a hundred times, each getting out the same 6 tags:
I then ran the benchmark 100 times so that I could get more reliable averaged values. These are the results I got:
Average | Standard Deviation | |
---|---|---|
"Get Each Tag Individually" Query Loop (Uncompiled) | 3212.2ms | 40.2ms |
"Get Each Tag Individually" Query Loop (Compiled) | 1349.3ms | 24.2ms |
"Generated Where Clause" Query Loop | 197.8ms | 5.3ms |
As you can see, the Generated Where Clause approach is quite a lot faster than the individual queries. We can see compiling the Individual query helps, but not enough to beat the Generated Where Clause query, which is faster even though it is recompiled each time! (You can't precompile a dynamic query, obviously). The Generated Where Clause query is 6.8 times faster than the compiled Individual query and a whopping 16.2 times faster than the uncompiled Individual query.
Even though dynamic queries are lots harder than normal static queries, because you have to manually mess with Expression Trees, there are large payoffs to be had in doing so. When used in the appropriate place, dynamic queries are faster than static queries. They could also potentially make your code cleaner, especially in the case of the filter example I talked about at the beginning of this blog. So consider getting up to speed with Expression Trees. It's worth the effort.
Submit Comment | Comments RSS Feed
August 08, 2011 1:09 PM
So, let's say you have multiple parameters for the where clause... how would one go about implementing that? Is it as simple as creating two where expressions? Great blog post!
August 08, 2011 1:23 PM
Ok, answered my own question... the answer is yes. You can create multiple where clauses and chain them. Another question would be how to do and clauses...
Ilyas
March 23, 2012 3:00 PM
Why not do the following instead:
Using context = New NewProData_RC3Entities
Dim Ids As New List(Of Integer)
Ids.Add(1)
Ids.Add(2)
Dim query = context.Addresses.Where(Function(x) Ids.Contains(x.Address_PK))
Dim result1 = query.ToList()
End Using