Saturday, February 28, 2009

Getting deep on Filtering

This is the second post in a series on filtering. This is really getting into the gory details of filtering and may seem a little dry. Skip it if it bores you.

Terms and Definitions
  • Search: you start with nothing and get something
    • Think of web search. You start with a blank screen and get some results back
  • Filter: You start with something and get less
    • You have a full collection of music. By setting up a criterion you will see less than everything
  • Find: You start with something and move around in it
    • Try it in Word or in your browser. You find the first instance of your search string but you can still see the rest of the document.
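To make the difference between Filter and Find concrete, here is a minimal Python sketch. The list of titles and the helper names `filter_items` and `find_next` are made up for illustration; they are not taken from any product.

```python
# Hypothetical collection; the titles are placeholders.
titles = ["alpha", "beta", "gamma", "beta"]


def filter_items(items, predicate):
    """Filter: keep only the matching items, so you end up with less."""
    return [item for item in items if predicate(item)]


def find_next(items, predicate, start=0):
    """Find: return the position of the next match; the collection stays whole."""
    for index in range(start, len(items)):
        if predicate(items[index]):
            return index
    return None


def is_beta(title):
    return title == "beta"


filtered = filter_items(titles, is_beta)      # a smaller list
first_hit = find_next(titles, is_beta)        # a position inside the full list
second_hit = find_next(titles, is_beta, first_hit + 1)
```

Filtering hands back a shorter list, while finding only moves a cursor through a list that is still fully visible.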

Attributes, Operators and Operands

‘Artist equals NOFX’ follows the normal pattern for a filter criterion on songs. When we break it down into pieces we will use the following terms:

· Attribute: Artist is an attribute, just as Rating, Play Count, and Release Year are.

· Operator: Equals is the operator. Filter criteria typically use operators such as contains, equals, above, below, and between.

· Operand: In the example above, ‘NOFX’ is the operand. Operand is just a fancy word for the value(s) we want to operate on.
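The three terms map naturally onto a small data structure. The sketch below is one possible way to model them in Python; the `Criterion` class, the `OPERATORS` table, and the sample songs are assumptions made for illustration, not how any particular product implements filtering.

```python
from dataclasses import dataclass
from operator import eq, gt, lt
from typing import Any

# Hypothetical operator table keyed by the operator names used above.
OPERATORS = {
    "equals": eq,
    "contains": lambda value, operand: operand in value,
    "above": gt,
    "below": lt,
    # For "between", the operand is assumed to be a (low, high) pair.
    "between": lambda value, operand: operand[0] <= value <= operand[1],
}


@dataclass(frozen=True)
class Criterion:
    attribute: str   # e.g. "Artist"
    operator: str    # e.g. "equals"
    operand: Any     # e.g. "NOFX"

    def matches(self, item: dict) -> bool:
        """Apply the operator to the item's attribute value and the operand."""
        return OPERATORS[self.operator](item[self.attribute], self.operand)


# Made-up sample data.
songs = [
    {"Artist": "NOFX", "Rating": 5},
    {"Artist": "Some Other Band", "Rating": 3},
]

artist_equals_nofx = Criterion("Artist", "equals", "NOFX")
matching = [song for song in songs if artist_equals_nofx.matches(song)]
```

Note that the operand can hold more than one value, as it does for between.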

Boolean filtering

At the end of the day, filtering is really about finding the right subset of data, and if you want to get in deep, you should read up on set theory. For this discussion that will not be necessary though. We don’t really care about mathematics and what people could possibly do; we care about making a filtering mechanism that works well for the majority of cases. ‘Works well’ means that it is easy to use and complete enough to cover 95% of the scenarios of the people who use our products.

You will nonetheless need to understand a little about AND, OR and grouping.

If you have a basket of apples, some green, some red, and those apples are of different sizes and weights, you can subdivide the apples into smaller groups by those properties. You could for instance pick up only green apples, or only red apples that are larger than a tennis ball. The criteria by which you picked the latter could be expressed as

Color is Red AND Size is larger than a tennis ball.

Now say you weren’t looking at apples but bell peppers. They come in Small, Medium, and Large and in Green, Yellow, Orange, and Red. Now if you are like me, you prefer small green ones. But if you are more colorful you may like yellow, orange, and red ones in medium and large. To get those you would use the criteria:

Color is Yellow, Color is Orange, Color is Red, Size is Medium, Size is Large.

Now, without anything that says which criteria belong together, you have effectively used an OR between all of them

Color is Yellow OR Color is Orange OR Color is Red OR Size is Medium OR Size is Large.

and may end up with Small Yellows or Medium Greens. Actually, the only kind you will not end up with are small green ones, which is good because those would then be left for me :) but you clearly got more than what you intended.
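You can check this by brute force. The sketch below builds every color and size combination from the example and applies the OR-only criteria; the data layout and the helper name `matches_any` are assumptions for illustration.

```python
from itertools import product

COLORS = ["Green", "Yellow", "Orange", "Red"]
SIZES = ["Small", "Medium", "Large"]

# One pepper for every color/size combination (12 in total).
peppers = [{"Color": c, "Size": s} for c, s in product(COLORS, SIZES)]

# The five criteria, each as an (attribute, value) pair.
criteria = [
    ("Color", "Yellow"), ("Color", "Orange"), ("Color", "Red"),
    ("Size", "Medium"), ("Size", "Large"),
]


def matches_any(pepper, criteria):
    """OR: the pepper passes if at least one criterion is true."""
    return any(pepper[attribute] == value for attribute, value in criteria)


selected = [p for p in peppers if matches_any(p, criteria)]
rejected = [p for p in peppers if not matches_any(p, criteria)]

print(len(selected))  # prints 11
print(rejected)       # prints [{'Color': 'Green', 'Size': 'Small'}]
```

Eleven of the twelve combinations pass, including the small yellows and medium greens nobody asked for.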

However, if you only use AND you will not get any bell peppers at all, because no single pepper is yellow, orange, and red at the same time. Clearly also not what you intended.

Color is Yellow AND Color is Orange AND Color is Red AND Size is Medium AND Size is Large.
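The same kind of brute-force check works for the AND-only version. Again, the data layout and the helper name `matches_all` are assumptions made for this sketch.

```python
from itertools import product

COLORS = ["Green", "Yellow", "Orange", "Red"]
SIZES = ["Small", "Medium", "Large"]
peppers = [{"Color": c, "Size": s} for c, s in product(COLORS, SIZES)]

criteria = [
    ("Color", "Yellow"), ("Color", "Orange"), ("Color", "Red"),
    ("Size", "Medium"), ("Size", "Large"),
]


def matches_all(pepper, criteria):
    """AND: the pepper passes only if every single criterion is true."""
    return all(pepper[attribute] == value for attribute, value in criteria)


selected = [p for p in peppers if matches_all(p, criteria)]
print(selected)  # prints []
```

Every pepper has exactly one color and one size, so it always fails at least two of the three color criteria.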

What you want is a mix of AND and OR with some parentheses to demarcate which pieces go together

(Color is Yellow OR Color is Orange OR Color is Red) AND (Size is Medium OR Size is Large).

With these criteria in place we have the foundation for all the filtering we need. If you look closer you will see that we have OR between all criteria for the same attribute, parentheses around all the criteria that relate to the same attribute, and AND between the parentheses. This is a core concept which we will dive much more into once we get to the Active Directory Administrative Center filtering: criteria on the same attribute are OR’ed together and grouped; criteria on different attributes are AND’ed.
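The grouping rule is mechanical enough to sketch in a few lines of Python. This is only an illustration of the rule as stated here, using the bell pepper data; the function names `group_by_attribute` and `matches` are hypothetical and say nothing about how Active Directory Administrative Center implements it.

```python
from collections import defaultdict
from itertools import product

COLORS = ["Green", "Yellow", "Orange", "Red"]
SIZES = ["Small", "Medium", "Large"]
peppers = [{"Color": c, "Size": s} for c, s in product(COLORS, SIZES)]

# A flat list of criteria, exactly as a user might enter them.
criteria = [
    ("Color", "Yellow"), ("Color", "Orange"), ("Color", "Red"),
    ("Size", "Medium"), ("Size", "Large"),
]


def group_by_attribute(criteria):
    """Put the parentheses in place: one group of accepted values per attribute."""
    groups = defaultdict(set)
    for attribute, value in criteria:
        groups[attribute].add(value)
    return groups


def matches(item, criteria):
    """OR within each attribute group, AND between the groups."""
    groups = group_by_attribute(criteria)
    return all(item[attribute] in values for attribute, values in groups.items())


selected = [p for p in peppers if matches(p, criteria)]
print(len(selected))  # prints 6
```

The six results are the three colors times the two sizes, which is the selection the colorful shopper wanted.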

Next we will look at different kinds of filtering and after that we will look at some product examples.
