Here we are talking about filtering list items again :). I got two comment suggesting the use of yield return.

James Curran said:

Don’t build a new list if you don’t have to.

private static IEnumerable EvenSlicker_FilterPositiveItems(IEnumerable t)
{
foreach(int n in t)
if (n > 0)
yield return n;

}

The advantage is if you try using the method this this:

foreach(int n in FilterPositiveItems(myList) {…}

Your way is O(2N) while mine is O(N)

and Configurator said:

Once again, I have to reiterate what I’ve stated here before.

I know you don’t like the ‘yield’ keyword, but it’s things like this that it’s meant for.

All of your examples create a copy of your list in memory, filtering only the needed results.

However, there are two methods that are a lot better unless you actually need a List:

1. Use the yield keyword directly. This function will return an IEnumreable object that when enumerated finds the results you want.

private static IEnumerable FilterPositives(IEnumerable numbers) {
foreach (int num in numbers)
if (num > 0)
yield return num;
}

2. Use LINQ’s Where method. The following two functions will return the same result as my previous example.

private static List FilterPositives2(List numbers) {
return numbers.Where(intpredicate); // intpredicate is your method from earlier
}

private static List FilterPositives3(List numbers) {
return numbers.Where(num => num > 0);
}

First of all thanks to Configurator and James for opening up a discussion. As configurator said "I am not fond of the Yield keyword" but I do recognize it must be used sometimes or we will never have had the opportunity to enjoy LINQ. However It must be said that I don’t like Yield for 2 reasons.

  1. Code Readability. I think that using yield makes debugging and understanding of the code mush harder.
  2. Its not Free. The compiler works pretty hard on creating many variables in the background, check it out below.

I took the code that was suggested by both of them and created a simple console application:

   1: class Program
   2: {
   3:     static void Main(string[] args)
   4:     {
   5:         List<int> t = new List<int> { 3, 4, -3, -4, 6, 1, -65, 2, -5, 2, -4 };
   6:         t = (List<int>)FilterPositives(t);
   7:     }
   8:  
   9:     private static IEnumerable FilterPositives(IEnumerable<int> numbers)
  10:     {
  11:         foreach (int num in numbers)
  12:             if (num > 0)
  13:                 yield return num;
  14:     }
  15: }

I then took the .EXE file and threw it into the reflector. Here is the code that was created by the compiler:

   1: internal class Program
   2: {
   3:     // Methods
   4:     private static IEnumerable FilterPositives(IEnumerable<int> numbers)
   5:     {
   6:         <FilterPositives>d__1 d__ = new <FilterPositives>d__1(-2);
   7:         d__.<>3__numbers = numbers;
   8:         return d__;
   9:     }
  10:  
  11:     private static void Main(string[] args)
  12:     {
  13:         List<int> <>g__initLocal0 = new List<int>();
  14:         <>g__initLocal0.Add(3);
  15:         <>g__initLocal0.Add(4);
  16:         <>g__initLocal0.Add(-3);
  17:         <>g__initLocal0.Add(-4);
  18:         <>g__initLocal0.Add(6);
  19:         <>g__initLocal0.Add(1);
  20:         <>g__initLocal0.Add(-65);
  21:         <>g__initLocal0.Add(2);
  22:         <>g__initLocal0.Add(-5);
  23:         <>g__initLocal0.Add(2);
  24:         <>g__initLocal0.Add(-4);
  25:         List<int> t = <>g__initLocal0;
  26:         t = (List<int>) FilterPositives(t);
  27:     }
  28:  
  29:     // Nested Types
  30:     [CompilerGenerated]
  31:     private sealed class <FilterPositives>d__1 : IEnumerable<object>
  32:                                                  , IEnumerable
  33:                                                  , IEnumerator<object>
  34:                                                  , IEnumerator
  35:                                                  , IDisposable
  36:     {
  37:         // Fields
  38:         private int <>1__state;
  39:         private object <>2__current;
  40:         public IEnumerable<int> <>3__numbers;
  41:         public IEnumerator<int> <>7__wrap3;
  42:         private int <>l__initialThreadId;
  43:         public int <num>5__2;
  44:         public IEnumerable<int> numbers;
  45:  
  46:         // Methods
  47:         [DebuggerHidden]
  48:         public <FilterPositives>d__1(int <>1__state)
  49:         {
  50:             this.<>1__state = <>1__state;
  51:             this.<>l__initialThreadId = Thread.CurrentThread.ManagedThreadId;
  52:         }
  53:  
  54:         private void <>m__Finally4()
  55:         {
  56:             this.<>1__state = -1;
  57:             if (this.<>7__wrap3 != null)
  58:             {
  59:                 this.<>7__wrap3.Dispose();
  60:             }
  61:         }
  62:  
  63:         private bool MoveNext()
  64:         {
  65:             bool CS$1$0000;
  66:             try
  67:             {
  68:                 switch (this.<>1__state)
  69:                 {
  70:                     case 0:
  71:                         break;
  72:  
  73:                     case 2:
  74:                         goto Label_0081;
  75:  
  76:                     default:
  77:                         goto Label_009F;
  78:                 }
  79:                 this.<>1__state = -1;
  80:                 this.<>7__wrap3 = this.numbers.GetEnumerator();
  81:                 this.<>1__state = 1;
  82:                 while (this.<>7__wrap3.MoveNext())
  83:                 {
  84:                     this.<num>5__2 = this.<>7__wrap3.Current;
  85:                     if (this.<num>5__2 <= 0)
  86:                     {
  87:                         continue;
  88:                     }
  89:                     this.<>2__current = this.<num>5__2;
  90:                     this.<>1__state = 2;
  91:                     return true;
  92:                 Label_0081:
  93:                     this.<>1__state = 1;
  94:                 }
  95:                 this.<>m__Finally4();
  96:             Label_009F:
  97:                 CS$1$0000 = false;
  98:             }
  99:             fault
 100:             {
 101:                 this.System.IDisposable.Dispose();
 102:             }
 103:             return CS$1$0000;
 104:         }
 105:  
 106:         [DebuggerHidden]
 107:         IEnumerator<object> IEnumerable<object>.GetEnumerator()
 108:         {
 109:             Program.<FilterPositives>d__1 d__;
 110:             if ((Thread.CurrentThread.ManagedThreadId == this.<>l__initialThreadId) 
 111:                   && (this.<>1__state == -2))
 112:             {
 113:                 this.<>1__state = 0;
 114:                 d__ = this;
 115:             }
 116:             else
 117:             {
 118:                 d__ = new Program.<FilterPositives>d__1(0);
 119:             }
 120:             d__.numbers = this.<>3__numbers;
 121:             return d__;
 122:         }
 123:  
 124:         [DebuggerHidden]
 125:         IEnumerator IEnumerable.GetEnumerator()
 126:         {
 127:             return this.System.Collections.Generic.IEnumerable<System.Object>.GetEnumerator();
 128:         }
 129:  
 130:         [DebuggerHidden]
 131:         void IEnumerator.Reset()
 132:         {
 133:             throw new NotSupportedException();
 134:         }
 135:  
 136:         void IDisposable.Dispose()
 137:         {
 138:             switch (this.<>1__state)
 139:             {
 140:                 case 1:
 141:                 case 2:
 142:                     break;
 143:  
 144:                 default:
 145:                     break;
 146:                     try
 147:                     {
 148:                     }
 149:                     finally
 150:                     {
 151:                         this.<>m__Finally4();
 152:                     }
 153:                     break;
 154:             }
 155:         }
 156:  
 157:         // Properties
 158:         object IEnumerator<object>.Current
 159:         {
 160:             [DebuggerHidden]
 161:             get
 162:             {
 163:                 return this.<>2__current;
 164:             }
 165:         }
 166:  
 167:         object IEnumerator.Current
 168:         {
 169:             [DebuggerHidden]
 170:             get
 171:             {
 172:                 return this.<>2__current;
 173:             }
 174:         }
 175:     }
 176: }
 177:  

Look at the amount of  CXXp that was created. where as if you take the third method I proposed:

   1: static void Main(string[] args)
   2: {
   3:     List<int> t = new List<int> { 3, 4, -3, -4, 6, 1, -65, 2, -5, 2, -4 };
   4:     t = TheSlick_FilterPositiveItems(t);
   5: }
   6:  
   7: private static List<int> TheSlick_FilterPositiveItems(List<int> t)
   8: {
   9:     return t.FindAll(n => n > 0);
  10: }

This is what will be created by the compiler:

   1: internal class Program
   2: {
   3:     // Methods
   4:     private static void Main(string[] args)
   5:     {
   6:         List<int> <>g__initLocal0 = new List<int>();
   7:         <>g__initLocal0.Add(3);
   8:         <>g__initLocal0.Add(4);
   9:         <>g__initLocal0.Add(-3);
  10:         <>g__initLocal0.Add(-4);
  11:         <>g__initLocal0.Add(6);
  12:         <>g__initLocal0.Add(1);
  13:         <>g__initLocal0.Add(-65);
  14:         <>g__initLocal0.Add(2);
  15:         <>g__initLocal0.Add(-5);
  16:         <>g__initLocal0.Add(2);
  17:         <>g__initLocal0.Add(-4);
  18:         List<int> t = <>g__initLocal0;
  19:         t = TheSlick_FilterPositiveItems(t);
  20:     }
  21:  
  22:     private static List<int> TheSlick_FilterPositiveItems(List<int> t)
  23:     {
  24:         return t.FindAll(delegate (int n) {
  25:             return n > 0;
  26:         });
  27:     }
  28: }
  29:  

No extra variables, classes, switch cases and other things you just don’t need. and this is only for such a simple program, imagine what we will get when things get complicated.

 

If you haven’t read the first article you can find it here.

Let me know what do you think.

 

Amit

Tags :

10 Responses to “Filtering List Items – The Yield Return Solution”


  1. Alexander

    Said on January 5, 2009 :

    In my opinion you are trying to reinvent the wheel and doing premature performance optimation.

    The standard way of filtering in .Net 3.5 is
    new[] { 3, 4, -3, -4, 6, 1, -65, 2, -5, 2, -4 }.Where(i > 0);

    When you have to use 2.0 it will be:
    new List(new int[] { 3, 4, -3, -4, 6, 1, -65, 2, -5, 2, -4 }).FindAll(i => i > 0);

    The fastest way to do this is a for loop instead of a generator or a foreach loop, but that performance optimization is not needed in almost all cases.

    Do you realize that the difference in the amount of code in the 2 examples is made by the foreach loop?
    And do you know the second example uses almost exactly the same code in the class System.Generic.List?

    You can better uses the Stopwatch class to time it when you really want to know about the difference. When you use reflector it won’t make sence when you don’t know how the code works.

  2. Eddie Velasquez

    Said on January 5, 2009 :

    When dealing with massive amounts of data, returning lists or arrays of objects is just not feasable. IEnumerable and yield return also allow you to return data as soon as it is available from the source and not wait for the whole data set to be returned. DirectoryInfo.GetFiles() is just one of many good examples of a perfornce killers caused by returning arrays or lists instead of IEnumerable.
    I think using yield doesn’t make code any harder to understand or debug. It just a new concept that after a while becomes second nature.

  3. Niki

    Said on January 5, 2009 :

    I don’t think this comparison makes any sense: In one case, you’re writing the filtering algorithm, in the other case you’re just calling a library function. Of course the function call generates less IL code than the actual algorithm, what’s the point?

    Anyway, what do I care how much intermediate code is generated at some level that I will almost never have to look at? All I care about is readability in the source code (which is pretty good with yield, as I see it) and performance, and in that respect, the two code samples aren’t doing the same thing: They’re both O(N) in runtime, but List.FindAll is O(N) in memory usage while the yield-code (or Enumerable.Where) is O(1). If you’re working with very long lists or you don’t need all the elements, yield is obviously better, while List.FilterAll would be better if you need random access to the results or if you have to iterate through the list more than once.

    And, one last point: Of course there are “extra variables, classes” in the second sample (the one using List.FindAll). What do you think a delegate is, under the hood? The difference is just that Reflector can reverse-engineer those, while it can’t (or at least doesn’t) reverse-engineer the classes required for “yield”.

  4. mycall

    Said on January 5, 2009 :

    Of course yield can do things FindAll can’t like make lightweight ‘Erlang like’ threading using C# Iterators

    http://www.reddit.com/r/programming/comments/7miln/lightweight_erlang_like_threading_using_c/

  5. configurator

    Said on January 5, 2009 :

    First of all, I’d like to nitpick:

    1. Use IEnumerable as the result of the function, not IEnumerable.
    2. You are casting the result to a List! That is simply wrong and shows a misunderstanding of the entire concept.
    The method returns an IEnumerable of a type that is automatically generated by the compiler. It cannot be cast to a list. It can either be enumerated, or you can call one of the extension methods such as ToList or ToArray to get all values.
    The idea behind yield is for lazy evaluation. Suppose you have a function that accepts a list of numbers, and returns the first one that is good for whatever business decision you need. Now, that function could either accept a full List, which would mean all of the input has to be precalculated, or it could accept an IEnumerable. If the input is an IEnumerable, you can use yield to cause lazy evaluation; if that function returns the fifth item in a million item long list, only five items ever need to be calculated – and not all million.
    Also, I never compared my method to FindAll. I compared it to the IEnumerable extension method Where. If you call Where, and if you use yield, the exact same thing happens; the only difference is if the compiler auto-generated is used or the built-in WhereEnumerableIterator class is – which is pretty much the same class. You don’t need to use yield in this case. This is not a good example for the use of yield; this is quite a bad example; yield here would only complicate matters. The Where method, however, would be a perfect fit. It would return a lazily-loaded enumeration of the items that match your query. It could also be implemented with yield, yielding (no pun intended) the same result.

  6. SuperJason

    Said on January 5, 2009 :

    But it’s so darn cool! Doesn’t that count for something?

  7. Remco

    Said on January 5, 2009 :

    I don’t agree with your point 2 “Its not Free”. Memory allocation is pretty cheap so allocating 1 extra enumerator instance of a fixed size won’t cost you much.
    You will only (bearly) notice the overhead if you search lots and lots of very small lists.

    If you want to apply a sequence of filters the FindAll based version can very quickly start to perform worse than the yield based version because of all the intermediary lists that get constructed.

    To put it in extremes (hoping the code won’t get mangled…):

    [code]
    class Program
    {
    static void Main(string[] args)
    {
    // run and wait untill it crashes with an OutOfMemoryException…
    // List evenInts = GetInts().FindAll(x => x%2 == 0);
    // evenInts.ForEach(Console.WriteLine);

    // run and see the numbers scroll by…
    foreach (int i in FilterEvens(GetInts2()))
    Console.WriteLine(i);
    }

    static List GetInts()
    {
    List ints = new List(int.MaxValue);
    for (int i = 0; i < int.MaxValue; i++)
    ints.Add(i);
    return ints;
    }

    static IEnumerable GetInts2()
    {
    for (int i = 0; i <= int.MaxValue; i++)
    yield return i;
    }

    static IEnumerable FilterEvens(IEnumerable ints)
    {
    foreach (int x in ints)
    if (x % 2 == 0)
    yield return x;
    }
    }
    [/code]

  8. Al

    Said on January 5, 2009 :

    This site has too many ads. As soon as the page loads, I immediately want to leave. I understand you want to be compensated for your time and bandwidth, but there is a line and you’ve crossed it.

    I’m especially put off by the Google text ads BETWEEN the title of your post and the content of the post. I have to scroll down to see anything besides the site navigation and the many, many ads. That’s bad.

  9. Amit

    Said on January 6, 2009 :

    Thanks all for the great comments and ideas.

    My sole point and I took the Yield keyword as an example that you have to consider the “cost” of what you write. sometimes one of the solutions will be better than the other where in a different occasion another will prove better.

    My next article will take all the options offered and time them so we can put a “price tag” on every solution.

    Amit

  10. Amit

    Said on January 6, 2009 :

    @ Configurator

    I would like to say that you are correct about the List Casting, it is a bad error by me which I had not noticed. Thanks for the heads up.

Post a Comment