Here is something neat I found out.

Say you are writing an application and one of the requirements is to support file system search. You could always start with loops and such, but I thought to myself: why not do it in LINQ? I played around with it, and in fact it is not that hard.

Let's see how it is done. Here is a method that finds a specific file name inside a directory.

    // Requires: using System.Collections.Generic; using System.IO; using System.Linq;
    private List<FileInfo> SearchFilesByName(string DirectoryPath, string FileName)
    {
        return (from file in new DirectoryInfo(DirectoryPath).GetFiles()
                where file.Name == FileName
                select file).ToList();
    }

Basically, we query the FileInfo[] returned by the GetFiles() method and compare each file's name.

Here is another example that queries by file extension. It is very much the same except for the condition:

    private List<FileInfo> SearchFilesByExtention(string DirectoryPath, string Extention)
    {
        return (from file in new DirectoryInfo(DirectoryPath).GetFiles()
                where file.Extension == Extention
                select file).ToList();
    }
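
One detail worth keeping in mind with the extension query: FileInfo.Extension includes the leading dot (".txt", not "txt"), and == on strings is case-sensitive, so ".TXT" would not match ".txt". Here is a minimal sketch of a more forgiving variant. The class name ExtensionSearchSketch, the method name, and the dot-normalizing behavior are my own assumptions for illustration, not part of the method above.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class ExtensionSearchSketch
{
    // Hypothetical helper: accepts "txt" or ".txt" and ignores case.
    public static List<FileInfo> SearchFilesByExtensionIgnoreCase(string directoryPath, string extension)
    {
        // FileInfo.Extension includes the leading dot, so normalize the input first.
        string wanted = extension.StartsWith(".", StringComparison.Ordinal)
            ? extension
            : "." + extension;

        return (from file in new DirectoryInfo(directoryPath).GetFiles()
                where string.Equals(file.Extension, wanted, StringComparison.OrdinalIgnoreCase)
                select file).ToList();
    }
}
```

Whether ignoring case is the right call depends on the file system you target, so treat this as one option rather than a fix.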

Of course, this only works for a single directory, but you can easily expand it and make it recursive. I bet you are as lazy as I am, so here is something :)

 

    private List<FileInfo> SearchFilesByExtention(DirectoryInfo Directoryinf, string Extention)
    {
        DirectoryInfo[] Dirs = Directoryinf.GetDirectories();

        // Check the files that are in the current directory
        List<FileInfo> res = (from file in Directoryinf.GetFiles()
                              where file.Extension == Extention
                              select file).ToList();

        // Recursively go over all the other directories
        foreach (DirectoryInfo d in Dirs)
        {
            res.AddRange(SearchFilesByExtention(d, Extention));
        }
        return res;
    }

Nice, isn’t it? If you find a bug, please comment.

Amit.


16 Responses to “How to Search the File System Using LINQ Queries”


  1. Derek

    Said on July 16, 2008 :

    I think F#’s sequence comprehensions really excel here.

    let SearchFiles (directory:DirectoryInfo) predicate =
    { for file in (directory.GetFiles()) when (predicate file) -> file
    for dir in (directory.GetDirectories()) ->> (SearchFiles dir predicate) };;

    let SearchFilesByExtension directory extension =
    SearchFiles directory (fun (f:FileInfo) -> f.Extension = extension);;

  2. David Kemp

    Said on July 16, 2008 :

    Why not just use one of the overloads of DirectoryInfo.GetFiles() to search for a file name by pattern?
    Also, making SearchFilesByName return IEnumerable<FileInfo> will mean you don’t have to create an unnecessary List object…
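
A minimal sketch of the suggestion above, assuming the GetFiles(string, SearchOption) overload of DirectoryInfo. The class name PatternSearchSketch, the method name, and the recursive flag are hypothetical names introduced for illustration. Note that the pattern uses file system wildcards (* and ?), not regular expressions.

```csharp
using System.Collections.Generic;
using System.IO;

public static class PatternSearchSketch
{
    // Hypothetical helper: lets the search pattern do the filtering instead of a LINQ where clause.
    public static IEnumerable<FileInfo> SearchFilesByPattern(string directoryPath, string pattern, bool recursive)
    {
        SearchOption option = recursive
            ? SearchOption.AllDirectories
            : SearchOption.TopDirectoryOnly;

        // FileInfo[] already implements IEnumerable<FileInfo>, so no extra List is created.
        return new DirectoryInfo(directoryPath).GetFiles(pattern, option);
    }
}
```

A call such as SearchFilesByPattern(path, "*.txt", true) would then cover both the extension filter and the recursion in one step.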

  3. configurator

    Said on July 16, 2008 :

    I don’t like the way you have to combine lists to create the combined list. (That sentence made a lot of sense, didn’t it?)

    I prefer using IEnumerable. I’d do:
    private IEnumerable SearchFilesByExtention(DirectoryInfo Directoryinf, string Extention) {
    foreach (Whatever result in {the select})
    yield return result;

    foreach (DirectoryInfo subdir in Dirs)
    foreach (Whatever result in SearchFilesByExtention(subdir, Extention))
    yield return result;
    }

    You can wrap this around with a function that does ToList(), to return a List, but this will make sure that the memory doesn’t have to be copied again and again, for each subdir.
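
For readers who want to try this, here is a compilable sketch of the pseudocode above. It assumes FileInfo in place of "Whatever" and the extension filter from the post in place of "{the select}". The class name LazySearchSketch and both method names are hypothetical.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class LazySearchSketch
{
    // Iterator: yields matches one at a time instead of building and merging lists.
    public static IEnumerable<FileInfo> SearchFilesLazily(DirectoryInfo directory, string extension)
    {
        // Matches in the current directory first.
        foreach (FileInfo file in directory.GetFiles().Where(f => f.Extension == extension))
            yield return file;

        // Then the matches from each subdirectory, recursively.
        foreach (DirectoryInfo subdir in directory.GetDirectories())
            foreach (FileInfo file in SearchFilesLazily(subdir, extension))
                yield return file;
    }

    // Wrapper for callers that want a List: the results are copied exactly once, here.
    public static List<FileInfo> SearchFilesToList(DirectoryInfo directory, string extension)
    {
        return SearchFilesLazily(directory, extension).ToList();
    }
}
```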

  4. AJ.NET

    Said on July 16, 2008 :

    Hi,

    if you embrace LINQ, why not functional programming in all its beauty? e.g.:

    private IEnumerable<FileInfo> SearchFilesByExtention2(DirectoryInfo Directoryinf, string Extention)
    {
        foreach (DirectoryInfo d in Directoryinf.GetDirectories())
        {
            //Check the files that are in the current directory
            var files = (from file in Directoryinf.GetFiles()
                         where file.Extension == Extention
                         select file);
            foreach (var f in files)
                yield return f;

            //Recursively go over all the other directories
            foreach (var f in SearchFilesByExtention(d, Extention))
                yield return f;
        }
    }

    Less code, more efficient in terms of memory consumption (no temporary list), lazily evaluated.

    And BTW: GetDirectories comes with an overload that does the recursion in one step… . But that’s not the point here, is it?

    SCNR,
    AJ.NET

  5. AJ.NET

    Said on July 16, 2008 :

    Sorry, wrong code. Well, actually not but this one is shorter:

    private IEnumerable<FileInfo> SearchFilesByExtention2(DirectoryInfo Directoryinf, string Extention)
    {
        foreach (DirectoryInfo d in Directoryinf.GetDirectories())
        {
            //Check the files that are in the current directory
            foreach (var f in Directoryinf.GetFiles().Where(file => file.Extension == Extention))
                yield return f;

            //Recursively go over all the other directories
            foreach (var f in SearchFilesByExtention(d, Extention))
                yield return f;
        }
    }

  6. Amit

    Said on July 16, 2008 :

    I am sorry but I am not a fan of the yield return, I think it makes the code very unreadable, so if I don’t have to I won’t use it. But nice ideas guys

  7. configurator

    Said on July 16, 2008 :

    “I am not a fan of the yield return” – that sentence is quite infuriating in my opinion.
    The yield return, while making the code different, is a very powerful tool. It enables lazy evaluation, which is very useful and is used a lot in linq, and it also enables you to not have to copy memory over and over again;
    If we have two lists and we want to combine them, we need to copy them, which has a memory cost.
    If we simply yield their values, we use the same original lists, but get an enumeration of the values from both.

    I digress. My point is that a good developer (and I consider you a good developer according to what I’ve read in this blog) must be willing to accept and learn changes in the technology. While you don’t have to like it, I at least expect you to appreciate where it reduces memory usage or processing time, and in this case it can reduce both.

  8. superjason

    Said on July 16, 2008 :

    Just beware of the performance implications of something like this. What happens when you search your entire drive using an innocent query? Isn’t it reading in a ton of folders and files?
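
One way to limit that cost when only the first hit is needed is to enumerate lazily and stop early. The sketch below assumes DirectoryInfo.EnumerateFiles, which requires .NET Framework 4 or later (newer than the framework the post was written against). The class name EarlyExitSketch and the method name are hypothetical.

```csharp
using System.IO;
using System.Linq;

public static class EarlyExitSketch
{
    // Hypothetical helper: returns the first file with the given extension, or null if there is none.
    public static FileInfo FindFirstByExtension(string rootPath, string extension)
    {
        // EnumerateFiles streams entries as the tree is walked, so FirstOrDefault
        // can stop the traversal as soon as one file matches.
        return new DirectoryInfo(rootPath)
            .EnumerateFiles("*", SearchOption.AllDirectories)
            .FirstOrDefault(file => file.Extension == extension);
    }
}
```

On a whole-drive search, a directory the process cannot read will still raise an UnauthorizedAccessException during enumeration, so real code would need to handle that.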

  9. Amit

    Said on July 16, 2008 :

    @ Configurator

    I agree, the yield is powerful. If I were indeed facing a performance issue I would consider using it; otherwise I would prefer not to.
    Anyhow, this is just an example. I did not think about performance when composing it.

    But still, it’s nice to see that you pay attention to stuff like this.

  10. LINQ Master

    Said on July 16, 2008 :

    Nice article. Here’s another example, using the extension methods, instead of having to write out all the from/where/select code:
    http://blog.linqexchange.com/post/2008/06/How-to-Use-LINQ-to-Filter-a-List-of-Files-by-Date.aspx

  11. David

    Said on July 16, 2008 :

    Instead of doing recursion you can also use a Queue. This helps keep the call stack and memory usage down, though with this small example I doubt it has any large effect.

    private static IEnumerable<FileInfo> SearchFilesByExtension(DirectoryInfo searchPath, string extension)
    {
        Queue<DirectoryInfo> dirs = new Queue<DirectoryInfo>();
        dirs.Enqueue(searchPath);

        while (dirs.Count > 0)
        {
            // Get the matching files for the next directory in the queue.
            var currDir = dirs.Dequeue();
            foreach (var file in currDir.GetFiles().Where(file => file.Extension == extension))
            {
                yield return file;
            }

            // Queue up the sub-directories
            foreach (var dir in currDir.GetDirectories())
            {
                dirs.Enqueue(dir);
            }
        }
    }

    This concept is not my own though. I cannot remember whose blog I read this technique on. If anyone knows, please let me know so I can give that individual the proper credit.

  12. AJ.NET

    Said on July 21, 2008 :

    “I am sorry but I am not a fan of the yield return, I think it makes the code very unreadable, so if I don’t have to I won’t use it.”
    Sorry, but I can’t leave this one uncommented…

    Generally speaking I’m with you that yield break/return is awkward (2 words making a keyword?) and that the idea of leaving a method and coming back later to resume is weird to any decent curly-brace developer (i.e. with the “usual” background in imperative OO languages). If it stopped there, I wouldn’t have bothered answering.

    However, you started this post as “somethingorother _LINQ_” and LINQ is based on functional concepts. yield realizes one of these functional concepts, namely lazy evaluation. It’s not the only functional concept (take lambdas, closures, etc.) but leaving it out is like OO without inheritance. Still valuable to some degree but the overall concept is utterly incomplete.

    I guess what I’m trying to say is that if you look at LINQ without looking at functional programming you’re missing a big part of the picture.

    http://ajdotnet.wordpress.com/2008/04/27/i-cant-help-but-notice/

  13. Amit

    Said on July 21, 2008 :

    @ AJ.NET

    I get what you are saying. I have used the yield on a number of occasions, but still, if I don’t have to I won’t use it, even if it would make my life easier. In my opinion it is all a matter of maintainability.

    Amit

  14. Derik Whittaker

    Said on July 25, 2008 :

    You can get a video how-to on this same thing over at http://www.dimecasts.net/Casts/CastDetails/15

  15. Allen

    Said on October 29, 2008 :

    There was a flaw in the code using yield (AJ.NET) plus I added a filenamepart to the query

    private IEnumerable<FileInfo> SearchFilesByExtention(DirectoryInfo directoryInf, string extention, string fileNamePart)
    {
        //Check the files that are in the current directory
        foreach (var f in directoryInf.GetFiles().Where(file => file.Extension == extention && file.Name.Contains(fileNamePart)))
            yield return f;

        foreach (DirectoryInfo d in directoryInf.GetDirectories())
        {
            //Recursively go over all the other directories
            foreach (var f in SearchFilesByExtention(d, extention, fileNamePart))
                yield return f;
        }
    }

1 Trackback(s)

  1. Jul 20, 2008: Weekly Link Post 51 « Rhonda Tipton’s WebLog
