Ever needed to parse a web page and get all the links in it (the hrefs)? The easy way is to use this regular expression to get the href:

Regex r = new Regex("href.*");

For those of you who don't know, this means: get me something that starts with href, and then whatever follows on the same line; that's what the .* is for. The problem is that now we have to work on the results in order to get the actual link.
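
To make the problem concrete, here is a small sketch of what the naive pattern returns. The class name, method name and sample HTML are made up for this illustration:

```csharp
using System;
using System.Text.RegularExpressions;

public static class NaiveHrefDemo
{
    // Returns the raw text matched by the naive pattern, with no cleanup.
    public static string FirstNaiveMatch(string html)
    {
        Regex r = new Regex("href.*");
        return r.Match(html).Value;
    }

    public static void Main()
    {
        // Hypothetical sample input, made up for this example.
        string html = "<a href=\"http://example.com/page\">A page</a>";

        // Prints: href="http://example.com/page">A page</a>
        Console.WriteLine(FirstNaiveMatch(html));
    }
}
```

The match starts at href but drags along the quotes, the closing bracket and the link text, so the link still has to be cut out by hand.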

Extra work? I don’t think so…

We want to use groups, so the regular expression will look like this:

href.*?"(?<href>.*?)"

Or in code (we need to add \ to escape the quote characters):

Regex MyRegex = new Regex("href.*?\"(?<href>.*?)\"", RegexOptions.Multiline);

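RegexOptions.Multiline changes ^ and $ so that they match at the start and end of every line instead of only at the start and end of the whole input. Our pattern uses neither of them, so the option makes no difference here; a multiline string can be passed as input either way. Let's break it down: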


The beginning is the same: href.* gets everything that starts with href. Now comes the twist.

The ?" part means stop at the first " you find. If we drop the ?, the .* grabs as much of the line as it can and only stops at the last " that still lets the rest of the pattern match (greedy!!!).
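
A quick way to see the difference is to run both variants on the same tag. This is only a sketch: the class name, method name and sample tag are invented for the example, and the patterns are cut down to the part up to the first quote so that only the greedy/lazy behaviour is on display:

```csharp
using System;
using System.Text.RegularExpressions;

public static class GreedyVsLazyDemo
{
    // Returns the text of the first match of the pattern in the input.
    public static string FirstMatch(string pattern, string input)
    {
        return Regex.Match(input, pattern).Value;
    }

    public static void Main()
    {
        // Hypothetical sample tag with two quoted attributes.
        string html = "<a href=\"one.html\" title=\"First\">1</a>";

        // Lazy: .*? stops at the first quote.
        // Prints: href="
        Console.WriteLine(FirstMatch("href.*?\"", html));

        // Greedy: .* runs on to the last quote on the line.
        // Prints: href="one.html" title="First"
        Console.WriteLine(FirstMatch("href.*\"", html));
    }
}
```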

Now comes the definition of the group: (?<href>.*?). The syntax for defining a group is:

(?<GroupName>expression)

What comes after the group name is the regular expression for the group. In our case the end looks like this:

.*?)"

which means get everything until the first " you see.

That way we will get the "clean" URL inside the href group!
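
As a small illustration of what the group buys us, the sketch below compares the whole match with the content of the href group. The class name, method names and sample tag are assumptions made up for this example:

```csharp
using System;
using System.Text.RegularExpressions;

public static class NamedGroupDemo
{
    private static readonly Regex HrefRegex =
        new Regex("href.*?\"(?<href>.*?)\"");

    // Everything the pattern matched, including href= and the quotes.
    public static string WholeMatch(string html)
    {
        return HrefRegex.Match(html).Value;
    }

    // Only the part captured by the named group.
    public static string HrefGroup(string html)
    {
        return HrefRegex.Match(html).Groups["href"].Value;
    }

    public static void Main()
    {
        // Hypothetical sample input, made up for this example.
        string html = "<a href=\"http://example.com/page\">A page</a>";

        // Prints: href="http://example.com/page"
        Console.WriteLine(WholeMatch(html));

        // Prints: http://example.com/page
        Console.WriteLine(HrefGroup(html));
    }
}
```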

To use the groups, use this code:

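// Requires: using System; using System.Text.RegularExpressions;
public static void GetMatches(string s)
{
    // The inner quotes are escaped with \ so they become part of the pattern.
    Regex MyRegex = new Regex("href.*?\"(?<href>.*?)\"", RegexOptions.Multiline);
    MatchCollection mc1 = MyRegex.Matches(s);

    // Print the content of the href group for every match.
    foreach (Match m1 in mc1)
    {
        Console.WriteLine("URL: {0}", m1.Groups["href"].Value);
    }
}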
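
For anyone who wants to try this end to end, here is a self-contained sketch built around the same pattern. The LinkExtractor class, the GetLinks helper and the sample page are hypothetical names and data invented for this example, and the helper returns the links in a list instead of printing them. RegexOptions.Multiline is left out because the pattern contains no ^ or $:

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class LinkExtractor
{
    // Same pattern as in the post; the href group holds the text between the quotes.
    private static readonly Regex HrefRegex =
        new Regex("href.*?\"(?<href>.*?)\"");

    // Hypothetical helper: collects the captured links instead of printing them.
    public static List<string> GetLinks(string html)
    {
        var links = new List<string>();
        foreach (Match m in HrefRegex.Matches(html))
        {
            links.Add(m.Groups["href"].Value);
        }
        return links;
    }

    public static void Main()
    {
        // Made-up sample page spread over several lines.
        string html =
            "<a href=\"http://example.com/a\">A</a>\n" +
            "<p>no link here</p>\n" +
            "<a class=\"x\" href=\"/b.html\">B</a>";

        // Prints:
        // URL: http://example.com/a
        // URL: /b.html
        foreach (string link in GetLinks(html))
        {
            Console.WriteLine("URL: {0}", link);
        }
    }
}
```

Keep in mind that this only picks up links written with double quotes, and that it is a quick trick rather than a real HTML parser.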



Credit to Shahar A.

Have fun!!



2 Responses to “How to use Regular Expressions”

  1. Andreja Ilic

    Said on September 15, 2008 :

    Very nice… Recently I had a problem similar to this, and I used my own classes.

    What I was going to ask is: do you know how fast this is? Is it implemented like a grammar or like string manipulation?

    Andreja Ilic

  2. Amit

    Said on September 16, 2008 :

    It is as fast as it gets :).
    Especially if you add


    The compiler will create a special function according to the expression you entered, so most of the work is done at compile time.
