Rahul Anand's Blog If my mind can conceive it, and my heart can believe it, I know I can achieve it.

Regular expressions are a language-independent feature supported by many languages, notably Perl, Java, JavaScript and C#. Perl's support for regular expressions is so extensive that it gave rise to the term PCRE (Perl Compatible Regular Expressions).

.NET follows a similar pattern syntax.

The Base Class Library includes a namespace (System.Text.RegularExpressions) that exposes a set of classes for working with regular expressions.

Here is a summary of the most widely used of these classes in C#:

Use the static methods of the Regex class, or its instance methods, to match or replace a pattern. Regex.Matches returns the results as a collection (MatchCollection) of Match objects. Within each Match object is a collection (GroupCollection) of Group objects. Each Group object within the GroupCollection represents either the entire match or a sub-match defined by parentheses. Within each Group object is a collection (CaptureCollection) of Capture objects. Each Capture object contains the result of a single subexpression capture.

I will try to explain each of them with some examples.

The Regex class provides several static methods that let you check for a match, or retrieve matches, without instantiating a Regex object.

 

Escape: Escapes all metacharacters within a pattern string.

Unescape: Unescapes any escaped metacharacters within a pattern string.

IsMatch: Returns a Boolean value indicating whether the pattern is matched in the string.

Match: Returns a Match instance for the first substring that matches the pattern.

Matches: Returns all matches as a MatchCollection.

Replace: Replaces all occurrences of the pattern in the string (the instance overloads can limit the number of replacements).

Split: Splits the string at each match of the pattern and returns an array of strings.
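As a rough illustration of the static methods listed above, here is a minimal sketch. The class name StaticRegexDemo, its helper methods and the input strings are hypothetical and exist only for this example; the Regex calls themselves are the standard ones from System.Text.RegularExpressions.

```csharp
using System;
using System.Text.RegularExpressions;

public static class StaticRegexDemo
{
    // Hypothetical helpers, each wrapping one static Regex method.

    // Escapes metacharacters so the text can be used as a literal pattern.
    public static string EscapeLiteral(string text)
    {
        return Regex.Escape(text);
    }

    // Returns the first run of one or more 'a' characters.
    public static string FirstRunOfA(string input)
    {
        return Regex.Match(input, "a+").Value;
    }

    // Replaces every run of digits with a single '#'.
    public static string MaskDigits(string input)
    {
        return Regex.Replace(input, @"\d+", "#");
    }

    // Splits the input at every run of digits.
    public static string[] SplitOnDigits(string input)
    {
        return Regex.Split(input, @"\d+");
    }

    public static void Main()
    {
        string content = "abb123baaa45c";

        Console.WriteLine(EscapeLiteral("1+1=2"));                   // 1\+1=2
        Console.WriteLine(Regex.Unescape(@"1\+1=2"));                // 1+1=2
        Console.WriteLine(Regex.IsMatch(content, "a+"));             // True
        Console.WriteLine(FirstRunOfA(content));                     // a
        Console.WriteLine(MaskDigits(content));                      // abb#baaa#c
        Console.WriteLine(string.Join("|", SplitOnDigits(content))); // abb|baaa|c
    }
}
```

Note that the static Replace call changes both runs of digits, not just the first one.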

 

Except for Escape and Unescape, all of the above methods are also available as instance members of the Regex class. The static methods are provided to allow an isolated, single use of a regular expression without explicitly creating a Regex object.
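The instance members might be used roughly as follows when the same pattern is applied more than once. This is only a sketch: the class InstanceRegexDemo, its members and the sample input are made up for illustration. It also shows the instance Replace overload that takes a maximum replacement count, which is one way to replace only the first occurrence.

```csharp
using System;
using System.Text.RegularExpressions;

public static class InstanceRegexDemo
{
    // Hypothetical example: one Regex object, created once and reused.
    private static readonly Regex RunOfA = new Regex("a+");

    // Counts how many separate runs of 'a' occur in the input.
    public static int CountRuns(string input)
    {
        return RunOfA.Matches(input).Count;
    }

    // Replaces only the first run of 'a'; the third argument caps
    // the number of replacements at one.
    public static string ReplaceFirstRun(string input, string replacement)
    {
        return RunOfA.Replace(input, replacement, 1);
    }

    public static void Main()
    {
        string content = "xaaybaz";

        Console.WriteLine(RunOfA.IsMatch(content));             // True
        Console.WriteLine(CountRuns(content));                  // 2
        Console.WriteLine(ReplaceFirstRun(content, "-"));       // x-ybaz
        Console.WriteLine(Regex.Replace(content, "a+", "-"));   // x-yb-z
    }
}
```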

 

Let’s write some sample code to see how regular expressions work in C#:

 

Sample1:

// Requires: using System; using System.Text.RegularExpressions;
string content = "123abbbabbaaa123baaaabbbbcccaaa123cccbbb123";
// Match one or more occurrences of 'a'
string pattern = "a+";
if (Regex.IsMatch(content, pattern))
    Console.WriteLine("Pattern Found");
else
    Console.WriteLine("Pattern Not Found");

Output:  Pattern Found

 

 

Sample2:

string content = "123abbbabbaaa123baaaabbbbcccaaa123cccbbb123";
// Match one or more occurrences of 'a'; using ^ and $ enforces that
// the whole string must be matched
string pattern = "^a+$";
if (Regex.IsMatch(content, pattern))
    Console.WriteLine("Pattern Found");
else
    Console.WriteLine("Pattern Not Found");

Output:  Pattern Not Found

 

 

Sample3:

 

string content = "123abbbabbaaa123baaaabbbbcccaaa123cccbbb123";
// Match one or more digits that are preceded by one or more
// occurrences of 'a' and optionally followed by any number of 'b's
string pattern = @"a+(\d+)b*";
MatchCollection mc = Regex.Matches(content, pattern);
string spacer = "";
if (mc.Count > 0)
{
    Console.WriteLine("Printing matches...");
    for (int i = 0; i < mc.Count; i++)
    {
        spacer = "";
        Console.WriteLine();
        Console.WriteLine(spacer + "Match[" + i + "]: " + mc[i].Value);
        Console.WriteLine(spacer + "Printing groups for this match...");
        GroupCollection gc = mc[i].Groups;
        for (int j = 0; j < gc.Count; j++)
        {
            spacer = " ";
            Console.WriteLine(spacer + "Group[" + j + "]: " + gc[j].Value);
            Console.WriteLine(spacer + "Printing captures for this group...");
            CaptureCollection cc = gc[j].Captures;
            for (int k = 0; k < cc.Count; k++)
            {
                spacer = "  ";
                Console.WriteLine(spacer + "Capture[" + k + "]: " + cc[k].Value);
            }
        }
    }
}
else
{
    Console.WriteLine("Pattern Not Found");
}

 

Output:

Printing matches...

Match[0]: aaa123b
Printing groups for this match...
 Group[0]: aaa123b
 Printing captures for this group...
  Capture[0]: aaa123b
 Group[1]: 123
 Printing captures for this group...
  Capture[0]: 123

Match[1]: aaa123
Printing groups for this match...
 Group[0]: aaa123
 Printing captures for this group...
  Capture[0]: aaa123
 Group[1]: 123
 Printing captures for this group...
  Capture[0]: 123

 

 

The first two samples are simple enough to understand. In Sample 1 I am just checking whether one or more consecutive ‘a’ characters (i.e. at least one ‘a’) occur anywhere in the given string. Similarly, in Sample 2 I am checking whether the entire string consists of nothing but one or more ‘a’ characters, which is false because the given string also contains other characters.

Now let us examine the third sample. You might be wondering why .NET has so many different classes: Match, Group, and Capture. In this example the difference between them is not very evident: Match[0], Group[0] and Capture[0] all contain the same value.

 

Let us see what MSDN says about these:

 

Match: The Match class represents the results of a regular expression matching operation.

Group: The Group class represents the results from a single capturing group.

Capture: The Capture class contains the results from a single subexpression capture.

 

In simple words, a Match holds everything that is matched by the given pattern when a regular expression search is performed on the given string. The pattern can occur multiple times in the string; in that case you can get the collection of all such matches by calling the Matches method of the Regex class.

A Group corresponds to a part of the pattern enclosed in ‘(‘ and ‘)’. It contains the part of the matched string that was matched by the subpattern inside the parentheses. The exception is Group[0], which always contains the whole match (the same as the value of the Match).

A Capture is a substring matched by a single application of the group's subexpression. When the group is repeated by a quantifier, each repetition produces its own Capture. To understand it better, I will slightly modify Sample 3 so that a single digit match is the subexpression of the group.

 

Sample4:

 

string content = "123abbbabbaaa123baaaabbbbcccaaa123cccbbb123";
// Match 1, 2 or 3 (at least once), preceded by one or more
// occurrences of 'a' and optionally followed by any number of 'b's
string pattern = @"a+(1|2|3)+b*";
MatchCollection mc = Regex.Matches(content, pattern);
string spacer = "";
if (mc.Count > 0)
{
    Console.WriteLine("Printing matches...");
    for (int i = 0; i < mc.Count; i++)
    {
        spacer = "";
        Console.WriteLine();
        Console.WriteLine(spacer + "Match[" + i + "]: " + mc[i].Value);
        Console.WriteLine(spacer + "Printing groups for this match...");
        GroupCollection gc = mc[i].Groups;
        for (int j = 0; j < gc.Count; j++)
        {
            spacer = " ";
            Console.WriteLine(spacer + "Group[" + j + "]: " + gc[j].Value);
            Console.WriteLine(spacer + "Printing captures for this group...");
            CaptureCollection cc = gc[j].Captures;
            for (int k = 0; k < cc.Count; k++)
            {
                spacer = "  ";
                Console.WriteLine(spacer + "Capture[" + k + "]: " + cc[k].Value);
            }
        }
    }
}
else
{
    Console.WriteLine("Pattern Not Found");
}

 

Output:

Printing matches...

Match[0]: aaa123b
Printing groups for this match...
 Group[0]: aaa123b
 Printing captures for this group...
  Capture[0]: aaa123b
 Group[1]: 3
 Printing captures for this group...
  Capture[0]: 1
  Capture[1]: 2
  Capture[2]: 3

Match[1]: aaa123
Printing groups for this match...
 Group[0]: aaa123
 Printing captures for this group...
  Capture[0]: aaa123
 Group[1]: 3
 Printing captures for this group...
  Capture[0]: 1
  Capture[1]: 2
  Capture[2]: 3

 

You may be confused by the string reported for Group[1] in this example: it is only ‘3’, even though the group matched 1, 2 and 3 in turn. Again quoting from MSDN:

 

Because Group can capture zero, one, or more strings in a single match (using quantifiers), it contains a collection of Capture objects. Because Group inherits from Capture, the last substring captured can be accessed directly (the Group instance itself is equivalent to the last item of the collection returned by the Captures property).
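The relationship described in the quote can be checked with a small sketch like the one below. The class LastCaptureDemo and its helper methods are hypothetical names used only here; the pattern is the one from Sample 4, applied to a shortened input.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LastCaptureDemo
{
    // Value of group 1 for the first match (empty string if nothing matched).
    public static string GroupValue(string input, string pattern)
    {
        return Regex.Match(input, pattern).Groups[1].Value;
    }

    // Value of the last item in group 1's CaptureCollection
    // (empty string if the group captured nothing).
    public static string LastCaptureValue(string input, string pattern)
    {
        CaptureCollection captures = Regex.Match(input, pattern).Groups[1].Captures;
        if (captures.Count == 0)
            return "";
        return captures[captures.Count - 1].Value;
    }

    // Number of substrings captured by group 1 in the first match.
    public static int CaptureCount(string input, string pattern)
    {
        return Regex.Match(input, pattern).Groups[1].Captures.Count;
    }

    public static void Main()
    {
        string input = "aaa123b";
        string pattern = @"a+(1|2|3)+b*";

        Console.WriteLine(CaptureCount(input, pattern));      // 3
        Console.WriteLine(GroupValue(input, pattern));        // 3
        Console.WriteLine(LastCaptureValue(input, pattern));  // 3
    }
}
```

The group was applied three times, once per digit, so it holds three captures, and the Group itself reports only the last of them.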

 

I will add more to this post soon.

 

For an overview of regular expressions, see: Basics of Regular Expression

 

Posted on Tuesday, August 16, 2005 8:36 AM C#


Comments on this post: Regular Expression in C#

# re: Regular Expression in C#
Nice article.
Is there anyway to get a regular expression given a string?
Left by Mahesh on Mar 01, 2006 4:38 PM

# re: Regular Expression in C#
pretty cool article. finally I get what the Captures are there for! I have only needed groups so far, but I've been wondering what captures might be good for since a long time.. couldn't figure it out using the MSDN and other docs.. *shame on me* thanks!
Left by Max on May 30, 2006 6:31 PM

# re: Regular Expression in C#
good article and good samples.
I use to practice regular expressions a program regex-coach.
Left by Javier lema on Sep 27, 2006 12:34 PM

# re: Regular Expression in C#
Rahul, thank you for a great article.

I agree with Max, this one is the best explanation of the Captures I could find out over the Internet.
Left by Regular Expression Creator on Sep 23, 2007 2:50 AM

# re: Regular Expression in C#
\b[A-Za-z]{1,2}[0-9][A-Za-z0-9]? *[0-9][AaBbD-Hd-hJjLlNnP-Up-uW-Zw-z]{2}\b

The above regular expression works fine on many tools on the web but fails to check in c# coding whats wrong with it?
Left by Vivek on Jun 12, 2008 5:01 AM

# re: Regular Expression in C#
great samples...
and studied article...
good effort...
Left by mansur ehmad on Sep 03, 2008 1:25 AM

# re: Regular Expression in C#
Nice article..
Enough Explanation...
Good Flow...
Suitable examples...
with output is important
Left by Gowdhaman on Jan 16, 2009 1:08 AM

# re: Regular Expression in C#
Yeah this article is the best. Thank You very much for the explanation.
Left by Türker Öztürk on Nov 27, 2010 1:30 PM

# re: Regular Expression in C#
Hi, good article and good samples.
Left by news on Dec 06, 2010 8:09 PM

# re: Regular Expression in C#
Thanks for the article, simple and yet useful.
Left by Niels Heurlin on Dec 13, 2010 9:59 AM

# re: Regular Expression in C#
Hi,

I was reading your article and I would like to appreciate you for making it very simple and understandable. This article gives me a basic idea of Regular Expressions in C# and it will help me a lot. I have found another nice post over internet related to this post. You may check it by visiting this link...
http://mindstick.com/Blog/100/Regular%20Expressions%20in%20C

Thanks Everyone!!

Thank you very much!
Left by Ajay Singh on Nov 16, 2011 10:16 AM

# re: Regular Expression in C#

Hi,
I want an regular expression to eliminate the white space from an xml file having style tags.

Eg: "Font-Style : italic" should be converted to "Font-Style:italic"
How can i achieve this?
Left by Salma on Feb 07, 2013 5:48 PM



Copyright © Rahul Anand | Powered by: GeeksWithBlogs.net