Regular Expressions in C# – Practical Usage

This is the second post in the C# regular expression series. It follows up on “Regular Expressions in C# – The Basics”, which explained the theory behind regular expressions in C#. In this post we look at how to make practical use of regular expressions in our C# code.

This post touches on four major regular expression subjects:

  • String Comparison – does a string contain a particular sub-string?
  • Splitting a string into segments – we will take an IPv4 address and retrieve its dotted components
  • Replacement – modifying an input string
  • Stricter input validation – how to harden your expressions

String Comparison – finding valid HTML tags

One of the essential functions of regular expressions is their ability to find out whether one string is contained inside another. The Regex.Matches method returns every place where the pattern matches a given string.

We start with a simple example: finding out where the letter “a” is mentioned in a sentence:

string Input = "apples make for great party accessories";
Regex FindA = new Regex("a");

foreach(Match Tag in FindA.Matches(Input))
    Console.WriteLine("Found 'a' at {0}",Tag.Index);
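If all we need is a yes/no answer to "does the string contain the pattern?", Regex.IsMatch returns a bool without handing back the individual matches. The following is only a minimal sketch; the class and method names (ContainsDemo, ContainsLetterA, CountLetterA) are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ContainsDemo
{
    // Hypothetical helper: true if the input contains at least one "a".
    public static bool ContainsLetterA(string input)
    {
        return Regex.IsMatch(input, "a");
    }

    // Hypothetical helper: how many times "a" occurs in the input.
    public static int CountLetterA(string input)
    {
        return Regex.Matches(input, "a").Count;
    }

    public static void Main()
    {
        string input = "apples make for great party accessories";
        Console.WriteLine(ContainsLetterA(input)); // True
        Console.WriteLine(CountLetterA(input));    // 5
    }
}
```

IsMatch is the natural choice for a simple containment test; Matches is the one to reach for when the positions or values of the matches are needed.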

That was almost too easy. Regular expressions really shine if you don’t know exactly what you are looking for but you can describe it. In the following example we will look for all valid HTML tags in an input string.

What is a valid HTML tag? For our purposes <code>, </code>, <b>, <img src=""> and </br> all count as valid HTML tags.

Regex HTMLTag = new Regex(@"(<\/?[^>]+>)");

To break this down:

  1. All valid HTML tags start with a “<”
  2. They may or may not have a forward slash: \/? (the backslash escape is not required in .NET, but it is harmless)
  3. There are one or more characters which are not “>”: [^>]+
  4. The tag ends with a “>”

The following code example searches for all valid HTML tags in the input string:

using System;
using System.Text.RegularExpressions;

namespace RegularExpression
{
    class MainClass
    {
        public static void Main(string[] args)
        {
            Regex HTMLTag = new Regex(@"(<\/?[^>]+>)");

            string Input = "<b><i><a href=''>Ipod News</a></b></i>";

            // Print every tag the pattern finds, in order of appearance
            foreach(Match Tag in HTMLTag.Matches(Input))
                Console.WriteLine("Found {0}",Tag.Value);
        }
    }
}

Resulting in:

Found <b>
Found <i>
Found <a href=''>
Found </a>
Found </b>
Found </i>

Splitting a string into parts

Parentheses () not only allow you to group your expressions into parts, they also allow you to split a single string into multiple segments which we can inspect individually. To demonstrate, we will use a regular expression to split an IPv4 address into its components.

A dotted-decimal IPv4 address looks like XXX.XXX.XXX.XXX with each X being a decimal digit. Each column has at least 1 digit and at most 3, so a single column can be described as “(\d{1,3})”. There are four columns, each separated by a dot. The dot (.) has a special meaning in regex, so we need to escape it: (\.)

The Regex.Match method returns a new Match instance. We can now test Match.Success to see if the input string matched the IP address pattern. Through the Match.Groups property we can then extract each of the four IP address columns. The zero entry in the Groups property is always the complete match, in this case the whole IP address. The [1] entry contains the first group's contents, [2] the second, and so on.

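string IPMatchExp = @"(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})";
Match theMatch  = Regex.Match("192.168.0.1",IPMatchExp); // example address, chosen only for illustration
if (theMatch.Success)
    for (int i = 1; i <= 4; i++)   // Groups[0] is the complete match, so start at 1
        Console.WriteLine("Column {0}: {1}",i,theMatch.Groups[i].Value);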

String Replacement

Often it is useful to manipulate a string by replacing the matched pattern with something new. The Regex.Replace method allows us to specify a pattern to look for and a replacement string.

The following example matches the last character of each word together with the space that follows it, and replaces both with “b_”.

Regex Replacer = new Regex(@"\w "); // A single word character (letter, digit or underscore) followed by a space
string Input  = "ax bx sax dam pom";
string Output = Replacer.Replace(Input,"b_"); // Replace every match with a b and an underscore
// Output is now "ab_bb_sab_dab_pom"

Substitution Patterns

What to do if you would like to flip parts of a string? C# offers several substitution patterns for this. Substitution patterns can only be used in a replacement string, and are used in combination with grouping.

They are useful if you would like to format the results of the match. A common task is to flip two words around. In the example below we flip the name “Molly Malone” into “Malone Molly”:

Regex Replacer = new Regex(@"(\w*) (\w*)");
string Input  = "Molly Malone";
string Output = Replacer.Replace(Input,"$2 $1"); // Output is "Malone Molly"

The regular expression is defined as two groups of word characters (\w*) separated by a space. Each group can be referred to with a substitution pattern: $1 refers to the first group and $2 to the second. If we had defined more groups, $3 would be the third, and so on.
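The swap can also be wrapped in a small reusable method. The sketch below is one possible way to do it, not the only one: the names SwapDemo and SwapWords are hypothetical, and it deliberately uses \w+ with ^ and $ anchors (a choice made here, not in the example above) so that only a string consisting of exactly two words is rewritten.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SwapDemo
{
    // Hypothetical helper: swaps the two words of a "first last" string.
    // ^ and $ anchor the pattern, \w+ requires each word to be non-empty.
    public static string SwapWords(string input)
    {
        return Regex.Replace(input, @"^(\w+) (\w+)$", "$2 $1");
    }

    public static void Main()
    {
        Console.WriteLine(SwapWords("Molly Malone")); // Malone Molly
        Console.WriteLine(SwapWords("Molly"));        // no match, printed unchanged
    }
}
```

When the pattern does not match, Regex.Replace returns the input unchanged, which is why the single-word case comes back as it went in.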

Input validation – we have to be more strict

Often we need to check whether data that was typed in or read from a file matches a definition, so that we know it's valid. For this to work we need to ensure that our expressions match only valid input. Many expressions of convenience are defined too loosely. If we are to use them for input validation we need to harden them.

The pattern we used in an earlier example neatly broke down a valid IP address. But it wasn’t very strict, and many combinations that aren’t valid IP addresses would have matched. 999.999.999.999 is not a valid IPv4 address, but it would have matched our pattern (\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3}). So we couldn’t have used it to test for a valid IP address.

So what is a valid match? We need to define this first.

In a valid IP address each column is represented by a byte, so each column ranges from 0 to 255.

At this point there are two things we can do: we can validate the results returned by our expression with a few additional lines of C# code, or we can modify our regular expression to become stricter. As this post is about regular expressions, we will modify our expression to match only valid IP addresses.
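For comparison, the first option (loose pattern plus a few lines of C#) might look something like the sketch below. The class and method names are hypothetical. It uses [0-9] instead of \d, because \d in .NET also matches non-ASCII digits that int.Parse rejects, and it anchors the pattern with ^ and $; both are choices made for this sketch. Note that it accepts leading zeros such as 09.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LooseThenCheck
{
    // Hypothetical helper: loose regex first, numeric range check second.
    public static bool IsValidIPv4(string input)
    {
        Match m = Regex.Match(input,
            @"^([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})$");
        if (!m.Success)
            return false;

        for (int i = 1; i <= 4; i++)
        {
            // At most three ASCII digits per column, so int.Parse cannot overflow.
            int column = int.Parse(m.Groups[i].Value);
            if (column > 255)
                return false;
        }
        return true;
    }

    public static void Main()
    {
        Console.WriteLine(IsValidIPv4("10.0.0.1"));        // True
        Console.WriteLine(IsValidIPv4("999.999.999.999")); // False
    }
}
```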

How do we define valid? 0, 9, 10, 19, 100, 199, 200, 249 and 255 are all valid inputs for each column. 300 isn’t valid, and neither is 299. To keep things simple, we don’t allow 09 as a valid input.

  • Single digit: 0 – 9: [0-9]
  • Double digit: 10 – 99: [1-9][0-9]
  • Triple digit 1: 100 – 199: 1[0-9]{2}
  • Triple digit 2: 200 – 249: 2[0-4][0-9]
  • Triple digit 3: 250 – 255: 25[0-5]

The single ([0-9]) and double digit ([1-9][0-9]) combinations can be combined into: [1-9]?[0-9]. (Read as: the first [1-9] is optional; it occurs 0 or 1 times.)

So a single column can be defined as: (([1-9]?[0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])\.) Note the “.” at the end.

On the final column we do not need a “dot”. We can save some space by repeating the first expression three times with {3}, but we need to write out the fourth in full. Thus our expression becomes: (([1-9]?[0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])\.){3}([1-9]?[0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])

Not exactly easy to read, but let’s test to see if it works as expected. The following example program tries every value from 0 to 999, using the same value in all four columns.

using System;
using System.Text.RegularExpressions;

namespace RegularExpression
{
    class MainClass
    {
        public static void Main(string[] args)
        {
            string IPTestExp = @"(([1-9]?[0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])\.){3}([1-9]?[0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])";

            for (int Lp = 0; Lp <= 999; Lp++)
            {
                // Use the same value in all four columns, e.g. 7.7.7.7
                string IPAddress = String.Format("{0}.{0}.{0}.{0}",Lp);

                if (Regex.Match(IPAddress,IPTestExp).Success)
                    Console.WriteLine("{0} is valid",IPAddress);
                else
                    Console.WriteLine("{0} is invalid",IPAddress);
            }
        }
    }
}

For brevity we only look at the output up to the first invalid combination, 256.256.256.256. Every value up to and including 255 is reported as valid; if we read on, 256-999 would all be shown as invalid.

This took a bit of work, but we now have a single-line test to see if a string is a valid IPv4 address.
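One caveat when using the expression on arbitrary input: without anchors, a regular expression is satisfied by a match anywhere inside the string. The sketch below wraps the same expression in ^ and $ so that the whole string has to be the address. The class, field and method names are hypothetical, and splitting the column pattern into a constant is just a readability choice for this sketch.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AnchoredIPv4
{
    // One column, 0 to 255, exactly as derived in the post.
    const string Column = "([1-9]?[0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])";

    // The expression from the post, unanchored.
    static readonly Regex Unanchored =
        new Regex("(" + Column + @"\.){3}" + Column);

    // The same expression wrapped in ^ and $ (assumption: the whole
    // string must be the address, nothing before or after it).
    static readonly Regex Anchored =
        new Regex("^(" + Column + @"\.){3}" + Column + "$");

    public static bool MatchesSomewhere(string input)
    {
        return Unanchored.IsMatch(input);
    }

    public static bool IsValidIPv4(string input)
    {
        return Anchored.IsMatch(input);
    }

    public static void Main()
    {
        string input = "999.1.1.999";
        Console.WriteLine(MatchesSomewhere(input));     // True: the substring "99.1.1.99" matches
        Console.WriteLine(IsValidIPv4(input));          // False
        Console.WriteLine(IsValidIPv4("192.168.0.1"));  // True
    }
}
```

In .NET, $ also matches just before a final newline; \z can be used instead if a trailing newline should be rejected as well.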
