A Vision
The job given me by the Almighty Programmer was gatekeeper. The clouds parted below me and I could see a long sinewy line of expressions marching toward me in single file. Some looked like dates, others like digits and some (to be honest) looked like gibberish. One by one, they would try to get past me but I know no fear – for I am the RegularExpressionValidator.
The RegularExpressionValidator Control
The purpose of the RegularExpressionValidator control is to filter out unwanted or invalid input. It can be the first line of defense against input that is not formatted the way you want to see it. Correctly formatted expressions get through with no fanfare. Exceptions, however, cause the control to display its error message and the postback process is halted.
To demonstrate how this control works, let’s drop a few controls on a web form. I’m using Visual Studio .NET with code-behind and my language is VB. I’ve set my form to FlowLayout, but none of that is really necessary to understand the principles involved. Visual Studio has declared my control automatically with the following statement:
Protected WithEvents RegularExpressionValidator1 As System.Web.UI.WebControls.RegularExpressionValidator
The rest is as easy as 1, 2, 3.
1. First I drop a text box on my form. This is where I enter my test expression. I’ll give the control an id of “txtTest”.
<asp:TextBox id="txtTest" runat="server"> </asp:TextBox>
2. Next I drop a RegularExpressionValidator control on the form. I give this control an id of “RegularExpressionValidator1” since I’m not feeling very creative today. I set the “ControlToValidate” property to the id of the control I want to validate (which is the textbox I just created). I set the “ErrorMessage” property to what I want to display if the expression is not valid. I set the “ValidationExpression” property to the regular expression I want the input compared against. In this case, I’ve just set the expression to the letter “m”.
<asp:RegularExpressionValidator id="RegularExpressionValidator1" runat="server" ControlToValidate="txtTest" ErrorMessage="Sorry. You are not a valid expression!" ValidationExpression="m"> </asp:RegularExpressionValidator>
3. Next I drop a button control on the form. This is just so we have a place for the cursor to go after we leave the textbox.
<asp:Button id="Button1" runat="server" Text="Submit"> </asp:Button>
Now we are ready to test our control, which should allow only the lowercase letter “m” to get by. If we enter “m” and press TAB or click the SUBMIT button, everything looks good: no error messages. If we leave the textbox blank, there is no error message either, because the control only checks input that exists, not the existence of input (a RequiredFieldValidator is the control for that job). If, however, we enter any other character or digit, the control’s error message will be displayed. In this case, the error message is “Sorry. You are not a valid expression!”
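The behavior we just tested can be modeled in a few lines. The sketch below is in Python rather than VB, purely because it is easy to run anywhere. The helper name passes_validator is hypothetical and not part of ASP.NET. It assumes, as described above, that blank input is let through and that otherwise the whole text must match the expression.

```python
import re

def passes_validator(expression: str, text: str) -> bool:
    """Rough model of the validator (an assumption, not ASP.NET code):
    blank input passes, otherwise the entire text must match."""
    if text == "":
        return True  # blank input is not checked at all
    return re.fullmatch(expression, text) is not None

print(passes_validator("m", "m"))    # True
print(passes_validator("m", ""))     # True
print(passes_validator("m", "x"))    # False
print(passes_validator("m", "M"))    # False
```

Treat this as a way to try expressions quickly, not as a replacement for testing them in the real control.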
Now that we know how to use the control, what we really need is a primer on regular expressions. Then we can easily drop a RegularExpressionValidator control on our form, associate it with an entry in a text box and set the “ValidationExpression” property to what we want the input to look like.
Regular Expressions
The subject of regular expressions is often confusing but we are going to take an approach that will give you a basic understanding upon which you can build. Becoming proficient with regular expressions takes a lot of practice, just like anything else.
Regular expressions let us search for certain patterns. In the case of our validator control, if the text in the textbox matches the pattern, the control is satisfied and lets the text through to the promised land. If the text does NOT match the pattern, the control is not happy and displays its error message. (This is not all we can do with regular expressions. We can also use them to find every instance of a word or phrase and replace or reformat it, or to mine documents for email addresses or URLs. I’ll discuss that more fully in a future article.)
Character Matching
The easiest kind of matching we can do is “character matching”. That’s what we did when we put the letter “m” in our validator control. We are simply asking if the text is the letter “m” and nothing else. So the lowercase letter “m” passes but the uppercase “M” does not. If you enter “mom”, it does not pass because “mom” is not “m” – it is something more than just “m”.
Character matching is not limited to a single character. If you use the regular expression “mom” then the only expression that matches will be “mom”. “MOM” will not work, “mommy” will not work and “I want my mom” will not work because none of these expressions is exactly equivalent to “mom”. OK, I think we are clear on that point.
A period is a special character that matches any single character. So the regular expression “m.m” would match “mom” or “mam” or “m9m”.
You can search for a string that has a single character from a group of predetermined characters. For example: “m[ao]m” would match up with “mam” or “mom” because the middle letter is in the group [ao], but “m9m” would not match because “9” is not an “a” or an “o”.
You can search for a string that has a single character that is in a range. For example: “m[a-y]m” will match up with “mam” or “mbm” or “mcm” or any other string whose middle letter is in the range [a-y]. The letter “z” is not in that range, and neither is any uppercase letter.
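The examples so far can be checked with a short script. This is a sketch in Python, used here only as a convenient stand-in for the validator. The helper name matches is hypothetical, and re.fullmatch is assumed to mirror the validator's "whole text or nothing" behavior.

```python
import re

def matches(expression: str, text: str) -> bool:
    # The whole text must match, not just a part of it.
    return re.fullmatch(expression, text) is not None

# Literal characters: case and length both matter.
print(matches("mom", "mom"))             # True
print(matches("mom", "MOM"))             # False
print(matches("mom", "I want my mom"))   # False

# A period stands for any single character.
print(matches("m.m", "m9m"))             # True

# A set allows one character from the listed group.
print(matches("m[ao]m", "mam"))          # True
print(matches("m[ao]m", "m9m"))          # False

# A range allows one character between the endpoints.
print(matches("m[a-y]m", "mbm"))         # True
print(matches("m[a-y]m", "mzm"))         # False
```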
Here is a list of the most common special characters that relate to character matching:
[xyz]   Match any one character enclosed in the character set. Would match if the single letter was x or y or z.
[a-e]   Match any one character in the range a-e (or 1-9, etc). Would match if the character was greater than or equal to a but less than or equal to e.
[^xyz]  Match any one character not enclosed in the character set. Would match any character except x, y, and z.
.       Match any character except \n.
\w      Match any word character. Equivalent to [a-zA-Z_0-9].
\W      Match any non-word character. Equivalent to [^a-zA-Z_0-9].
\d      Match any digit. Equivalent to [0-9].
\D      Match any non-digit. Equivalent to [^0-9].
\s      Match any space character. Equivalent to [ \t\r\n\v\f].
\S      Match any non-space character. Equivalent to [^ \t\r\n\v\f].
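The shorthand classes in this list are written with a leading backslash (\w, \d, \s and their uppercase opposites). Here is a small sketch that tries each one against a single character. It uses Python as a stand-in for the validator, and the matches helper is a hypothetical name, so treat the results as illustrative for plain ASCII input.

```python
import re

def matches(expression: str, text: str) -> bool:
    # The whole text must match the expression.
    return re.fullmatch(expression, text) is not None

print(matches(r"\d", "7"))       # True: a digit
print(matches(r"\D", "7"))       # False: \D is the opposite of \d
print(matches(r"\w", "_"))       # True: underscore counts as a word character
print(matches(r"\W", "!"))       # True: punctuation is a non-word character
print(matches(r"\s", "\t"))      # True: a tab is a space character
print(matches(r"\S", " "))       # False: a blank is not a non-space
print(matches(r"[^xyz]", "a"))   # True: anything except x, y or z
print(matches(r"[^xyz]", "x"))   # False
print(matches(r".", "\n"))       # False: the period does not match a new line
```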
Repetition Matching
The question mark is a special character that matches zero or one instances of the character that precedes it. So the regular expression “moms?” would match “mom” because the letter “s” appears zero times after “mom”. It would match “moms”, of course.
The asterisk is a special character that matches zero or more instances of the character that precedes it. So the regular expression “moms*” would match “mom” because the “s” appears zero times. It would match “moms” because the “s” appears once. It would also match “momsssssssss”.
The plus sign is a special character that matches one or more instances of the character that precedes it. So the regular expression “moms+” would match “moms” or “momsssss” but not “mom” because the s has not occurred at least once.
If you wanted to find a match on n instances of the preceding character, you can use {n}. For instance, the regular expression “moms{3}” would only match “momsss” because there are exactly three instances of the letter “s”. It would not match “moms” or “momss”. If you wanted to match the word “moon”, you could use the regular expression “moon” (the most direct method) or you could use “mo{2}n”.
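To see the question mark, asterisk, plus sign and {n} side by side, here is a sketch that runs the “mom” examples from this section. Python is used as a stand-in for the validator, and the matches helper is a hypothetical name that requires the whole text to match.

```python
import re

def matches(expression: str, text: str) -> bool:
    # The whole text must match the expression.
    return re.fullmatch(expression, text) is not None

# ? allows zero or one "s"
print(matches("moms?", "mom"))            # True
print(matches("moms?", "moms"))           # True
print(matches("moms?", "momss"))          # False

# * allows zero or more "s"
print(matches("moms*", "mom"))            # True
print(matches("moms*", "momsssssssss"))   # True

# + requires at least one "s"
print(matches("moms+", "mom"))            # False
print(matches("moms+", "momsssss"))       # True

# {n} requires exactly n of the preceding character
print(matches("moms{3}", "momsss"))       # True
print(matches("moms{3}", "momss"))        # False
print(matches("mo{2}n", "moon"))          # True
```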
Here are some common examples of repetition matching:
{x}    Match exactly x occurrences of a regular expression. \d{5} matches 5 digits such as 12345.
{x,}   Match x or more occurrences of a regular expression. \s{2,} matches at least 2 space characters.
{x,y}  Match x to y occurrences of a regular expression. \d{2,3} matches at least 2 but no more than 3 digits.
?      Match zero or one occurrences. Equivalent to {0,1}. a\s?b matches “ab” or “a b”.
*      Match zero or more occurrences. Equivalent to {0,}.
+      Match one or more occurrences. Equivalent to {1,}.
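The counted forms in this list combine naturally with the backslash classes (\d for a digit, \s for a space character). The sketch below, again in Python as a stand-in with a hypothetical matches helper, tries the listed examples and then checks that ?, * and + behave the same as their {0,1}, {0,} and {1,} spellings on a few sample strings.

```python
import re

def matches(expression: str, text: str) -> bool:
    # The whole text must match the expression.
    return re.fullmatch(expression, text) is not None

print(matches(r"\d{5}", "12345"))     # True: exactly five digits
print(matches(r"\d{5}", "1234"))      # False: only four digits
print(matches(r"\s{2,}", "   "))      # True: at least two spaces
print(matches(r"\d{2,3}", "1234"))    # False: more than three digits
print(matches(r"a\s?b", "a b"))       # True: the space is optional

# The short forms should agree with their spelled-out equivalents.
samples = ["mom", "moms", "momsss"]
pairs = [("moms?", "moms{0,1}"), ("moms*", "moms{0,}"), ("moms+", "moms{1,}")]
for short, spelled_out in pairs:
    agree = all(matches(short, s) == matches(spelled_out, s) for s in samples)
    print(short, "agrees with", spelled_out, ":", agree)
```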
Matching Special Characters
We’ve already introduced several “special characters”. Special characters are those which have a special meaning in a regular expression. In the above discussion, the period, the asterisk and the plus sign are all “special characters”. To match a special character literally, you have to precede it with a backslash (“\”) in the regular expression. Here is a list of the most common ways to match a character which would otherwise have special meaning.
\n   Matches a new line.
\f   Matches a form feed.
\r   Matches a carriage return.
\t   Matches horizontal tab.
\v   Matches vertical tab.
\?   Matches ?
\*   Matches *
\+   Matches +
\.   Matches .
\\   Matches \
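The difference an escaping backslash makes is easiest to see by running the same text against the escaped and unescaped pattern. This is a sketch in Python, standing in for the validator. The matches helper and the sample strings are made up for illustration.

```python
import re

def matches(expression: str, text: str) -> bool:
    # The whole text must match the expression.
    return re.fullmatch(expression, text) is not None

# Unescaped, the period matches any character; escaped, only a real period.
print(matches(r"3.14", "3x14"))          # True
print(matches(r"3\.14", "3x14"))         # False
print(matches(r"3\.14", "3.14"))         # True

# Unescaped, + means "one or more of the preceding 1".
print(matches(r"1+1", "1+1"))            # False
print(matches(r"1+1", "111"))            # True
print(matches(r"1\+1", "1+1"))           # True

# The question mark and the backslash itself are escaped the same way.
print(matches(r"why\?", "why?"))         # True
print(matches(r"C:\\temp", r"C:\temp"))  # True
```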
Alternation and Grouping
Alternation and grouping are used to build more complex regular expressions. Grouping wraps a clause in parentheses so that it is treated as a single unit; groups may be nested. “(ab)?(c)” matches “abc” or “c”.
Alternation combines clauses into one regular expression and then matches any of the individual clauses. “(ab)|(cd)|(ef)” matches “ab” or “cd” or “ef”.
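Here is a sketch that tries the two expressions above, plus one combined pattern. Python stands in for the validator and the matches helper is a hypothetical name. The ZIP code pattern at the end is an illustrative assumption (five digits with an optional four-digit extension), not an expression taken from this article.

```python
import re

def matches(expression: str, text: str) -> bool:
    # The whole text must match the expression.
    return re.fullmatch(expression, text) is not None

# Grouping: the ? applies to the whole group "ab", not just to "b".
print(matches("(ab)?(c)", "abc"))             # True
print(matches("(ab)?(c)", "c"))               # True
print(matches("(ab)?(c)", "ac"))              # False

# Alternation: any one of the clauses may match.
print(matches("(ab)|(cd)|(ef)", "cd"))        # True
print(matches("(ab)|(cd)|(ef)", "abcd"))      # False

# Combined (illustrative): ZIP code with an optional +4 extension.
zip_code = r"\d{5}(-\d{4})?"
print(matches(zip_code, "30301"))             # True
print(matches(zip_code, "30301-1234"))        # True
print(matches(zip_code, "30301-12"))          # False
```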
More Examples and References
1. You can find many ready-made regular expression samples at http://www.regexlib.com/. These include regular expressions for dates, email addresses, URLs, zip codes and things like that. Some of these are better than others, so make sure you test and understand the expression before you put it into production.
2. Microsoft has a fairly dense, if somewhat disorganized, coverage of the subject at
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/jscript7/html/jsreconIntroductionToRegularExpressions.asp
Conclusion
The RegularExpressionValidator Control is powerful and useful if you have a basic understanding of regular expressions. There are benefits to using this control over other methods. (1) Depending on the browser being used, client-side code will be generated for the validation. This means expressions will be validated at the client without having to make the round trip to the server. (Up-Level browsers (IE 4.0+) will have the validation rendered in JavaScript/DHTML to enable client-side validation, while Down-Level browsers will provide strictly server-side validation.) (2) Pattern matching may be faster than other methods. For instance, validating a date against a pattern can be much faster than testing for a valid date using the IsDate() function.
I will talk in more depth about regular expressions in a future article. There is much more to this subject than just the validator control but I think this will get you off to a good start. If you have any ideas, complaints or suggestions, please email me at:
Roger D. McCook
McCook Software, Inc. – Atlanta, GA
Visit Roger McCook’s Web site at http://www.mccooksoftware.com.