One important use of Regular Expressions (Regex) is to verify fields submitted via a form. In this article, we attempt to write an expression that is able to verify the user’s first name, middle name, last name or just names in general.
The expression should allow names such as “Mary”, “Mr. James Smith” and “Mrs O’Shea”. The challenge, then, is to allow spaces, periods and single quotation marks in the name field while rejecting digits and all other punctuation.
Elimination Technique
Instead of coming up with a set of rules to specify the possible combination of legal characters, we try to identify and detect all illegal characters in the name field. I came up with the following list:
Punctuation: ~`!@#$%^&*()_=+{}|\:;"/?,[]-
Digits: any digit from 0 to 9
Notice that I left out the space ( ), period (.) and single quotation mark (') because we want these 3 characters to pass the verification. In other words, the verification fails if the name field contains any of the punctuation characters or digits above; letters always pass.
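To make the elimination idea concrete before any regex appears, here is a minimal PHP sketch that uses the built-in strpbrk() to look for any character from the illegal list. The function name contains_illegal_char() is hypothetical. The list assumes the underscore, backslash, square brackets and hyphen are illegal too, as they are in the regex below.

```php
<?php
// Hypothetical helper: returns true if $name contains any character from
// the illegal list (punctuation other than space, period and single quote,
// plus the digits 0-9). strpbrk() returns the rest of the string from the
// first illegal character, or false if there is none.
function contains_illegal_char(string $name): bool
{
    $illegal = '~`!@#$%^&*()_=+{}|\\:;"/?,[]-0123456789';
    return strpbrk($name, $illegal) !== false;
}

var_dump(contains_illegal_char("Mrs O'Shea")); // bool(false)
var_dump(contains_illegal_char("James2"));     // bool(true)
```

The `!== false` comparison matters here. If the illegal character is a trailing "0", strpbrk() returns the string "0", and PHP treats that as falsy in a loose comparison.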
The Regular Expression
Are you ready for the hardcore part? The regex pattern I came up with is as follows:
([[:digit:]]|[~`!@#$%^&*\(\)_=\+{}\|\\:;"/?,]|\[|\]|-)+
Scary? No. Let me briefly explain what this pattern means. The expression can be represented by:
(expression1 | expression2 | expression3 | expression4 | expression5)
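What we are trying to do here is to test the name field against expressions 1 to 5. If any one of them matches, the name contains an illegal character. If you look at the regex closely, you will see that expression1 is [[:digit:]], a POSIX character class that matches any single digit from 0 to 9.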
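Expression1 is easy to try on its own. This sketch uses PHP's PCRE function preg_match() without the /u modifier and compares the POSIX class with the explicit range [0-9]. For plain ASCII input like these sample names, the two give the same answer.

```php
<?php
// [[:digit:]] is a POSIX class; without /u it matches the same
// characters as the explicit range [0-9].
$samples = ['Mary', 'Mary2', '007'];
foreach ($samples as $s) {
    $posix = preg_match('/[[:digit:]]/', $s); // 1 if any digit is found
    $range = preg_match('/[0-9]/', $s);
    printf("%-6s [[:digit:]]=%d [0-9]=%d\n", $s, $posix, $range);
}
```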
Expression2 is:
[~`!@#$%^&*\(\)_=\+{}\|\\:;"/?,]
Notice that I added a backslash (\) before each of the 5 characters “()+|\”. By backslashing these characters, I am telling the regex engine to treat them literally rather than as special built-in characters. For example, the brackets “()” normally mean grouping in regex, but if I backslash them, i.e. “\(\)”, it simply means that I want to match “(” and “)”.
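You can check the effect of the backslash directly in PHP. This short sketch uses preg_match() and preg_quote() from PHP's PCRE extension, with sample strings made up for illustration. One nuance: inside a bracket expression such as expression2, PCRE already treats “(”, “)”, “+” and “|” as literal characters, so those backslashes are a harmless safety habit there. Outside brackets they change the meaning, as shown here.

```php
<?php
// Unescaped parentheses form a group: the pattern matches "ab" anywhere.
var_dump(preg_match('/(ab)/', 'ab'));     // int(1)
// Escaped parentheses are literal: "(ab)" is required, so "ab" fails.
var_dump(preg_match('/\(ab\)/', 'ab'));   // int(0)
var_dump(preg_match('/\(ab\)/', '(ab)')); // int(1)
// preg_quote() adds the backslashes to regex metacharacters for you.
echo preg_quote('(a+b)|c'), "\n";         // \(a\+b\)\|c
```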
Expression3 is “\[”, expression4 is “\]” and expression5 is “-”. We left the 3 characters “[]-” out of expression2 to avoid confusion, because “[]” already serve as its outer brackets. We left out “-” because inside brackets it normally denotes a range, as in [A-Z].
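To see how the pieces fit together, this sketch assembles the full pattern from the five alternatives with implode(). It uses PHP's PCRE function preg_match(), so the pattern is wrapped in “/” delimiters and the “/” inside expression2 is escaped. The variable names are illustrative.

```php
<?php
// The five alternatives described above, kept as separate strings.
$alternatives = [
    '[[:digit:]]',                          // expression1: any digit
    '[~`!@#$%^&*\(\)_=\+{}\|\\\\:;"\/?,]',  // expression2: punctuation
    '\[',                                   // expression3: literal "["
    '\]',                                   // expression4: literal "]"
    '-',                                    // expression5: literal "-"
];
// Join them with "|" inside one group; "+" allows a run of them.
$pattern = '/(' . implode('|', $alternatives) . ')+/';

echo $pattern, "\n";
var_dump(preg_match($pattern, 'Mary')); // int(0)
var_dump(preg_match($pattern, 'M@ry')); // int(1)
```

Because “-” appears outside any brackets here, it needs no escaping. Inside brackets, PCRE reads it as a range unless it is placed first or last, which is why keeping it as its own alternative is a safe choice.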
Implementation
To implement it in PHP, we write the code as follows:
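$pattern = '/([[:digit:]]|[~`!@#$%^&*\(\)_=\+{}\|\\\\:;"\/?,]|\[|\]|-)+/';
$name = isset($_POST['name_field']) ? $_POST['name_field'] : '';
// Magic quotes existed only before PHP 5.4; undo them so O\'Shea is checked as O'Shea
if (PHP_VERSION_ID < 50400 && get_magic_quotes_gpc()) {
    $name = stripslashes($name);
}
if (preg_match($pattern, $name)) {
    echo "write your error message";
}
We strip slashes from the name field only when magic quotes are turned on, which is possible only in PHP versions before 5.4. With magic quotes on, a single quotation mark arrives as \' instead of just ', so a name like O'Shea would reach the check as O\'Shea and be rejected because of the backslash. preg_match() then looks for digits and illegal punctuation in the name. (The POSIX function ereg() did the same job in old PHP versions, but it was deprecated in PHP 5.3 and removed in PHP 7.0.) If a match is found, we can alert the user to the error.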
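Wrapping the check in a function makes it reusable across forms. This sketch assumes PHP 7 or later, because it uses the string and bool type declarations. is_valid_name() is a hypothetical name, not part of PHP.

```php
<?php
// Hypothetical helper wrapping the check: returns true when the name
// contains none of the illegal digits or punctuation characters.
function is_valid_name(string $name): bool
{
    $pattern = '/([[:digit:]]|[~`!@#$%^&*\(\)_=\+{}\|\\\\:;"\/?,]|\[|\]|-)+/';
    return preg_match($pattern, $name) === 0; // 0 means no illegal match
}

foreach (['Mary', 'Mr. James Smith', "Mrs O'Shea", 'R2D2', 'a@b.com'] as $n) {
    printf("%-16s %s\n", $n, is_valid_name($n) ? 'ok' : 'rejected');
}
```

The hyphen is on the illegal list, so a double-barrelled name such as “Anne-Marie” is rejected. Drop expression5 if such names should pass.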
Conclusion
Hopefully, this article gives you some insight into regex and saves you some time when verifying name fields. You can modify the regex to enforce stricter rules; for example, you may not want the name field to start with a space or a period. That’s all for now. Cheers.
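As an example of such a stricter rule, this sketch adds a second preg_match() check that rejects a name beginning with a space or a period. is_valid_name_strict() is a hypothetical name, and the sketch assumes PHP 7 or later.

```php
<?php
// Hypothetical stricter check: the illegal-character test plus a rule
// that the name may not start with a space or a period.
function is_valid_name_strict(string $name): bool
{
    $illegal = '/([[:digit:]]|[~`!@#$%^&*\(\)_=\+{}\|\\\\:;"\/?,]|\[|\]|-)+/';
    if (preg_match($illegal, $name) === 1) {
        return false; // contains a digit or illegal punctuation
    }
    // ^[ .] matches a space or a period at the very start of the string
    return preg_match('/^[ .]/', $name) === 0;
}
```

Keeping the new rule as a separate pattern leaves the original expression untouched, so each rule can be tested and changed on its own.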
Bernard Peh is a Web Developer based in Melbourne. He works with experienced web designers and developers every day, designing and developing commercial websites. He specialises mainly in SEO and PHP programming. Visit his blog at Melbourne PHP.