Almost any application needs an easy but general-purpose way to specify conditions that input data must satisfy at runtime. For example:
In a U.S. address, every ZIP code must have five digits, optionally followed by a dash (-) and four more digits.
A U.S. phone number consists of ten digits, usually grouped 3+3+4, with optional parentheses and dashes and an optional initial 1.
A U.S. state abbreviation must be one of the 50 approved abbreviations.
How can you impose conditions such as these on incoming data in an object-oriented way?
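As a first illustration of how one pattern object can carry such a condition, the ZIP code rule above fits in a single regular expression. This is a minimal sketch that uses the standard C++ <regex> library rather than Qt's QRegExp, so it builds without Qt; the function name isUsZipCode is hypothetical. std::regex_match succeeds only when the entire string matches, which corresponds to the QRegExp::exactMatch() calls in Example 14.6.

```cpp
#include <regex>
#include <string>

// Sketch only: standard C++ <regex> (ECMAScript grammar), not QRegExp.
// Hypothetical helper: five digits, optionally followed by a dash and four digits.
bool isUsZipCode(const std::string& s) {
    static const std::regex zip(R"(\d{5}(-\d{4})?)");
    // regex_match requires the whole string to match, like QRegExp::exactMatch().
    return std::regex_match(s, zip);
}
```

With this pattern, "02134" and "02134-1234" are accepted, while "2134" and "02134-12" are rejected.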
Suppose that you want to write a program that recognizes phone number formats and accepts a variety of phone numbers from various countries. You would need to consider the following points.
For any U.S./Canada format numbers, we extract AAA EEE NNNN, where A = area code, E = exchange, and N = number.
These have been standardized so that each of the three pieces has a fixed length.
For phone numbers in other countries,[91] assume that there must be CC MM (or CCC MM) followed by either NN NN NNN or NNN NNNN, where C = country code digit, M = municipal code digit, and N = local number digit.
There might be dashes, spaces, or parentheses delimiting number clusters.
There might be a + or 00 in front of the country code.
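The U.S./Canada case can be sketched as one pattern with three capturing groups, one each for AAA, EEE, and NNNN. This is an illustration under stated assumptions, not the program's own code: it uses std::regex and C++17 std::optional, and the names UsPhone and parseUsPhone are hypothetical. Like the usformat pattern in Example 14.6, it tolerates an optional +1 or 1 prefix, optional parentheses around the area code, and an optional dash or space between groups.

```cpp
#include <optional>
#include <regex>
#include <string>

// Hypothetical result type for the AAA EEE NNNN split.
struct UsPhone {
    std::string area;      // AAA: area code
    std::string exchange;  // EEE: exchange
    std::string number;    // NNNN: line number
};

// Sketch with std::regex (not QRegExp). The leading group is non-capturing,
// so groups 1..3 are the area code, exchange, and number.
// Like usformat, it does not check that parentheses are balanced.
std::optional<UsPhone> parseUsPhone(const std::string& input) {
    static const std::regex us(
        R"re((?:\+?1[ -]?)?\(?(\d{3})\)?[ -]?(\d{3})[ -]?(\d{4}))re");
    std::smatch m;
    if (!std::regex_match(input, m, us))
        return std::nullopt;  // not a 10-digit U.S./Canada number
    return UsPhone{m[1].str(), m[2].str(), m[3].str()};
}
```

For example, both "+1 (617) 573-8000" and "16175738000" yield area 617, exchange 573, and number 8000, while a bare seven-digit "573-8000" is rejected.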
Imagine how you would write this program using the standard tools available to you in C++. It would be necessary to write lengthy parsing routines for each possible format. Example 14.5 shows the desired output of such a program.
Example 14.5. src/regexp/testphone.txt
src/regexp> ./testphone
Enter a phone number (or q to quit): 16175738000
 validated: (US/Canada) +1 617-573-8000
Enter a phone number (or q to quit): 680111111111
 validated: (Palau) + 680 (0)11-11-11-111
Enter a phone number (or q to quit): 777888888888
 validated: (Unknown - but possibly valid) + 777 (0)88-88-88-888
Enter a phone number (or q to quit): 86333333333
 validated: (China) + 86 (0)33-33-33-333
Enter a phone number (or q to quit): 962444444444
 validated: (Jordan) + 962 (0)44-44-44-444
Enter a phone number (or q to quit): 56777777777
 validated: (Chile) + 56 (0)77-77-77-777
Enter a phone number (or q to quit): 351666666666
 validated: (Portugal) + 351 (0)66-66-66-666
Enter a phone number (or q to quit): 31888888888
 validated: (Netherlands) + 31 (0)88-88-88-888
Enter a phone number (or q to quit): 20398478
Unknown format
Enter a phone number (or q to quit): 2828282828282
Unknown format
Enter a phone number (or q to quit): q
src/regexp>
Example 14.6 is a procedural C-style solution that shows how to use QRegExp to handle this problem.
Example 14.6. src/regexp/testphoneread.cpp
[ . . . . ]

// Whitespace, plus signs, parentheses, and dashes.
QRegExp filtercharacters ("[\\s\\+\\(\\)\\-]");

QRegExp usformat
("(\\+?1[- ]?)?\\(?(\\d{3})\\)?[\\s-]?(\\d{3})[\\s-]?(\\d{4})");

QRegExp genformat
("(00)?([3-9]\\d{1,2})(\\d{2})(\\d{7})$");

QRegExp genformat2 ("(\\d\\d)(\\d\\d)(\\d{3})");

QString countryName(QString ccode) {
    if(ccode == "31") return "Netherlands";
    else if(ccode == "351") return "Portugal";
[ . . . . ]
    //Add more codes as needed ...
    else return "Unknown - but possibly valid";
}

QString stdinReadPhone() {
    QString str;
    bool knownFormat=false;
    do {
        cout << "Enter a phone number (or q to quit): ";
        cout.flush();
        str = cin.readLine();
        if (str=="q")
            return str;
        str.remove(filtercharacters);
        if (genformat.exactMatch(str)) {
            QString country = genformat.cap(2);
            QString citycode = genformat.cap(3);
            QString rest = genformat.cap(4);
            if (genformat2.exactMatch(rest)) {
                knownFormat = true;
                QString number = QString("%1-%2-%3")
                                 .arg(genformat2.cap(1))
                                 .arg(genformat2.cap(2))
                                 .arg(genformat2.cap(3));
                str = QString("(%1) + %2 (0)%3-%4").arg(countryName(country))
                      .arg(country).arg(citycode).arg(number);
            }
        }
[ . . . . ]
        if (not knownFormat) {
            cout << "Unknown format" << endl;
        }
    } while (not knownFormat) ;
    return str;
}

int main() {
    QString str;
    do {
        str = stdinReadPhone();
        if (str != "q")
            cout << " validated: " << str << endl;
    } while (str != "q");
    return 0;
}
[ . . . . ]
filtercharacters: Remove these characters from the string that the user supplies.

usformat: All U.S.-format numbers have country code 1 and 3 + 3 + 4 = 10 digits. Whitespace, dashes, and parentheses between these digit groups are ignored, but they help make the digit groups recognizable.

genformat: Landline country codes in Europe begin with 3 or 4, in Latin America with 5, in Southeast Asia and Oceania with 6, in East Asia with 8, and in Central, South, and Western Asia with 9. Country codes may be 2 or 3 digits long. Excluding any 00 prefix, numbers in this format typically have 2 (or 3) + 2 + 7 = 11 (or 12) digits: country code, city code, and local number. This program does not attempt to interpret city codes.

genformat2: The last 7 digits are arranged as 2 + 2 + 3.

stdinReadPhone(): Ensures that the user-entered phone string complies with a regular expression and extracts the proper components from it. Returns a properly formatted phone string.

do ... while (not knownFormat): Keep asking until you get a valid number.

str.remove(filtercharacters): Remove all dashes, spaces, parentheses, and so on.
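For readers without Qt at hand, the general-format branch of stdinReadPhone() can be approximated with the standard library. This is a sketch, not a drop-in replacement: std::regex_replace and std::regex_match stand in for QString::remove() and QRegExp::exactMatch(), and the name formatGeneral and its deliberately tiny country table are hypothetical, with codes taken only from the sample run in Example 14.5.

```cpp
#include <map>
#include <optional>
#include <regex>
#include <string>

// Hypothetical country table; entries come from the sample run only.
std::string countryName(const std::string& cc) {
    static const std::map<std::string, std::string> names = {
        {"31", "Netherlands"}, {"351", "Portugal"}, {"56", "Chile"},
        {"680", "Palau"},      {"86", "China"},     {"962", "Jordan"}};
    auto it = names.find(cc);
    return it != names.end() ? it->second : "Unknown - but possibly valid";
}

// Sketch of the genformat/genformat2 logic using std::regex instead of QRegExp.
std::optional<std::string> formatGeneral(std::string s) {
    // Same character set as filtercharacters: whitespace, +, (, ), and -.
    static const std::regex filter(R"([\s+()-])");
    // Optional 00 prefix, 2- or 3-digit country code starting with 3-9,
    // 2-digit city code, then the remaining 7 digits.
    static const std::regex gen(R"((00)?([3-9]\d{1,2})(\d{2})(\d{7}))");
    // Split the last 7 digits as 2 + 2 + 3.
    static const std::regex gen2(R"((\d\d)(\d\d)(\d{3}))");

    s = std::regex_replace(s, filter, "");  // strip delimiters first
    std::smatch m, r;
    if (!std::regex_match(s, m, gen))
        return std::nullopt;  // corresponds to "Unknown format"
    const std::string cc = m[2].str(), city = m[3].str(), rest = m[4].str();
    if (!std::regex_match(rest, r, gen2))
        return std::nullopt;
    return "(" + countryName(cc) + ") + " + cc + " (0)" + city + "-" +
           r[1].str() + "-" + r[2].str() + "-" + r[3].str();
}
```

With this table, formatGeneral("680111111111") yields (Palau) + 680 (0)11-11-11-111, matching the sample run, while formatGeneral("20398478") yields no value. Because backtracking tries a 3-digit country code first and falls back to 2 digits, an 11-digit input such as 86333333333 resolves to country code 86.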
In a stream-based program like this, the QRegExp examines the user's complete response only after the user has typed it and pressed the [Enter] key. There is no way to prevent the user from entering inappropriate characters into the input stream.
[91] The phone number situation in Europe is quite complex, and specialists have been working for years to develop a system that would work for, and be acceptable to, all EU members. You can get an idea of what is involved by visiting this Wikipedia page.
Generated: 2012-03-02 | © 2012 Alan Ezust and Paul Ezust.