14.3.3. Exercises: Regular Expressions

  1. Many operating systems come with a reasonably good word list that various programs use to check spelling. On *nix systems the word list is generally named (at least indirectly) "words". You can locate the file on your *nix system by typing the command

     locate words | grep dict 

    Piping the output of locate through grep reduces the output to those lines that contain the string "dict".

    After you locate your system word list file, write a program that will read lines from the file and, using a suitable regex (or, if that is not possible, another device), display all the words that:

    1. Begin with a pair of repeated letters.

    2. End in "gory".

    3. Have more than one pair of repeated letters.

    4. Are palindromes.

    5. Consist of letters arranged in strictly increasing alphabetic order (e.g., knot).

    If you cannot find a suitable word list on your system, you can use the file handouts/canadian-english-small.gz from the source package in our dist directory. After you download it, uncompress it with the command: gunzip canadian-english-small.gz
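
    As a starting point, the five tests could be sketched as the predicates below. This is only one possible approach, under several assumptions: it uses std::regex from the standard library as a stand-in for the Qt regex classes, all function names are hypothetical, and "a pair of repeated letters" is read as the same letter twice in a row. The first three tests fit a regex with backreferences. Palindromes of arbitrary length cannot be described by a regular expression, and the ordering test is awkward as one, so both use a plain algorithm (the "another device" mentioned above).

```cpp
#include <algorithm>
#include <cctype>
#include <functional>
#include <istream>
#include <ostream>
#include <regex>
#include <string>

// Lowercase copy, so that capitalized entries such as "Aaron" are treated
// the same as lowercase ones.
std::string lowered(std::string w) {
    std::transform(w.begin(), w.end(), w.begin(), [](unsigned char c) {
        return static_cast<char>(std::tolower(c));
    });
    return w;
}

// 1. Begins with a doubled letter: \1 must repeat what group 1 captured.
bool beginsWithDoubledLetter(const std::string& w) {
    static const std::regex re(R"(^([a-z])\1)");
    return std::regex_search(lowered(w), re);
}

// 2. Ends in "gory".
bool endsInGory(const std::string& w) {
    static const std::regex re(R"(gory$)");
    return std::regex_search(lowered(w), re);
}

// 3. Contains at least two separate doubled letters (assumed reading).
bool hasTwoDoubledLetters(const std::string& w) {
    static const std::regex re(R"(([a-z])\1.*([a-z])\2)");
    return std::regex_search(lowered(w), re);
}

// 4. Palindrome: compare the first half with the reversed second half.
bool isPalindrome(const std::string& w) {
    const std::string s = lowered(w);
    return std::equal(s.begin(), s.begin() + s.size() / 2, s.rbegin());
}

// 5. Letters only, each strictly greater than the one before it.
bool isStrictlyIncreasing(const std::string& w) {
    const std::string s = lowered(w);
    if (s.empty()) return false;
    const bool lettersOnly = std::all_of(s.begin(), s.end(),
        [](unsigned char c) { return std::isalpha(c) != 0; });
    return lettersOnly &&
           std::adjacent_find(s.begin(), s.end(),
                              std::greater_equal<char>()) == s.end();
}

// Print every line of `in` (one word per line) that satisfies `pred`.
void printMatching(std::istream& in, std::ostream& out,
                   const std::function<bool(const std::string&)>& pred) {
    std::string word;
    while (std::getline(in, word))
        if (pred(word)) out << word << '\n';
}
```

    To try it, open the word list with a std::ifstream and call, for example, printMatching(file, std::cout, isPalindrome). Note that single-letter words pass tests 4 and 5 trivially; you may want to filter them out.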

  2. Write a program that extracts the hyperlinks from HTML files using regular expressions. A hyperlink looks like this:

     <a href="http://www.web.www/location/page.html">The Label</a>

    For each hyperlink encountered in the input file, print just the URL and label, separated by a tab.

    Keep in mind that optional whitespace can be found in different parts of the above example pattern. Test your program on a variety of different web pages that contain hyperlinks and verify that your program does indeed catch all the links.
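
    One way to begin is sketched below. It is a rough draft rather than a complete solution: it uses std::regex instead of a Qt regex class, the function name extractLinks is hypothetical, and the pattern assumes that the href value is enclosed in double quotes. It tolerates extra whitespace and extra attributes, and matches tag names without regard to case, but it will still miss single-quoted or unquoted URLs.

```cpp
#include <regex>
#include <string>
#include <utility>
#include <vector>

// Returns (URL, label) pairs for each <a href="...">...</a> found in html.
// Assumption: the href value is double-quoted.
std::vector<std::pair<std::string, std::string>>
extractLinks(const std::string& html) {
    // Group 1: the URL. Group 2: the label, with surrounding whitespace
    // trimmed. [\s\S] is used instead of '.' so labels may span lines.
    static const std::regex re(
        R"re(<a\s+[^>]*?href\s*=\s*"([^"]*)"[^>]*>\s*([\s\S]*?)\s*</a\s*>)re",
        std::regex::icase);

    std::vector<std::pair<std::string, std::string>> links;
    for (std::sregex_iterator it(html.begin(), html.end(), re), end;
         it != end; ++it) {
        links.emplace_back((*it)[1].str(), (*it)[2].str());
    }
    return links;
}
```

    Read the whole file into one string before calling the function, then print each pair as first, a tab character, and second. Labels that contain nested tags (an image, for example) are returned with the markup still in them.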

  3. You have just changed companies and you want to reuse some open source code that you wrote for the previous company. Now you need to rename all the data members in your source code. The previous company wanted data members named like this: mVarName, but your new company wants them named like this: m_varName. Use QDirIterator to visit each source file, and perform a substitution on the text of each visited file so that all data members conform to the new company's coding standards.
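
    The substitution step alone might look like the sketch below. It makes some assumptions: std::regex stands in for a Qt regex class, the function name renameMembers is hypothetical, and every identifier that starts with a lowercase m followed by an uppercase letter is taken to be a data member. The directory traversal with QDirIterator and the file I/O are left out. Because the first letter after the m must be lowercased, a plain replacement string is not enough, so the result is assembled match by match.

```cpp
#include <cctype>
#include <regex>
#include <string>

// Rewrites identifiers of the form mVarName as m_varName.
// Assumption: any word starting with 'm' + uppercase letter is a member.
std::string renameMembers(const std::string& text) {
    // \b anchors at a word boundary, so "member" or "sumTotal" are skipped.
    static const std::regex re(R"(\bm([A-Z])(\w*))");

    std::string out;
    auto pos = text.cbegin();
    for (std::sregex_iterator it(text.cbegin(), text.cend(), re), end;
         it != end; ++it) {
        const std::smatch& m = *it;
        out.append(pos, m[0].first);            // text before the match
        out += "m_";
        out += static_cast<char>(               // lowercased first letter
            std::tolower(static_cast<unsigned char>(*m[1].first)));
        out.append(m[2].first, m[2].second);    // rest of the name
        pos = m[0].second;
    }
    out.append(pos, text.cend());               // text after the last match
    return out;
}
```

    Names that already follow the new convention are left alone, since an underscore after the m does not match [A-Z]. The pattern cannot tell a data member from any other identifier with the same shape, and it also rewrites matches inside comments and string literals, so review the changes before committing them.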
