This sample tests the word parser, and could be used to test any parsers that you create. Parsers are useful for finding lists of words for keyword indices.
#include "oofwords.h" // contains parser
#include "oofile.h" // just to get EXIT_SUCCESS on all platforms
int main()
{
cout << "OOFILE Validation Suite - Test 27\n"
<< "This tests the word parser used in generating keyword indices.\n\n";
Here we set up a test string to start off with.
const char testStr[] = "This is a test Cabbage";
This set of strings is the stop list: the words we want the parser to ignore when creating our list of words. This test uses the word cabbage because it is uncommon, can easily be inserted for test purposes and stands out so we can find it ourselves!
const char * stopList[] = {"cabbage"};
Here we declare our word parser and pass it the string to parse and the stop list. The 3 is the minimum word length we have asked for: any words shorter than this will be ignored. Declaring the parser in this way also causes it to run. We end up with an array of strings.
Testing: parser with stoplist
oofWordParser wp(testStr,3,stopList,1);
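To make the two filters concrete, here is a minimal standalone sketch of the same idea in standard C++. It is not the OOFILE implementation: the names parseWords and toLower are hypothetical, and the sketch assumes that words are runs of letters and digits and that stop words are matched without regard to case.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: lower-case copy of a word, used for case-insensitive matching.
std::string toLower(std::string s)
{
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return s;
}

// Hypothetical stand-in for the filtering a word parser performs.
// Assumption: a word is a run of alphanumeric characters.
std::vector<std::string> parseWords(const std::string& text,
                                    std::size_t minLength,
                                    const std::vector<std::string>& stopList)
{
    std::vector<std::string> words;
    std::string current;

    // Decide whether the word collected so far is kept, then reset the buffer.
    auto flush = [&]() {
        if (current.size() >= minLength) {
            const std::string lowered = toLower(current);
            const bool stopped = std::any_of(
                stopList.begin(), stopList.end(),
                [&](const std::string& s) { return toLower(s) == lowered; });
            if (!stopped)
                words.push_back(current);
        }
        current.clear();
    };

    for (unsigned char c : text) {
        if (std::isalnum(c))
            current += static_cast<char>(c);
        else
            flush();
    }
    flush(); // handle a final word that is not followed by a separator
    return words;
}
```

Under those assumptions, calling parseWords with the test string above, a minimum length of 3 and a stop list holding "cabbage" leaves just "This" and "test".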
Now we print the results.
cout << "The following parses '" << testStr << "' skipping:\n"
<< " 'is' and 'a' because they are less than the minimum word length of 3\n"
<< " 'cabbage' because it is on the list of 'stop words' which are ignored\n\n";
The function start() takes us to the first string in the list. The function more() tests whether there are any more in the list. The function next() takes us to the next one. Using these, we can step through the array one string at a time until we have visited them all (more() returns false).
for (wp.start();wp.more();wp.next()) {
cout << wp.word() << endl;
}
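The loop above follows a simple cursor protocol. As an illustration only, a class offering the same four operations over a std::vector could look like the sketch below. WordCursor and collect are hypothetical names, and nothing here is taken from the OOFILE source.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical cursor exposing a start()/more()/next()/word() protocol.
class WordCursor {
public:
    explicit WordCursor(std::vector<std::string> words)
        : mWords(std::move(words)), mIndex(0) {}

    void start() { mIndex = 0; }                          // rewind to the first word
    bool more() const { return mIndex < mWords.size(); }  // is there a current word?
    void next() { ++mIndex; }                             // advance to the following word
    const std::string& word() const { return mWords[mIndex]; } // valid only while more()

private:
    std::vector<std::string> mWords;
    std::size_t mIndex;
};

// Walk the cursor from the beginning and copy out every word.
std::vector<std::string> collect(WordCursor& cursor)
{
    std::vector<std::string> out;
    for (cursor.start(); cursor.more(); cursor.next())
        out.push_back(cursor.word());
    return out;
}
```

Because start() rewinds the index, the same cursor can be walked more than once; in this sketch word() must only be called while more() is true.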
Here we assign a slightly longer string, containing the word in our stoplist and a similar word.
Note: this is a very simple parser and will not recognise 'cabbages' as a form of 'cabbage' and remove it. To do that, we would need a parser that does stemming: one that trims 'cabbages' to 'cabbage' and then recognises it as being on the stoplist.
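As a rough idea of what stemming before the stop-list check involves, the sketch below strips a single trailing 's' and then compares stems. This is a deliberately naive assumption made for illustration, not a real stemming algorithm and not part of OOFILE; naiveStem and isStopWord are hypothetical names.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, deliberately naive stemmer: lower-cases the word and strips
// one trailing 's' from words longer than three letters that do not end in "ss".
std::string naiveStem(std::string word)
{
    std::transform(word.begin(), word.end(), word.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    const std::size_t n = word.size();
    if (n > 3 && word[n - 1] == 's' && word[n - 2] != 's')
        word.erase(n - 1);
    return word;
}

// A word is stopped when its stem equals the stem of any stop-list entry.
bool isStopWord(const std::string& word, const std::vector<std::string>& stopList)
{
    const std::string stem = naiveStem(word);
    return std::any_of(stopList.begin(), stopList.end(),
                       [&](const std::string& s) { return naiveStem(s) == stem; });
}
```

With a stop list holding "cabbage", this sketch treats both "cabbage" and "cabbages" as stop words while leaving "Kings" alone. A real stemmer handles far more suffixes and irregular forms than this.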
char testStr2[] = "Even longer and sillier string about cabbage and cabbages and Kings";
Now we parse.
cout << "\n\n\nNow reparse with a different string, retaining the stoplist & min length\n"
<< "'" << testStr2 << "'\n"
<< "Note: a fancier word parser that does 'stemming' would also remove 'cabbages'\n"
<< "by trimming it down to 'cabbage' and then dropping it because 'cabbage' is on\n"
<< "the stop words list.\n\n";
wp.start(testStr2);
Then we show how to directly output the whole list of words (quicker than our previous iteration method).
cout << wp << endl;
cout << endl << "Test Completed" << endl;
return EXIT_SUCCESS;
}
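Writing the parser straight to cout relies on a stream insertion operator. How OOFILE formats that output is not shown in this sample, so the sketch below simply assumes one word per line. WordList and toText are hypothetical names used only for illustration.

```cpp
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical container standing in for a parser's list of words.
struct WordList {
    std::vector<std::string> words;
};

// Stream every word, separated by newlines (an assumed format).
std::ostream& operator<<(std::ostream& os, const WordList& list)
{
    for (std::size_t i = 0; i < list.words.size(); ++i) {
        if (i > 0)
            os << '\n';
        os << list.words[i];
    }
    return os;
}

// Convenience wrapper: render the list into a string via the operator above.
std::string toText(const WordList& list)
{
    std::ostringstream buffer;
    buffer << list;
    return buffer.str();
}
```

Any std::ostream can be the target, so the same operator serves cout, a file stream or a string buffer.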
(c) Copyright A.D. Software 1994-2000 (All Rights Reserved).
Last Updated: 9th September 2001