
OOFTST 28 - keyword indexing

This sample tests the use of keyword indexing to retrieve by words within fields. See also ooftst27 for a simple test of the word parser. 


#include "oofile.h"	// the general oofile library
#include "ooftst01.h"	// the declarations for the database classes we will be using in this test.  
int main()
{
	cout << "OOFILE Validation Suite - Test 28\n"
		 << "Use keyword indexing to retrieve by words within fields." << endl;  

First we declare the variables for our database and table. Note that we use a typical ctree-style connection.

	dbConnect_ctree    theDB;
	dbPeople     People;  

We set a few options for our database. We force exclusive access, even though the file-sharing library (FPUTFGET option) is in use, so that we can rebuild the indices.

	theDB.useSeparateFiles();	
	theDB.useExclusiveAccess();	
							  

This test creates People.dat, People.idx & Blobs in the same directory. Unfortunately, this sample is only set up for Macintosh pathnames, which it assigns to the filename macros.

	#define kExistsName ":test28:People.dat"
	#define kDatabaseName ":test28"  

We must attach a keyword index to a couple of fields BEFORE the openConnection or newConnection because it is part of the field declaration. This is done simply by calling the function indexWords() on the appropriate fields. Please note that we are using the default word parser here. You could also pass your own specialised parser in as a parameter, e.g.:
                      People.LastName.indexWords(new MyFancyParser);
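
To make the idea of a word parser concrete, here is a minimal standalone sketch. It is not OOFILE's parser and does not use the OOFILE parser interface: the function name parseWords and its rules (a word is a run of letters or digits, folded to lower case) are assumptions made purely for illustration.

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <vector>

// Hypothetical stand-in for a word parser: splits text into words.
// Assumed rules (not taken from OOFILE): a word is a run of letters
// or digits, and every word is folded to lower case.
std::vector<std::string> parseWords(const std::string& text)
{
	std::vector<std::string> words;
	std::string current;
	for (std::size_t i = 0; i < text.size(); ++i) {
		const unsigned char ch = static_cast<unsigned char>(text[i]);
		if (std::isalnum(ch)) {
			current += static_cast<char>(std::tolower(ch));
		}
		else if (!current.empty()) {
			words.push_back(current);	// a separator ends the current word
			current.clear();
		}
	}
	if (!current.empty())
		words.push_back(current);	// flush the final word
	return words;
}
```

Under these assumed rules, the text "Andy's not a bad bloke." yields the words andy, s, not, a, bad and bloke. A specialised parser could apply different rules, such as discarding one-letter fragments.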

Table-level Index

The indices we declare below can be used to search either of these specific fields, but can also be used to search across all the fields in the table. This means we can simply say that we wish to search for the words contained in an array, a delimited list, etc., and the search will cover every field in which they could possibly be found (useful for a database that isn't well cross-referenced!).

		People.LastName.indexWords();
		People.Description.indexWords();  
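
The difference between a field-level and a table-level lookup can be pictured with a small standalone model. This is only a sketch of the idea, assuming whitespace-separated words and an in-memory map; the class KeywordIndex and its member names are hypothetical and say nothing about how OOFILE actually stores its indices.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <sstream>
#include <string>

// Hypothetical keyword index: word -> field name -> record numbers.
class KeywordIndex {
	typedef std::map<std::string, std::set<int> > FieldMap;
	typedef std::map<std::string, FieldMap> Index;
	Index mIndex;

public:
	// Index every whitespace-separated word of one field of one record.
	void addField(int recordNo, const std::string& field, const std::string& text)
	{
		std::istringstream in(text);
		std::string word;
		while (in >> word)
			mIndex[word][field].insert(recordNo);
	}

	// Field-level lookup: records with the word in one specific field.
	std::set<int> inField(const std::string& field, const std::string& word) const
	{
		std::set<int> found;
		Index::const_iterator w = mIndex.find(word);
		if (w != mIndex.end()) {
			FieldMap::const_iterator f = w->second.find(field);
			if (f != w->second.end())
				found = f->second;
		}
		return found;
	}

	// Table-level lookup: records with the word in any indexed field.
	std::set<int> inAnyField(const std::string& word) const
	{
		std::set<int> found;
		Index::const_iterator w = mIndex.find(word);
		if (w != mIndex.end()) {
			for (FieldMap::const_iterator f = w->second.begin(); f != w->second.end(); ++f)
				found.insert(f->second.begin(), f->second.end());
		}
		return found;
	}
};
```

In this model the table-level lookup is simply the union of the field-level results for the same word, which is why it can find a word without the caller knowing which field holds it.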

Now we create the database or, if it already exists, open it and delete the previous test data.

	if (dbConnect::fileExists(kExistsName)) {
		theDB.openConnection(kDatabaseName);
		cout << "Deleting previous test data\n";
		People.deleteAll();
	}
	else {
		theDB.newConnection(kDatabaseName);
	}  

We then suppress the indices to add test data (later we will rebuild the indices).

	People.suppressIndices();	
	cout << "Adding test data\n\n";
	People.AddTestData();  

Now we set the initial sort order by LastName. Then we print the database so we can see its initial state. Note that we dump all the fields of People this time; from now on we will use dbViews to restrict the fields listed.

	People.setSortOrder(People.LastName);
	cout << "Listing entire database\n" << theDB << endl << endl;

Here we declare our dbView, restricting our view of the database to just the name fields of the records. searchSelection doesn't work unless we specify false (meaning not a shared selection), so we pass false as the second parameter.

	dbView justNames(People, false);	
	justNames << People.LastName << People.OtherNames;

We then run a search on the database, looking for records that contain the word "bloke" in the Description field. This is easily accomplished with the function hasWord().

Testing: function hasWord() on non-indexed keyword file

	People.search(People.Description.hasWord("bloke"));
	cout << "People with 'bloke' in their Description (keyword file non-indexed):\n"
		 << justNames << endl;  

Then we try the function hasWordStartsWith(). Passing a partial word to this will select all records that contain any word that starts with that partial word.

Testing: function hasWordStartsWith() on non-indexed keyword file

	People.search(People.Description.hasWordStartsWith("blo"));
	cout << "People with word starting with 'blo' in their Description (John Smith):\n"
		 << justNames << endl;  
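
The distinction between a whole-word match and a starts-with match can be sketched with a sorted set of words. This is a standalone illustration, not OOFILE code: the free functions containsWord and containsWordStartingWith are hypothetical names, and a std::set merely stands in for whatever structure the keyword index really uses.

```cpp
#include <cassert>
#include <set>
#include <string>

// Whole-word match: the word itself must be present.
bool containsWord(const std::set<std::string>& words, const std::string& word)
{
	return words.find(word) != words.end();
}

// Starts-with match: because the set is sorted, all words sharing a
// prefix sit together, beginning at the first entry not less than the
// prefix, so that entry is the only candidate we need to inspect.
bool containsWordStartingWith(const std::set<std::string>& words, const std::string& prefix)
{
	std::set<std::string>::const_iterator it = words.lower_bound(prefix);
	return it != words.end() && it->compare(0, prefix.size(), prefix) == 0;
}
```

With the words of a description such as "a good bloke", containsWord finds "bloke" but not "blo", while containsWordStartingWith accepts "blo".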

Now we rebuild the indices so we can test the database again and see the differences, if any.

	cout << "Now rebuilding the database indices, to see if above faster\n";
	People.rebuild();  

Here we retest the hasWord() function.

Testing: function hasWord() on indexed keyword file

	People.search(People.Description.hasWord("bloke"));
	cout << "People with 'bloke' in their Description (John Smith):\n"
		 << justNames << endl;  

Here we search for the full word "blo", which should not return anything.

	People.search(People.Description.hasWord("blo"));
	cout << "People with 'blo' in their Description (nobody):\n"
		 << justNames << endl;  

Now we test for the partial word "blo", again using hasWordStartsWith().

Testing: function hasWordStartsWith() on indexed keyword file

	People.search(People.Description.hasWordStartsWith("blo"));
	cout << "People with word starting with 'blo' in their Description (John Smith):\n"
		 << justNames << endl;  

We again test for a keyword, this time the word "Andy".

	People.search(People.Description.hasWord("Andy"));
	cout << "People with 'Andy' in their Description (Andy & Trissa Dent):\n"
		 << justNames << endl;  

Now we refine the above selection, keeping only the records that also contain a second specified word.

Testing: function searchSelection on indexed keyword file

	People.searchSelection(People.Description.hasWord("married"));
	cout << "People with 'Andy' in their Description and also (via searchSelection) 'married' (Trissa Dent):\n"
		 << justNames << endl;  
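
One way to picture this refinement is as a set intersection: the new selection holds only the records that were already selected and that also match the second search. The sketch below is a standalone model under that assumption; refineSelection is a hypothetical helper working on plain record numbers, not a description of how searchSelection is implemented.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <set>

// Narrow an existing selection to the records that also match a second
// search: the result is the intersection of the two sets of record numbers.
std::set<int> refineSelection(const std::set<int>& current, const std::set<int>& matches)
{
	std::set<int> refined;
	std::set_intersection(current.begin(), current.end(),
		matches.begin(), matches.end(),
		std::inserter(refined, refined.begin()));
	return refined;
}
```

If the first search selected records 2 and 3 and the second word occurs in records 3 and 5, only record 3 remains selected.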

Now we're going to add some data to one of the records and then reindex. The current selection is the one printed just above, so we simply move to its first record and append the string given below.

We add the information by using the operator "+=". This appends the given string to the specified field.

	People.start();
	People.Description += " BTW Andy's not a bad bloke either.";

We use saveRecord() to reindex: it reorganises the index as it saves.

Testing: reindexing, after addition of new data, by using saveRecord()

	People.saveRecord();

We now try the search for the word "bloke" once more. Note that this word was added to the record above.

	People.search(People.Description.hasWord("bloke"));
	cout << "People with 'bloke' in their Description (John Smith & Trissa Dent):\n"
		 << justNames << endl;  

Next we search for records whose field contains any of several words that we supply in a delimited list. The function we use is hasAnyWordsDelimited(). We pass in the delimited list and then the delimiter we have used, so we can use any delimiter we like. The same delimiter must be used consistently within the whole list, but we can use a different delimiter the next time we call the function (though for continuity and readability we would tend to use the same one throughout).

Testing: function hasAnyWordsDelimited()

	People.search(People.Description.hasAnyWordsDelimited("bloke/software", '/'));
	cout << "People with 'bloke' or 'software' in their Description (John Smith, Andy & Trissa Dent):\n"
		 << justNames << endl;  

Now we retest for the same words, but this time the field must contain all of the words, not just any one of them (as above). This requires the function hasAllWordsDelimited(), which also takes the delimited list and the delimiter we use.

Testing: function hasAllWordsDelimited()

	People.search(People.Description.hasAllWordsDelimited("bloke/software", '/'));
	cout << "People with 'bloke' and 'software' in their Description (nobody):\n"
		 << justNames << endl;  
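
The "any" and "all" tests differ only in how the individual word matches are combined. The following standalone sketch shows that logic on a plain set of words; splitWords, hasAnyWords and hasAllWords are hypothetical helper names chosen for illustration and are not part of OOFILE.

```cpp
#include <cassert>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Split a delimited list such as "bloke/software" into its words,
// skipping empty entries.
std::vector<std::string> splitWords(const std::string& list, char delimiter)
{
	std::vector<std::string> words;
	std::istringstream in(list);
	std::string word;
	while (std::getline(in, word, delimiter))
		if (!word.empty())
			words.push_back(word);
	return words;
}

// True if at least one listed word is among the field's words.
bool hasAnyWords(const std::set<std::string>& fieldWords, const std::string& list, char delimiter)
{
	const std::vector<std::string> words = splitWords(list, delimiter);
	for (std::size_t i = 0; i < words.size(); ++i)
		if (fieldWords.count(words[i]) != 0)
			return true;
	return false;
}

// True only if every listed word is among the field's words
// (an empty list is trivially satisfied in this sketch).
bool hasAllWords(const std::set<std::string>& fieldWords, const std::string& list, char delimiter)
{
	const std::vector<std::string> words = splitWords(list, delimiter);
	for (std::size_t i = 0; i < words.size(); ++i)
		if (fieldWords.count(words[i]) == 0)
			return false;
	return true;
}
```

A field containing "bloke" but not "software" passes the "any" test for "bloke/software" and fails the "all" test, which mirrors the pattern of results in the two searches above.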

The next function we test is very similar to the above, but instead of whole words we pass in a number of partial words, and the field must contain a word that starts with each one of these partial strings. Once again, we must pass in a delimited string of words and also the delimiter we are using. This function is useful when we know how a word begins but aren't sure of its exact form. It means we can pass in, for example, "Comput" and not have to worry whether the actual word is "Computer", "Computing", "Computational", etc.

Testing: function hasAllWordsStartsWithDelimited()

	People.search(People.Description.hasAllWordsStartsWithDelimited("blo/so", '/'));
	cout << "People with 'blo' and 'so' in their Description (John Smith):\n"
		 << justNames << endl;  

This next function is similar to hasAnyWordsDelimited(), but instead of passing a delimited list, we pass an array. Below, we first initialise our array of 2 strings, then pass this array to the required function, hasAnyWordsOf(). Note that we must pass both the array and the number of members in that array.

Testing: function hasAnyWordsOf()

	const char* schWords[2] = {"giraffe", "boots"};
	People.search(People.Description.hasAnyWordsOf(schWords, 2));
	cout << "People with 'giraffe' or 'boots' in their Description (John Smith):\n"
		 << justNames << endl;  

The next is the same, except that all words from the given array must be in that field for it to be selected.

Testing: function hasAllWordsOf()

	People.search(People.Description.hasAllWordsOf(schWords, 2));
	cout << "People with 'giraffe' and 'boots' in their Description (nobody):\n"
		 << justNames << endl;  

Now we make use of the fact that these keyword indices also work on the full table (see the Table-level Index explanation above).

	cout << "\n\n******** Now search table, regardless of field *********\n\n";  

As you can see, we can specify a search directly on the table; it doesn't have to be attached to any specific field. A record is added to the selection if any of its fields contains the words we are searching for.

Just as we did previously, we test the function hasWord(), but for the whole record. This means that Trissa Dent will also be in the selection.

Testing: function hasWord() using table-level index.

	People.search(People.hasWord("bloke"));
	cout << "People with 'bloke' (John Smith & Trissa Dent):\n"
		 << justNames << endl;  

We again try the function hasAnyWordsDelimited(), this time using the table-level index. Note that we won't find any records here that weren't found in the first test of this function, as the only field containing these words is the one searched previously.

Testing: hasAnyWordsDelimited() using table-level index

	People.search(People.hasAnyWordsDelimited("bloke/software", '/'));
	cout << "People with 'bloke' or 'software' (John Smith, Andy & Trissa Dent):\n"
		 << justNames << endl;  

Now we retest the function hasAllWordsDelimited() on our table-level index.

Testing: function hasAllWordsDelimited() using table-level index

	People.search(People.hasAllWordsDelimited("bloke/software", '/'));
	cout << "People with 'bloke' and 'software' (nobody):\n"
		 << justNames << endl;  

We now retest the function hasAllWordsStartsWithDelimited().

Testing: function hasAllWordsStartsWithDelimited() using table-level index

	People.search(People.hasAllWordsStartsWithDelimited("blo/so", '/'));
	cout << "People with 'blo' and 'so' (John Smith):\n"
		 << justNames << endl;  

We use our previously declared array of strings to retest the function hasAnyWordsOf(), still using our table-level index.

Testing: function hasAnyWordsOf() using table-level index

	People.search(People.hasAnyWordsOf(schWords, 2));
	cout << "People with 'giraffe' or 'boots' (John Smith):\n"
		 << justNames << endl;  

Now we retest our function hasAllWordsOf().

Testing: function hasAllWordsOf() using table-level index

	People.search(People.hasAllWordsOf(schWords, 2));
	cout << "People with 'giraffe' and 'boots' (nobody):\n"
		 << justNames << endl;

Now we create a new array so we can test a different selection for hasAllWordsOf(). Note that Trissa's record qualifies through two different fields: it contains the word "married" in the Description field and the word "Trissa" in the OtherNames field. As you can see, the table-level index is very useful for creating cross-field searches quickly and easily.

	const char* schWords2[2] = {"Trissa", "married"};
	People.search(People.hasAllWordsOf(schWords2, 2));
	cout << "People with 'Trissa' and 'married' (Trissa Dent):\n"
		 << justNames << endl;  
	cout << "Test Completed" << endl;
	  
	return EXIT_SUCCESS;
}  
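
The cross-field match in the final search can be modelled by pooling the words of every field of a record before testing them. The sketch below is a standalone illustration only: the Record typedef, the helper names and the whitespace-based word splitting are all assumptions, and it does not reflect how OOFILE evaluates a table-level search internally.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <sstream>
#include <string>

typedef std::map<std::string, std::string> Record;	// field name -> field text

// Gather the whitespace-separated words of every field into one set.
std::set<std::string> allWordsOf(const Record& record)
{
	std::set<std::string> words;
	for (Record::const_iterator f = record.begin(); f != record.end(); ++f) {
		std::istringstream in(f->second);
		std::string word;
		while (in >> word)
			words.insert(word);
	}
	return words;
}

// True if every search word occurs somewhere in the record, in any field.
bool recordHasAllWordsOf(const Record& record, const char* const searchWords[], int count)
{
	const std::set<std::string> words = allWordsOf(record);
	for (int i = 0; i < count; ++i)
		if (words.count(searchWords[i]) == 0)
			return false;
	return true;
}
```

A record whose OtherNames field holds "Trissa" and whose Description mentions "married" satisfies a search for both words, even though neither field contains both of them.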

 


(c) Copyright A.D. Software 1994-2000 (All Rights Reserved).
Last Updated: 9th September 2001