
oofWordParser Class Reference
[OOFILE utility classes and functions.]

#include <oofwords.h>

Inherits OOF_mixRefCount.

Detailed Description

Base class for user-replaceable word parser.

Used mainly to search tables and word-indexable fields. Default behaviour in this base class is used if no overriding default object is attached to a table or dbConnection.

Includes a stop-word list and a minimum word length. Does not do any stemming or more complex parsing.

A word is built from the isalnum() characters between isspace() separators. Any non-alphanumeric characters inside a word are skipped.
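The splitting rule described above can be sketched as a free function. This is a minimal illustration, not the OOFILE implementation: `sketchSplitWords` is a hypothetical name, the stop-word comparison is assumed to be exact and case-sensitive, and the maxWordLength limit is not modelled.

```cpp
#include <cctype>
#include <string>
#include <vector>

// Hypothetical helper, not part of OOFILE. Splits text on isspace()
// separators, keeps only the isalnum() characters of each token, then
// applies a minimum length and a stop-word list.
std::vector<std::string> sketchSplitWords(const char* text,
                                          unsigned short minWordLen = 3,
                                          const char** stopWords = 0,
                                          unsigned short numStopWords = 0)
{
    std::vector<std::string> words;
    if (!text)
        return words;

    const char* p = text;
    while (*p) {
        // Skip the separators in front of the next token.
        while (*p && std::isspace(static_cast<unsigned char>(*p)))
            ++p;

        // Copy the alphanumeric characters up to the next separator;
        // punctuation inside the token is dropped.
        std::string word;
        while (*p && !std::isspace(static_cast<unsigned char>(*p))) {
            if (std::isalnum(static_cast<unsigned char>(*p)))
                word += *p;
            ++p;
        }

        // Reject empty and too-short words.
        if (word.empty() || word.size() < minWordLen)
            continue;

        // Reject stop words (assumption: exact, case-sensitive match).
        bool isStop = false;
        for (unsigned short i = 0; i < numStopWords; ++i) {
            if (word == stopWords[i]) {
                isStop = true;
                break;
            }
        }
        if (!isStop)
            words.push_back(word);
    }
    return words;
}
```

With the stop words "the" and "and" and the default minimum length of 3, the text "the quick-brown fox, and a dog's tail" yields quickbrown, fox, dogs and tail under these assumptions.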

mpString remains a pointer to the start of the current word. mWord is updated (in theory) with the filtered copy of the word.

Bug:
SPOTTED 14 Nov 2000 - calling start() again after using the parser doesn't update mWord.

To aid search implementation, generateSearchArray() creates a separate list of strings, which is retained and can be accessed through generatedStrings() and generatedCount().
See also:
dbTable::searchSelContainsAnyOf for an example of use.
Note:
Not to be regarded as a good example of programming style until the parser is cleaned up and the storage is neater!

Todo:
cleaner abstraction of word source iteration

work out why minWordLength returns a reference and add setter if necessary


Public Member Functions

 oofWordParser (const char *stringToParse=0, unsigned short minWordLen=3, const char **stopWords=0, unsigned short numStopWords=0)
virtual ~oofWordParser ()
 dtor.

virtual void start ()
 sets the oofWordParser to the start of the string.

virtual void start (const char *stringToParse)
 call to start parsing a different string.

virtual void next (void)
 Find the next word, updating the mWord pointer returned by word().

virtual bool more () const
 returns true if any words are left.

virtual const char * word () const
 returns the current word.

virtual const char * wordString () const
 returns the original string.

virtual unsigned short & minWordLength ()
 returns/sets the minimum word length cut-off.

virtual bool isValidWord (const char *) const
virtual void generateSearchArray ()
const char ** generatedStrings () const
 return array of pointers to individual words.

unsigned long generatedCount () const
void extract (std::ostream &)
void incRefs ()
void decRefs ()

Protected Types

enum  { maxWordLength = 40 }

Protected Attributes

const char * mpString
 pointer to start of words to process, may point to mGeneratedStrings entries

const char * mpStringStart
 pointer to original strings

char mWord [maxWordLength]
unsigned short mWordLength
unsigned short mMinWordLength
const char ** mStopWords
 passed in list of stop words to ignore

unsigned short mNumStopWords
const char ** mGeneratedStrings
 owned strings generated for subsearch call

unsigned long mGeneratedCount
unsigned long mIterStrings
unsigned int mReferences


Member Enumeration Documentation

anonymous enum [protected]
 

Enumeration values:
maxWordLength 


Constructor & Destructor Documentation

oofWordParser::oofWordParser (const char * stringToParse = 0,
unsigned short minWordLen = 3,
const char ** stopWords = 0,
unsigned short numStopWords = 0)
 

oofWordParser::~oofWordParser ()  [virtual]
 

dtor.

Delete the const char* strings owned in mGeneratedStrings, if we ever generated them!
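
The ownership described here can be sketched as follows. This is only one way to manage an owned array of C strings, not the OOFILE code: `SketchOwnedStrings` and its `generate()` method are hypothetical, and the sketch assumes every string and the array itself were allocated with `new[]`.

```cpp
#include <cstring>

// Hypothetical illustration, not the OOFILE class. Owns an array of
// heap-allocated C strings and releases them in the destructor.
class SketchOwnedStrings {
public:
    SketchOwnedStrings() : mGeneratedStrings(0), mGeneratedCount(0) {}
    ~SketchOwnedStrings() { clear(); }

    // Copies `count` C strings into owned storage, replacing any old ones.
    void generate(const char* const* source, unsigned long count) {
        clear();
        if (count == 0)
            return;
        mGeneratedStrings = new const char*[count];
        for (unsigned long i = 0; i < count; ++i) {
            char* copy = new char[std::strlen(source[i]) + 1];
            std::strcpy(copy, source[i]);
            mGeneratedStrings[i] = copy;
            mGeneratedCount = i + 1;  // only count fully copied entries
        }
    }

    const char** generatedStrings() const { return mGeneratedStrings; }
    unsigned long generatedCount() const { return mGeneratedCount; }

private:
    // Deletes each owned string, then the array. Safe when nothing was
    // ever generated, because delete[] on a null pointer is a no-op.
    void clear() {
        for (unsigned long i = 0; i < mGeneratedCount; ++i)
            delete[] mGeneratedStrings[i];
        delete[] mGeneratedStrings;
        mGeneratedStrings = 0;
        mGeneratedCount = 0;
    }

    // Copying is disabled so two objects never delete the same strings.
    SketchOwnedStrings(const SketchOwnedStrings&);
    SketchOwnedStrings& operator=(const SketchOwnedStrings&);

    const char** mGeneratedStrings;
    unsigned long mGeneratedCount;
};
```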

Todo:
fix mGeneratedStrings management so it deletes without complaint


Member Function Documentation

void OOF_mixRefCount::decRefs ()  [inherited]
 

void oofWordParser::extract (std::ostream &)
 

unsigned long oofWordParser::generatedCount ()  const [inline]
 

const char ** oofWordParser::generatedStrings ()  const [inline]
 

return array of pointers to individual words.

void oofWordParser::generateSearchArray ()  [virtual]
 

Todo:
make more efficient

void OOF_mixRefCount::incRefs ()  [inline, inherited]
 

bool oofWordParser::isValidWord (const char *)  const [virtual]
 

unsigned short & oofWordParser::minWordLength ()  [virtual]
 

returns/sets the minimum word length cut-off.
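
Because the accessor returns a non-const reference, the same call can read or assign the cut-off. A minimal sketch of that idiom, using a hypothetical miniature class rather than oofWordParser itself:

```cpp
// Hypothetical miniature, not the OOFILE class. Shows how one accessor
// that returns a reference serves as both getter and setter.
class SketchMinLength {
public:
    explicit SketchMinLength(unsigned short minWordLen = 3)
        : mMinWordLength(minWordLen) {}

    // Read:  unsigned short n = p.minWordLength();
    // Write: p.minWordLength() = 5;
    unsigned short& minWordLength() { return mMinWordLength; }

private:
    unsigned short mMinWordLength;
};
```

The trade-off is that callers get direct write access to the member, so the class cannot validate a new value; a separate setter would allow that.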

bool oofWordParser::more ()  const [virtual]
 

returns true if any words are left.

Todo:
make this more reliable, preparse the next word at next() and start()

void oofWordParser::next (void)  [virtual]
 

Find the next word, updating the mWord pointer returned by word().

Todo:
refactor to isolate the common parsing code in start() & here
Note:
I don't know who wrote this - it wasn't Andy!

void oofWordParser::start (const char * stringToParse)  [virtual]
 

call to start parsing a different string.

void oofWordParser::start ()  [virtual]
 

sets the oofWordParser to the start of the string.

Todo:
Fix the existing bug where restarts do NOT reset the content of mWord. Note that both start(const char*) and next() copy into mWord, and that copy further filters the letters.

const char * oofWordParser::word ()  const [virtual]
 

returns the current word.

const char * oofWordParser::wordString ()  const [virtual]
 

returns the original string.


Member Data Documentation

unsigned long oofWordParser::mGeneratedCount [protected]
 

const char** oofWordParser::mGeneratedStrings [protected]
 

owned strings generated for subsearch call

unsigned long oofWordParser::mIterStrings [protected]
 

unsigned short oofWordParser::mMinWordLength [protected]
 

unsigned short oofWordParser::mNumStopWords [protected]
 

const char* oofWordParser::mpString [protected]
 

pointer to start of words to process, may point to mGeneratedStrings entries

const char* oofWordParser::mpStringStart [protected]
 

pointer to original strings

unsigned int OOF_mixRefCount::mReferences [protected, inherited]
 

const char** oofWordParser::mStopWords [protected]
 

passed in list of stop words to ignore

char oofWordParser::mWord[maxWordLength] [protected]
 

unsigned short oofWordParser::mWordLength [protected]
 


Generated on Thu Jan 1 22:10:14 2004 for OOFILE by doxygen 1.3.5