OOFILE  1.9

Base class for user-replaceable word parser.

#include <oofwords.h>

[Inheritance diagram for oofWordParser]
[Collaboration diagram for oofWordParser]

Public Member Functions

void decRefs ()
 
void extract (std::ostream &)
 
unsigned long generatedCount () const
 
const char ** generatedStrings () const
 return array of pointers to individual words.
 
virtual void generateSearchArray ()
 
void incRefs ()
 
virtual bool isValidWord (const char *) const
 
virtual unsigned short & minWordLength ()
 returns/sets the minimum word length cut-off.
 
virtual bool more () const
 return true if any words left.
 
virtual void next (void)
 Find the next word, updating the mWord buffer returned by word().
 
 oofWordParser (const char *stringToParse=0, unsigned short minWordLen=3, const char **stopWords=0, unsigned short numStopWords=0)
 
virtual void start ()
 sets the oofWordParser to the start of the string.
 
virtual void start (const char *stringToParse)
 call to start parsing a different string
 
virtual const char * word () const
 returns the current word.
 
virtual const char * wordString () const
 returns the original string.
 
virtual ~oofWordParser ()
 dtor.
 

Protected Types

enum  { maxWordLength = 40 }
 

Protected Attributes

unsigned long mGeneratedCount
 
const char ** mGeneratedStrings
 owned strings generated for subsearch call
 
unsigned long mIterStrings
 
unsigned short mMinWordLength
 
unsigned short mNumStopWords
 
const char * mpString
 pointer to start of words to process, may point to mGeneratedStrings entries
 
const char * mpStringStart
 pointer to original strings
 
unsigned int mReferences
 
const char ** mStopWords
 passed in list of stop words to ignore
 
char mWord [maxWordLength]
 
unsigned short mWordLength
 

Detailed Description

Base class for user-replaceable word parser.

Used mainly to search tables and word-indexable fields. Default behaviour in this base class is used if no overriding default object is attached to a table or dbConnection.

Includes a stop-word list and a minimum word length. Doesn't do any stemming or more complex parsing.

A word is generated from the isalnum() chars between isspace() separators. Any non-alnum chars inside a word are skipped.
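
The word-forming rule above can be sketched as a standalone function. This is only an illustration of the stated rule, not the code in oofwords.cpp: the name `sketchSplitWords` and the use of `std::vector`/`std::string` are assumptions, and the sketch deliberately leaves out the minimum-length cut-off, the stop-word list and the maxWordLength limit.

```cpp
#include <cctype>
#include <string>
#include <vector>

// Hypothetical helper, not part of OOFILE: splits text on isspace()
// separators and keeps only the isalnum() characters of each run.
std::vector<std::string> sketchSplitWords(const char* text)
{
    std::vector<std::string> words;
    if (text == nullptr)
        return words;

    std::string current;
    for (const char* p = text; ; ++p) {
        const unsigned char ch = static_cast<unsigned char>(*p);
        if (ch == '\0' || std::isspace(ch)) {
            // A separator (or the end of the string) closes the current word.
            if (!current.empty()) {
                words.push_back(current);
                current.clear();
            }
            if (ch == '\0')
                break;
        } else if (std::isalnum(ch)) {
            current.push_back(static_cast<char>(ch));
        }
        // Any other character inside a word is skipped.
    }
    return words;
}
```

Under this rule, punctuation inside a word does not split it: the input `"don't stop-me  now"` yields the three words `dont`, `stopme` and `now`.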

mpString remains a pointer to the start of the current word. mWord is updated (in theory) with the filtered copy of the word.

Bug:
SPOTTED 14 Nov 2000 - calling start() again after using parser doesn't update mWord

To aid in search implementation, generateSearchArray() creates a separate list of strings, which is retained and can be accessed through generatedStrings() and generatedCount().

See also
dbTable::searchSelContainsAnyOf for example of use.
Note
Not to be regarded as a good example of programming style until the parser is cleaned up and its storage made neater!
Todo:

cleaner abstraction of word source iteration

work out why minWordLength returns a reference and add setter if necessary

Definition at line 40 of file oofwords.h.

Constructor & Destructor Documentation

oofWordParser::oofWordParser ( const char *  stringToParse = 0,
unsigned short  minWordLen = 3,
const char **  stopWords = 0,
unsigned short  numStopWords = 0 
)

Definition at line 50 of file oofwords.cpp.

References start().

oofWordParser::~oofWordParser ( )
virtual

dtor.

Delete the const char* strings owned in mGeneratedStrings, if we ever generated them!

Todo:
fix mGeneratedStrings management so it deletes without complaint

Definition at line 71 of file oofwords.cpp.

References mGeneratedCount, mGeneratedStrings, and OOF_mixRefCount::mReferences.

Member Function Documentation

void OOF_mixRefCount::decRefs ( )
inherited

void oofWordParser::extract ( std::ostream &  )

Definition at line 257 of file oofwords.cpp.

References more(), next(), start(), and word().

Referenced by operator<<().
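
Since extract() references start(), more(), next() and word(), the iteration it performs can be pictured roughly as below. This is a sketch under assumptions, not the body at line 257 of oofwords.cpp: `SketchListParser` is a hypothetical stand-in that walks a pre-split word list, `sketchExtract` is an invented name, the newline separator is a guess, and the loop assumes more() stays true while word() still holds a valid word.

```cpp
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in exposing the same four member names as
// oofWordParser; it iterates a ready-made list instead of parsing.
class SketchListParser {
public:
    explicit SketchListParser(std::vector<std::string> words)
        : mWords(std::move(words)), mIndex(0) {}

    void start() { mIndex = 0; }
    bool more() const { return mIndex < mWords.size(); }
    void next() { if (more()) ++mIndex; }
    const char* word() const { return more() ? mWords[mIndex].c_str() : ""; }

private:
    std::vector<std::string> mWords;
    std::size_t mIndex;
};

// Writes every word to the stream, one per line (separator is assumed).
template <typename Parser>
void sketchExtract(Parser& parser, std::ostream& os)
{
    for (parser.start(); parser.more(); parser.next())
        os << parser.word() << '\n';
}
```

Because the loop calls start() first, the sketch can be run repeatedly on the same parser object; whether the real class behaves the same way is subject to the start() bug noted in the detailed description.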

unsigned long oofWordParser::generatedCount ( ) const
inline

const char ** oofWordParser::generatedStrings ( ) const
inline

void oofWordParser::generateSearchArray ( )
virtual

void OOF_mixRefCount::incRefs ( )
inline, inherited

bool oofWordParser::isValidWord ( const char *  s) const
virtual

Definition at line 169 of file oofwords.cpp.

References OOF_stricmp().

Referenced by next(), and start().
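
As an illustration of how a stop-word and minimum-length filter of this kind might look, here is a minimal sketch. It is not the implementation at line 169 of oofwords.cpp: `sketchEqualsIgnoreCase` is a hypothetical stand-in for OOF_stricmp(), `sketchIsValidWord` is an invented free function, and both the `<` comparison against the minimum length and the placement of the length check inside this function are assumptions.

```cpp
#include <cctype>
#include <cstring>

// Hypothetical stand-in for a case-insensitive string comparison.
bool sketchEqualsIgnoreCase(const char* a, const char* b)
{
    while (*a != '\0' && *b != '\0') {
        if (std::tolower(static_cast<unsigned char>(*a)) !=
            std::tolower(static_cast<unsigned char>(*b)))
            return false;
        ++a;
        ++b;
    }
    return *a == *b;  // equal only if both strings ended together
}

// Hypothetical filter: rejects words shorter than minWordLen and words
// that match any stop word, ignoring case.
bool sketchIsValidWord(const char* word,
                       unsigned short minWordLen,
                       const char* const* stopWords,
                       unsigned short numStopWords)
{
    if (word == nullptr || std::strlen(word) < minWordLen)
        return false;
    for (unsigned short i = 0; i < numStopWords; ++i) {
        if (sketchEqualsIgnoreCase(word, stopWords[i]))
            return false;
    }
    return true;
}
```

With the stop words `the` and `and` and a minimum length of 3, this sketch rejects `The` (stop word) and `of` (too short) but accepts `table`.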

unsigned short & oofWordParser::minWordLength ( )
virtual

returns/sets the minimum word length cut-off.

Definition at line 250 of file oofwords.cpp.

References mMinWordLength.
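
Because the accessor returns a non-const reference, a single member function can serve as both getter and setter. The sketch below shows the pattern on a hypothetical stand-in class (`SketchMinLen` is not an OOFILE type); only the signature shape and the default of 3 are taken from this page.

```cpp
// Hypothetical stand-in showing the reference-returning accessor pattern.
class SketchMinLen {
public:
    explicit SketchMinLen(unsigned short minWordLen = 3)
        : mMinWordLength(minWordLen) {}

    // Returning a reference lets callers read or assign through the call.
    unsigned short& minWordLength() { return mMinWordLength; }

private:
    unsigned short mMinWordLength;
};

// Example of use: raises the cut-off and returns the value read back.
unsigned short sketchRaiseCutOff(SketchMinLen& parser, unsigned short newLen)
{
    parser.minWordLength() = newLen;   // acts as a setter
    return parser.minWordLength();     // acts as a getter
}
```

One consequence of this design is that the accessor is non-const, so it cannot be called on a const object even just to read the value.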

bool oofWordParser::more ( ) const
virtual

return true if any words left.

Todo:
make this more reliable, preparse the next word at next() and start()

Definition at line 217 of file oofwords.cpp.

References mpString, and mpStringStart.

Referenced by extract(), generateSearchArray(), OOF_simpleRecordBackend::SearchWordNonIndexed(), and OOF_ctreeKeywordIndex::updateWords().

void oofWordParser::next ( void  )
virtual

Find the next word, updating the mWord buffer returned by word().

Todo:
refactor to isolate the common parsing code in start() & here
Note
I don't know who wrote this - it wasn't Andy!

Definition at line 135 of file oofwords.cpp.

References isValidWord(), maxWordLength, mGeneratedStrings, mIterStrings, mpString, mpStringStart, mWord, and mWordLength.

Referenced by extract(), generateSearchArray(), OOF_simpleRecordBackend::SearchWordNonIndexed(), and OOF_ctreeKeywordIndex::updateWords().

void oofWordParser::start ( )
virtual

sets the oofWordParser to the start of the string.

Todo:
Fix the existing bug where restarts do NOT reset the content of mWord. Note that both start(const char*) and next() copy into mWord, and that copy further filters the letters.

Definition at line 196 of file oofwords.cpp.

References maxWordLength, mGeneratedStrings, mIterStrings, mpString, mpStringStart, and mWord.

Referenced by extract(), generateSearchArray(), oofWordParser(), OOF_simpleRecordBackend::SearchWordNonIndexed(), and OOF_ctreeKeywordIndex::updateWords().

void oofWordParser::start ( const char *  stringToParse)
virtual

call to start parsing a different string

Definition at line 90 of file oofwords.cpp.

References isValidWord(), maxWordLength, mGeneratedStrings, mIterStrings, mpString, mpStringStart, mWord, and mWordLength.

const char * oofWordParser::word ( ) const
virtual

returns the current word.

Definition at line 228 of file oofwords.cpp.

References mpStringStart, and mWord.

Referenced by extract(), generateSearchArray(), OOF_simpleRecordBackend::SearchWordNonIndexed(), and OOF_ctreeKeywordIndex::updateWords().

const char * oofWordParser::wordString ( ) const
virtual

returns the original string.

Definition at line 239 of file oofwords.cpp.

References mpStringStart.

Member Data Documentation

unsigned long oofWordParser::mGeneratedCount
protected

Definition at line 71 of file oofwords.h.

Referenced by generatedCount(), generateSearchArray(), and ~oofWordParser().

const char** oofWordParser::mGeneratedStrings
protected

owned strings generated for subsearch call

Definition at line 70 of file oofwords.h.

Referenced by generatedStrings(), generateSearchArray(), next(), start(), and ~oofWordParser().

unsigned long oofWordParser::mIterStrings
protected

Definition at line 72 of file oofwords.h.

Referenced by next(), and start().

unsigned short oofWordParser::mMinWordLength
protected

Definition at line 67 of file oofwords.h.

Referenced by minWordLength().

unsigned short oofWordParser::mNumStopWords
protected

Definition at line 69 of file oofwords.h.

const char* oofWordParser::mpString
protected

pointer to start of words to process, may point to mGeneratedStrings entries

Definition at line 63 of file oofwords.h.

Referenced by more(), next(), and start().

const char* oofWordParser::mpStringStart
protected

pointer to original strings

Definition at line 64 of file oofwords.h.

Referenced by more(), next(), start(), word(), and wordString().

unsigned int OOF_mixRefCount::mReferences
protected, inherited

const char** oofWordParser::mStopWords
protected

passed in list of stop words to ignore

Definition at line 68 of file oofwords.h.

char oofWordParser::mWord[maxWordLength]
protected

Definition at line 65 of file oofwords.h.

Referenced by next(), start(), and word().

unsigned short oofWordParser::mWordLength
protected

Definition at line 66 of file oofwords.h.

Referenced by next(), and start().


The documentation for this class was generated from the following files: