Historical Approaches

Luhn Algorithm (cont.)
 

 5. Similar spellings are consolidated into word types (a rough approximation of a stemmer)

 5a.  any tokens with fewer than seven non-matching letters are considered to be of the same word type:
 


 
 
 

 frequently vs. frequent: 10 letters, 8 match, 2 non-match
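
The comparison in step 5a can be sketched in a few lines of Python. This is a minimal sketch, not a definitive implementation. The slide does not say how letters are aligned, so the sketch assumes that the two tokens are compared letter by letter from the start and that every letter after the first point of divergence, in either token, counts as a non-match. The names `count_nonmatches` and `same_word_type`, the `threshold` parameter, and the lowercasing are illustrative assumptions.

```python
def count_nonmatches(a: str, b: str) -> int:
    """Count letters in both tokens that follow their shared prefix.

    Assumption: matching is prefix-based, i.e. comparison stops at the
    first position where the tokens differ.
    """
    a, b = a.lower(), b.lower()  # assumption: comparison ignores case
    prefix = 0
    for x, y in zip(a, b):
        if x != y:
            break
        prefix += 1
    # Everything after the shared prefix, in either token, is a non-match.
    return (len(a) - prefix) + (len(b) - prefix)


def same_word_type(a: str, b: str, threshold: int = 7) -> bool:
    """Step 5a: fewer than `threshold` non-matching letters means same word type."""
    return count_nonmatches(a, b) < threshold


print(count_nonmatches("frequently", "frequent"))  # 8 letters match, 2 do not
print(same_word_type("frequently", "frequent"))
```

Under these assumptions, "frequently" and "frequent" share an 8-letter prefix and differ by 2 letters, so they are consolidated into one word type. Because only the non-matches are counted, any two very short tokens also fall under the threshold, which is one reason the rule is only a rough approximation of a stemmer.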