posted 7 years ago
Hi,
I am trying to index words like 'e-mail' as 'email', 'e mail' and 'e-mail' with Lucene 4.4.0.
Lucene's WordDelimiterFilter should be ideal for this. However, it seems to treat every non-alphanumeric character as a delimiter, so terms like 'C++' are reduced to 'C', which is not what I want.
Apparently, Solr allows custom delimiters to be specified. But how can I do this in plain Lucene?
I have looked at the documentation, and the 'byte[] charTypeTable' parameter of the constructor looked promising. However, specifying delimiters in a charTypeTable seems to have no effect.
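To make the charTypeTable idea concrete, here is a minimal sketch of how such a table could be built so that '+' counts as a letter instead of a delimiter. It uses only the JDK. The class name, the buildTable helper and the local constants are made up for illustration, and the constant values are an assumption meant to mirror the type codes in WordDelimiterFilter (LOWER, UPPER, DIGIT, SUBWORD_DELIM, ALPHA), so please check them against the Lucene version in use.

```java
// Sketch only: builds a 256-entry character type table indexed by char code.
// The constants below are local copies; their values are ASSUMED to match
// the public type codes of WordDelimiterFilter in Lucene 4.x.
final class CharTypeTableSketch {
    static final int LOWER = 0x01;
    static final int UPPER = 0x02;
    static final int DIGIT = 0x04;
    static final int SUBWORD_DELIM = 0x08;
    static final int ALPHA = LOWER | UPPER;

    // Hypothetical helper: classifies letters and digits by their Unicode
    // category, marks everything else as a delimiter, then reclassifies
    // every character in keepAsWordChars as ALPHA.
    static byte[] buildTable(String keepAsWordChars) {
        byte[] table = new byte[256];
        for (int c = 0; c < table.length; c++) {
            if (Character.isLowerCase(c)) {
                table[c] = (byte) LOWER;
            } else if (Character.isUpperCase(c)) {
                table[c] = (byte) UPPER;
            } else if (Character.isDigit(c)) {
                table[c] = (byte) DIGIT;
            } else {
                table[c] = (byte) SUBWORD_DELIM;
            }
        }
        for (char c : keepAsWordChars.toCharArray()) {
            if (c < table.length) {
                table[c] = (byte) ALPHA; // no longer treated as a delimiter
            }
        }
        return table;
    }

    // With Lucene on the classpath, the table would be passed roughly like
    // this (not compiled here; the signature should be verified):
    //   new WordDelimiterFilter(tokenStream, buildTable("+"), flags, null);
}
```

With buildTable("+"), the entry for '+' is ALPHA while '-' stays a delimiter, so 'e-mail' would still be split but 'C++' should not be. One thing worth checking if the table seems to have no effect: the table can only matter for characters that actually reach the filter, and a tokenizer such as StandardTokenizer drops '+' before WordDelimiterFilter ever sees it, whereas WhitespaceTokenizer keeps it.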
Thank you!