I finally identified the Chinese search problem.
My environment first:
Tomcat 4.1
Jdk 1.5
Debian sarge (LANG=en.US)
JForum 2.1.6
Symptom: searching for Chinese text does not return the expected results.
The principal cause is the submission method used for search. JForum submits the search form with "GET", which is naturally suited only to ASCII data. Unless the backend explicitly does the conversion, GET only works for the ASCII character set.
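To illustrate the failure mode, here is a minimal sketch. It assumes the browser percent-encodes the keyword as UTF-8 and the container decodes the query string as ISO-8859-1 (a common default, but I have not checked what Tomcat 4.1 does in every configuration). The class and method names are made up for the example and are not JForum code.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

public class GetEncodingDemo {
    // Percent-encode the keyword the way a browser would on a UTF-8 page (assumption).
    static String encodeAsBrowser(String keyword) throws UnsupportedEncodingException {
        return URLEncoder.encode(keyword, "UTF-8");
    }

    // Decode the raw query value with the charset the backend happens to use.
    static String decodeAs(String raw, String charset) throws UnsupportedEncodingException {
        return URLDecoder.decode(raw, charset);
    }

    public static void main(String[] args) throws Exception {
        String keyword = "\u4e2d\u6587"; // the two-character word "Chinese (language)"
        String raw = encodeAsBrowser(keyword);
        System.out.println(raw); // %E4%B8%AD%E6%96%87

        // Wrong charset: each UTF-8 byte becomes its own character, so the keyword is garbled.
        System.out.println(keyword.equals(decodeAs(raw, "ISO-8859-1"))); // false

        // Matching charset: the keyword survives the round trip.
        System.out.println(keyword.equals(decodeAs(raw, "UTF-8"))); // true
    }
}
```

The garbled keyword never matches the indexed text, which is consistent with the symptom above.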
The solution is to use POST instead of GET for search. My testing shows that it works. You only need to change search.htm: just replace GET with POST, no need to recompile.
The alternative is to do the conversion in the backend, but that involves code changes and is riskier, both from my experience and from others' suggestions. There's much debate on this issue, and I don't want to get into the details here.
Another observation, rather than a problem: JForum has a minimum length for a word to be indexed, that is, to become searchable. So a short word might not be searchable; that's by design rather than a bug. The current setting is
search.min.word.size = 3
So if you have a short message consisting only of a word shorter than that, it won't show up in the search results.
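As a rough sketch of the effect of that setting: the code below assumes the indexer splits on whitespace and skips any word shorter than the minimum size. I have not checked whether JForum treats the limit as inclusive or how it tokenizes Chinese text, so the class, the method and the `>=` comparison are all illustrative, not JForum's actual code.

```java
import java.util.ArrayList;
import java.util.List;

public class MinWordSizeDemo {
    // Mirrors search.min.word.size = 3 from the configuration.
    static final int MIN_WORD_SIZE = 3;

    // Return the words that would be indexed under the assumed rule:
    // split on whitespace, keep words of at least minSize characters.
    static List<String> indexableWords(String message, int minSize) {
        List<String> kept = new ArrayList<String>();
        for (String word : message.trim().split("\\s+")) {
            if (word.length() >= minSize) {
                kept.add(word);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // A two-character Chinese word is below the limit, so nothing is indexed.
        System.out.println(indexableWords("\u4e2d\u6587", MIN_WORD_SIZE).isEmpty()); // true

        // Longer words pass; "hi" is dropped.
        System.out.println(indexableWords("hi there jforum", MIN_WORD_SIZE)); // [there, jforum]
    }
}
```

Under this assumption, many Chinese words, which are often only one or two characters long, would never reach the index unless the setting is lowered.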
[originally posted on jforum.net by luorihui]