The Sun JDKs apparently consider "UNICODE" to be a synonym for UTF-16. I would consider it a warning sign that the people who made the file may not really know what encoding they're using. Ask if they mean UTF-16 - if they say yes, then great. If they say "what?", then assume they don't really know what they're doing - find someone else who does, or study their code yourself. You might also try some other common Unicode-based alternatives: UTF-8, UTF-16BE, UTF-16LE.
Another very suspicious thing I notice is the mention of ISO-8859-15 above. What's that for? If you're representing Japanese characters, they can't possibly be encoded (or decoded) correctly using ISO-8859-15. That's an 8-bit encoding scheme; it can't represent more than 256 distinct characters, and Japanese of course has many more than that. So, what's the purpose of ISO-8859-15 here?
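To make that concrete, here's a tiny sketch (the sample string is just an illustration) showing what happens when you push Japanese text through ISO-8859-15 - the characters are simply lost:

    import java.io.UnsupportedEncodingException;

    public class Iso885915Demo {
        public static void main(String[] args) throws UnsupportedEncodingException {
            // "Nihongo" - three characters ISO-8859-15 cannot represent.
            String japanese = "\u65e5\u672c\u8a9e";

            // Encoding replaces each unmappable character with '?' (0x3F)...
            byte[] bytes = japanese.getBytes("ISO-8859-15");

            // ...so decoding gets back "???" - the original text is gone for good.
            String roundTrip = new String(bytes, "ISO-8859-15");
            System.out.println(roundTrip.equals(japanese)); // false
        }
    }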
It sounds like there are four or five different encodings in play here for various purposes. I think you need to simplify things a bit by only worrying about one or two at a time. Forget about what encoding is used in Oracle (which you don't have direct access to anyway, at the moment) or what encoding is used in SQL Server (which you may be able to access, but there are several more steps involved there). You've got a single file, allegedly written in "UNICODE", and you want to read it and render its contents in HTML somehow. Here's a simple way to try that:
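Something like this minimal sketch (the input name testload.dat comes from your post; test.html is just an arbitrary output file):

    import java.io.*;

    public class TestLoad {
        public static void main(String[] args) throws IOException {
            // First guess at the file's encoding - change this and re-run
            // until the output renders correctly ("UTF-8", "UTF-16",
            // "UTF-16BE", "UTF-16LE", ...).
            String encoding = "UTF-8";

            BufferedReader in = new BufferedReader(new InputStreamReader(
                    new FileInputStream("testload.dat"), encoding));
            PrintWriter out = new PrintWriter(new OutputStreamWriter(
                    new FileOutputStream("test.html"), "UTF-8"));

            // Declare the output charset so the browser knows what it's looking at.
            out.println("<html><head><meta http-equiv=\"Content-Type\""
                    + " content=\"text/html; charset=UTF-8\"></head><body><pre>");
            String line;
            while ((line = in.readLine()) != null) {
                out.println(line); // note: a real page would HTML-escape <, >, &
            }
            out.println("</pre></body></html>");
            in.close();
            out.close();
        }
    }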
This allows you to create a simple HTML file on your local machine, with no web server or anything else to worry about. You can open it with your browser and see how it looks. If it's no good, the problem is most likely that testload.dat is not really encoded in UTF-8. So try other encodings until something works. Then go back and tell the people who gave you the file what their encoding really is.
[Rahul]: Byte calculations / manipulations are always preferred over character manipulation.

Um, so, I really can't agree with that. They may be faster, but in a case like this you really need to understand the details of the encoding to work with the bytes reliably. Unless you're only using US-ASCII characters (which are admittedly very common), the process is very error-prone. For things like UTF-8, UTF-16, and SJIS, I would strongly recommend that you not try to work with the raw bytes yourself - unless you really need to, and know what you're doing. It's much easier to rely on the Java libraries to convert the bytes to characters, using things like Charset, InputStreamReader, and OutputStreamWriter.
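To show why (a quick sketch; the sample string is just an example): the same three characters take a different number of bytes in each encoding, so byte-level code that works for one encoding silently breaks for another:

    public class ByteLengths {
        public static void main(String[] args) throws Exception {
            // "Nihongo" - three Japanese characters.
            String text = "\u65e5\u672c\u8a9e";

            // Same characters, very different byte counts per encoding:
            System.out.println(text.getBytes("UTF-8").length);     // 9
            System.out.println(text.getBytes("UTF-16BE").length);  // 6
            System.out.println(text.getBytes("Shift_JIS").length); // 6

            // The safe route: let the library decode bytes back to characters.
            byte[] raw = text.getBytes("UTF-8");
            String decoded = new String(raw, "UTF-8");
            System.out.println(decoded.equals(text)); // true
        }
    }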