in any browser as well as in eclipse console output is same.
What do you mean by "in any browser"? The Java code runs on the command line, not in a browser, right? And again, does the Eclipse console support those characters?
Also, while I'm not sure what "tidy.setCharEncoding(org.w3c.tidy.Configuration.UTF8)" does, if the numeric entities are replaced by their corresponding characters (is that what "tidy.setNumEntities(true)" does?), then "stream.toString()" decodes the bytes using the platform default encoding, which may well be neither ISO-8859 nor UTF-8.
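To illustrate the point about "stream.toString()", here is a minimal sketch that leaves JTidy out entirely. It assumes the ByteArrayOutputStream holds UTF-8 bytes; the class name DecodeDemo, the helper decodeUtf8 and the sample text are made up for the example and are not from the original code.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

public class DecodeDemo {
    // Hypothetical helper: decode the buffered bytes as UTF-8,
    // independent of the platform default encoding.
    static String decodeUtf8(ByteArrayOutputStream stream) {
        return new String(stream.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws Exception {
        // Sample text with non-ASCII characters (e with acute accent, euro sign).
        String original = "caf\u00e9 \u20ac";

        ByteArrayOutputStream stream = new ByteArrayOutputStream();
        stream.write(original.getBytes(StandardCharsets.UTF_8));

        // Decodes with the platform default encoding; the result
        // depends on the machine the code runs on.
        String platformDecoded = stream.toString();

        // Decodes explicitly as UTF-8, matching how the bytes were written.
        String utf8Decoded = decodeUtf8(stream);

        System.out.println("default matches: " + platformDecoded.equals(original));
        System.out.println("UTF-8 matches:   " + utf8Decoded.equals(original));
    }
}
```

The explicit decode should match on any machine, while the first comparison only matches where the default encoding happens to be UTF-8. Whether the characters then display correctly in the Eclipse console is a separate question that depends on the console's own encoding setting.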
You are using UTF-8 when reading the contents from the InputStream and writing to the ByteArrayOutputStream, but then you're using the system default encoding to convert those bytes into a String. Try using this: