
Translate CharSet of InputStream

 
Greenhorn
Posts: 6
I am using an external library that reads bytes from an InputStream and assumes that it is UTF-8 text while it is actually text in another char encoding. So I need to translate the actual bytes from ISO-8859-15 (or so) to UTF-8. What I want to avoid is to have to read the entire stream into a String first. Any ideas? I hoped that there was a commons-lang or commons-io util that would do this for me, but I didn't find any.

Jeroen
 
Bartender
Posts: 1952
7
Eclipse IDE Java
You could wrap the InputStream with an InputStreamReader and use a CharsetDecoder to decode from ISO-8859-15 to Unicode. If you browse through InputStreamReader's Javadoc page you should find an appropriately overloaded constructor.
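A minimal sketch of that wrapping step, assuming the JDK in use ships the ISO-8859-15 charset (it is not one of the charsets every Java platform is required to support, though common JDKs include it). The class and method names are made up for illustration.

```java
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;

// Hypothetical helper, not part of any library.
class Latin9Decoding {
    /** Wraps a stream of ISO-8859-15 bytes in a Reader that yields decoded chars. */
    static Reader openLatin9(InputStream in) {
        // Assumption: "ISO-8859-15" is available; Charset.forName throws
        // UnsupportedCharsetException if it is not.
        Charset latin9 = Charset.forName("ISO-8859-15");
        return new InputStreamReader(in, latin9);
    }
}
```

The byte 0xA4, for example, comes out of this Reader as the euro sign U+20AC, because that is what 0xA4 means in ISO-8859-15.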
 
Jeroen Kransen
Greenhorn
Posts: 6
I saw that, but then I've got a Reader, and I really need an InputStream. Maybe there's a way to put another InputStream on top of that, but it seems like a lot of overhead for a seemingly simple and common use case. Also, I need UTF-8, not Java's implementation of Unicode, which I believe is UTF-16.

 
Master Rancher
Posts: 5161
83

Jeroen Kransen wrote: Maybe there's a way to put another InputStream on top of that, but it seems like a lot of overhead for a seemingly simple and common use case.


Yeah, welcome to the world of Java I/O. It's like that a lot.

Jeroen Kransen wrote: Also, I need UTF-8, not Java's implementation of Unicode, which I believe is UTF-16.


I'm not sure where that idea came from. Unicode includes both UTF-8 and UTF-16 within its standard, and internally Java uses both forms: String and char hold UTF-16 code units, while class files and DataInput/DataOutput use a modified UTF-8. But there are classes to handle either of these encodings, and many more.
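To make the distinction concrete, here is a small sketch that encodes the same character, the euro sign, in three charsets. It assumes the ISO-8859-15 charset is available in the JDK; the class name and the `hex` helper are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: one Unicode character, three byte encodings.
class EuroEncodings {
    /** Formats bytes as upper-case hex, two digits per byte. */
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String euro = "\u20AC"; // EURO SIGN
        // Assumption: ISO-8859-15 is supported by this JDK.
        System.out.println(hex(euro.getBytes(Charset.forName("ISO-8859-15")))); // A4
        System.out.println(hex(euro.getBytes(StandardCharsets.UTF_8)));         // E282AC
        System.out.println(hex(euro.getBytes(StandardCharsets.UTF_16BE)));      // 20AC
    }
}
```

Once the bytes have been decoded into chars, choosing UTF-8 for the output is just a matter of picking that charset when encoding again.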

The main problem is, as you've said, you can get a Reader but what you need is an InputStream. You could do this by writing everything to a byte array or file, and then rereading it. Or if you want something quicker and/or with lower memory requirements (assuming the file is fairly large), then you can try something like this:
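The listing posted at this point is not preserved in this copy of the thread. What follows is only a sketch of the approach described, not the poster's code: an InputStream that pulls chars from an InputStreamReader and re-encodes them as UTF-8 a buffer at a time, so the whole input never has to sit in memory. The class name `TranscodingInputStream` and the buffer sizes are assumptions made for illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CoderResult;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: presents bytes in sourceCharset as a UTF-8 byte stream.
class TranscodingInputStream extends InputStream {
    private final Reader reader;
    private final CharsetEncoder encoder = StandardCharsets.UTF_8.newEncoder();
    // 1024 chars need at most 3072 UTF-8 bytes, so the byte buffer never overflows.
    private final CharBuffer chars = CharBuffer.allocate(1024);
    private final ByteBuffer bytes = ByteBuffer.allocate(4096);
    private boolean endOfInput = false;
    private boolean flushed = false;

    TranscodingInputStream(InputStream source, Charset sourceCharset) {
        this.reader = new InputStreamReader(source, sourceCharset);
        chars.flip(); // both buffers start empty, in "drain" mode
        bytes.flip();
    }

    /** Refills the byte buffer; returns false once the stream is exhausted. */
    private boolean fill() throws IOException {
        while (!bytes.hasRemaining()) {
            if (flushed) {
                return false;
            }
            bytes.clear();
            if (!endOfInput) {
                chars.compact(); // keep any unconsumed char, e.g. a lone high surrogate
                if (reader.read(chars) == -1) {
                    endOfInput = true;
                }
                chars.flip();
            }
            CoderResult result = encoder.encode(chars, bytes, endOfInput);
            if (result.isError()) {
                result.throwException(); // unencodable input, e.g. unpaired surrogate
            }
            if (endOfInput && result.isUnderflow()) {
                if (encoder.flush(bytes).isUnderflow()) {
                    flushed = true;
                }
            }
            bytes.flip();
        }
        return true;
    }

    @Override
    public int read() throws IOException {
        return fill() ? (bytes.get() & 0xFF) : -1;
    }

    @Override
    public int read(byte[] dst, int off, int len) throws IOException {
        if (len == 0) {
            return 0;
        }
        if (!fill()) {
            return -1;
        }
        int n = Math.min(len, bytes.remaining());
        bytes.get(dst, off, n);
        return n;
    }

    @Override
    public void close() throws IOException {
        reader.close();
    }
}
```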

Which is more work than we might like, but oh well. It may be possible to do this faster with NIO, but the basic idea would be the same.
 