
BufferedWriter with prime quotation marks

 
Greenhorn
Posts: 5
I am trying to take multiple files and merge them all into one new file. Everything up to this point works except for one item. *Note: I am receiving these files, so I have no control over changing anything in them.*

When writing the file, I get weird symbols replacing the prime quotation marks that were in the original file. I can't figure out what I am doing wrong. I have tried setting the encoding in BufferedWriter, but that does not seem to make a difference. Any help would be great. Thanks!

Here is the original file:


Here is what it looks like in the newly created file:


 
Marshal
Posts: 79239
377
How do you know the files are encoded in UTF-8? How do you know the quote marks are being read correctly by your reader? Try printing them to the console; if that does not work try the old‑fashioned technique of displaying them on an option pane, so you can verify they are being read correctly.
 
Tai Lo
Greenhorn
Posts: 5
Both console and option pane print out the following:

 
Campbell Ritchie
Marshal
Posts: 79239
377
Are you reading in UTF-8? If so, it is likely that your input files are not UTF-8 and you are therefore not reading them correctly. Is there any way to find out the encoding of the files you are reading?
If not, you might try printing out each char as an int (preferably in hexadecimal) and comparing your output with the Unicode values. You might even use the read() method because that gives you the individual characters. The values \u201c and \u201d are the quotes “ and ”. Here is a table which tells you what Unicode characters are in UTF-8. If you go to the U+2000 block, you find those quotes encoded as the bytes e2 80 9c and e2 80 9d. If the bytes of your quotes don't come out as e2809c/e2809d then your files aren't UTF-8.
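A minimal sketch of that hex check, in case it helps. The class name QuoteHexDump and its two helper methods are made up for illustration, and windows-1252 is only one candidate encoding to compare against. It shows the bytes the curly quotes become in UTF-8 and in Cp1252, and what you get if Cp1252 bytes are wrongly decoded as UTF-8.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class QuoteHexDump {

    // Format raw bytes as space-separated, lowercase, two-digit hex values.
    public static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    // Format each decoded char as a four-digit hex value (a UTF-16 code unit).
    public static String toCodeUnits(String text) {
        StringBuilder sb = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%04x", (int) c));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String quoted = "\u201cHi\u201d";

        // The same text encoded two ways.
        byte[] asUtf8 = quoted.getBytes(StandardCharsets.UTF_8);
        // Assumption: windows-1252 (Cp1252) is the other encoding worth comparing.
        byte[] asCp1252 = quoted.getBytes(Charset.forName("windows-1252"));

        System.out.println(toHex(asUtf8));    // e2 80 9c 48 69 e2 80 9d
        System.out.println(toHex(asCp1252));  // 93 48 69 94

        // Decoding Cp1252 bytes as UTF-8 turns each quote into U+FFFD.
        String misread = new String(asCp1252, StandardCharsets.UTF_8);
        System.out.println(toCodeUnits(misread)); // fffd 0048 0069 fffd
    }
}
```

To inspect a real file, you could pass the result of Files.readAllBytes to toHex. Single bytes 93 and 94 where the quotes should be point to Cp1252 rather than UTF-8.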
 
Tai Lo
Greenhorn
Posts: 5
Thanks for the hint. I completely neglected to check the files' encoding. It turns out it was ANSI (Cp1252). Reading the files with that encoding seems to have solved my issue. Thanks!
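For anyone who finds this later, here is a rough sketch of what such a fix can look like. It is not the code from this thread: the class name MergeFiles and the method merge are hypothetical, and writing the merged file as UTF-8 is an assumption. The key point is that the reader is given the charset the input files actually use.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

public class MergeFiles {

    // Assumption: every input file is windows-1252 (Cp1252), as found above.
    static final Charset INPUT_CHARSET = Charset.forName("windows-1252");

    // Assumption: the merged file should be UTF-8; pick whatever the consumer expects.
    static final Charset OUTPUT_CHARSET = StandardCharsets.UTF_8;

    // Append the lines of each input file, in order, to a single output file.
    public static void merge(List<Path> inputs, Path output) throws IOException {
        try (BufferedWriter writer = Files.newBufferedWriter(output, OUTPUT_CHARSET)) {
            for (Path input : inputs) {
                // Decode with the real input charset so the quotes become U+201C/U+201D.
                try (BufferedReader reader = Files.newBufferedReader(input, INPUT_CHARSET)) {
                    String line;
                    while ((line = reader.readLine()) != null) {
                        writer.write(line);
                        writer.newLine();
                    }
                }
            }
        }
    }
}
```

Setting the encoding only on the BufferedWriter cannot help if the reader has already decoded the bytes with the wrong charset, because by then the quotes are lost. Whatever opens the merged file must also read it with the output charset.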
 
Campbell Ritchie
Marshal
Posts: 79239
377
Success
 