
BufferedWriter with prime quotation marks

 
Tai Lo
Greenhorn
Posts: 5
I am trying to take multiple files and merge them all into one new file. Everything up to this point works except for one item. (Note: I receive these files from elsewhere, so I have no control over changing anything in them.)

When I write the file, I get weird symbols replacing the prime quotation marks that were in the original file. I can't figure out what I am doing wrong. I have tried setting the encoding in BufferedWriter, but that does not seem to make a difference. Any help would be great. Thanks!

Here is the original file:


Here is what it looks like in the newly created file:


 
Campbell Ritchie
Sheriff
Posts: 55292
How do you know the files are encoded in UTF-8? How do you know the quote marks are being read correctly by your reader? Try printing them to the console; if that does not work try the old‑fashioned technique of displaying them on an option pane, so you can verify they are being read correctly.
 
Tai Lo
Greenhorn
Posts: 5
Both console and option pane print out the following:

 
Campbell Ritchie
Sheriff
Posts: 55292
Are you reading in UTF-8? If so, it is likely that your input files are not UTF-8 and you are therefore not reading them correctly. Is there any way to find out the encoding of the files you are reading?
If not, you might try printing out each char as an int (preferably in hexadecimal) and comparing your output with the Unicode values. You might even use the read() method, because that gives you the individual characters. The characters \u201c and \u201d are “ and ” respectively. Here is a table which tells you how Unicode characters are encoded in UTF-8. If you go to the U+2000 block, you will find these quotes there. If your quotes don't come out as e2809c/e2809d, then you aren't using UTF-8.
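The hex check described above could look roughly like the sketch below. The class and method names (QuoteHexDump, charsAsHex, bytesAsHex) are made up for illustration and are not from the original poster's code. It prints the code units of a string in hexadecimal and shows the bytes that the two curly quotes produce when encoded as UTF-8.

```java
import java.nio.charset.StandardCharsets;

class QuoteHexDump {
    // Returns each char (UTF-16 code unit) of the text as 4-digit hex, space separated.
    static String charsAsHex(String text) {
        StringBuilder sb = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%04x", (int) c));
        }
        return sb.toString();
    }

    // Returns the bytes as one contiguous lowercase hex string.
    static String bytesAsHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String quotes = "\u201c\u201d";
        // The chars as read, e.g. what you would print for each value returned by read().
        System.out.println(charsAsHex(quotes)); // 201c 201d
        // The UTF-8 byte sequences of the left and right double quotation marks.
        System.out.println(bytesAsHex("\u201c".getBytes(StandardCharsets.UTF_8))); // e2809c
        System.out.println(bytesAsHex("\u201d".getBytes(StandardCharsets.UTF_8))); // e2809d
    }
}
```

If the chars you read back from your file print as something other than 201c and 201d, the reader is probably decoding with the wrong charset.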
 
Tai Lo
Greenhorn
Posts: 5
Thanks for the hint. I completely neglected to check the files' encoding. It turns out it was ANSI (Cp1252). Using that encoding seems to have solved my issue. Thanks!
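For anyone who finds this thread later, here is a minimal sketch of that kind of fix. It is not the actual code from this thread: the class name MergeFiles, the merge method and the temporary file names are all hypothetical, and writing the merged file as UTF-8 is an assumption. The point is only that the reader is given the charset the input files really use (windows-1252, which Java also accepts under the alias Cp1252), while the writer is given whatever charset the output should have.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

class MergeFiles {
    // Appends every input file, decoded with inputCharset, to one output file written as UTF-8.
    static void merge(List<Path> inputs, Path output, Charset inputCharset) throws IOException {
        try (BufferedWriter writer = Files.newBufferedWriter(output, StandardCharsets.UTF_8)) {
            for (Path input : inputs) {
                try (BufferedReader reader = Files.newBufferedReader(input, inputCharset)) {
                    String line;
                    while ((line = reader.readLine()) != null) {
                        writer.write(line);
                        writer.newLine();
                    }
                }
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Charset cp1252 = Charset.forName("windows-1252");

        // Create two small Cp1252 sample files containing curly quotes (illustrative data only).
        Path first = Files.createTempFile("part1", ".txt");
        Path second = Files.createTempFile("part2", ".txt");
        Path merged = Files.createTempFile("merged", ".txt");
        Files.write(first, "\u201cfirst\u201d".getBytes(cp1252));
        Files.write(second, "\u201csecond\u201d".getBytes(cp1252));

        merge(Arrays.asList(first, second), merged, cp1252);

        // Read the merged file back as UTF-8 to confirm the quotes survived.
        for (String line : Files.readAllLines(merged, StandardCharsets.UTF_8)) {
            System.out.println(line);
        }
    }
}
```

Setting the encoding only on the writer cannot help if the reader has already decoded the bytes with the wrong charset, which would explain why changing the BufferedWriter made no difference.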
 
Campbell Ritchie
Sheriff
Posts: 55292
Success
 