
reading and writing Strings with accent marks

 
za zan
Greenhorn
Posts: 4
I'm reading in Spanish words from a text file, with characters like á and ñ. Then I'm separating them into groups and writing them to new text files. The Strings with accent marks look fine in the input text file and look fine in the output text file. However, some of the logic that separates them into groups is behaving incorrectly. This is because accented characters are treated as multiple characters in the logic, i.e. á is treated as Ã¡. So for example, the logic thinks cádiz and cáliz have the same first 3 letters (cÃ¡), when in fact they only have the same first 2 letters (cá). So cádiz and cáliz are both put in the cá output file, instead of cádiz in the cád file and cáliz in the cál file.

How do I get the logic to treat accented characters correctly? How do I get cádiz in the cád file and cáliz in the cál file?

Does it have something to do with Locale? If so, exactly what code do I need to write, because I've been messing around with Locale for a while now with no luck.
 
Greg Charles
Sheriff
Posts: 3015
It might have something to do with character encodings, depending on how the text is stored in the text files. Once you pull it into Java Strings though, everything should be Unicode, and those special characters should be single chars, not a combination of two. How exactly do you read the input files? A Reader? An InputStream? Some combination of the two?
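
To make the single-char point concrete, here is a minimal sketch of how the symptom in the question can arise. It assumes the file holds UTF-8 bytes that get decoded as ISO-8859-1; the class and helper names are made up for illustration, and the accented letter is written as a Unicode escape so the example does not depend on how the source file itself is saved.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {

    // Hypothetical helper: encodes the word as UTF-8 (as it would sit on disk)
    // and decodes those bytes again with the given charset.
    static String decodeUtf8BytesAs(String word, Charset charset) {
        byte[] onDisk = word.getBytes(StandardCharsets.UTF_8);
        return new String(onDisk, charset);
    }

    public static void main(String[] args) {
        String cadiz = "c\u00e1diz"; // "cadiz" with an accented a (U+00E1)
        String caliz = "c\u00e1liz"; // "caliz" with an accented a (U+00E1)

        // Wrong charset: the two UTF-8 bytes of the accented a become two chars.
        String badCadiz = decodeUtf8BytesAs(cadiz, StandardCharsets.ISO_8859_1);
        String badCaliz = decodeUtf8BytesAs(caliz, StandardCharsets.ISO_8859_1);
        System.out.println(badCadiz.length()); // 6
        System.out.println(
                badCadiz.substring(0, 3).equals(badCaliz.substring(0, 3))); // true

        // Right charset: the accented a is one char, so the prefixes differ.
        String goodCadiz = decodeUtf8BytesAs(cadiz, StandardCharsets.UTF_8);
        String goodCaliz = decodeUtf8BytesAs(caliz, StandardCharsets.UTF_8);
        System.out.println(goodCadiz.length()); // 5
        System.out.println(
                goodCadiz.substring(0, 3).equals(goodCaliz.substring(0, 3))); // false
    }
}
```

With the wrong charset both words share the same three-char prefix, which would explain why they land in the same output file.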
 
za zan
Greenhorn
Posts: 4
Here is my code for reading in the words.

Lots of words get printed out with Ã¡, none with á. However, the words in the input text file and later the output text files have á and not Ã¡. It looks like only Java is treating the accented character as 2 characters.
 
za zan
Greenhorn
Posts: 4
This forum will not allow me to attach files with extension .txt, .list, or no extension, so I've copied and pasted a sample of the input text file into this post, which contains some of the offending characters.

anexarán
anexarás
anexaría
anexasen
anexases
anexaste
anexemos
anexitis
anfibias
anfibios
angelita
angelito
angelote
angoleña
angoleño
 
Greg Charles
Sheriff
Posts: 3015
Well, I think the problem is that your files are encoded as UTF-8, and you are reading them as ISO-8859-1 (or Cp1252 if you're on Windows). I'm not sure why your files are encoded that way though, since the standard encoding should be fine for Spanish. In any case, FileReader uses the platform's default encoding, so you want to read the file through a Reader that is told explicitly to decode UTF-8, such as an InputStreamReader wrapped around a FileInputStream.
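
The code originally posted with this reply is not preserved in the thread, so what follows is only a sketch of the approach described, not the poster's actual code. It assumes one word per line and UTF-8 files; the class and method names are hypothetical. The writer side also names UTF-8 explicitly, on the assumption that the output files should stay in the same encoding as the input.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class Utf8WordFiles {

    // Reads one word per line, decoding the file's bytes as UTF-8
    // regardless of the platform default encoding.
    static List<String> readWords(File file) throws IOException {
        List<String> words = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(new InputStreamReader(
                new FileInputStream(file), StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                String word = line.trim();
                if (!word.isEmpty()) {
                    words.add(word);
                }
            }
        }
        return words;
    }

    // Writes one word per line, encoding as UTF-8 so output matches input.
    static void writeWords(File file, List<String> words) throws IOException {
        try (BufferedWriter out = new BufferedWriter(new OutputStreamWriter(
                new FileOutputStream(file), StandardCharsets.UTF_8))) {
            for (String word : words) {
                out.write(word);
                out.newLine();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Round trip through a temporary file with two accented sample words.
        File tmp = File.createTempFile("words", ".txt");
        tmp.deleteOnExit();
        List<String> sample = new ArrayList<>();
        sample.add("c\u00e1diz");
        sample.add("c\u00e1liz");
        writeWords(tmp, sample);
        for (String word : readWords(tmp)) {
            System.out.println(word.length());
        }
    }
}
```

Read this way, each accented letter arrives as a single char, so prefix logic based on substring sees cád and cál as different groups.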

 
za zan
Greenhorn
Posts: 4
Thanks a lot. That seems to have solved my problem.
 
Campbell Ritchie
Marshal
Posts: 56536
Too difficult a question for the beginner's forum. Moving.

And have a look at the Joel Spolsky article: here.
 
Neo Zoon
Greenhorn
Posts: 2
Greetings to all,
The post is outdated, but I think this comes up all the time. I had this issue myself, since my company mostly uses French, which has characters with accents.
The solution I found is using Unicode.
This is not from me; I found it at http://www.eteks.com/tips/tip3.html
It's in French, but the table is very clear. The author offers HTML codes for accents too.

Enjoy.
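
As a small illustration, assuming "using Unicode" here means writing accented letters as Unicode escapes in Java source (the linked page is not reproduced here, so that reading is a guess), a sketch with hypothetical constant names could look like this:

```java
public class UnicodeEscapes {

    // Hypothetical constants: accented letters spelled as Unicode escapes,
    // so the source file can stay plain ASCII.
    static final String A_ACUTE = "\u00e1";
    static final String E_ACUTE = "\u00e9";
    static final String N_TILDE = "\u00f1";

    public static void main(String[] args) {
        // Builds the sample word "angolena" with n-tilde as its 7th letter.
        String word = "angole" + N_TILDE + "a";
        System.out.println(word.length());        // 8
        System.out.println((int) word.charAt(6)); // 241, i.e. U+00F1
    }
}
```

This only covers literals inside the source code; text read from files still needs the right encoding on the Reader.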
 
Campbell Ritchie
Marshal
Posts: 56536
Welcome to the Ranch
 
Neo Zoon
Greenhorn
Posts: 2
Thanks Ritchie, I hope I can help and get helped in here ^^
 