
US-ASCII (again)

 
Martin Wallgren
Greenhorn
Posts: 6
Hi all,

I've read this forum for a while and it helped me a lot when I took the SCJP exam in November. I've been working on the SCJD off and on for the last few weeks, and after all the reading on this forum I finally decided to register. I'm treating the exam as a full learning experience and I'm testing out all the crazy ideas I have in my head (I'm guessing I'll have rewritten all the important classes quite a few times by the time I'm done).

Here are my questions.

I'm currently deciding how to approach the all-too-familiar character encoding issue. I've read some threads about it in the forum, and here are my thoughts so far.

When I'm converting a String to byte[] for writing to the db file, this is what I had planned.





My alternative at the moment is to use



That line is obviously shorter than my loop above. The motivation for the loop is that it gives me an IllegalArgumentException if there are any illegal characters in the String. The exception allows me to give some feedback to the user about the issue.

The easier one-line conversion adds to the code's simplicity (those junior programmers aren't the brightest stars in the sky), but it will store any faulty characters as ?, and just ignoring them feels wrong straight down to the bone.
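For readers comparing the two options, here is a minimal sketch of what each might look like. It is not the code from the original post; the class name `AsciiEncoding` and the method names `toAsciiStrict` and `toAsciiLenient` are made up for illustration, and `StandardCharsets` assumes Java 7 or later.

```java
import java.nio.charset.StandardCharsets;

final class AsciiEncoding {

    // Strict variant (hypothetical name): rejects any char outside 7-bit US-ASCII.
    static byte[] toAsciiStrict(String value) {
        byte[] bytes = new byte[value.length()];
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            if (c > 0x7F) {
                // Report position and code point so the caller can give user feedback.
                throw new IllegalArgumentException(
                        String.format("Non-ASCII character U+%04X at index %d", (int) c, i));
            }
            bytes[i] = (byte) c;
        }
        return bytes;
    }

    // Lenient variant (hypothetical name): String.getBytes replaces every
    // unmappable character with the charset's replacement byte, '?' for US-ASCII.
    static byte[] toAsciiLenient(String value) {
        return value.getBytes(StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        String input = "Malm\u00F6"; // contains one non-ASCII character
        System.out.println(new String(toAsciiLenient(input), StandardCharsets.US_ASCII));
        try {
            toAsciiStrict(input);
        } catch (IllegalArgumentException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
```

The trade-off is the one described above: the lenient call is a single line but loses information silently, while the strict loop costs a few more lines and fails loudly.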

Short questions:

What are your arguments for and against the two choices?

How can I test my loop?
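One possible way to test such a conversion is to run every `char` value through it and check that only 0x00 to 0x7F are accepted. The sketch below assumes the method under test throws IllegalArgumentException on bad input; `AsciiLoopCheck`, `encode`, and `countFailures` are hypothetical names, and `encode` is only a stand-in (built on `CharsetEncoder.canEncode`) to be swapped for the real loop.

```java
import java.nio.charset.StandardCharsets;
import java.util.function.Function;

final class AsciiLoopCheck {

    // Stand-in for the conversion under test (hypothetical); replace with your own.
    static byte[] encode(String value) {
        if (!StandardCharsets.US_ASCII.newEncoder().canEncode(value)) {
            throw new IllegalArgumentException("Non-ASCII input");
        }
        return value.getBytes(StandardCharsets.US_ASCII);
    }

    // Feeds every single-char string to the encoder and counts wrong outcomes:
    // an ASCII char must yield exactly its own byte, anything else must throw.
    static int countFailures(Function<String, byte[]> encoder) {
        int failures = 0;
        for (int c = 0; c <= 0xFFFF; c++) {
            String s = String.valueOf((char) c);
            try {
                byte[] bytes = encoder.apply(s);
                boolean ok = c <= 0x7F && bytes.length == 1 && bytes[0] == (byte) c;
                if (!ok) {
                    failures++;
                }
            } catch (IllegalArgumentException e) {
                if (c <= 0x7F) {
                    failures++; // a valid ASCII char was rejected
                }
            }
        }
        return failures;
    }

    public static void main(String[] args) {
        System.out.println("strict failures:  " + countFailures(AsciiLoopCheck::encode));
        System.out.println("lenient failures: "
                + countFailures(s -> s.getBytes(StandardCharsets.US_ASCII)));
    }
}
```

Running the same check against the plain one-line conversion makes the difference visible: every non-ASCII char counts as a failure there, because it is replaced rather than rejected.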

[ January 24, 2008: Message edited by: Martin Wallgren ]
[ January 28, 2008: Message edited by: Martin Wallgren ]
 
Ulf Dittmer
Rancher
Posts: 42969
Welcome to JavaRanch.

In the real world it's obviously unacceptable to restrict text to US-ASCII, so I'm guessing that this is something stipulated by the SCJD exercise.

Either all input is US-ASCII (in which case there will be no problems with the conversion) or it's not (it sounds as if that's the case here). In the latter case there is no meaningful content-preserving transformation using either of these methods. The way to store non-ASCII content in an ASCII database would be to encode the data using base-64 or a similar encoding that transforms binary data to ASCII. That can easily be reversed for display purposes, but it would no longer be easy to use SQL operations (like comparisons) on the stored data.
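As a rough illustration of the base-64 idea, the sketch below encodes text as UTF-8 bytes and then as base-64, which yields only ASCII characters. It assumes Java 8 or later for `java.util.Base64`; the class name `AsciiSafeText` and its method names are invented for this example.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

final class AsciiSafeText {

    // Hypothetical helper: turns arbitrary text into an ASCII-only string.
    static String toAsciiSafe(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return Base64.getEncoder().encodeToString(utf8);
    }

    // Hypothetical helper: reverses toAsciiSafe for display purposes.
    static String fromAsciiSafe(String stored) {
        byte[] utf8 = Base64.getDecoder().decode(stored);
        return new String(utf8, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "Malm\u00F6";
        String stored = toAsciiSafe(original);
        System.out.println(stored);
        System.out.println(fromAsciiSafe(stored).equals(original));
    }
}
```

Note that the stored form is longer than the original text and no longer sorts or compares like it, which is the drawback mentioned above.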
 
Martin Wallgren
Greenhorn
Posts: 6
Originally posted by Ulf Dittmer:
in the latter case there is no meaningful content-preserving transformation using either of these methods.


This is why I'm throwing an exception if I encounter illegal characters.
[ January 28, 2008: Message edited by: Martin Wallgren ]
 