
NX URLyBird 1.3.2: Extracting null terminated strings from ByteBuffer

 
Larry Cha Cha
Greenhorn
Posts: 15
In my read method I am reading one record at a time into a ByteBuffer and then decoding it into a CharBuffer. Next I need to obtain each of the fields, so I get a CharSequence for the first field, which is the hotel's name.
The toString method of the CharSequence leaves me with a string the length of HOTEL_NAME_BYTES, so I then have to take everything up to the null character.
I asked in another thread for a quick example of how to do this (roughly the approach sketched below), but it does not seem to work for me: nullpos comes back as -1, meaning the null char couldn't be found.
If I print out each character in my 64-character string I get the hotel name followed by several blanks.
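What I'm doing looks more or less like this (just a sketch; RECORD_LENGTH, HOTEL_NAME_BYTES and the charset name stand in for whatever the assignment really specifies):

// Read one record and try to cut the first field at the null terminator.
ByteBuffer record = ByteBuffer.allocate(RECORD_LENGTH);
fileChannel.read(record);
record.flip();

CharBuffer chars = Charset.forName("US-ASCII").decode(record);
String rawName = chars.subSequence(0, HOTEL_NAME_BYTES).toString();

int nullPos = rawName.indexOf('\0');   // comes back as -1: no null char in the field
String hotelName = (nullPos >= 0) ? rawName.substring(0, nullPos) : rawName;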
I feel like this is going to be a red-faced moment for me, but please help.
Cheers.

 
Larry Cha Cha
Greenhorn
Posts: 15
OK, it seems my strings aren't null terminated at all but are padded with spaces (i.e. character 32). A good old trim() shortens them down, so that problem's gone now.
Does my technique of reading a full record into a ByteBuffer, decoding it all to a CharBuffer, and then sequentially obtaining each field, like in the code above, seem OK? It feels a bit long-winded to me.
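So, building on the sketch above, the field extraction now boils down to something like this (placeholder constants again):

// The fields are space padded rather than null terminated, so trim() is enough.
String hotelName = chars.subSequence(0, HOTEL_NAME_BYTES).toString().trim();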
 
Jim Yingst
Wanderer
Sheriff
Posts: 18671
Does my technique of reading a full record into a ByteBuffer, decoding it all to a CharBuffer, and then sequentially obtaining each field, like in the code above, seem OK? It feels a bit long-winded to me.
Well, it can be made a little more efficient. That Charset won't change, so you can hold it in a static final variable - no need to look it up for each record. And you can skip the CharsetDecoder and CharBuffer allocation - Charset's decode() does that for you. (I'm not suggesting putting a CharsetDecoder in a static variable, because it's not thread-safe.)
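A rough sketch of what I mean (the names are just illustrative, not from anyone's actual submission):

// The Charset never changes, so look it up exactly once.
private static final Charset CHARSET = Charset.forName("US-ASCII");

// Charset.decode() handles the decoder and the CharBuffer allocation internally,
// so there's no need to manage a CharsetDecoder yourself.
private CharBuffer decodeRecord(ByteBuffer record) {
    record.flip();
    return CHARSET.decode(record);
}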
Now, because the assignment uses US-ASCII, it's possible to convert bytes to chars just by casting. But I think that's a bad approach, because in the real world customers often don't know the difference between US-ASCII and Cp-1042 and ISO-8859-1 and any other encoding. So while they may say that something is US-ASCII, don't trust them. Put an explicit Charset in your class, so that later, if someone realizes the encoding is wrong, they can fix it just by changing the name. If you had just used casting to convert bytes to chars, how would a junior programmer have any idea how to fix that? Better to put a Charset in there now to show them how it's done.
Note also that FileChannel does not guarantee it will fill the buffer when it reads. (It usually does, unless it's a really big buffer, but don't rely on this.) Wrap the read() in a loop and check the return value to decide if the read is complete.
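A read loop along these lines, say (a sketch, not anyone's actual submission code):

// Keep reading until the buffer is full; a return value of -1 means
// end-of-file was hit before a complete record could be read.
private void readFully(FileChannel channel, ByteBuffer buffer) throws IOException {
    while (buffer.hasRemaining()) {
        if (channel.read(buffer) == -1) {
            throw new IOException("Unexpected end of file while reading a record");
        }
    }
}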
 
Philippe Maquet
Bartender
Posts: 1872
Hi Jim,
I also decided not to hardcode the encoding, but I don't use the Charset class. Here is what I do:
I have a String charsetName property, set by default to "US-ASCII" (which is documented as supported by all Java platforms).
My setCharsetName() method may throw an UnsupportedEncodingException. If that happens, and before throwing it, I reset charsetName to its default "US-ASCII" value. You may also pass null as the charsetName, in which case the default encoding for the platform will be used.
To read a record (writing is similar), I follow these steps:
  • read from the fileChannel using a ByteBuffer
  • get a byte[] by calling ByteBuffer.array()
  • get each String field value by calling new String() with the configured charsetName, roughly as sketched below
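A sketch of the whole scheme (a sketch only; the real field offsets and lengths come from the database schema):

import java.io.UnsupportedEncodingException;

public class Data {

    private static final String DEFAULT_CHARSET_NAME = "US-ASCII";

    // null means "use the platform's default encoding"
    private String charsetName = DEFAULT_CHARSET_NAME;

    public void setCharsetName(String charsetName) throws UnsupportedEncodingException {
        try {
            if (charsetName != null) {
                // Probe the name; this throws UnsupportedEncodingException
                // if the encoding is not supported on this platform.
                new String(new byte[0], charsetName);
            }
            this.charsetName = charsetName;
        } catch (UnsupportedEncodingException e) {
            this.charsetName = DEFAULT_CHARSET_NAME;   // reset to the default before throwing
            throw e;
        }
    }

    // Called for each field of the record byte[] obtained from the FileChannel.
    private String readField(byte[] record, int offset, int length)
            throws UnsupportedEncodingException {
        return (charsetName == null)
                ? new String(record, offset, length)                // platform default
                : new String(record, offset, length, charsetName);
    }
}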



I am not sure I am OK. Any comments are welcome.
Cheers,
Philippe.
     
Jim Yingst
Wanderer
Sheriff
Posts: 18671
I also decided not to hardcode the encoding
Perhaps I should clarify; I actually do hardcode the encoding, but I do so in exactly one place, with a nice explicit declaration that should be obvious to junior programmers.

The encoding could be made configurable at runtime; I didn't see the need, though. I'm more concerned with making sure there's at least some acknowledgement in the program that the encoding could be replaced with something other than US-ASCII. I distrust techniques that implicitly assume an encoding, such as new String(byte[]) or RandomAccessFile's readLine() method. That's the sort of thing that confuses junior programmers later, since they didn't realize an encoding was assumed and have no idea that they might need to change it, or how.
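Just to make the contrast concrete (a throwaway example, not anyone's assignment code):

// Implicitly assumes the platform's default encoding - easy to overlook later.
static String implicitDecode(byte[] bytes) {
    return new String(bytes);
}

// The assumed encoding is visible and can be changed in one obvious place.
static String explicitDecode(byte[] bytes) throws UnsupportedEncodingException {
    return new String(bytes, "US-ASCII");
}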
Sorry, that's just another of my rants. Back to your comments...
To read a record (writing is similar), I follow these steps:
Looks good. There are a number of ways to do this; it probably doesn't matter too much which one you use. I tried to use NIO classes wherever possible because I wanted to try them out some more. Also, if performance were an issue, I could argue that the NIO classes are probably more efficient, because they are set up to allow more things to be handled by the OS using native code, which will usually be faster. But I doubt that really matters, as long as whatever technique you use gives correct results and is readable.
One quibble: I'd put something inside the catch - an e.printStackTrace(), or better yet a logging statement (setting up logging is very worthwhile IMO) - so that if someone ever alters the other part of the code that tests the charset, and an exception is thrown later, you will have some evidence on screen or in a log file, which will aid in locating the problem.
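Something along these lines, say (the logger name and variables are just illustrative):

private static final Logger log = Logger.getLogger("suncertify.db");

String field;
try {
    field = new String(record, offset, length, charsetName);
} catch (UnsupportedEncodingException e) {
    // Should never happen - the name was validated in setCharsetName() -
    // but if someone changes that code later, this leaves evidence in the log.
    log.log(Level.WARNING, "Unsupported encoding: " + charsetName, e);
    field = new String(record, offset, length);   // fall back to the platform default
}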
Here's another simple solution:
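Something like this, say (a sketch; the variable names are from the earlier example):

String field = null;
try {
    field = new String(record, offset, length, charsetName);
} catch (UnsupportedEncodingException e) {
    // With assertions enabled (java -ea), this fails loudly right here.
    assert false : e;
}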

Assuming that code is run with assertions enabled (in testing at least, and maybe in production), this will throw a nice obvious error if a problem occurs.
Yet another option (in fact, the one I like best) is to not bother catching the UnsupportedEncodingException at all, if you're sure it should not occur. It's a RuntimeException so it's not checked anyway. If you're right and it's never thrown, great, you didn't clutter up your code worrying about it. If you're wrong, well, the exception will stop the program and force someone to deal with the problem then and there - and personally I think that's the best thing that could happen here. In some final production systems this might be viewed as completely unacceptable - they'd rather the system continue to operate at all costs. But I usually take the opposite view unless the customer indicates otherwise.
     
Philippe Maquet
Bartender
Posts: 1872
Hi Jim,
Thank you for your input.
There are a number of ways to do this;

Yes, maybe even too many IMHO :-). It looked a bit messy to me, and that's why I wrote "I am not sure I am OK" (my tests showed it works).
Also, if performance were an issue, I could argue that the NIO classes are probably more efficient

That's a good motivation to try other techniques. I'll test the Charset class and compare both approaches in a loop before deciding which one I keep.
I'd put something inside the catch - an e.printStackTrace(), or better yet a logging statement (setting up logging is very worthwhile IMO) - so that if someone ever alters the other part of the code that tests the charset, and an exception is thrown later, you will have some evidence on screen or in a log file, which will aid in locating the problem.

I fully agree with you - you convinced me - but I prefer your third solution (as you do :-)). I'll add a warning comment in setCharsetName() anyway, to emphasize the dependency.
Yet another option (in fact, the one I like best) is to not bother catching the UnsupportedEncodingException at all, if you're sure it should not occur. It's a RuntimeException so it's not checked anyway.

Oops! I thought it was a checked exception! Then there must be a mistake in my Java doc.
As a matter of fact, if the program cannot decode the database encoding anymore, or - even worse (!) - begins to write to the database using the wrong encoding, I think it's a good thing that it crashes. :-)
Regards,
Phil.
     
Jim Yingst
Wanderer
Sheriff
Posts: 18671
Oops! I thought it was a checked exception!
Sorry, my bad - I confused UnsupportedEncodingException with UnsupportedCharsetException. They mean approximately the same thing, but UEE is thrown by traditional IO classes and is a checked exception (as you thought), while UCE is thrown by some NIO classes (like Charset.forName(), which I've been using in my code) and is unchecked. So, good thing I listed the options, since the last one I listed was wrong. Unless you switch to NIO, of course.
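For the record, the two look like this side by side (a quick illustration, nothing more):

import java.io.UnsupportedEncodingException;
import java.nio.charset.Charset;
import java.nio.charset.UnsupportedCharsetException;

public class EncodingVsCharset {
    public static void main(String[] args) {
        // Traditional IO: throws the *checked* UnsupportedEncodingException.
        try {
            new String(new byte[0], "NO-SUCH-ENCODING");
        } catch (UnsupportedEncodingException e) {
            System.out.println("checked: " + e);
        }

        // NIO: Charset.forName() throws the *unchecked* UnsupportedCharsetException.
        try {
            Charset.forName("NO-SUCH-CHARSET");
        } catch (UnsupportedCharsetException e) {
            System.out.println("unchecked: " + e);
        }
    }
}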
     