Why is padding required when converting a byte array to hexadecimal?

 
Greenhorn
Recently, I found this code snippet while searching on the subject of cryptography and hashing. I was able to understand most of it, apart from the part that pads with a zero.
Why is it really required? What happens if we don't do that? Why do we pair the hexadecimal digits? I searched the internet but couldn't find a reasonable explanation.
I would like you guys to show me what I am missing here. Thank you.



Is it because each hex digit is equivalent to 4 binary bits? So if we want to represent a byte, which is 8 bits, we need two hex digits?
 
Saloon Keeper
If you didn't zero-pad each byte, then how would you see the difference between the hexadecimal representation of [17] (hex 11) and [1, 1] (hex 0101)?
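To make that ambiguity concrete, here is a minimal sketch. The class and method names (`PaddingDemo`, `unpaddedHex`, `paddedHex`) are made up for illustration and are not from the snippet the original poster found.

```java
// Hypothetical demo class; names are illustrative only.
class PaddingDemo {

    // Concatenates hex digits WITHOUT padding: values 0-15 produce one character.
    static String unpaddedHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(Integer.toHexString(b & 0xFF)); // mask to treat the byte as 0..255
        }
        return sb.toString();
    }

    // Same conversion, but every byte always contributes exactly two characters.
    static String paddedHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            String hex = Integer.toHexString(b & 0xFF);
            if (hex.length() == 1) {
                sb.append('0'); // the zero-padding in question
            }
            sb.append(hex);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] one = {17};
        byte[] two = {1, 1};
        System.out.println(unpaddedHex(one) + " vs " + unpaddedHex(two)); // 11 vs 11
        System.out.println(paddedHex(one) + " vs " + paddedHex(two));     // 11 vs 0101
    }
}
```

Without padding, both arrays produce the same string, so the output cannot be decoded back to the original bytes.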
 
Ranjith Suranga
Greenhorn
Ya, my bad, silly question. I still can't figure out why it didn't come to my mind. Anyway, thank you very much, sir.  
 
Saloon Keeper
Another way to do it where padding happens automatically.
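The snippet that accompanied this post is not preserved in this copy of the thread. Going by the replies, it was a table-lookup conversion; the following is only a sketch of that general technique, with an assumed class name (`HexLookup`) and an assumed choice of uppercase digits.

```java
// Sketch of a nibble lookup; not the poster's actual code.
class HexLookup {
    private static final char[] HEX = "0123456789ABCDEF".toCharArray();

    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(HEX[(b >> 4) & 0x0F]); // high nibble
            sb.append(HEX[b & 0x0F]);        // low nibble
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(toHex(new byte[]{1, 1}));      // 0101
        System.out.println(toHex(new byte[]{(byte) 0xDC})); // DC
    }
}
```

Padding is "automatic" here because each byte is always split into exactly two nibbles, and each nibble always maps to one character.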
 
Saloon Keeper

Carey Brown wrote:Another way to do it where padding happens automatically. ...



A classic, and generally very efficient. Except use StringBuilder instead of StringBuffer. StringBuffer has synchronization overhead that's not necessary here.
 
Sheriff
Another improvement is to pre-allocate the size: new StringBuilder(hash.length * 2). I've also used versions that write to a char[] instead of a StringBuilder, but that requires keeping track of your own index.
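A char[] variant along those lines might look like the sketch below. The class name `HexChars` and the lowercase digit table are assumptions for illustration; only the `hash.length * 2` sizing comes from the post.

```java
// Sketch of the char[] approach; names are illustrative.
class HexChars {
    private static final char[] HEX = "0123456789abcdef".toCharArray();

    static String toHex(byte[] hash) {
        char[] out = new char[hash.length * 2]; // exactly two characters per byte
        int i = 0;                              // the index we have to track ourselves
        for (byte b : hash) {
            out[i++] = HEX[(b >> 4) & 0x0F];
            out[i++] = HEX[b & 0x0F];
        }
        return new String(out);
    }

    public static void main(String[] args) {
        System.out.println(toHex(new byte[]{0, 10, (byte) 255})); // 000aff
    }
}
```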
 
Carey Brown
Saloon Keeper
Updated:
 
Tim Holloway
Saloon Keeper

Rob Spoor wrote:Another improvement is to pre-allocate the size: new StringBuilder(hash.length * 2). I've also used versions that write to a char[] instead of a StringBuilder, but that requires keeping track of your own index.



Definitely. ALWAYS try and allocate StringBuilder/Buffers to the size of the result, if you know it. And if you don't, allocate the size of the largest expected result, excepting maybe in cases where a smaller value is common 99% of the time.

This also applies to Collections.

The default size is 16 characters. If you need a 17th character, a new buffer roughly twice as large as the old one is allocated and everything in the old buffer has to be copied over. This repeats as the new buffer(s) fill up in turn. A fixed worst-case size eliminates this overhead.
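The growth can be observed through `StringBuilder.capacity()`. This is a small sketch with a made-up class name (`CapacityDemo`); the exact capacity after growth is an implementation detail, so the code only prints it rather than relying on a particular number.

```java
// Hypothetical demo of StringBuilder capacity behaviour.
class CapacityDemo {

    // Capacity of a builder created with the no-argument constructor (documented as 16).
    static int defaultCapacity() {
        return new StringBuilder().capacity();
    }

    // Capacity of a default-sized builder after appending the given number of characters.
    static int capacityAfterAppending(int chars) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < chars; i++) {
            sb.append('x');
        }
        return sb.capacity();
    }

    // Capacity of a pre-sized builder after filling it exactly: no reallocation is needed.
    static int presizedCapacityAfterFilling(int size) {
        StringBuilder sb = new StringBuilder(size);
        for (int i = 0; i < size; i++) {
            sb.append('x');
        }
        return sb.capacity();
    }

    public static void main(String[] args) {
        System.out.println(defaultCapacity());                // 16
        System.out.println(capacityAfterAppending(17));       // larger than 16; exact value may vary
        System.out.println(presizedCapacityAfterFilling(64)); // 64
    }
}
```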

Also, since you mention it:


Also, since byte is supposed to be unsigned, the "& 0x0F" on the first character is supposed to be redundant. Java should promote the unsigned byte to a positive integer before shifting. However, it's safer to do this, since some systems do consider "bytes" as signed, and in such cases, the byte 0xDC would end up as 0xFFFFFFDC before shifting, and 0xFFFFFFFD (int value -3) after the shift and that would be unfortunate.

Oh, and I know I'm the odd one out here, but I prefer "nybble", in solidarity with "byte".
 
Rob Spoor
Sheriff

Tim Holloway wrote:


I actually prefer Character.forDigit(..., 16), possibly wrapped in Character.toUpperCase. Why not use the APIs you're given?
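For reference, the standard-library call that maps a value 0-15 to a hex character is `Character.forDigit(int digit, int radix)`, which returns lowercase letters. A minimal sketch of using it in place of a lookup table, with an assumed class name (`ForDigitHex`):

```java
// Sketch: nibble-to-character conversion via the Character API.
class ForDigitHex {

    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            // forDigit yields '0'-'9' and 'a'-'f'; toUpperCase is optional.
            sb.append(Character.toUpperCase(Character.forDigit((b >> 4) & 0x0F, 16)));
            sb.append(Character.toUpperCase(Character.forDigit(b & 0x0F, 16)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(toHex(new byte[]{(byte) 0xAB, 5})); // AB05
    }
}
```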

Also, since byte is supposed to be unsigned, the "& 0x0F" on the first character is supposed to be redundant. Java should promote the unsigned byte to a positive integer before shifting. However, it's safer to do this, since some systems do consider "bytes" as signed, and in such cases, the byte 0xDC would end up as 0xFFFFFFDC before shifting, and 0xFFFFFFFD (int value -3) after the shift and that would be unfortunate.


In Java, byte is signed, with values from -128 to 127 (inclusive). If you use the correct shift, >>>, it will use 0 to fill the "gap" on the left; if you use >> it will use the sign bit: 1 for negative numbers, 0 for non-negative numbers. But using 0x0F is always safe. (I actually had to look up which one used 0 and which one used the sign bit; it's easy to get them confused.)
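A small sketch of what the two shifts do to the byte 0xDC (the class name `ShiftDemo` is made up). One detail worth seeing in code: a byte is promoted to a 32-bit int before either shift, so even with >>> the upper bits of a negative byte are still set, and the mask is what actually isolates the nibble.

```java
// Hypothetical demo of >> versus >>> on a negative byte.
class ShiftDemo {
    static final byte B = (byte) 0xDC; // -36 as a signed byte, 0xFFFFFFDC once promoted to int

    static int signedShift()   { return B >> 4; }          // 0xFFFFFFFD, i.e. -3
    static int unsignedShift() { return B >>> 4; }         // 0x0FFFFFFD, still not a nibble
    static int maskedNibble()  { return (B >> 4) & 0x0F; } // 0x0D, the high nibble

    public static void main(String[] args) {
        System.out.println(Integer.toHexString(signedShift()));   // fffffffd
        System.out.println(Integer.toHexString(unsignedShift())); // ffffffd
        System.out.println(Integer.toHexString(maskedNibble()));  // d
    }
}
```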
 
Tim Holloway
Saloon Keeper

Rob Spoor wrote:
I actually prefer Character.forDigit(..., 16), possibly wrapped in Character.toUpperCase. Why not use the APIs you're given?



Well, if you're going to go that route, the first thing I'd do is check to see if there was a standard method to convert the whole string and be done with it.  

The sample Java code I gave actually can be replaced with about two machine language instructions on some hardware architectures - e.g., the IBM System/360. And commonly was. I forget if the Intel x86 can be that terse, however.

I was also thinking that Java was one of the languages with unsigned bytes. Sorry about that.
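On the "standard method to convert the whole string" idea: assuming Java 17 or later is available, `java.util.HexFormat` does the whole conversion in one call. A minimal sketch, with a made-up wrapper class name (`StdHex`):

```java
import java.util.HexFormat;

// Sketch only; requires Java 17+, where java.util.HexFormat was introduced.
class StdHex {

    // Lowercase, two digits per byte, no separators.
    static String toHex(byte[] bytes) {
        return HexFormat.of().formatHex(bytes);
    }

    // Same, with uppercase digits.
    static String toUpperHex(byte[] bytes) {
        return HexFormat.of().withUpperCase().formatHex(bytes);
    }

    public static void main(String[] args) {
        System.out.println(toHex(new byte[]{1, 1}));             // 0101
        System.out.println(toUpperHex(new byte[]{(byte) 0xDC})); // DC
    }
}
```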

 
Bartender
Gentlemen,

this is disappointing. I would have expected at least some refactoring and a failing test.
 
Ranch Hand
Whenever I encounter such "conversions", it always bothers me that Java doesn't support them out of the box, and that most of the methods the SE API provides don't do what one expects.
I don't know how things were when Java was invented (back in the 90s), but I guess outputting internal binary data in a more human-readable way, like a bitmask or hex, was always one of the most basic things a machine needed to be able to do.
Remember how long it took Java until the once internal-only Base64 was made public?
Is there still no "easy way", apart from regex or self-built few-liners, to get padding and sign handling right? Are we still slaves to Apache Commons?
 
Stephan van Hulst
Saloon Keeper
What do you mean? What's wrong with Integer.toHexString() or String.format()?
 
Matt Wong
Ranch Hand

Stephan van Hulst wrote:What do you mean? What's wrong with Integer.toHexString() or String.format()?


Integer.toHexString: it doesn't 0-pad. If the value is 0-15 (0-F) it returns a String with a single character. That's the main issue in the original post, hence the question of why padding has to be done manually.
Since toHexString is located in the Integer class, it suggests working on 32-bit values (apart from Integer, it exists only in Long, Float and Double; the smaller datatypes Byte, Short and Character don't provide it, and toString() with an additional radix parameter is only provided by Integer and Long). And since a "hex string" is nowadays associated with the two-letter encoding of an 8-bit byte, it could be expected to at least pad to a length that is a multiple of 2, although it could be debated whether padding to a 6-character string should be used when the value fits in 24 bits.
String.format(): with something like %02x it's easy to represent a value as a 0-padded 2-character hex string, but what happens internally (regex parsing, converting, etc.) is very heavy overhead for something that should be simple.

In addition, the parseInt method in my eyes lacks support for correctly parsing negative values.

parses correctly and doesn't throw any exception

instead of parsing to the negative integer max, it throws a NumberFormatException
Even more surprising:

which doesn't make sense at all, correctly parses to negative max int, although everything except char is signed in Java. Hence "negative (negative max int)" should throw the exception, as it would result in int max + 1, which doesn't fit anymore.
That's what I meant with "what the SE API provides doesn't provide what's expected".
Sure, as a long-time dev one might know these quirks, but for someone new who is just learning Java, or maybe programming at all, this doesn't seem to make sense. That mostly causes such threads here, which are mostly answered with "because someone decided so and made it spec > (link to spec here)", which, although maybe technically correct, isn't really a good explanation.
 
Marshal

Matt Wong wrote:. . . Integer.toHexString: doesn't 0-pad . . . the smaller datatypes Byte, Short and Character doesn't provide it - also, the toString() with additional radix parameter is only provided by Integer and Long) . . .

And what is wrong with System.out.printf("%#04x%n", myByteValue)? That looks perfectly all right to me.
Remember that one does arithmetic with longs, ints, and doubles; maybe they thought the smaller datatypes don't need toHexString() methods. Or maybe they forgot. Who knows. But you can still write System.out.printf("%#06x%n", +myCharValue). Note the unary +.
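A short sketch of those format strings in action, using `String.format` so the results can be inspected (the class name `FormatDemo` and the sample values are made up). Note that the width in `%#04x` counts the `0x` prefix, and that a negative byte is formatted as its unsigned 8-bit value.

```java
// Hypothetical demo of zero-padded hex formatting.
class FormatDemo {

    static String padded(byte b)     { return String.format("%02x", b); }  // two digits, no prefix
    static String prefixed(byte b)   { return String.format("%#04x", b); } // "0x" plus two digits
    static String prefixedChar(char c) {
        // %x does not accept a Character; the unary + promotes the char to int.
        return String.format("%#06x", +c);
    }

    public static void main(String[] args) {
        System.out.println(padded((byte) 10));   // 0a
        System.out.println(padded((byte) -1));   // ff
        System.out.println(prefixed((byte) 10)); // 0x0a
        System.out.println(prefixedChar('A'));   // 0x0041
    }
}
```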

in addition the parseInt method in my eyes lacks support for correctly parsing negative values correctly . . .

. . . Have you read its documentation recently? Notice that keyboard int input is different from hexadecimal int literals in code. You are not allowed a decimal int literal larger than 2147483647, except that 2147483648 is permissible if preceded immediately by the sign change operator (unary minus). That is exactly what you are showing. It's what the Java® Language Specification describes for decimal integer literals. It is the same behaviour as in decimal; if you look at the decimal version of Integer#parseInt(), you will find it says it gives the same result as ...parseInt("123", 10). If "2147483648" won't parse in decimal, why should its hexadecimal equivalent "80000000"? I am not familiar with this method, and there is nothing stopping you from writing your own parseInt() method.
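The behaviour under discussion can be checked directly. The inputs below are chosen for illustration and are not necessarily the ones from the missing snippets; the class name `ParseDemo` is made up. `Integer.parseUnsignedInt` (Java 8+) is included as the standard way to read "80000000" as a 32-bit pattern.

```java
// Hypothetical demo of Integer.parseInt at the edge of the int range.
class ParseDemo {

    // Returns true if Integer.parseInt accepts the input in the given radix.
    static boolean parses(String s, int radix) {
        try {
            Integer.parseInt(s, radix);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(parses("7fffffff", 16));   // true: Integer.MAX_VALUE
        System.out.println(parses("80000000", 16));   // false: 2147483648 does not fit in an int
        System.out.println(parses("-80000000", 16));  // true: Integer.MIN_VALUE
        System.out.println(parses("2147483648", 10)); // false, same rule in decimal
        System.out.println(Integer.parseUnsignedInt("80000000", 16) == Integer.MIN_VALUE); // true
    }
}
```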

. . . as long year dev one might know these quirks - but for a new one . . . this doesn't seem to make sense . . . although maybe technically correct, isn't really a good explanation

Whenever you use a new method or class, make sure to look at its documentation. We give the best explanations we can, but we cannot know what happened at Sun all those years ago, before many users of this website were born.
 
Stephan van Hulst
Saloon Keeper
Matt Wong, you asked why Java doesn't have an easy way to convert integers to padded hexadecimal strings, and when somebody pointed out String.format(), which does EXACTLY what one wants, you complain that it has too much overhead.

Well, I don't really know what you intend to do with your hexadecimal strings, but I don't think that the overhead of formatting an integer comes anywhere NEAR the overhead of printing a string or writing it to file or another data stream.

Note that a formatter conversion like %#08x is about as declarative as it gets, and the formatting engine is free to use a solution that is optimized for that particular format specifier.
 
Campbell Ritchie
Marshal

Stephan van Hulst wrote:. . . String.format(), which does EXACTLY what one wants . . .

Maybe when the % tags were introduced they thought that would solve the problem of not padding hex Strings with 0s.
 