
Converting AES from C# to Java?

 
Terrance Samson
Ranch Hand
Posts: 57
1
Hi, I'd really appreciate a conversion from C# to Java of the following code, but the thing is that it needs to make compatible data/files between the two versions of the program, so both must have an IDENTICAL effect on the data! For reference, AesManaged and all things related to that are imported from System.Security.Cryptography, and EncryptionMode is just an enumeration of two values (Encrypt and Decrypt).  I apologize for the messy for loop, but I needed to do essentially the exact same thing conditionally in two different directions, and didn't want a ton of redundant code.  You may also disagree with the way that I'm doing parts of the algorithm, but I have my reasons, and that's really not the issue, so please just stick to the conversion, rather than comments like, "This would be a better way to do it...", because I really just want to convert precisely what is here to have the same exact effect in both languages.  Anyway, I suggest reading the aesCipher function at the bottom first, because it's the simplest, and it's used by the aes function above it:

 
Rancher
Posts: 181
15
My approach here would be to establish a large collection of unit tests first before you do any coding to convert from C# to Java. If there aren't any existing unit tests for the C# code, then either write some or hash a bunch of values by hand and write down the input/output bytes. This will give you a set of data that you can use to reliably transfer the behavior from language to language (since test cases are a universal idea).

Beyond that, in place of AesManaged, I recommend using the Cipher class like this:

(you can attach an IV as well after the secret key)
That'll need to be in a try/catch block.
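For readers following along, here is a minimal sketch of Cipher usage along those lines. It is not the code from the original post: the class name `AesBlockSketch`, the helper names, and the choice of `AES/ECB/NoPadding` are all assumptions (AesManaged pads with PKCS7 unless told otherwise, for which `AES/ECB/PKCS5Padding` would be the usual Java counterpart). The sample key and block are the AES-256 example vector from FIPS-197 Appendix C.3, which also makes a handy cross-language test case.

```java
import java.security.GeneralSecurityException;
import javax.crypto.Cipher;
import javax.crypto.spec.SecretKeySpec;

final class AesBlockSketch {
    // Encrypts or decrypts data whose length is a multiple of 16 bytes.
    // ECB with no padding is an assumption; adjust to match the C# settings.
    static byte[] aesCipher(byte[] key, byte[] data, boolean encrypt) {
        try {
            Cipher cipher = Cipher.getInstance("AES/ECB/NoPadding");
            int mode = encrypt ? Cipher.ENCRYPT_MODE : Cipher.DECRYPT_MODE;
            // For a mode that takes an IV (e.g. CBC), an IvParameterSpec
            // would be passed as a third argument after the key.
            cipher.init(mode, new SecretKeySpec(key, "AES"));
            return cipher.doFinal(data);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("AES operation failed", e);
        }
    }

    // Parses a hex string such as "0a1b" into bytes.
    static byte[] hex(String s) {
        byte[] out = new byte[s.length() / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) Integer.parseInt(s.substring(2 * i, 2 * i + 2), 16);
        }
        return out;
    }

    // Formats bytes as lowercase hex, treating each byte as unsigned.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] key = hex("000102030405060708090a0b0c0d0e0f"
                       + "101112131415161718191a1b1c1d1e1f");
        byte[] plain = hex("00112233445566778899aabbccddeeff");
        byte[] encrypted = aesCipher(key, plain, true);
        System.out.println(toHex(encrypted)); // prints 8ea2b7ca516745bfeafc49904b496089
        System.out.println(toHex(aesCipher(key, encrypted, false)));
    }
}
```

Running the same vector through the C# side and comparing the hex output is one way to check that both programs agree before any file I/O gets involved.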

The rest of it looks like it can stay mostly the same.
 
Terrance Samson
Ranch Hand
Posts: 57
1
Thank you, but now I have another problem: I've written the Java version and it encrypts and decrypts correctly, but it doesn't give the same result for an encrypted file as the C# version does, so even though it's encrypting, it's just not quite doing it the same way.  As far as I can tell, it's doing exactly the same thing as the C# version that I posted above (though I separated the functions slightly differently), but it's using CBC rather than ECB, because when I used ECB it gave me an error that I'm not supposed to use an IV with it.  In any case, I wouldn't think it would matter if I switch to CBC mode, since I'm only encrypting one block at a time, anyway, but what do you think?  Anyway, here's my Java code:

 
Stephan van Hulst
Saloon Keeper
Posts: 12161
258
Encryption algorithms are supposed to output a different ciphertext each time you encrypt the same message. ECB does not, and this is exactly one of its weaknesses.

IVs are absolutely pointless in ECB mode. You can remove the IV completely if you're going to use ECB.

Normally I'd strongly discourage you from using ECB, but I think I recall from your last topic that you won't be persuaded.
 
Terrance Samson
Ranch Hand
Posts: 57
1
Well, as for producing different ciphertext each time it's encrypted, keep in mind that I'm talking about encrypting it with the same key and the same algorithm, so theoretically the only way that it would be different is if it's padded with random data, which isn't necessary because all that I'm dealing with is divisible by the block size and the key size.  And other than that, I don't want it to produce different ciphertext because the whole point is that I need to be able to encrypt something with either version of my program (C# or Java) and then decrypt with the other one.

I can't remove the IV because C# is using it, and I have to be compatible between the two programs, to produce exactly the same files.

But do you have any idea why I'd be getting different results in my two versions of the program?  I can only think of 2 possible reasons at this point (but I'm open to ideas):

- Notice that I'm setting the key size to 256 and the block size to 128 in the C# version but I don't think I'm specifying it at all in Java.  How would I do that?

- In C# I'm using unsigned bytes, but Java doesn't seem to have that type, so I use the fixSign function declared at the bottom, which is intended to keep all of the bits the same, even if the number is technically different.  Do you think that should work as is?
 
Stephan van Hulst
Saloon Keeper
Posts: 12161
258

Terrance Samson wrote:so theoretically the only way that it would be different is if it's padded with random data, which isn't necessary because all that I'm dealing with is divisible by the block size and the key size


The IV is supposed to be random. Encryption is not safe if you use a fixed IV. However, even if you used CBC with a randomized IV, it would be meaningless because you're not making proper use of the feedback mode.

Terrance Samson wrote:And other than that, I don't want it to produce different ciphertext because the whole point is that I need to be able to encrypt something with either version of my program (C# or Java) and then decrypt with the other one.


You must prepend the IV to the encrypted data, and then during decryption reuse the prepended IV. This will work just fine even if you're decrypting with a different application. But like I said above, it's pointless because you're not using the feedback mode properly.
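The prepend-the-IV scheme can be sketched roughly as follows. This is only an illustration under assumptions, not anyone's production code: the class name `IvPrefixSketch`, the method names, and the choice of `AES/CBC/PKCS5Padding` are all hypothetical.

```java
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

final class IvPrefixSketch {
    private static final int IV_LEN = 16; // AES block size in bytes

    // Returns IV || ciphertext, with a fresh random IV for every message.
    static byte[] encrypt(byte[] key, byte[] plaintext) {
        try {
            byte[] iv = new byte[IV_LEN];
            new SecureRandom().nextBytes(iv);
            Cipher cipher = Cipher.getInstance("AES/CBC/PKCS5Padding");
            cipher.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(key, "AES"),
                        new IvParameterSpec(iv));
            byte[] body = cipher.doFinal(plaintext);
            byte[] out = new byte[IV_LEN + body.length];
            System.arraycopy(iv, 0, out, 0, IV_LEN);
            System.arraycopy(body, 0, out, IV_LEN, body.length);
            return out;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("encryption failed", e);
        }
    }

    // Reads the IV back from the first 16 bytes, then decrypts the rest.
    static byte[] decrypt(byte[] key, byte[] message) {
        try {
            Cipher cipher = Cipher.getInstance("AES/CBC/PKCS5Padding");
            cipher.init(Cipher.DECRYPT_MODE, new SecretKeySpec(key, "AES"),
                        new IvParameterSpec(message, 0, IV_LEN));
            return cipher.doFinal(message, IV_LEN, message.length - IV_LEN);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("decryption failed", e);
        }
    }
}
```

Because the IV travels with the message, any program that knows the key and the layout can decrypt it, even though two encryptions of the same plaintext produce different bytes.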

Terrance Samson wrote:I can't remove the IV because C# is using it, and I have to be compatible between the two programs, to produce exactly the same files.


Go ahead and change the IV in your C# application and see if it makes any difference for the produced ciphertext.

Terrance Samson wrote:But do you have any idea why I'd be getting different results in my two versions of the program?


Because your C# version uses ECB and your Java version uses CBC. CBC uses the IV. ECB does not.
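A small experiment makes the difference visible. The sketch below (class and method names are hypothetical, and the all-zero key and block are just placeholders) encrypts one 16-byte block three ways. For a single block, CBC computes E(P xor IV), so an all-zero IV gives the same bytes as ECB and any other IV does not.

```java
import java.security.GeneralSecurityException;
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

final class ModeComparisonSketch {
    // Encrypts one block with the given transformation; iv may be null for ECB.
    static byte[] run(String transformation, byte[] key, byte[] iv, byte[] block) {
        try {
            Cipher cipher = Cipher.getInstance(transformation);
            SecretKeySpec keySpec = new SecretKeySpec(key, "AES");
            if (iv == null) {
                cipher.init(Cipher.ENCRYPT_MODE, keySpec);
            } else {
                cipher.init(Cipher.ENCRYPT_MODE, keySpec, new IvParameterSpec(iv));
            }
            return cipher.doFinal(block);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] key = new byte[32];   // placeholder key, all zeros
        byte[] block = new byte[16]; // placeholder plaintext block
        byte[] zeroIv = new byte[16];
        byte[] otherIv = new byte[16];
        otherIv[0] = 1;

        byte[] ecb = run("AES/ECB/NoPadding", key, null, block);
        byte[] cbcZero = run("AES/CBC/NoPadding", key, zeroIv, block);
        byte[] cbcOther = run("AES/CBC/NoPadding", key, otherIv, block);

        System.out.println(Arrays.equals(ecb, cbcZero));  // prints true
        System.out.println(Arrays.equals(ecb, cbcOther)); // prints false
    }
}
```

So a block-at-a-time CBC port only matches an ECB original if the IV happens to be all zeros; switching the Java side to ECB without an IV removes the dependency altogether.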

Terrance Samson wrote:Notice that I'm setting the key size to 256 and the block size to 128 in the C# version but I don't think I'm specifying it at all in Java.  How would I do that?


Java uses the size of the byte array that the key is in, so just use a 32 byte array. Setting the block size is pointless, because AES always uses 128 bits.
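As a quick check of that point, here is a sketch (the class and method names are made up for illustration) showing that the key length alone selects AES-128 or AES-256, while the block size reported by the cipher stays at 16 bytes. A key length that AES does not define is rejected at init time.

```java
import java.security.GeneralSecurityException;
import javax.crypto.Cipher;
import javax.crypto.spec.SecretKeySpec;

final class KeySizeSketch {
    // Initialises AES with a key of the given length and reports the block size.
    // Throws IllegalStateException if the provider rejects the key length.
    static int blockSizeFor(int keyBytes) {
        try {
            Cipher cipher = Cipher.getInstance("AES/ECB/NoPadding");
            cipher.init(Cipher.ENCRYPT_MODE,
                        new SecretKeySpec(new byte[keyBytes], "AES"));
            return cipher.getBlockSize();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("key of " + keyBytes + " bytes rejected", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(blockSizeFor(16)); // AES-128, prints 16
        System.out.println(blockSizeFor(32)); // AES-256, prints 16
    }
}
```

One caveat worth knowing: older Java runtimes refused 256-bit keys unless the unlimited-strength policy files were installed, so a 32-byte key failing on an old JRE does not necessarily mean the code is wrong.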

Terrance Samson wrote:In C# I'm using unsigned bytes, but Java doesn't seem to have that type, so I use the fixSign function declared at the bottom, which is intended to keep all of the bits the same, even if the number is technically different.  Do you think that should work as is?


I would argue that it's a mistake to use ints to represent binary data in the first place. Why are you storing your key in an int array?
 
Terrance Samson
Ranch Hand
Posts: 57
1

Stephan van Hulst wrote:You must prepend the IV to the encrypted data, and then during decryption reuse the prepended IV. This will work just fine even if you're decrypting with a different application. But like I said above, it's pointless because you're not using the feedback mode properly.



Oh, I see.  So the whole point of the IV is to cram random data at the beginning so that if you keep xoring the blocks together then even the first actual data block will end up more random?  Please excuse the fact that I'm a bit rusty on the purpose of the IV.

Stephan van Hulst wrote:Because your C# version uses ECB and your Java version uses CBC. CBC uses the IV. ECB does not.



Alright, so the fact that I'm giving the C# version an IV to use doesn't necessarily mean that it's actually using it, but rather, it just holds whatever I give it, and in this case it's not doing anything with it at all?

Stephan van Hulst wrote:Java uses the size of the byte array that the key is in, so just use a 32 byte array. Setting the block size is pointless, because AES always uses 128 bits.



Alright, so like with the IV, it's not actually making use of what I give it if it turns out to be irrelevant?

Stephan van Hulst wrote:I would argue that it's a mistake to use ints to represent binary data in the first place. Why are you storing your key in an int array?



Because just for testing purposes until I get it working properly, I hard-coded the key into the code as a byte array in C# just for convenience, so I copied that into Java and it threw a fit, telling me that the bytes have values that are too large.
 
Stephan van Hulst
Saloon Keeper
Posts: 12161
258

Terrance Samson wrote:Oh, I see.  So the whole point of the IV is to cram random data at the beginning so that if you keep xoring the blocks together then even the first actual data block will end up more random?


Exactly, but it only works if you use a new random IV for each entire message and let the cipher mode perform its feedback magic. The algorithm you wrote is like a variant of ECB, even if you use CBC for the individual blocks.

Terrance Samson wrote:Alright, so the fact that I'm giving the C# version an IV to use doesn't necessarily mean that it's actually using it, but rather, it just holds whatever I give it, and in this case it's not doing anything with it at all?


Right.

Terrance Samson wrote:Alright, so like with the IV, it's not actually making use of what I give it if it turns out to be irrelevant?


I forgot. I'm not sure if Java allows you to set a different block size, but if so then the algorithm is not truly AES, but rather a variant of Rijndael. I'm pretty sure C# won't allow you to set a block size other than 128 for AesManaged.

Terrance Samson wrote:Because just for testing purposes until I get it working properly, I hard-coded the key into the code as a byte array in C# just for convenience, so I copied that into Java and it threw a fit, telling me that the bytes have values that are too large.


Use hexadecimal literals. They are unsigned.
 
Terrance Samson
Ranch Hand
Posts: 57
1

Stephan van Hulst wrote:Use hexadecimal literals. They are unsigned.



Actually, that's what I've been using, but when I copied the array from C# to Java, anywhere that the hex number was greater than 127 (or whatever that is in hexadecimal) there was an error saying that it couldn't convert an int to a byte, so I just made it an array of ints and then made a function to convert it to a byte array.

Also, I noticed that when I encrypt using Java I get fewer characters for the same file than when I encrypt with C#, but the files sizes are the same.  For Java, it mostly looks like Chinese text, and for C# it just looks like normal symbols and letters.

I surmised that the Java version must be turning everything into Unicode characters, which are probably mostly Chinese symbols, since there are thousands of those, and the C# version is mostly ASCII stuff I suppose.  But the weird thing is that if anything, I would expect it to be the reverse of that!  In C#, I can hold unsigned bytes all the way up to 255, but in Java, I had to do the function to convert to negative numbers, just so that they'd still have the same bits, so if anything went wrong with that then I could see how everything would be clipped to within the 0 -> 127 range and then all look like ASCII, but that would imply that the Java version would look like ASCII and the C# version would look Chinese, though actually it's the opposite!  Why would that be?
 
Paul Clapham
Marshal
Posts: 25682
69
Eclipse IDE Firefox Browser MySQL Database
I don't see anywhere that your code converts bytes to Strings -- that's about the only place where Unicode would come into play. And that certainly shouldn't be happening within the encryption code you're calling -- encryption always works on bytes.

As for unsigned bytes, I don't see why you can't use "byte b = (byte) 200". That produces a byte which Java interprets as -56, but it has the same eight bits as the equivalent C#, what's that, "Byte b = 200"? As long as you aren't doing arithmetic with those numbers I don't think there should be a problem.
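To see the cast in action, a minimal sketch follows. The class `UnsignedByteSketch` and its helpers are hypothetical names, and `toBytes` is only a guess at what a function like the fixSign helper mentioned earlier might do.

```java
final class UnsignedByteSketch {
    // Narrows each int (0..255) to a byte, keeping the low eight bits unchanged.
    static byte[] toBytes(int[] values) {
        byte[] out = new byte[values.length];
        for (int i = 0; i < values.length; i++) {
            out[i] = (byte) values[i];
        }
        return out;
    }

    // Recovers the unsigned value 0..255 from a Java byte.
    static int unsigned(byte b) {
        return b & 0xFF;
    }

    public static void main(String[] args) {
        byte b = (byte) 200;
        System.out.println(b);           // prints -56
        System.out.println(unsigned(b)); // prints 200

        // Hex literals above 0x7F need the same cast when stored in a byte array.
        byte[] key = { (byte) 0xEF, 0x10, (byte) 0x80 };
        System.out.println(unsigned(key[0])); // prints 239
    }
}
```

The cast never clamps to 127; it just drops the upper 24 bits of the int, so the bit pattern that reaches the cipher is the same as in C#.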
 
Paul Clapham
Marshal
Posts: 25682
69
Eclipse IDE Firefox Browser MySQL Database

Paul Clapham wrote:I don't see anywhere that your code converts bytes to Strings -- that's about the only place where Unicode would come into play.



Although if you converted the bytes to Strings to look at them, which it sounds like you did, then Unicode would come into play. So I think that's a red herring.
 
Terrance Samson
Ranch Hand
Posts: 57
1
I didn't realize that if it's too big then it would wrap the numbers and make them negative.  I was worried that (byte)200 would just be 127 or something, since that's the maximum number.

I didn't convert them to strings, but I wrote it to a file and then opened it in Notepad, and that's where I saw the Chinese characters.  Sorry that I wasn't clear about that.
 
Paul Clapham
Marshal
Posts: 25682
69
Eclipse IDE Firefox Browser MySQL Database

Terrance Samson wrote:I didn't convert them to strings, but I wrote it to a file and then opened it in Notepad, and that's where I saw the Chinese characters.  Sorry that I wasn't clear about that.



You're better off using a hex editor to look at files which don't contain text, instead of Notepad. And when you write non-text data to a file, make sure you use an OutputStream and not a Writer, otherwise you'll get your bytes converted to Unicode text.
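The difference can be demonstrated without touching the disk. In this sketch (hypothetical class and method names; a `ByteArrayOutputStream` stands in for the file, and UTF-8 is picked explicitly where real code might silently get the platform default), the same four bytes go through an OutputStream and through a Writer.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class RawBytesSketch {
    // Writes the bytes untouched, as a FileOutputStream would.
    static byte[] viaStream(byte[] data) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (OutputStream out = sink) {
            out.write(data);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.toByteArray();
    }

    // A Writer only accepts characters, so the bytes must first be decoded
    // into a String and are then re-encoded on the way out.
    static byte[] viaWriter(byte[] data) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(sink, StandardCharsets.UTF_8)) {
            writer.write(new String(data, StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.toByteArray();
    }

    public static void main(String[] args) {
        byte[] data = { (byte) 0xFF, 0x00, (byte) 0x80, 0x41 };
        System.out.println(Arrays.equals(data, viaStream(data))); // prints true
        System.out.println(Arrays.equals(data, viaWriter(data))); // prints false
        System.out.println(viaWriter(data).length);               // prints 8
    }
}
```

Here 0xFF and 0x80 are not valid UTF-8 on their own, so each is replaced by U+FFFD and comes back out as three bytes; the original values are gone for good.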
 
Ron McLeod
Marshal
Posts: 3151
466
Android Eclipse IDE TypeScript Redhat MicroProfile Quarkus Java Linux
You can use Notepad++ with the Hex Editor plugin to view and modify byte data.

 
Terrance Samson
Ranch Hand
Posts: 57
1

Paul Clapham wrote:You're better off using a hex editor to look at files which don't contain text, instead of Notepad. And when you write non-text data to a file, make sure you use an OutputStream and not a Writer, otherwise you'll get your bytes converted to Unicode text.



Let me make sure I'm understanding you.  Are you saying that if I write bytes - not strings, but an array of bytes into a file - using any sort of writer instead of stream, then it will assume that it's text and change the bit in it, to make it Unicode compatible?

And thanks for the suggestion Ron.  I'll look into Notepad++.

EDIT: I just got and installed Notepad++.  It has a way to convert from ASCII to Hex and back, but it just prints it right there, with no spacing, which is fine, but I think that hex editor that you use looks great!  Mine doesn't seem to have it though; is it something that I have to get separately?
 
Paul Clapham
Marshal
Posts: 25682
69
Eclipse IDE Firefox Browser MySQL Database

Terrance Samson wrote:Let me make sure I'm understanding you.  Are you saying that if I write bytes - not strings, but an array of bytes into a file - using any sort of writer instead of stream, then it will assume that it's text and change the bit in it, to make it Unicode compatible?



Yes. That's what a Writer is for in Java. It's not just a matter of "changing the bit", encoding Unicode text into bytes and decoding them back is much more complicated than that. I could go on and explain encodings and charsets and UTF-8 and all that but you don't need to know all that. Just don't go there.
 
Terrance Samson
Ranch Hand
Posts: 57
1
By the way, Java has been using streams for this from the start, but I don't know whether .NET may be using a reader and writer - I'd have to check.  But that could be the discrepancy.  However, I'd have thought that if a writer translated to Unicode then wouldn't that hold all of the Chinese characters, etc.?  But then on the other hand, if all the bits are unrestricted then that would be all possible characters, though I'd expect to perhaps see a bunch of black squares or empty boxes signifying invalid characters or something.  I don't know.
 
Paul Clapham
Marshal
Posts: 25682
69
Eclipse IDE Firefox Browser MySQL Database

Terrance Samson wrote:However, I'd have thought that if a writer translated to Unicode then wouldn't that hold all of the Chinese characters, etc.?  But then on the other hand, if all the bits are unrestricted then that would be all possible characters, though I'd expect to perhaps see a bunch of black squares or empty boxes signifying invalid characters or something.  I don't know.



Yes, Unicode includes all of the Chinese characters and a whole lot of other scripts as well. But it does that by having Unicode characters represented by more than one byte -- two or three or maybe four now, I don't know. And it's possible for a sequence of bytes to not represent Unicode characters, although Java won't throw an exception when it sees those bytes.

I've said already that you don't need to know any of this stuff for your encryption project, but I guess we need to get into it. There are several (many) charsets which can be used to translate between (encoded) bytes and (decoded) Unicode characters. Some of them can only handle a subset of Unicode. I don't know which one you used and neither do you -- unless you specified one with your writer based on some code which you got off the web, but it's optional to specify a charset so most likely you didn't. The default charset varies depending on your environment, e.g. your operating system and maybe the IDE you're writing the code in. But none of the charsets are designed to take a sequence of random bytes and produce something meaningful, they are all based on the assumption that the bytes they are given were encoded from something meaningful by that charset. If you break that assumption then all bets are off.

So the fact that you get Chinese characters is just an artifact of whatever charset you happened to use.

But like I said you don't have to know that. You just need to know not to go into that swamp.

And sorry, I don't know what features C# has to deal with Unicode, but you shouldn't be using any of them for this project either.
 
Ron McLeod
Marshal
Posts: 3151
466
Android Eclipse IDE TypeScript Redhat MicroProfile Quarkus Java Linux

Terrance Samson wrote:I just got and installed Notepad++.  It has a way to convert from ASCII to Hex and back, but it just prints it right there, with no spacing, which is fine, but I think that hex editor that you use looks great!  Mine doesn't seem to have it though; is it something that I have to get separately?


Yes - you will also need to install the Hex-Editor plugin.  Use the Plugin Manager (menu: Plugins⭢Plugin Manager⭢Show Plugin Manager⭢Available) to install it.
 
Terrance Samson
Ranch Hand
Posts: 57
1
Paul: If by charset you mean something like "UTF-8" then no, I didn't pick one.  I realize that I don't need to know the specifics of how it works, but I'm just trying to diagnose what went wrong and in which of the two programs, and all evidence I have to go by is the encrypted data that they output, so that's why I'm analyzing what I see and comparing them.

Ron: I had tried that but it wasn't on the list, despite it being a pretty long list of plugins.
 
Stephan van Hulst
Saloon Keeper
Posts: 12161
258
You're likely using an OutputStream to write the encrypted data to a file in your Java application. The file now just contains binary data, which you simply cannot interpret as text. When you open such a file in a text editor, you're basically telling the editor: "Here is a bunch of data which is not text, but make it look like text anyway!". What it will look like exactly depends on the poor text editor that is called upon to fulfill such a crazy request. Most text editors will just try to interpret the data as UTF-8 though, which is why it might appear to contain a lot of CJK ideograms.

I don't know what you're doing to write the binary data in .NET, but if your text editor displays it as mostly Latin alphanumeric characters, it means you are most likely using some class that first encodes the data as Base64 or as hexadecimal before it writes it away to the file. That's because those are the most commonly used character encodings that can map any random binary message to a small readable ASCII subset. I can't tell you more than that though if you don't show us the code you used to write the data.
 
Paul Clapham
Marshal
Posts: 25682
69
Eclipse IDE Firefox Browser MySQL Database
I've been using HexEditor Neo for looking at binary files, but I'm sure there's a lot of similar tools out there. It doesn't really matter which one you choose, you just need something which works directly with bytes.
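If no hex editor is at hand, a few lines of Java can do the same job for comparing the two programs' output. This is just a sketch with made-up names (`HexDumpSketch`, `hexDump`, `firstDifference`); in practice the byte arrays would come from something like `Files.readAllBytes` on each encrypted file.

```java
final class HexDumpSketch {
    // Renders bytes as uppercase hex, 16 per line, separated by spaces.
    static String hexDump(byte[] data) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < data.length; i++) {
            if (i > 0) {
                sb.append(i % 16 == 0 ? '\n' : ' ');
            }
            sb.append(String.format("%02X", data[i] & 0xFF));
        }
        return sb.toString();
    }

    // Returns the index of the first differing byte, or -1 if the arrays match.
    // If one array is a prefix of the other, the shorter length is returned.
    static int firstDifference(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            if (a[i] != b[i]) {
                return i;
            }
        }
        return a.length == b.length ? -1 : n;
    }

    public static void main(String[] args) {
        byte[] fromJava = { (byte) 0xEF, 0x00, 0x7F };
        byte[] fromCSharp = { (byte) 0xEF, 0x01, 0x7F };
        System.out.println(hexDump(fromJava));                     // prints EF 00 7F
        System.out.println(firstDifference(fromJava, fromCSharp)); // prints 1
    }
}
```

Knowing the offset of the first mismatch is often more useful than eyeballing two dumps, since a difference in the very first block points at the cipher setup while a later one points at how the blocks are chained or written.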
 
Ron McLeod
Marshal
Posts: 3151
466
Android Eclipse IDE TypeScript Redhat MicroProfile Quarkus Java Linux

Terrance Samson wrote:I had tried that but it wasn't on the list, despite it being a pretty long list of plugins.


Hmm ... it looks like I installed it using a plugin manager which is not available anymore.

It is possible to download and install the plugin manually.  Here are the steps - proceed at your own risk (in my opinion the risk is low):

1. Download the plugin from GitHub (choose either the 32bit or 64bit version to match the version of NotePad++ that you have installed):
    64bit: NPP HexEditor
    32bit: NPP HexEditor

2. Open the zip file, and copy the HexEditor.dll file to the following directory (directory will depend on the version of NotePad++ that you have installed):
    64bit:   C:\Program Files\Notepad++\plugins\HexEditor
    32bit:   C:\Program Files (x86)\Notepad++\plugins\HexEditor

 
Terrance Samson
Ranch Hand
Posts: 57
1
Stephan: But I did post the code above.

Paul and Ron: I'll look into those, as soon as I get time.
 
Stephan van Hulst
Saloon Keeper
Posts: 12161
258
Yes, the code of your Java application. We already know that it's printing raw binary data because you are using OutputStream. The question is what your C# application does.
 
Terrance Samson
Ranch Hand
Posts: 57
1
The first post in this thread is my C# code.  Though actually, I don't think that shows any file reading/writing.  Anyway, I checked and it seems that in C# I'm always using FileStreams, to open the files, but then to read and write I'm using BinaryReaders and BinaryWriters, which I suppose is bad.  But I can't seem to find an equivalent to the DataOutputStream of Java to use for C#.  I think I've seen one kind of stream but it only handles bytes, and I really need to be able to write all sorts of primitive types.  What should I use?
 
Stephan van Hulst
Saloon Keeper
Posts: 12161
258

Terrance Samson wrote:I think I've seen one kind of stream but it only handles bytes, and I really need to be able to write all sorts of primitive types.


You mean you want to write primitive or textual data in addition to the encrypted data? I would use neither DataOutputStream in Java nor BinaryWriter in .NET, because neither has a good counterpart in the other language.

What kind of data do you want to write to file, why, and is it important to you that the data can be read using a text editor (i.e. all the data will be encoded)?
 
Terrance Samson
Ranch Hand
Posts: 57
1
No, what I'm saying is that currently the C# version is writing the encrypted data using BinaryWriter and it's reading the encrypted data using BinaryReader.  But you said that readers and writers mess with the data to try to fit it into some standard format, hence the probable reason why it approximately doubles the size of the output when viewed in Notepad (though I think the file sizes are actually the same when written from C# or from Java), and why the text is mainly English/Latin/whatever and some punctuation, accent marks, etc. when written from C#, but when using Java it's mostly Chinese, presumably because the majority of the characters are in that category so the laws of probability on anything pseudorandom is going to make it mostly that.

Is my logic sound?  And if so, then I shouldn't be using BinaryWriters and BinaryReaders in C#, but rather I should use something like a BinaryOutputStream and BinaryInputStream, except that I don't know of anything like that, and I don't know what it would be called or what to import so that I could access it.
 
Stephan van Hulst
Saloon Keeper
Posts: 12161
258
Look, why don't you just show us how you are writing the data in C#?

BinaryWriter does not perform any transformation on the binary data if you use the method that writes a byte array or a span of bytes. When you open a file written that way in a text editor, you should see the same kind of garbage as you see with the Java version.

Also, you're trying to assign meaning to binary data that was interpreted as text. Maybe the two files look different because you wrote them in a different way. Or maybe they look different because the text editor interprets them differently because they start differently.
 
Terrance Samson
Ranch Hand
Posts: 57
1
Well I didn't write an array of bytes, I wrote ints, longs, etc. in C#, but does that not apply?  You said: "...if you use the method that writes a byte array or a span of bytes."

I can't easily get the code to you because it's on a different computer which is offline, and some of the code must stay private, but I'll see what I can do.
 
Paul Clapham
Marshal
Posts: 25682
69
Eclipse IDE Firefox Browser MySQL Database
I've been leaving this thread to Stephan because he knows C# and I don't. But just one thing: an int value (assuming that's a 32-bit value?) isn't just a sequence of four bytes. When Java writes an int value to an external destination it always writes the bytes in big-endian sequence, regardless of how the underlying OS and hardware store them. This is so that the result is consistent between operating systems. That may be consistent with how C# happens to do it, or maybe not.
 
Terrance Samson
Ranch Hand
Posts: 57
1
Oh, that's a good point!  I hadn't thought of that, but I'll look into it as well!

In any case, I may not be able to work on this for just a couple of days, because as usual, things are hectic around here, but I swear I'll get to it as soon as I can, because I'm anxious to complete it!
 
Stephan van Hulst
Saloon Keeper
Posts: 12161
258
Paul is correct. BinaryWriter uses little-endian byte order for primitives.
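The byte-order mismatch is easy to reproduce on the Java side alone. In this sketch (the class and method names are invented for illustration), `DataOutputStream.writeInt` gives the big-endian layout, and a `ByteBuffer` set to `ByteOrder.LITTLE_ENDIAN` produces the layout a little-endian writer such as BinaryWriter would emit for the same value.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

final class EndianSketch {
    // Serialises an int the way DataOutputStream does: most significant byte first.
    static byte[] bigEndian(int value) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            new DataOutputStream(buffer).writeInt(value);
            return buffer.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Serialises an int least significant byte first.
    static byte[] littleEndian(int value) {
        return ByteBuffer.allocate(4).order(ByteOrder.LITTLE_ENDIAN).putInt(value).array();
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(bigEndian(0x01020304)));    // prints [1, 2, 3, 4]
        System.out.println(Arrays.toString(littleEndian(0x01020304))); // prints [4, 3, 2, 1]
    }
}
```

If primitives really do have to be exchanged between the two programs, reading and writing them through a `ByteBuffer` with an explicitly chosen order is one way to make Java match the other side.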

This begs the question, why are you writing longs and ints? That's pretty much exactly like writing random binary data as encoded text: You are assigning meaning where there isn't any.
 
Terrance Samson
Ranch Hand
Posts: 57
1
Actually, I may have confused that with something else that I'm writing in the same program.  I'm sorry.  I'm trying to do about 5 things at once and I'm also busy with about 20 other things and everything is so hectic that I can't seem to keep anything straight!  Believe me, nobody is more bothered by that than I am.  Plus, keep in mind that when I relay the information back to you it's after I looked at it a while ago on the other computer, because it doesn't have Internet access, but only this one does, so I'm always doing one thing at a time, which slows me down and makes it hard to remember things properly.  But I'll take another look at it when I get the chance and try to finally sort out what's going wrong.
 
Paul Clapham
Marshal
Posts: 25682
69
Eclipse IDE Firefox Browser MySQL Database
You started out thinking that there would be a straightforward translation, but now you've been repeatedly bitten by the universal rule "It's actually more complicated than that". Amusing for some of us, less amusing for you I'm sure. But hang in there, it's still possible that there's a solution which doesn't involve bending your original code too far out of shape.  
 
Terrance Samson
Ranch Hand
Posts: 57
1
I'm about to take another look at it.  The weird thing is, if it's true that the BinaryWriter doesn't corrupt the data at all, and leaves it in the precise binary form that it already was, then the fact that I'm dealing with essentially random data (meaning that it's encrypted so there's no discernable pattern and any bit could potentially be a 0 or 1), then regardless of which byte gets stored first or last shouldn't affect any sort of pattern or tendency either, so it theoretically should look like all possible characters, and I should be seeing mostly the Chinese alphabet in C# as well as Java.  Doesn't it at least seem like that would be the case?  So I wonder why it's not...
 
Stephan van Hulst
Saloon Keeper
Posts: 12161
258
I didn't say that BinaryWriter doesn't "corrupt" the data.

I said that the methods that write a byte array or a span of bytes don't transform the data. You yourself said that you weren't using those methods, but used the methods that wrote primitives instead. This is what's causing your problem, because those methods interpret and transform your data.

Simply put, your data does NOT represent integers. Why then are you telling the writer to write integers?
 
Stephan van Hulst
Saloon Keeper
Posts: 12161
258
Here's an explanation why writing integers makes your file "look more like Latin", while writing raw binary sequences makes your file "look more like Chinese":

When you write a random binary sequence and your text editor interprets it as some kind of Unicode encoding (most likely UTF-8), it is very likely that any printable character that the encoding has mapped to a random sub-sequence of bytes will happen to be a CJK ideogram. This is because, as you've theorized yourself, the vast majority of printable characters are CJK ideograms.

When you write every separate byte of your random binary sequence as a separate integer, the writer will encode each integer as a two's complement little endian byte sequence. That means that the byte 0xEF will actually be written as 0xEF, 0x00, 0x00, 0x00. In fact, every byte you write will be written as the byte's two's complement binary value followed by three NUL bytes. Because none of these 4 byte sequences encode to a valid UTF-8 code point, and only few of them encode to a valid UTF-16 code point, your text editor will likely interpret the byte sequences using your system's default extended ASCII code page, which is usually Latin-1. The NUL bytes are not shown, and a byte value like 0xEF is shown as 'ï'.
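The 0xEF example can be replayed in Java as a rough illustration. The names below are hypothetical, and the little-endian `ByteBuffer` is only standing in for what a writer of 32-bit integers would produce on the C# side.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

final class Latin1Sketch {
    // Expands every byte into a 4-byte little-endian integer (value, 0, 0, 0).
    static byte[] asLittleEndianInts(byte[] data) {
        ByteBuffer buffer = ByteBuffer.allocate(4 * data.length).order(ByteOrder.LITTLE_ENDIAN);
        for (byte b : data) {
            buffer.putInt(b & 0xFF); // treat the byte as unsigned, as C# does
        }
        return buffer.array();
    }

    // Decodes bytes one-to-one as Latin-1 characters.
    static String latin1(byte[] bytes) {
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        byte[] expanded = asLittleEndianInts(new byte[] { (byte) 0xEF });
        System.out.println(expanded.length);                       // prints 4
        System.out.println(latin1(new byte[] { (byte) 0xEF }));    // prints ï
    }
}
```

Note the fourfold growth in size: a file written this way would be four times as long as the raw ciphertext, which is a quick thing to check before looking at any characters.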
 
Terrance Samson
Ranch Hand
Posts: 57
1
Well actually, I'm not writing the bytes as integers into the file; I'm just writing them as bytes.  All I'm doing with integers is storing a hard-coded key as an array of ints for testing purposes (I'll use a key file later), because in C# it's an array of bytes, but in Java it threw a fit when I tried to assign values larger than 127, so I just made them ints, and then made a different array, with that function for converting and saving the int values into the bytes, but with potential negative values.

Aside from that, when I said before that I'm reading and writing other data types from the file, I was mistaken, because I'd been working on it a long time and my brain was tired, but what I was actually thinking of was a different part of the program - sorry.

In any case, I've extracted just the essential parts for the AES stuff into a separate test project (in C#) and tweaked it a bit, and made sure that I'm using streams, and so on (which it seems like I already was, anyway).  Somehow, I'm now able to write encrypted files from C# and from Java which look identical!  Even the specific characters are identical, even though I don't know what they mean!  I'm not sure exactly what fixed it, but basically, that means that my problem is solved!  Now I just have to merge the little experimental program back into the main one, but I don't think that will cause any problems (fingers crossed).  Thanks for all your help!
 
Stephan van Hulst
Saloon Keeper
Posts: 12161
258

Terrance Samson wrote:Even the specific characters are identical, even though I don't know what they mean!


They don't mean anything. It's random data.
 