
Array of unicode characters

 
Krispin Kilmurray
Greenhorn
Posts: 20
I'm working on a college project and am confused by some Microsoft documentation related to it. I'm trying to recreate an encryption key. The initial vectors for the key are (password + salt). OK, that sounds easy, but the saltValue stored in my file appears to be 32 bytes, not the 16 bytes specified in the description. The Base64-decoded value, shown as a hex string, is 82a264407ea974a3cbccb4af5f3364fd. I know this is hex and that every two characters make up one ASCII character, but the result is total gobbledygook when converted to ASCII. The output is ‚¢d@~©t£ËÌ´¯_3dý. Am I missing a step here?

The second conundrum is the password value. A string of Unicode characters is specified. So if my password is "abc", this would be "\u0061\u0062\u0063". Does anyone know whether this is the correct format to supply it in?

https://msdn.microsoft.com/en-us/library/dd925430(v=office.12).aspx

This is a link to the MS-OFFCRYPTO documentation, which mentions the Unicode characters.
 
Stephan van Hulst
Saloon Keeper
Posts: 7993
Cryptography is done on binary data, so the salt and password identifiers used in the article refer to byte arrays. That means you first need to convert the password "abc" to a byte array using some sort of character encoding.

Concatenation (the '+' operator) refers to "pasting two byte arrays together" so they form one array with the combined size.
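To make those two steps concrete, here is a minimal sketch of encoding a password and pasting it onto a salt. The class name `SaltedPassword`, the all-zero placeholder salt, the choice of UTF-16LE, and the salt-first order are all assumptions for illustration; the specification is what decides the real encoding and order.

```java
import java.nio.charset.StandardCharsets;

class SaltedPassword {
    // Pastes two byte arrays together: all of `first`, followed by all of `second`.
    public static byte[] concat(byte[] first, byte[] second) {
        byte[] joined = new byte[first.length + second.length];
        System.arraycopy(first, 0, joined, 0, first.length);
        System.arraycopy(second, 0, joined, first.length, second.length);
        return joined;
    }

    public static void main(String[] args) {
        String password = "abc";
        // The same text gives different bytes depending on the character encoding.
        byte[] utf8 = password.getBytes(StandardCharsets.UTF_8);       // 61 62 63
        byte[] utf16le = password.getBytes(StandardCharsets.UTF_16LE); // 61 00 62 00 63 00

        byte[] salt = new byte[16]; // placeholder only: a real salt is random binary data
        byte[] hashInput = concat(salt, utf16le); // assumed order: salt first

        System.out.println(utf8.length + " " + utf16le.length + " " + hashInput.length);
        // prints "3 6 22"
    }
}
```

Whatever encoding you pick, the hash only ever sees the resulting bytes, which is why the choice has to match the specification exactly.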

The salt looks like gobbledygook when you try to decode it as ASCII, because it's random binary data that was never intended to be read as text.

If the specification says that salts must be 16 bytes, then obviously the file you have does not match the specification.

Honestly, the article is really poor because they specify the password to be an array of Unicode characters, while hashing is done on binary data. They don't specify whether the data is UTF-8, UTF-16 or something else still. They also use the identifier block without specifying what it means. I wouldn't be confident writing an algorithm based on this article alone. You may actually have to dive into the ECMA-376 specification.
 
Krispin Kilmurray
Greenhorn
Posts: 20
Thanks, Stephan. The saltValue is in the XML field next to the salt size, so it's definitely correct. The salt size is specified as 16, while the Base64-decoded string is 32 bytes. Like I said, if this were hex it would make sense. I understand what you're saying about the Unicode characters. The documentation specifically mentions that the password be broken down to Unicode before being hashed. My SHA512 algorithm was already getting a byte array from the string, so this threw me off.
 
Stephan van Hulst
Saloon Keeper
Posts: 7993
What XML? Do you have an example?

Salts are only encoded to store them more easily in text files. That doesn't mean the length of the encoded string is also the length of the original salt value.
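As a quick illustration of that difference, here is a small sketch that takes the hex string quoted earlier in the thread and counts characters versus bytes. The class name `SaltLength` and the helper `fromHex` are made up for this example; `java.util.Base64` is the standard JDK class.

```java
import java.util.Base64;

class SaltLength {
    // Converts hex text to the bytes it represents: two hex digits per byte.
    public static byte[] fromHex(String hex) {
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return out;
    }

    public static void main(String[] args) {
        String hex = "82a264407ea974a3cbccb4af5f3364fd"; // hex text quoted in the thread
        byte[] salt = fromHex(hex);

        // Re-encode and decode with Base64 to show the text length differs again.
        String base64 = Base64.getEncoder().encodeToString(salt);
        byte[] decoded = Base64.getDecoder().decode(base64);

        System.out.println(hex.length());    // 32 hex characters
        System.out.println(salt.length);     // 16 bytes
        System.out.println(base64.length()); // 24 Base64 characters
        System.out.println(decoded.length);  // 16 bytes
    }
}
```

So 32 hex characters describe 16 bytes of salt; the text form is always longer than the binary value it encodes.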
 
Krispin Kilmurray
Greenhorn
Posts: 20

This is the XML from the end of the encrypted document. The saltSize is 16, and the saltValue should conform to the saltSize, but when decoded from Base64 to hex it is 32 characters long.
 
Stephan van Hulst
Saloon Keeper
Posts: 7993
I formatted your XML a little bit so it's more readable.

There are more inconsistencies with the article. The spin count is 100000, while the article specifies 50000. The cipher mode is CBC, while the article specifies ECB (why, I'm not sure; ECB is horrible). The hashing algorithm is SHA-512, while the article specifies SHA-1.

This file simply does not seem to conform to what the article prescribes.
 
Krispin Kilmurray
Greenhorn
Posts: 20
The article, although it says it is updated, seems to be based on Office 2007 (I think). There are a few other discrepancies, like the SpinCount of 50,000 mentioned in the documentation, whereas the XML shows 100,000. There are also a few slight changes made for the latest version (2013): the final iteration of the hash in the key generation previously used SHA(H(max),0), but instead of 0 there is now a specific array of bytes (0xfe, 0xa7, 0xd2, 0x76, 0x3b, 0x4b, 0x9e, and 0x79). Strangely enough, these made their way into the article.
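For reference, this is roughly how I understand the spin-count hashing to fit together, written as a sketch rather than a verified implementation. The class `SpinHash` and method `derive` are made-up names, and several details are assumptions that need checking against the specification: the password as UTF-16LE, the salt placed before the password, the loop counter prepended as a 4-byte little-endian integer, and the block key appended in the final hash.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class SpinHash {
    // The block key bytes quoted above.
    public static final byte[] BLOCK_KEY = {
        (byte) 0xfe, (byte) 0xa7, (byte) 0xd2, 0x76, 0x3b, 0x4b, (byte) 0x9e, 0x79
    };

    public static byte[] derive(String password, byte[] salt, int spinCount, byte[] blockKey) {
        try {
            MessageDigest sha512 = MessageDigest.getInstance("SHA-512");

            // Assumed first step: H0 = H(salt + password as UTF-16LE).
            sha512.update(salt);
            byte[] hash = sha512.digest(password.getBytes(StandardCharsets.UTF_16LE));

            // Assumed iteration: Hn = H(counter + Hn-1), counter as 4 little-endian bytes.
            ByteBuffer counter = ByteBuffer.allocate(4).order(ByteOrder.LITTLE_ENDIAN);
            for (int i = 0; i < spinCount; i++) {
                counter.putInt(0, i);
                sha512.update(counter.array());
                hash = sha512.digest(hash);
            }

            // Assumed final step: H(Hn + blockKey) instead of H(Hn + 0).
            sha512.update(hash);
            return sha512.digest(blockKey);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-512 is not available", e);
        }
    }

    public static void main(String[] args) {
        byte[] salt = new byte[16]; // placeholder salt, not the one from my file
        byte[] result = derive("abc", salt, 100000, BLOCK_KEY);
        System.out.println(result.length); // SHA-512 output is 64 bytes
    }
}
```

The result would still need to be cut down or padded to the key size before it can be used as an AES key, which this sketch leaves out.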

I'm trying to work out which sections are up to date and which are not. I'm relying on two references:

http://www.cjmorgan.org/tech-blog/2015/1/8/default-encryption-settings-and-behaviors-for-onenote-2013-office-365

and a PDF by Yoshinori Takesako. Unfortunately, both are missing small but important points for password verification. I've also found this:

http://source.dussan.org/raw/mirrors/poi.git/trunk/src/java/org/apache/poi/poifs/crypt/

The source code from the crypto library is either helpful or information overload when mixed with all other links.
 
Krispin Kilmurray
Greenhorn
Posts: 20
I have the section of my project related to this thread finished. It does everything I think it should but I can't seem to get a match in the output of the two values that confirm the correct password. If I were to post my code here would anyone mind having a look to see if they can spot a glaring problem in the code? There's a little bug in there somewhere I reckon but I can't find it.

 
Stephan van Hulst
Saloon Keeper
Posts: 7993
Krispin, I deleted the Apache CryptoFunctions class from your post. Its source and documentation are readily available online.

I don't know the specification and don't know exactly what you're trying to prove in your main class. You have a bunch of magical constants that you pulled from somewhere, and it's not clear what you're doing with them or why.

Furthermore, you're creating instances of CryptoFunctions, SHA512 and Hex, while these are utility classes and you can call their methods on the classes directly.

You're also not supposed to pass 0 as the cipher mode, but Cipher.DECRYPT_MODE.
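As a minimal sketch of what that looks like with the standard `javax.crypto` API (not the POI helper you were calling), here is an AES/CBC round trip that passes the named constants. The class `CipherModes`, the method `run`, and the all-zero key and IV are placeholders for illustration only.

```java
import java.security.GeneralSecurityException;
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

class CipherModes {
    // Runs AES in CBC mode without padding; `data` must be a multiple of 16 bytes.
    public static byte[] run(int mode, byte[] key, byte[] iv, byte[] data) {
        try {
            Cipher cipher = Cipher.getInstance("AES/CBC/NoPadding");
            cipher.init(mode, new SecretKeySpec(key, "AES"), new IvParameterSpec(iv));
            return cipher.doFinal(data);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] key = new byte[16]; // placeholder 128-bit key
        byte[] iv = new byte[16];  // placeholder IV, one AES block long
        byte[] plain = "sixteen byte msg".getBytes(); // exactly 16 ASCII bytes

        byte[] encrypted = run(Cipher.ENCRYPT_MODE, key, iv, plain);
        byte[] decrypted = run(Cipher.DECRYPT_MODE, key, iv, encrypted);

        // The constants are plain ints, but 0 is not one of them.
        System.out.println(Cipher.ENCRYPT_MODE + " " + Cipher.DECRYPT_MODE); // prints "1 2"
        System.out.println(Arrays.equals(plain, decrypted));                 // prints "true"
    }
}
```

Using the named constants keeps the intent readable and avoids passing a value that `Cipher.init` does not recognise as an operation mode.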
 
Krispin Kilmurray
Greenhorn
Posts: 20
Sorry, I did a poor job of explaining anything. The constants were taken from the encrypted document that I'm trying to open by verifying the hard-coded password. The salt, hashValue and hashInput are decoded from Base64, and a key is then created with a function from the crypto class. The decoded salt and blockKey are used to create an IV. Both of these byte arrays (IV and key) are used to create a cipher, which in turn is used to decrypt the decoded hashInput and hashValue values. The decrypted input is then hashed and the two are compared. If the password was correct, they should match.
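The last step, the comparison, can be sketched on its own like this. It assumes both values have already been decrypted and that the hash is SHA-512; the class `VerifierCheck` and method `matches` are made-up names, and comparing only the first 64 bytes of the decrypted hash is an assumption in case the decrypted value carries trailing padding.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

class VerifierCheck {
    // Hashes the decrypted input and compares it with the decrypted hash value.
    public static boolean matches(byte[] decryptedInput, byte[] decryptedHash) {
        try {
            byte[] expected = MessageDigest.getInstance("SHA-512").digest(decryptedInput);
            if (decryptedHash.length < expected.length) {
                return false; // too short to hold a full SHA-512 digest
            }
            // Assumption: ignore anything after the first 64 bytes (possible padding).
            byte[] actual = Arrays.copyOf(decryptedHash, expected.length);
            return MessageDigest.isEqual(expected, actual);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-512 is not available", e);
        }
    }

    public static void main(String[] args) throws NoSuchAlgorithmException {
        byte[] input = {1, 2, 3, 4}; // stand-in for the decrypted hashInput
        byte[] hash = MessageDigest.getInstance("SHA-512").digest(input);
        System.out.println(matches(input, hash)); // prints "true"
    }
}
```

If this returns false with the right password, the mismatch is upstream, in the key derivation, the IV, or the cipher setup, rather than in the comparison itself.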

The Hex and SHA512 classes were my own, which is why I included them. I instantiated the crypto class because I had initially found the code online before discovering the Apache POI library, which it is part of. The getCipher method asks for an int parameter that corresponds to Cipher.DECRYPT_MODE or Cipher.ENCRYPT_MODE. I didn't know which int corresponds to which, so I edited the getCipher method in the crypto class to just use Cipher.DECRYPT_MODE regardless of what int was passed in.
 