
conversion problem (charset ?)

 
pascal monfils
Greenhorn
Posts: 9
I need to read a Word document from a hard drive and insert it into a BLOB column in an Oracle database.
Once the document is in the DB, users can open it in Winword. To achieve this goal with the framework I must use, we retrieve the BLOB, write it to disk and then open it in Winword via Runtime.exec().

Upload and download of the document is OK (same size).
This works fine with simple text documents, but it doesn't work with complex Word documents.

After comparing the original and the produced file (by opening both documents in Notepad), we noticed that some characters are different. For example, the euro symbol in the original file is replaced by a question mark (?) in the one produced from the BLOB.

Based on this, I suspect a problem with the charset being used.
Locale.setDefault(Locale.UK) is set at the beginning of the application.

Do you think using a charset decoder/encoder can help?
Any suggestions welcome.

Thanks in advance for your help.

Pascal
 
Stefan Wagner
Ranch Hand
Posts: 1923
Linux Postgres Database Scala
I haven't worked with BLOBs so far, but from the name (binary large object) I would expect a database to store the bytes as they are, without translating anything.

How do you save and restore it?
 
Jeanne Boyarsky
author & internet detective
Marshal
Posts: 35279
Eclipse IDE Java VI Editor
Pascal,
We had this problem with CLOBs. The wrong encoding was set on the db server.

I'm surprised you are getting it with BLOBs as that is just data.
 
pascal monfils
Greenhorn
Posts: 9
In fact, we work in a strongly typed environment.
All the data is passed from the client to the server through "basetypes" (typed objects).
The only way to get at the data contained in one is to ask for some kind of representation, which is a string.

What I do is:
get the string from the basetype
get the bytes from that string
insert a record in the DB using the empty_blob() function
"select ... for update" the created record
get the Blob object from the ResultSet
create a ByteArrayInputStream with the byte[] from the string
get an OutputStream on the blob
push the bytes into the blob (InputStream to OutputStream)
close the streams
commit
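
For reference, the steps above can be sketched with plain JDBC roughly as follows. This is only a minimal sketch under assumptions: the table `documents`, its columns `id` and `content`, and the class and method names are hypothetical, and it assumes the document is already available as a raw byte[] that never went through a String. `Blob.setBinaryStream(1)` is the standard JDBC call for getting an OutputStream on the blob; older Oracle drivers may need their own API instead.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.sql.Blob;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class BlobUpload {

    // Copies every byte from in to out unchanged and returns the byte count.
    static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[8192];
        long total = 0;
        int read;
        while ((read = in.read(buffer)) != -1) {
            out.write(buffer, 0, read);
            total += read;
        }
        return total;
    }

    // Hypothetical table "documents(id, content)"; adjust to the real schema.
    static void store(Connection con, long id, byte[] document)
            throws SQLException, IOException {
        con.setAutoCommit(false); // SELECT ... FOR UPDATE needs an open transaction

        // Step 1: create the row with an empty BLOB locator (Oracle function).
        try (PreparedStatement insert = con.prepareStatement(
                "INSERT INTO documents (id, content) VALUES (?, empty_blob())")) {
            insert.setLong(1, id);
            insert.executeUpdate();
        }

        // Step 2: lock the row, fetch the locator and stream the bytes into it.
        try (PreparedStatement select = con.prepareStatement(
                "SELECT content FROM documents WHERE id = ? FOR UPDATE")) {
            select.setLong(1, id);
            try (ResultSet rs = select.executeQuery()) {
                if (!rs.next()) {
                    throw new SQLException("Row " + id + " not found after insert");
                }
                Blob blob = rs.getBlob(1);
                try (InputStream in = new ByteArrayInputStream(document);
                     OutputStream out = blob.setBinaryStream(1)) {
                    copy(in, out);
                }
            }
        }

        con.commit();
    }
}
```

Nothing in this path involves a charset, so whatever bytes are in `document` are what ends up in the BLOB. If they are already wrong at that point, the change happened earlier, where the string was turned into bytes.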

Those basetypes are available on the server and on the client.
On the client, if I load the data into a basetype (open file) and then write the content of the basetype to disk under a different name, both files are strictly identical.

Regarding the database, the DB is also accessed by an application written in PB, and there the import of documents into the BLOB column works fine.

I suspect the JDBC layer uses some kind of locale or charset defined somewhere and performs a conversion on the string part, not on the bytes ...
 
pascal monfils
Greenhorn
Posts: 9
some new info ...

I identified the bytes which are different ...
Only 75 bad bytes for a file size of 52220 bytes!
Only 5 distinct byte values appear among these 75 bytes: -112, -115, -127, -113, -99.

In each case, those bytes are replaced by the value 63!

I really don't understand what is happening!

Help still greatly needed
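
A note on those five values: read as unsigned bytes they are 0x90, 0x8D, 0x81, 0x8F and 0x9D, which are exactly the five positions that windows-1252 leaves undefined, and 63 is the ASCII code of '?'. That pattern would be consistent with the document bytes being decoded into a String and encoded back with windows-1252. That charset is an assumption here (it is a common default on Western European Windows machines), as the thread does not say which charset the platform uses. The small sketch below, with a hypothetical class and method name, reproduces the effect and compares it with ISO-8859-1, which maps every byte value to a character and back.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class CharsetRoundTrip {

    // Decodes the bytes into a String with the given charset, then encodes
    // that String back into bytes with the same charset.
    static byte[] roundTrip(byte[] original, Charset charset) {
        String asText = new String(original, charset);
        return asText.getBytes(charset);
    }

    public static void main(String[] args) {
        // The five byte values reported as damaged in the post above.
        byte[] suspects = {-112, -115, -127, -113, -99};

        // Assumed platform charset; these bytes have no character in it,
        // so they cannot survive the trip through a String.
        Charset cp1252 = Charset.forName("windows-1252");
        System.out.println(Arrays.toString(roundTrip(suspects, cp1252)));

        // ISO-8859-1 maps all 256 byte values one to one, so nothing is lost.
        System.out.println(Arrays.toString(
                roundTrip(suspects, StandardCharsets.ISO_8859_1)));
    }
}
```

If the data really has to travel as a String, using ISO-8859-1 explicitly on both sides (`new String(bytes, StandardCharsets.ISO_8859_1)` and `getBytes(StandardCharsets.ISO_8859_1)`) instead of the no-argument `getBytes()` may be enough to keep the bytes intact. Whether the basetype allows choosing the charset is something only the framework can tell.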
 