I am trying to replace a word in an MS Word file with another word and then save the modified version. Here is how I am doing it now:
I read the Word file as bytes, build a string from those bytes, and replace the word. Then I write the string to another file. The word is found and replaced, but when I write the string back to a Word file, MS Word can't open it. It seems the file is corrupted.
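The bytes-to-string step described above can already lose data before any replacement happens. Below is a minimal sketch, assuming the bytes are decoded as UTF-8 (the original post does not say which charset is used; `new String(byte[])` without a charset uses the platform default). The class and method names are hypothetical. The sample bytes are the first four bytes of the OLE2 compound file signature that .doc files start with. They are not valid UTF-8, so the decoder substitutes replacement characters, and encoding the string again does not give back the original bytes.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class RoundTripDemo {
    // Decode raw bytes as UTF-8 text, then encode the text back to bytes,
    // mimicking "read file as bytes -> build String -> write String back".
    static byte[] roundTrip(byte[] raw) {
        String text = new String(raw, StandardCharsets.UTF_8); // malformed input becomes U+FFFD
        return text.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // First four bytes of the OLE2 compound file signature used by .doc files
        byte[] header = {(byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0};
        byte[] back = roundTrip(header);
        System.out.println(Arrays.equals(header, back)); // false
    }
}
```

Decoding with ISO-8859-1 instead would round-trip every byte unchanged, but that only removes this first problem: replacing a word with one of a different length still shifts everything after it.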
I only replaced a word. I know I read the file in as binary, but I also write it back as binary. I don't know why MS Word can't display it.
Thanks for the reply. Can you explain why we can't treat binary data as a string? The file is corrupted, but the text is still in there. Why can't we just use replace? I know about POI but find it complex...
posted 8 years ago
Binary files (actually, structured file formats in general) contain a lot of additional information that can get corrupted if you insert or delete characters. A simple example: say you have a DOC file containing nothing but "This is a test sentence.", with the word "test" in bold. Somewhere in this file is the information "characters 11 to 14 are in bold". If you simply replace the word "a" with "another", every following character moves six positions to the right, so "11 to 14" no longer points at "test" and the file contents are corrupted.
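The bold-range example above can be reproduced in a few lines. This is a minimal sketch that models the "characters 11 to 14 are in bold" record as a plain pair of 1-based, inclusive positions; the class and method names are hypothetical, and real .doc files store formatting in a different, more complex way. It only shows why a fixed offset stops matching once text before it changes length.

```java
public class OffsetShiftDemo {
    // Hypothetical formatting record: returns the characters at positions
    // start..end (1-based, inclusive) that the record claims are bold.
    static String boldText(String text, int start, int end) {
        return text.substring(start - 1, end);
    }

    public static void main(String[] args) {
        String original = "This is a test sentence.";
        int boldStart = 11, boldEnd = 14; // covers "test" in the original text

        // Replace "a" with "another": the text grows by six characters,
        // but the stored positions stay the same.
        String edited = original.replace(" a ", " another ");

        System.out.println(boldText(original, boldStart, boldEnd)); // test
        System.out.println(boldText(edited, boldStart, boldEnd));   // othe
    }
}
```

A format-aware library updates such positions when it edits the text, which is what a plain string replace cannot do.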