How to read and modify MS Word files using Java

 
puff li
Greenhorn
Posts: 22
Hi,

I am trying to replace a word in an MS Word file with another word and then save the modified version. Here is how I am doing it now:

I read the Word file as bytes, construct a string from those bytes, and replace the word. Then I write the string to another file. The word is found and replaced, but when I write the string back to a Word file, MS Word can't open it. The file seems to be corrupted.

I only replaced one word. I know I initially read the file as binary, but I write it back the same way, so I don't know why MS Word can't display it.

Please help. Thanks in advance.

--
Simon
 
Ulf Dittmer
Rancher
Posts: 43081
You can't treat binary files (like DOC and DOCX) as if they were text files - simple search-and-replace does not work on them. You'll need to resort to a specialized library that knows about the DOC/DOCX file formats, like Apache POI. See http://poi.apache.org/hwpf/quick-guide.html and https://coderanch.com/how-to/java/CreateWordDocument for more information and examples.
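One way to see the problem before even reaching for a library is to check whether bytes survive a trip through a String at all. The sketch below is only an illustration: the class and method names are made up, and it assumes the bytes are decoded as UTF-8 (the original post does not say which charset was used, and the platform default may differ). It uses the first four bytes of the signature that legacy DOC files start with.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class, not part of any library.
class RoundTripDemo {

    // Decodes the bytes as UTF-8 text and encodes that text back to bytes.
    // This mirrors "read bytes, build a String, write the String out".
    static byte[] roundTrip(byte[] original) {
        String asText = new String(original, StandardCharsets.UTF_8);
        return asText.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // First four bytes of the OLE2 container signature used by legacy DOC files.
        byte[] header = {(byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0};
        byte[] result = roundTrip(header);

        System.out.println("original:   " + Arrays.toString(header));
        System.out.println("round trip: " + Arrays.toString(result));
        System.out.println("identical:  " + Arrays.equals(header, result));
    }
}
```

For this header the last line prints "identical: false": byte sequences that are not valid UTF-8 are replaced during decoding, so the file is already damaged before any word is replaced. A single-byte charset such as ISO-8859-1 would survive the round trip, but the replacement itself would still break the file's internal structure.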
 
puff li
Greenhorn
Posts: 22
Thanks for the reply. Can you tell me why we can't treat binaries as strings? The result is corrupted, but the text is still there. Why can't we use replace? I know about POI but find it complex...
 
Ulf Dittmer
Rancher
Posts: 43081
Binary files (actually, structured file formats in general) contain a lot of additional information that can get corrupted if you insert or delete characters. A simple example: Say you have a DOC file containing nothing but "This is a test sentence.". The word "test" is in bold. Somewhere in this file is the information "characters 11 to 14 are in bold". If you just replace the word "a" with "another", the text becomes longer, the word "test" moves to characters 17 to 20, and "11 to 14" is no longer accurate - the file contents are corrupted.
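The example above can be played through in a few lines of Java. This is a toy model, not the real DOC format: the class, the constants and the idea of keeping the bold range as two plain numbers are assumptions made purely for illustration. Positions are 1-based and inclusive, as in the description above.

```java
// Hypothetical toy model of "text plus formatting metadata".
class BoldRangeDemo {

    // Stand-in for the stored formatting info: "characters 11 to 14 are in bold".
    static final int BOLD_FIRST = 11;
    static final int BOLD_LAST = 14;

    // Returns the characters that the stored range marks as bold.
    static String boldPart(String text) {
        return text.substring(BOLD_FIRST - 1, BOLD_LAST);
    }

    // Replaces the standalone word "a" without touching the stored range.
    static String naiveReplace(String text) {
        return text.replaceAll("\\ba\\b", "another");
    }

    public static void main(String[] args) {
        String before = "This is a test sentence.";
        String after = naiveReplace(before);

        System.out.println("bold before: " + boldPart(before)); // prints "bold before: test"
        System.out.println("bold after:  " + boldPart(after));  // prints "bold after:  othe"
    }
}
```

The text itself looks fine after the replacement, but the range now points into the middle of "another". A format-aware library has to shift such ranges whenever the text changes, which is the bookkeeping that a plain string replace skips.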

For some insight into how complex the Office file formats really are, read Joel Spolsky's article "Why are the Microsoft Office file formats so complicated?"
 
puff li
Greenhorn
Posts: 22
Got it. Thank you.
 