
Apache POI - HWPF search & replace

 
Yahya Elyasse
Ranch Hand
Posts: 510
Eclipse IDE Google Web Toolkit Java
Hi,

I'm looking for a way to find and replace strings in an MS Word file.
I was not able to find the right method for that in the Apache POI library API. Could anyone help me?

Basically, my application takes a Word file as input (a template) and replaces
defined keys (like "${AMOUNT}") with values. To do that, I iterate through
the paragraphs and look for my pattern. Then I need some methods like
deleteString(start, begin) and insertAt(offset, text).

Or is there another way to do the same thing?
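
The delete-then-insert idea can be tried on plain text before involving any Word library. Below is a minimal sketch, assuming the paragraph text is already available as a Java String; it does not touch POI at all. `PlaceholderFiller`, `fill`, and the key pattern (upper-case letters, digits, and underscores) are names chosen for illustration, and `StringBuilder.delete` / `StringBuilder.insert` stand in for the wished-for deleteString and insertAt.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class PlaceholderFiller {
    // Matches keys such as ${AMOUNT}; the allowed key characters are an assumption.
    private static final Pattern KEY = Pattern.compile("\\$\\{([A-Z0-9_]+)\\}");

    /** Returns the paragraph text with every known ${KEY} replaced by its value. */
    static String fill(String paragraph, Map<String, String> values) {
        // First pass: record where each placeholder starts and ends.
        List<int[]> spans = new ArrayList<>();
        List<String> keys = new ArrayList<>();
        Matcher m = KEY.matcher(paragraph);
        while (m.find()) {
            spans.add(new int[] { m.start(), m.end() });
            keys.add(m.group(1));
        }

        // Second pass: edit from the last match to the first, so the offsets
        // of the matches still to be processed are not shifted by earlier edits.
        StringBuilder text = new StringBuilder(paragraph);
        for (int i = spans.size() - 1; i >= 0; i--) {
            String value = values.get(keys.get(i));
            if (value == null) {
                continue; // unknown key: leave the placeholder untouched
            }
            int[] span = spans.get(i);
            text.delete(span[0], span[1]); // the "deleteString" step
            text.insert(span[0], value);   // the "insertAt" step
        }
        return text.toString();
    }
}
```

Working backwards through the matches is the main design choice here: each replacement changes the length of the text, and editing from the end keeps the recorded offsets of the remaining matches correct.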


thanks

othman.
 
Ulf Dittmer
Rancher
Posts: 42970
73
POI can't do this. It might be possible by saving the file as RTF, and doing a search-and-replace on that, but that would be fragile. The OpenOffice Java API might be able to do this, though.
 
Yahya Elyasse
Ranch Hand
Posts: 510
Eclipse IDE Google Web Toolkit Java
Thanks for the reply.

It would be nice if you could show me some sample code for saving as RTF and doing the find/replace, just to get me started. After modifying the RTF, should I save it again as an MS Word DOC?


Does the OpenOffice Java API require a big engine installation and configuration?
If so, that is not good for my application, and the RTF solution will be my last chance.
Can you please help?

Thanks.
 
Joe Ess
Bartender
Posts: 9361
11
Linux Mac OS X Windows
Originally posted by othman El Moulat:

Does the OpenOffice Java API require a big engine installation and configuration?


OpenOffice has a fairly large footprint (~250MB on my WinXP desktop, roughly the same size as MS Office). It also does not run in-process. One starts up OO, then programmatically connects to it using a CORBA-like network object model. This is fine for server-side purposes, but if you are writing a desktop application it just won't work. To be fair, OpenOffice is a full-featured office suite, not a Word-specific Java API like POI.
The fact of the matter is Microsoft products do not play well with others. Your options include:
- Switch to MS technologies, at least for your code that interfaces with Word files.
- Switch your output to a more vendor-neutral format (HTML, XML, RTF, plain text) that is easier to work with from Java.
- Consider something like installing OO on a server and writing a web service to interface with it.
 
Ulf Dittmer
Rancher
Posts: 42970
73
It would be nice if you could show me some sample code for saving as RTF and doing the find/replace, just to get me started. After modifying the RTF, should I save it again as an MS Word DOC?


There is no Java API for converting a Word document to RTF and vice versa (although OpenOffice can do that). But if you can rearrange the workflow so that you work with RTF instead of DOC (e.g., by having the user save the document as RTF), then search and replace becomes easy, because RTF files are just text files. After you're done, just save the file with an RTF extension, and if the user then double-clicks it, on most systems Word will be started to open it.
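
To get started with the RTF route, here is a minimal sketch using only the standard library. It rests on a few assumptions that should be checked against real files: RTF escapes braces in body text, so a key typed as ${AMOUNT} is expected to appear in the file as `$\{AMOUNT\}`; Word may also split a key across formatting groups, in which case a literal search like this one will not find it. `RtfPlaceholders`, `escape`, `fill`, and `fillFile` are hypothetical names, not part of any library.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

final class RtfPlaceholders {
    /** Escapes plain text so it can be embedded in RTF body text. */
    static String escape(String text) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '\\' || c == '{' || c == '}') {
                out.append('\\').append(c);      // RTF control characters
            } else if (c < 128) {
                out.append(c);                   // plain ASCII passes through
            } else {
                // Non-ASCII: RTF Unicode escape with a signed 16-bit decimal
                // value, followed by '?' as the fallback character.
                out.append("\\u").append((int) (short) c).append('?');
            }
        }
        return out.toString();
    }

    /** Replaces the RTF-escaped form of ${key} with the escaped value. */
    static String fill(String rtf, String key, String value) {
        String placeholder = "$\\{" + key + "\\}"; // i.e. $\{KEY\} in the file
        return rtf.replace(placeholder, escape(value));
    }

    /** Reads an RTF file, fills one key, and writes the result to another file. */
    static void fillFile(Path in, Path out, String key, String value) throws IOException {
        // ISO-8859-1 maps every byte to one char, so untouched bytes round-trip.
        String rtf = new String(Files.readAllBytes(in), StandardCharsets.ISO_8859_1);
        Files.write(out, fill(rtf, key, value).getBytes(StandardCharsets.ISO_8859_1));
    }
}
```

Escaping the inserted value matters as much as finding the key: an unescaped brace or backslash in the value would change the group structure of the RTF file and could make it unreadable.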
 
Yahya Elyasse
Ranch Hand
Posts: 510
Eclipse IDE Google Web Toolkit Java
Hi,
let me explain in more detail the problem I'm trying to solve:

Build a 100% Java library (no DLL dependency) to manipulate Microsoft Word documents, with only this method:


Given an InputStream with the bytes of a .doc document (97/2000/XP), which may contain images, return an InputStream with the bytes of a Word document that differs from the original only in that every occurrence of the word searchThis is replaced with the word replaceWith.

The replacement shouldn't alter the document format in any way (keeping fonts, margins, bold, ... everything).
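
The stream-in, stream-out shape of that method can be sketched independently of how the document is actually rewritten. The following is only an illustration of the plumbing: `DocumentRewriter`, `drain`, and `rewrite` are hypothetical names, and the `transform` parameter is a placeholder for whatever ends up doing the real .doc replacement, which this sketch does not provide.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.function.UnaryOperator;

final class DocumentRewriter {
    /** Reads the whole stream into memory; I/O failures become unchecked. */
    static byte[] drain(InputStream in) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            byte[] chunk = new byte[8192];
            int read;
            while ((read = in.read(chunk)) != -1) {
                buffer.write(chunk, 0, read);
            }
            return buffer.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /**
     * Buffers the source document, applies the given byte-level transform,
     * and exposes the result as a new InputStream.
     */
    static InputStream rewrite(InputStream source, UnaryOperator<byte[]> transform) {
        return new ByteArrayInputStream(transform.apply(drain(source)));
    }
}
```

Buffering the whole document in memory keeps the sketch simple and is usually acceptable for office documents, but it is a design choice, not a requirement of the method described above.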
The OpenOffice API is big, and Apache POI can't do that currently. So I'm thinking of other approaches, like converting the document to another format, doing the replacement, and converting back to .doc. Can someone give me ideas on how to achieve this in Java (with some sample code)?

Also, since this library is going to be used in a J2EE application, is it possible to do the replacement on the client side with JavaScript? (How?)

Any ideas would be greatly welcomed.

Thanks.
 
Joe Ess
Bartender
Posts: 9361
11
Linux Mac OS X Windows
POI HWPF is open source, and there are pointers to the document format on the HWPF home page. Get the source and start reading!
 
Ulf Dittmer
Rancher
Posts: 42970
73
Have you read the HWPF Quick Guide? It suggests that this might just be possible with the current API, although the document admits that it is buggy.

JavaScript is of no use in this context.

As stated before, if you want to avoid OO, then there is no way (to my knowledge) of converting DOC to RTF or vice versa.
 
Yahya Elyasse
Ranch Hand
Posts: 510
Eclipse IDE Google Web Toolkit Java
OK, here is what I did with the POI library. The replace is working, but the format of the modified document is altered: many newlines and pages are inserted in the resulting document.

The steps I did:
1. Get the CVS code of the scratchpad related to the HWPF package.
2. Add the following method to class Range:


3. Here is the method I use to do the find and replace:


As I said, the resulting document seems to put each paragraph of the original doc on a whole page of its own...
Can someone help me fix the issue?
Many thanks.
 