Replace complete pdf text with any text preserving styling, format & images.

krishnann ravi
Hello people!

In my project, I have to identify all the text contained in a PDF and replace it with other text. Ideally the replacement text would be meaningful, but since that seems too tedious, I'm thinking of cutting the project down to just *any* text.

The format and styling (including images) of the output PDF should be preserved, and the text should not overflow onto the images. So far I have considered the PDF manipulation libraries iText and Apache PDFBox.

In Apache PDFBox, there's a program called "ReplaceString", but it needs a specific "string to replace" and a specific "replacement string". Since I need to replace every word in the PDF with *any* text, a single string replacement doesn't serve the purpose.

Here is the approach I have thought of:

Something that reads every word, counts the number of characters in it, and replaces it on the spot with *any* word of the same character count. Maybe we could use a test condition for character counts from 1 to 15.
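
To make the idea concrete, here is a rough sketch of just the word-swapping step in plain Java. It assumes the text has already been pulled out of the PDF as a String, and it does not touch the PDF itself. The class name SameLengthReplacer, the method replaceAll, and the tiny word list in the usage example are all made up for illustration; none of them come from iText or PDFBox. Words with no same-length entry in the word list fall back to random lowercase letters. Punctuation, digits and spaces are left where they are. One caveat: the same character count does not guarantee the same rendered width in a proportional font, so this alone may not fully prevent overflow.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: swaps every word for another word of the same length.
class SameLengthReplacer {
    // A "word" here is assumed to be any run of Unicode letters.
    private static final Pattern WORD = Pattern.compile("\\p{L}+");

    private final Map<Integer, List<String>> wordsByLength = new HashMap<>();
    private final Random random;

    SameLengthReplacer(List<String> dictionary, Random random) {
        this.random = random;
        // Index the dictionary by character count for quick lookup.
        for (String word : dictionary) {
            wordsByLength.computeIfAbsent(word.length(), k -> new ArrayList<>()).add(word);
        }
    }

    // Returns a copy of text in which each word is replaced by a same-length word.
    String replaceAll(String text) {
        StringBuilder out = new StringBuilder(text.length());
        Matcher m = WORD.matcher(text);
        int last = 0;
        while (m.find()) {
            out.append(text, last, m.start());   // copy punctuation/spaces unchanged
            out.append(substitute(m.group()));
            last = m.end();
        }
        out.append(text, last, text.length());   // copy whatever follows the last word
        return out.toString();
    }

    private String substitute(String original) {
        List<String> candidates = wordsByLength.get(original.length());
        if (candidates != null) {
            return candidates.get(random.nextInt(candidates.size()));
        }
        // No dictionary word of this length: fall back to random lowercase letters.
        StringBuilder sb = new StringBuilder(original.length());
        for (int i = 0; i < original.length(); i++) {
            sb.append((char) ('a' + random.nextInt(26)));
        }
        return sb.toString();
    }
}
```

It could be used like this: new SameLengthReplacer(Arrays.asList("cat", "dog", "tree", "house"), new Random()).replaceAll("The quick brown fox."). The result has the same total length as the input, with every word position filled by some word of matching length. Getting the text out of the PDF and writing it back in place is a separate problem that this sketch does not address.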

My deadline is approaching, and I have not been able to do much because I got off track.

It would be great if someone could guide me on how to approach this, and point me to any similar work done in the past that I could use and build on.

Thanks very much!


Ravi Krishnan