This article shows merging, but what I really need is to take a multipage PDF file (the parent) and insert another PDF file (the child) after page 1 of the parent. These PDF files contain text, graphics, and images.
I just spent a minute looking at the PDFBox documentation. I was curious to see whether it was possible to extract a particular page from a PDDocument, since your use of the getNumberOfPages() method in your other, parallel post suggested to me that it might be. Turns out it is possible.
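Whatever library does the actual PDF work, the page order the question asks for is a simple splice: parent page 1, then every child page, then parent pages 2 onward. Here's a minimal plain-Java sketch of just that ordering. `PageSplice` and `spliceAfter` are hypothetical names, not PDFBox API. In PDFBox 2.x the elements would be the `PDPage` objects you get from `PDDocument.getPage(int)` (0-based), with `getNumberOfPages()` giving the bounds. How you then assemble the output document depends on your PDFBox version, so check its API docs for that part.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of PDFBox): computes the page order for
// "insert every child page after page N of the parent".
final class PageSplice {
    private PageSplice() {}

    // afterPage is 1-based, matching "after page 1" in the question;
    // 0 would put all child pages in front of the parent.
    static <T> List<T> spliceAfter(List<T> parent, List<T> child, int afterPage) {
        if (afterPage < 0 || afterPage > parent.size()) {
            throw new IllegalArgumentException("afterPage out of range: " + afterPage);
        }
        List<T> result = new ArrayList<>(parent.size() + child.size());
        result.addAll(parent.subList(0, afterPage));             // parent pages 1..afterPage
        result.addAll(child);                                    // every child page, in order
        result.addAll(parent.subList(afterPage, parent.size())); // remaining parent pages
        return result;
    }

    public static void main(String[] args) {
        List<String> parent = List.of("P1", "P2", "P3");
        List<String> child = List.of("C1", "C2");
        System.out.println(spliceAfter(parent, child, 1)); // [P1, C1, C2, P2, P3]
    }
}
```

Keeping the ordering separate from the PDF library makes it easy to test, and the same helper works whether the list holds page objects, page indexes, or file names.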
The Linux platform has a whole raft of PDF creation and manipulation tools. Working with text processors is one of my specialties, so I use many of them on a fairly frequent basis. You can insert, delete and rearrange pages, split and merge documents, add or remove document properties: all sorts of stuff.
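As one example of such a tool, qpdf (my choice here, not one named above) can do the insert from the question in a single command. The sketch below only builds that command line in Java so it could be launched with `ProcessBuilder`. It assumes qpdf is installed and on the PATH, and `QpdfInsert` is a hypothetical name. In qpdf's page ranges, `z` means the last page.

```java
import java.nio.file.Path;
import java.util.List;

// Hypothetical helper: builds (but does not run) a qpdf command that inserts
// all pages of `child` after page 1 of `parent`. Assumes the qpdf CLI is installed.
final class QpdfInsert {
    private QpdfInsert() {}

    static List<String> insertAfterFirstPage(Path parent, Path child, Path out) {
        return List.of(
                "qpdf", "--empty", "--pages",
                parent.toString(), "1",   // page 1 of the parent
                child.toString(), "1-z",  // every page of the child
                parent.toString(), "2-z", // rest of the parent (parent needs at least 2 pages)
                "--",
                out.toString());
    }

    public static void main(String[] args) {
        List<String> cmd = insertAfterFirstPage(
                Path.of("parent.pdf"), Path.of("child.pdf"), Path.of("out.pdf"));
        System.out.println(String.join(" ", cmd));
        // To actually run it (throws IOException / InterruptedException):
        // int exit = new ProcessBuilder(cmd).inheritIO().start().waitFor();
    }
}
```

This prints `qpdf --empty --pages parent.pdf 1 child.pdf 1-z parent.pdf 2-z -- out.pdf`. Passing the arguments as a list rather than one string avoids shell-quoting problems with file names that contain spaces.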
Getting a DOC file into PDF is a bit more complex, and my usual go-to for that is headless LibreOffice. Do realize, however, that there is a fundamental difference between DOC and PDF. Programs like MS-Word and LibreOffice Writer are word processors: they allow creating and editing formatted text. PDFs, however, are basically typeset documents.
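A headless LibreOffice conversion can likewise be driven from Java. This sketch assumes the LibreOffice binary is called `soffice` and is on the PATH. Some distributions also or instead provide a `libreoffice` launcher. `DocToPdf` is a hypothetical name. With `--convert-to pdf`, LibreOffice writes the result into the `--outdir` directory, named after the input file.

```java
import java.nio.file.Path;
import java.util.List;

// Hypothetical helper: builds a headless LibreOffice DOC -> PDF conversion command.
// Assumes the binary is named "soffice" and is on the PATH.
final class DocToPdf {
    private DocToPdf() {}

    static List<String> command(Path doc, Path outDir) {
        return List.of(
                "soffice", "--headless",  // no GUI, suitable for a server
                "--convert-to", "pdf",    // target format
                "--outdir", outDir.toString(),
                doc.toString());
    }

    public static void main(String[] args) {
        List<String> cmd = command(Path.of("report.doc"), Path.of("out"));
        System.out.println(String.join(" ", cmd));
        // To actually run it (throws IOException / InterruptedException):
        // int exit = new ProcessBuilder(cmd).inheritIO().start().waitFor();
    }
}
```

The resulting PDF will only be as faithful as the fonts available on the converting machine, which is exactly the caveat that follows.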
What's the difference? Well, a typeset document (originally something you'd create with a page layout program such as Aldus PageMaker) has every element on the page nailed down to a very specific position and, in the case of text, a very specific font. Word processors are more free-form, and they work within the constraints of the system they are running on.
What this means is that a PDF will render exactly the same on any system anywhere. Word documents, however, will not. This is a common complaint by the ignorant about open-source word processors such as LibreOffice: text imported from Word doesn't look the same, page breaks may move, etc. In actuality, this self-same problem also occurs when moving from one MS-Word system to another. The font metrics used for page layout are actually obtained from the currently selected printer driver, so the page layout will vary based on what the printer driver tells it.
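To see why the metrics matter, here is a toy model. It is not what Word or LibreOffice actually do internally. It is a greedy line wrapper that measures each word with a pluggable character-width table. Both width tables are made up for illustration. Making a single glyph wider is enough to move the line breaks, and moved line breaks are what eventually become moved page breaks.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntUnaryOperator;

// Toy model: the same text wrapped to the same width, measured with
// different (made-up) font metrics.
final class MetricsWrap {
    private MetricsWrap() {}

    // Greedy word wrap: fill each line until the next word would exceed lineWidth.
    static List<String> wrap(String text, int lineWidth, IntUnaryOperator charWidth) {
        List<String> lines = new ArrayList<>();
        StringBuilder line = new StringBuilder();
        int used = 0;                                  // width of the current line so far
        int space = charWidth.applyAsInt(' ');
        for (String word : text.split(" ")) {
            int w = 0;
            for (char c : word.toCharArray()) {
                w += charWidth.applyAsInt(c);          // width comes from the metrics, not the char count
            }
            if (line.length() > 0 && used + space + w > lineWidth) {
                lines.add(line.toString());            // word doesn't fit: break the line here
                line.setLength(0);
                used = 0;
            }
            if (line.length() > 0) {
                line.append(' ');
                used += space;
            }
            line.append(word);
            used += w;
        }
        if (line.length() > 0) {
            lines.add(line.toString());
        }
        return lines;
    }

    public static void main(String[] args) {
        String text = "the quick brown fox jumps over the lazy dog";
        IntUnaryOperator metricsA = c -> 1;                // every glyph one unit wide
        IntUnaryOperator metricsB = c -> c == 'o' ? 2 : 1; // same, but a wider 'o'
        System.out.println(wrap(text, 20, metricsA)); // [the quick brown fox, jumps over the lazy, dog]
        System.out.println(wrap(text, 20, metricsB)); // [the quick brown, fox jumps over the, lazy dog]
    }
}
```

Same text, same line width, different metrics: every line breaks in a different place. A PDF avoids this by recording the final positions instead of recomputing them.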
It's not as noticeable these days, since most of us use soft fonts. Back in the previous millennium, however, soft fonts were mostly found on Mac systems, and HP printers (as a popular example) came with a half-dozen hard fonts, which usually weren't even scalable. In fact, the office I worked in during the late '90s had two HP LaserJets, each with its own unique set of fonts, and documents would constantly rearrange themselves as people passed them back and forth.
So when you send a Word document to a PDF converter, the only way to get a PDF that faithfully matches the original is for the original author to produce it themselves, on their own computer, using a "print-to-PDF" printer driver. Otherwise, count on a certain amount of restructuring.
You can, however, reduce the disappointments. First and foremost, make sure your users understand that MS-Word is NOT a typewriter. Have them avoid hard spacing and pressing ENTER to produce blank lines, and use tabs and styles instead. Also make sure the fonts on the machines where documents are created and edited closely match the fonts on the PDF converter machine. Linux can handle TrueType™ fonts these days, and there's a common MS-Windows core font package that can be installed.