You would need to download the source code for Tika, alter it so it works the way you envision, and then build it yourself using
Maven. That's not rocket science, but it's not entirely trivial, either. For ODF files I have already mentioned which method in which class you need to patch; that's straightforward once you look at the source code. For the Microsoft formats the relevant classes seem to be org.apache.tika.parser.microsoft.ooxml.XWPFWordExtractorDecorator (for .docx) and org.apache.tika.parser.microsoft.WordExtractor (for .doc); those don't look too hard to patch, either. I haven't looked at the PDF parser in detail, but nothing jumped out at me that screams "header" or "footer", so you may have to do a bit of digging around.
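
To give a rough idea of the shape such a patch usually takes, here is a toy sketch. It assumes that what you want is to leave headers and footers out of the extracted text. None of this is Tika's actual code: ToyDocument, ToyExtractor and the includeHeadersAndFooters switch are made-up names for illustration only, and the real classes are structured differently. The point is just that the change tends to boil down to guarding the calls that emit header and footer text.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy stand-in for a word-processing document. NOT a Tika or POI class.
class ToyDocument {
    final String header;
    final List<String> paragraphs;
    final String footer;

    ToyDocument(String header, List<String> paragraphs, String footer) {
        this.header = header;
        this.paragraphs = paragraphs;
        this.footer = footer;
    }
}

// Hypothetical extractor showing where a guard would go. NOT Tika's API.
class ToyExtractor {
    // Hypothetical switch; stock Tika may not offer anything like it.
    private final boolean includeHeadersAndFooters;

    ToyExtractor(boolean includeHeadersAndFooters) {
        this.includeHeadersAndFooters = includeHeadersAndFooters;
    }

    String extract(ToyDocument doc) {
        List<String> out = new ArrayList<>();
        if (includeHeadersAndFooters) {
            out.add(doc.header);   // an unpatched extractor would always do this
        }
        out.addAll(doc.paragraphs); // body text is always extracted
        if (includeHeadersAndFooters) {
            out.add(doc.footer);   // same guard for the footer
        }
        return String.join("\n", out);
    }
}

class HeaderFooterDemo {
    public static void main(String[] args) {
        ToyDocument doc = new ToyDocument(
                "Company Confidential",
                Arrays.asList("First paragraph.", "Second paragraph."),
                "Page 1 of 3");
        // With the switch off, only the body paragraphs are returned.
        System.out.println(new ToyExtractor(false).extract(doc));
    }
}
```

In the real code you would look for the place where the extractor pulls in header and footer text and wrap it in a similar condition, or simply remove it if you never want that text.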