posted 17 years ago
A PDF is a description of how to render a document on a page: instructions like "draw a vertical line here" or "write 'foo bar baz' here in Courier". It does not contain any information about the format or organisation of the stuff it is rendering. You won't be able to tell that you're looking at a table, or a list of bullet points, or a paragraph, or anything like that.
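To make that concrete, here is a rough sketch of what those drawing instructions look like. The content stream below is hand-written and uncompressed purely for illustration, and the `describe` helper and the `OPERATORS` table are names I made up for this post; real files usually compress their streams and use font-specific encodings, so treat this as a toy and not as a parser.

```python
import re

# Hand-written, uncompressed content stream, for illustration only.
# It writes 'foo bar baz' in font /F1 and then draws a vertical line.
CONTENT_STREAM = """\
BT
/F1 12 Tf
72 720 Td
(foo bar baz) Tj
ET
72 700 m
72 600 l
S
"""

# A small subset of PDF operators with plain-English meanings.
OPERATORS = {
    "BT": "begin text",
    "ET": "end text",
    "Tf": "select font and size",
    "Td": "move text position",
    "Tj": "show text string",
    "m": "move to point",
    "l": "line to point",
    "S": "stroke path",
}

# A token is either a (string literal) or a run of non-whitespace characters.
TOKEN = re.compile(r"\((?:\\.|[^\\()])*\)|\S+")


def describe(stream):
    """Return (operator, operands, meaning) tuples for a toy content stream."""
    instructions = []
    operands = []
    for token in TOKEN.findall(stream):
        if token in OPERATORS:
            # Operands come first in PDF, the operator closes the instruction.
            instructions.append((token, operands, OPERATORS[token]))
            operands = []
        else:
            operands.append(token)
    return instructions


for op, args, meaning in describe(CONTENT_STREAM):
    print(f"{op:>2}  {' '.join(args):<16}  {meaning}")
```

Every instruction is about fonts, positions and strokes. Nothing in the stream says "this is a heading" or "this is a table cell".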
The PDF format does contain information on a page-by-page basis. Therefore, page breaks are the one piece of format/organisation information that you can find.
If you want anything more than a raw stream of completely unformatted, disorganised text, one stream per page, you are out of luck. Reconstructing tables, lists or paragraphs from it is virtually impossible.
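As a rough sketch of what "one raw stream of text per page" means in practice: the `PAGES` list below is a hypothetical stand-in for the per-page content streams a real PDF library would decode for you, and `text_per_page` is a made-up helper, not any library's API. It only handles the simple `Tj` operator and does not decode escape sequences or font encodings.

```python
import re

# Hypothetical stand-in for a parsed document: one uncompressed content
# stream per page. On the rendered page these would look like table rows.
PAGES = [
    "BT /F1 12 Tf 72 720 Td (Name) Tj 200 0 Td (Price) Tj ET",
    "BT /F1 12 Tf 72 720 Td (Widget) Tj 200 0 Td (4.99) Tj ET",
]

# Matches a (string literal) immediately followed by the Tj "show text" operator.
SHOW_TEXT = re.compile(r"\(((?:\\.|[^\\()])*)\)\s*Tj")


def text_per_page(pages):
    """Return one flat string of text per page, in stream order."""
    return [" ".join(SHOW_TEXT.findall(stream)) for stream in pages]


for number, text in enumerate(text_per_page(PAGES), start=1):
    print(f"page {number}: {text}")
# prints:
# page 1: Name Price
# page 2: Widget 4.99
```

The page boundary survives, but the fact that "Name" and "Price" were column headers does not. All you could do is guess at it from the coordinates.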