I just want to count the number of pages in a .doc file.
We use while ((in.read()) != -1) to read until the end of the file.
But is there any logic to check whether control has reached the end of a page?
posted 5 years ago
OK, so that requirement doesn't actually exist; that's good. You could use a library like JODConverter (which relies on running OpenOffice in server mode) to convert the document to PDF - PDFs are fixed in layout, and libraries like PDFBox can tell you the number of pages.
Basically a Word document doesn't have pages at all. When you see it displayed in Word it may appear to have pages, but that's because Word is using the default page layout information to paginate the document. If you click on the Page Layout tab you'll see all the things you can change (margins, page orientation, page size, columns, and more), all of which affect the pagination. And as already pointed out, there are many other things which affect the pagination.
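To make that concrete, here is a deliberately oversimplified sketch of why the page count is a function of the layout settings rather than a property of the text. The class and method names (PaginationSketch, linesPerPage, pageCount) are made up for illustration, and the model assumes fixed-height lines and nothing else; real Word pagination also depends on fonts, line wrapping, tables, images and much more.

```java
// Toy model only: assumes every line has the same height and ignores
// wrapping, fonts, tables, images, widow/orphan control, and so on.
final class PaginationSketch {

    /** Number of fixed-height lines that fit on one page (all sizes in points). */
    static int linesPerPage(double pageHeightPt, double topMarginPt,
                            double bottomMarginPt, double lineHeightPt) {
        double usableHeight = pageHeightPt - topMarginPt - bottomMarginPt;
        return (int) Math.floor(usableHeight / lineHeightPt);
    }

    /** Pages needed for the given number of lines (integer ceiling division). */
    static int pageCount(int totalLines, int linesPerPage) {
        if (linesPerPage <= 0) {
            throw new IllegalArgumentException("linesPerPage must be positive");
        }
        return (totalLines + linesPerPage - 1) / linesPerPage;
    }

    public static void main(String[] args) {
        int totalLines = 200; // the same "document" in both cases

        // US Letter portrait: 792pt tall, 1 inch (72pt) margins, 12pt lines.
        int portrait = linesPerPage(792, 72, 72, 12);
        // Same paper turned to landscape: only 612pt tall.
        int landscape = linesPerPage(612, 72, 72, 12);

        System.out.println("portrait pages:  " + pageCount(totalLines, portrait));
        System.out.println("landscape pages: " + pageCount(totalLines, landscape));
    }
}
```

The same 200 lines come out as a different number of pages as soon as one layout setting changes, which is why the text alone can't tell you where a page ends.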
But if, as you say, you're just reading the raw bytes from the .doc file, you don't have any hope of finding out any of those things. You're just reading the document text and the document formatting and other control information as uninterpreted bytes. You can't find out anything at all about the document that way except how many bytes it took Word to store it on disk.
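As a minimal sketch of what that read loop actually gives you (the class name RawByteCount and the method countBytes are invented here, not taken from your code), this is about all the information a byte-by-byte read can produce:

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

final class RawByteCount {

    /** Reads the file one byte at a time until end of file and returns the byte count. */
    static long countBytes(Path file) throws IOException {
        long count = 0;
        try (InputStream in = new BufferedInputStream(Files.newInputStream(file))) {
            // read() returns -1 only at end of file; nothing in the stream
            // signals "end of page", because the format stores no such marker here.
            while (in.read() != -1) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java RawByteCount <file>");
            return;
        }
        System.out.println(countBytes(Paths.get(args[0])) + " bytes read");
    }
}
```

The loop ends at end of file and the only thing it has learned is the size of the file on disk.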
Gajendra Kangokar wrote: @paul you mean there is no way to know where a page break happened in a DOC file? Is there any way to use form feed or something?
Form feed? No, it's not nearly that simple. In fact Word is probably a thousand times as complicated as just throwing in a form-feed character. I'm guessing you haven't actually used Word yourself much?
If you really have to address the requirement of extracting a page from a Word document, you at least have to start by accessing it via Apache POI's Word components, or else Aspose's software which allows you to access Word documents. And then prepare yourself for a long stretch where you learn how to use those things. Last time I looked at accessing Word (from Visual Basic over a decade ago) there were about 500 different types in its data model. I'm sure that the number is closer to 1,000 by now. It isn't simple and you shouldn't expect a simple solution.