I may be wrong, but this post on the POI forum seems to confirm my view: http://apache-poi.1045710.n5.nabble.com/Using-text-find-the-page-number-in-word-document-td5710448.html
But is there any way to know that we have come to the end of a page,
or any way to know that the page is changing?
I am not supposed to read the entire file; instead, I am given a page number.
This sounds like a really strange requirement; what is the point of it?
We use while ((in.read()) != -1) to read until the end of the file.
But is there any logic to check whether control has come to the end of a page?
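For reference, here is the loop from the question written out as a self-contained method. The class name ByteLoop and the method name countBytes are made up for illustration. All the loop can ever report is how many bytes it read; each value from in.read() is just a number from 0 to 255, with no notion of a "page" attached to it.

```java
import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

public class ByteLoop {
    // Reads a file one byte at a time, as in while ((in.read()) != -1),
    // and returns how many bytes were read.
    static long countBytes(String path) throws IOException {
        long count = 0;
        // try-with-resources closes the stream even if read() throws
        try (InputStream in = new BufferedInputStream(new FileInputStream(path))) {
            while (in.read() != -1) {
                count++; // the byte value itself carries no page information
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.out.println("usage: java ByteLoop <file>");
            return;
        }
        System.out.println(countBytes(args[0]));
    }
}
```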
But if, as you say, you're just reading the raw bytes from the .doc file, you don't have any hope of finding out any of those things. You're just reading the document text and the document formatting and other control information as uninterpreted bytes. You can't find out anything at all about the document that way except how many bytes it took Word to store it on disk.
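To make that concrete: a Word 97-2003 .doc file is an OLE2 compound file, which is essentially a small file system with its own sectors and streams. The raw bytes therefore begin with a fixed container signature, not with document text. The text lives in a stream inside that container and is located through a piece table, so nothing in the raw byte sequence marks where Word's pages end. The minimal sketch below (class and method names are my own; readNBytes(int) needs Java 11+) only checks that signature:

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

public class OleSniff {
    // First 8 bytes of every OLE2 Compound File, the container used by .doc files
    static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, (byte) 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, (byte) 0x1A, (byte) 0xE1
    };

    // Returns true if the stream starts with the OLE2 signature.
    static boolean looksLikeOle2(InputStream in) throws IOException {
        byte[] header = in.readNBytes(8); // may return fewer bytes for short input
        return Arrays.equals(header, OLE2_MAGIC);
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.out.println("usage: java OleSniff <file>");
            return;
        }
        try (InputStream in = new FileInputStream(args[0])) {
            System.out.println(looksLikeOle2(in) ? "OLE2 container" : "not OLE2");
        }
    }
}
```

Knowing it is an OLE2 container tells you how the file is packaged, but you would still need a real Word parser to get at the text, and even then pagination is a separate problem.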
@Paul, you mean there is no way to know where a page break happened in a DOC file? Is there any way to use a form feed or something?
I am not supposed to convert it to PDF.
Where are all these strange requirements coming from? It sounds like the requirements contain details of the technical implementation, where that kind of thing has no place.
Gajendra Kangokar wrote: @Paul, you mean there is no way to know where a page break happened in a DOC file? Is there any way to use a form feed or something?
Form feed? No, it's not nearly that simple. In fact, Word is probably a thousand times as complicated as just throwing in a form-feed character. I'm guessing you haven't actually used Word yourself much?
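To show how far a form feed gets you: as far as I know, in the binary .doc format character 12 ('\f') appears in the document text only for an explicit (manual) page break or a section break. The ordinary page breaks that Word computes while laying out the text are not stored in the file at all. The sketch below assumes the text has already been extracted with a Word library such as Apache POI (that step is not shown). It splits the text at form feeds; the class and method names are hypothetical. A "segment" here is not the same thing as a printed page, which is exactly why the form-feed idea does not solve the problem.

```java
import java.util.ArrayList;
import java.util.List;

public class ManualBreaks {
    // Splits already-extracted text at form-feed characters ('\f', code 12).
    // Only explicit breaks are found; automatic page breaks are invisible here.
    static List<String> splitAtFormFeeds(String text) {
        List<String> segments = new ArrayList<>();
        int start = 0;
        for (int i = 0; i < text.length(); i++) {
            if (text.charAt(i) == '\f') {
                segments.add(text.substring(start, i));
                start = i + 1;
            }
        }
        segments.add(text.substring(start)); // text after the last break
        return segments;
    }

    // Returns the 1-based segment n, or null if there are fewer segments.
    static String segment(String text, int n) {
        List<String> segments = splitAtFormFeeds(text);
        return (n >= 1 && n <= segments.size()) ? segments.get(n - 1) : null;
    }

    public static void main(String[] args) {
        String text = "first part\fsecond part\fthird part";
        System.out.println(splitAtFormFeeds(text).size()); // 3
        System.out.println(segment(text, 2));               // second part
    }
}
```

A document with no manual breaks comes back as a single segment, no matter how many pages Word would print it on.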
If you really have to address the requirement of extracting a page from a Word document, you at least have to start by accessing it via Apache POI's Word components, or else Aspose's software, which allows you to access Word documents. And then prepare yourself for a long stretch where you learn how to use those things. The last time I looked at accessing Word (from Visual Basic over a decade ago), there were about 500 different types in its data model. I'm sure that the number is closer to 1,000 by now. It isn't simple, and you shouldn't expect a simple solution.