how to read a particular page from a DOC file

 
Ranch Hand
Posts: 94
1
Eclipse IDE Java
Hello all,
I have a .DOC file, but I am not supposed to read the entire file; instead I am given a page number,
so I have to read only that particular page from the DOC file.
I am using the Apache POI API.

Thank you.
 
Bartender
Posts: 3323
86
I'm not sure that you can easily do this, as I believe DOC files don't store page number information. The page number is calculated from information such as content, font, page size, etc.

I may be wrong but this post on the POI forum seems to confirm my view: http://apache-poi.1045710.n5.nabble.com/Using-text-find-the-page-number-in-word-document-td5710448.html
 
Gajendra Kangokar
Ranch Hand
Posts: 94
1
Eclipse IDE Java
OK, so the DOC file does not store page numbers.
But is there any way to know that we have come to the end of a page,
or any way to know that the page is changing?
 
Rancher
Posts: 43081
77
I don't think it is possible to know page numbers before the entire file has been read, for the reasons Tony mentioned.

Gajendra Kangokar wrote: i am not supposed to read entire file instead i am given a page number.


This sounds like a really strange requirement; what is the point of it?
 
Gajendra Kangokar
Ranch Hand
Posts: 94
1
Eclipse IDE Java
I just want to count the number of pages in a DOC file.
We use while((in.read())!=-1) to read until the end of the file,
but is there any logic to check whether control has reached the end of a page?
 
Ulf Dittmer
Rancher
Posts: 43081
77
OK, so that requirement doesn't actually exist; that's good. You could use a library like JODConverter (which relies on running OpenOffice in server mode) to convert the document to PDF - PDFs are fixed in layout, and libraries like PDFBox can tell you the number of pages.
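A rough sketch of the conversion step suggested above. To stay within the JDK it does not use JODConverter; it swaps in a direct call to the LibreOffice/OpenOffice command-line converter, and it assumes a soffice executable is on the PATH. The class and method names are made up for illustration. Counting the pages of the resulting PDF (for example with PDFBox) is not shown.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;

class PdfConvertSketch {

    // Builds the command line for a headless conversion to PDF.
    // Assumption: "soffice" is installed and reachable on the PATH.
    static List<String> buildCommand(Path input, Path outDir) {
        return List.of("soffice", "--headless", "--convert-to", "pdf",
                "--outdir", outDir.toString(), input.toString());
    }

    // Runs the converter and returns its exit code.
    static int convert(Path input, Path outDir) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(buildCommand(input, outDir))
                .inheritIO()
                .start();
        return process.waitFor();
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 2) {
            System.out.println("usage: PdfConvertSketch <input.doc> <output-dir>");
            return;
        }
        int exitCode = convert(Paths.get(args[0]), Paths.get(args[1]));
        System.out.println("soffice exit code: " + exitCode);
    }
}
```

The page count obtained this way reflects how the converter laid the document out, which may differ from what Word itself would show.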
 
Marshal
Posts: 28177
95
Eclipse IDE Firefox Browser MySQL Database
Basically a Word document doesn't have pages at all. When you see it displayed in Word it may appear to have pages, but that's because Word is using the default page layout information to paginate the document. If you click on the Page Layout tab you'll see all the things you can change (margins, page orientation, page size, columns, and more), all of which affect the pagination. And as already pointed out, there are many other things which affect the pagination.

But if, as you say, you're just reading the raw bytes from the .doc file, you don't have any hope of finding out any of those things. You're just reading the document text and the document formatting and other control information as uninterpreted bytes. You can't find out anything at all about the document that way except how many bytes it took Word to store it on disk.
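To make the point about raw bytes concrete, here is a minimal sketch using only the JDK. A loop of the form while((in.read())!=-1) sees one uninterpreted byte at a time, so the only thing it can report is how many bytes there were. The class and method names are hypothetical, and the sample bytes merely stand in for a real .doc file.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class RawByteCount {

    // Reads the stream one byte at a time, exactly like the loop in the question.
    // Nothing here knows about text, formatting, or layout: it only counts bytes.
    static long countBytes(InputStream in) {
        long count = 0;
        try {
            while (in.read() != -1) {
                count++;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return count;
    }

    public static void main(String[] args) {
        // Stand-in data; a real .doc file would be opened with a FileInputStream.
        byte[] sample = "not a real document".getBytes(StandardCharsets.US_ASCII);
        System.out.println("bytes read: " + countBytes(new ByteArrayInputStream(sample)));
    }
}
```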
 
Gajendra Kangokar
Ranch Hand
Posts: 94
1
Eclipse IDE Java
I am not supposed to convert it to PDF.

@Paul, you mean there is no way to know where a page break happened in a DOC file? Is there any way to use a form feed or something?
 
Ulf Dittmer
Rancher
Posts: 43081
77
Yes, that's what Tony, Paul and I have been saying.

Gajendra Kangokar wrote: I am not supposed to convert it to PDF.


Where are all these strange requirements coming from? It sounds like the requirements contain details of the technical implementation, where that kind of thing has no place.
 
Paul Clapham
Marshal
Posts: 28177
95
Eclipse IDE Firefox Browser MySQL Database

Gajendra Kangokar wrote:@paul you mean there is no way to know where page break happened in DOC file..?is there any way to use form feed or something.



Form feed? No, it's not nearly that simple. In fact Word is probably a thousand times as complicated as just throwing in a form-feed character. I'm guessing you haven't actually used Word yourself much?

If you really have to address the requirement of extracting a page from a Word document, you at least have to start by accessing it via Apache POI's Word components, or else Aspose's software which allows you to access Word documents. And then prepare yourself for a long stretch where you learn how to use those things. Last time I looked at accessing Word (from Visual Basic over a decade ago) there were about 500 different types in its data model. I'm sure that the number is closer to 1,000 by now. It isn't simple and you shouldn't expect a simple solution.
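On the form-feed question, a small sketch of why it falls short. It assumes the document text has already been extracted into a String (for instance with POI's Word text extraction, not shown here), and that explicit page and section breaks appear in that text as the form-feed character U+000C, which is my understanding of how the binary .doc text stream marks them. The class and method names are hypothetical.

```java
class ManualBreakCount {

    // Counts form-feed characters in already-extracted document text.
    // Assumption: each explicit (manual) page or section break shows up as '\f'.
    // Automatic page breaks are decided at layout time and leave no marker,
    // so this is NOT a page count.
    static int countManualBreaks(String text) {
        int breaks = 0;
        for (int i = 0; i < text.length(); i++) {
            if (text.charAt(i) == '\f') {
                breaks++;
            }
        }
        return breaks;
    }

    public static void main(String[] args) {
        // Stand-in text with two manual breaks.
        String text = "first part\fsecond part\fthird part";
        System.out.println("manual breaks found: " + countManualBreaks(text));
    }
}
```

A long document with no manual breaks at all would give zero here and still print on many pages, which is why the count cannot be used to locate a given page.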
 
Gajendra Kangokar
Ranch Hand
Posts: 94
1
Eclipse IDE Java
Yes, I am using Apache POI, and thank you. I will try the Aspose software as well.
 