
Text Files and word processing software

 
pandu chinnu
Greenhorn
Posts: 16
Hi,

I have a question. Why is it simple for word processing software (for example, a program that counts the number of words, or the number of words longer than x characters) to deal with a text file (.txt) but not with a Word document (.doc)?

Are there any special requirements, or what points should be taken into consideration when developing software that handles .doc files?

Thanks in advance
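To illustrate why plain text is easy, here is a minimal sketch of the kind of program described in the question. It assumes a UTF-8 encoded .txt file and treats any run of whitespace as a word boundary; the class name `WordStats`, the file name `sample.txt`, and the threshold of 5 are hypothetical choices for this example. Because a .txt file is nothing but characters, the whole job is reading the bytes as a string and splitting it:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical helper: simple word statistics for a plain text file. */
public class WordStats {

    /** Splits on runs of whitespace and returns the number of non-empty tokens. */
    public static int countWords(String text) {
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return 0;
        }
        return trimmed.split("\\s+").length;
    }

    /** Counts tokens that are strictly longer than minLength characters. */
    public static int countLongWords(String text, int minLength) {
        int count = 0;
        for (String word : text.trim().split("\\s+")) {
            if (word.length() > minLength) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        // Assumes a UTF-8 .txt file; "sample.txt" is only a placeholder name.
        Path path = Paths.get(args.length > 0 ? args[0] : "sample.txt");
        String text = new String(Files.readAllBytes(path), StandardCharsets.UTF_8);
        System.out.println("Words: " + countWords(text));
        System.out.println("Words longer than 5 chars: " + countLongWords(text, 5));
    }
}
```

In this sketch, punctuation attached to a word (such as "done.") counts toward its length; a real tool would need to decide how to handle that. Pointed at a .doc file, the same `Files.readAllBytes` call would return binary structures rather than a simple string of words.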
 
Ulf Dittmer
Rancher
Posts: 42972
DOC is a very complicated file format, for which there is no official documentation. In addition to the raw text, DOC files contain styling and layout information, possibly other objects like images, and other stuff. If you look at the javadocs of the one Java library that even attempts to deal with it (Apache POI), you'll see everything that's involved. It's very tricky stuff.
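One quick way to see that a .doc is not "text with extras" is to look at its first bytes. Word 97-2003 .doc files are stored inside an OLE2 compound file (a small file system within a file), which always begins with the 8-byte signature D0 CF 11 E0 A1 B1 1A E1. The sketch below only checks that signature using the standard library; the class name `DocSniffer` and the file name `sample.doc` are hypothetical. Note that .xls and .ppt files use the same container, so a match means "compound file", not necessarily "Word document":

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;

/** Hypothetical helper: detects the OLE2 compound file signature used by .doc files. */
public class DocSniffer {

    // First 8 bytes of every OLE2 / Compound File Binary container.
    private static final byte[] OLE2_SIGNATURE = {
        (byte) 0xD0, (byte) 0xCF, (byte) 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, (byte) 0x1A, (byte) 0xE1
    };

    /** Returns true if the given header bytes start with the OLE2 signature. */
    public static boolean hasOle2Signature(byte[] header) {
        return header.length >= OLE2_SIGNATURE.length
            && Arrays.equals(Arrays.copyOf(header, OLE2_SIGNATURE.length), OLE2_SIGNATURE);
    }

    public static void main(String[] args) throws IOException {
        Path path = Paths.get(args.length > 0 ? args[0] : "sample.doc");
        byte[] header = new byte[OLE2_SIGNATURE.length];
        int total = 0;
        try (InputStream in = Files.newInputStream(path)) {
            // Read up to 8 bytes; a single read() call may return fewer.
            while (total < header.length) {
                int n = in.read(header, total, header.length - total);
                if (n < 0) {
                    break;
                }
                total += n;
            }
        }
        boolean ole2 = hasOle2Signature(Arrays.copyOf(header, total));
        System.out.println(path + (ole2 ? " is an OLE2 compound file" : " is not an OLE2 compound file"));
    }
}
```

Recognizing the container is the easy part. Inside it, the text and formatting live in separate internal streams (such as the `WordDocument` stream), and decoding those is the work that Apache POI's HWPF component takes on.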
 
marc weber
Sheriff
Posts: 11343
As Ulf said, it's complicated. I recently wrote a program that worked its way through about 7,000 Word files (scattered among 1,500 directories) and renamed each file based on the first occurrence of the "Heading 1" style. I needed lots of special cases to deal with things like tables of contents and general oddities that I couldn't explain, and some files still fell through to be handled manually. (So the program also needed to detect when something wasn't quite right with what it "thought" was the first Heading 1.)

You can get an idea of what's involved by opening a .doc file in a simple text editor like Notepad. Among the binary noise, you should be able to spot some of the "background" information, like font names, styles, and templates, surrounding the text itself.
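A rough programmatic version of the Notepad experiment, assuming you only care about ASCII, is to scan the raw bytes and print every run of printable characters, much like the Unix `strings` tool. The class name `PrintableRuns`, the file name `sample.doc`, and the minimum run length of 4 are arbitrary choices for this sketch:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: extracts runs of printable ASCII, similar to the Unix "strings" tool. */
public class PrintableRuns {

    /** Returns every run of printable ASCII bytes (0x20 to 0x7E) that is at least minLength long. */
    public static List<String> findRuns(byte[] data, int minLength) {
        List<String> runs = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (byte b : data) {
            int c = b & 0xFF; // treat the byte as unsigned
            if (c >= 0x20 && c <= 0x7E) {
                current.append((char) c);
            } else {
                if (current.length() >= minLength) {
                    runs.add(current.toString());
                }
                current.setLength(0);
            }
        }
        // Flush a run that reaches the end of the data.
        if (current.length() >= minLength) {
            runs.add(current.toString());
        }
        return runs;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = Files.readAllBytes(Paths.get(args.length > 0 ? args[0] : "sample.doc"));
        for (String run : findRuns(data, 4)) {
            System.out.println(run);
        }
    }
}
```

Run against a .doc file, the output typically mixes fragments of the document text with font names, style names, and other internal strings. Word can also store text as 16-bit characters, which this ASCII-only scan would break apart or miss, so a real reader has to parse the format rather than scan it.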
 
Nicholas Jordan
Ranch Hand
Posts: 1282
Originally posted by pandu chinnu:
Is there any special requirements or what are the points taken into consideration while developing the software for a .doc file?


It's as the other posters say. It's hard to explain why without causing disagreement, but stick with a simple textual representation. I sat down to write an editor last night, and my first design consideration was to look for prewritten classes that stick to plain text like wet sticks to water.

Try what marc and Ulf suggest, and you will quickly find the fist of futility.
 