
Getting tagged content (headings) from rich text files

I have rich text files (Word docs saved as rtf) that are structured using heading styles for a table of contents. I need code to get the text that's tagged as the first "heading 1" in each file.

I've downloaded a description of RTF from wotsit.org, but haven't really dug into it yet.

I took a quick pass at some Java code that basically finds the second occurrence of the literal "s1\ql" (the first of these is in the definition of the heading, and the second is the actual application of that heading), then finds the first left-brace following this. That point usually marks the beginning of the first heading 1 text. The ending of this text is usually marked by the literal "\par". This works about 90% of the time, but I haven't found a consistent pattern in the remaining 10%.
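For anyone who wants to try the same idea, here is a minimal sketch of the search described above. It is not the original code: the class and method names (`RtfHeadingSketch`, `firstHeading`) are made up for illustration, and it assumes the whole file has already been read into a String. It uses only `String.indexOf` and `substring`, so it needs no regex support.

```java
// Hypothetical sketch of the heuristic described in the post; names are illustrative.
class RtfHeadingSketch {
    // Literal the post searches for. Assumption: its first occurrence is in the
    // style definition and its second is where the heading style is applied.
    static final String MARKER = "s1\\ql";
    // Literal that usually ends the heading text.
    static final String PAR = "\\par";

    /**
     * Returns the raw text between the first '{' after the second marker
     * and the next "\par", or null if any of those pieces is missing.
     */
    static String firstHeading(String rtf) {
        int first = rtf.indexOf(MARKER);
        if (first < 0) return null;
        // Skip past the first occurrence (the style definition).
        int second = rtf.indexOf(MARKER, first + MARKER.length());
        if (second < 0) return null;
        // The first left brace after the applied style usually opens the heading text.
        int brace = rtf.indexOf('{', second);
        if (brace < 0) return null;
        int end = rtf.indexOf(PAR, brace);
        if (end < 0) return null;
        return rtf.substring(brace + 1, end);
    }
}
```

With a toy input such as `{\rtf1{\stylesheet{\s1\ql heading 1;}}\pard\s1\ql {Introduction\par}Body text\par}`, `firstHeading` returns `Introduction`. On real Word output the returned string can still contain control words or a closing brace, so it would need further cleanup. Note also that searching for the literal `\par` matches longer control words such as `\pard`; that might account for some of the misses, but that is only a guess.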

So if anyone has done this before, maybe you can offer some clues on how to work with headings in rich text.
[ May 14, 2007: Message edited by: marc weber ]
I think I have a solution. It's designed for a rather specific need, but if anyone's interested, here's the quick and dirty logic. (Note: An additional requirement is that it must work with Java 1.3, since it will run as a Lotus Notes agent. So, among other things, regex Patterns (java.util.regex, added in Java 1.4) can't be used.)