
Parsing a non-XML text document

 
Sudarshan Chakrabarty
Ranch Hand
Posts: 38
Hi,

I need to parse the following content, which I get after parsing a webpage with HtmlParser. I can store the content in a text document or a String object. I need to extract the words in bold and store them in some Value Objects, i.e. basically I need the "Link to" and "titled" data.


I tried using StringTokenizer, Pattern, etc., but it's not working.

Can someone please help me out?

[ November 28, 2008: Message edited by: Sudarshan Chakrabarty ]
 
Joanne Neal
Rancher
Posts: 3742
Originally posted by Sudarshan Chakrabarty:
I tried using StringTokenizer, Pattern etc but it's not working


In what way is it not working? What results do you get, and how do they differ from what you want?
 
Sudarshan Chakrabarty
Ranch Hand
Posts: 38
Hi Joanne,
Thanks for the reply.
StringTokenizer doesn't work for me because I need to give both a start string and an end string. For example, if I give "Link to " and ";", I want to get all the data between them.
But StringTokenizer takes only one delimiter, so if I code something like
it's obviously not going to help me.

I want to be able to select all data between
i) "Link to" and the next ";"
and
ii) "titled" and the next ";".
So I need to iterate through the whole content and store the relevant data in some collection.
 
Joanne Neal
Rancher
Posts: 3742
You could try String.split() which uses a regular expression to split the string.
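
As a rough sketch of the split() idea: the actual content isn't shown in this thread, so the sample string below is made up, and it assumes every value ends at a ";". The class and method names (SplitExtractor, extract) are just for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class SplitExtractor {

    // Splits the content on ';' and, for every piece that contains the
    // marker, keeps the trimmed text that follows the marker.
    static List<String> extract(String content, String marker) {
        List<String> values = new ArrayList<String>();
        for (String piece : content.split(";")) {
            int at = piece.indexOf(marker);
            if (at >= 0) {
                values.add(piece.substring(at + marker.length()).trim());
            }
        }
        return values;
    }

    public static void main(String[] args) {
        // Hypothetical input; the real page content may look different.
        String content = "Link to http://example.com/a; titled First page; "
                + "Link to http://example.com/b; titled Second page;";
        System.out.println(extract(content, "Link to"));
        System.out.println(extract(content, "titled"));
    }
}
```

Note that split() takes a regular expression, so a delimiter with special characters (for example "|" or ".") would need escaping; ";" is safe as written.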

Or maybe just use String.indexOf and String.substring.
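
The indexOf/substring approach might look something like the sketch below. It collects everything between a start string and the next end string, which is what you described. BetweenExtractor and between are hypothetical names, and the sample input is invented since the real content isn't shown here.

```java
import java.util.ArrayList;
import java.util.List;

public class BetweenExtractor {

    // Returns the trimmed text found between each occurrence of 'start'
    // and the next occurrence of 'end' that follows it.
    static List<String> between(String content, String start, String end) {
        List<String> values = new ArrayList<String>();
        int from = 0;
        while (true) {
            int s = content.indexOf(start, from);
            if (s < 0) {
                break; // no more start markers
            }
            int valueStart = s + start.length();
            int e = content.indexOf(end, valueStart);
            if (e < 0) {
                break; // start marker without a closing delimiter
            }
            values.add(content.substring(valueStart, e).trim());
            from = e + end.length(); // continue searching after the delimiter
        }
        return values;
    }

    public static void main(String[] args) {
        // Hypothetical input; adjust the markers to match the real content.
        String content = "Link to http://example.com/a; titled First page; "
                + "Link to http://example.com/b; titled Second page;";
        System.out.println(between(content, "Link to", ";"));
        System.out.println(between(content, "titled", ";"));
    }
}
```

You could then copy the two lists into your Value Objects, pairing entries by index if each "Link to" always comes with a "titled".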
 