
Parsing an HTML file

 
Sean Casey
Ranch Hand
Posts: 625
Hi,
I'm creating a banner applet that checks headlines from another URL. My thinking is that I could use the applet as the front end and then use RMI (all on my own machine, so it'll be local) to query the URL and retrieve the headlines. None of this seems too difficult. My problem is parsing the HTML file. From what I've looked at, it seems that HTMLEditorKit.ParserCallback from the Swing package is my best option. My only problem is that I don't know how to override handleStartTag() to tell the parser where to begin retrieving the headlines. I understand that there may be an easier way to do this, as I haven't given it that much thought. If anyone can help me out, I'd appreciate it. Thanks.
Sean
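
For reference, a minimal sketch of how the handleStartTag() override could look. It assumes each headline sits inside a single tag (an `<h2>` in the usage note, which is only a guess about the target page); `HeadlineCallback` and its `parse` helper are hypothetical names. The callback raises a flag when the wanted tag opens, collects text while the flag is set, and stores the result when the tag closes.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical callback: collects the text found inside one kind of tag.
class HeadlineCallback extends HTMLEditorKit.ParserCallback {
    private final HTML.Tag wanted;
    private final List<String> headlines = new ArrayList<>();
    private final StringBuilder current = new StringBuilder();
    private boolean inside;

    HeadlineCallback(HTML.Tag wanted) {
        this.wanted = wanted;
    }

    @Override
    public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
        // Start collecting when the wanted tag opens.
        if (tag.equals(wanted)) {
            inside = true;
            current.setLength(0);
        }
    }

    @Override
    public void handleText(char[] data, int pos) {
        // Text is only kept while we are inside the wanted tag.
        if (inside) {
            current.append(data);
        }
    }

    @Override
    public void handleEndTag(HTML.Tag tag, int pos) {
        // Stop collecting and store the headline when the tag closes.
        if (inside && tag.equals(wanted)) {
            inside = false;
            headlines.add(current.toString().trim());
        }
    }

    // Convenience helper: parse a string of HTML and return the collected text.
    static List<String> parse(String html, HTML.Tag wanted) {
        HeadlineCallback callback = new HeadlineCallback(wanted);
        try {
            new ParserDelegator().parse(new StringReader(html), callback, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return callback.headlines;
    }
}
```

Calling `HeadlineCallback.parse(html, HTML.Tag.H2)` would return the text of every `<h2>` element. For a live page, the StringReader would be replaced by a reader over the URL's input stream.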
 
Sean MacLean
author
Ranch Hand
Posts: 621
Personally, since you're only looking for a couple of specific tags (I assume the headlines are defined in a single tag and not just randomly placed), I'd write a custom parser to pull out the info between the tags I was interested in. You could simply use a StringTokenizer, but this would not be very scalable. A better option would be to write a highly optimized class using a StringBuffer and labeled loops. I have done exactly this for a project, and (in some quick and dirty tests) it turned out to be 100 to 1000 times faster than a 3rd party regex package that we had been using. Interestingly enough, Java 1.4 (beta) will have some kick butt new I/O handling and built in regex - but, alas, I doubt you'll see browser support for this package for a while.
Sean
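
A rough sketch of the labeled-loop idea, not the class from the project mentioned above. `TagScanner` and `extractBetween` are hypothetical names, and the sketch assumes the opening tag is written without attributes. It scans for the opening tag, then for the matching closing tag, and uses a labeled `continue` to resume the outer scan just past it.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical hand-rolled scanner: pulls out the text between two tags.
class TagScanner {
    static List<String> extractBetween(String html, String open, String close) {
        List<String> found = new ArrayList<>();
        int n = html.length();

        scan:
        for (int i = 0; i < n; i++) {
            // Case-insensitive match so <H2> and <h2> are treated alike.
            if (!html.regionMatches(true, i, open, 0, open.length())) {
                continue;
            }
            int start = i + open.length();
            for (int j = start; j < n; j++) {
                if (html.regionMatches(true, j, close, 0, close.length())) {
                    found.add(html.substring(start, j).trim());
                    // Resume the outer scan just past the closing tag
                    // (the loop's i++ supplies the final +1).
                    i = j + close.length() - 1;
                    continue scan;
                }
            }
            // Opening tag with no closing tag: nothing more to extract.
            break;
        }
        return found;
    }
}
```

For example, `TagScanner.extractBetween(page, "<h2>", "</h2>")` returns the trimmed text of each matched pair and stops at the first unterminated tag.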
 
Sean Casey
Ranch Hand
Posts: 625
Sean,
That sounds like a good idea. I hadn't taken a look at the html source code of the site I was thinking of in particular, but it seems like it would be easier to do, as you pointed out. I'll give it a try. Thanks for your help.
 
Sean Casey
Ranch Hand
Posts: 625
Ok,
I gave it a try. I did what you suggested using a bunch of loops. It seems to work pretty quickly. I was going to display the hyperlinks in an applet so the user could click on one and go to that page, but (I just realized this) that would probably cause a security exception of some sort. I can't open another web page from an applet, can I? And if not, I could probably accomplish this with a servlet, right?
The reason I am doing all this is to be able to post headlines from major newspapers on my homepage.
Thanks for any more advice.
 
Sean MacLean
author
Ranch Hand
Posts: 621
Use a servlet to publish your home page when a user requests it. Then the servlet can go and fetch the headlines, include them in the html page as standard links and then write the page back to the client. This way, you won't even have to use the applet. Something like this,
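
A minimal sketch of the page-building step, assuming the headlines have already been fetched and parsed into pairs of absolute URL and headline text. `HeadlinePage`, `render`, and `escape` are hypothetical names, and the servlet wiring is left out so the class runs without a servlet container.

```java
import java.util.List;

// Hypothetical helper: turns fetched headlines into a plain HTML page.
class HeadlinePage {
    // Escape characters that would otherwise be read as markup.
    static String escape(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '"': sb.append("&quot;"); break;
                default: sb.append(c);
            }
        }
        return sb.toString();
    }

    // Each entry in links is assumed to be {absoluteUrl, headlineText}.
    static String render(String title, List<String[]> links) {
        StringBuilder page = new StringBuilder();
        page.append("<html><head><title>").append(escape(title))
            .append("</title></head><body>\n<ul>\n");
        for (String[] link : links) {
            page.append("<li><a href=\"").append(escape(link[0])).append("\">")
                .append(escape(link[1])).append("</a></li>\n");
        }
        page.append("</ul>\n</body></html>\n");
        return page.toString();
    }
}
```

In a servlet's doGet(), the returned string would be written to `response.getWriter()` after calling `response.setContentType("text/html")`.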

This is the most simplistic approach but should be very effective for your home page.
Sean
[Fixed my "you = your" problem - I sounded like a gangster]
[This message has been edited by Sean MacLean (edited June 20, 2001).]
 
Sean Casey
Ranch Hand
Posts: 625
Thanks for your help.
I'll give it a try today.
-Sean
 
Sean Casey
Ranch Hand
Posts: 625
I ran into a bit of trouble. The servlet idea worked fine. The problem was with the links. I wanted to keep them intact so that someone could click on one and go to that site. However, most of the links were relative rather than absolute, so clicking them led to an error page. Instead, I'm creating a custom parser that reads the headline between the opening and closing tags and then creates a link in case somebody wants to go read that particular headline. It's a bit more work than I wanted, but it seems to be the safest idea.
Thanks for your help though.
-Sean
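
One possible way around the relative-link problem, as a sketch: resolve each href against the URL of the page it came from before writing it out. `LinkResolver` and `toAbsolute` are hypothetical names, and the example.com addresses in the usage note are placeholders; `java.net.URI.resolve` does the actual work.

```java
import java.net.URI;

// Hypothetical helper: turns a relative href into an absolute URL.
class LinkResolver {
    // pageUrl is the address the HTML was fetched from; href is the raw
    // attribute value. URI.create throws IllegalArgumentException for
    // hrefs that are not valid URIs (for example, ones containing spaces).
    static String toAbsolute(String pageUrl, String href) {
        return URI.create(pageUrl).resolve(href.trim()).toString();
    }
}
```

With a page fetched from `http://news.example.com/world/index.html`, an href of `story1.html` would resolve to `http://news.example.com/world/story1.html`, `/sports/a.html` would resolve against the site root, and an href that is already absolute would come back unchanged.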
 