
Reading in HTML file, having trouble finding tags

 
Chris Blanchard
Greenhorn
Posts: 24
Hi, I'm reading some pages from the BBC.co.uk web site. All of the news articles have the tags <!-- S BO --> and, later on, <!-- E BO --> to indicate where the main body of text is. I'm trying to locate the first tag and then add everything I find to a string until I reach the second tag.

The code I have pasted in is meant to find the first tag and print out confirmation when it does. The problem is that a lot of the time it cannot find the first tag.

For instance, running it on this link, it can't find the tag:

http://news.bbc.co.uk/sport1/hi/football/teams/t/tottenham_hotspur/6225089.stm

even though if I go to Edit -> View Source in my browser and do Edit -> Find, I can see the tag I'm looking for.

However if I run it on this article, it finds the tag:
http://news.bbc.co.uk/1/hi/education/6224801.stm




I cannot see what is going wrong with my program at all. If anyone can offer any advice, thanks a lot!
 
Joanne Neal
Rancher
Posts: 3742
Could it be a timing problem? If nothing has been returned from the server the first time you call in.ready(), it may never go into the while loop. I ran it on both URLs and neither of them worked. I then put a sleep in before the while loop and they both worked.
Try putting some debug statements in to see exactly what it is doing.


BTW - I believe it will be more efficient to use a StringBuffer/StringBuilder to build up your string rather than continuously concatenating.
[ January 02, 2007: Message edited by: Joanne Neal ]
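
A minimal sketch of the StringBuffer/StringBuilder suggestion, assuming Java 5 or later. The class and method names (JoinLines, joinLines) are hypothetical and do not come from the original program; the point is only that one builder is appended to repeatedly instead of creating a new String on every pass through the loop.

```java
class JoinLines {
    // Appends each line to a single StringBuilder rather than using
    // repeated String concatenation (text = text + line).
    static String joinLines(String[] lines) {
        StringBuilder text = new StringBuilder();
        for (String line : lines) {
            text.append(line).append('\n');
        }
        return text.toString();
    }

    public static void main(String[] args) {
        System.out.print(joinLines(new String[] {"first line", "second line"}));
    }
}
```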
 
Chris Blanchard
Greenhorn
Posts: 24
Hey, thanks for the quick reply.

Whereabouts would the sleep function go? The first time I call in.ready() is within the while loop. (I'll probably use an empty for loop for the time being.)

Also, is putting in a sleep function an ideal solution? Or is it a bit of a hack that'll work but isn't necessarily ideal?

I'll have a look into the StringBuffer/StringBuilder later on as well.

Cheers.
 
Joanne Neal
Rancher
Posts: 3742
Yes, it is definitely a hack. I only put it in to see if that was what the problem was.

You could just loop until in.ready() returns true. You may have to put a timeout into the loop as well though in case you never get a response.
 
Ilja Preuss
author
Sheriff
Posts: 14112
You shouldn't use ready() at all, as it can return false at any point in the middle of the page.

Instead, check the return value of read() for -1, which indicates the end of the stream.
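
A rough sketch of that approach, not the original poster's code (which is not shown in the thread). The class and method names (BodyExtractor, readAll, extractBody) are hypothetical. It assumes the whole page is read into memory first and that the two marker comments appear exactly as quoted in the first post. The loop stops only when read() returns -1 and never consults ready().

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

class BodyExtractor {
    static final String START = "<!-- S BO -->";
    static final String END = "<!-- E BO -->";

    // Reads everything from the reader; the only exit condition is
    // read() returning -1, which signals the end of the stream.
    static String readAll(Reader source) throws IOException {
        StringBuilder page = new StringBuilder();
        char[] buffer = new char[4096];
        int count;
        while ((count = source.read(buffer)) != -1) {
            page.append(buffer, 0, count);
        }
        return page.toString();
    }

    // Returns the text between the two markers, or null if either is missing.
    static String extractBody(String page) {
        int start = page.indexOf(START);
        if (start < 0) {
            return null;
        }
        int from = start + START.length();
        int end = page.indexOf(END, from);
        if (end < 0) {
            return null;
        }
        return page.substring(from, end);
    }

    public static void main(String[] args) throws IOException {
        // A StringReader stands in for the network stream here.
        Reader source = new StringReader("<p>nav</p><!-- S BO -->Story text<!-- E BO --><p>footer</p>");
        System.out.println(extractBody(readAll(source)));
    }
}
```

For a real page the Reader would typically be an InputStreamReader wrapped around the URL's input stream; the loop itself stays the same.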
 
Jim Yingst
Wanderer
Sheriff
Posts: 18671
Yes, I believe Ilja has identified the problem.

You can see a discussion of a very similar issue here:

AvailableDoesntDoWhatYouThinkItDoes

The available() method works much like ready(), in the sense that available() can return 0, or ready() can return false, when there simply isn't any data available immediately; if you wait a millisecond or two, maybe there will be. Neither is a reliable way to detect the end of a file.
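
The difference between "nothing available right now" and "end of stream" can be illustrated with a small in-memory pipe. This is only a sketch: the class and method names (ReadyVersusEndOfStream, demo) are hypothetical, and a PipedReader used from a single thread stands in for the network connection, which behaves less predictably.

```java
import java.io.IOException;
import java.io.PipedReader;
import java.io.PipedWriter;

class ReadyVersusEndOfStream {
    // Records what ready() and read() report at each stage of the pipe's life.
    static String demo() throws IOException {
        PipedWriter writer = new PipedWriter();
        PipedReader reader = new PipedReader(writer);
        StringBuilder log = new StringBuilder();

        // Nothing written yet: ready() is false although the stream is still open.
        log.append(reader.ready());

        writer.write("ab");
        // Data is buffered now, so ready() is true.
        log.append(',').append(reader.ready());

        log.append(',').append((char) reader.read()).append((char) reader.read());
        // Buffer drained: ready() is false again, but this is still not the end.
        log.append(',').append(reader.ready());

        writer.close();
        // Only after the writer closes does read() report end of stream with -1.
        log.append(',').append(reader.read());
        return log.toString();
    }

    public static void main(String[] args) throws IOException {
        System.out.println(demo());
    }
}
```

A loop of the form while (in.ready()) would have stopped at the first false here, before any data had arrived.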
 
Chris Blanchard
Greenhorn
Posts: 24
Yeah, thanks, that seems to have fixed it. Putting in a sleep function did make it more reliable, but testing for -1 fixed it.
 