
Extracting data from dynamic HTML page - slow, incomplete

 
Ranch Hand
Posts: 86
I want to write an application that automatically browses several HTML pages on the net for me, parses them, and extracts some data from them. In Java I use

URL url = new URL(sURL);
URLConnection conn = url.openConnection();
BufferedReader rd = new BufferedReader(new InputStreamReader(conn.getInputStream()));

String sPage = streamToString(rd); // my method, rd.readLine()'s until eof
...

The HTML page is dynamically generated; the URL is a combination of some input parameters. Imagine a TV guide.

My problem is that this URL works well (instantly) only in the Mozilla browser. In IE and in my application it sometimes returns instantly and sometimes takes longer, but sometimes the search page returns incomplete data after a long while. Is there anything in the above code that can be improved so that the returned page is always complete and always returned quickly? Thanks.
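
One thing that may be worth ruling out (an assumption, not a diagnosis): the snippet above sets no timeouts and sends Java's default User-Agent, so a stalled server can leave the read blocked, and the server may treat the client differently from a browser. A minimal sketch using the standard `java.net.URLConnection` setters follows. The class name `ConnSetup`, the timeout values, and the User-Agent string are made up for illustration.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLConnection;

class ConnSetup {
    // Hypothetical values, chosen only for illustration.
    static final int CONNECT_TIMEOUT_MS = 5_000;
    static final int READ_TIMEOUT_MS = 10_000;
    static final String USER_AGENT = "Mozilla/5.0 (compatible; PageFetcher)";

    /** Creates a URLConnection with timeouts and a User-Agent set; it does not connect yet. */
    static URLConnection open(String sURL) throws IOException {
        URLConnection conn = URI.create(sURL).toURL().openConnection();
        conn.setConnectTimeout(CONNECT_TIMEOUT_MS); // fail if the TCP connect stalls
        conn.setReadTimeout(READ_TIMEOUT_MS);       // fail if a read blocks this long
        conn.setRequestProperty("User-Agent", USER_AGENT);
        return conn;
    }
}
```

With a read timeout set, a stall surfaces as a `java.net.SocketTimeoutException` instead of a wait of several minutes, which at least shows where the time goes.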
 
Author and all-around good cowpoke
Posts: 13078
How in the world are we supposed to comment on this when all the work is being done here:

String sPage = streamToString(rd); // my method, rd.readLine()'s until eof

If, in fact, you are reading line by line and constructing a giant String - yes that could well be pretty slow. Why do you want the page turned into a giant String? If you are only extracting certain bits, much better to recognize the bits early and discard the extraneous text.
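
A rough sketch of "recognize the bits early": keep only the lines that contain a marker and drop the rest as they are read. It assumes each wanted item fits on a single line, which may not hold for every page. The class name `LineScanner` and the marker used in the usage note are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;

class LineScanner {
    /** Returns the trimmed lines that contain marker; all other lines are discarded as read. */
    static List<String> linesContaining(Reader in, String marker) throws IOException {
        List<String> hits = new ArrayList<>();
        try (BufferedReader br = new BufferedReader(in)) {
            String line;
            while ((line = br.readLine()) != null) {
                if (line.contains(marker)) {
                    hits.add(line.trim());
                }
            }
        }
        return hits;
    }
}
```

Usage would look something like `LineScanner.linesContaining(new InputStreamReader(conn.getInputStream()), "class=\"title\"")`, where the marker depends entirely on the page being parsed.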

What do the comments about browsers have to do with your application?
 
Ranch Hand
Posts: 1970

Originally posted by William Brogden:
What do the comments about browsers have to do with your application?



Is it an applet?
 
author
Posts: 14112
See whether using http://jakarta.apache.org/commons/httpclient/ helps.
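
For readers on a current JDK, a comparable approach is available without an extra library. The sketch below uses the JDK's built-in `java.net.http.HttpClient` (Java 11+), which is a different API from the Jakarta Commons HttpClient linked above and is swapped in here only as an illustration. The class name `PageFetcher` and the timeout values are assumptions.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

class PageFetcher {
    /** Builds a GET request with an overall timeout; no network traffic happens here. */
    static HttpRequest buildRequest(String sURL, Duration timeout) {
        return HttpRequest.newBuilder(URI.create(sURL))
                .timeout(timeout)
                .GET()
                .build();
    }

    /** Sends the request and returns the whole response body as a String. */
    static String fetch(String sURL) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newBuilder()
                .connectTimeout(Duration.ofSeconds(5))       // hypothetical value
                .followRedirects(HttpClient.Redirect.NORMAL)
                .build();
        HttpResponse<String> response = client.send(
                buildRequest(sURL, Duration.ofSeconds(20)),  // hypothetical value
                HttpResponse.BodyHandlers.ofString());
        return response.body();
    }
}
```

`PageFetcher.fetch(sURL)` would then stand in for the `URL` / `openConnection()` / read-loop sequence from the first post.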
 
Jiri Nejedly
Ranch Hand
Posts: 86
OK, if you are interested in my method, here it is. I am constructing a huge string (it's not so huge, about 40 KB) because it is hard to find HTML tags that start on one line and end on another, interrupted by a CRLF or a tab. A single string is easier to search.

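// Needs: import java.io.BufferedReader; import java.io.IOException;
private String streamToString(BufferedReader br) {
    String x = null;
    String result = "";
    try {
        x = br.readLine();
        // readLine() strips the line terminator, so lines are joined with nothing between them
        while (x != null) {
            result += x;
            x = br.readLine();
        }
    } catch (IOException exc) {
        return null; // a read failure is reported as null
    }
    return result;
}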


I know I should use StringBuffer instead of String, so for testing purposes I commented out the line

result+=x;

but even then the processing time of this method took several minutes, so the delay has nothing to do with constructing the string. And why did I mention Mozilla (no, the application is not an applet)? Because there the response never took longer than 4 seconds, even when it was run concurrently (!) with the running (but waiting) Java application.

P.S.: I can give you the whole URL (the day variable may need to be changed in the future). You can try it yourself.

http://www.canalsat.fr/index.php?recherche=1&pid=194&tpl=78&page=1&Search_text2=&genre=FILM&day=18-09-06&sgenre=&version=&tranche=&diffusion=&CHAINE_ID_last_selected=1&CHAINE_ID%5B%5D=1
 
Jiri Nejedly
Ranch Hand
Posts: 86

Originally posted by Ilja Preuss:
See whether using http://jakarta.apache.org/commons/httpclient/ helps.



WOW! I tried it repeatedly without any flaws. And it's so quick!

Many thanks!
 
William Brogden
Author and all-around good cowpoke
Posts: 13078

then the processing time of this method took several minutes.



That is really mysterious. Just reading the input stream should NOT "take several minutes."
I'm glad you found a solution with HttpClient - but that makes the problems with your original code even more mysterious.

Bill
 