I want to write an application that automatically fetches several HTML pages from the net, parses them and extracts some data from them. In Java I use:
url = new URL(sURL);
conn = url.openConnection();
rd = new BufferedReader(new InputStreamReader(conn.getInputStream()));
String sPage = streamToString(rd); // my method, rd.readLine()'s until eof ...
The HTML page is dynamically generated; the URL is a combination of some input parameters. Imagine a TV guide.
My problem is that this URL works well (instantly) only in the Mozilla browser. In IE and in my application it sometimes returns instantly and sometimes takes longer, and sometimes the search page returns incomplete data after a long wait. Is there anything in the above code that can be improved so that the returned page is always complete and always returned quickly? Thanks.
How in the world are we supposed to comment on this when all the work is being done here:
If, in fact, you are reading line by line and constructing a giant String - yes that could well be pretty slow. Why do you want the page turned into a giant String? If you are only extracting certain bits, much better to recognize the bits early and discard the extraneous text.
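As a rough illustration of "recognize the bits early", here is a minimal sketch that pulls matches straight off the stream with java.util.Scanner instead of building the page String first. The markup is purely an assumption (the real page was not posted): it pretends the wanted text sits in <td class="title"> cells, and the class and method names are made up for the example. Note that Scanner may still buffer a lot of input while it hunts for the next match.

```java
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;
import java.util.regex.Pattern;

public class TitleScanner {
    // Hypothetical markup: assumes the wanted text is inside <td class="title">...</td>.
    // DOTALL lets '.' match line breaks, so a cell split across lines still matches.
    private static final Pattern TITLE_CELL = Pattern.compile(
            "<td class=\"title\">(.*?)</td>", Pattern.DOTALL | Pattern.CASE_INSENSITIVE);

    public static List<String> extractTitles(Reader in) {
        List<String> titles = new ArrayList<>();
        Scanner scanner = new Scanner(in);
        // Horizon 0 means "no limit"; each call resumes after the previous match.
        while (scanner.findWithinHorizon(TITLE_CELL, 0) != null) {
            // Collapse CRLF/tab runs inside the cell to single spaces.
            titles.add(scanner.match().group(1).replaceAll("\\s+", " ").trim());
        }
        return titles;
    }

    public static void main(String[] args) {
        String sample = "<tr><td class=\"title\">Evening\r\n\tNews</td></tr>\n"
                + "<tr><td class=\"title\">Late Film</td></tr>";
        System.out.println(extractTitles(new StringReader(sample)));
    }
}
```

In a real run the Reader would be the InputStreamReader wrapped around conn.getInputStream(); the StringReader in main is only there so the sketch can be tried without a network.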
What do the comments about browsers have to do with your application?
posted 12 years ago
OK, if you are interested in my method, here it is. I am constructing a huge string (not so huge really, about 40 KB) because it is hard to find HTML tags that start on one line and end on another, or that are interrupted by a CRLF or tab. A single string is easier to search.
I know I should use StringBuffer instead of String, so for testing purposes I commented out the line
but even then the processing time of this method took several minutes. So it has nothing to do with constructing the string. And why did I mention Mozilla (no, the application is not an applet)? Because there the response never took longer than 4 seconds, even when it was run concurrently (!) with the running (but waiting) Java application.
P.S.: I can give you the whole URL (the day variable may need to be modified in the future). You can try it yourself.
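For reference, a minimal sketch of what a readLine()-based method like this could look like with a StringBuilder (the unsynchronized sibling of StringBuffer), plus a search for a tag that is broken across lines. This is not the method from the post, only an assumption about its general shape; the class name, firstAnchorTag and the choice of the <a> tag are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PageText {
    // Appends to one StringBuilder; "sPage += line" would copy the whole text on every line.
    public static String streamToString(BufferedReader rd) throws IOException {
        StringBuilder sb = new StringBuilder();
        String line;
        while ((line = rd.readLine()) != null) {
            sb.append(line).append('\n'); // readLine() strips the line terminator, so add one back
        }
        return sb.toString();
    }

    // Hypothetical helper: returns the first <a ...> tag, or null if there is none.
    // \s and [^>] both match CR, LF and tab, so a tag split across lines is still found.
    public static String firstAnchorTag(String page) {
        Matcher m = Pattern.compile("<a\\s[^>]*>", Pattern.CASE_INSENSITIVE).matcher(page);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) throws IOException {
        String raw = "<p>See <a\r\n\thref=\"guide.html\">guide</a></p>";
        String page = streamToString(new BufferedReader(new StringReader(raw)));
        System.out.println(firstAnchorTag(page));
    }
}
```

For a page of about 40 KB either way of building the string should finish in well under a second, which fits the observation that the string handling is not where the minutes go.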
WOW! I tried it repeatedly without any flaws. And it's so quick!
posted 12 years ago
"then the processing time of this method took several minutes."
That is really mysterious. Just reading the input stream should NOT "take several minutes." I'm glad you found a solution with HttpClient - but that makes the problems with your original code even more mysterious.
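One way to narrow down a mystery like this with plain URLConnection might be to set explicit timeouts and time the whole fetch. The following is only a sketch under assumptions: the 10-second timeout and the UTF-8 charset are guesses, and TimedFetch, fetch and readAll are names invented for the example. With a read timeout set, a server that stalls mid-page should surface as a java.net.SocketTimeoutException rather than a wait of several minutes.

```java
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.net.URI;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;

public class TimedFetch {
    // Reads everything from the Reader in 8 KB chunks instead of line by line.
    public static String readAll(Reader in) throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[8192];
        int n;
        while ((n = in.read(buf)) != -1) {
            sb.append(buf, 0, n);
        }
        return sb.toString();
    }

    public static String fetch(String sURL, int timeoutMs) throws IOException {
        URLConnection conn = URI.create(sURL).toURL().openConnection();
        conn.setConnectTimeout(timeoutMs); // limit on establishing the connection
        conn.setReadTimeout(timeoutMs);    // limit on any single blocking read
        // Assumption: the page is UTF-8; the real charset was not stated.
        try (Reader in = new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8)) {
            return readAll(in);
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java TimedFetch <url>");
            return;
        }
        long start = System.nanoTime();
        String page = fetch(args[0], 10_000); // assumed 10 s timeout
        long ms = (System.nanoTime() - start) / 1_000_000;
        System.out.println(page.length() + " chars in " + ms + " ms");
    }
}
```

If the elapsed time is still long while no timeout fires, the data is probably trickling in slowly from the server side, which would point away from the reading code.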