When reading a webpage using an InputStream, some tags and lines are skipped... why?

 
Hank Haroldson
Greenhorn
Posts: 9
When reading a webpage (from a URL) using an InputStream, some tags and lines are skipped... why? My code is below. To test this out, try using http://www.braille.org as the URL. You will notice it totally skips the TITLE tags and even the BODY tags, as well as a few others. Why is this happening? Is there any workaround for this?

try {
    URL url = new URL(address);

    // read data from the supplied URL
    BufferedReader in = new BufferedReader(
            new InputStreamReader(url.openStream()));

    StringBuilder input = new StringBuilder();
    while ((in.readLine()) != null) {
        input.append(in.readLine());
    }

    in.close();

    // write test
    BufferedWriter out = new BufferedWriter(new FileWriter("output.htm"));
    out.write(input.toString());
    out.close();

} catch (MalformedURLException me) {
    System.out.println("MalformedURLException: " + me);
} catch (IOException ioe) {
    System.out.println("IOException: " + ioe);
}
 
Joe Ess
Bartender
Posts: 9429
Look really closely at the following two lines of code. Can you see why the buffer input will contain every other line from the in stream?

while ((in.readLine()) != null){
input.append(in.readLine());
}
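In other words, readLine() is called twice per loop iteration: the call in the while condition reads a line and throws the result away, then the call inside the body reads the next line and appends it, so every other line (including the TITLE and BODY tags, depending on where they fall) is lost. A minimal sketch of the corrected loop is below; it uses a StringReader in place of the URL stream so it runs without a network connection, and re-appends the newline that readLine() strips:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

public class ReadAllLines {
    public static void main(String[] args) throws IOException {
        // StringReader stands in for new InputStreamReader(url.openStream())
        // so the loop pattern can be shown offline.
        BufferedReader in = new BufferedReader(
                new StringReader("<html>\n<title>Test</title>\n</html>"));

        StringBuilder input = new StringBuilder();
        String line;
        // Call readLine() exactly once per iteration and keep the result
        // in a variable, so no line is read and then discarded.
        while ((line = in.readLine()) != null) {
            // readLine() strips the line terminator, so add it back.
            input.append(line).append('\n');
        }
        in.close();

        System.out.print(input);
    }
}
```

With this pattern every line of the page ends up in the buffer, in order.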