Problem reading in all contents of a webpage using url.openStream()

 
Hank Haroldson
Greenhorn
Posts: 9
I am trying to read all the contents of an HTML page. The problem is that I only seem to be able to get the <html></html> tags and the content between them. Nothing else. I am using this code to read from a URL:

BufferedReader in = new BufferedReader(new InputStreamReader(url.openStream()));

Do I need to do something different to read all the contents of an html file stored on a Web server?
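
For reference, here is a minimal sketch of reading every line the server returns through the same url.openStream() call. The class name PageReader, the method readAll, and the UTF-8 charset are assumptions made for illustration and do not come from the original post; a real page may declare a different encoding.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URI;
import java.net.URL;
import java.nio.charset.StandardCharsets;

class PageReader {

    // Reads the whole response body, one line at a time, and returns it as a String.
    // Assumption: the page is encoded as UTF-8.
    static String readAll(URL url) throws IOException {
        StringBuilder page = new StringBuilder();
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(url.openStream(), StandardCharsets.UTF_8))) {
            String line;
            // readLine() is called exactly once per iteration and returns null at end of stream.
            while ((line = in.readLine()) != null) {
                page.append(line).append('\n');
            }
        }
        return page.toString();
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java PageReader.java <url>");
            return;
        }
        System.out.print(readAll(URI.create(args[0]).toURL()));
    }
}
```

Nothing in this loop stops at the closing </html> tag, so whatever the server sends before or after the tags is returned as well. The output should generally match what a browser shows under "view source".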
 
Joe Ess
Bartender
Posts: 9406
Originally posted by Hank Haroldson:
I only seem to be able to get the <html></html> tags and the content between them. Nothing else.


What else do you expect?
 
Hank Haroldson
Greenhorn
Posts: 9
If there is info before the html tag, it does not get it. However, my bigger problem is that it seems to skip info in the HTML file. It is bizarre: it just skips a few lines and also seems to cut the file short.
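
The thread never shows the read loop, so the actual cause of the skipped lines is unknown. One common way to get these symptoms (an assumption here, not a diagnosis) is calling readLine() twice per iteration, once in the loop condition and once in the body. The sketch below contrasts that with a loop that reads once per pass; the class and method names are hypothetical, and a StringReader stands in for the URL stream.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

class ReadLinePitfall {

    // Buggy: the loop condition consumes a line and discards it,
    // so only every second line reaches the list.
    static List<String> readSkipping(Reader source) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(source)) {
            while (in.readLine() != null) {
                lines.add(in.readLine()); // second call; may return null at end of input
            }
        }
        return lines;
    }

    // Correct: one readLine() call per iteration, with the result kept in a variable.
    static List<String> readEveryLine(Reader source) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            while ((line = in.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        String page = "<html>\n<head></head>\n<body>hi</body>\n</html>\n";
        System.out.println(readSkipping(new StringReader(page)));
        System.out.println(readEveryLine(new StringReader(page)));
    }
}
```

If the real loop looks like readSkipping, storing the result of a single readLine() call in a variable, as readEveryLine does, would be the fix.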
 
Joe Ess
Bartender
Posts: 9406
If it is info outside of the document type declaration allowed by the HTML Specification, I'm not surprised that something, be it your web server or Java, is ignoring it. What shows up when you load the page with a web browser and do a "view source"?
As for the stuff within the HTML tags, perhaps you can share a small example of Java code and an HTML file which exhibits the behavior you are seeing.
 
Hank Haroldson
Greenhorn
Posts: 9
Hi Joe,
Would saving the file to my hard drive and then reading it in as a plain text file work better? If so, how would I save a file from a url?
Thanks
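
On the "how would I save a file from a URL" part, one possible sketch copies the stream's bytes straight to disk with Files.copy, without decoding them as text. The class name PageSaver and the method save are hypothetical names chosen for this example, and Files.copy requires Java 7 or later.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

class PageSaver {

    // Copies everything the URL's stream delivers into the target file,
    // overwriting it if it already exists. Returns the number of bytes written.
    static long save(URL url, Path target) throws IOException {
        try (InputStream in = url.openStream()) {
            return Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java PageSaver.java <url> <file>");
            return;
        }
        long bytes = save(URI.create(args[0]).toURL(), Paths.get(args[1]));
        System.out.println("Saved " + bytes + " bytes to " + args[1]);
    }
}
```

Because the saved file holds the same bytes the stream delivers, reading it back is unlikely to behave differently from reading the URL directly; it mainly gives a copy to inspect in a text editor.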
 
Joe Ess
Bartender
Posts: 9406
HTML is plain text. It just has a particular sequence of characters.
I think you are going to have to show us a test case.

I see that you have given us some code in this post. I'm going to close this post to avoid confusion.
[ January 04, 2006: Message edited by: Joe Ess ]
 