Problem reading in all contents of a webpage using url.openstream()

 
Hank Haroldson
Greenhorn
Posts: 9
I am trying to read all the contents of an HTML page. The problem is that I only seem to be able to get the <html></html> tags and the content between them, nothing else. I am using this code to read from a URL:

BufferedReader in = new BufferedReader(new InputStreamReader(url.openStream()));

Do I need to do something different to read all the contents of an html file stored on a Web server?
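
For reference, here is a minimal sketch of a loop that reads every line url.openStream() delivers. The class name PageReader and the UTF-8 charset are assumptions for illustration, not something from this thread; a real page may declare a different encoding. The stream carries the bytes the server sends, so anything before the <html> tag should come through as well.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class; the name is not from the original post.
final class PageReader {

    // Reads the source line by line until readLine() returns null (end of stream).
    static String readAll(Reader source) throws IOException {
        StringBuilder page = new StringBuilder();
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            // Exactly one readLine() call per iteration, so no line is skipped.
            while ((line = in.readLine()) != null) {
                page.append(line).append('\n');
            }
        }
        return page.toString();
    }

    // Assumption: the page is UTF-8 encoded. Adjust the charset if it is not.
    static String readAll(URL url) throws IOException {
        return readAll(new InputStreamReader(url.openStream(), StandardCharsets.UTF_8));
    }
}
```

Calling PageReader.readAll(url) with the same url object used above returns the whole response body as one string, with each line terminated by '\n'.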
 
Joe Ess
Bartender
Posts: 9626

Originally posted by Hank Haroldson:
I only seem to be able to get the <html></html> tags and the content between them. Nothing else.



What else do you expect?
 
Hank Haroldson
Greenhorn
Posts: 9
If there is info before the html tag, it does not get it. However, my bigger problem is that it seems to skip info in the html file. It is bizarre: it just skips a few lines and also seems to cut the file short.
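
The thread never establishes why lines go missing. One common cause of this exact symptom, offered here only as a guess, is calling readLine() twice per loop iteration: once in the loop condition and once in the body. The sketch below contrasts that pattern with the usual fix. The class and method names are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class; not code from the original poster.
final class LineSkipDemo {

    // Buggy pattern: the line read in the condition is thrown away,
    // and the body stores the following line (which may be null at the end).
    static List<String> readSkippingLines(BufferedReader in) throws IOException {
        List<String> lines = new ArrayList<>();
        while (in.readLine() != null) {
            lines.add(in.readLine());
        }
        return lines;
    }

    // Fixed pattern: each line is read once and stored before the next read.
    static List<String> readEveryLine(BufferedReader in) throws IOException {
        List<String> lines = new ArrayList<>();
        String line;
        while ((line = in.readLine()) != null) {
            lines.add(line);
        }
        return lines;
    }
}
```

With three input lines a, b and c, the first method returns b followed by null, which looks like skipped lines and a file cut short; the second returns all three.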
 
Joe Ess
Bartender
Posts: 9626
If it is info other than the document type declaration allowed by the HTML Specification, I'm not surprised that something, be it your web server or Java, is ignoring it. What shows up when you load the page with a web browser and do a "view source"?
As for the stuff within the HTML tags, perhaps you can share a small example of Java code and an HTML file that exhibit the behavior you are seeing.
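
One way to act on the "view source" suggestion, sketched under assumptions: save the browser's view-source text, read both it and the Java output into lists of lines, and find where they first diverge. The class SourceDiff and its method are hypothetical helpers, not part of any library.

```java
import java.util.List;

// Hypothetical helper for comparing "view source" text with what Java read.
final class SourceDiff {

    // Returns the 0-based index of the first line that differs,
    // or -1 when both lists have the same lines in the same order.
    static int firstDifference(List<String> expected, List<String> actual) {
        int shared = Math.min(expected.size(), actual.size());
        for (int i = 0; i < shared; i++) {
            if (!expected.get(i).equals(actual.get(i))) {
                return i;
            }
        }
        // All shared lines match; a length mismatch means one side was cut short.
        return expected.size() == actual.size() ? -1 : shared;
    }
}
```

The returned index points at the first skipped or altered line, which makes for a small, shareable test case.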
 
Hank Haroldson
Greenhorn
Posts: 9
Hi Joe,
Would saving the file to my hard drive and then reading it in as a plain text file work better? If so, how would I save a file from a URL?
Thanks
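
On the narrow question of saving a file from a URL, a minimal sketch using java.nio.file.Files.copy follows. The class name PageSaver and the choice to overwrite an existing target are assumptions. Saving first does not change what the server sends, so it mainly helps with inspecting the raw bytes afterwards.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper class; the name is an assumption for illustration.
final class PageSaver {

    // Copies the raw bytes behind the URL to the target path, replacing any
    // existing file, and returns the number of bytes written.
    static long save(URL url, Path target) throws IOException {
        try (InputStream in = url.openStream()) {
            return Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }
}
```

Because the bytes are copied without decoding, the saved file can be compared directly against the text the reading code produced.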
 
Joe Ess
Bartender
Posts: 9626
HTML is plain text. It just has a particular sequence of characters.
I think you are going to have to show us a test case.

I see that you have given us some code in this post. I'm going to close this post to avoid confusion.
[ January 04, 2006: Message edited by: Joe Ess ]