I am trying to read all the contents of an HTML page. The problem is that I only seem to be able to get the <html></html> tags and the content between them, nothing else. I am using this code to read from a URL:
BufferedReader in = new BufferedReader(new InputStreamReader(url.openStream()));
Do I need to do something different to read all the contents of an html file stored on a Web server?
If there is info before the html tag, it does not get it. However, my bigger problem is that it seems to skip info in the html file. It is bizarre: it just skips a few lines and also seems to cut the file short.
If the info before the html tag is anything other than the document type declaration allowed by the HTML specification, I'm not surprised that something, be it your web server or Java, is ignoring it. What shows up when you load the page with a web browser and do a "view source"? As for the stuff within the HTML tags, perhaps you can share a small example of Java code and an HTML file that exhibits the behavior you are seeing.
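For comparison, here is a minimal sketch of a loop that reads every line from a URL. Without seeing your code I can't tell whether this applies, but one common cause of skipped lines and a truncated result is calling readLine() more than once per loop iteration, so the sketch calls it exactly once and keeps the result. The class and method names (PageReader, readAll, readPage) are made up for illustration, and UTF-8 is an assumption about the page's encoding; your original line uses the platform default charset.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not taken from the original post.
class PageReader {

    // Reads every line from the reader. readLine() is called exactly once
    // per iteration and its result is kept; a second call inside the loop
    // body would silently consume (and lose) the next line.
    // Note: line terminators are normalized to '\n', and a '\n' is appended
    // after the last line even if the source did not end with one.
    static String readAll(BufferedReader in) throws IOException {
        StringBuilder sb = new StringBuilder();
        String line;
        while ((line = in.readLine()) != null) {
            sb.append(line).append('\n');
        }
        return sb.toString();
    }

    // Opens the URL and returns the whole body as text.
    // Assumption: the page is UTF-8 encoded.
    static String readPage(URL url) throws IOException {
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(url.openStream(), StandardCharsets.UTF_8))) {
            return readAll(in);
        }
    }
}
```

You would call it as something like String html = PageReader.readPage(url); and then compare the result against what "view source" shows in the browser.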