How can I read an HTML file in Java?

 
Greenhorn
Posts: 16
Hi,
I want to write an application that reads an HTML file from a website and takes some action depending on the information in that file. For example, if the word ARR appears somewhere in the HTML, I want to send an e-mail to the person associated with that word.

Please help me.

How can I develop this in Java?

Thanks
 
Ranch Hand
Posts: 8944
It is the same as reading a file.
 
Java Cowboy
Posts: 16084
Reading from a webpage is quite simple in Java; you can use class java.net.URL, call method openStream() on the URL object and read the HTML page from it. Then you need to look for "ARR" or whatever else you want to look for in the returned HTML page. You can do that with the methods in class String.
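
A minimal sketch of the approach described above, using only java.net.URL and String.contains(). The class name PageKeywordCheck, the example address, and the UTF-8 assumption are illustrative and not from the original post; sending the mail is left as a placeholder.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URI;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class PageKeywordCheck {

    // Reads the whole page behind the URL into one String.
    // Assumption: the page is UTF-8 encoded; a real page may declare another charset.
    public static String readPage(URL url) throws IOException {
        StringBuilder html = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(url.openStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                html.append(line).append('\n');
            }
        }
        return html.toString();
    }

    // True if the keyword occurs anywhere in the page text (case-sensitive).
    public static boolean pageContains(URL url, String keyword) throws IOException {
        return readPage(url).contains(keyword);
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical address; replace it with the page you want to check.
        URL url = URI.create("http://example.com/status.html").toURL();
        if (pageContains(url, "ARR")) {
            // Sending the mail is out of scope here; this is only a placeholder.
            System.out.println("Found ARR: notify the responsible person");
        }
    }
}
```

Note that a plain substring search also matches "ARR" inside tags, attributes, or longer words, so this is only suitable for simple checks.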
 
ranger
Posts: 17346
I am going to move this to the Sockets and Internet Protocol forum. This forum is specifically for Servlets questions, and since you aren't creating a Servlet, I am moving it.

Mark
 
(instanceof Sidekick)
Posts: 8791
Look into URL as mentioned above, or maybe HttpURLConnection or even the Apache HttpClient package, to pull HTML from web sites.

If you need to get complex stuff out of the HTML - more than just the contents of a tag or two with simple string manipulation - look into HTML parsers. I use the Quiotix parser described and linked HERE.
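
For the HttpURLConnection route mentioned above, a rough sketch might look like the following. The class name HttpFetch, the 5-second timeouts, and the UTF-8 assumption are illustrative choices, not something from this thread. Compared with URL.openStream(), it lets you set timeouts and check the status code before reading.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class HttpFetch {

    // Fetches the page with a GET request and returns the body as text.
    public static String fetch(URL url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("GET");
        conn.setConnectTimeout(5000); // assumed value, in milliseconds
        conn.setReadTimeout(5000);    // assumed value, in milliseconds
        try {
            int status = conn.getResponseCode();
            if (status != HttpURLConnection.HTTP_OK) {
                throw new IOException("Unexpected HTTP status " + status);
            }
            try (InputStream in = conn.getInputStream()) {
                return readAll(in);
            }
        } finally {
            conn.disconnect();
        }
    }

    // Drains the stream and decodes it as UTF-8 (an assumption about the page).
    public static String readAll(InputStream in) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int n;
        while ((n = in.read(chunk)) != -1) {
            buffer.write(chunk, 0, n);
        }
        return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
    }
}
```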
 
Greenhorn
Posts: 2
Hello all, longtime reader with a quick question:

I'm hoping to extract data from a website (namely NCBI) and analyze the data after several queries. Has anyone compared the performance of Apache's HttpClient to, say, HttpURLConnection?

I'll try to do the same, but I wonder if there are any server/platform vagaries.

Matt
[ October 06, 2005: Message edited by: Matt Chambers ]
 
Stan James
(instanceof Sidekick)
Posts: 8791
Your program will almost certainly be faster than the network so you'll spend more time waiting for data over the wire than waiting for your instructions to run. I'd guess HttpClient is "fast enough".
 
Matt Chambers
Greenhorn
Posts: 2
Thanks Stan! (sorry for the late response).
 
Greenhorn
Posts: 1
Hey, I want to read the HTML from a page where I need to log in first.

Is there any way I can log in programmatically and then read the HTML?

Example: I want to read the HTML from mail.yahoo.com. Right now I get the HTML code for the login page, but I want to read the HTML after login.

Please help.
Regards,
Shamil
 
Rancher
Posts: 43027
I'd use a library like jWebUnit for programmatic access to web sites. Makes it much easier to deal with the HTML, and it supports Basic and Form Authentication.
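
As a rough illustration of the Basic Authentication case only, here is a plain-JDK sketch that does not use jWebUnit. The class and method names are hypothetical. It does not cover form-based logins such as the mail site in the question, which also need the login form to be posted and the session cookies to be kept.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class BasicAuthFetch {

    // Builds the value of the Authorization header for HTTP Basic authentication:
    // "Basic " followed by the Base64 encoding of "user:password".
    public static String basicAuthHeader(String user, String password) {
        String credentials = user + ":" + password;
        return "Basic " + Base64.getEncoder()
                .encodeToString(credentials.getBytes(StandardCharsets.UTF_8));
    }

    // Opens a connection that will send the credentials with the request.
    // The request itself is only sent once the caller reads the response.
    public static HttpURLConnection openWithBasicAuth(URL url, String user, String password)
            throws IOException {
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestProperty("Authorization", basicAuthHeader(user, password));
        return conn;
    }
}
```

Basic authentication sends the credentials in an easily decoded form, so it should only be used over HTTPS.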
 