how to parse html webpage

 
Greenhorn
Posts: 29
Hi guys, can anybody give me an idea of how to parse an HTML web page from a live URL using Java?

I have code, but the output is still raw HTML, tags and all. How can I strip out the tags? Any ideas, friends?

import java.net.*;
import java.io.*;

public class URLReader {
    public static void main(String[] args) throws Exception {
        URL yahoo = new URL("http://finance.yahoo.com");

        // try-with-resources closes both streams and flushes the writer,
        // so sample.txt actually receives the buffered output
        try (BufferedReader in = new BufferedReader(new InputStreamReader(yahoo.openStream()));
             BufferedWriter wr = new BufferedWriter(new FileWriter("sample.txt"))) {
            String inputLine;
            while ((inputLine = in.readLine()) != null) {
                wr.write(inputLine);
                wr.newLine(); // readLine() strips the line terminator
            }
        }
    }
}
bye
Naga
 
Rancher
Posts: 43081
There are many things you might want to accomplish with a downloaded web page. You need to tell us what you're trying to do with it.

If you want to extract the text, I'd start by converting the HTML into well-formed XML; libraries like NekoXNI, JTidy and TagSoup can do this for you.
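Once the page has been converted to well-formed XML, pulling out the text needs nothing beyond the JDK. The following is only a sketch under that assumption: the input is taken to be already-tidied XHTML (for example, the output of one of the libraries above), and the class name XhtmlText and method extractText are made up for illustration. Note that it returns all text below the root element, including the contents of any script or style elements.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper: assumes the input is already well-formed XHTML.
class XhtmlText {

    /** Returns the text of the document with runs of whitespace collapsed to one space. */
    static String extractText(String xhtml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xhtml)));
        // getTextContent() concatenates every text node below the root element
        String raw = doc.getDocumentElement().getTextContent();
        return raw.replaceAll("\\s+", " ").trim();
    }

    public static void main(String[] args) throws Exception {
        String page = "<html><body>\n<h1>Quotes</h1>\n<p>ACME <b>12.50</b></p>\n</body></html>";
        System.out.println(extractText(page));
    }
}
```

Raw HTML from a live site will usually fail to parse this way, which is why the tidy-up step comes first.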
 
naga raaju
Greenhorn
Posts: 29
Hi,
thanks for your reply.
I need some text from the web pages, so what should I do?


Can I depend on a third-party API, or is that possible with plain Java coding?


bye
Naga
 
Bartender
Posts: 9626
There is an HTML parser provided in the Java API. As Ulf says, it depends on your exact requirements whether it will fit the bill or not.
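One parser that ships with the JDK is the Swing one, javax.swing.text.html.parser.ParserDelegator, which tolerates untidy HTML and reports text through a callback. A minimal sketch of using it to collect the text of a page follows; the class name SwingHtmlText and method extractText are invented for the example, and the parser targets an old HTML dialect, so results on modern pages may vary.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper built on the JDK's Swing HTML parser.
class SwingHtmlText {

    /** Collects the text chunks the parser reports, separated by single spaces. */
    static String extractText(Reader html) throws IOException {
        StringBuilder text = new StringBuilder();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleText(char[] data, int pos) {
                // Called once per run of text found between tags
                text.append(data).append(' ');
            }
        };
        // Third argument: true means ignore any charset declared inside the document
        new ParserDelegator().parse(html, callback, true);
        return text.toString().replaceAll("\\s+", " ").trim();
    }

    public static void main(String[] args) throws IOException {
        String page = "<html><body><h1>Quotes</h1><p>ACME <b>12.50</b></p></body></html>";
        System.out.println(extractText(new StringReader(page)));
    }
}
```

To use it on a live page, pass the InputStreamReader from url.openStream() instead of the StringReader.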
 
Ulf Dittmer
Rancher
Posts: 43081
That depends on the specifics. Are you talking about one particular page on one particular site? Several pages? Several sites? Is the layout of the page(s) predictable? Are there ID tags on which you can rely?

You will need to do some coding, but the libraries I mentioned will help you get started.
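To illustrate why reliable id attributes help: if the page has already been turned into well-formed XML, a single XPath query can pick out one element's text. This is a sketch only; the class ById, the method textOfId, and the id values in the sample are all hypothetical.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

// Hypothetical helper: assumes well-formed XHTML with id attributes to rely on.
class ById {

    /** Returns the text of the first element with the given id, or "" if there is none. */
    static String textOfId(String xhtml, String id) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        // Bind $id as a variable instead of pasting the value into the expression
        xpath.setXPathVariableResolver(name -> "id".equals(name.getLocalPart()) ? id : null);
        return xpath.evaluate("string(//*[@id=$id])", new InputSource(new StringReader(xhtml)));
    }

    public static void main(String[] args) throws Exception {
        String page = "<html><body><span id=\"name\">ACME</span>"
                + "<span id=\"price\">12.50</span></body></html>";
        System.out.println(textOfId(page, "price"));
    }
}
```

If the layout changes but the id stays the same, a query like this keeps working, which is the point of asking about predictable pages.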
 
Greenhorn
Posts: 7

You can also use biterscripting (free download at biterscripting.com) for parsing HTML. It works great.

They have a sample script posted at http://www.biterscripting.com/SS_URLs.html . This script extracts referenced URLs from a page. Another sample script http://www.biterscripting.com/SS_SearchURL.html will search a page for specific search words. The sample script http://www.biterscripting.com/SS_SearchWeb.html is de facto your own search engine.

You can get started with these scripts.

If you come up with new HTML parsing scripts of your own, could you please post them for the rest of us? Thanks.

Randi
 
Marshal
Posts: 79177
Welcome to JavaRanch, Randi, but please don't resurrect 10-month-old threads. Have a look at this FAQ.
 