pranav modi

Greenhorn
since Sep 01, 2008

Recent posts by pranav modi

Hi all,

I am writing an application to extract event dates, times, and descriptions from websites that do not provide feeds. Can someone give some tips on how this might be accomplished in Java? Are there web scraping applications that already have this functionality?

Thanks in advance,

Pranav Modi.
15 years ago
Oops... here is the XML document:

<irrepressible:info>

<fragments count="10" query="http://irrepressible.info/api?lang=en&limit=10&country=china%20iran" language="en">

<fragment href="http://irrepressible.info?lang=en&fragment=1" id="1">
<country>China</country>
<domain>http://hrichina.org/</domain>
<organisation>Human Rights in China</organisation>
<siteDescription>HRIC is a Chinese human rights organisation</siteDescription>

<url>
http://www.hrichina.org/public/contents/article?revision%5fid=10853&item%5fid=3759
</url>

<fragmentText>
With incredible bravery, she has broken through much of the fear and isolation imposed upon other victims' families.
</fragmentText>
</fragment>
<fragment href="http://irrepressible.info?lang=en&fragment=2" id="2">
<country>China</country>
<domain>http://hrichina.org/</domain>
<organisation>Human Rights in China</organisation>
<siteDescription>HRIC is a Chinese human rights organisation</siteDescription>

<url>
http://www.hrichina.org/public/contents/article?revision%5fid=10853&item%5fid=3759
</url>

<fragmentText>
Her dedication and perseverance has gained her wide admiration,
</fragmentText>
</fragment>
<fragment href="http://irrepressible.info?lang=en&fragment=3" id="3">
<country>China</country>
<domain>http://hrichina.org/</domain>
<organisation>Human Rights in China</organisation>
<siteDescription>HRIC is a Chinese human rights organisation</siteDescription>

<url>
http://www.hrichina.org/public/contents/article?revision%5fid=1894&item%5fid=1893
</url>

<fragmentText>
Several scholars and outspoken journalists have lost their jobs or been warned by the government.
</fragmentText>
</fragment>
<fragment href="http://irrepressible.info?lang=en&fragment=4" id="4">
<country>China</country>
<domain>http://www.amnesty.org/</domain>
<organisation>Amnesty International</organisation>

<siteDescription>
Amnesty International is a movement of ordinary people standing for humanity and human rights.
</siteDescription>
<url>http://news.amnesty.org/index/ENGAMR510802006</url>

<fragmentText>
They also need to allow independent medical experts into the camp
</fragmentText>
</fragment>
<fragment href="http://irrepressible.info?lang=en&fragment=5" id="5">
<country>China</country>
<domain>http://www.amnesty.org/</domain>
<organisation>Amnesty International</organisation>

<siteDescription>
Amnesty International is a movement of ordinary people standing for humanity and human rights.
</siteDescription>

<url>
http://web.amnesty.org/pages/deathpenalty-index-eng
</url>

<fragmentText>
With 20,000 people on death row across the world, over 2,148 people were executed in 22 countries in 2005.
</fragmentText>
</fragment>
<fragment href="http://irrepressible.info?lang=en&fragment=6" id="6">
<country>China</country>
<domain>http://www.bbc.co.uk/</domain>
<organisation>British Broadcasting Corporation</organisation>
<siteDescription>The BBC is a public service broadcaster</siteDescription>
<url>http://www.bbc.co.uk/weather/5day.shtml</url>

<fragmentText>
so if a day is forecast to be sunny with the possibility of a brief shower,
</fragmentText>
</fragment>
<fragment href="http://irrepressible.info?lang=en&fragment=7" id="7">
<country>China</country>
<domain>http://www.bbc.co.uk/</domain>
<organisation>British Broadcasting Corporation</organisation>
<siteDescription>The BBC is a public service broadcaster</siteDescription>
<url>http://news.bbc.co.uk/1/hi/sci/tech/default.stm</url>

<fragmentText>
A US-British team challenges the idea that the tiny skeleton dubbed the "Hobbit" is a human new to science.
</fragmentText>
</fragment>
<fragment href="http://irrepressible.info?lang=en&fragment=8" id="8">
<country>China</country>
<domain>http://www.bbc.co.uk/</domain>
<organisation>British Broadcasting Corporation</organisation>
<siteDescription>The BBC is a public service broadcaster</siteDescription>

<url>
http://news.bbc.co.uk/1/hi/entertainment/4994564.stm
</url>

<fragmentText>
Star Trek fans are being offered a "once in a lifetime" opportunity to buy props, models and sets from the show
</fragmentText>
</fragment>
<fragment href="http://irrepressible.info?lang=en&fragment=9" id="9">
<country>China</country>
<domain>http://www.freetibet.org/</domain>
<organisation>Free Tibet Campaign</organisation>

<siteDescription>
Free Tibet Campaign stands for the Tibetans' right to determine their own future
</siteDescription>
<url>http://www.freetibet.org/events/climbfortibet.html</url>

<fragmentText>
In the last few years the Climb for Tibet team has sent out thousands of Peace Messages
</fragmentText>
</fragment>
<fragment href="http://irrepressible.info?lang=en&fragment=10" id="10">
<country>China</country>
<domain>http://www.freetibet.org/</domain>
<organisation>Free Tibet Campaign</organisation>

<siteDescription>
Free Tibet Campaign stands for the Tibetans' right to determine their own future
</siteDescription>
<url>http://www.freetibet.org/events/diary.html</url>

<fragmentText>
The Healing Chod is an ancient enchanting ceremony and is a wonderful opportunity to deeply relax ones body, speech and mind.
</fragmentText>
</fragment>
</fragments>
</irrepressible:info>
15 years ago
Hi all,

I have written a very simple client for a web service. All I want it to do is connect to a website, get an XML document, and print some elements of the document. It creates the URL correctly, but when it prints the document it gives:
[#document = null]
It also does not print the element I coded it for. Can someone point out the error for me?
Thanks!
The code is as follows:

import java.io.*;

import org.w3c.dom.*;
import org.xml.sax.SAXException;
import javax.xml.parsers.*;
import javax.xml.xpath.*;
import java.net.HttpURLConnection;
import java.net.URL;

public class Iapp
{
    public static void main(String[] args)
        throws IOException, ParserConfigurationException,
               SAXException, XPathExpressionException
    {
        String lang = "lang=" + args[0];
        String country = "&country=" + args[1];
        URL url = new URL("http://irrepressible.info/query?" + lang + country);
        System.out.println(url);
        HttpURLConnection huc = (HttpURLConnection) url.openConnection();
        huc.setRequestMethod("GET");
        huc.connect();
        InputStream is = huc.getInputStream();
        //System.out.println(is);
        //int code = huc.getResponseCode();

        // Turn the response entity-body into an XML document.
        DocumentBuilderFactory docBuilderFactory = DocumentBuilderFactory.newInstance();
        System.out.println(docBuilderFactory);

        DocumentBuilder docBuilder = docBuilderFactory.newDocumentBuilder();
        System.out.println(docBuilder);
        Document doc = docBuilder.parse(is);

        huc.disconnect();

        System.out.println(doc);

        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList bookmarks = (NodeList) xpath.evaluate("/irrepressible:info/fragments/fragment/fragmentText", doc, XPathConstants.NODESET);
        // Iterate over the bookmarks and print out each one.
        for (int i = 0; i < bookmarks.getLength(); i++)
        {
            NamedNodeMap bookmark = bookmarks.item(i).getAttributes();
            String description = bookmark.getNamedItem("description")
                                         .getNodeValue();
            String uri = bookmark.getNamedItem("href").getNodeValue();
            System.out.println(description + ": " + uri);
        }

        System.exit(0);
    }
}
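
A few observations on the code above, offered as a sketch rather than a confirmed fix. As far as I know, the JDK's default DOM implementation prints a Document node as its node name and node value, so output like [#document: null] does not by itself mean the parse failed. Two other things look suspicious when compared with the XML posted earlier: the XPath uses the prefix irrepressible without registering a NamespaceContext, and the loop reads "description" and "href" attributes from fragmentText elements, which have no attributes in that document (href is on the parent fragment element). The version below sidesteps the prefix by selecting //fragment and reads the text from the fragmentText child. The class name FragmentPrinter, the helper extract, and the inline sample document are hypothetical, and it assumes the default, non-namespace-aware parser, so it parses a string instead of calling the web service.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical class name; not part of the original post.
class FragmentPrinter {

    // Parses the XML from the stream and returns one "text: href" line per <fragment>.
    static List<String> extract(InputStream in) throws Exception {
        // Default factory is not namespace-aware, so the undeclared
        // "irrepressible:" prefix on the root element is read as a plain name.
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(in);

        XPath xpath = XPathFactory.newInstance().newXPath();
        // "//fragment" avoids naming the prefixed root element in the expression.
        NodeList fragments = (NodeList) xpath.evaluate("//fragment", doc, XPathConstants.NODESET);

        List<String> lines = new ArrayList<>();
        for (int i = 0; i < fragments.getLength(); i++) {
            Element fragment = (Element) fragments.item(i);
            // href is an attribute of <fragment>, not of <fragmentText>.
            String href = fragment.getAttribute("href");
            // The text lives in the <fragmentText> child element.
            String text = xpath.evaluate("fragmentText", fragment).trim();
            lines.add(text + ": " + href);
        }
        return lines;
    }

    public static void main(String[] args) throws Exception {
        // Assumed sample shaped like the posted document; "&" is escaped as "&amp;".
        String sample =
            "<irrepressible:info><fragments count=\"1\">"
            + "<fragment href=\"http://irrepressible.info?lang=en&amp;fragment=1\" id=\"1\">"
            + "<fragmentText>Sample fragment text.</fragmentText>"
            + "</fragment></fragments></irrepressible:info>";
        InputStream in = new ByteArrayInputStream(sample.getBytes(StandardCharsets.UTF_8));
        for (String line : extract(in)) {
            System.out.println(line);
        }
    }
}
```

To use this against the live service, the InputStream from HttpURLConnection.getInputStream() could be passed to extract in place of the sample bytes. If the response really declares a namespace for the irrepressible prefix, a NamespaceContext would have to be set on the XPath object and the factory made namespace-aware instead.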
15 years ago