
Help With HttpURLConnection class

Jason Hoskins
Posts: 5
Hi Everyone,
This is my first post. I've been searching for a solution to my problem but to no avail.
Here goes:
I am trying to write a Java bean that reads a URL and parses out certain info. This info is then stored in another object and used by some JSP pages at the application level. It's basically a piece of code that holds "Status" information and is read by numerous JSP pages (it refreshes every 5 minutes).
Anyway, here is the gist of my problem. The info I am trying to parse comes from an XML document. At first I wrote the bean to parse the document using the Xerces DOM parser. However, the XML document (see below) has HTML mark-up that is not contained in a CDATA section. I have to retain the HTML mark-up, so I abandoned the DOMParser (long story) and just decided to read the XML (which sits on a web server) into a String via the HttpURLConnection class. Then, using standard String methods, I parsed out the few bits of info I needed.
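To give an idea of the String-based parsing I mean, here is a rough sketch. The class and method names (AbstractExtractor, extract) are made up for illustration, the sample content is invented, and it assumes the element has no attributes and appears only once.

```java
public class AbstractExtractor {

    // Returns the raw text between <tag> and </tag>, nested HTML mark-up included,
    // or null if either tag is missing. Assumes the opening tag has no attributes.
    public static String extract(String xml, String tag) {
        String open = "<" + tag + ">";
        String close = "</" + tag + ">";

        int start = xml.indexOf(open);
        if (start < 0) {
            return null;
        }
        start += open.length();

        int end = xml.indexOf(close, start);
        if (end < 0) {
            return null;
        }
        return xml.substring(start, end);
    }

    public static void main(String[] args) {
        // Invented sample content, shaped like the document shown at the end of this post.
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
                   + "<abstract>Status: <b>all systems up</b></abstract>";
        System.out.println(extract(xml, "abstract"));
    }
}
```

Because this never goes through an XML parser, the embedded HTML tags come back untouched, which is the whole reason I went this route.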
This logic works fine, but I am having trouble with the encoding. The XML document is UTF-8 encoded; however, the HTML mark-up contains ISO-8859-1 encoded characters, including those pesky curly quotes. The HttpURLConnection class reads the URL fine, but the curly quote characters cannot be read correctly and come out as gibberish.
Has anyone encountered anything like this before? I'm trying to run the beans on a Solaris 5.8 box.
<?xml version="1.0" encoding="UTF-8"?>
<abstract>Insert various html tags in here</abstract>
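One thing that might be worth trying is decoding the response with an explicitly named charset instead of whatever the platform default is. The sketch below is only that, a sketch: the class and method names (CharsetAwareFetch, readAll, fetch) are hypothetical, and it assumes the curly quotes actually arrive as the single bytes 0x93 and 0x94. Those bytes are curly double quotes in windows-1252, whereas strict ISO-8859-1 maps them to control characters, so windows-1252 is the charset used here. If the bytes on the wire turn out to be something else, the charset would need to change accordingly.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.Charset;

public class CharsetAwareFetch {

    // Decodes the whole stream with the given charset rather than the platform default.
    public static String readAll(InputStream in, Charset charset) throws IOException {
        StringBuilder sb = new StringBuilder();
        Reader reader = new InputStreamReader(in, charset);
        try {
            char[] buf = new char[4096];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        } finally {
            reader.close();
        }
        return sb.toString();
    }

    // Hypothetical helper: fetches the URL and decodes the body with the given charset.
    public static String fetch(String address, Charset charset) throws IOException {
        HttpURLConnection conn =
                (HttpURLConnection) URI.create(address).toURL().openConnection();
        try {
            return readAll(conn.getInputStream(), charset);
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        // Simulated response body: 0x93 and 0x94 are the curly double quotes in windows-1252.
        byte[] raw = { (byte) 0x93, 'O', 'K', (byte) 0x94 };
        String decoded = readAll(new ByteArrayInputStream(raw), Charset.forName("windows-1252"));
        // Compare against the Unicode curly quotes U+201C and U+201D.
        System.out.println(decoded.equals("\u201COK\u201D"));
    }
}
```

The main method uses an in-memory byte array so the decoding step can be checked without a web server. Against the real document it would be a call along the lines of fetch(address, Charset.forName("windows-1252")), with address being the URL of the XML file.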