
Error while parsing an HTML page in Java on Linux

 
Rahul Dhaware
Greenhorn
Posts: 22
I am parsing an HTML page with an HTML parsing utility; I am using cobra.jar and js.jar for that.

The page contains some unreadable special characters like ' � '. When I compile my program on Windows, it compiles properly and runs fine.

But when I compile it on Linux, javac gives me the following warning:
unmappable character for encoding UTF8
String stateZipArray[] = stateZip.trim().split(" � ");

Then, when I access elements of the stateZipArray array, it throws an ArrayIndexOutOfBoundsException.
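One likely reason for the exception: `String.split` returns a one-element array when the separator never occurs in the input, so any unchecked access to index 1 fails. The sketch below is a hedged illustration, not the poster's code. It assumes the separator is a non-breaking space (U+00A0) surrounded by spaces; the thread never says which character ' � ' really is, so `SEPARATOR` and `splitStateZip` are hypothetical names and the code point is only a placeholder. Writing the character as a Unicode escape keeps the source file pure ASCII, so it compiles identically whatever source encoding javac assumes on Windows or Linux.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class StateZipSplit {
    // Hypothetical separator: a non-breaking space between two ordinary spaces.
    // Written as a Unicode escape so the .java file contains only ASCII bytes.
    static final String SEPARATOR = " \u00A0 ";

    /** Returns {state, zip}, or null if the separator was not found. */
    static String[] splitStateZip(String stateZip) {
        // Pattern.quote treats the separator literally, since split() takes a regex.
        String[] parts = stateZip.trim().split(Pattern.quote(SEPARATOR));
        // When the separator is absent, split() returns a one-element array,
        // and reading parts[1] would throw ArrayIndexOutOfBoundsException.
        return parts.length == 2 ? parts : null;
    }

    public static void main(String[] args) {
        String ok = "CA" + SEPARATOR + "94105";
        System.out.println(Arrays.toString(splitStateZip(ok))); // [CA, 94105]
        System.out.println(splitStateZip("CA 94105"));          // null
    }
}
```

Checking the array length before indexing turns a silent encoding mismatch into a case the program can detect and report.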

In the InputStreamReader, I am using 'ISO-8859-1' as the charset name.
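The charset passed to `InputStreamReader` only controls how the page's bytes are decoded at run time; it must match the encoding the page was actually written in. The minimal sketch below decodes the same bytes twice. The helper name `readAll` and the sample bytes are assumptions for illustration. Byte 0xA0 is a non-breaking space in ISO-8859-1, but on its own it is malformed UTF-8, so the UTF-8 decoder substitutes U+FFFD, the replacement character that usually shows up as ' � '.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ReadWithCharset {
    // Decodes the whole stream with the given charset. Malformed byte
    // sequences are replaced with U+FFFD rather than raising an error.
    static String readAll(InputStream in, Charset cs) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(in, cs))) {
            char[] buf = new char[4096];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical page bytes: "CA", byte 0xA0, "94105".
        byte[] page = {'C', 'A', (byte) 0xA0, '9', '4', '1', '0', '5'};
        String latin1 = readAll(new ByteArrayInputStream(page), StandardCharsets.ISO_8859_1);
        String utf8 = readAll(new ByteArrayInputStream(page), StandardCharsets.UTF_8);
        System.out.println(latin1.indexOf('\u00A0')); // 2
        System.out.println(utf8.indexOf('\uFFFD'));   // 2
    }
}
```

The decoded text can contain the separator correctly and the split can still fail if the string literal in the source was itself mangled at compile time, which is what the javac warning points to.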

Can anyone please tell me what the problem is and how I can resolve it?

Thanks in advance.
 
Martijn Verburg
author
Bartender
Posts: 3275
It sounds like you're mixing and matching your encoding types. Try using UTF-8 in your InputStreamReader, and also read this article.
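Rather than hard-coding either UTF-8 or ISO-8859-1, one option is to use the charset the server declares. The sketch below is an assumption-laden illustration: `charsetFromContentType` is a hypothetical helper that parses the `charset=` parameter of a Content-Type value (such as the string returned by `URLConnection.getContentType()`) and falls back to a caller-supplied default. It does not look at `<meta charset>` tags inside the HTML, which some pages rely on instead.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetFromHeader {
    // Returns the charset named in a Content-Type value such as
    // "text/html; charset=UTF-8", or the fallback if none is usable.
    static Charset charsetFromContentType(String contentType, Charset fallback) {
        if (contentType != null) {
            for (String param : contentType.split(";")) {
                String p = param.trim();
                if (p.regionMatches(true, 0, "charset=", 0, 8)) {
                    String name = p.substring(8).replace("\"", "").trim();
                    try {
                        return Charset.forName(name);
                    } catch (IllegalArgumentException e) {
                        // Unknown or illegal charset name: use the fallback.
                    }
                }
            }
        }
        return fallback;
    }

    public static void main(String[] args) {
        System.out.println(charsetFromContentType("text/html; charset=UTF-8", StandardCharsets.ISO_8859_1)); // UTF-8
        System.out.println(charsetFromContentType("text/html", StandardCharsets.ISO_8859_1));                // ISO-8859-1
    }
}
```

The returned `Charset` can then be passed straight to the `InputStreamReader(InputStream, Charset)` constructor.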
 
Rahul Dhaware
Greenhorn
Posts: 22
I have tried using UTF-8 in the InputStreamReader constructor.
It does not work; it gives me the same error.
 
Martijn Verburg
author
Bartender
Posts: 3275
Have you read the article I linked? It gives you a vital understanding of these sorts of problems...
 