
Problem with csv file

 
Bill Hayes
Greenhorn
Posts: 24
Hi, I'm new to the forum, so please bear with me while I work out which forum my general Java questions belong in. I hope this is the right one.

I have found several CSV parsers and have used them successfully on various test files. Now I'm trying to parse a CSV file generated by a vendor website that uses MS SQL Server. Parsing aside, there seems to be something different about the CSV file downloaded from the vendor website, and I can't even read the file properly.
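The code listing from this post is not preserved in this copy of the thread. As a minimal sketch only, a plain line-by-line read of the kind described might look like the following. The class name ReadLines and the path vendor.csv are placeholders, not the poster's actual code.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class ReadLines {
    // Reads every line with FileReader, which decodes the bytes using the
    // platform default charset (no encoding is specified anywhere).
    static List<String> readAllLines(String path) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(new FileReader(path))) {
            String line;
            while ((line = in.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // "vendor.csv" is a placeholder path, not the real file name.
        String path = args.length > 0 ? args[0] : "vendor.csv";
        for (String line : readAllLines(path)) {
            System.out.println(line);
        }
    }
}
```

Nothing here says how the bytes in the file should be turned into characters, which is the detail that matters later in the thread.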



When I run the above code I get a bunch of binary-looking output, which I'm not able to paste in here. If I take this file, read it line by line, and write each line to a new csv file, then the above code outputs the contents of the new file correctly.
I'm completely confused and frustrated and I've been working on this for days. Has anyone seen this before?
I would prefer not to create a new copy of the file before parsing it.
Also, I didn't see anywhere that I can attach a file. It would have been helpful to know if someone else can duplicate the problem.
Thanks.
 
Ulf Dittmer
Rancher
Posts: 42972
Welcome to JavaRanch.

Where does the output of the System.out.println go? If it goes to a console of some kind, then that may not be able to display some of the characters in the file, resulting in gibberish. Or is it a pure ASCII file?

I'd definitely recommend using a CSV library instead of rolling your own. There are a number of edge cases that need to be considered with CSV, and before you've got all those coded up, you might as well use a ready-made library.
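To illustrate one such edge case, here is a small sketch (the class name NaiveCsvSplit and the sample record are made up for the example). The record has three fields, but a plain split on commas does not know about quoted fields.

```java
public class NaiveCsvSplit {
    // The "roll your own" approach: split on every comma, ignoring quotes.
    static String[] naiveSplit(String line) {
        return line.split(",");
    }

    public static void main(String[] args) {
        // Three fields: 1 | Smith, John | He said "hi"
        // The second field contains a comma; the third contains doubled quotes.
        String line = "1,\"Smith, John\",\"He said \"\"hi\"\"\"";

        String[] pieces = naiveSplit(line);
        System.out.println(pieces.length); // prints 4, not 3
        for (String piece : pieces) {
            System.out.println("[" + piece + "]");
        }
    }
}
```

The quoted comma is treated as a separator and the quote characters are left in the data. Fields containing line breaks are another case that a line-by-line split cannot handle.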

There is no facility for attaching files to posts. You can include a link to a file, though, if you have some web space where you can upload it.
 
Bill Hayes
Greenhorn
Posts: 24
The output goes to the console. I have used several CSV parsing libraries and was successful. The problem I'm having seems to be pre-parsing. I should be able to simply read each line from the csv file like my example code illustrates. There are no strange characters in my file, so they should all output correctly. If I copy the contents of the bad file and paste them directly into a new file, then my code correctly displays the contents in the console. I'm baffled. I've created a new file from the vendor website several times, but it doesn't make a difference.
 
Adam Schaible
Ranch Hand
Posts: 101
I've had success with the OsterMillerUtils parser. I'm not sure what you're using, but there's not enough information in your post to really help.

Step through the parsing and then you'll be able to see where it's going astray.
 
Joe Ess
Bartender
Posts: 9443
This sounds like a character encoding problem. Unless told otherwise, a Reader such as FileReader uses the platform's default charset to convert the file's bytes into characters. Your editor may be smarter than java.io.Reader and able to detect different charsets and render them properly. You may also be "converting" the charset when you create a new file, since your editor probably saves new files in the default charset (cp1252 on Windows). Can your editor display the character encoding (in jEdit select Utilities->Buffer Options, in Eclipse select File->Properties)?
See "The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)" by Joel Spolsky for more on charsets.
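A small sketch of what that mismatch looks like, under some assumptions: the sample text "id,name" is made up, the bytes are assumed to be 16-bit little-endian, and ISO-8859-1 stands in for a single-byte platform default such as cp1252 because it is always available in Java.

```java
import java.nio.charset.StandardCharsets;

public class WrongCharsetDemo {
    // Counts NUL characters, which show up when 16-bit text is decoded
    // one byte at a time.
    static int countNuls(String s) {
        int n = 0;
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) == '\u0000') {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        // Seven characters stored as two bytes each, low byte first.
        byte[] bytes = "id,name".getBytes(StandardCharsets.UTF_16LE);

        // Decoding with a single-byte charset turns every byte into a char.
        String wrong = new String(bytes, StandardCharsets.ISO_8859_1);
        // Decoding with the charset the bytes were written in recovers the text.
        String right = new String(bytes, StandardCharsets.UTF_16LE);

        System.out.println(bytes.length);     // prints 14
        System.out.println(wrong.length());   // prints 14
        System.out.println(countNuls(wrong)); // prints 7
        System.out.println(right);            // prints id,name
    }
}
```

The NUL characters between the visible letters are the kind of thing a console tends to render as gibberish.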
 
Bill Hayes
Greenhorn
Posts: 24
WOW! Thanks a million for the response. That article was very interesting and quite an eye opener for a novice programmer. And, lo and behold, my problem is encoding. I opened the file in a decent text editor and found that the file is encoded as UCS-2 Little Endian. If I convert it to UTF-8 (using my text editor), my program reads it correctly.
So, now what? How do I handle various encodings in Java when parsing text files? I've never seen this covered in any "teach yourself a programming language" book, or possibly I skimmed right over it.
 
Bill Hayes
Greenhorn
Posts: 24
I think I answered my question. After some searching I'm using the code below which seems to work.
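The listing from this post is also not preserved in this copy of the thread. A minimal sketch of the usual approach, assuming the file really is 16-bit little-endian as the editor reported: wrap a FileInputStream in an InputStreamReader and name the charset explicitly. The class name ReadWithCharset and the path vendor.csv are placeholders, and the byte order mark handling is an assumption about the file, not something confirmed in the thread.

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class ReadWithCharset {
    // Reads every line, decoding the bytes with the given charset instead of
    // the platform default.
    static List<String> readAllLines(String path, Charset charset) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(new FileInputStream(path), charset))) {
            String line;
            boolean first = true;
            while ((line = in.readLine()) != null) {
                // The UTF-16LE decoder does not strip a leading byte order
                // mark; it arrives as U+FEFF at the start of the first line.
                if (first && line.startsWith("\uFEFF")) {
                    line = line.substring(1);
                }
                first = false;
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // "vendor.csv" is a placeholder path, not the real file name.
        String path = args.length > 0 ? args[0] : "vendor.csv";
        for (String line : readAllLines(path, StandardCharsets.UTF_16LE)) {
            System.out.println(line);
        }
    }
}
```

Because the charset is a parameter, a different encoding can be passed in without touching the reading loop.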



My final question is, is there a way to determine the encoding before reading the file then pass the encoding as a variable to the BufferedReader? Sounds like I would want the flexibility in case the encoding changes on my input file.

 
Joe Ess
Bartender
Posts: 9443
Originally posted by Bill Hayes:
My final question is, is there a way to determine the encoding before reading the file then pass the encoding as a variable to the BufferedReader? Sounds like I would want the flexibility in case the encoding changes on my input file.


I am not aware of any Java API method. Since you are downloading the file via HTTP, you should get the encoding in the Content-Type header (see the end of the article I linked to for more on that).
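As a rough sketch of both ideas (all names here are hypothetical): one helper pulls the charset parameter out of a Content-Type header value, and another guesses from a byte order mark at the start of the data. Both are heuristics. A header can be missing or wrong, a file without a byte order mark cannot be identified this way, and UTF-32 marks are not handled.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetGuess {
    // Extracts the charset parameter from a Content-Type value such as
    // "text/csv; charset=UTF-16LE". Returns the fallback if the parameter
    // is absent or names a charset this JVM does not know.
    static Charset fromContentType(String contentType, Charset fallback) {
        if (contentType == null) {
            return fallback;
        }
        for (String part : contentType.split(";")) {
            String p = part.trim();
            if (p.regionMatches(true, 0, "charset=", 0, 8)) {
                String name = p.substring(8).trim().replace("\"", "");
                try {
                    return Charset.forName(name);
                } catch (IllegalArgumentException e) {
                    return fallback; // illegal or unsupported charset name
                }
            }
        }
        return fallback;
    }

    // Guesses the charset from a byte order mark in the first bytes of the
    // data. Returns the fallback if no known mark is present.
    static Charset fromBom(byte[] head, Charset fallback) {
        if (head.length >= 3 && (head[0] & 0xFF) == 0xEF
                && (head[1] & 0xFF) == 0xBB && (head[2] & 0xFF) == 0xBF) {
            return StandardCharsets.UTF_8;
        }
        if (head.length >= 2 && (head[0] & 0xFF) == 0xFF && (head[1] & 0xFF) == 0xFE) {
            return StandardCharsets.UTF_16LE;
        }
        if (head.length >= 2 && (head[0] & 0xFF) == 0xFE && (head[1] & 0xFF) == 0xFF) {
            return StandardCharsets.UTF_16BE;
        }
        return fallback;
    }

    public static void main(String[] args) {
        Charset fallback = StandardCharsets.UTF_8;
        System.out.println(fromContentType("text/csv; charset=UTF-16LE", fallback));
        System.out.println(fromBom(new byte[] {(byte) 0xFF, (byte) 0xFE, 'i', 0}, fallback));
    }
}
```

When the download is done from Java, the header value is available from URLConnection.getContentType(). The resulting Charset can then be passed to an InputStreamReader.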
 
Bill Hayes
Greenhorn
Posts: 24
Thank you to everyone for the replies. It's nice to feel welcome on a forum and get the help you need. Most other java forums are not as kind to novice programmers.
 