Suitable CSV Library for Java

 
Greenhorn
Posts: 7
Eclipse IDE Java Windows
Hi Guys,
I'm badly in need of a good open-source CSV library for Java. After some research, I found a few, such as Commons CSV, OpenCSV and SuperCSV. I have to make my selection based on a few criteria, and so far I have found information on some of them but not all, so I hope someone here can help me with the rest.

Problem:
The CSV file consists of student records, where one student can have more than one record, and a particular student's records are always together in the file. Example:

Id, name, grade, subject, marks
S1, abc, 5th, English, 88
S1, abc, 5th, Maths, 80
S1, abc, 5th, History, 85
S1, abc, 5th, English, 82
S2, xyz, 5th, English, 78
S2, xyz, 5th, Maths, 80
S3, pqr, 6th, Maths, 89

Some unanswered questions are:

1. Which library has built-in validation for detecting formatting errors in the CSV file?
2. Which library supports multi-threading? I want to process records belonging to different students in different threads.
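Because a student's records are always contiguous, they can be grouped in one streaming pass that holds only the current student's rows in memory. A minimal sketch, assuming the chosen library hands back each record as a `String[]` with the Id in the first column; `StudentGroupIterator` is a hypothetical name, not part of any of these libraries:

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

/**
 * Hypothetical helper: wraps an iterator of parsed CSV records and yields one
 * group per student, relying on a student's rows being contiguous.
 * Only the current group plus one look-ahead record is held in memory.
 */
class StudentGroupIterator implements Iterator<List<String[]>> {
    private final Iterator<String[]> records;
    private String[] pending; // first record of the next group, already read

    StudentGroupIterator(Iterator<String[]> records) {
        this.records = records;
        this.pending = records.hasNext() ? records.next() : null;
    }

    @Override
    public boolean hasNext() {
        return pending != null;
    }

    @Override
    public List<String[]> next() {
        if (pending == null) {
            throw new NoSuchElementException();
        }
        List<String[]> group = new ArrayList<>();
        String id = pending[0].trim(); // column 0 holds the student Id
        group.add(pending);
        pending = null;
        while (records.hasNext()) {
            String[] record = records.next();
            if (record[0].trim().equals(id)) {
                group.add(record);
            } else {
                pending = record; // belongs to the next student
                break;
            }
        }
        return group;
    }
}
```

The header row has to be consumed before wrapping the library's record iterator, and each returned group is a natural unit of work to hand to a separate thread.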

 
author & internet detective
Posts: 42056
926
Eclipse IDE VI Editor Java
Ankit,
Welcome to CodeRanch!

How many students are in the file? Unless it is an extremely large number, your best bet might be to load the file into a ConcurrentMap (using one of those libraries) and do your parallel processing in Java.
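A rough sketch of that suggestion, assuming the rows have already been parsed into `String[]` records by one of the libraries. `StudentIndex` and the averaging step are illustrative only; the parallel part uses a standard parallel stream rather than hand-made threads:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.stream.Collectors;

class StudentIndex {
    // Loads all rows into a ConcurrentMap keyed by student Id (column 0).
    // Loading is single-threaded here; the lists are only read afterwards.
    static ConcurrentMap<String, List<String[]>> load(Iterable<String[]> rows) {
        ConcurrentMap<String, List<String[]>> byStudent = new ConcurrentHashMap<>();
        for (String[] row : rows) {
            byStudent.computeIfAbsent(row[0].trim(), id -> new ArrayList<>()).add(row);
        }
        return byStudent;
    }

    // Processes each student in parallel on the common fork-join pool;
    // the illustrative "work" is averaging the marks column (index 4).
    static ConcurrentMap<String, Double> averageMarks(Map<String, List<String[]>> byStudent) {
        return byStudent.entrySet().parallelStream()
                .collect(Collectors.toConcurrentMap(
                        Map.Entry::getKey,
                        e -> e.getValue().stream()
                                .mapToInt(row -> Integer.parseInt(row[4].trim()))
                                .average()
                                .orElse(0.0)));
    }
}
```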
 
Ranch Hand
Posts: 99
 
 
Ankit Gohil
Greenhorn
Posts: 7
Eclipse IDE Java Windows

Jeanne Boyarsky wrote:Ankit,
Welcome to CodeRanch!

How many students are in the file? Unless it is an extremely large number, your best bet might be to load the file into a ConcurrentMap (using one of those libraries) and doing your parallel processing in Java.



Thanks Jeanne. The file could be several gigabytes, so I don't want to risk an OutOfMemoryError by loading the whole file into memory at once. Also, last night's research told me that none of the libraries support multi-threading, so I'll have to handle it manually, which is not an issue.

My main concern now is: how do these libraries actually read the file?
I have learnt that Commons CSV reads the complete file at once and stores it in memory, while the others (OpenCSV, SuperCSV) read line by line.
Do they actually read one line at a time from the file and store it in some buffer? How is it actually processed? Can you give me in-depth information on this?
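As a general illustration (not the code of any particular library), streaming readers wrap the input in a buffered `Reader`: characters are pulled from the file in blocks into a fixed-size buffer, and only the current line or record is built up from it, so memory use does not grow with the file size. A minimal sketch with `java.io.BufferedReader`, assuming one record per line (real CSV parsers cannot assume this, because a quoted field may contain a line break):

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.function.Consumer;

class LineStreamer {
    // Hands each line to the handler as soon as it is read. BufferedReader
    // refills its internal char buffer (8192 chars by default) from the
    // underlying Reader as needed, so the whole file is never in memory.
    static long forEachLine(Reader source, Consumer<String> handler) throws IOException {
        long count = 0;
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            while ((line = in.readLine()) != null) {
                handler.accept(line); // only this line is referenced here
                count++;
            }
        }
        return count;
    }
}
```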
 
Rancher
Posts: 43081
77

Don't need a framework, just read the file line by line and split each line. For validation, check the values in the split array.


No, don't do that. There's a reason these libraries exist: CSV is not as simple as it first appears. Fields may be quoted, and a quoted field can contain commas, doubled quotes, or even line breaks. By the time you've coded all the special cases (which you need to, in case somebody uses them), you might as well have used a library.
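A small illustration of the point, assuming RFC 4180-style quoting (a field containing a comma is wrapped in double quotes, and a literal quote inside it is doubled). Splitting on commas breaks such a field in two; the hypothetical `CsvLine.parseLine` below handles quotes within a single line, yet it still ignores quoted line breaks, which is exactly the kind of extra case a library already covers:

```java
import java.util.ArrayList;
import java.util.List;

class CsvLine {
    // Simplified quote-aware split of a single line.
    // Throws IllegalArgumentException on an unterminated quoted field.
    static List<String> parseLine(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder field = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
                        field.append('"'); // doubled quote "" is a literal quote
                        i++;
                    } else {
                        inQuotes = false; // closing quote
                    }
                } else {
                    field.append(c); // commas inside quotes are data
                }
            } else if (c == '"') {
                inQuotes = true; // opening quote
            } else if (c == ',') {
                fields.add(field.toString()); // end of field
                field.setLength(0);
            } else {
                field.append(c);
            }
        }
        if (inQuotes) {
            throw new IllegalArgumentException("Unterminated quoted field: " + line);
        }
        fields.add(field.toString());
        return fields;
    }
}
```

For example, `S4,"Smith, John",5th,English,90` gives six pieces with `split(",")` but five fields with `parseLine`.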
 
Ulf Dittmer
Rancher
Posts: 43081
77

Ankit Gohil wrote:Do they actually read a line at a time from the file and store it in some buffer area. How is it actually processed. Can you give me in-depth info on this ?


The libraries are all open source, and not big in size - might as well just look at the source if you're interested. I have always just used them, and never worried about how they work under the hood. I'd be surprised if any of them read the entire file into memory instead of line by line, as it would be an obvious and needless inefficiency.
 
Ankit Gohil
Greenhorn
Posts: 7
Eclipse IDE Java Windows

Ulf Dittmer wrote:

Ankit Gohil wrote:Do they actually read a line at a time from the file and store it in some buffer area. How is it actually processed. Can you give me in-depth info on this ?


The libraries are all open source, and not big in size - might as well just look at the source if you're interested. I have always just used them, and never worried about how they work under the hood. I'd be surprised if any of them read the entire file into memory instead of line by line, as it would be an obvious and needless inefficiency.



Also, if you have any information on the following, please let me know.

  • Do any of these libraries provide built-in format validation? If so, is it customizable?
  • Is multi-threading in any form supported by any of the libraries? For my code I'm aiming for a thread pool. Example: say my CSV consists of 100 records (one record per line). I want to create thread T1 to read records 1-10, thread T2 to read records 11-20, T3 to read 21-30 and so on. If I have to do this manually, how?
    Ulf Dittmer
    Rancher
    Posts: 43081
    77
    No idea. I would imagine that the documentation of the libraries talks about that. I am almost certain that none supports your second point, simply because only one process can open a file at any given time. But concurrent I/O should not be necessary - just read the CSV into memory, and then create multiple threads that work with the in-memory representation.
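Ankit's "T1 takes records 1-10, T2 takes 11-20" plan, applied to records already read into memory as suggested here, could be sketched with an `ExecutorService`. The chunk size, pool size and per-chunk work (summing the marks column) are assumptions for illustration. Fixed-size chunks can split one student's records across two threads, so grouping by student first may suit this file better:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ChunkedProcessor {
    // Splits the in-memory records into fixed-size chunks and processes each
    // chunk as one task on a fixed thread pool; returns the combined result.
    static long sumMarks(List<String[]> records, int chunkSize, int threads)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Long>> results = new ArrayList<>();
            for (int start = 0; start < records.size(); start += chunkSize) {
                // subList is a read-only-here view, so no records are copied
                List<String[]> chunk = records.subList(start, Math.min(start + chunkSize, records.size()));
                results.add(pool.submit(() -> {
                    long sum = 0;
                    for (String[] r : chunk) {
                        sum += Long.parseLong(r[4].trim()); // marks column
                    }
                    return sum;
                }));
            }
            long total = 0;
            for (Future<Long> f : results) {
                total += f.get(); // waits for each chunk to finish
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }
}
```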
     
    Bartender
    Posts: 3648
    16
    Android Mac OS X Firefox Browser Java
    Hello Ankit

    I have used OpenCSV a few times. Regarding whether the library reads line by line or all at once, from my observation it's line by line. Take OpenCSV as an example. Its CSVReader, which wraps a Reader, has two read methods:

    public String[] readNext();       // returns the next record, or null at end of input
    public List<String[]> readAll();  // returns all remaining records

    You may have guessed that the method returning List<String[]> reads the entire file into memory. This option has its good and bad points. The good point is that you can easily go to the footer at the end of the file instead of reading line by line until the end. The bad point is that large files can cause an OutOfMemoryError.
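If the main reason for reading everything is to reach the footer, a streaming pass can keep just the most recent line instead. A minimal sketch, assuming the footer is the last non-blank line; `FooterReader` is a hypothetical name:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;

class FooterReader {
    // Streams through the input, remembering only the latest non-blank line,
    // so memory use stays constant regardless of file size.
    static String lastNonBlankLine(Reader source) throws IOException {
        String last = null;
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            while ((line = in.readLine()) != null) {
                if (!line.isBlank()) {
                    last = line;
                }
            }
        }
        return last; // null if the input had no non-blank lines
    }
}
```

This still reads the whole file, but memory use stays constant; seeking backwards from the end (for example with `RandomAccessFile`) would avoid the full read at the cost of more code and of handling character encodings yourself.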

    As for threading, I'm not aware of any such library that has what you want. Normally, if there are many files to process, you use one file per thread. What you do within the thread (e.g. spawning sub-threads) is up to you.
     
    Ankit Gohil
    Greenhorn
    Posts: 7
    Eclipse IDE Java Windows
    • Mark post as helpful
    • send pies
      Number of slices to send:
      Optional 'thank-you' note:
    • Quote
    • Report post to moderator

    K. Tsang wrote:Hello Ankit

    I have used OpenCSV a few times. Regarding whether the library reads line by line or all at once, from my observation it's line by line. Take OpenCSV as an example. Its CSVReader, which wraps a Reader, has two read methods:

    public String[] readNext();       // returns the next record, or null at end of input
    public List<String[]> readAll();  // returns all remaining records

    You may have guessed the method returning List<String[]> reads the entire file into memory. This option has its good and bad points. Good point it you can easily go to the footer at the end of the file instead of reading line by line until the end. The bad thing is large files will cause OutOfMemoryError.

    The threading issue I'm not aware such libraries has what you want. Normally if there are many files to process, one file per thread. What you do in the thread (eg to spawn new sub-threads) is up to you.



    Thanks for your reply. I would also like to know if there is any method in OpenCSV or any other library that can check whether a file is genuine (i.e. actually CSV), either by checking the file extension or its actual contents.
     
    Ulf Dittmer
    Rancher
    Posts: 43081
    77
    I would imagine that the code throws an exception, or some error code, if the file is not actually properly formatted CSV.
     
    K. Tsang
    Bartender
    Posts: 3648
    16
    Android Mac OS X Firefox Browser Java
    Nope, at least not for OpenCSV.

    Checking the file extension is only an indicator. The actual content (whether it can be split by some delimiter) is what you are looking for.
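One possible content-based check along these lines: sample the first few lines, reject any containing NUL characters (a common sign of binary data), and require every sampled line to have the same number of comma-separated fields. This is a heuristic sketch, not a feature of any of the libraries; the comma delimiter, the sample size and the class name `CsvSniffer` are assumptions:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;

class CsvSniffer {
    // Heuristic: true if the first maxLines non-empty lines contain no NUL
    // characters and all have the same number of fields.
    // The caller remains responsible for closing the source.
    static boolean looksLikeCsv(Reader source, int maxLines) throws IOException {
        BufferedReader in = new BufferedReader(source);
        int expected = -1;
        int checked = 0;
        String line;
        while (checked < maxLines && (line = in.readLine()) != null) {
            if (line.isEmpty()) {
                continue;
            }
            if (line.indexOf('\0') >= 0) {
                return false; // likely binary data
            }
            int fields = countFields(line);
            if (expected == -1) {
                expected = fields; // first line sets the expected width
            } else if (fields != expected) {
                return false; // ragged rows
            }
            checked++;
        }
        return checked > 0;
    }

    // Counts commas outside double quotes; a doubled "" toggles twice and cancels out.
    static int countFields(String line) {
        int fields = 1;
        boolean inQuotes = false;
        for (char c : line.toCharArray()) {
            if (c == '"') {
                inQuotes = !inQuotes;
            } else if (c == ',' && !inQuotes) {
                fields++;
            }
        }
        return fields;
    }
}
```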

     
    Ankit Gohil
    Greenhorn
    Posts: 7
    Eclipse IDE Java Windows
    • Mark post as helpful
    • send pies
      Number of slices to send:
      Optional 'thank-you' note:
    • Quote
    • Report post to moderator
    @Ulf, @Tsang: So this means I need to write my own code to check it. Hopefully some day these libraries will include this functionality, or maybe I can contribute it.
    Anyway, thanks guys for your support!
     
    Ranch Hand
    Posts: 679

    Ankit Gohil wrote:So this mean I need to write my own code for checking it. Hope some day these libraries include this functionality or maybe I can provide them with one


    I think Ulf's reply indicated the exact opposite of that:

    Ulf Dittmer wrote:I would imagine that the code throws an exception, or some error code, if the file is not actually properly formatted CSV.

     
    Ranch Hand
    Posts: 10198
    3
    Mac PPC Eclipse IDE Ubuntu
    I would go with the option of writing my own CSV parser. Maybe give camel-csv / camel-bindy a try!
     
    Ankit Gohil
    Greenhorn
    Posts: 7
    Eclipse IDE Java Windows

    Stuart A. Burkett wrote:

    Ankit Gohil wrote:So this mean I need to write my own code for checking it. Hope some day these libraries include this functionality or maybe I can provide them with one


    I think Ulf's reply indicated the exact opposite of that

    Ulf Dittmer wrote:I would imagine that the code throws an exception, or some error code, if the file is not actually properly formatted CSV.



    @Stuart: My bad; Ulf stated the complete opposite, and I only understood that a bit later. I have also tried and tested it using SuperCSV: the code throws an exception if the file contents are not in CSV format, because it cannot convert the record into a bean.
    Thanks, man!
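The behaviour described, where a record that cannot be converted into a bean raises an exception, can be mimicked in plain Java. A minimal sketch, assuming a hypothetical `Student` record (Java 16+) matching the sample columns; it is not Super CSV's API, which does this conversion with its own bean reader and cell processors:

```java
import java.util.List;

// Hypothetical bean for one row: Id, name, grade, subject, marks
record Student(String id, String name, String grade, String subject, int marks) {

    // Converts one parsed record, failing with the line number on bad input.
    static Student fromRecord(List<String> fields, long lineNumber) {
        if (fields.size() != 5) {
            throw new IllegalArgumentException(
                    "Line " + lineNumber + ": expected 5 fields but found " + fields.size());
        }
        try {
            return new Student(
                    fields.get(0).trim(),
                    fields.get(1).trim(),
                    fields.get(2).trim(),
                    fields.get(3).trim(),
                    Integer.parseInt(fields.get(4).trim()));
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException(
                    "Line " + lineNumber + ": marks is not an integer: " + fields.get(4), e);
        }
    }
}
```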
     
    Ranch Hand
    Posts: 334
    2
    Netbeans IDE Tomcat Server Java
    I've used openCSV for a few projects and find it efficient and easy to use. I'll add a few random observations:

    1) I'm not sure what errors a general CSV parser could possibly detect. I suppose you could pass it a binary file, but every text file is technically a CSV file, even if it has no commas.
    The checks I use are a) the number of fields is in a valid range and b) numbers are where they are expected and in the allowed range.

    2) I don't really see any efficiency gain in opening the same file multiple times in different threads, although most OSs will allow a file to be opened multiple times for reading. I can see opening separate files in separate threads, or having the file-reading thread pass individual records to a thread pool for processing.

    3) Depending on exactly what you are going to do with this data, it might be worth importing the CSV into a relational database table and doing your processing with SQL queries. If the data is fairly stable and you plan to access it multiple times before re-importing it, the benefit would be more obvious.
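The second option (one thread reads, a pool processes) fits Ankit's file well: because a student's records are contiguous, the reading thread can collect one student's rows and submit them as a single task, so no student is split across threads and the file is never fully in memory. A sketch under those assumptions; `line.split` is a placeholder for whichever CSV library is chosen, and `StreamingDispatcher` and its per-student work are illustrative:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class StreamingDispatcher {
    // Reads on the calling thread and submits one task per student group.
    static Map<String, Integer> totalMarks(Reader source, int threads)
            throws IOException, InterruptedException {
        Map<String, Integer> totals = new ConcurrentHashMap<>();
        // Bounded queue + CallerRunsPolicy: if workers fall behind, the reader
        // thread runs the task itself, which throttles reading (backpressure).
        ThreadPoolExecutor pool = new ThreadPoolExecutor(threads, threads, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(threads * 2), new ThreadPoolExecutor.CallerRunsPolicy());
        try (BufferedReader in = new BufferedReader(source)) {
            in.readLine(); // skip the header row
            List<String[]> group = new ArrayList<>();
            String line;
            while ((line = in.readLine()) != null) {
                if (line.isBlank()) {
                    continue;
                }
                String[] row = line.split(",", -1); // placeholder for a real CSV parser
                boolean newStudent = !group.isEmpty() && !group.get(0)[0].trim().equals(row[0].trim());
                if (newStudent) {
                    submit(pool, group, totals); // previous student is complete
                    group = new ArrayList<>();
                }
                group.add(row);
            }
            if (!group.isEmpty()) {
                submit(pool, group, totals); // last student
            }
        } finally {
            pool.shutdown();
        }
        if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
            throw new IllegalStateException("workers did not finish in time");
        }
        return totals;
    }

    // Illustrative per-student work: total of the marks column (index 4).
    private static void submit(ThreadPoolExecutor pool, List<String[]> group, Map<String, Integer> totals) {
        pool.execute(() -> {
            int sum = 0;
            for (String[] r : group) {
                sum += Integer.parseInt(r[4].trim());
            }
            totals.put(group.get(0)[0].trim(), sum);
        });
    }
}
```

The bounded queue with `CallerRunsPolicy` matters here: with an unbounded queue, a fast reader could queue up most of a multi-gigabyte file before the workers catch up.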

    Joe
     
    Ankit Gohil
    Greenhorn
    Posts: 7
    Eclipse IDE Java Windows
    • Mark post as helpful
    • send pies
      Number of slices to send:
      Optional 'thank-you' note:
    • Quote
    • Report post to moderator
    @Joe: I'm currently evaluating SuperCSV for the different functionality I require. It also has built-in validation for number ranges, not-null checks and string matching (using regex). I know opening the same file in multiple threads isn't a good idea, so I have already dropped that.
    A database is a constraint: I can't use one in any form.
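For comparison, the kinds of constraints listed here (not null, number range, regex match), together with Joe's field-count check, take only a few lines of plain Java. A sketch with hypothetical rules for the sample file (Id matching `S` followed by digits, marks an integer from 0 to 100); Super CSV expresses such rules through its own cell processors instead:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

class RecordValidator {
    private static final Pattern STUDENT_ID = Pattern.compile("S\\d+"); // assumed Id format
    private static final int EXPECTED_FIELDS = 5;

    // Returns a list of problems; an empty list means the record passed.
    static List<String> validate(String[] fields) {
        List<String> errors = new ArrayList<>();
        if (fields.length != EXPECTED_FIELDS) { // field-count check
            errors.add("expected " + EXPECTED_FIELDS + " fields, found " + fields.length);
            return errors;
        }
        for (int i = 0; i < fields.length; i++) { // not-null / not-blank check
            if (fields[i] == null || fields[i].isBlank()) {
                errors.add("field " + i + " is empty");
            }
        }
        if (!errors.isEmpty()) {
            return errors; // column checks below assume every field is present
        }
        if (!STUDENT_ID.matcher(fields[0].trim()).matches()) { // regex check
            errors.add("bad Id: " + fields[0]);
        }
        try { // number-range check
            int marks = Integer.parseInt(fields[4].trim());
            if (marks < 0 || marks > 100) {
                errors.add("marks out of range: " + marks);
            }
        } catch (NumberFormatException e) {
            errors.add("marks is not an integer: " + fields[4]);
        }
        return errors;
    }
}
```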

    Once I'm done evaluating SuperCSV, I'll evaluate OpenCSV on the same set of criteria and post the results here.
    Thanks, buddy!
     