• Post Reply Bookmark Topic Watch Topic
  • New Topic
programming forums Java Mobile Certification Databases Caching Books Engineering Micro Controllers OS Languages Paradigms IDEs Build Tools Frameworks Application Servers Open Source This Site Careers Other Pie Elite all forums
this forum made possible by our volunteer staff, including ...
Marshals:
  • Campbell Ritchie
  • Jeanne Boyarsky
  • Ron McLeod
  • Paul Clapham
  • Liutauras Vilda
Sheriffs:
  • paul wheaton
  • Rob Spoor
  • Devaka Cooray
Saloon Keepers:
  • Stephan van Hulst
  • Tim Holloway
  • Carey Brown
  • Frits Walraven
  • Tim Moores
Bartenders:
  • Mikalai Zaikin

Groovy : Process CSV to SQL

 
Greenhorn
Posts: 12
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
Hello there,
I need to pass data from a CSV file to a database, so hence I thought of CSV->SQL.
In my CSV file, it has about 70 fields per row of data. And there are other issues.
I found someone else online having the same problem with similar data issues of their CSV file.
I will put the other persons example, but illustrates the problem that I have very well:

*******************************************************************************
http://stackoverflow.com/questions/782353/sql-server-bulk-insert-of-csv-file-with-inconsistent-quotes

Is it possible to BULK INSERT (SQL Server) a CSV file in which the fields are only OCCASSIONALLY surrounded by quotes? Specifically, quotes only surround those fields that contain a ",".

In other words, I have data that looks like this (the first row contain headers):

id, company, rep, employees
729216,INGRAM MICRO INC.,"Stuart, Becky",523
729235,"GREAT PLAINS ENERGY, INC.","Nelson, Beena",114
721177,GEORGE WESTON BAKERIES INC,"Hogan, Meg",253

Because the quotes aren't consistent, I can't use '","' as a delimiter, and I don't know how to create a format file that accounts for this.

I tried using ',' as a delimter and loading it into a temporary table where every column is a varchar, then using some kludgy processing to strip out the quotes, but that doesn't work either, because the fields that contain ',' are split into multiple columns.

*******************************************************************************


This is the type of problem I am having with fields such as this:
"Stuart, Becky".

I have some solutions, I could write a Groovy script to drop all quotes and then count the
field place of where Stuart and Becky are and concatenate them.

But then I run into another problem of how to construct the SQL string.
For example I would plan to write an SQL INSERT statement in a Multiline string using the operator """,
and do variable interpolation with the fields.
But I consider my approach kind of long.

For example I don't want to write:

INSERT INTO VALUES( $fields{1}, $fields{2), $fields{3},$fields{4},$fields{5},.......,$fields{70});

Is there a way I can maybe write a for-loop to create this INSERT statement.
Because I don't want to write /hardcode 70 fields into my code.

SO that is my approach.
If somebody knows a totally different easier approach, please let me know.

Sincerely,

Paul













 
Paul Ristevian
Greenhorn
Posts: 12
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
OK,

It seems my original question may be a difficult one.
Let me rephrase my question into a smaller chunk.

OK, I hope that someone can help me with this one.

As a first step, I need to be able to parse a CSV file.
And this parsing needs to take into account embedded comma in a text qualifier, specifically the double quotes.

So for example the line:
729235,"GREAT PLAINS ENERGY, INC.","Nelson, Beena",114

Should be parsed into fields/elements:

(1) 729235
(2) GREAT PLAINS ENERGY, INC.
(3) Nelson, Beena
(4) 114


SO has anyone created a CSV parser in Groovy?

Anybody?


Sincerely,

Paul








 
Ranch Hand
Posts: 15304
6
Mac OS X IntelliJ IDE Chrome
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
One of the nice things about Groovy is that it is java and you can use java code directly in your groovy code. Instead of trying to figure out a way in groovy, why not just use what someone already wrote in Java?

http://opencsv.sourceforge.net/
 
Paul Ristevian
Greenhorn
Posts: 12
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator

Thanks for that.
I looked at the different Java parsers, but was hoping something like that was done
in Groovy, because the code in Groovy is usually shorter and more friendly to read in general.

P
 
Gregg Bolinger
Ranch Hand
Posts: 15304
6
Mac OS X IntelliJ IDE Chrome
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator

Paul Ristevian wrote:
Thanks for that.
I looked at the different Java parsers, but was hoping something like that was done
in Groovy, because the code in Groovy is usually shorter and more friendly to read in general.

P



Yea, but when using an API, how cares?
 
Paul Ristevian
Greenhorn
Posts: 12
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
OK,
But when I looked at OpenCSV, I could not figure out how it would handle
embedded comma in the double quote text qualifier.
Does it do this automatically?

Here is some code below:
***********************************************************************
How do I read and parse a CSV file?
If you want to use an Iterator style pattern, you might do something like this:

CSVReader reader = new CSVReader(new FileReader("yourfile.csv"));
String [] nextLine;
while ((nextLine = reader.readNext()) != null) {
// nextLine[] is an array of values from the line
System.out.println(nextLine[0] + nextLine[1] + "etc...");
}


OR

Can I use my own separators and quote characters?
Yes. There are constructors that cater for supplying your own separator and quote characters. Say you're using a tab for your separator, you can do something like this:

CSVReader reader = new CSVReader(new FileReader("yourfile.csv"), '\t');


And if you single quoted your escaped characters rather than double quote them, you can use the three arg constructor:

CSVReader reader = new CSVReader(new FileReader("yourfile.csv"), '\t', '\'');


You may also skip the first few lines of the file if you know that the content doesn't start till later in the file. So, for example, you can skip the first two lines by doing:

CSVReader reader = new CSVReader(new FileReader("yourfile.csv"), '\t', '\'', 2);


**************************************************************************
It talks about escaping characters and, skipping lines, but I don't see where it
handles my particular case.


If you can tell me how this handles the parsing case I mention please let me know.

Paul










 
Gregg Bolinger
Ranch Hand
Posts: 15304
6
Mac OS X IntelliJ IDE Chrome
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
Have you tried it to see what happens when you feed it your file?
 
Paul Ristevian
Greenhorn
Posts: 12
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator

OK,

You know what I just found a description that describes what it can do on the opencsv site:
Somehow I missed this when I was reading thru it in the past.

*********************************************************************
What features does opencsv support?
opencsv supports all the basic csv-type things you're likely to want to do:

* Arbitrary numbers of values per line
* Ignoring commas in quoted elements
* Handling quoted entries with embedded carriage returns (ie entries that span multiple lines)
* Configurable separator and quote characters (or use sensible defaults)
* Read all the entries at once, or use an Iterator style model
* Creating csv files from String[] (ie. automatic escaping of embedded quote chars)
*********************************************************************

You see I like to understand things first before attempting any kind of trial and error approach. That is why I ask questions on forum sites.


Paul

 
Gregg Bolinger
Ranch Hand
Posts: 15304
6
Mac OS X IntelliJ IDE Chrome
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator

Paul Ristevian wrote:

You see I like to understand things first before attempting any kind of trial and error approach. That is why I ask questions on forum sites.


Paul



Understanding is great. But we expect a bit of effort on your part. You try something, it doesn't work, you ask us for help, and we point you in the right direction.
 
Marshal
Posts: 28193
95
Eclipse IDE Firefox Browser MySQL Database
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator

Paul Ristevian wrote:You see I like to understand things first before attempting any kind of trial and error approach. That is why I ask questions on forum sites.


But there's a limit beyond which that isn't practical. Sometimes it's more useful to just try things and see what happens. In fact, quite often that's the best approach.
 
Greenhorn
Posts: 2
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
Hi folks,

So I m in a similar situation as this and I need some guidance in moving forward. We need a dynamic parser that will be working with a series of CVS files.. each having different fields and values.. however the format would be something like below..

"NAME","ADDRESS","CITY","STATE","POSTAL_CODE","CNTRY_CODE"
"NY-Correctional Facility","Exchange St","ATTICA","NY","14011","US
"
"Consolidated School Corp","205,East Street","Attica","IN","47918","US
"

Looking at OpenCSV it seems it can handle most of the data issues we have seen in CSV files, like comma inside values, carriage returns etc. However one of our requirements is that after parsing the CSV files I need the tool to return it as a list with the format attribute:value where attribute would be the corresponding field from the header line.
ie something like this..

NAME:NY-Correctional Facility, ADDRESS:Exchange St, CITY:ATTICA, STATE:NY, POSTAL_CODE:14011,CNTRY_CODE:US

Is there any way I can get OpenCSV to do this?

I m kinda new to all this, so if the solution is simple, do excuse me

Thanks in advance



 
With a little knowledge, a cast iron skillet is non-stick and lasts a lifetime.
reply
    Bookmark Topic Watch Topic
  • New Topic