• Post Reply Bookmark Topic Watch Topic
  • New Topic

Importing csv, reading data, and have a count using queries  RSS feed

 
Hank Harrison
Greenhorn
Posts: 20
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Ok so I know how to import a csv using java. What I'm curious about doing is importing a csv using file chooser, reading the data, sorting the data out by certain parameters, and then outputting a count of each parameter I chose. Say for example I have columns 1,2 and 3. Column 1 has the name, column 2 has the percentage, column 3 has an o'clock time. I want to use queries to sort through the parameters and have a count of each parameter I choose.

my question is what is the best way to go about this! Any suggestions? Thanks in advance.
 
Winston Gutkowski
Bartender
Posts: 10575
66
Eclipse IDE Hibernate Ubuntu
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Hank Harrison wrote:Ok so I know how to import a csv using java.

You do? Then you're one up on me, because I've never used the CSV API. All I know is that they can be very tricky, particularly if the data involved contains quotes and/or embedded commas.

What I'm curious about doing is importing a csv using file chooser, reading the data, sorting the data out by certain parameters, and then outputting a count of each parameter I chose. Say for example I have columns 1,2 and 3. Column 1 has the name, column 2 has the percentage, column 3 has an o'clock time. I want to use queries to sort through the parameters and have a count of each parameter I choose.

I think you'll have to explain what you mean by a "count" because, as I see it, it could just be the number of lines in the CSV file (minus a header line, if you have one). I'm also not quite sure where the FileChooser part comes in, but my advice would be to separate that completely from the rest of your problem, which would appear to be about "parsing" columns in some meaningful way.

One possibility might indeed be to supply a header line that contains the "column names" and then use those "names" to translate to a column index.

HIH

Winston
 
Hank Harrison
Greenhorn
Posts: 20
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
By count lets say for example column one is an item, column 2 is a number in percentage format, and column 3, is an orientation in o'clock format. I want to create I guess a query that counts the items with certain percentages and specific orientation range and have the java program spit out a number telling me how many it found with those requirements.

I've been reading about jpql and thought maybe that would come in handy for that but haven't really found some good info on it. I would just use SQL if that we're an option but unfortunately at my job it isn't.
 
Piet Souris
Master Rancher
Posts: 2044
75
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
hi Hank,

well nothing is lost without SQL, although these are built for such occasions.

What I would do, is:

create an extra column for each combination of columns you're interested in,
the value of such a column would simple be the concatenation of the values
of the corresponding columns. With these extra columns, it is a very easy matter
to create Maps.

But then must you do this in Java? Even a speadsheet would seem more appropriate.

Greetz,
Piet
 
Hank Harrison
Greenhorn
Posts: 20
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Yeah the way we do it at work is in excel by just filtering and manually counting them. But there is so much room for error as there can be quite a bit of data and we've recently hired a lot of new faces. I figure the fastest way to do these without errors would be to just create a program. I'm only familiar with java but I'm curious if this would be easier done in another language?
 
Piet Souris
Master Rancher
Posts: 2044
75
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Well, it is not difficult to do a count on whatever column you have in mind.
But maybe we can get specific? Come up with a file that has some characteritics
of what you encounter in practise, and formulate some questions. Then we can
figure out what would be wise, i.e. can it be done in Java, or is there some other way
that would work out handier?

Greetz,
Piet
 
Hank Harrison
Greenhorn
Posts: 20
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Ok. Thanks guys.
 
Winston Gutkowski
Bartender
Posts: 10575
66
Eclipse IDE Hibernate Ubuntu
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Hank Harrison wrote:Yeah the way we do it at work is in excel by just filtering and manually counting them. But there is so much room for error as there can be quite a bit of data and we've recently hired a lot of new faces. I figure the fastest way to do these without errors would be to just create a program.

Yes, but in order to do that you need to have some idea of what result you want (Excel provides STATIC formulae).
I worked on inventory systems for 10 years and, believe me, there are a myriad different things you can get just from "quantity on hand" and "quantity on order". It all depends on context.

And it's that that you haven't explained.

As is usual, if the question is simply "can it be done?", then the answer is almost certainly "yes". Java is a very flexible language. Your question would seem to be "how do I do this?", but you haven't supplied us with enough information to make a judgement.

I'm only familiar with java but I'm curious if this would be easier done in another language?

Possibly, but not necessarily. See above.

About the only thing I'm pretty sure of is that there aren't too many other languages out there that can even read Excel files - but I could be wrong.

HIH

Winston
 
Piet Souris
Master Rancher
Posts: 2044
75
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Winston,

Hank is talking about csv files. Now, nearly every program in the world can read these.
Excel is handy for these things, since it has a very fine selection mechanism, that
let you do all sorts of things. But the problem of having to read Excel files is
fortunately not relevant.

Hank will come up with a more or less practise case, together with some questions
he likes to see answered. I guess your expert knowledge of these things
will come in very handy. So, stay tuned...

Greetz,
Piet
 
Winston Gutkowski
Bartender
Posts: 10575
66
Eclipse IDE Hibernate Ubuntu
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Piet Souris wrote:Hank is talking about csv files. Now, nearly every program in the world can read these...

Quite right. I forgot the OP. Although, I'll say again: CSV files are not as easy as you might think - which is why there are libraries around for them too.

@Hank: Please TellTheDetails (←click). It sounds highly likely that you can do what you want; we just need a bit more information...

Winston
 
Piet Souris
Master Rancher
Posts: 2044
75
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Winston Gutkowski wrote:(...) Although, I'll say again: CSV files are not as easy as you might think - which is why there are libraries around for them too.
(...)Winston

We're digressing, but this is not my experience. If I have to read in some csv file (and I have to
do that a lot at work) I simply read in the first few lines and print them out, Then it is
very easy to see what you are dealing with. And in the very big majority of cases there is no
difficulty whatso-ever. The only problems I encounter is that I sometimes have to output such a file in dbf
format, and not only is the number of programs limited that can do that, but you also get to
deal with field definitions.
 
Winston Gutkowski
Bartender
Posts: 10575
66
Eclipse IDE Hibernate Ubuntu
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Piet Souris wrote:We're digressing, but this is not my experience. If I have to read in some csv file (and I have to
do that a lot at work) I simply read in the first few lines and print them out, Then it is
very easy to see what you are dealing with.

Then I hate to say, but you've never worked in inventory or BOM, which uses CSV files all the time that can fool you into thinking that they're simple, when in fact they're not.

A part description (in fact. most of them) can be simply:
3/16 screw
or it can be:
3/16 screw, roundhead
and that involves embedded commas, which are usually solved by putting the description in quotes (which are NOT part of the data). And If we're now using quotes in a particular way, what about embedded quotes?

Now if that is part of a multi-megabyte stream of parts for an aircraft maintenance schedule, would you want to get it wrong?

The reason that there are CSV libraries out there is that they are not simple; and anyone who thinks they can just "roll their own" doesn't understand the protocol.

There's a big difference between a custom, comma-delimited file, and a CSV file.

Winston
 
Piet Souris
Master Rancher
Posts: 2044
75
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Hmm, I wonder what is to blame: the csv format, or the guys that came up
with that kind of content.

I receive some files in excel format, every quarter, but if you look at what's in
it, then it is just a print out format, including in some cells the '===' and in
the cell below the sum of whatever is above it. Horrible to read, but I'm
not blaming the excel file format for that.

Greetz,
Piet
 
Winston Gutkowski
Bartender
Posts: 10575
66
Eclipse IDE Hibernate Ubuntu
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Piet Souris wrote:Hmm, I wonder what is to blame: the csv format, or the guys that came up
with that kind of content.

I'm not sure that it's either; although if I was designing the format I think I might have chosen something other than quotes for dealing with embedded commas or line-breaks, or allowed some form of "escaping". On the other hand, quotes are very "visual", and read very naturally, which I think was the intent.

The problem with them is that they work in pairs, which means that
(a) you usually need indented (or recursive) logic to handle them.
(b) you need some way to detect/handle "unclosed" quotes.
Now I'm not saying that you need a PhD to write a CSV reader or writer; just that there are rules that may not be immediately obvious.

For example, RFC 4180 mandates the use of CRLF (\r\n) as the line-break (as required by MIME), although it does say that "some implementations may use other values". It also mandates the use of double-quotes for fields, although it also states that Excel doesn't use them.

Possibly of more concern is the warning:
'Due to lack of a single specification, there are considerable differences among implementations. Implementors should "be conservative in what you do, be liberal in what you accept from others"...'

HIH

Winston
 
  • Post Reply Bookmark Topic Watch Topic
  • New Topic
Boost this thread!