• Post Reply Bookmark Topic Watch Topic
  • New Topic

Read csv file of different datatypes (Eval my code)  RSS feed

 
Raymond Holguin
Ranch Hand
Posts: 82
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
So im reading in an csv file into Java object, where the data types vary. Here is my code for doing this, please comment on if you think this is a good way to accomplish this or can it be improved upon. Note that this code does work and that the java object and its field types/getters/setters cannot be modified. Basically what im dong is that I have 4 possible data types that I define. With each csv field I read in, using reflection I attempt to see if there is a matching getter in the java object for any of the datatypes in the list I defined. If one fails, then I move on the next data type until I find a match. Once i find a match then I exit the loop and move onto the next data field.



I want to try and get the code as efficient as possible since once in production, this csv file can potentially contain 10k+ lines of data.
 
Greg Charles
Sheriff
Posts: 3015
12
Firefox Browser IntelliJ IDE Java Mac Ruby
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
This isn't really a self-contained example. You don't show us what "headers" is for example. Still, I can make some general suggestions.

You want to avoid reflection when possible. I really can't tell if it's possible to avoid or not in this case, but at least try to find an alternative.

Don't do things inside a loop body that you could do a single time before the start of the loop. The getMethod() calls could be done that way, and the methods stored in array where the loop body could access them on each iteration.

Your loop on dataTypes makes no sense. You loop over the different classes, but then handle each one separately anyway. You could just dispense with the loop and call the four meth.invokes() one after the other. Well, you could except for that break statement. That makes it so you will call the case for the String.class, and then skip all the other cases. Why write a loop that will always exit after the first iteration?
 
Raymond Holguin
Ranch Hand
Posts: 82
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Greg Charles wrote:This isn't really a self-contained example. You don't show us what "headers" is for example. Still, I can make some general suggestions.

You want to avoid reflection when possible. I really can't tell if it's possible to avoid or not in this case, but at least try to find an alternative.


Headers is just the name of the columns and the general idea is just name, address, DOB, some other numerical values associated with the person. The data is read from excel and stored in the DB which has a strict data type definitions which is why i can't treat them all like strings.


Don't do things inside a loop body that you could do a single time before the start of the loop. The getMethod() calls could be done that way, and the methods stored in array where the loop body could access them on each iteration.


Not sure what you mean by this. I have no idea what method I will be calling is until I arrive at the field in in the loop so im not sure how i can define this in advance.


Your loop on dataTypes makes no sense. You loop over the different classes, but then handle each one separately anyway. You could just dispense with the loop and call the four meth.invokes() one after the other. Well, you could except for that break statement. That makes it so you will call the case for the String.class, and then skip all the other cases. Why write a loop that will always exit after the first iteration?


The idea is that if



does not exist in the java object it will throw an exception and never hit my IF statements. The IF statements will only be hit once a valid method is found. but once the valid method is found I need to figure out which CLASS type was actually hit so I can type cast the String value to the appropriate data type and pass it in as a parameter to the method. I see no way of doing dynamic type casting which is why i explicitly did the type casting in IF statements. The purpose of the BREAK statement is to move on to the next data field once a valid CLASS type was found. Otherwise every String would iterate over the CLASS types loop 4 times which most of them only need to go through it once.

 
Greg Charles
Sheriff
Posts: 3015
12
Firefox Browser IntelliJ IDE Java Mac Ruby
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
OK, I get it now. Your dataTypes loop goes until it finds a case that doesn't trigger an exception. That's an extremely poor programing practice. Document it with a massive comment if have to, but it would be much better to find a more straightforward way of accomplishing your task.

The code you're showing is part of loop itself, right? You say the file has thousands of rows, and the code here only seems to process one. The same headers apply to all the rows, or at least that's what I assume, so the code that gets a method by reflection doesn't depend on anything in the row data, and yet you're calling it for each row ... an expensive waste of time. Again, I doubt you need reflection at all, but if you decide you do, at least get the Method instances once for the whole file, not every time you process a row.
 
Raymond Holguin
Ranch Hand
Posts: 82
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Yes for simplicity purposes I just showed how a single row is processed. I suppose I could assign the headers in a map of


so i know what the class type is in advance, but the dataTypes loop would still have to happen at least once in order to populate it but after that it just becomes a map lookup which i suppose could save some time. The only other way I can think of to populate my Objects would be to just individually call each setter in code and manually assign the data index such as


The only reason I do not want to do this is because
1) The columns could increase and my current method is dynamic in that increasing columns has no affect on my reader
2) The current column count is in the 30's which means its incredibly tedious task to write all those setter calls, so this few lines of code handles it very simply.
 
Junilu Lacar
Sheriff
Posts: 11494
180
Android Debian Eclipse IDE IntelliJ IDE Java Linux Mac Spring Ubuntu
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
If you want to make efficient code, then using reflection and nested loops is not going to help you. It looks like you already know what the different column types are so why do you want to program in so much flexibility by using reflection? You could just create as many value-converters as you need then "decorate" your target class with a class that "covers" each setter of the target class with an appropriate value converter.

That is, decorator class wraps the target class, then you call the setters of the decorator class, each setter converts the value to the appropriate type and calls the appropriate target class setter, passing in the converted value. Think of the decorator class as a mold or template. You only create one decorator, have one method to load in a new target class instance, call all the decorator setters, thus filling up all target class instance values, then pop the target class instance out to a "bucket" (another collection to hold completed target instances). Repeat the process until you have processed all the data.

If you're a gun enthusiast or have ever seen a progressive reloader in action, you'll kind of get the picture. The decorator is the reloader and each shell is a new target instance. At each step, a property is converted and set on the target instance. As far as terminology goes, I'm not sure if "decorator" is exactly the right pattern name for this but that's what I have pictured in my head right now as I'm thinking about this.

Does that make sense?

BTW, you don't have to write your own value converters either. Apache has a bunch of them: http://commons.apache.org/beanutils/v1.8.3/apidocs/org/apache/commons/beanutils/ConvertUtilsBean.html
 
Paul Clapham
Sheriff
Posts: 22835
43
Eclipse IDE Firefox Browser MySQL Database
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Raymond Holguin wrote:2) The current column count is in the 30's which means its incredibly tedious task to write all those setter calls, so this few lines of code handles it very simply.


However that is in direct conflict with what you originally said:

I want to try and get the code as efficient as possible


Don't get me wrong, I don't usually have a problem with writing redundant code if it's easier for me to do that. But I rarely have "as efficient as possible" as one of my requirements, and you do.
 
Greg Charles
Sheriff
Posts: 3015
12
Firefox Browser IntelliJ IDE Java Mac Ruby
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
If you're sure about reflection, here's the way I'd do it:

Use the Class.getMethods() method to get all the methods from you EmsStudent class. Loop through them looking for methods that start with "set". Parse out the property name from method name (e.g., setXxxx() becomes "xxxx"). Make a hash map of the methods, keyed by the name you parse out.

As you are processing the rows, retrieve the method you need from the HashMap based on the "header" value for each field. Look at the method to see what sort of parameter it takes. Parse the field String data into the correct type. Invoke the setter.
 
Raymond Holguin
Ranch Hand
Posts: 82
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
@Paul - indeed i did say that. But i guess i meant efficient as possible while still maintaining flexibility. Having an app that i have to custom build for each customter because each one wants different sets of data isnt going to work. I need an efficient single peice of code to handle all possibilities if possible

Greg - never thought about this but it sounds interesting. ill look into it thanks!! Ill have to run some timers against the various methods to see which is fastest
 
Pat Farrell
Rancher
Posts: 4686
7
Linux Mac OS X VI Editor
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
More fundamentally, concern over performance for 10,000 records is pretty classic premature optimization.

Move the reflection out of the loop, and then run it and see if its fast enough. It probably will be.
 
Jesper de Jong
Java Cowboy
Sheriff
Posts: 16060
88
Android IntelliJ IDE Java Scala Spring
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
This looks like a bug:

mm means minutes, not months. For months, use MM:

 
Winston Gutkowski
Bartender
Posts: 10575
66
Eclipse IDE Hibernate Ubuntu
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Raymond Holguin wrote:I want to try and get the code as efficient as possible since once in production, this csv file can potentially contain 10k+ lines of data.

I'm afraid I'm with practically everyone else here about using reflection: It's simply not warranted; either by the complexity of your problem or any flexibility you might gain (by which I assume to mean the ability to add new classes without recompiling, which at the moment you can't do anyway).

Unless you also plan on also creating a file of class names and using something like Class.forName() to create a template - and the question then begs: who is responsible for maintaining that file? - your object creation is pretty much limited to value classes you've listed, with at most a single value for their constructor. Also, any enhancements to your logic are likely to be difficult and error-prone.

And for 4 classes? Unless this is simply an exercise for school (and it doesn't sound like it), I'd follow some of the good advice you've already been given - and Junilu's is pretty much exactly what I would have given you.

Probably not what you want to hear; but there it is.

Winston
 
Raymond Holguin
Ranch Hand
Posts: 82
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Thanks for the input guys, after doing some testing my code vs using some of the suggestions here gave only a few millisecond difference in parsing a 10k line file. So in the end efficiency wasn't really an issue. but I do hear you guys when it comes to good programming practice so for that I thank you for your tips in better code designs, I definitely learned some interesting things which was sort of the point of this post =)
 
  • Post Reply Bookmark Topic Watch Topic
  • New Topic
Boost this thread!