
How do you find the median of a CSV file?

 
Justin Robbins
Ranch Hand
I have a CSV file with several types of data. One column is all doubles, another is all ints, and the remaining columns are all strings. I'm wondering how to take that into account when calculating the median. I understand that in math, the median is the "middle number" of a sorted set of values. The columns are unsorted, so first I'll need some way of sorting a column from least to greatest, and I'm not sure how that would be done in code. Also, what if the column has an even or odd number of values? If it's even, must the code somehow take the two middle numbers, add them, and divide by two?

Pretty confused about this.


If possible could someone give a step-by-step process of how to solve this?
Thank you
 
Winston Gutkowski
Bartender
Justin Robbins wrote:I have a CSV file with several types of data. One column is all doubles, the other all ints, and the remaining are all strings.

OK, well that sounds wrong right there.

What are they? What data do they represent? What do you mean by "the median"?

Justin Robbins wrote:I understand in the math world, the median is when you find the "middle number". So for the columns they are unsorted, so first I'll need some way of sorting the columns from least to greatest.

Which again we can't really help you with unless we know more about what we (or actually you) are talking about.

Justin Robbins wrote:Also, what if the column is even or odd? If it's even, then somehow the code must take two middle numbers and divides that by two?

That's generally only an issue when the "value" you're ranking is numeric, and it doesn't have to be.

For example, the "median state" in the US for income, based on some sample you take, could be either "Illinois" or "Maryland", but there's no way to decide between them, since exactly half of your sample lies in the bottom half topped by Illinois, and the top half is "bottomed" by Maryland.

You can't have "Illinois.5".
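To make that concrete, here is a minimal sketch that returns the one or two middle values of a ranked sample without trying to average them. The class name MiddleValues, the method middle and the rank numbers in main are made up for illustration; they are not real income figures.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

class MiddleValues {
    /**
     * Sorts a copy of the values by the given order and returns the middle
     * element (odd count) or the two middle elements (even count).
     */
    static <T> List<T> middle(List<T> values, Comparator<? super T> order) {
        if (values.isEmpty()) {
            throw new IllegalArgumentException("no values to rank");
        }
        List<T> sorted = new ArrayList<>(values);
        sorted.sort(order);
        int n = sorted.size();
        if (n % 2 == 1) {
            return List.of(sorted.get(n / 2));
        }
        return List.of(sorted.get(n / 2 - 1), sorted.get(n / 2));
    }

    public static void main(String[] args) {
        // Placeholder ranks (1 = lowest); purely illustrative, not real data.
        Map<String, Integer> rank =
                Map.of("Ohio", 1, "Illinois", 2, "Maryland", 3, "Texas", 4);
        List<String> sample = List.of("Texas", "Illinois", "Ohio", "Maryland");
        // Four values, so two middle values come back: [Illinois, Maryland]
        System.out.println(
                middle(sample, Comparator.comparingInt((String s) -> rank.get(s))));
    }
}
```

With an even count the caller gets both candidates and has to decide what to do with them; only for numeric data does averaging the two make sense.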

Tell us more about this file and exactly what it contains, and then we'll be able to help you better.

Winston
 
Knute Snortum
Sheriff
I would read all the data into a list using something like the Apache Commons CSV Reader, then find the median of the list.
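As a rough sketch of the "read into a list, then find the median" idea for a numeric column: the example below uses only the standard library and a plain split(","), which assumes no header row and no quoted fields containing commas. A CSV library would handle those cases properly. The class name ColumnMedian, the column index and the sample lines are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class ColumnMedian {
    /** Parses one column (0-based index) of simple comma-separated lines as doubles. */
    static List<Double> readColumn(List<String> lines, int column) {
        List<Double> values = new ArrayList<>();
        for (String line : lines) {
            String[] cells = line.split(",");
            values.add(Double.parseDouble(cells[column].trim()));
        }
        return values;
    }

    /** Median of a non-empty list: the middle value, or the mean of the two middle values. */
    static double median(List<Double> values) {
        if (values.isEmpty()) {
            throw new IllegalArgumentException("median of an empty list is undefined");
        }
        List<Double> sorted = new ArrayList<>(values); // leave the caller's list untouched
        Collections.sort(sorted);                      // least to greatest
        int mid = sorted.size() / 2;
        if (sorted.size() % 2 == 1) {
            return sorted.get(mid);                    // odd count: one middle value
        }
        return (sorted.get(mid - 1) + sorted.get(mid)) / 2.0; // even count: average the pair
    }

    public static void main(String[] args) {
        // In-memory stand-in for the file; for a real file the lines could come
        // from java.nio.file.Files.readAllLines(path).
        List<String> lines = List.of("apple,3,2.5", "pear,7,4.0", "plum,1,1.5", "fig,9,8.0");
        System.out.println(median(readColumn(lines, 2)));
    }
}
```

The same median method works for the int column if the values are parsed as doubles first.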
 
Carey Brown
Bartender
  • Create a class ("Record"?) to represent a single row in the CSV file.
  • Have one field per column.
  • Most fields will be String except for a few that might be of type Double or Integer.
  • Have a constructor that takes a single line from the CSV file, splits it using the delimiter (assuming a comma) and populates the fields.
  • Have a main class with a main() method that opens the CSV file.
  • Iterate through all the lines.
  • Create a new Record from each line read and add it to a list.
  • Create a Comparator class (or classes) that define how you want the list to be sorted.
  • Sort the list.
  • Go to the middle of the sorted list and pull out the field in the Record that you are interested in.
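The steps above could be sketched roughly as follows. Everything here is an assumption for illustration: the three-column layout (name, count, amount), the field names, and the class names. The row class is called CsvRow rather than Record only to avoid confusion with java.lang.Record in recent Java versions. A plain split(",") will not cope with quoted fields that contain commas.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** One row of the CSV file; the column layout is hypothetical. */
class CsvRow {
    final String name;
    final int count;
    final double amount;

    /** Splits a single comma-delimited line and populates the fields. */
    CsvRow(String line) {
        String[] parts = line.split(",");
        name = parts[0].trim();
        count = Integer.parseInt(parts[1].trim());
        amount = Double.parseDouble(parts[2].trim());
    }
}

class RowMedianDemo {
    /** Builds rows from the lines, sorts them, and returns the row at index size/2. */
    static CsvRow middleBy(List<String> lines, Comparator<CsvRow> order) {
        if (lines.isEmpty()) {
            throw new IllegalArgumentException("no rows");
        }
        List<CsvRow> rows = new ArrayList<>();
        for (String line : lines) {
            rows.add(new CsvRow(line));
        }
        rows.sort(order);
        // For an even number of rows this is the upper of the two middle rows.
        return rows.get(rows.size() / 2);
    }

    public static void main(String[] args) {
        // In-memory stand-in for the lines read from the file.
        List<String> lines = List.of("pear,7,4.0", "plum,1,1.5", "fig,9,8.0");
        CsvRow byAmount = middleBy(lines, Comparator.comparingDouble((CsvRow r) -> r.amount));
        CsvRow byCount = middleBy(lines, Comparator.comparingInt((CsvRow r) -> r.count));
        System.out.println(byAmount.name + " " + byAmount.amount);
        System.out.println(byCount.name + " " + byCount.count);
    }
}
```

Passing a different Comparator is what lets the same list be ranked by whichever column is of interest. With an even number of rows there are two middle rows, so numeric columns would need the two middle values averaged instead of picking one.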