
# How do you find the median of a CSV file?

Ranch Hand
Posts: 121
2
I have a CSV file with several types of data. One column is all doubles, another is all ints, and the remaining ones are all strings. I am wondering how one might take that into account when calculating the median. I understand that in the math world, the median is the "middle number". The columns are unsorted, so first I'll need some way of sorting a column from least to greatest, and I'm not sure how that would be done in code. Also, what if the column has an even or odd number of values? If it's even, then somehow the code must take the two middle numbers, add them, and divide by two?

If possible could someone give a step-by-step process of how to solve this?
Thank you

Bartender
Posts: 10759
68

Justin Robbins wrote:I have a CSV file with several types of data. One column is all doubles, the other all ints, and the remaining are all strings.

OK, well that sounds wrong right there.

What are they? What data do they represent? What do you mean by "the median"?

Justin Robbins wrote:I understand in the math world, the median is when you find the "middle number". So for the columns they are unsorted, so first I'll need some way of sorting the columns from least to greatest.

Justin Robbins wrote:Also, what if the column is even or odd? If it's even, then somehow the code must take two middle numbers and divides that by two?

That's generally only an issue when the "value" you're ranking is numeric, and it doesn't have to be.

For example, the "median state" in the US for income, based on some sample you take, could be either "Illinois" or "Maryland", but there's no way to decide between them, since exactly half of your sample lies in the bottom half topped by Illinois, and the top half is "bottomed" by Maryland.

You can't have "Illinois.5".
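
To make that point concrete, here is a minimal sketch (not anything from the original posts) of a "median" for values that can be ranked but not averaged. The class name `OrdinalMedianSketch`, the method `middleValues`, and the shirt-size data are all made up for illustration. With an odd count it returns the single middle value; with an even count it returns both middle values and leaves the choice to the caller.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper: finds the middle value(s) of data that can be ordered
// but not averaged (names, categories, sizes, ...).
class OrdinalMedianSketch {

    // Returns one element for an odd count, two elements for an even count.
    static <T> List<T> middleValues(List<T> values, Comparator<? super T> order) {
        if (values.isEmpty()) {
            throw new IllegalArgumentException("median of an empty list is undefined");
        }
        List<T> sorted = new ArrayList<>(values); // copy so the caller's list is untouched
        sorted.sort(order);
        int n = sorted.size();
        List<T> result = new ArrayList<>();
        if (n % 2 == 0) {
            result.add(sorted.get(n / 2 - 1)); // lower middle value
        }
        result.add(sorted.get(n / 2));         // middle (or upper middle) value
        return result;
    }

    public static void main(String[] args) {
        // Assumed ranking for the example: S < M < L < XL.
        List<String> ranking = Arrays.asList("S", "M", "L", "XL");
        Comparator<String> bySize = Comparator.comparingInt(ranking::indexOf);

        System.out.println(middleValues(Arrays.asList("L", "S", "XL", "M"), bySize)); // prints [M, L]
        System.out.println(middleValues(Arrays.asList("L", "S", "M"), bySize));       // prints [M]
    }
}
```

Whether to report the lower value, the upper value, or both for an even count is a decision about the data, not something the code can settle.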

Winston

Marshal
Posts: 5993
156
I would read all the data into a list using a CSV reader such as Apache Commons CSV, then find the median of the list.
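
For the "find the median of the list" half, a rough sketch might look like the following. It assumes the numeric column has already been read into a `List<Double>` by whatever CSV reader you pick; the class and method names are invented for the example. It sorts a copy, returns the middle element for an odd count, and averages the two middle elements for an even count.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical helper: median of one numeric column already held in memory.
class MedianSketch {

    static double median(List<Double> values) {
        if (values.isEmpty()) {
            throw new IllegalArgumentException("median of an empty list is undefined");
        }
        List<Double> sorted = new ArrayList<>(values); // copy so the caller's list is untouched
        Collections.sort(sorted);                      // least to greatest
        int n = sorted.size();
        int mid = n / 2;
        if (n % 2 == 1) {
            return sorted.get(mid);                    // odd count: the single middle value
        }
        // even count: average of the two middle values
        return (sorted.get(mid - 1) + sorted.get(mid)) / 2.0;
    }

    public static void main(String[] args) {
        System.out.println(median(Arrays.asList(5.0, 1.0, 3.0)));      // prints 3.0
        System.out.println(median(Arrays.asList(4.0, 1.0, 3.0, 2.0))); // prints 2.5
    }
}
```

An int column can go through the same method if each value is widened to a double while reading.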

Bartender
Posts: 5851
57
• Create a class ("Record"?) to represent a single row in the CSV file.
• Have one field per column.
• Most fields will be String except for a few that might be of type Double or Integer.
• Have a constructor that takes a single line from the CSV file, splits it using the delimiter (assuming a comma) and populates the fields.
• Have a main class with a main() method that opens the CSV file.
• Iterate through all the lines.
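
The steps listed so far could be sketched roughly as below. Everything specific here is an assumption made for illustration: the class is called `CsvRecord` rather than `Record`, the three columns (`name`, `price`, `quantity`) are invented, and the sample reads from an in-memory string so it runs without a file. The plain `split(",")` also assumes no field contains a quoted comma.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

// One row of the file. The three columns are hypothetical.
class CsvRecord {
    final String name;
    final double price;
    final int quantity;

    // Takes a single line, splits it on the delimiter and populates the fields.
    CsvRecord(String line) {
        String[] parts = line.split(",");
        if (parts.length != 3) {
            throw new IllegalArgumentException("expected 3 columns: " + line);
        }
        name = parts[0].trim();
        price = Double.parseDouble(parts[1].trim());
        quantity = Integer.parseInt(parts[2].trim());
    }
}

class CsvReadSketch {

    // Iterates through all the lines and builds one CsvRecord per non-blank line.
    static List<CsvRecord> readAll(BufferedReader reader) throws IOException {
        List<CsvRecord> records = new ArrayList<>();
        String line;
        while ((line = reader.readLine()) != null) {
            if (line.trim().isEmpty()) {
                continue; // skip blank lines
            }
            records.add(new CsvRecord(line));
        }
        return records;
    }

    public static void main(String[] args) throws IOException {
        // For a real file, wrap a FileReader (or use Files.newBufferedReader) instead.
        String sample = "widget,2.50,4\ngadget,10.00,1\n";
        try (BufferedReader reader = new BufferedReader(new StringReader(sample))) {
            List<CsvRecord> records = readAll(reader);
            System.out.println(records.size() + " rows, first price " + records.get(0).price);
        }
    }
}
```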