sorting csv file for fixing column order

 
Padmanabh Sahasrabudhe
Ranch Hand
Posts: 53
I get a CSV file from an export utility, and its column order is different every time. For example, the first time it may export the following CSV file:

A,B,C
1,a,e
3,q,w
2,e,r

The second time it may export the same data as follows:

B,A,C
a,1,e
e,2,r
q,3,w

I am not bothered about the change in row order, since I have a program that can compare the two CSV files correctly even if their rows are out of order, but I don't know how to overcome the change in column order. Is there a way to process this CSV file and get another file with a fixed column order?

Thanks.
 
Campbell Ritchie
Sheriff
Pie
Posts: 49793
69
My, that looks like a strange problem. You cannot specify the output into the csv file so the column orders are fixed?
How many columns are there? Can you create factory methods which take the different values in different orders? Remember the number of methods required is n! where n is the number of columns which might be reordered.
Are the column names always the same? Can you create some sort of map from column name to column value?
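
A minimal sketch of the "map from column name to column value" idea, in case it helps. The class name `HeaderKeyedRows` and its `parse` method are made up for illustration, and the code assumes plain comma-separated values with no quoted or embedded commas and no short rows:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not taken from the thread.
final class HeaderKeyedRows {

    // Turns a header line plus data lines into one map per row, keyed by column name.
    // Assumes every data line has at least as many cells as the header.
    static List<Map<String, String>> parse(List<String> lines) {
        String[] header = lines.get(0).split(",", -1);
        List<Map<String, String>> rows = new ArrayList<>();
        for (String line : lines.subList(1, lines.size())) {
            String[] cells = line.split(",", -1);
            Map<String, String> row = new LinkedHashMap<>();
            for (int i = 0; i < header.length; i++) {
                row.put(header[i], cells[i]);
            }
            rows.add(row);
        }
        return rows;
    }

    public static void main(String[] args) {
        List<Map<String, String>> first = parse(List.of("A,B,C", "1,a,e"));
        List<Map<String, String>> second = parse(List.of("B,A,C", "a,1,e"));
        // Map.equals ignores insertion order, so the two rows compare equal.
        System.out.println(first.get(0).equals(second.get(0))); // prints true
    }
}
```

Because each value is looked up by its column name, the order in which the export utility writes the columns stops mattering.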
 
Junilu Lacar
Bartender
Posts: 7595
53
Am I correct to assume that A, B, and C are the column headers and 1, 2, and 3 are your row headers?

If so, then it's simply a matter of mapping against row and column headers instead of just row headers (since you mentioned that the row ordering doesn't bother you). Show us some code so we have a better idea of how you're doing the comparison and where it is messing up.
 
Padmanabh Sahasrabudhe
Ranch Hand
Posts: 53
Junilu,
A, B and C are column headers. But 1, 2 and 3 need not be row headers, since the order in which the data is exported is uncertain. Assuming I get a consistent column order (say A,B,C every time), I use the following code. But I am not sure how to deal with it when the columns come out of order (B,A,C).



Thanks,
Padmanabh
 
Campbell Ritchie
Sheriff
Pie
Posts: 49793
69
Why are you using sets rather than maps?
 
Padmanabh Sahasrabudhe
Ranch Hand
Posts: 53
Ritchie,

I'm not sure what you meant. Could you please demonstrate with a little code? Also, how will using maps help me handle the scenario where both columns and rows are out of order?

Thanks,
Padmanabh
 
Campbell Ritchie
Sheriff
Pie
Posts: 49793
69
A map would allow you to retain the relationship between the column and its contents. Are you simply trying to see whether the contents of the columns form disparate sets or not? If so, then sets are all right.
 
Padmanabh Sahasrabudhe
Ranch Hand
Posts: 53
I only wish to see whether all the rows in file 1 are present in file 2. I need not retain them. The contents of the two files should be the same row-wise.
 
Campbell Ritchie
Sheriff
Pie
Posts: 49793
69
Do you worry about duplicates or ordering? Sets will only work if ordering and duplicates are not significant. Can you use the equals() method to check for equality of contents?
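
To illustrate the difference, here is a small sketch with hypothetical names (`RowComparison`, `sameAsSets`, `sameWithDuplicates`). It assumes each row has already been split into cells and put into the same column order:

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not taken from the thread.
final class RowComparison {

    // Set comparison: ignores row order and collapses duplicate rows.
    static boolean sameAsSets(List<List<String>> a, List<List<String>> b) {
        return new HashSet<>(a).equals(new HashSet<>(b));
    }

    // Counted comparison: ignores row order but keeps duplicates significant.
    static boolean sameWithDuplicates(List<List<String>> a, List<List<String>> b) {
        return counts(a).equals(counts(b));
    }

    // Counts how many times each distinct row occurs.
    private static Map<List<String>, Integer> counts(List<List<String>> rows) {
        Map<List<String>, Integer> counts = new HashMap<>();
        for (List<String> row : rows) {
            counts.merge(row, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<List<String>> once = List.of(List.of("1", "a", "e"));
        List<List<String>> twice = List.of(List.of("1", "a", "e"), List.of("1", "a", "e"));
        System.out.println(sameAsSets(once, twice));         // prints true
        System.out.println(sameWithDuplicates(once, twice)); // prints false
    }
}
```

Both methods rely on `List.equals` and `List.hashCode`, which compare the cells element by element.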
 
Winston Gutkowski
Bartender
Pie
Posts: 10504
64
Padmanabh Sahasrabudhe wrote: I only wish to see whether all the rows in file 1 are present in file 2. I need not retain them. The contents of the two files should be the same row-wise.

It's not yet clear from your explanation, but I suspect you have two separate problems here: row order and column order. The first is (probably) a simple sorting exercise; the other is a mapping one, which supports Campbell's post.

For the latter, you will need some way of specifying the new order for your columns.

The simplest way to do that in Java is to supply column indexes in the order that you want them output, so if you intend to supply column identifiers instead (which, I assume, is what 'A', 'B', 'C'...etc. are), then you will need some way of translating your "new column order" input into a set (or array) of indexes, and then using that to rearrange the output for each line.
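
As a rough sketch of that translation step: the names `ColumnReorder`, `indexesFor` and `reorder` below are invented for illustration, and the code assumes simple comma-separated lines without quoted commas.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not taken from the thread.
final class ColumnReorder {

    // For each wanted column name, finds its position in the actual header.
    static int[] indexesFor(String[] header, List<String> wantedOrder) {
        List<String> actual = Arrays.asList(header);
        int[] indexes = new int[wantedOrder.size()];
        for (int i = 0; i < indexes.length; i++) {
            int position = actual.indexOf(wantedOrder.get(i));
            if (position < 0) {
                throw new IllegalArgumentException("Missing column: " + wantedOrder.get(i));
            }
            indexes[i] = position;
        }
        return indexes;
    }

    // Rebuilds one line with its cells in the order given by indexes.
    static String reorder(String line, int[] indexes) {
        String[] cells = line.split(",", -1);
        String[] out = new String[indexes.length];
        for (int i = 0; i < indexes.length; i++) {
            out[i] = cells[indexes[i]];
        }
        return String.join(",", out);
    }

    public static void main(String[] args) {
        int[] indexes = indexesFor("B,A,C".split(","), List.of("A", "B", "C"));
        System.out.println(reorder("B,A,C", indexes)); // prints A,B,C
        System.out.println(reorder("a,1,e", indexes)); // prints 1,a,e
    }
}
```

The indexes are computed once from the header line and then reused for every data line in the same file.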

It should be added that reading CSV files can be quite involved: It's not simply a case of splitting data based on commas (unless you're absolutely sure that's the case), so you might want to look at third party libraries for reading your files.
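
A tiny demonstration of why a plain split can go wrong, using a made-up line that contains a quoted comma (the class name `NaiveSplitDemo` is hypothetical):

```java
// Hypothetical demo, not taken from the thread.
final class NaiveSplitDemo {

    // Counts the fields a plain comma split produces for one line.
    static int naiveFieldCount(String line) {
        return line.split(",", -1).length;
    }

    public static void main(String[] args) {
        // Two logical fields, but the comma inside the quotes makes the split see three.
        System.out.println(naiveFieldCount("\"Smith, John\",42")); // prints 3
    }
}
```

A CSV library that understands quoting would report two fields for the same line.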

Alternatively, if this CSV is generated from an Excel spreadsheet, you might want to look at Apache POI, because you may well be able to process it directly, rather than via CSVs.

HIH

Winston
 
Padmanabh Sahasrabudhe
Ranch Hand
Posts: 53
All,

I think I need to reframe my problem. I was chasing the wrong issue. My issue is that I have these two files:

A,B,C
1,a,e
3,q,w
2,e,r

and
B,A,C
a,1,e
e,2,r
q,3,w

Technically, both of these files contain the same data, and I want to verify that through my code. Winston, I am not generating the data from Excel, but thanks for your pointers.
 
Winston Gutkowski
Bartender
Pie
Posts: 10504
64
Eclipse IDE Hibernate Ubuntu
Padmanabh Sahasrabudhe wrote: Technically, both of these files contain the same data, and I want to verify that through my code. Winston, I am not generating the data from Excel, but thanks for your pointers.

Right, well first you need to arrange the data from both files in the same column sequence; and to do that you must have:
  • A line that identifies columns uniquely.
  • A way of knowing which line that is (in your case, it would appear to be the first).

In addition, you may also need to know:
  • The order they should be in (ie, a way of ordering columns by their ID). If you haven't been told that, then you'll need to choose one for yourself.
  • What to do with duplicated and/or missing columns, if such situations are allowed. (NOTE: Only one file will be able to have them).

After that, it's simply an issue of mapping columns in a consistent order and (I suspect) sorting rows based on their "mapped" content - unless you want some form of diff algorithm, which is rather more advanced.
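
Putting those steps together, one possible sketch looks like this. `CsvNormalizer` and `normalize` are hypothetical names. The code assumes the first line is a header of unique column names, that cells contain no commas, and it picks alphabetical header order as the "chosen" column order:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper, not taken from the thread.
final class CsvNormalizer {

    // Returns the lines with columns ordered by header name and data rows sorted.
    static List<String> normalize(List<String> lines) {
        String[] header = lines.get(0).split(",", -1);
        Integer[] order = new Integer[header.length];
        for (int i = 0; i < order.length; i++) {
            order[i] = i;
        }
        // Sort the column positions by the name of the column they point at.
        Arrays.sort(order, Comparator.comparing((Integer i) -> header[i]));

        List<String> rows = new ArrayList<>();
        for (String line : lines) {
            String[] cells = line.split(",", -1);
            List<String> picked = new ArrayList<>();
            for (int index : order) {
                picked.add(cells[index]);
            }
            rows.add(String.join(",", picked));
        }
        // Keep the header first and sort only the data rows beneath it.
        Collections.sort(rows.subList(1, rows.size()));
        return rows;
    }

    public static void main(String[] args) {
        List<String> first = List.of("A,B,C", "1,a,e", "3,q,w", "2,e,r");
        List<String> second = List.of("B,A,C", "a,1,e", "e,2,r", "q,3,w");
        System.out.println(normalize(first).equals(normalize(second))); // prints true
    }
}
```

Once both files are normalized the same way, an ordinary line-by-line comparison is enough. Duplicated or missing columns are not handled here.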

HIH

Winston
 
Padmanabh Sahasrabudhe
Ranch Hand
Posts: 53
I have sorted out the column order issue. I am pasting the code here in case others are interested. I used opencsv here.

     