I'm trying to compare the content of two CSV files, test1.csv and test2.csv. The content of both should be the same. If it is not, I want to write the differences to a .txt file. If everything is equal, everything is correct.
I just created two test CSV files with columns and rows of content.
The first column is the primary key of the respective table. I want to compare the rows by that identifier.
The output in the .txt file should then be the row "1,Max,New York".
I don't have any code yet, and I am happy about every piece of advice and every hint I can get. Thank you in advance.
CSV files are just values separated by commas and line separators. You can go through the text line by line and compare the lines. You will probably want more logic than that eventually, but it is a reasonable place to start.
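A minimal sketch of that line-by-line idea, assuming both files list their rows in the same order and fit in memory. The class name LineByLineDiff and the output file name differences.txt are made up for illustration; only test1.csv and test2.csv come from the question.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, not code from this thread.
class LineByLineDiff {

    // Returns every line of 'expected' that differs from the line at the
    // same position in 'actual' (or has no counterpart there).
    static List<String> differingLines(List<String> expected, List<String> actual) {
        List<String> differences = new ArrayList<>();
        for (int i = 0; i < expected.size(); i++) {
            String other = i < actual.size() ? actual.get(i) : null;
            if (!expected.get(i).equals(other)) {
                differences.add(expected.get(i));
            }
        }
        return differences;
    }

    public static void main(String[] args) throws IOException {
        List<String> first = Files.readAllLines(Paths.get("test1.csv"), StandardCharsets.UTF_8);
        List<String> second = Files.readAllLines(Paths.get("test2.csv"), StandardCharsets.UTF_8);
        // "differences.txt" is an assumed output name.
        Files.write(Paths.get("differences.txt"), differingLines(first, second), StandardCharsets.UTF_8);
    }
}
```

Note that this only reports lines from the first file; extra lines at the end of the second file are ignored, and a single inserted row shifts every later comparison. That is the "more logic" part.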
Using Eclipse and using some sort of diff utility do not conflict with one another. But if you're set on not saving yourself the work of writing a diff tool, I suggest using one of the existing CSV libraries - writing a CSV parser that covers the edge cases is more work than it looks at first; see https://coderanch.com/wiki/660373/Accessing-File-Formats in the "Excel" section.
POI handles Microsoft Office file formats, which CSV is not.
I implemented some code that stores the content of each CSV file in an ArrayList. But is that a good solution? My colleague told me something about HashMaps. Is that a better way to store the content of a CSV file? And how can I print the line (based on the primary key of the table) that differs between the two files?
It's not immediately obvious to me how you'd store the contents of a CSV file as a HashMap (although it is of course possible to come up with a way that utilizes them). So I can't opine on whether it might be better in some way; maybe ask your colleague what he has in mind.
You seem not to want to look into an existing library - why is that?
Al Hobbs wrote:Is it possible to use 'fc' in the case of windows or 'diff' from a java program? If it's possible is that even recommended or not?
Yes, that's possible. Runtime.exec and the more modern ProcessBuilder class make it possible. I don't see why one wouldn't use those tools if they're available, although they do make the code less portable - which may or may not be a consideration.
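For what it's worth, a rough sketch of the ProcessBuilder route. It assumes a "diff" executable is on the PATH (on Windows you would substitute "fc"); the class name ExternalDiff and the output file differences.txt are invented for the example.

```java
import java.io.File;
import java.io.IOException;
import java.util.Arrays;

// Hypothetical helper class, not code from this thread.
class ExternalDiff {

    // Builds the command without starting it. Assumes "diff" is installed
    // and on the PATH. The tool's output is redirected into 'output'.
    static ProcessBuilder diffCommand(String firstFile, String secondFile, File output) {
        return new ProcessBuilder(Arrays.asList("diff", firstFile, secondFile))
                .redirectErrorStream(true)   // merge stderr into stdout
                .redirectOutput(output);     // write the diff to a file
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        Process process = diffCommand("test1.csv", "test2.csv", new File("differences.txt")).start();
        int exitCode = process.waitFor();
        // diff convention: 0 = no differences, 1 = differences, >1 = error.
        System.out.println("diff exit code: " + exitCode);
    }
}
```

As said, this ties the program to whatever tool happens to be installed, so it is only an option if portability does not matter.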
Read the entire first file, and put it into a List. Then read the second file one row at a time, and compare each row to all the rows of the first file to see if it's a duplicate. If it's not a duplicate, then it's new information. If you're having trouble with reading, look at http://opencsv.sourceforge.net/; it's a pretty good library for reading CSV files in Java.
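Roughly, the List approach could look like the sketch below. It works on lines that have already been read (for example with Files.readAllLines), compares whole lines as plain text, and uses a made-up class name NewRowFinder.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, not code from this thread.
class NewRowFinder {

    // Returns the rows of the second file that do not occur anywhere
    // in the first file (exact, whole-line comparison).
    static List<String> rowsNotIn(List<String> firstFileRows, List<String> secondFileRows) {
        List<String> newRows = new ArrayList<>();
        for (String row : secondFileRows) {
            // List.contains scans the whole first file for every row.
            if (!firstFileRows.contains(row)) {
                newRows.add(row);
            }
        }
        return newRows;
    }
}
```

Calling rowsNotIn(second, first) instead gives the rows of the first file that are missing from the second. For large files, copying the first file into a HashSet would avoid the repeated full scans.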
I hope not. I think you want to code that in Java. Eclipse is just a program to help develop Java applications - and other things.
There are two ways to compare CSVs. One is via the Unix-style "diff" program, which compares the raw text line by line. This only compares raw text, though. Windows has a COMP program, but I think it only does a byte-by-byte comparison, not line-by-line differences.
The other way is to parse the files into their constituent components and compare component-by-component. That catches value differences while ignoring differences in how elements were quoted, spaces between elements, and the like.
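To illustrate the component-by-component idea, here is a deliberately naive sketch. It splits on commas, trims spaces, and strips one pair of surrounding double quotes. It does NOT handle commas or escaped quotes inside quoted fields, so for real data a proper CSV library is the safer choice. The class name FieldwiseCompare is made up.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, not code from this thread.
class FieldwiseCompare {

    // Naive field split: no support for commas or escaped quotes
    // inside quoted fields.
    static List<String> fields(String line) {
        List<String> result = new ArrayList<>();
        for (String raw : line.split(",", -1)) {   // -1 keeps trailing empty fields
            String value = raw.trim();
            if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                value = value.substring(1, value.length() - 1);
            }
            result.add(value);
        }
        return result;
    }

    // Two lines are the same record if all their fields match after normalizing.
    static boolean sameRecord(String a, String b) {
        return fields(a).equals(fields(b));
    }
}
```

With this, the lines 1,Max,New York and 1, "Max" , New York count as equal, although a raw text comparison would flag them.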
CSVs do not have "key" fields. If you want to compare by keys, you either have to pre-sort the files to be compared or you'll have to read one of the files into memory so that you can access its lines randomly.
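One possible way to do the in-memory variant is a Map from the first column to the whole line, which is probably what the HashMap suggestion earlier in the thread was about. The sketch assumes the first column is a unique key, that it contains no quoted commas, and that both files fit in memory. The class name KeyedCsvDiff is invented.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper class, not code from this thread.
class KeyedCsvDiff {

    // Maps the first column (assumed to be a unique key) to the whole line.
    // LinkedHashMap keeps the original row order for the report.
    static Map<String, String> indexByKey(List<String> lines) {
        Map<String, String> byKey = new LinkedHashMap<>();
        for (String line : lines) {
            if (line.isEmpty()) {
                continue; // skip blank lines
            }
            String key = line.split(",", 2)[0].trim();
            byKey.put(key, line);
        }
        return byKey;
    }

    // Lines of 'first' whose key is missing from 'second'
    // or whose content differs there.
    static List<String> differences(List<String> first, List<String> second) {
        Map<String, String> secondByKey = indexByKey(second);
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, String> entry : indexByKey(first).entrySet()) {
            if (!entry.getValue().equals(secondByKey.get(entry.getKey()))) {
                result.add(entry.getValue());
            }
        }
        return result;
    }
}
```

Because rows are looked up by key, the two files no longer have to be in the same order. Writing the result with Files.write to a .txt file gives the report asked for in the original question.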
My simple solution covers the case where you want to compare two CSV responses stored in String variables (for example, when you get them through a REST call). In my case I wanted to exit the check after a threshold of 10 differing lines.
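The original code for this post is not shown here, so the following is only a sketch of how such a check might look, not the poster's implementation. It compares the two strings line by line at the same position and stops once the threshold is reached. The class name CsvStringCompare, the method name, and the message format are all assumptions.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, not the code from the post above.
class CsvStringCompare {

    // Collects descriptions of differing lines and stops as soon as
    // 'maxDifferences' of them have been found.
    static List<String> firstDifferences(String expectedCsv, String actualCsv, int maxDifferences) {
        // "\\R" matches any line break (\n, \r\n, ...), available since Java 8.
        String[] expected = expectedCsv.split("\\R");
        String[] actual = actualCsv.split("\\R");
        List<String> differences = new ArrayList<>();
        int lineCount = Math.max(expected.length, actual.length);
        for (int i = 0; i < lineCount && differences.size() < maxDifferences; i++) {
            String left = i < expected.length ? expected[i] : "<missing>";
            String right = i < actual.length ? actual[i] : "<missing>";
            if (!left.equals(right)) {
                differences.add("line " + (i + 1) + ": expected [" + left + "] but was [" + right + "]");
            }
        }
        return differences;
    }
}
```

Calling firstDifferences(expectedBody, actualBody, 10) returns an empty list when the responses match and at most 10 entries otherwise.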