
Help needed: Peculiar File Comparison Requirement

 
sandeepz putrevu
Greenhorn
Posts: 7
Hi friends,

This is my first post on JavaRanch.

Here is what I need.

I have several SQL files generated as dumps.
One machine has TOAD 8 installed and generates queries.
Another has TOAD 9 installed and also generates many queries.

The issue is that the two versions generate the queries in different orders.
The requirement is that both versions generate the same SQL queries,
irrespective of the order in which they are generated.

This boils down to the following:

I have two files containing queries in arbitrary order.
I need to validate that both files have the same content (queries),
even if not in the same order.

Ex: File1

select * from emp;
select * from dept;

drop table emp;

Ex: File2

drop table emp;
select * from emp;
select * from dept;

The output should show that File1 and File2 have the same content,
even though the order is different.

I am thinking of using a Hashtable (with a key generated for each query in each file).

That way, I would assign

1=select * from emp;
2=select * from dept;
3=drop table emp;

in File1, and the same keys to the same queries in File2.

Then I would compare the keys generated for the two files, like this:

if(aHashDataofFile1.keySet().equals(aHashDataofFile2.keySet()))
{
System.out.println("The two files have same data in them !!");
}

else
{
System.out.println("The two files have different data in them");
}


Can anyone help me with this issue?

Thanks

Sandeep
 
Chandra Bhatt
Ranch Hand
Posts: 1710
1- Store one file in an ArrayList, line by line.
2- Keep an array as a marker (to mark each line that has already been matched; see the next step).
3- Now take the other file, pick one line at a time, and look for a match
in the ArrayList. If a match is found, mark the corresponding marker array element.
An ArrayList element that has already been matched is not used again; the marker array tracks that.
I use the marker in case a query occurs more than once in a file.

Gotcha:
Whitespace!

Remove all the whitespace between words before you put the lines of the file in the ArrayList.
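
The steps above can be sketched roughly as follows. This is only an illustration: the class and method names (MarkerCompare, sameLines, normalize) are made up for the example, and it collapses runs of whitespace to a single space rather than removing whitespace entirely, which is one possible reading of the gotcha.

```java
import java.util.ArrayList;
import java.util.List;

class MarkerCompare {
    // Trim the line and collapse inner whitespace so spacing differences do not matter.
    static String normalize(String line) {
        return line.trim().replaceAll("\\s+", " ");
    }

    // True if both inputs hold the same lines (duplicates counted), in any order.
    static boolean sameLines(List<String> first, List<String> second) {
        List<String> stored = new ArrayList<>();
        for (String line : first) {
            String n = normalize(line);
            if (!n.isEmpty()) {
                stored.add(n);
            }
        }
        boolean[] used = new boolean[stored.size()]; // the marker array
        int matched = 0;
        for (String line : second) {
            String n = normalize(line);
            if (n.isEmpty()) {
                continue;
            }
            boolean found = false;
            for (int i = 0; i < stored.size(); i++) {
                if (!used[i] && stored.get(i).equals(n)) {
                    used[i] = true; // never match this stored line again
                    matched++;
                    found = true;
                    break;
                }
            }
            if (!found) {
                return false; // a line in the second file has no partner
            }
        }
        return matched == stored.size(); // no unmatched lines left in the first file
    }
}
```

The nested loop makes this quadratic in the number of lines, which is fine for small dumps but slow for very large ones.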


Hi Chandra

I gave those queries just as an example.

The real queries are much longer and can span as many as 10 lines.

The only way I could think of was to index each query as a key (each query is delimited by a / in the generated dumps)

and then compare the keys for both files.

Also, there are inserts, deletes, grants, etc. in the files
that are quite a bit larger.

Thanks
Sandeep


I think the semicolon is the delimiter for sql queries. as



I suggest using the split method of the String class, or going with the
Pattern and Matcher classes.
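
A minimal sketch of the split idea, assuming the whole dump has already been read into one String. QuerySplitter and its split method are hypothetical names; the delimiter is passed in because the thread mentions both ";" and "/" as candidates.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

class QuerySplitter {
    // Break a dump into queries on a literal delimiter such as ";" or "/".
    static List<String> split(String dump, String delimiter) {
        List<String> queries = new ArrayList<>();
        // Pattern.quote makes the delimiter literal, since String.split takes a regex.
        for (String part : dump.split(Pattern.quote(delimiter))) {
            // Collapse line breaks and repeated spaces inside a multi-line query.
            String query = part.trim().replaceAll("\\s+", " ");
            if (!query.isEmpty()) {
                queries.add(query);
            }
        }
        return queries;
    }
}
```

Note that a plain split also cuts at a delimiter that appears inside a quoted string literal, so real dumps may need something smarter.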


Thanks,
 
Jim Yingst
Wanderer
Sheriff
Posts: 18671
Assuming that the number of queries is not too large to fit into memory, I suggest putting all the queries in HashSets. Put all queries from one file in one HashSet, and put all from the other file in a different HashSet. Then let the equals() method determine if the contents are identical.
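
A rough sketch of the HashSet idea, assuming each dump is already in memory as a String and that ";" separates queries. QuerySetCompare, toSet and sameQueries are names invented for the example.

```java
import java.util.HashSet;
import java.util.Set;

class QuerySetCompare {
    // Collect the distinct, whitespace-normalized queries of one dump.
    static Set<String> toSet(String dump) {
        Set<String> queries = new HashSet<>();
        for (String part : dump.split(";")) {
            String query = part.trim().replaceAll("\\s+", " ");
            if (!query.isEmpty()) {
                queries.add(query);
            }
        }
        return queries;
    }

    // Set.equals is true when both sets hold exactly the same elements, in any order.
    static boolean sameQueries(String dumpA, String dumpB) {
        return toSet(dumpA).equals(toSet(dumpB));
    }
}
```

A set keeps only one copy of each query, so this treats a file containing a query twice as equal to a file containing it once.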

Another approach would be to use two different ArrayLists, and call Collections.sort() on each list to put them in alphabetical order. Then you can loop through both lists simultaneously, comparing them line by line.
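
The sorted-list variant might look something like this, assuming the queries have already been split out into two lists. SortedCompare and sameQueries are illustrative names only.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class SortedCompare {
    // Sort copies of both lists, then compare them position by position.
    static boolean sameQueries(List<String> first, List<String> second) {
        List<String> a = new ArrayList<>(first);  // copy so the caller's lists keep their order
        List<String> b = new ArrayList<>(second);
        Collections.sort(a);
        Collections.sort(b);
        if (a.size() != b.size()) {
            return false;
        }
        for (int i = 0; i < a.size(); i++) {
            if (!a.get(i).equals(b.get(i))) {
                return false;
            }
        }
        return true;
    }
}
```

Unlike a set, the sorted lists keep duplicates, so a query that appears twice in one file must also appear twice in the other.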

Ultimately the HashSet solution should be faster and require less code.
 
sandeepz putrevu
Greenhorn
Posts: 7
Thanks, Chandra and Jim, for your valuable suggestions.
I have got the logic right this time.

I asked for help just 6 hours ago and there were replies within minutes!

This is the best Java forum on the planet.
I apologize to Henry (moderator of the SCJP forum) for the repost.


I used Hashtables (ht1, ht2).
Snippets from my working code, for enthusiasts:

Hashtable ht1 = convertToHash("C:\\file1.sql");
Hashtable ht2 = convertToHash("C:\\file2.sql");
// Equal sizes plus "every key of ht1 is in ht2" means the key sets match.
// Without the size check, extra queries in file2 would go unnoticed.
boolean filesEqual = ht1.size() == ht2.size();
Iterator it = ht1.keySet().iterator();
while (filesEqual && it.hasNext()) {
    String key = (String) it.next();
    if (!ht2.containsKey(key)) {
        filesEqual = false;
    }
}

if (filesEqual)
    System.out.println("Files are equal");
else
    System.out.println("Files are not equal");

The function convertToHash does the following:


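// Needs java.io.BufferedReader, java.io.FileReader and java.util.Hashtable.
static Hashtable<String,String> convertToHash(String fileName) throws Exception {
    Hashtable<String,String> ht = new Hashtable<String,String>();
    BufferedReader in = new BufferedReader(new FileReader(fileName));
    StringBuilder file = new StringBuilder();
    String line;

    // readLine() drops the line terminator, so add a space to keep words apart.
    while ((line = in.readLine()) != null)
        file.append(line).append(' ');
    in.close();
    String[] queries = file.toString().split(";");

    for (String str : queries) {
        // Skip the blank fragment left after the last semicolon.
        if (str.trim().length() > 0)
            ht.put(str.trim(), "");
    }
    return ht;
}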
 
Stan James
(instanceof Sidekick)
Ranch Hand
Posts: 8791
Jim Yingst wrote: Another approach would be to use two different ArrayLists, and call Collections.sort() on each list to put them in alphabetical order. Then you can loop through both lists simultaneously, comparing them line by line.


Jim hit on one of my old favorites, from my very first training program. For two sorted lists:

If you want to dig into more details about how the files differ, feel free to play with my little TextDiff implementation. The default "reporter" prints the results to stdout. You could make your own reporter that tracks whether the two files have lines moved around, inserted, or deleted.

BTW: If you want to stay with your current algorithm ... Are you only storing "" as the value in the HashMap solution? If so, you can probably switch to ArrayList which will hold the lines in order without the extra "" values.
 
sandeepz putrevu
Greenhorn
Posts: 7
Hi James,

I have now modified my algorithm so that it displays the list of queries in file1 and in file2, and then prints the queries that are missing from file1 and from file2 respectively.
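
One way such a report could be produced is with a set difference. This is a hedged sketch, not the code from the post: QueryDiff and missingFrom are invented names, and the queries are assumed to be in two sets already.

```java
import java.util.Set;
import java.util.TreeSet;

class QueryDiff {
    // Queries present in 'reference' but absent from 'other', in sorted order.
    static Set<String> missingFrom(Set<String> reference, Set<String> other) {
        Set<String> missing = new TreeSet<>(reference); // copy, so the inputs are not modified
        missing.removeAll(other);
        return missing;
    }
}
```

Calling missingFrom(file1Queries, file2Queries) lists what file2 lacks, and swapping the arguments lists what file1 lacks. The files match when both results are empty.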

I will go through your TextDiff algorithm as well, as it seems a much better analyser than mine.
 