
excluding duplicated content

 
Boris Petrov
Greenhorn
Posts: 21
Java Linux
Hello Java friends,

This is my first post since registering (a few minutes ago). I am a new CS student and particularly new to Java. I have a few tasks from my university to complete before I can move on.
But I am having difficulty with 2 of them. I am really BEGGING someone to help me finish them (or at least one of them). I would be very, very grateful.
Here is the task...

implement the following function/method:



Return a collection of all the files in a given directory dir.

When we say "all the files" we mean all the files... recursively, meaning that the files contained in the subfolders all the way down should also be included. From those files, exclude the ones that are equal to previously discovered files. By equal we mean files with equal contents. We don't care about the timestamp or the file name.

EXAMPLE
Let's say that your current directory looks like this:



In the above structure our root folder contains only one file - readme.md. We wish to traverse it to the bottom of the hierarchy, thus the list of all files our program should select is - . And now for the filtering part. and reside in different directories and have different names, but their contents are the same. The result of the program should be:



or



It doesn't matter which file you choose to keep, as long as there are no duplicating ones in the final result.


DETAILS:

Check if the directory exists
And don't forget - locate the files in every level of the root directory and exclude the duplicating ones!




I have some knowledge of how to accomplish the task, but I don't know how to exclude duplicated files by content.
I would very much appreciate your help.
 
Henry Wong
author
Sheriff
Posts: 23568
138
C++ Chrome Eclipse IDE Firefox Browser Java jQuery Linux VI Editor Windows

Boris Petrov wrote:By equal we mean files with equal contents. We don't care about the timestamp or the file name.



Sounds like you have to actually compare the files -- meaning opening them and checking each byte.

Some short cuts are available though. If the file sizes are different, then no need to check, the files are different. Also, once two bytes are different, then no need to continue checking the whole file, the files are already different.

Henry
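
To make the two shortcuts above concrete, here is a minimal sketch of a byte-by-byte comparison. The class name FileContentCompare and the method name sameContent are made up for illustration; nothing in the assignment prescribes them. It checks the sizes first and stops reading at the first byte that differs.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper class, not part of the assignment text.
class FileContentCompare {

    /** Returns true when both files hold exactly the same bytes. */
    static boolean sameContent(Path a, Path b) throws IOException {
        // Shortcut 1: files of different sizes cannot be equal.
        if (Files.size(a) != Files.size(b)) {
            return false;
        }
        try (InputStream inA = new BufferedInputStream(Files.newInputStream(a));
             InputStream inB = new BufferedInputStream(Files.newInputStream(b))) {
            int byteA;
            while ((byteA = inA.read()) != -1) {
                // Shortcut 2: stop at the first differing byte.
                if (byteA != inB.read()) {
                    return false;
                }
            }
            // inA is exhausted; the files are equal only if inB is exhausted too.
            return inB.read() == -1;
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("compare-demo");
        Path first = Files.write(dir.resolve("a.txt"), "hello".getBytes());
        Path second = Files.write(dir.resolve("b.txt"), "hello".getBytes());
        Path third = Files.write(dir.resolve("c.txt"), "world".getBytes());
        System.out.println(sameContent(first, second)); // prints true
        System.out.println(sameContent(first, third));  // prints false
    }
}
```

The buffered streams keep the per-byte read() calls cheap, and only a small buffer per file is held in memory at any time.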
 
Boris Petrov
Greenhorn
Posts: 21
Java Linux

This is as far as my knowledge goes. And the question on my mind is: "how do I compare every file with every other file when I have only one iteration?"
As I said, this is as far as my knowledge goes. Please help.
 
Carey Brown
Bartender
Posts: 4519
50
Eclipse IDE Firefox Browser Java MySQL Database VI Editor Windows
You'll need something like a 'boolean isDuplicate(File f1, File f2)' method. As was mentioned, first check whether the file sizes are the same. If they are, open both files and read and compare each byte from the two files, stopping as soon as you find a byte that doesn't match.

In your directory traversal code I'd make a List<File> of all the files you find and then sort the list by file size. You'll only need to compare files that are the same size; a file whose size no other file shares must be unique.
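
A rough sketch of the "collect first, then sort by size" idea, assuming java.io.File as in the suggestion above. The names FilesBySize, collectFiles and listSortedBySize are invented for this example. The recursive method only gathers files; sorting happens afterwards.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical class and method names, chosen only for this sketch.
class FilesBySize {

    /** Recursively adds every regular file under dir to result. */
    static void collectFiles(File dir, List<File> result) {
        File[] entries = dir.listFiles();
        if (entries == null) {
            return; // dir does not exist, is not a directory, or cannot be read
        }
        for (File entry : entries) {
            if (entry.isDirectory()) {
                collectFiles(entry, result); // descend into the subfolder
            } else if (entry.isFile()) {
                result.add(entry);
            }
        }
    }

    /** Returns all files under dir, smallest first. */
    static List<File> listSortedBySize(File dir) {
        List<File> files = new ArrayList<>();
        collectFiles(dir, files);
        files.sort(Comparator.comparingLong(File::length));
        return files;
    }

    public static void main(String[] args) {
        File root = new File(args.length > 0 ? args[0] : ".");
        for (File f : listSortedBySize(root)) {
            System.out.println(f.length() + "\t" + f.getPath());
        }
    }
}
```

After sorting, files of equal size sit next to each other, so only neighbouring entries with the same length need a content comparison. Note that this sketch does not guard against symbolic links that point back up the tree.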
 
Boris Petrov
Greenhorn
Posts: 21
Java Linux

Ok, I made something. Is that isDuplicated correct? Would it work?
 
Carey Brown
Bartender
Posts: 4519
50
Eclipse IDE Firefox Browser Java MySQL Database VI Editor Windows


Looks to be correct and it should work. However, you are reading entire files into memory, which has two problems: 1) if the files are huge you may get an out-of-memory error, and 2) you pay the I/O overhead for the entire file even when the files differ in the first byte. To fix this you'd need to use some lower-level I/O functionality. I'd probably leave it as it is for now and work on other aspects of the program first, then come back and revisit this later.
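
If a Java 12 or later runtime is available (an assumption; the thread does not say which version is in use), one possible alternative to loading both files is the standard Files.mismatch method, which returns the position of the first differing byte, or -1 when the contents match. The sketch below wraps it in an isDuplicated method; the class name MismatchDemo is made up for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical class name; requires Java 12+ for Files.mismatch.
class MismatchDemo {

    /** Returns true when the two files have identical contents. */
    static boolean isDuplicated(Path a, Path b) throws IOException {
        // Cheap check first: different sizes mean different contents.
        if (Files.size(a) != Files.size(b)) {
            return false;
        }
        // Files.mismatch returns -1 when no differing byte is found.
        return Files.mismatch(a, b) == -1L;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("mismatch-demo");
        Path first = Files.write(dir.resolve("one.txt"), "same text".getBytes());
        Path second = Files.write(dir.resolve("two.txt"), "same text".getBytes());
        System.out.println(isDuplicated(first, second)); // prints true
    }
}
```

This keeps the calling code as short as the read-everything version while leaving the buffering details to the library.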
 
Boris Petrov
Greenhorn
Posts: 21
Java Linux
Ok, this is my code, after some thinking and taking some advice:



It looks pretty straightforward and simple, but I got nothing in return (empty lists and empty print lines).
I know I have bugs somewhere, but I am not experienced enough (actually I am new) to find out where the flaw in my logic is...
Would you please help?
 
Carey Brown
Bartender
Posts: 4519
50
Eclipse IDE Firefox Browser Java MySQL Database VI Editor Windows
Line 18: for(final Path alreadySeen : uniqueFiles)
uniqueFiles starts out empty so you never enter the loop.

My suggestion is to have the recursive method do nothing but add found files to a List<File>, and not try to do anything else in this method. Then sort the list by file size and traverse the list looking for adjacent entries with the same size and only then call isDuplicated().
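
For illustration, here is a sketch of one way to avoid the empty-loop trap while keeping the "compare against what I have already seen" idea. It is not the code that was posted; UniqueFilter and keepUnique are hypothetical names, and isDuplicated here is a simple in-memory stand-in. The key point is that the add happens after the inner loop, so the very first file is always kept even though uniqueFiles starts out empty.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical names; a sketch of the filtering step only.
class UniqueFilter {

    /** Simple in-memory comparison, adequate for small files. */
    static boolean isDuplicated(Path a, Path b) throws IOException {
        return Files.size(a) == Files.size(b)
                && Arrays.equals(Files.readAllBytes(a), Files.readAllBytes(b));
    }

    /** Keeps the first file of every group of files with equal contents. */
    static List<Path> keepUnique(List<Path> candidates) throws IOException {
        List<Path> uniqueFiles = new ArrayList<>();
        for (Path candidate : candidates) {
            boolean seenBefore = false;
            // On the first pass uniqueFiles is empty, so this body never runs.
            for (Path alreadySeen : uniqueFiles) {
                if (isDuplicated(candidate, alreadySeen)) {
                    seenBefore = true;
                    break;
                }
            }
            // The add sits outside the inner loop, so it also runs
            // when there was nothing to compare against yet.
            if (!seenBefore) {
                uniqueFiles.add(candidate);
            }
        }
        return uniqueFiles;
    }
}
```

Calling keepUnique on the list produced by the directory traversal would then return one representative per distinct content, in discovery order.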
 
Boris Petrov
Greenhorn
Posts: 21
Java Linux


I can't say that I totally understand you. Would you do some editing to my code?
 
Carey Brown
Bartender
Posts: 4519
50
Eclipse IDE Firefox Browser Java MySQL Database VI Editor Windows

I didn't fill in all the blanks for you but this should move you along a bit. Try to have your methods only do one thing.
 
Boris Petrov
Greenhorn
Posts: 21
Java Linux


Wow, dude, I really, really appreciate what you do. But... is it that simple, huh? As I said, I am new to programming, and I am just diving into Java.
I can't say that I understand this professional code, especially the part after PathComperatorBySize. I do not even know where my isDuplicated code goes.
I have barely read anyone else's code.

Should I focus only on the TODO places and fill in some code there, or...?
 
Carey Brown
Bartender
Posts: 4519
50
Eclipse IDE Firefox Browser Java MySQL Database VI Editor Windows


Focus only on the TODOs for now and use isDuplicated() in one of those. You will also need to get your files ready for output and, finally, output them.

I put the Comparator in there for you because they can be non-intuitive. Suggest you read up on them some time as you will be using them on occasion. Comparators allow you to sort objects by any criteria you care to design, in this case, file size.
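
As a small illustration of what such a Comparator can look like, here is a sketch that orders Path objects by file size. It is a guess at the general shape, not the code from the earlier post; the class name follows the one mentioned in the thread with conventional spelling, and the private sizeOf helper is an assumption of this example.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Sketch of a size-based comparator; details are assumptions.
class PathComparatorBySize implements Comparator<Path> {

    @Override
    public int compare(Path a, Path b) {
        // Negative when a is smaller, zero when equal, positive when larger.
        return Long.compare(sizeOf(a), sizeOf(b));
    }

    // compare() cannot throw checked exceptions, so wrap IOException.
    private static long sizeOf(Path p) {
        try {
            return Files.size(p);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("sort-demo");
        List<Path> files = new ArrayList<>();
        files.add(Files.write(dir.resolve("long.txt"), "aaaa".getBytes()));
        files.add(Files.write(dir.resolve("short.txt"), "a".getBytes()));
        files.sort(new PathComparatorBySize());
        System.out.println(files.get(0).getFileName()); // prints short.txt
    }
}
```

Keep in mind that a result of zero from this comparator only means "same size", which is a weaker statement than "same contents".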
 
Boris Petrov
Greenhorn
Posts: 21
Java Linux


OK. My dear friend, thanks to you, I finally got the result(output) I needed. And this is the code for it:...



But it would look ridiculous if I posted code like this (even though I got the output I needed).
In our "chat" I forgot that my assignment was to have only one function, listDuplicatedFiles(dir),
and that it should be done recursively. I will try editing this code or maybe stick to my previous one.

I strayed from the assignment. I will try to fix things tonight and post what I did tomorrow.
Thank you very much for your participation. God bless.
 
Carey Brown
Bartender
Posts: 4519
50
Eclipse IDE Firefox Browser Java MySQL Database VI Editor Windows


Somewhere in line 30 you'll need to call isDuplicated().
 
fred rosenberger
lowercase baba
Bartender
Posts: 12627
50
Chrome Java Linux

Boris Petrov wrote:I can't say that I totally understand you. Would you do some editing to my code?


Be EXTREMELY careful here...

You have implied this is an assignment for your university class. Will your professor be OK with someone else writing part of your code? Many schools would consider it a form of cheating, if not out-and-out plagiarism.

Obviously the rules differ from school to school, but I would suggest you be very careful - especially when it is posted on the internet for all to see.
 
Boris Petrov
Greenhorn
Posts: 21
Java Linux


It's OK, because we went ahead of what we have been taught. I wasn't asking anyone to write code for me, just to edit the places where I have bugs or logic inconsistencies.
 
Boris Petrov
Greenhorn
Posts: 21
Java Linux
OK, I think that this code should do exactly what I need, but it doesn't. Some help?


 
Henry Wong
author
Sheriff
Posts: 23568
138
C++ Chrome Eclipse IDE Firefox Browser Java jQuery Linux VI Editor Windows


Perhaps this should have been mentioned earlier (but surprisingly, you seem to be getting some responses regardless) ... but you would likely get better (and more) responses, if you TellTheDetails. Also, expect to do the heavy lifting, as we are NotACodeMill.

Henry
 
Boris Petrov
Greenhorn
Posts: 21
Java Linux


Dear Henry, I think that in my first post I mentioned all the details of the output I am trying to get. I don't think I left anything out. I also wasn't trying to make someone finish my assignment.
I posted code with bugs (that's what newbies like me do) to get some help. After all, I think that is (mostly) what forums are for.
 
Boris Petrov
Greenhorn
Posts: 21
Java Linux


And by some help, I don't mean write my code for me. But if I do something wrong, I expect something like "Oh, you are trying to do this, while you should try doing that".
You know, even in the gym you always get guidance when you do something the wrong way, even if you don't ask for it. (I am not sure how often you visit the gym.)
 
Jeanne Boyarsky
author & internet detective
Marshal
Posts: 38502
653
Eclipse IDE Java VI Editor

Boris Petrov wrote:OK, I think that this code should do exactly what I need, but it doesn't. Some help?


As Henry mentioned, please TellTheDetails. We know you posted the spec at the beginning. However, this last post just posts the current state of affairs and says it doesn't work. How exactly doesn't it work? You've already run it, so you know how it differs from the expected output. Please share.

And yes, we do provide guidance here. Part of that is asking questions that lead the student to the answer. Sometimes answering these questions prompts the answer.

As an example:
I tested it with a file containing "hello" and another containing "world". I was expecting the name to be printed out once, but it was printed out twice.

I do see a problem in your code though. In PathComparator and processList, you use a different definition of whether two files are the same.
 
Boris Petrov
Greenhorn
Posts: 21
Java Linux


I agree with what you said. I will mark this topic solved, because I already fixed it. Just to mention that I wasn't using PathComperator and processList back then.
Thank you all for your help.
 