Best way of checking if duplicate file exists?

 
Greenhorn
Posts: 3
Hi guys,
I just had an interview question as follows:

Suppose you run a music storage service where users can upload their favorite songs, for example all of Bruno Mars's songs.
Other users will of course want to upload their own Bruno Mars songs, but storing them again is not space efficient for you as the storage provider.
You don't want multiple copies of the same song in your storage.

Given that there can be millions of songs, and that several different songs may share the same name (like "I love you"), what is the best way to check whether a song already exists in your storage?

I offered to build a tree of bytes, to some extent, and to compare the beginnings of the files for overlap... I know it is not ideal, though...

Thanks, Tal
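
The "verify the beginning of the file" idea can be sketched as a cheap pre-filter: files with different sizes or different leading bytes cannot be identical, so only candidates that pass need a full comparison. This is a minimal sketch, assuming "same" means byte-for-byte identical. The class name, method names, and the 4096-byte prefix length are illustrative and not from the interview question; `InputStream.readNBytes(int)` needs Java 11 and `Files.mismatch` needs Java 12 or later.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

// Hypothetical helper: compares two files cheaply first, fully second.
class PrefixCheck {
    // Assumed prefix length; tune it for the storage layer in use.
    static final int PREFIX_BYTES = 4096;

    // Reads up to PREFIX_BYTES from the start of the file.
    static byte[] prefix(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            return in.readNBytes(PREFIX_BYTES);
        }
    }

    // Cheap test: a different size or a different prefix rules out identical content.
    static boolean mayBeSame(Path a, Path b) throws IOException {
        return Files.size(a) == Files.size(b)
                && Arrays.equals(prefix(a), prefix(b));
    }

    // Full test: Files.mismatch returns -1 when the contents are identical.
    static boolean sameContent(Path a, Path b) throws IOException {
        return mayBeSame(a, b) && Files.mismatch(a, b) == -1L;
    }
}
```

Comparing a new upload against every stored file this way would still be slow with millions of files, so the pre-filter only helps once the candidates have been narrowed down some other way.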
 
Marshal
Posts: 79153
377
Don't give out a straight answer for such an interview question; start by asking things like how the songs are going to be stored, and whether there are any other data, including name of performer.
 
Sheriff
Posts: 22781
131
Eclipse IDE Spring VI Editor Chrome Java Windows
The first question would be "when does a file already exist?". Is it when the file name matches, the content, or something else? Depending on the answer there are different solutions.
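
If the answer turns out to be "the content", one common approach (not proposed anywhere in this thread) is to index every stored file by a cryptographic digest of its bytes and look up the digest of each new upload. The sketch below assumes SHA-256 and uses an in-memory `HashMap` as a stand-in for whatever database index a real service would use; `ContentIndex`, `register`, and the location strings are hypothetical names.

```java
import java.io.IOException;
import java.io.InputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical index: hex digest of the content -> location of the first stored copy.
class ContentIndex {
    private final Map<String, String> locationByDigest = new HashMap<>();

    // Streams the upload through SHA-256 so large files never sit fully in memory.
    static String sha256Hex(InputStream in) throws IOException {
        MessageDigest md;
        try {
            md = MessageDigest.getInstance("SHA-256");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // every Java platform must provide SHA-256
        }
        byte[] buf = new byte[8192];
        int n;
        while ((n = in.read(buf)) != -1) {
            md.update(buf, 0, n);
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : md.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    // Returns the location of an existing identical upload, or null if the content is new.
    String register(InputStream upload, String newLocation) throws IOException {
        return locationByDigest.putIfAbsent(sha256Hex(upload), newLocation);
    }
}
```

This only catches byte-for-byte duplicates. Two encodings of the same song, or the same file with different tags, produce different digests, which is exactly why the definition of "already exists" has to be settled first.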
 
Ranch Hand
Posts: 574
VI Editor Chrome Linux
While I agree with the Marshal, I have to wonder how well a map would work. The key is the artist/song title, the value is the music file. You'll need a bit of hand waving to ensure Fred's "I'm in love" doesn't collide with Fred's "Im in Love" live in Prague, but the ultimate goal is to stuff it into a map.
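
The "hand waving" could look something like the sketch below: normalize the artist and title so spelling variants map to the same key, and carry a separate version qualifier so a live recording stays distinct from the studio one. The `SongKey` class, the `version` field, and the normalization rules are all assumptions for illustration; a real service would need rules agreed with whoever defines "same song".

```java
import java.util.Locale;

// Hypothetical key builder for an artist/title map.
class SongKey {
    // Lower-cases, drops punctuation such as apostrophes, and collapses whitespace.
    static String normalize(String s) {
        return s.toLowerCase(Locale.ROOT)
                .replaceAll("[^\\p{L}\\p{N}\\s]", "")
                .trim()
                .replaceAll("\\s+", " ");
    }

    // The version qualifier (e.g. "live in Prague") keeps different recordings apart;
    // pass an empty string for the default studio recording.
    static String of(String artist, String title, String version) {
        return normalize(artist) + "|" + normalize(title) + "|" + normalize(version);
    }
}
```

With these rules, `SongKey.of("Fred", "I'm in love", "")` and `SongKey.of("fred", "Im in Love", "")` produce the same key, while `SongKey.of("Fred", "Im in Love", "live in Prague")` produces a different one.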
 
Ranch Hand
Posts: 91
Netbeans IDE Java
If you can (directly) get or re-build the ISRC (International Standard Recording Code), you could use it to uniquely identify each file/song...

http://isrc.ifpi.org/en/
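
A minimal sketch of using the ISRC as a lookup key, assuming the code is already available (for example from the file's metadata; how to extract it is outside this sketch). It assumes the usual 12-character layout of a 2-letter country code, a 3-character alphanumeric registrant code, a 2-digit year, and a 5-digit designation code, optionally written with hyphens. The `Isrc` class and `normalize` method are hypothetical names.

```java
import java.util.Locale;
import java.util.Optional;
import java.util.regex.Pattern;

// Hypothetical helper: turns a raw ISRC string into a canonical 12-character key.
class Isrc {
    // Assumed layout: CC + XXX + YY + NNNNN, without separators.
    private static final Pattern COMPACT =
            Pattern.compile("[A-Z]{2}[A-Z0-9]{3}[0-9]{2}[0-9]{5}");

    // Strips hyphens and whitespace, upper-cases, and validates the result.
    static Optional<String> normalize(String raw) {
        if (raw == null) {
            return Optional.empty();
        }
        String compact = raw.replaceAll("[\\s-]", "").toUpperCase(Locale.ROOT);
        return COMPACT.matcher(compact).matches()
                ? Optional.of(compact)
                : Optional.empty();
    }
}
```

Note that an ISRC identifies a recording rather than a file, so two differently encoded files of the same recording would share one key. Whether that counts as a duplicate is again a question for the interviewer.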
 
Campbell Ritchie
Marshal
Posts: 79153
377

Jim Venolia wrote:. . . I have to wonder how well a map would work. . . . .

Maybe a Map would work, but at the early stages of answering such a question in an interview, saying, “I'd use a Map,” will, at best, elicit a response like, “Why?” and at worst, “Don't call us; we'll call you.”
 
Tal Tab
Greenhorn
Posts: 3
Guys, thank you for your replies, some serious, some derisive. However, I'd really like to learn what would be an acceptable solution...
BTW, the music files were just an example. It should work for any file type.
Thanks.
 
Jim Venolia
Ranch Hand
Posts: 574
VI Editor Chrome Linux
  • Likes 1
As the Marshal says, questions like this are intended to get you to ask questions to find out exactly what they want.

That said, I don't see why having a map as your underlying data structure wouldn't work. You'd want to fiddle with the input a bit to get it into a well-defined form, then see if the key already exists, and if so figure out whether you want to replace the existing object or keep the old one. Do you want to tell the user there's already an object by that name, or silently Do Something? How many objects do you expect to store? Will a map work on a dataset that size? Maybe using a database is the way to go. Maybe you want to store it all on The Pirate Bay and let them pay for hosting.
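
The keep-or-replace decision described above can be made explicit in the store itself. This is a minimal sketch, assuming the key has already been put into a well-defined form by the caller; `DedupStore`, `OnDuplicate`, and the in-memory `HashMap` are illustrative choices, and a dataset of millions of objects would more likely sit behind a database.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical store: a map plus an explicit policy for keys that already exist.
class DedupStore {
    enum OnDuplicate { KEEP_EXISTING, REPLACE }

    private final Map<String, byte[]> objects = new HashMap<>();
    private final OnDuplicate policy;

    DedupStore(OnDuplicate policy) {
        this.policy = policy;
    }

    // Returns true if the key was already present, so the caller can decide
    // whether to tell the user or stay silent.
    boolean put(String normalizedKey, byte[] content) {
        boolean duplicate = objects.containsKey(normalizedKey);
        if (!duplicate || policy == OnDuplicate.REPLACE) {
            objects.put(normalizedKey, content);
        }
        return duplicate;
    }

    byte[] get(String normalizedKey) {
        return objects.get(normalizedKey);
    }
}
```

Returning the duplicate flag instead of throwing keeps the "tell the user or silently do something" choice with the caller.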

It doesn't really matter what the answer is; the more intelligent questions you ask, the better you look.
 
lowercase baba
Posts: 13089
67
Chrome Java Linux
  • Likes 1

Tal Tab wrote:I'd really like to learn what would be an acceptable solution...
BTW, the music files were just an example. It should work for any file types.


Much of the time, they're not asking you this question to see what solution you have. It's to see how you approach problems. One of the worst things you can do as a coder is immediately start writing code, or decide on an implementation strategy, before you understand the problem. The best thing to do is push back and find out what the parameters are. Look for edge cases. Think outside the box.

You can't have an acceptable solution until you know as many details as possible. So again, what makes two files the "same"? If one file has an additional second of silence on the end, but is otherwise identical, is that the same? What if it's byte-for-byte identical, but one came from a studio album and another came from a greatest hits compilation? What if one file name is in all caps, but another has mixed case?

Unless you have the answers to these questions (and many others), you really can't come up with a solution.
 
Campbell Ritchie
Marshal
Posts: 79153
377
  • Likes 1

Tal Tab wrote:Guys thank you

That's a pleasure

. . . some serious some derisive replies. . . . .

Nobody was being derisive; I did however point out that interview questions never have a straight answer.
 