
Find the proper words in a txt file with a large dataset (100 billion words)

 
Alex Ardoin
Ranch Hand
Posts: 59
Hi,
I had an interview yesterday and I was given an exercise to solve.

The exercise was to implement a program that finds all anagrams in a file.txt document and prints them to the console.

sample.txt


aaab
heeh
aaba
abaa
fjwe
xyz
jpore
fioeoiw
zyx


output:


xyz zyx
aaab aaba abaa
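
For reference, a minimal sketch of part a), assuming one word per line and that a word counts as an anagram only when at least one other word in the file uses exactly the same letters. The function name `find_anagram_groups` is hypothetical, and this is just one common way to do it (sort the letters of each word and use the result as a grouping key), not necessarily what the interviewer expected.

```python
from collections import defaultdict


def find_anagram_groups(words):
    """Group words that are made of exactly the same letters."""
    groups = defaultdict(list)
    for word in words:
        word = word.strip()
        if word:
            # Sorting the letters gives a canonical key shared by all anagrams,
            # e.g. "aaba" and "abaa" both map to "aaab".
            groups["".join(sorted(word))].append(word)
    # Keep only the keys that were seen for at least two words.
    return [members for members in groups.values() if len(members) > 1]


if __name__ == "__main__":
    sample = ["aaab", "heeh", "aaba", "abaa", "fjwe", "xyz", "jpore", "fioeoiw", "zyx"]
    for group in find_anagram_groups(sample):
        print(" ".join(group))
```

On the sample above this prints the same two groups, though in order of first appearance (aaab aaba abaa first, then xyz zyx), so the order differs from the expected output shown.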





b) Another question in the interview was: how would your program deal with a sample.txt file containing 20 million or 100 billion words? And how would you scale your program to cover this case?
I think that if the file held that much data (billions of lines), I could not hold all of it in RAM, which makes this a very difficult task in itself.

To cover this case, I would have to use a database such as MySQL to store the sorted words and then query it.
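
A rough sketch of that database idea, with several assumptions: it uses SQLite from the Python standard library as a stand-in for MySQL, it reads "sorted words" as "each word stored next to its letters in sorted order", and the table name `words`, the column `sig`, and the function name are all made up for illustration. The default `":memory:"` path is only there to keep the example self-contained; for a large input, `db_path` would point to a fresh file on disk. Whether a single database would cope with 100 billion rows is a separate question that this sketch does not answer.

```python
import sqlite3
from itertools import groupby


def anagram_groups_via_db(lines, db_path=":memory:"):
    """Yield anagram groups, letting the database do the sorting and storage."""
    conn = sqlite3.connect(db_path)
    try:
        conn.execute("CREATE TABLE words (sig TEXT NOT NULL, word TEXT NOT NULL)")
        # Generators: rows are produced one at a time, no full list is built here.
        words = (line.strip() for line in lines)
        rows = (("".join(sorted(w)), w) for w in words if w)
        with conn:  # commits the inserts as one transaction
            conn.executemany("INSERT INTO words (sig, word) VALUES (?, ?)", rows)
        conn.execute("CREATE INDEX idx_words_sig ON words (sig)")
        # Rows come back ordered by signature, so anagrams are adjacent.
        cursor = conn.execute("SELECT sig, word FROM words ORDER BY sig, word")
        for _, rows_for_sig in groupby(cursor, key=lambda row: row[0]):
            group = [word for _, word in rows_for_sig]
            if len(group) > 1:
                yield group
    finally:
        conn.close()
```

Because `lines` can be an open file object, something like `anagram_groups_via_db(open("sample.txt"), "words.db")` would stream the file line by line instead of loading it whole.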

I just want to hear your opinion on part b).

I appreciate any help.

Regards
TheOcean

 
Randall Twede
Ranch Hand
Posts: 4696
8
Java Scala
I'm no expert on this, but I think you are on the right track. You can't store all of that in RAM (the heap).
 
Randall Twede
Ranch Hand
Posts: 4696
8
Java Scala
How much are they offering you? I didn't read your code, but if it solved the original problem, I would hire you. Any decent shop should be able to teach you how to scale.
 
Randall Twede
Ranch Hand
Posts: 4696
8
Java Scala
It's like asking for a distributed solution. They should teach me that, IMO.
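
For what it's worth, the usual first step toward a distributed version can be sketched on one machine: route every word to a partition chosen by hashing its sorted-letter key, so that all anagrams of a word land in the same partition and each partition can then be grouped independently (by another process or another machine). Everything here is an assumption for illustration: the function names, the `part-N.txt` file naming, and the choice of 4 partitions. It also assumes each single partition is small enough to group in memory.

```python
import os
import tempfile
import zlib
from collections import defaultdict


def signature(word):
    """Canonical key: the word's letters in sorted order."""
    return "".join(sorted(word))


def partition_words(lines, out_dir, num_partitions=4):
    """Stream words into partition files chosen by a hash of their signature."""
    paths = [os.path.join(out_dir, f"part-{i}.txt") for i in range(num_partitions)]
    files = [open(path, "w", encoding="utf-8") for path in paths]
    try:
        for line in lines:
            word = line.strip()
            if word:
                # crc32 is stable across runs, unlike Python's salted str hash().
                index = zlib.crc32(signature(word).encode("utf-8")) % num_partitions
                files[index].write(word + "\n")
    finally:
        for f in files:
            f.close()
    return paths


def groups_in_partition(path):
    """Group one partition in memory; partitions never share a signature."""
    groups = defaultdict(list)
    with open(path, encoding="utf-8") as f:
        for line in f:
            word = line.strip()
            groups[signature(word)].append(word)
    return [members for members in groups.values() if len(members) > 1]


if __name__ == "__main__":
    sample = ["aaab", "heeh", "aaba", "abaa", "fjwe", "xyz", "jpore", "fioeoiw", "zyx"]
    with tempfile.TemporaryDirectory() as tmp:
        for path in partition_words(sample, tmp):
            for group in groups_in_partition(path):
                print(" ".join(group))
```

The order in which groups are printed depends on which partition each key hashes to, so it is not meaningful.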
 
Randall Twede
Ranch Hand
Posts: 4696
8
Java Scala
Kind of reminds me of job listings that want 40 yrs experience in a dozen different languages.
 