best performance using minimum memory

 
Ranch Hand
Posts: 798
Here is an interview question I was once asked.

We have a HUGE plain-text dictionary file, like
Life: ddddd
Live: eeeeee

Now we need to build a dictionary application: search for the word a user inputs and fetch its meaning back to the user. Assumption: the file is HUGE (40 GB), so we can't load all the data into memory.

The question is: how do we achieve the best performance with the minimum memory footprint? Which algorithm and which Collection class would you use?

Any ideas?

Thanks.
 
lowercase baba
Posts: 13089
67
Chrome Java Linux
a lot of times, they don't want a specific answer; they just want to see how you approach the problem. are you aware of the tradeoffs you're making? are you making bogus assumptions? can you defend whatever answer you give?

often these are more important than the exact algorithm you give.
 
Edward Chen
Ranch Hand
Posts: 798
Here is my thinking. Any comments are welcome.

Let me assume:
1. the dictionary file is sorted. (of course)
2. each word is on its own line.

Here we really only have one choice: RandomAccessFile. So we build an index for that dictionary:
Dic['A'] = "100-200"; // words beginning with "A" are located from line 100 to line 200.
Dic['A']['A'] = "100-150"; // words beginning with "AA" are located from line 100 to line 150.
So we have a 26x26 array. The first time we need to scan the huge file to build it, but after that we can serialize the array to a file and just reload it.

When a user searches for a word, we take its first two letters, look up the range in the array, then use RandomAccessFile to jump there and readLine() until we match.
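That two-letter index could look something like this. A minimal sketch, with two assumptions of my own: entries are "word&lt;TAB&gt;meaning" lines, and the index stores byte offsets rather than line numbers, since RandomAccessFile.seek() works on byte offsets and that avoids counting lines on every lookup:

```java
import java.io.*;
import java.util.*;

// Sketch of the two-letter prefix index described above. Maps each
// two-letter prefix to the byte-offset range of its entries, so a lookup
// can seek() straight to the right slice of the huge file.
// Assumes a sorted file, one "word<TAB>meaning" entry per line.
public class PrefixIndex {
    // prefix -> {startOffset, endOffset (exclusive)}
    private final Map<String, long[]> index = new HashMap<>();

    public void build(File dict) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(dict, "r")) {
            long offset = 0;
            String line;
            while ((line = raf.readLine()) != null) {
                long next = raf.getFilePointer();   // start of the next line
                if (line.length() >= 2) {
                    String prefix = line.substring(0, 2).toLowerCase();
                    long[] range = index.get(prefix);
                    if (range == null) {
                        index.put(prefix, new long[]{offset, next});
                    } else {
                        range[1] = next;            // extend the range (file is sorted)
                    }
                }
                offset = next;
            }
        }
    }

    // Scan only the slice of the file covered by the word's two-letter prefix.
    public String lookup(File dict, String word) throws IOException {
        long[] range = index.get(word.substring(0, 2).toLowerCase());
        if (range == null) return null;
        try (RandomAccessFile raf = new RandomAccessFile(dict, "r")) {
            raf.seek(range[0]);
            String line;
            while (raf.getFilePointer() < range[1]
                    && (line = raf.readLine()) != null) {
                if (line.startsWith(word + "\t")) {
                    return line.substring(word.length() + 1);
                }
            }
        }
        return null;
    }
}
```

After the first scan, the Map itself is small enough to serialize and reload, so later runs never pay the full-file scan again.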

///////////////////////////////////////
One problem is left: how do we scan the file in the first place?

Let me assume we know the total number of lines in the file.
We use multithreading: cut the whole file into n parts of 100 lines each (n*100 = total_lines), then create n threads, each with its own RandomAccessFile.

Then each thread uses a binary-search-like algorithm. For example, to find where the "L" words begin:
1. the thread jumps to line 200 (2*100); the word there is "Life".
2. "Life" may not be the first "L" word, so we go back 100 lines, to line 100.
3. the word at line 100 is "Knife", so we go forward 100/2 = 50 lines.
4. repeat, repeat...
5. finally, we get the line number where the "L" words begin.
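That binary-search idea can also be run over byte offsets instead of line numbers, so we never need the total line count, only the file length: seek to the middle, skip the partial line we landed in, compare the next full word, and halve the range. A minimal sketch, assuming a file sorted by String.compareTo with one "word&lt;TAB&gt;meaning" entry per line (the tab-separated format is my assumption, not from the original question):

```java
import java.io.*;

// Binary search over a sorted dictionary file by byte offset.
// Invariants: lo always sits at a line start, and the target line (if it
// exists) starts somewhere in [lo, hi].
public class DictBinarySearch {

    public static String search(File dict, String word) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(dict, "r")) {
            long lo = 0, hi = raf.length();
            while (lo < hi) {
                long mid = (lo + hi) / 2;
                raf.seek(mid);
                if (mid > 0) raf.readLine();   // skip the partial line we landed in
                String line = raf.readLine();  // first full line after mid
                if (line == null) {            // ran off the end of the file
                    hi = mid;
                    continue;
                }
                String[] parts = line.split("\t", 2);
                int cmp = parts[0].compareTo(word);
                if (cmp == 0) return parts.length > 1 ? parts[1] : "";
                if (cmp < 0) lo = raf.getFilePointer(); // target lies after this line
                else hi = mid;                          // target lies at or before mid
            }
            // the match, if any, can only be the line starting exactly at lo
            raf.seek(lo);
            String line = raf.readLine();
            if (line != null) {
                String[] parts = line.split("\t", 2);
                if (parts[0].equals(word)) return parts.length > 1 ? parts[1] : "";
            }
        }
        return null;
    }
}
```

Each lookup touches only O(log n) lines with a few KB of memory, so this may even make the 26x26 index unnecessary, though the index is still a reasonable first cut.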

Just a thought. What do you think? Thanks.
 