Inverted Index

 
Greenhorn
Posts: 2
My task is to build an inverted file:
You should build the inverted index that contains the word/posting pairs. In the simplest form, each posting holds a document ID and a term frequency. You can use a hash table for the list. If a word already exists in the index file, you can store a record containing the ID of this document and the term frequency in its posting list. Otherwise you should create a new key and store the corresponding postings in the posting list. The above steps are used to create the index file. Once you have this kind of index, you can provide the search service. You may adopt the simplest tf weighting scheme and assume that all of the key
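
For readers following along, the steps in the assignment text could be sketched roughly as below. This is only a minimal illustration, not the poster's code: the class name InvertedIndexSketch, its methods, the in-memory strings standing in for the text files, and the choice of "\\W+" for splitting are all assumptions. Each term maps to its postings, stored here as document ID -> term frequency.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch; all names here are illustrative, not from the original post.
class InvertedIndexSketch {

    // term -> postings, where postings are (document ID -> term frequency).
    private final Map<String, Map<Integer, Integer>> index = new HashMap<>();

    // Add every term of one document to the index.
    public void addDocument(int docId, String text) {
        for (String token : text.toLowerCase().split("\\W+")) {
            if (token.isEmpty()) {
                continue; // split() can yield a leading empty string
            }
            // New term: create the key. Known term: update its postings.
            index.computeIfAbsent(token, k -> new TreeMap<>())
                 .merge(docId, 1, Integer::sum);
        }
    }

    // Term frequency of a term in one document (0 if absent).
    public int termFrequency(String term, int docId) {
        Map<Integer, Integer> postings = index.get(term.toLowerCase());
        return postings == null ? 0 : postings.getOrDefault(docId, 0);
    }

    // One "ID FREQ TERM" line per posting of the given term.
    public String describe(String term) {
        StringBuilder out = new StringBuilder();
        Map<Integer, Integer> postings = index.get(term.toLowerCase());
        if (postings != null) {
            for (Map.Entry<Integer, Integer> p : postings.entrySet()) {
                out.append(p.getKey()).append(' ')
                   .append(p.getValue()).append(' ')
                   .append(term).append('\n');
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        InvertedIndexSketch idx = new InvertedIndexSketch();
        idx.addDocument(1, "Hello example, hello again");
        idx.addDocument(2, "Another example");
        System.out.println("ID FREQ TERM");
        System.out.print(idx.describe("hello"));
        System.out.print(idx.describe("example"));
    }
}
```

The document ID is simply passed in by the caller, so a program reading several .txt files could number them in the order it opens them.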

This is the code I have so far:




My code reads the terms in each text file and their frequencies. But I also want it to display the document ID, for example:

ID FREQ TERM
Document 1 Hello
Document 1 Example



Any ideas? Thank you.
 
Taylanaa Aksinaa
Greenhorn
Posts: 2
My output is:



But under ID, I want it to code the .txt files and give it an ID.
 
Bartender
Posts: 2236
Can you clarify what you mean by "under ID, I want it to code the .txt files and give it an ID"?
I don't understand this part.
 
Marshal
Posts: 79177
Please look at the indentation in your code (e.g. lines 56‑75); it is really inconsistent and it will make it much harder for you to find your way around your code.
Why are you using a tree map? Do you need the “K”s sorted? If not, declare all references to your map as Map and instantiate what you have as a HashMap. You will notice you are getting the words in alphabetical order, which you wouldn't with a HashMap.
I don't understand the presence of the method starting at line 19. Are you actually using it?
What does the array at the end of the code do?
You have all your methods static, and you are passing parameters from one to the other. But Java® is supposed to be an object-oriented language. You aren't creating any objects to encapsulate your data; if you did, you could have the Map as a field.
You don't appear to be closing the Scanner at the end of the code. Find out how to close it with try‑with‑resources; then you will want all the reading code inside the body of the try. I am not sure the return in line 66 is doing you any good; if you fail to find a particular file, you will terminate the method.
I am not sure I understand the instructions about the ID for the file. Sorry.
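
The points above about keeping the Map as a field, declaring it as Map, and closing the Scanner with try-with-resources could be put together roughly as follows. This is a sketch under assumptions, not a rewrite of the original code: the class name WordCounter and its methods addFile, addSource, and count are made up for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;
import java.util.Scanner;

// Hypothetical sketch; class and method names are illustrative only.
class WordCounter {

    // Declared as Map, instantiated as HashMap. Use a TreeMap instead
    // only if the words are needed in sorted order.
    private final Map<String, Integer> counts = new HashMap<>();

    // Count the words of one file; the Scanner is closed automatically.
    public void addFile(Path file) throws IOException {
        try (Scanner scan = new Scanner(file)) {
            readAll(scan);
        }
    }

    // Same idea for any Readable, e.g. a StringReader.
    public void addSource(Readable source) {
        try (Scanner scan = new Scanner(source)) {
            readAll(scan);
        }
    }

    // All reading happens inside the body of the try blocks above.
    private void readAll(Scanner scan) {
        while (scan.hasNext()) {
            counts.merge(scan.next(), 1, Integer::sum);
        }
    }

    public int count(String word) {
        return counts.getOrDefault(word, 0);
    }

    public static void main(String[] args) {
        WordCounter counter = new WordCounter();
        counter.addSource(new StringReader("cats dogs cats"));
        System.out.println(counter.count("cats")); // prints 2
    }
}
```

Because addFile reports a missing file by throwing IOException, the caller can decide whether to skip that file and carry on with the rest instead of returning early.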
 
Saloon Keeper
Posts: 10705
Your call to scan.next() is sucking in punctuation. Your list ends up with "cats" and "cats,". To eliminate punctuation:
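
The snippet that followed this post is not preserved here. One possible way to do it, offered only as an assumption and not necessarily what was suggested, is to strip punctuation from each token after reading it. The class name TokenCleaner and the method clean are hypothetical.

```java
import java.util.Locale;

// Hypothetical sketch; not the code from the original post.
class TokenCleaner {

    // Remove ASCII punctuation and lower-case the token, so that
    // "cats," and "Cats" both end up as "cats".
    // Note: this also turns "don't" into "dont".
    public static String clean(String token) {
        return token.replaceAll("\\p{Punct}", "").toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        System.out.println(clean("cats,")); // prints cats
        System.out.println(clean("Cats!")); // prints cats
    }
}
```

A token made up only of punctuation becomes an empty string, so the caller would want to skip empty results before adding them to the map.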
 
Campbell Ritchie
Marshal
Posts: 79177

Carey Brown wrote:. . .

What about "\\W+" as a delimiter?
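
A minimal sketch of that suggestion, assuming a Scanner over a String rather than over the original files; the class name DelimiterSketch and the method tokens are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

// Hypothetical sketch of using "\\W+" as the Scanner delimiter.
class DelimiterSketch {

    // Split the text on runs of non-word characters instead of whitespace.
    public static List<String> tokens(String text) {
        List<String> result = new ArrayList<>();
        try (Scanner scan = new Scanner(text).useDelimiter("\\W+")) {
            while (scan.hasNext()) {
                result.add(scan.next());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // prints [cats, dogs, and, cats]
        System.out.println(tokens("cats, dogs and cats."));
    }
}
```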
 
Carey Brown
Saloon Keeper
Posts: 10705

Campbell Ritchie wrote:

Carey Brown wrote:. . .

What about "\\W+" as a delimiter?

Possible, but \w also accepts 0-9 and _. \W is of course the inverse of that.
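
To make the difference concrete, here is a small comparison. It is only a sketch: the class name WordCharSketch, the method splitOn, the sample text, and the letters-only pattern "[^A-Za-z]+" (ASCII letters only) are assumptions chosen for illustration.

```java
import java.util.Arrays;

// Hypothetical sketch comparing two delimiter patterns.
class WordCharSketch {

    // Split the text on the given regex and show the resulting tokens.
    public static String splitOn(String regex, String text) {
        return Arrays.toString(text.split(regex));
    }

    public static void main(String[] args) {
        String text = "file_1 has 2 cats";
        // \W+ keeps digits and underscores inside tokens.
        System.out.println(splitOn("\\W+", text));        // prints [file_1, has, 2, cats]
        // A letters-only pattern treats digits and _ as separators too.
        System.out.println(splitOn("[^A-Za-z]+", text));  // prints [file, has, cats]
    }
}
```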