
How To Make Search Engine

 
Isaac Hewitt
Ranch Hand
Posts: 191
Netbeans IDE
I want to make a search engine for song titles in the form of MP3 files. The user should be able to type a song into a text field without necessarily spelling it correctly or using the exact words of the song title.
Can anyone recommend what Java classes I should consider using? I was thinking along the lines of Java Regex to weed out common words to begin with, and then the Scanner class to compare each word individually, but there may be other newer classes I am not aware of.
Thanks.
 
William Brogden
Author and all-around good cowpoke
Rancher
Posts: 13078
The first major design decision would be related to how the data is stored, not details of regex.

So, where do these titles come from?

Bill
 
Isaac Hewitt
Ranch Hand
Posts: 191
Netbeans IDE
The data is stored, for now at least, simply as MP3 files, but I could put the file names into a database and work from there. I am not familiar with how to set up a database or what kind to use. It would be nice to be able to set up a database programmatically by scanning all the MP3 files, of which there are more than 2000 on the hard drive.
 
David Newton
Author
Rancher
Posts: 12617
IntelliJ IDE Ruby
If you want to deal with spelling errors, you'll also need one or more of the several algorithms used to measure word similarity; a web search will turn up many. This is harder if you're dealing with a relational DB, unless it has them built in or your query can include "similar" spellings (though that seems a limiting approach).
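
As one illustration of a word-similarity algorithm, here is a minimal sketch of Levenshtein edit distance, which counts the single-character insertions, deletions, and substitutions needed to turn one string into another. It is only one of the many options a web search turns up, not necessarily the best fit here. The class name `EditDistance`, the method names, and the `maxEdits` threshold are all made up for this example.

```java
import java.util.Locale;

/** Hypothetical helper: Levenshtein edit distance between two strings. */
final class EditDistance {

    private EditDistance() {}

    /** Minimum number of single-character inserts, deletes, or substitutions turning a into b. */
    static int levenshtein(String a, String b) {
        // prev holds distances for the previous row of the DP table, curr for the current row.
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // turning "" into the first j chars of b takes j inserts
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // turning the first i chars of a into "" takes i deletes
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                int insert = curr[j - 1] + 1;
                int delete = prev[j] + 1;
                int substitute = prev[j - 1] + cost;
                curr[j] = Math.min(Math.min(insert, delete), substitute);
            }
            int[] tmp = prev; // swap rows for the next iteration
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /** Case-insensitive check: is typed within maxEdits edits of known? (maxEdits is an assumed tuning knob.) */
    static boolean isSimilar(String typed, String known, int maxEdits) {
        return levenshtein(typed.toLowerCase(Locale.ROOT), known.toLowerCase(Locale.ROOT)) <= maxEdits;
    }
}
```

A search could call something like `EditDistance.isSimilar(userWord, titleWord, 2)` for each word in each title; with roughly 2000 titles a brute-force pass of that kind is probably fast enough, though that is a guess rather than a measurement.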
 
William Brogden
Author and all-around good cowpoke
Rancher
Posts: 13078
If this were my problem, I would first try to capture the title and other data from the MP3 files.

See the java.io.File class for methods to extract file names from directories.
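
A rough sketch of that first step, using only `java.io.File`: walk a directory tree and collect the names of the `.mp3` files with the extension stripped. The class name `Mp3Lister` and its methods are invented for this example, and it assumes the file name is a usable stand-in for the song title.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical helper: collects MP3 file names (without extension) under a directory. */
final class Mp3Lister {

    private Mp3Lister() {}

    /** Returns the base names of all .mp3 files under dir, searching subdirectories too. */
    static List<String> listMp3Titles(File dir) {
        List<String> titles = new ArrayList<>();
        collect(dir, titles);
        return titles;
    }

    private static void collect(File dir, List<String> titles) {
        // listFiles() returns null if dir is not a directory or cannot be read.
        File[] entries = dir.listFiles();
        if (entries == null) {
            return;
        }
        for (File entry : entries) {
            String name = entry.getName();
            if (entry.isDirectory()) {
                collect(entry, titles); // recurse into subfolders
            } else if (name.toLowerCase(Locale.ROOT).endsWith(".mp3")) {
                // Drop the 4-character ".mp3" suffix; assumes the rest is the title.
                titles.add(name.substring(0, name.length() - 4));
            }
        }
    }
}
```

Calling `Mp3Lister.listMp3Titles(new File(somePath))` would give a plain list of strings that could then be written to a text file or a database table.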

From this article it looks like extracting additional text data from MP3 files - such as artist, album title, etc. is possible but tricky.

Bill
 
David Newton
Author
Rancher
Posts: 12617
IntelliJ IDE Ruby
There are several Java MP3/ID3 libs available; the ones I know about are:

http://www.jthink.net/jaudiotagger/
http://entagged.sourceforge.net/

They make common MP3 tag manipulation trivial.
 
Leandro Coutinho
Ranch Hand
Posts: 423
Maybe Hibernate Search can help, but the Hibernate site is down right now :/
 
Isaac Hewitt
Ranch Hand
Posts: 191
Netbeans IDE
Thank you to Leandro Coutinho, David Newton, and William Brogden for your thoughtful suggestions. I will follow through on those.
 
William Brogden
Author and all-around good cowpoke
Rancher
Posts: 13078
Once you have a complete file of the text you want to search, you can create a dictionary of all the existing words. Your problem then becomes ensuring that searches use only words that exist in that dictionary.

Regex could help with partial word matches, and a metaphone "sounds like" match will help guide the user to the correct spelling.

If searches can only use existing words, you can pre-process an inverted index leading to almost instantaneous searches.
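
To make the inverted index idea concrete, here is a small sketch: a map from each lower-cased word to the set of titles containing it, built once up front, so a search becomes a few map lookups and a set intersection. The class name `TitleIndex`, its methods, and the tokenizing rule (split on anything that is not a letter or digit) are assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical inverted index: word -> titles containing that word. */
final class TitleIndex {

    private final Map<String, Set<String>> index = new HashMap<>();

    /** Adds one title, registering it under each of its words. */
    void add(String title) {
        for (String word : tokenize(title)) {
            index.computeIfAbsent(word, k -> new TreeSet<>()).add(title);
        }
    }

    /** Returns the titles that contain every word of the query (empty if any word is unknown). */
    Set<String> search(String query) {
        Set<String> result = null;
        for (String word : tokenize(query)) {
            Set<String> hits = index.getOrDefault(word, Collections.<String>emptySet());
            if (result == null) {
                result = new TreeSet<>(hits); // copy so the index itself is never modified
            } else {
                result.retainAll(hits); // intersect with titles for the next word
            }
        }
        return result == null ? Collections.<String>emptySet() : result;
    }

    /** The dictionary of existing words, usable for spelling suggestions. */
    Set<String> vocabulary() {
        return Collections.unmodifiableSet(index.keySet());
    }

    /** Lower-cases the text and splits it on runs of non-letter, non-digit characters. */
    private static List<String> tokenize(String text) {
        List<String> words = new ArrayList<>();
        for (String w : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!w.isEmpty()) {
                words.add(w);
            }
        }
        return words;
    }
}
```

The `vocabulary()` set is the dictionary of existing words: a misspelled query word could be mapped to its closest entry there (by metaphone or some other similarity measure) before `search` is called, so that only known words ever reach the index.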

Bill
 