How To Make Search Engine

I want to make a search engine for song titles, where the songs are stored as MP3 files. The user types a song title into a text field, but the title may be misspelt and need not match the exact words of the real title.
Can anyone recommend what Java classes I should consider using? I was thinking along the lines of Java Regex to weed out common words to begin with, and then the Scanner class to compare each word individually, but there may be other newer classes I am not aware of.
Thanks.
The first major design decision is how the data is stored, not the details of the regex.

So, where do these titles come from?

Bill
The data is stored, for now at least, simply as MP3 files, but I could put the file names into a database and work from there. I am not familiar with how to set up a database or which kind to use. It would be nice to set one up programmatically by scanning all the MP3 files, of which there are more than 2000 on the hard drive.
If you want to deal with spelling errors, you'll also need one or more of the several algorithms used to determine word similarity; a web search will turn up many. This is harder with a relational DB unless it has such functions built in, or your query can include "similar" spellings (though that seems a limiting approach).
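As one example of a word-similarity algorithm, here is a minimal sketch of Levenshtein edit distance in plain Java. It is only one of the many algorithms a search will turn up, and the class name `TitleSimilarity`, the method names, and the distance threshold are all made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

/** Hypothetical helper for fuzzy title matching; not part of any library. */
class TitleSimilarity {

    /** Minimum number of single-character inserts, deletes, or substitutions turning a into b. */
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from the empty prefix of a
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // distance to the empty prefix of b
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /** Returns the titles whose distance to the query is at most maxDistance, ignoring case. */
    static List<String> closeMatches(String query, List<String> titles, int maxDistance) {
        String q = query.toLowerCase(Locale.ROOT);
        List<String> matches = new ArrayList<>();
        for (String title : titles) {
            if (levenshtein(q, title.toLowerCase(Locale.ROOT)) <= maxDistance) {
                matches.add(title);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        List<String> titles = Arrays.asList("Yesterday", "Yellow Submarine", "Hey Jude");
        System.out.println(closeMatches("Yesterdya", titles, 2)); // prints [Yesterday]
    }
}
```

Comparing the query against every title is linear in the number of titles, which should be fine for a couple of thousand files; a sensible threshold depends on title length and would need tuning.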
If this were my problem, I would first try to capture the title and other data from the MP3 files.

See the java.io.File class for methods to extract file names from directories.
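A rough sketch of that first step, using only `java.io.File`: walk a directory tree and collect the MP3 file names with the extension stripped. The class name `Mp3Lister` and its methods are made up for illustration, and it assumes the file name is a usable stand-in for the song title.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical helper that gathers MP3 file names from a directory tree. */
class Mp3Lister {

    /** Returns the names (without ".mp3") of all MP3 files under dir, including subdirectories. */
    static List<String> listMp3Titles(File dir) {
        List<String> titles = new ArrayList<>();
        collect(dir, titles);
        return titles;
    }

    private static void collect(File dir, List<String> out) {
        File[] entries = dir.listFiles(); // null if dir is not a directory or cannot be read
        if (entries == null) {
            return;
        }
        for (File entry : entries) {
            String name = entry.getName();
            if (entry.isDirectory()) {
                collect(entry, out);
            } else if (name.toLowerCase(Locale.ROOT).endsWith(".mp3")) {
                out.add(name.substring(0, name.length() - 4)); // drop the 4-character extension
            }
        }
    }

    public static void main(String[] args) {
        // Assumption: the music folder is passed as the first argument; defaults to the current directory.
        File root = new File(args.length > 0 ? args[0] : ".");
        for (String title : listMp3Titles(root)) {
            System.out.println(title);
        }
    }
}
```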

From this article it looks like extracting additional text data from MP3 files, such as artist and album title, is possible but tricky.

Bill
There are several Java MP3/ID3 libs available; the ones I know about are:

http://www.jthink.net/jaudiotagger/
http://entagged.sourceforge.net/

They make common MP3 tag manipulation trivial.
Maybe Hibernate Search can help, but the Hibernate site is down right now :/
Thank you to Leandro Coutinho, David Newton, and William Brogden for your thoughtful suggestions. I will follow through on those.
Once you have a complete file of the text you want to search, you can create a dictionary of all the words that occur in it. Your problem then becomes ensuring that a search only uses words from that dictionary.

Regex could help with partial word matches, and a metaphone "sounds like" match will help guide the user to the correct spelling.

If searches can only use existing words, you can pre-build an inverted index, leading to almost instantaneous searches.
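To make the dictionary and inverted index idea concrete, here is a minimal sketch. The class `TitleIndex` and its methods are made up for illustration, the tokenizer assumes plain ASCII titles, and a multi-word query is treated as "all words must appear", which is just one possible choice.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical inverted index: maps each word to the set of titles containing it. */
class TitleIndex {
    private final Map<String, Set<String>> index = new HashMap<>();

    /** Splits text into lower-case words; anything other than a-z and 0-9 is a separator. */
    static List<String> tokenize(String text) {
        List<String> words = new ArrayList<>();
        for (String word : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!word.isEmpty()) {
                words.add(word);
            }
        }
        return words;
    }

    /** Indexes one title under each of its words. */
    void add(String title) {
        for (String word : tokenize(title)) {
            index.computeIfAbsent(word, k -> new TreeSet<>()).add(title);
        }
    }

    /** The dictionary of all words that occur in at least one indexed title. */
    Set<String> dictionary() {
        return new TreeSet<>(index.keySet());
    }

    /** Returns the titles containing every word of the query; empty if any word is unknown. */
    Set<String> search(String query) {
        Set<String> result = null;
        for (String word : tokenize(query)) {
            Set<String> hits = index.getOrDefault(word, Collections.<String>emptySet());
            if (result == null) {
                result = new TreeSet<>(hits);
            } else {
                result.retainAll(hits);
            }
        }
        return result == null ? new TreeSet<String>() : result;
    }

    public static void main(String[] args) {
        TitleIndex idx = new TitleIndex();
        idx.add("Hey Jude");
        idx.add("Hey Joe");
        idx.add("Yesterday");
        System.out.println(idx.search("hey"));      // prints [Hey Joe, Hey Jude]
        System.out.println(idx.search("hey jude")); // prints [Hey Jude]
    }
}
```

A misspelt query word that is not in `dictionary()` could first be mapped to its closest dictionary word (by edit distance or a "sounds like" code) and then looked up here, so each lookup stays a cheap map access.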

Bill
This thread has been viewed 1940 times.
