Hadoop in enterprise

Vicky Pandya
Ranch Hand

Joined: Dec 16, 2004
Posts: 148
Hello Chuck,

What would be the use case for Hadoop in enterprise search?
David Newton
Author
Rancher

Joined: Sep 29, 2008
Posts: 12617

If you have a whole bunch of data to search.
Vicky Pandya
Ranch Hand

Joined: Dec 16, 2004
Posts: 148
I do have data to search, and I use Solr for searching. I'm trying to understand the real usage of Hadoop. If I understand correctly, Hadoop isn't a replacement for Solr/Lucene, right?
andrew ennamorato
Ranch Hand

Joined: Oct 03, 2007
Posts: 100
I know of one startup that is looking at migrating to HBase (the Hadoop counterpart to Google's Bigtable) instead of Oracle. So I'm sure the average enterprise has plenty of database instances where it might be useful to ride on top of the Hadoop infrastructure for more than just search.
David Newton
Author
Rancher

Joined: Sep 29, 2008
Posts: 12617

No, it's not a replacement for a search engine--it's not a search engine.

You probably are thinking of something like this:

http://katta.sourceforge.net/

You'd probably need an awful lot of data for this to be helpful.
Tibi Kiss
Ranch Hand

Joined: Jun 11, 2009
Posts: 47
Search engines are about retrieval. Hadoop, with its MapReduce framework, is about data processing.
Every search engine has a data processing requirement up to the point where the data is indexed.
Really big search engines need really big data processing frameworks. Hadoop is such a framework.
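
To make the indexing step concrete, here is a minimal sketch of how a map phase and a reduce phase can build an inverted index. It is plain Python that only imitates the MapReduce flow on one machine; it does not use the Hadoop API, and the function names (`map_phase`, `shuffle`, `reduce_phase`, `build_inverted_index`) and the sample documents are made up for illustration.

```python
from collections import defaultdict

def map_phase(doc_id, text):
    """Emit (term, doc_id) pairs for one document, as a mapper would."""
    for term in text.lower().split():
        yield term, doc_id

def shuffle(pairs):
    """Group emitted values by key, imitating the framework's shuffle step."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(term, doc_ids):
    """Collapse all doc ids seen for a term into a sorted posting list."""
    return term, sorted(set(doc_ids))

def build_inverted_index(docs):
    """Run map, shuffle, and reduce in sequence over a dict of documents."""
    pairs = [pair for doc_id, text in docs.items()
             for pair in map_phase(doc_id, text)]
    return dict(reduce_phase(term, ids) for term, ids in shuffle(pairs).items())

# Hypothetical sample documents, keyed by document id.
docs = {1: "Hadoop processes data", 2: "Solr searches data"}
index = build_inverted_index(docs)
```

On a real cluster the mappers and reducers would run on many machines in parallel, which is the whole point; the sketch only shows the shape of the computation.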

But data processing is not limited to building search indexes; there are plenty of other problem domains it can cover. Examples include DNA alignment and various other bioinformatics subjects, all because of large genome datasets. There are also graph processing problems where the amount of data is huge and cannot all be loaded into memory. There are decision systems, EM (Expectation Maximization) algorithms, and other AI subjects, especially those that require a strong mathematical background, such as data mining. In other words, I would say it covers the category of problems that you cannot solve efficiently in the J2EE model or in simple database applications.
Chuck Lam
author
Greenhorn

Joined: Aug 09, 2010
Posts: 12
In fact, the last chapter of my book has a whole case study on how IBM uses Hadoop to implement its intranet search.

Long story short, Hadoop can be helpful in enterprise search when you need to implement search in a distributed system. And the main reasons for needing a distributed system in search are scale and complexity. When you're indexing lots of data (IBM's intranet is quite huge), using Lucene/Solr on a single machine would be too slow. Similarly, if you need to do any complex indexing, such as natural language processing, you will easily outgrow the capability of a single machine.
Srinivas Mupparapu
Greenhorn

Joined: Feb 12, 2004
Posts: 14

To add to what Tibi Kiss said above, one can store large data in Hadoop and use the MapReduce framework to index it with Lucene. You can then make the resulting Lucene index searchable using Solr.
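
A rough sketch of that pipeline, assuming the data is split into shards, each shard is indexed independently, and a query is answered by asking every shard and merging the hits. This is plain Python standing in for the real pieces: `partition` plays the role of distributed storage splits, `index_shard` the per-shard indexing job, and `search` the serving layer. None of these names come from Hadoop, Lucene, or Solr.

```python
from collections import defaultdict

def partition(docs, num_shards):
    """Spread documents across shards by doc id (a stand-in for storage splits)."""
    shards = [dict() for _ in range(num_shards)]
    for doc_id, text in docs.items():
        shards[doc_id % num_shards][doc_id] = text
    return shards

def index_shard(shard):
    """Build a term -> set of doc ids index for a single shard."""
    index = defaultdict(set)
    for doc_id, text in shard.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

def search(shard_indexes, term):
    """Query every shard index and merge the matching doc ids."""
    hits = set()
    for idx in shard_indexes:
        hits |= idx.get(term.lower(), set())
    return sorted(hits)

# Hypothetical sample documents, keyed by integer document id.
docs = {
    1: "Hadoop stores large data",
    2: "Solr serves the index",
    3: "Hadoop builds the index",
}
shard_indexes = [index_shard(shard) for shard in partition(docs, 2)]
results = search(shard_indexes, "Hadoop")
```

Each shard can be indexed on a different machine since no shard depends on another, and only the query step has to see all of them.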
Phil Morettini
Greenhorn

Joined: Sep 18, 2013
Posts: 2
I wanted to let you know about an upcoming webinar about optimizing search in NoSQL database applications:

Rich Search with NoSQL: Why now?

As developers are rapidly moving to NoSQL for its speed and flexibility, search often becomes the new bottleneck. In this webinar we will cover various topics to optimize text search in NoSQL applications. Included will be a live installation/configuration of the SRCH2 search engine in a MongoDB application. Attend the webinar by signing up at: http://srch2.com/webinar.html
