Hadoop in enterprise

 
Vicky Pandya
Ranch Hand
Posts: 148
Hello Chuck,

What would be the use case for Hadoop in enterprise search?
 
David Newton
Author
Rancher
Posts: 12617
IntelliJ IDE Ruby
If you have a whole bunch of data to search.
 
Vicky Pandya
Ranch Hand
Posts: 148
I do have data to search, and I use Solr for searching. I'm trying to understand the real use of Hadoop. If I understand correctly, Hadoop isn't a replacement for Solr/Lucene, right?
 
andrew ennamorato
Ranch Hand
Posts: 100
I know of one startup that is looking at migrating to HBase (the Hadoop counterpart of Google's Bigtable) instead of Oracle. So I'm sure that in an average enterprise there are plenty of database instances where it might be useful to run on top of the Hadoop infrastructure for more than just search.
 
David Newton
Author
Rancher
Posts: 12617
IntelliJ IDE Ruby
No, it's not a replacement for a search engine--it's not a search engine.

You probably are thinking of something like this:

http://katta.sourceforge.net/

You'd probably need an awful lot of data for this to be helpful.
 
Tibi Kiss
Ranch Hand
Posts: 47
Search engines are about retrieval. Hadoop, with its MapReduce framework, is about data processing.
Every search engine has a data processing requirement before the data can be indexed.
Really big search engines need really big data processing frameworks. Hadoop is such a framework.

But data processing is not limited to building search indexes; there are plenty of other problem domains it can cover. Examples include DNA alignment and various other bioinformatics subjects, all because of the large genome datasets involved. There are also graph processing problems where the amount of data is huge and cannot all be loaded into memory. There are decision systems, EM (Expectation Maximization) algorithms, and other AI subjects, especially those that require a strong mathematical background, such as data mining. In other words, I would say it covers the category of problems that you cannot solve efficiently in the J2EE model or in simple database applications.
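
As a rough illustration of the indexing step mentioned above, here is a minimal plain-Java sketch of how a map phase and a reduce phase can build an inverted index (term to document ids). It deliberately uses no Hadoop classes; the class name `InvertedIndexSketch` and its methods are made up for this example, and a real job would run the same two steps through Hadoop's Mapper and Reducer over data stored in HDFS.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical, in-memory illustration of map/shuffle/reduce; not the Hadoop API.
class InvertedIndexSketch {

    // "Map" phase: emit one (term, docId) pair for every word in a document.
    static List<Map.Entry<String, String>> map(String docId, String text) {
        List<Map.Entry<String, String>> pairs = new ArrayList<>();
        for (String token : text.toLowerCase().split("\\W+")) {
            if (!token.isEmpty()) {
                pairs.add(new AbstractMap.SimpleEntry<>(token, docId));
            }
        }
        return pairs;
    }

    // "Shuffle" and "reduce" phases: group the pairs by term and
    // collect the distinct document ids for each term.
    static Map<String, SortedSet<String>> reduce(List<Map.Entry<String, String>> pairs) {
        Map<String, SortedSet<String>> index = new TreeMap<>();
        for (Map.Entry<String, String> pair : pairs) {
            index.computeIfAbsent(pair.getKey(), k -> new TreeSet<>()).add(pair.getValue());
        }
        return index;
    }

    // Runs the map step over every document, then reduces all emitted pairs.
    static Map<String, SortedSet<String>> buildIndex(Map<String, String> docs) {
        List<Map.Entry<String, String>> pairs = new ArrayList<>();
        for (Map.Entry<String, String> doc : docs.entrySet()) {
            pairs.addAll(map(doc.getKey(), doc.getValue()));
        }
        return reduce(pairs);
    }

    public static void main(String[] args) {
        Map<String, String> docs = new LinkedHashMap<>();
        docs.put("doc1", "Hadoop processes data");
        docs.put("doc2", "Solr searches data");
        System.out.println(buildIndex(docs).get("data")); // prints [doc1, doc2]
    }
}
```

The point of splitting the work this way is that the map step needs only one document at a time, so it can run on many machines in parallel, while the reduce step needs only the pairs for one term at a time.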
 
Chuck Lam
author
Greenhorn
Posts: 12
In fact, the last chapter of my book has a whole case study on how IBM uses Hadoop to implement its intranet search.

Long story short, Hadoop can be helpful in enterprise search when you need to implement search in a distributed system. And the main reasons for needing a distributed system in search are scale and complexity. When you're indexing lots of data (IBM's intranet is quite huge), using Lucene/Solr on a single machine would be too slow. Similarly, if you need to do any complex indexing, such as natural language processing, you will easily outgrow the capability of a single machine.
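
To make the "distributed system" idea concrete, here is a small sketch, under heavy simplifying assumptions, of an index split into shards: each document is routed to one shard by hashing its id, and a query is fanned out to every shard and the partial results merged. Everything here is hypothetical and in-memory (the class `ShardedSearchSketch` and its methods are invented for illustration); it is not how Hadoop, Lucene, or Solr is implemented, and in a real deployment each shard would live on its own machine.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

// Hypothetical in-memory stand-in for index shards hosted on separate machines.
class ShardedSearchSketch {
    private final List<Map<String, TreeSet<String>>> shards = new ArrayList<>();

    ShardedSearchSketch(int shardCount) {
        for (int i = 0; i < shardCount; i++) {
            shards.add(new HashMap<>());
        }
    }

    // Route a document to exactly one shard by hashing its id.
    int shardFor(String docId) {
        return Math.floorMod(docId.hashCode(), shards.size());
    }

    // Index the document's terms inside the shard that owns it.
    void index(String docId, String text) {
        Map<String, TreeSet<String>> shard = shards.get(shardFor(docId));
        for (String token : text.toLowerCase().split("\\W+")) {
            if (!token.isEmpty()) {
                shard.computeIfAbsent(token, k -> new TreeSet<>()).add(docId);
            }
        }
    }

    // Fan the query out to every shard and merge the partial hit lists.
    TreeSet<String> search(String term) {
        TreeSet<String> hits = new TreeSet<>();
        for (Map<String, TreeSet<String>> shard : shards) {
            TreeSet<String> partial = shard.get(term.toLowerCase());
            if (partial != null) {
                hits.addAll(partial);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        ShardedSearchSketch cluster = new ShardedSearchSketch(3);
        cluster.index("doc1", "Intranet search at scale");
        cluster.index("doc3", "Indexing the intranet");
        System.out.println(cluster.search("intranet")); // prints [doc1, doc3]
    }
}
```

Because each shard holds only part of the corpus, both indexing and querying can proceed in parallel, which is the scaling benefit described above.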
 
Srinivas Mupparapu
Greenhorn
Posts: 14
Java
To add to what Tibi Kiss said above, one can use Hadoop to store large amounts of data and use the MapReduce framework to index that data with Lucene. You can then make the resulting Lucene index searchable using Solr.
 
Phil Morettini
Greenhorn
Posts: 2
I wanted to let you know about an upcoming webinar about optimizing search in NoSQL database applications:

Rich Search with NoSQL: Why now?

As developers are rapidly moving to NoSQL for its speed and flexibility, search often becomes the new bottleneck. In this webinar we will cover various topics to optimize text search in NoSQL applications. Included will be a live installation/configuration of the SRCH2 search engine in a MongoDB application. Attend the webinar by signing up at: http://srch2.com/webinar.html

 