
Using Hadoop to index Big Data

Hello everybody,


Introduction
As a new member of this forum, I must first say that you seem to have quite a professional community going on here!
After searching the web for answers (including on this forum), I decided to ask you directly for help on a Hadoop topic.

Before I go any further, I must say I'm a beginner with Hadoop and Big Data, which didn't stop my company from giving me an important project to handle.
For security reasons (imposed by my employer), I cannot share all the details of my work or other specific technical details. But if finding the help I need depends on those details, I might make an exception or two (just don't tell my boss...).


Environment & Problem Description
I work in a company where the Engineering Department produces an amazing amount of CAD (computer-aided design) files. Over the years we have ended up with hundreds of thousands, if not millions, of files hosted on different filer systems. The engineers frequently need to access those files to modify, evolve, or consult the information inside. The problem is that even though an engineer knows precisely the name of the file they want, it takes quite a while (sometimes more than an hour) for the filer system to actually find it and send it back to the engineer's PC. That is because no indexing system exists on the filer hosting system: it tests every single inode until the correct one is found. The files are not very big (a few dozen MB each), but there are so many of them...
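
To make the problem concrete, here is a toy sketch (plain Java, with made-up file names and paths) of the kind of lookup we are missing today: instead of the filer testing every inode, an index would map a known file name straight to its location.

import java.util.HashMap;
import java.util.Map;

// Toy sketch only: an in-memory index from file name to full path.
// The entries below are invented; the real filers hold millions of files.
public class NameIndex {

    private final Map<String, String> nameToPath = new HashMap<>();

    // Built once (or incrementally) by walking the filer, not on every lookup.
    void add(String fileName, String fullPath) {
        nameToPath.put(fileName, fullPath);
    }

    // Constant-time lookup instead of scanning every inode until a match is found.
    String locate(String fileName) {
        return nameToPath.get(fileName);
    }

    public static void main(String[] args) {
        NameIndex index = new NameIndex();
        index.add("bracket-v12.cad", "/filer03/projects/alpha/bracket-v12.cad");
        System.out.println(index.locate("bracket-v12.cad"));
    }
}

Of course a real index would have to live somewhere persistent and stay in sync with the filers; the sketch is only meant to show what I mean by "indexing".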

So the project I've been given is to study whether Hadoop could help us index those files and deliver them faster to the engineers.


The Question(s)
Since Hadoop has its own file system (HDFS), importing the data into Hadoop would double the disk space we use. But from what I understood, Hadoop can skip this step if the data is hosted on certain Linux distributions. The only problem is that I don't think one can install Hadoop on top of a filer system. Does anybody know whether that is even possible?
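
For what it's worth, my (possibly mistaken) understanding is that Hadoop's FileSystem API can also address data in place through a file:// URI, for example over an NFS mount of the filer, without importing anything into HDFS. The mount point in this sketch is hypothetical:

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Sketch: pointing Hadoop's FileSystem API at a locally mounted directory
// (e.g. an NFS mount of a filer share) instead of copying data into HDFS.
// The /mnt/filer01/cad mount point is made up for illustration.
public class LocalFsProbe {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(URI.create("file:///"), conf);
        for (FileStatus status : fs.listStatus(new Path("file:///mnt/filer01/cad"))) {
            System.out.println(status.getPath() + "  (" + status.getLen() + " bytes)");
        }
    }
}

Whether this would actually perform acceptably against a filer (or is even supported by the filer's protocols) is exactly the part I cannot judge.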

Whatever the answer to my previous question, the main question I would like to ask is the following.
The only need I have is to index that data. Once the data is indexed, there will be no manipulation or processing of it through Hadoop. The data is there only to be found very fast and sent back to a client PC. From my understanding, Hadoop is intended for data processing: it is made to create new "result" files from existing ones, not to serve back the data it already hosts. Would you agree with this statement?

All in all, should one use Hadoop to index this kind of data?
Would Hadoop do a better job of indexing files than other products?
What other products would you advise me to look at more closely to solve this problem? To make that last question concrete, I've sketched below what I imagine such an index might look like.
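
From what I've read, a dedicated search library such as Apache Lucene might be one candidate. Here is a rough sketch, assuming I'm reading Lucene's core API correctly (the field names and paths are mine, not anything official):

import java.nio.file.Paths;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StoredField;
import org.apache.lucene.document.StringField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.FSDirectory;

// Sketch: indexing file names with Lucene so that an exact-name query
// returns the full path immediately. Field names and paths are invented.
public class CadFileIndex {
    public static void main(String[] args) throws Exception {
        FSDirectory dir = FSDirectory.open(Paths.get("/tmp/cad-index"));

        // Index one (name, path) pair per file; in reality this would
        // walk the filers instead of adding a single hard-coded entry.
        try (IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(new StandardAnalyzer()))) {
            Document doc = new Document();
            doc.add(new StringField("name", "bracket-v12.cad", Field.Store.YES));
            doc.add(new StoredField("path", "/filer03/projects/alpha/bracket-v12.cad"));
            writer.addDocument(doc);
        }

        // Exact lookup by file name.
        try (DirectoryReader reader = DirectoryReader.open(dir)) {
            IndexSearcher searcher = new IndexSearcher(reader);
            TopDocs hits = searcher.search(new TermQuery(new Term("name", "bracket-v12.cad")), 10);
            if (hits.scoreDocs.length > 0) {
                System.out.println(searcher.doc(hits.scoreDocs[0].doc).get("path"));
            }
        }
    }
}

If that is the wrong tool entirely, I'd be glad to hear it.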

If more details are needed in order to form an opinion, please let me know and I'll provide as many as possible.




Thank you in advance for your time and answers!
Any opinion is greatly appreciated!



