
Best way to start with Hadoop?

 
Ranch Hand
Posts: 93
Greetings, Ladies and Gentlemen!

I need some guidance on how to start with Hadoop. I need two things: a book and an installation, and their versions should match. Can you give me some advice on this?

I have 32-bit Windows 7 with Debian 7 running inside VirtualBox.

Questions:

Is there a big difference between Hadoop releases? I mean 1.x and 2.x.
There is a Hadoop download on the Apache site, but I have seen many opinions saying that direct installation is a big pain. Is this correct? I just want to set up a single-node cluster to play with.
There are Cloudera packages, but unfortunately they are for 64-bit machines, as far as I understand.
There is a Hortonworks Sandbox, which I have been downloading for the last 30 minutes.

Something else?
What can you recommend?

I also need a book that describes the version I am going to install more or less precisely. Hadoop: The Definitive Guide is from 2012. Is it still up to date? I cannot figure out which version of Hadoop it describes.
 
Bartender
Posts: 2407
The Hortonworks Sandbox is the best place to start, as it gives you a pre-packaged installation with all the core Hadoop tools running in a CentOS VM, as well as a set of useful tutorials to get you started. But you'll have problems if you're on a 32-bit machine, as I'm not sure there are any 32-bit Hadoop platforms these days. Hortonworks Sandbox requires 64-bit, and you'll need plenty of RAM if you're running it inside a VM.

Manual installation is a nightmare, and you need several related tools to get a useful installation working - and there are lots of different (and mutually incompatible) versions of all these tools. If your main goal is to find out what you can do with Hadoop, you'll waste a lot of time just trying to install and configure it if you try and do this manually.

Hadoop 2.x differs from Hadoop 1.x in how it manages and distributes processes. Hadoop 2 introduces YARN, which allows you to use other processing engines instead of MapReduce, but this also means you need to make sure that all your Hadoop libraries, clients etc. are on compatible versions.
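
If the MapReduce model itself is new to you, here is a rough sketch of the idea in plain Python. It does not use Hadoop at all; the function names (map_phase, shuffle_phase, reduce_phase, word_count) are made up for illustration, and a real job would run the same three phases spread across a cluster.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Emit a (word, 1) pair for every word, as a mapper would."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group values by key; the framework does this between map and reduce."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped: Dict[str, List[int]]) -> Dict[str, int]:
    """Sum the grouped counts for each word, as a reducer would."""
    return {word: sum(counts) for word, counts in grouped.items()}


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run the three phases in sequence on in-memory data."""
    return reduce_phase(shuffle_phase(map_phase(lines)))
```

Calling word_count on a list of text lines returns a dictionary of lower-cased words and their counts. With YARN, an engine other than MapReduce can run on the same cluster, which is why the versions of your libraries and clients have to line up.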

If I were you I would just work through some of the Sandbox tutorials on HDFS, Pig and Hive first, so you can get a feel for what Hadoop does, before you start worrying about configuration, manual installation etc. Also, things change fast so any books from 2+ years ago were probably written 3+ years ago and may be out of date by now.
 
Ranch Hand
Posts: 1609
Chris,

Is the sandbox installation like an OS installation or does it sit inside an already installed OS?

This is how I understand it: I install Windows on my machine, install VirtualBox on top of that, and then when I boot I get into VirtualBox. Do I need to install another OS in VirtualBox before installing the Sandbox? I see that there are Sandbox instructions for Mac and Windows.
 
chris webster
Bartender
Posts: 2407
The sandbox includes the OS (CentOS, I think), so you just need to download the appropriate VM file, i.e. for VirtualBox or VMware Player. Then follow the instructions to load the VM, e.g. in VirtualBox. You need a 64-bit host OS and plenty of RAM, e.g. 8 GB or more.
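
If you are not sure whether your host counts as 64-bit, a quick check along these lines may help. It is only a sketch: the helper name is_64bit_machine and the list of architecture names are my own assumptions and are not exhaustive, and the script does not check RAM.

```python
import platform

# Assumption: common names reported for 64-bit platforms; not exhaustive.
SIXTY_FOUR_BIT_NAMES = {"x86_64", "amd64", "arm64", "aarch64"}


def is_64bit_machine(machine: str) -> bool:
    """Return True if the reported machine name looks like a 64-bit platform."""
    return machine.strip().lower() in SIXTY_FOUR_BIT_NAMES


if __name__ == "__main__":
    # platform.machine() returns a string such as "AMD64", "x86_64" or "x86".
    machine = platform.machine()
    verdict = "looks 64-bit" if is_64bit_machine(machine) else "does not look 64-bit"
    print(f"{machine or 'unknown'}: {verdict}")
```

A 32-bit Windows install would typically report something like "x86" here, in which case the Sandbox VM is unlikely to run.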

It may take several minutes to start up all the services, but eventually you should be able to connect to the Hortonworks server via your browser - the VM window should display the IP address to do this. The browser Hue interface allows you to work with HDFS, Pig, Hive, HBase etc, but you can also connect to the VM Linux shell via SSH from your host operating system.
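
Since the services can take a while to come up, you could poll the VM from the host to see when a port starts accepting connections. This is a minimal sketch using only the Python standard library; the function name port_is_open is my own, and the host and port you pass in must be whatever your VM window actually displays.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        # create_connection raises OSError on refusal, timeout or bad address.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, port_is_open("192.168.56.101", 22) would test SSH on a hypothetical VM address; substitute the IP address and ports shown for your own sandbox.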
 
Greenhorn
Posts: 5
Hi,

Here are 3 easy steps to install Hadoop:

1. Install VMware Player
2. Install the Hortonworks Hadoop Sandbox
3. Run Hadoop
 
Akhilesh Trivedi
Ranch Hand
Posts: 1609

Jio Thomas wrote:Hi,

Here are 3 easy steps to install Hadoop:

1. Install VMware Player
2. Install the Hortonworks Hadoop Sandbox
3. Run Hadoop



Is VMware Player free?

https://coderanch.com/t/643229/gc/vmware-player-free
 
Ranch Hand
Posts: 85

chris webster wrote:The Hortonworks Sandbox is the best place to start, as it gives you a pre-packaged installation with all the core Hadoop tools running in a CentOS VM, as well as a set of useful tutorials to get you started. But you'll have problems if you're on a 32-bit machine, as I'm not sure there are any 32-bit Hadoop platforms these days. Hortonworks Sandbox requires 64-bit, and you'll need plenty of RAM if you're running it inside a VM.


Hello Chris, I did some of the Sandbox tutorials, but I didn't find any tutorial for MapReduce. Are there any for MapR?
Also, the Sandbox tutorials are too basic and I want to go a level up. Can you suggest a path to learn more about Hadoop?
Thanks
 
Ranch Hand
Posts: 34
Hi Naidu and all,
I see that this thread was started 3 years ago. Can you please suggest what else you did after the Sandbox tutorials to learn Hadoop at an advanced level?
 