My boss has asked me to set up single-node Hadoop on Windows and said that we would be using it to create a proof of concept (POC) on how to use Big Data in this setup.
So, when I say single-node Hadoop, is that the same as a single-node Hadoop cluster?
I haven't used Hadoop, but I understand that it relies on Unix shell commands, so if you are on a Windows machine, then you either need a Unix shell for your Windows machine (which is what Cygwin provides), or you need a Linux virtual machine running inside your Windows machine (which is what VMware does). Alternatively, you can either install Linux to dual-boot on your current machine before installing Hadoop, or find a machine that already has Linux on it and install Hadoop there instead.
I don't use Cygwin, so I don't know how you would install/run Hadoop with Cygwin, but I'm guessing you would open a Cygwin shell and then use Unix-style commands as indicated in the Hadoop instructions.
VirtualBox is an easy, free alternative to VMware: you can install VirtualBox on a Windows PC, then install a Linux VM to run inside VirtualBox (e.g. here is a quick guide to installing Ubuntu Linux on VirtualBox). Once you've got Linux running inside VirtualBox, I guess you would log into your Linux VM and then follow the instructions for installing/running Hadoop on Linux.
If you're using Ubuntu, remember you may need to use "sudo" for running some commands that require extra permissions, e.g. "sudo apt-get install ssh". Also, running a VM inside Windows takes extra memory, so make sure you have enough RAM to run both operating systems at the same time. If you dual-boot Linux instead, this is not such a problem, as you are running either Windows or Linux but not both at the same time.
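To make the RAM point concrete, here is a rough sketch of the arithmetic in Python. The function names and the 2 GB reserve for Windows are my own assumptions for illustration, not figures from VirtualBox, VMware, or Hadoop, so adjust them for your machine.

```python
def ram_left_for_host(total_ram_gb: float, vm_ram_gb: float) -> float:
    """RAM that remains for Windows while the Linux VM is running."""
    return total_ram_gb - vm_ram_gb


def vm_fits(total_ram_gb: float, vm_ram_gb: float, host_reserve_gb: float = 2.0) -> bool:
    """True if the host keeps at least host_reserve_gb (an assumed figure)."""
    return ram_left_for_host(total_ram_gb, vm_ram_gb) >= host_reserve_gb


if __name__ == "__main__":
    # Hypothetical machines: 8 GB with a 4 GB VM, and 4 GB with a 3 GB VM.
    print("8 GB host, 4 GB VM fits:", vm_fits(8, 4))
    print("4 GB host, 3 GB VM fits:", vm_fits(4, 3))
```

With dual boot there is no such budget to split, since only one operating system is running at a time.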
FWIW, it may be slightly more work to install Linux (as a VM or dual boot) on your Windows PC, but once you've done this it's often much easier to get/install/use the Linux versions of many open source tools than the equivalent Windows versions.
I tried using Cygwin for Hadoop, but as far as I can see there are quite a few issues with setting up Hadoop that way.
So I thought of using VMware instead and was partly successful. Though this is a good way to learn MapReduce, I would definitely still try to install Hadoop using Cygwin as well.
Hi Lopez, could you please let me know whether you have installed Hadoop using Cygwin successfully? I got a few errors while formatting the name node in Hadoop. Could you please help me resolve the issue? Thanks in advance.
Hi Lopez. Sometimes I get an error regarding starting the SSHD service. Sometimes I get an error when connecting to localhost, and the error message for this is "Connection refused". Now, while extracting the Hadoop jar file to the "C:\cygwin\user\local" folder, I got an error saying I do not have permission to do this task. Could you please help me understand what the reason might be? I have attached a document showing this error.
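Not a fix, but a small Python sketch that might help narrow down the two symptoms above. It checks whether anything is accepting TCP connections on the SSH port, and whether the current user can actually create a file in the target folder. The helper names are made up for this example, port 22 is only the usual SSH default (your sshd may be configured differently), and the folder path is copied from the post as written.

```python
import socket
import tempfile


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # "Connection refused" and timeouts are both reported as OSError.
        return False


def dir_is_writable(path: str) -> bool:
    """Return True if a temporary file can be created inside path."""
    try:
        with tempfile.TemporaryFile(dir=path):
            return True
    except OSError:
        # Covers both a missing folder and a permission problem.
        return False


if __name__ == "__main__":
    # Port 22 is assumed here; change it if sshd listens elsewhere.
    print("sshd reachable on localhost:22:", port_is_open("localhost", 22))
    target = r"C:\cygwin\user\local"  # path as given in the post
    print(target, "writable:", dir_is_writable(target))
```

If the port check prints False, nothing is listening, which would be consistent with the SSHD service not having started. If the folder check prints False, try running the same script from a shell opened with the account you used for the extraction, and give the path in the form that your Python understands (a Windows path for Windows Python, a Unix-style path for Cygwin's Python).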