Hadoop is not mandatory for running Apache Spark programs. However, when I run the Spark word count sample application from Eclipse, it gives an error and works only after I set the Hadoop home to the path of the folder containing winutils.exe. Why is it required when Hadoop is not mandatory for Spark?
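For reference, here is a minimal sketch of the kind of word count I am running from Eclipse. The path C:\hadoop is only a placeholder for whatever folder has winutils.exe in its bin subfolder, and setting the hadoop.home.dir system property in code is just an alternative to setting the HADOOP_HOME environment variable:

```scala
import org.apache.spark.sql.SparkSession

object WordCount {
  def main(args: Array[String]): Unit = {
    // Placeholder folder: its bin\ subfolder must contain winutils.exe.
    // Hadoop's Shell class checks this system property first, then the
    // HADOOP_HOME environment variable. Without either, the job fails
    // on Windows before any Spark work is done.
    System.setProperty("hadoop.home.dir", "C:\\hadoop")

    val spark = SparkSession.builder()
      .appName("WordCount")
      .master("local[*]") // run locally inside Eclipse, no cluster needed
      .getOrCreate()

    val counts = spark.sparkContext
      .textFile("input.txt")      // any local text file
      .flatMap(_.split("\\s+"))   // split lines into words
      .map(word => (word, 1))
      .reduceByKey(_ + _)

    counts.collect().foreach(println)
    spark.stop()
  }
}
```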
The reason for this doubt is that the name of the Spark download, spark-3.0.1-bin-hadoop2.7, has 'hadoop' in it. Moreover, on Windows, spark-shell gives the error "Failed to locate the winutils binary in the hadoop binary path", which is clearly related to Hadoop.
The error got resolved as described below, but I still have the above doubt.
I had installed Spark on my laptop, set the SPARK_HOME environment variable, and edited the PATH to include its bin folder. However, it gave me the error "Failed to locate the winutils binary in the hadoop binary path".
This was resolved only after downloading winutils.exe and pointing the environment variables to its location.
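As a quick sanity check that the variables are picked up, this small sketch looks up winutils.exe the same way Hadoop's Shell class does (system property first, then HADOOP_HOME); the exact lookup order is my understanding of Hadoop's behavior, not something the Spark docs state:

```scala
import java.nio.file.{Files, Paths}

// Resolve Hadoop home the way Hadoop's Shell class does:
// the hadoop.home.dir system property first, then HADOOP_HOME.
val hadoopHome = sys.props.get("hadoop.home.dir")
  .orElse(sys.env.get("HADOOP_HOME"))
  .getOrElse("")

val winutils = Paths.get(hadoopHome, "bin", "winutils.exe")
println(s"Looking for $winutils -> exists: ${Files.exists(winutils)}")
```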
So, to summarize: Hadoop is not necessary for Spark, yet when using Spark on Windows we need winutils.exe, which is part of Hadoop, and the name of the Spark download, spark-3.0.1-bin-hadoop2.7, has 'hadoop' in it. My doubt is: if Hadoop is not necessary for Spark, why do we have to do this Hadoop-related step?