We can use Spark on a Hadoop cluster or without Hadoop. When we run it on a Hadoop cluster it uses the HDFS file system. What does it need HDFS for in that case? And if it is not running on a Hadoop cluster, what file system can it use, and why does it need one at all?
Spark is a distributed processing engine. If you run it on Hadoop, you can tell it to use the HDFS cluster as its data store, or to read/write data from other sources if those are available.
If you run Spark stand-alone, it can read from your local file system, or from another source if one is available.
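To make that concrete: the same Spark API reads from either store, and only the URI scheme in the path changes. A minimal sketch (the host, port, and paths below are made-up placeholders):

```scala
// Sketch: Spark selects the storage backend from the path's URI scheme.
import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setAppName("example"))

// On a Hadoop cluster: read from HDFS (namenode address is illustrative).
val fromHdfs = sc.textFile("hdfs://namenode:8020/data/input.txt")

// Stand-alone on your laptop: read from the local file system.
val fromLocal = sc.textFile("file:///tmp/input.txt")
```

Either way the rest of your program is identical; the file system only matters for where the data lives.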
It will store temporary data wherever you tell it to, e.g. on HDFS if you are on Hadoop, or on your local file system if you are running Spark on your laptop.
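For the local scratch space (shuffle and spill files), Spark exposes the `spark.local.dir` property. A hedged sketch, with an illustrative path:

```scala
// Sketch: pointing Spark's scratch space at a directory of your choosing.
// "spark.local.dir" holds shuffle/spill files; the path shown is illustrative.
import org.apache.spark.SparkConf

val conf = new SparkConf()
  .setAppName("example")
  .set("spark.local.dir", "/mnt/fast-disk/spark-tmp")
```

This is why even a stand-alone Spark job needs *some* writable file system: shuffles that do not fit in memory have to spill somewhere.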
Thanks. And if I am using it for, say, reading streaming data from Kafka, does it still require a file system? If so, for what?