We can use Spark on a Hadoop cluster or without Hadoop. When we use it on a Hadoop cluster it uses the HDFS file system. What does it require HDFS for in that case? And if we are not using it on a Hadoop cluster, what file system may it use, and why is one required?
Spark is a distributed processing engine. If you run it on Hadoop, you can tell it to use the HDFS cluster as its data store, or to read and write data from other sources if those are available.
If you run Spark stand-alone, it can read from your local file system, or from another source if one is available.
It will store temporary data wherever you tell it to, e.g. on HDFS if you are on Hadoop, or on your local file system if you are running Spark on your laptop.
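To make that concrete: Spark decides which file system to use from the URI scheme of the path you hand it (an `hdfs://` path goes to HDFS, a `file://` path to the local file system, and a bare path falls back to the configured default, `fs.defaultFS`). A minimal sketch of that resolution logic — all paths here are hypothetical, and the helper function is just for illustration, not a Spark API:

```python
# Hypothetical example paths; the scheme prefix picks the file system.
hdfs_path = "hdfs://namenode:8020/data/input.csv"   # HDFS on a Hadoop cluster
local_path = "file:///tmp/input.csv"                # local FS, e.g. stand-alone mode
bare_path = "data/input.csv"                        # falls back to fs.defaultFS

def storage_scheme(path):
    """Return the file-system scheme a path would resolve to (illustrative only)."""
    if "://" in path:
        return path.split("://", 1)[0]
    return "default (fs.defaultFS)"

print(storage_scheme(hdfs_path))   # hdfs
print(storage_scheme(local_path))  # file
print(storage_scheme(bare_path))   # default (fs.defaultFS)
```

So the same `spark.read.csv(path)` call works in both deployments; only the path's scheme (or the cluster's default) changes.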
Thanks. And if I am using it to, say, read streaming data from Kafka, does it still require a file system? If so, for what?
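One reason a file system still matters in that scenario: Spark Structured Streaming requires a checkpoint location to persist Kafka offsets and any aggregation state, so the job can recover after a restart, and that location should be on a fault-tolerant store such as HDFS. A hedged sketch of the relevant configuration — broker address, topic name, and paths are all hypothetical, and the pyspark wiring is shown in comments rather than executed:

```python
# Hypothetical Kafka source options for a Structured Streaming job.
kafka_options = {
    "kafka.bootstrap.servers": "broker:9092",  # hypothetical broker
    "subscribe": "events",                     # hypothetical topic
}

# Spark persists consumed offsets and streaming state under this directory,
# so it should live on a fault-tolerant file system (HDFS here, hypothetically).
checkpoint_dir = "hdfs://namenode:8020/checkpoints/events-job"

# With pyspark available, the query would be wired up roughly like this:
#   df = (spark.readStream.format("kafka")
#               .options(**kafka_options)
#               .load())
#   (df.writeStream
#       .option("checkpointLocation", checkpoint_dir)
#       .format("parquet")
#       .start("hdfs://namenode:8020/out/events"))

print(checkpoint_dir)
```

On a laptop you could point the checkpoint at a `file://` path instead, but you would lose fault tolerance if that machine fails.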