In Spark, there are transformation methods like map, reduce, and filter. These methods are specific to Spark and are called Spark transformations. How come these methods are also present in the Scala programming language (without Spark)?
Mapping, filtering and reducing are some of the most essential operations in any functional language. Scala needs them regardless of whether you use Spark or not.
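To make that concrete, here is a minimal sketch that uses only the Scala standard library, with no Spark on the classpath. The object name and the sample numbers are made up for illustration.

```scala
// Plain Scala collections: everything here runs locally, in one JVM.
object LocalCollectionsDemo {
  val numbers: List[Int] = List(1, 2, 3, 4, 5, 6)

  // filter keeps the elements that satisfy the predicate
  val evens: List[Int] = numbers.filter(_ % 2 == 0)

  // map applies a function to every element and returns a new List
  val squares: List[Int] = evens.map(n => n * n)

  // reduce combines all elements into a single value
  val total: Int = squares.reduce(_ + _)

  def main(args: Array[String]): Unit =
    println(s"evens=$evens squares=$squares total=$total")
}
```

Each call returns a new immutable value right away; nothing is deferred or distributed.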
A better question is, why does Spark for Scala feel that it needs separate versions of these functions?
I think the answer to your better question is that Spark needs these methods to do their work across a cluster of machines, so it requires its own versions of them. That is my guess; I may be wrong.
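Spark's API was inspired by Scala's collection API, which has those methods (filter, map, flatMap, reduce, etc.). When invoked on a Scala collection, they run locally and return a new collection (or, for reduce, a single value). When using Spark, the methods are invoked on the RDD API instead: transformations such as filter, map, and flatMap return a new RDD whose computation is deferred and then distributed across the Spark cluster, while reduce is an action that triggers that computation and returns the result to the driver.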
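For comparison, here is a rough sketch of the same kind of pipeline on an RDD. It assumes spark-core is on the classpath and uses a local master (`local[*]`) so it can run without a real cluster. The object name, the helper `sumOfEvenSquares`, the app name, and the sample data are all hypothetical.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object RddDemo {
  // Hypothetical helper: same filter/map/reduce chain, but on an RDD.
  def sumOfEvenSquares(sc: SparkContext, data: Seq[Int]): Int = {
    val numbers: RDD[Int] = sc.parallelize(data)

    // filter and map are transformations: they only describe a new RDD,
    // nothing is computed yet.
    val squares: RDD[Int] = numbers.filter(_ % 2 == 0).map(n => n * n)

    // reduce is an action: it runs the job and returns a plain Int
    // to the driver program.
    squares.reduce(_ + _)
  }

  def main(args: Array[String]): Unit = {
    // Assumption: local mode, so this runs on one machine for demo purposes.
    val conf = new SparkConf().setAppName("rdd-demo").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      println(sumOfEvenSquares(sc, Seq(1, 2, 3, 4, 5, 6)))
    } finally {
      sc.stop()
    }
  }
}
```

The method names match the collection version on purpose, but the receiver is an RDD, so the work is split into partitions that Spark can schedule on different executors.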