Thanks
Christopher Webster wrote:RDDs are now provided as DataSets, which have the same action/transformation distinction: https://spark.apache.org/docs/latest/rdd-programming-guide.html
Yes, the operations on Datasets look similar to those on RDDs. I think some operations such as reduceByKey and groupByKey, as in the RDD API, are not available on DataFrames.
I wonder why, when RDDs are used so much less than Datasets and DataFrames, the Spark documentation has a good RDD programming guide but nothing comparable for DataFrames and Datasets, in particular the way it lists the transformations and actions. I have a hard time finding out which actions Datasets support, whereas for RDDs I can easily look them up in the actions section of the RDD programming guide.
Christopher Webster wrote:DataFrames are now based on DataSets instead of old-style RDDs
What exactly does that mean? From what I understood, DataFrames look very different from Datasets, and a program using DataFrames looks like the below:
I am trying to understand where exactly lazy evaluation happens in the above code. (I know where it happens in RDD/Dataset code.)
This is very different from a Dataset program (which looks somewhat like RDD code, with transformations and actions).
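To make clear what I mean by the transformation/action split in RDD/Dataset code, here is a toy model in plain Python. It is only a sketch of the idea and not Spark's API: the LazySeq class and the double helper are made up for illustration. Transformations just record a step, and nothing runs until an action is called.

```python
class LazySeq:
    """Toy stand-in for an RDD/Dataset (hypothetical, not a Spark class)."""

    def __init__(self, source, steps=()):
        self._source = source
        self._steps = tuple(steps)

    # Transformations: return a new LazySeq with one more recorded step.
    # No element of the source is touched here.
    def map(self, fn):
        return LazySeq(self._source, self._steps + (("map", fn),))

    def filter(self, pred):
        return LazySeq(self._source, self._steps + (("filter", pred),))

    def _run(self):
        # Replays every recorded step over the source, one element at a time.
        for item in self._source:
            keep = True
            for kind, fn in self._steps:
                if kind == "map":
                    item = fn(item)
                elif not fn(item):
                    keep = False
                    break
            if keep:
                yield item

    # Actions: these are the only methods that actually execute the steps.
    def collect(self):
        return list(self._run())

    def count(self):
        return sum(1 for _ in self._run())


calls = []  # records every time the mapped function really runs


def double(x):
    calls.append(x)
    return x * 2


pipeline = LazySeq(range(5)).map(double).filter(lambda x: x > 2)
print(calls)               # [] because no action has been called yet
result = pipeline.collect()
print(result)              # [4, 6, 8]
print(calls)               # [0, 1, 2, 3, 4]
```

In this toy, calling count() afterwards replays all the steps from the start, since nothing is cached. What I cannot see is where the equivalent of collect() sits in the DataFrame program.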
Christopher Webster wrote:
RDDs/DataSets are a lower-level construct. DataFrames are and always have been based on these. So nothing has really changed here.
Isn't only the RDD a low-level construct (with low-level APIs like reduceByKey and groupByKey), and aren't both Datasets and DataFrames high-level constructs?