Is Scala still the most preferred for Spark even when using Datasets and Dataframes instead of RDD?

 
Ranch Hand
Posts: 2925
13
Scala has been the preferred language for Spark, since Spark itself was built in Scala. With Scala it is easier to program in a functional style using functions, the code is concise, and it runs faster. RDDs are slower with non-JVM languages. Now we have options besides RDDs, namely DataFrames and Datasets. So if we are using DataFrames/Datasets, is Scala still the preferred language? Thanks.
 
Ranch Hand
Posts: 32
3
AFAIK, Spark is still implemented in Scala, so the Scala APIs are usually delivered first and are the most complete. Spark SQL, DataFrames, Datasets, etc. have been in Spark for a couple of years now. There is no reason to switch between languages for different Spark libraries if the library you need is available in the language you are using.

https://coderanch.com/t/733230/open-source/Spark-Action-Pros-Cons-language

 
Monica Shiralkar
Ranch Hand
Posts: 2925
13
Thanks. Yes, whether it is DataFrames, Datasets, or RDDs, programming in Scala is beneficial, I think because of 1) faster execution, 2) ease of coding (e.g. when using functions), and 3) concise code.
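
For anyone finding this thread later, here is a rough sketch of what "using functions" looks like with Datasets in Scala. It assumes Spark 3.x with the spark-sql module on the classpath and runs in local mode. The `Sale` case class, the sample rows, and the object name are made up for illustration and are not from any real project. The typed Dataset API shown in the first half is offered in Scala and Java; the DataFrame call in the second half looks much the same in every supported language.

```scala
import org.apache.spark.sql.{Dataset, SparkSession}

// Hypothetical record type, used only for this illustration.
case class Sale(region: String, amount: Double)

object DatasetSketch {
  def main(args: Array[String]): Unit = {
    // Local session for experimenting; a real job would normally get its master from spark-submit.
    val spark = SparkSession.builder()
      .appName("dataset-sketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._ // brings in encoders, toDS(), and the $"col" syntax

    // Made-up sample data.
    val sales: Dataset[Sale] = Seq(
      Sale("north", 120.0),
      Sale("south", 80.0),
      Sale("north", 45.5)
    ).toDS()

    // Typed Dataset API: an ordinary Scala function over Sale objects,
    // so a misspelled field name is a compile-time error.
    val large: Dataset[Sale] = sales.filter(_.amount > 50.0)

    // Untyped DataFrame API: column expressions resolved at runtime.
    val totals = sales.groupBy($"region").sum("amount")

    large.show()
    totals.show()

    spark.stop()
  }
}
```

The `filter` with a plain lambda is the kind of concise, function-based code discussed above. The `groupBy` line is the part that would port almost unchanged to the other language APIs.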