Spark in Action: Pros and Cons of each language for Spark

 
Rancher
Posts: 594
9
Android Tomcat Server Java
Hi Jean-Georges Perrin,

As the description says, the book covers three programming languages: Java, Scala and Python.

From your experience, what should be considered when choosing a language for Spark? What are the pros and cons of each?

Thanks.
 
Ranch Hand
Posts: 32
3
As the author hasn't replied, here's my personal take as an occasional user of Spark since around 2014.

Scala:

* AFAIK Spark is still written in Scala, which means new features appear in the Scala APIs first.
* This means the Scala API is usually a bit ahead of the others, and it will never be behind them.
* Personally, I find Scala is a very natural language for this kind of processing (which is why Spark is based on it), so I am most comfortable with the Scala API.  YMMV of course.

Python:

* Python is very widely used with Spark, as it is a much more popular language than Scala generally, and it is often used by data scientists.
* Python is also a popular choice for people who use interactive notebook interfaces, like Jupyter, with Spark (although you can also use Scala with notebooks these days).
* But the Python API is usually a little behind the Scala API, and some features are slower/harder to implement in Python than in Scala.
* So Python is a good choice for data scientists or if you are not concerned about having the very latest API features.

Java:

* There is no good reason to use Java with Spark.  
* Although Java now offers lambdas etc., it is still really clunky to write good functional code in Java compared to Scala.
* And Python is a much nicer language for data science and notebooks etc.
* If you're using Spark, pick a language API that works well with Spark and does the things that Spark does well.
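
To give a rough feel for the "clunky functional code" point, here is a minimal sketch of a word count in functional-style Java. Note the assumptions: it uses plain `java.util.stream` from the JDK as a stand-in, not the Spark Java API, and the `WordCount` class and `countWords` method are hypothetical names made up for this illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical example class: plain JDK streams, NOT the Spark API.
public class WordCount {

    // Count how often each word occurs across all lines.
    // A TreeMap keeps the result sorted by word.
    static Map<String, Long> countWords(List<String> lines) {
        return lines.stream()
                // split each line on whitespace and flatten into one stream of words
                .flatMap(line -> Arrays.stream(line.split("\\s+")))
                // drop empty tokens produced by leading whitespace
                .filter(word -> !word.isEmpty())
                // group identical words and count each group
                .collect(Collectors.groupingBy(word -> word, TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        List<String> lines = List.of("spark scala java", "spark python", "spark");
        // prints {java=1, python=1, scala=1, spark=3}
        System.out.println(countWords(lines));
    }
}
```

It works, but the explicit types, the `Collectors` plumbing and the surrounding class are the kind of ceremony I mean. The same pipeline in Scala or Python typically needs less of it.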




 
Ranch Hand
Posts: 2512
13

Christopher Webster wrote:
There is no good reason to use Java with Spark.



While Python is the preferred choice for Spark ML, I think other cases are different. Suppose the team has developers who are strong in Java rather than Scala. If we go with Java, we still have the option of moving to Scala later, once the team has that skill set. If we go with Python today, that is a different route altogether, and a later move to Scala becomes less likely. The reason is that Scala and Java, as JVM languages, have more in common with each other than Scala and Python do.
 