posted 7 years ago
Other than already knowing Java, not really. Unlike Python or Groovy, Scala and Java are both statically typed.
Benefits of using Scala:
You do not need to declare types for variables because the compiler infers them, but you might want to anyway for practical reasons such as readability.
Spark is written in Scala, so you have more libraries available and full access to Spark libraries that you won't have in Java.
The same logic takes less code than it would in Java.
Simpler, more functional coding.
Import aliases, and the ability to import all members of a class or object so you can use them as if they were defined locally. Sort of like Python.
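A rough sketch of those points, assuming Scala 2.13 or 3 run as a script (for example with scala-cli). `Event` and the sample data are made up for illustration:

```scala
import java.util.{ArrayList => JList}  // import alias: rename a class on import
import scala.math._                     // import every member of scala.math

// Type inference: no annotation needed, but the type is still fixed at compile time
val count = 42                // inferred as Int
val label: String = "events"  // explicit annotation, optional but sometimes clearer

// Less code: a case class gets equals/hashCode, toString and copy for free
case class Event(id: Int, kind: String)

val events = List(Event(1, "click"), Event(2, "view"), Event(3, "click"))

// Functional style: group and count in one expression
val countsByKind: Map[String, Int] =
  events.groupBy(_.kind).map { case (kind, evs) => kind -> evs.size }

// The aliased class and the imported members are used unqualified
val names = new JList[String]()
names.add(label)
val root = sqrt(16.0)         // scala.math.sqrt, no prefix needed

println(countsByKind)
println(root)
```

The equivalent Java would need explicit types on the collection, a full class with getters, equals and hashCode for `Event`, and a stream pipeline or loop for the counting.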
Cons of using Scala:
You are basically learning a new language, even though it runs on the JVM like Java.
Some things that should be simple can be very frustrating.
There are many ways to write the same Scala code, which can make other people's code hard to read if you don't know all the variants.
For example, a user could write abc.>(6) instead of abc > 6, where the . can be confusing. There are more complex variations too, which can be difficult to follow.
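To make the syntax point concrete, here is a small sketch in plain Scala (nothing Spark-specific; the values are made up) of the same calls written several ways. Each group compiles to the same method call:

```scala
val abc = 10

// Both lines call the same method, Int.>(x: Int): Boolean
val infix  = abc > 6     // operator (infix) syntax
val dotted = abc.>(6)    // explicit method-call syntax

val words = List("spark", "scala", "java")

// Ordinary one-argument methods can be written either way too
val a = words.mkString(", ")   // dot and parentheses
val b = words mkString ", "    // infix notation

// Passing a function: parentheses, braces, or infix all work
val l1 = words.map(_.length)
val l2 = words.map { w => w.length }
val l3 = words map (_.length)

println(s"$infix $dotted $a $l1")
```

Newer Scala 3 versions steer people away from infix calls on ordinary named methods like `mkString`, but plenty of existing code still uses them.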
These are just my personal opinions. I've gone from Java to Groovy to Python to Scala.
Groovy is my favorite because it is extremely flexible and lets you write the least code, but it has some overhead.
Scala and Python are my next favorites.
What I really hate about Scala is the strict typing; it makes writing UDFs difficult. Scala has an Either type, but Spark doesn't support it as a UDF return type, even if, for example, every value you return would fit in a StringType column.
Groovy doesn't care what type you return; if you don't declare one, it will find the best match.
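One workaround, sketched under the assumption that you are fine encoding errors as strings: keep Either inside your own code and collapse it to a plain String before it leaves the UDF. `parseQty` and `parseQtyForUdf` are hypothetical names, and the Spark registration is only shown as a comment so the snippet runs without Spark on the classpath:

```scala
// Hypothetical validation logic that naturally returns Either
def parseQty(raw: String): Either[String, Int] =
  raw.trim.toIntOption.toRight(s"not a number: '$raw'")

// Spark can't derive a schema for Either, so fold both sides into one
// type it can encode (String, i.e. StringType) before returning
def parseQtyForUdf(raw: String): String =
  parseQty(raw).fold(err => s"ERROR: $err", qty => qty.toString)

// In Spark this could then be wrapped with something like
//   org.apache.spark.sql.functions.udf(parseQtyForUdf _)
// (not executed here, to keep the sketch free of Spark dependencies)

println(parseQtyForUdf(" 12 "))
println(parseQtyForUdf("abc"))
```

You lose the typed error on the Spark side, so downstream code has to check for the `ERROR:` prefix, which is exactly the kind of boilerplate Groovy wouldn't make you write.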
IMO iterating over dynamic JSON is practically impossible in Scala as well. In Groovy or Python you don't need to know the key names or their types ahead of time, but in Scala you do. There are some libraries, but none are great.
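For what it's worth, this is roughly what walking unknown JSON looks like in plain Scala, assuming some library has already parsed it into nested `Map[String, Any]` / `List[Any]` values (that representation, the sample document, and the `flatten` helper are assumptions for illustration). It works, but every value has to be pattern-matched on its runtime type:

```scala
// Assumption: a JSON library already parsed the document into plain
// Scala collections: objects -> Map[String, Any], arrays -> List[Any]
val doc: Map[String, Any] = Map(
  "name"    -> "job-1",
  "retries" -> 3,
  "tags"    -> List("etl", "nightly"),
  "owner"   -> Map("team" -> "data", "oncall" -> true)
)

// Walk the tree without knowing the keys in advance, producing
// (path, value-as-string) pairs like "owner.team" -> "data"
def flatten(value: Any, path: String = ""): List[(String, String)] = value match {
  case m: Map[_, _] =>
    m.toList.flatMap { case (k, v) =>
      val key = if (path.isEmpty) k.toString else s"$path.$k"
      flatten(v, key)
    }
  case xs: List[_] =>
    xs.zipWithIndex.flatMap { case (v, i) => flatten(v, s"$path[$i]") }
  case other =>
    List(path -> s"$other")
}

flatten(doc).foreach { case (k, v) => println(s"$k = $v") }
```

In Python the same loop is just `for k, v in d.items()` with no type cases, which is the gap I'm complaining about.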
Overall, I'd say go with Scala.
Python is not yet supported by the newer Kafka libraries, and it has less support than Java, which in turn has less support than Scala.