G Mukherjee wrote:
Which criteria should be applied while choosing the specific language out of the supported ones for the data analysis scripts on Databricks? Does one provide benefits over the others?
I believe the main thing to consider is who is going to use the language and how large the knowledge gap is. Will data scientists be targeting Databricks, or only engineers? Data scientists are often more familiar with Python, R, and SQL, while software engineers may prefer Java or Scala. Are people willing to standardize on one language and learn it if they are not already familiar with it? In my experience, this is the main challenge in a data team, since different disciplines come with different language backgrounds.
In terms of mechanics, I believe all Databricks languages target the same underlying API (Apache Spark), so there shouldn't be any major differences in capabilities. That makes this mostly a people and standardization problem: having multiple languages makes the codebase harder to maintain, while standardizing on one requires training for part of the organization.