I am planning to build a data-intensive application that will serve a large number of users across a host of product services. My question is about the technology stack: should I use Java or Scala as the language of choice, alongside a stack that includes Apache Solr, Kafka, Hadoop, Pig, Hive, HBase, and MongoDB (JSON documents)?
Any pointers on how to think through this problem before picking a tech stack for the final product?
That's a pretty heavy stack, so you might want to try out a few simple "spikes" in both Scala and Java to get a feel for how each works with these technologies.
We've been looking at Kafka with Scala, which seems fairly straightforward so far. We're also using Apache Spark's Scala API, which works well, and we're currently looking at combining Spark with Cassandra (a highly scalable column-family NoSQL database), which looks promising. MongoDB with Spark, by comparison, seems a bit clunky.
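One reason the Spark Scala API feels natural is that it mirrors the standard Scala collections API. Here's a plain-Scala sketch of the kind of transformation pipeline involved (no Spark dependency; ordinary collections stand in for an RDD, so this shows the shape of the code rather than cluster execution, and the object/method names are my own):

```scala
// Plain-Scala sketch of a Spark-style pipeline. In real Spark code,
// flatMap/map would be RDD operations and the groupBy+sum step would
// be reduceByKey, which triggers a shuffle across the cluster.
object SparkStyleSketch {
  def wordCount(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(_.split("\\s+"))   // split lines into words (RDD.flatMap)
      .filter(_.nonEmpty)         // drop empty tokens
      .map(word => (word, 1))     // pair each word with a count (RDD.map)
      .groupBy(_._1)              // stands in for the shuffle
      .map { case (w, pairs) => (w, pairs.map(_._2).sum) } // reduceByKey

  def main(args: Array[String]): Unit =
    println(wordCount(Seq("kafka spark", "spark cassandra spark")))
}
```

Writing the spike with plain collections first, then swapping in the RDD API, is a cheap way to feel out the Scala side of the stack.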
We looked at Hadoop/Hive/Pig, but right now we feel we can meet our needs with NoSQL (Cassandra and possibly MongoDB) for distributed data and Apache Spark for distributed processing. Hadoop is a monster of a stack that imposes a lot of maintenance and admin demands, and we decided we could live without it for our project.
Scala offers interesting opportunities for building scalable applications on the "Reactive Stack" (e.g. Akka, Play, Kafka), partly because functional programming tends to be better suited to distributed processing. Java, on the other hand, probably offers many more libraries and mature tools, especially around Hadoop.
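To make the "functional programming suits distributed processing" point concrete: a pure, associative operation can be computed independently per partition and the partial results combined in any grouping, which is exactly what lets a framework split work across nodes. A minimal sketch (plain Scala; the "partitions" here are just local chunks standing in for nodes, and all names are illustrative):

```scala
// Associativity is what makes a reduce distributable: partial sums
// computed per chunk, combined in any order, give the same answer
// as one sequential pass.
object AssociativeReduce {
  def sum(xs: Seq[Int]): Int = xs.foldLeft(0)(_ + _)

  def distributedSum(xs: Seq[Int], partitions: Int): Int = {
    val chunkSize = math.max(1, xs.size / partitions)
    val partials  = xs.grouped(chunkSize).map(sum).toSeq // per-"node" work
    sum(partials)                                        // combine results
  }

  def main(args: Array[String]): Unit = {
    val data = 1 to 100
    println(sum(data))               // sequential
    println(distributedSum(data, 4)) // same result, split four ways
  }
}
```

A non-associative or side-effecting operation breaks this guarantee, which is why both Spark and Akka-style designs push you toward pure functions over immutable data.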
So your best approach is probably to do some experiments for yourself.