
Robert-Zsolt Kabai

Greenhorn
since Aug 16, 2011

Recent posts by Robert-Zsolt Kabai

Welcome to the staff, and thank you for being available for our questions.
Hi,

I'm wondering how data access and interoperability will evolve in the near future. As authors, you may have some information, a vision, or an opinion about that.
While the current data access methods are fine, the only way for Mahout algorithms to use data from other Hadoop projects that implement some form of table storage (Hive, Cassandra, HBase) is a series of data extractions and transformations, which is quite painful because multiple HDFS writes are necessary. The root cause is that I can't simply point Mahout at one of these tables. First we extract the data from Hive/Cassandra/HBase and write it to a CSV file on HDFS, then we convert that CSV data into the vector format that Mahout algorithms can consume. That is a lot of I/O work, which means a lot of time and resources.
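To make the conversion step concrete, here is a minimal sketch of the CSV-to-vector parsing it involves. The class and method names (`CsvToVector`, `parseRow`) are hypothetical; in an actual Mahout job the resulting values would be wrapped in a `DenseVector`/`VectorWritable` and written to a `SequenceFile` on HDFS, but those classes are omitted here so the sketch stays self-contained:

```java
// Hypothetical helper illustrating the CSV-to-vector conversion step
// that today sits between the table-storage export and a Mahout job.
public class CsvToVector {

    // Parse one CSV row of numeric features into a dense vector.
    // In Mahout this double[] would typically back a DenseVector,
    // wrapped in a VectorWritable and written to a SequenceFile.
    public static double[] parseRow(String csvLine) {
        String[] fields = csvLine.split(",");
        double[] vector = new double[fields.length];
        for (int i = 0; i < fields.length; i++) {
            vector[i] = Double.parseDouble(fields[i].trim());
        }
        return vector;
    }

    public static void main(String[] args) {
        double[] v = parseRow("1.0, 2.5, 0.0");
        System.out.println(v.length + " " + v[1]); // 3 2.5
    }
}
```

Every row of the exported table has to pass through a step like this, which is exactly the repeated read-parse-write cycle I'd like to see eliminated.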

Do you see these operations and the dataflow between these tools evolving to become more efficient? After all, we have data storage tools and data analytics tools (like Mahout), and the need for efficient data flow between them is obvious. I've seen an incubator project, named HCatalog, that was recently started to standardize table data and help interoperability. Do you think this may be the short- or long-term answer to the question?

Thank you for your answers.
Robert

First of all, greetings to the authors, and thanks for being available for questions.

In the past few years we've seen considerable progress across all Hadoop-related projects, including Mahout.
How can the book keep up with the pace of change in the Hadoop ecosystem?
Since Mahout is still heavily under construction, with many more features and algorithms yet to come, can we expect a second release of the book later on? Is there a plan to follow the roadmap and release an updated edition once Mahout reaches certain milestones?

Thank you for your answers.

Robert