Robert-Zsolt Kabai

Greenhorn
since Aug 16, 2011

Recent posts by Robert-Zsolt Kabai

Welcome to the staff and thank you for being available for our questions.
Hi,

I'm wondering how data access and interoperability will evolve in the near future. As authors, you may have some information, a vision, or an opinion about that.
While the current data access methods are fine, the only way for Mahout algorithms to use data from other Hadoop projects that implement table storage (Hive, Cassandra, HBase) is a series of data extractions and transformations, which is quite painful because it requires multiple HDFS writes. This is because I can't simply point Mahout at one of these tables. First we extract the data from Hive/Cassandra/HBase and write it to a CSV file on HDFS, then we convert that CSV data to the vector format that Mahout algorithms can consume. That is a lot of I/O work, and it costs a lot of time and resources.
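
To make the two-step flow above concrete, here is a minimal sketch in plain Python. It does not use Hive, HBase, HDFS, or Mahout at all: the in-memory `table_rows`, the helper names `export_rows_to_csv` and `csv_to_vectors`, and the row layout (an ID followed by numeric features) are all assumptions made up for illustration. The point is only that the same data gets serialized once to CSV and then parsed and rewritten again as vectors.

```python
import csv
import io


def export_rows_to_csv(rows):
    """Step 1 (hypothetical): dump table rows to CSV text.

    The returned string stands in for the CSV file written to HDFS.
    """
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerows(rows)
    return buf.getvalue()


def csv_to_vectors(csv_text):
    """Step 2 (hypothetical): parse the CSV text into (row_id, vector) pairs.

    In the real pipeline this result would be written out a second time
    in whatever vector format the algorithm expects.
    """
    vectors = []
    for record in csv.reader(io.StringIO(csv_text)):
        row_id, *features = record
        vectors.append((row_id, [float(x) for x in features]))
    return vectors


# Made-up rows standing in for a Hive/Cassandra/HBase table.
table_rows = [("u1", 1.0, 0.0, 3.5), ("u2", 0.0, 2.0, 0.0)]

csv_text = export_rows_to_csv(table_rows)  # first full pass over the data
vectors = csv_to_vectors(csv_text)         # second full pass over the data
```

Every row is touched twice here, and on a cluster each pass is a full read and write, which is the overhead I'm asking about.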

Do you see these operations and the data flow between these tools becoming more efficient? After all, we have data storage tools and data analytics tools (like Mahout), and the need for efficient data flow between them is obvious. I've seen a recent incubator project, HCatalog, that tries to standardize table data to some extent to help interoperability. Do you think this may be the short-term or long-term answer to the question?

Thank you for your answers.
Robert

First of all, greetings to the authors, and thanks for being available for questions.

In the past few years we've seen considerable progress in all Hadoop-related projects, including Mahout.
How can books keep up with the pace of change in the Hadoop ecosystem?
Since Mahout is still under heavy development, with many more features and algorithms yet to come, can we expect a second edition of the book later on? Is there a plan to follow the roadmap and release an updated edition once Mahout reaches certain milestones?

Thank you for your answers.

Robert