Hi guys: for those of you who use Hadoop, do you manage your data directly? Or do you just dump it all into HBase?
- In general, most people I hear of using Hadoop do so to store millions or billions of records for map/reduce.
- The Hadoop m/r API can read from and write to HBase tables directly.
- Storing the results of m/r jobs in programmatically created folders, rather than in a single machine-managed map (like HBase), seems error-prone in the subsequent read and data-cleaning stages.
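
To make the last point concrete, here is a rough sketch of the concern, not real Hadoop or HBase code. It assumes a hypothetical folder layout of <root>/<job>/<run_id>/part-00000 and uses a plain Python dict as a stand-in for an HBase table keyed by (job, key). All function names are made up for illustration.

```python
import os
import tempfile


def write_run_to_folder(root, job, run_id, records):
    """Write one run's output under <root>/<job>/<run_id>/part-00000 (hypothetical layout)."""
    run_dir = os.path.join(root, job, run_id)
    os.makedirs(run_dir, exist_ok=True)
    with open(os.path.join(run_dir, "part-00000"), "w") as f:
        for key, value in records:
            f.write(f"{key}\t{value}\n")
    return run_dir


def list_runs(root, job):
    """Return every run folder a downstream reader has to choose between (or clean up)."""
    return sorted(os.listdir(os.path.join(root, job)))


def write_run_to_table(table, job, records):
    """Store each result under a (job, key) row key; a rerun overwrites the old value."""
    for key, value in records:
        table[(job, key)] = value


# Folder approach: two runs of the same job leave two folders behind.
root = tempfile.mkdtemp()
write_run_to_folder(root, "wordcount", "run-001", [("a", 1), ("b", 2)])
write_run_to_folder(root, "wordcount", "run-002", [("a", 3), ("b", 4)])
folder_runs = list_runs(root, "wordcount")

# Keyed-map approach (dict standing in for a table): the rerun replaces the rows.
table = {}
write_run_to_table(table, "wordcount", [("a", 1), ("b", 2)])
write_run_to_table(table, "wordcount", [("a", 3), ("b", 4)])
```

With folders, every later stage has to know the naming convention and decide which run is current and which folders are stale. With a keyed map, that bookkeeping is pushed into the store, at the cost of losing the older run unless you version it yourself.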