I need to run analytics with the Hadoop framework on data generated by the users of my application. The data is stored in a Neo4j database.
From what I understand, in order to analyse this data I need to push it into HDFS and then run batch programs over it.
However, I cannot find a tool or connector that can seamlessly transfer the graph data from Neo4j to HDFS. Could anybody please suggest what the standard approach is for such use cases? Or am I doing something completely wrong?
The default InputFormat in Hadoop MapReduce is TextInputFormat, which reads plain text files (typically stored in HDFS) line by line. Reading from HDFS this way also gives high performance.
There are other input formats supported as well. You can check out this API page to find out more.
For example, DBInputFormat is provided for using relational databases as the input. This means you can read from the database directly (without going through HDFS first), but I would expect that to hurt performance; I am not sure by how much.
However, Hadoop itself does not provide an input format specifically for Neo4j, so you should check whether one is available on the Neo4j site.
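If no ready-made connector turns up, one possible fallback is the plain-text route described above: flatten the graph into a line-oriented edge list that a text-based input format can read one record per line. The following is only a minimal sketch of that idea, not a standard Neo4j export. The function name `write_edge_list`, the tab-separated `source, type, target` layout, and the sample data are all assumptions made for illustration; in practice the relationships would come from whatever query or export mechanism you use against Neo4j.

```python
import io


def write_edge_list(edges, out):
    """Write one tab-separated line per relationship: source, type, target.

    `edges` is any iterable of (source_id, relationship_type, target_id)
    tuples. Returns the number of lines written.
    """
    count = 0
    for source, rel_type, target in edges:
        fields = [str(source), str(rel_type), str(target)]
        # A tab or newline inside a field would corrupt the line-oriented
        # format, so reject such values instead of writing a broken record.
        for field in fields:
            if "\t" in field or "\n" in field:
                raise ValueError("field contains a tab or newline: %r" % field)
        out.write("\t".join(fields) + "\n")
        count += 1
    return count


# Hypothetical sample data standing in for relationships read from the graph.
sample_edges = [
    (1, "FOLLOWS", 2),
    (2, "FOLLOWS", 3),
    (1, "LIKES", 3),
]

buffer = io.StringIO()
written = write_edge_list(sample_edges, buffer)
edge_list_text = buffer.getvalue()
```

Writing to a real file instead of the in-memory buffer gives you something you can copy into HDFS (for example with `hdfs dfs -put`), where each line then arrives in the mapper as one text record. Node properties would need a second file or extra columns; this sketch covers relationships only.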