Hadoop for Neo4j

I need to run analytics using the Hadoop framework on data generated by the users of my application. The data is stored in a Neo4j database.
From what I understand, in order to analyse this data, I need to push it into HDFS and then run the batch programs as part of my analysis.
However, I cannot find a tool / connector that could seamlessly transfer the graph data in Neo4j to HDFS. Could anybody please suggest what the standard approach is for such use cases? Or am I doing something completely wrong?
Hi titash,
Welcome to the Ranch!

The default InputFormat in Hadoop MapReduce is TextInputFormat, which reads plain text files (stored in HDFS) line by line - this also gives high performance.

There are other input formats supported as well. You can check out this API page to find out more.
For example, DBInputFormat is provided for using an RDBMS as the input. This means you can connect to the database directly (and not really use HDFS for the input), but one would expect that to hurt performance. I am not sure by how much.

However, Hadoop itself does not provide one specifically for Neo4j. So, you should check whether one is available on the Neo4j site.
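
If no ready-made connector turns up, one fallback is to go through the plain text route mentioned above: flatten the graph into one line per relationship, write that to a file, and copy the file into HDFS. Below is a minimal sketch of the flattening step only, in plain Java. The EdgeListExport class, the Edge type and the tab-separated layout are all made up for illustration (they are not part of Hadoop or Neo4j), and reading the relationships out of Neo4j is left out entirely.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Hadoop or Neo4j: turns relationships
// into one tab-separated line each, i.e. a line-per-record text file.
final class EdgeListExport {

    // Illustrative stand-in for a relationship already read from the graph.
    static final class Edge {
        final long startId;
        final String type;
        final long endId;

        Edge(long startId, String type, long endId) {
            this.startId = startId;
            this.type = type;
            this.endId = endId;
        }
    }

    // Formats one relationship as "startId<TAB>type<TAB>endId".
    // Assumption: the type must not contain tabs or line breaks,
    // otherwise the line could not be split back into three fields.
    static String toLine(Edge e) {
        if (e.type.contains("\t") || e.type.contains("\n") || e.type.contains("\r")) {
            throw new IllegalArgumentException("type contains a separator: " + e.type);
        }
        return e.startId + "\t" + e.type + "\t" + e.endId;
    }

    // Converts every relationship to its text line, keeping the input order.
    static List<String> toLines(List<Edge> edges) {
        List<String> lines = new ArrayList<>();
        for (Edge e : edges) {
            lines.add(toLine(e));
        }
        return lines;
    }

    // Writes the lines to a local file. Copying that file into HDFS
    // (for example with "hdfs dfs -put") is a separate step.
    static void write(List<Edge> edges, Path target) throws IOException {
        Files.write(target, toLines(edges), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        List<Edge> edges = Arrays.asList(
                new Edge(1, "KNOWS", 2),
                new Edge(2, "LIKES", 3));
        Path out = Files.createTempFile("edges", ".tsv");
        write(edges, out);
        System.out.println("Wrote " + edges.size() + " lines to " + out);
    }
}
```

Each mapper would then receive one relationship per line and can split it on the tab character. Whether an edge list like this is enough depends on the analysis; node properties would need their own file or extra columns.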