Amy Hodler

Author
since Sep 03, 2019

Recent posts by Amy Hodler

Hi Everyone,

I've enjoyed answering questions this week on graph algorithms and I wanted to leave behind some resources in one spot:

  • Free digital copy of the O'Reilly Graph Algorithms book: https://neo4j.com/graph-algorithms-book/?utm_source=coderanch
  • For general Spark questions, subscribe to users@spark.apache.org at the Spark Community page
  • For learning Neo4j, the Developer Resources page aggregates training, sandboxes, etc.: https://neo4j.com/developer/
  • For Neo4j questions, visit the Neo4j Community: https://community.neo4j.com/
  • Also, the Neo4j Online Developer Conference is coming next week! https://neo4j.com/online-summit/


    Hi Sean,

    I love book recommendations! I've added yours to my queue.  

    After looking at the chapters, it seems the two that you mentioned cover some graph algorithms, with a particular focus on pathfinding and how the algorithms work.

    The Graph Algorithms book includes pathfinding, community detection, and centrality algorithms. (We also slipped some Link Prediction algos into the last chapter.) We spend a lot of time helping people understand when to use the different algorithms, with examples. And then we have 2 chapters that use fictional workflows to provide an end-to-end feel for how they might work in a real-world scenario. So overall, I'd say this book is intended as a practical guide to give people the confidence to apply graph algorithms in either Spark or Neo4j. There's some overlap in that we also try to explain how these algorithms work, but we're lighter on theory and heavier on usage.

    Now I have to share a list of books I like.  
    https://neo4j.com/blog/top-13-resources-graph-theory-algorithms/
    My all-time algo favorite is The Algorithm Design Manual by Dr. Skiena.

    Very true, Tim!

    I'm often thinking about graphs from the analysis perspective (i.e. Graph Theory stuff), which of course is just one area.

    (graphs)-[ARE]->(everywhere)  

    :-)  

    Hi D.J.

    Graphs aren't scary once you get to know them. They are actually immensely fun. I have 3 Neo4j recommendations:

    1) Neo4j has some free online tutorials so you can get a level set of the basics: https://neo4j.com/graphacademy/

    2) Then the Sandboxes!  https://neo4j.com/sandbox-v2/
    Anyone can spin up a free sandbox that is hosted online with pre-loaded data. This was my first introduction to Neo4j and it was great because I knew there was no way I could break anything. Also, because you have preloaded data and some tutorials (click button -> auto-runs a query) you have some basics built in that you can then change. There's an algorithm one built on Game of Thrones, your own Twitter account, and some serious ones like crime investigation.

    3) And if you just need inspiration, try the GraphGists which are example graphs that developers share online: https://neo4j.com/graphgists/

    Please have fun!

    Hi Alex,

    We used spark-2.4.0-bin-hadoop2.7 in the book but there is a newer 2.4 version that I would expect to work.  http://spark.apache.org/downloads.html
    You can download Neo4j here: https://neo4j.com/download/ and there's a nice post on adding the algorithms plug-ins here: https://medium.com/neo4j/explore-new-worlds-adding-plugins-to-neo4j-26e6a8e5d37e

    We did not include information about graph databases because it was well covered in another O'Reilly book. If you're interested in the Neo4j Graph Database, there is a book with that focus: https://neo4j.com/lp/book-graph-databases


    Hi Divya,

    I'm not sure I totally understand your question, so let me know if I don't answer it.

    Our focus is on graph algorithms that can be used for analysis or feature engineering.  We've seen graph features greatly improve the ROC curves for ML predictions, and knowledge graphs help by adding more context to AI systems. With regard to features for training ML models, we see a lot of use of different community detection algorithms for fraud detection, centrality algos for finding influencers for recommendations, and a mix for disambiguation.  We also see quite a bit of use of similarity and link prediction algorithms for feature engineering.
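To illustrate what a centrality score for "finding influencers" looks like, here is a minimal sketch of PageRank by power iteration in plain Python. It is not code from the book; the `follows` data, the damping factor of 0.85, and the fixed iteration count are assumptions chosen for illustration, and Spark or Neo4j would normally run this for you.

```python
def pagerank(out_links, damping=0.85, iterations=50):
    """Approximate PageRank for a directed graph given as node -> list of targets.

    Assumes every target also appears as a key in out_links.
    """
    nodes = sorted(out_links)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by nodes with no outgoing links is shared evenly.
        dangling = sum(rank[node] for node in nodes if not out_links[node])
        new_rank = {
            node: (1 - damping) / n + damping * dangling / n for node in nodes
        }
        # Each node passes its rank in equal parts to the nodes it points at.
        for node in nodes:
            for target in out_links[node]:
                new_rank[target] += damping * rank[node] / len(out_links[node])
        rank = new_rank
    return rank


# Hypothetical "who follows whom" data: three people follow cat.
follows = {"ann": ["cat"], "bob": ["cat"], "dan": ["cat"], "cat": ["ann"]}
scores = pagerank(follows)
top_influencer = max(scores, key=scores.get)
```

The resulting score per node is the kind of number that can be exported as a feature column for an ML model.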

    If you're interested in running ML/DL inside of a graph, this is a great paper on the concepts: https://blog.acolyer.org/2018/09/19/relational-inductive-biases-deep-learning-and-graph-networks/  (Note that I think we are several years from seeing this type of solution in the market.)

    Hi Awais,

    Our focus in this book is on using graph algorithms for analysis and feature engineering for machine learning. (More classical graph theory uses.) We do not include any content on neural networks.  However, our team is extremely interested in Graph Native Learning as outlined in the Google DeepMind paper: https://blog.acolyer.org/2018/09/19/relational-inductive-biases-deep-learning-and-graph-networks/.  We believe that in the future people will be running ML/DL inside graphs but this is going to take time to emerge.

    1 - I don't believe you must learn graph theory to understand NN, but as I mentioned above, I believe they point to some promising directions. And Graph Theory itself is just plain fun.
    2 - Graphs help with AI in 2 big ways today:
  • For graph feature engineering, because relationships are often the strongest predictors of behavior. This is the focus of Chapter 8 of the book.
  • By using Knowledge Graphs to help AI systems make better heuristic decisions by adding context.

    3 - Yes!  There are many, many examples with sample data and code on GitHub you can play with.

    Hi Fei,

    It's a bit of both but the bulk of the material is focused on hands-on examples.

    The first two chapters provide background on graph and graph analytics concepts for those who are new to the ideas. We have 3 chapters that deal with the classic graph algorithm categories: pathfinding, centrality, and communities.  In those chapters, we go into some detail on key algorithms and how they work (how they calculate), then give an overview of example uses, and finally show code for how to run them in Spark / Neo4j.  We also have 2 chapters that are examples of workflows to solve problems like recommendations and link prediction.

    If you're debating on whether to pick up a hard copy, you can download a free digital version for a while at https://neo4j.com/graph-algorithms-book/.

    Hi Paul,

    I wrote a reply on "What are graph algorithms" that may also be helpful for some of the basic concepts:
    https://coderanch.com/t/717615/AI-artificial-intelligence-machine-learning/engineering/Graph-Algorithms

    To answer your question more specifically, graph algorithms used in AI scenarios are generally the ones employed for graph feature engineering for better ML predictions.  The algorithms we see used frequently are often related to:
  • Communities: For example, Clustering Coefficient, Triangles, Label Propagation, and Union Find
  • Centralities (influencers): For example, PageRank and Betweenness Centrality
  • Similarities: For example, Cosine Similarity and Jaccard Similarity
  • Link Prediction: For example, Common Neighbors and Preferential Attachment

    You can definitely use graphs to map friends' connections, but when they are used in the context of ML/AI we're often converting to a metric that can be learned on. For example, maybe we want to learn based on the number of friends (the node degree) or the community label based on tight groups of friends.
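To make the "metric that can be learned on" idea concrete, here is a minimal sketch in plain Python (not code from the book) of node degree plus three of the pairwise scores named above. The `friends` data is hypothetical, and applying Jaccard Similarity to neighbor sets is one common choice assumed here; it can be applied to any pair of sets.

```python
# Hypothetical friendship graph: node -> set of friends (undirected).
friends = {
    "Ann": {"Bob", "Cat"},
    "Bob": {"Ann", "Cat"},
    "Cat": {"Ann", "Bob", "Dan"},
    "Dan": {"Cat"},
}


def degree(graph, node):
    """Number of friends (relationships) a node has."""
    return len(graph[node])


def common_neighbors(graph, a, b):
    """Link prediction score: how many friends a and b share."""
    return len(graph[a] & graph[b])


def preferential_attachment(graph, a, b):
    """Link prediction score: product of the two nodes' degrees."""
    return len(graph[a]) * len(graph[b])


def jaccard_similarity(graph, a, b):
    """Shared friends divided by all distinct friends of either node."""
    union = graph[a] | graph[b]
    if not union:
        return 0.0
    return len(graph[a] & graph[b]) / len(union)
```

For example, Ann and Dan are not yet friends, but `common_neighbors(friends, "Ann", "Dan")` counts their one shared friend (Cat), which is the sort of signal a link prediction model can learn from.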
    Graph Algorithms can be used in many different systems, so you do not need to use Spark or Neo4j to use graph algorithms.  We use Spark and Neo4j to showcase graph algorithms in the book because they both had unique qualities. Spark is a popular scale-out computing framework with libraries to support a variety of data science workflows. Neo4j offers a high-performance graph-native platform with over 45 graph algorithms and the ability to persist graphs.

    Most people use a graph query when investigating limited areas of a graph or a specific question, for example, how many hops there are between nodes A and B.  However, we would want to use a graph algorithm when we are looking for a more holistic analysis of the graph structure, for example, finding all the communities based on the number of relationships among nodes.
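The contrast between a targeted question and a whole-graph analysis can be sketched in plain Python. This is an illustration under stated assumptions, not code from the book: the toy `graph` is hypothetical, `hops_between` uses breadth-first search to answer the local question, and `connected_components` stands in for community detection as the simplest whole-graph grouping (real community algorithms are more involved).

```python
from collections import deque

# Hypothetical undirected graph: node -> set of neighbors.
graph = {
    "A": {"C"},
    "C": {"A", "B"},
    "B": {"C"},
    "X": {"Y"},
    "Y": {"X"},
}


def hops_between(graph, start, goal):
    """Targeted question: fewest relationships from start to goal, or None."""
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        for nxt in graph[node]:
            if nxt == goal:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


def connected_components(graph):
    """Whole-graph analysis: group all nodes into sets of mutually reachable nodes."""
    seen = set()
    components = []
    for start in sorted(graph):
        if start in seen:
            continue
        component = {start}
        seen.add(start)
        stack = [start]
        while stack:
            node = stack.pop()
            for nxt in graph[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    component.add(nxt)
                    stack.append(nxt)
        components.append(component)
    return components
```

The first function touches only the part of the graph around A and B, while the second has to visit every node and relationship before it can answer.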

    If you do not need to store your graph and you're working with a small dataset, there are a number of platforms you might use. NetworkX has quite a few algorithms and is often used in academic settings with small graphs.  

    We wrote the book so that the concepts about graph analysis and how certain algorithms calculate results could be applied more generally.  For those just getting started, you might want to download the free digital copy and check out the first few chapters: https://neo4j.com/graph-algorithms-book/



    Hi Carl!
    Let's look at this in three parts: graphs, their algorithms, and what that has to do with AI/ML.

    1) Graphs:
    Graphs are uniquely suited to manage/analyze connected data because they are, very simply, a mathematical representation of a network. The objects that make up graphs are called nodes (or vertices) and the links between them are called relationships (or edges).
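As a small illustration of that definition, here is one way to hold nodes and relationships in plain Python. The names and data are hypothetical, and an adjacency map is only one of several possible representations.

```python
from collections import defaultdict

# Hypothetical toy data: each pair is one undirected relationship (edge).
relationships = [("Ann", "Bob"), ("Bob", "Cat"), ("Ann", "Cat"), ("Cat", "Dan")]


def build_graph(edges):
    """Return an adjacency map: node -> set of neighboring nodes."""
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    return dict(adjacency)


network = build_graph(relationships)
nodes = sorted(network)  # the nodes (vertices) of the graph
```

Here `network["Cat"]` is the set of nodes that Cat has a relationship with.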

    2) Graph Algorithms:
    Graph algorithms are built to operate on relationships and are exceptionally capable of finding structures and revealing patterns in connected data. These algorithms calculate metrics based on the relationships between things.  (In other words, we would be as interested, or more interested, in how many connections someone has and what types of relationships those are.) Graph algorithms serve us well when we need to understand structures and relationships to do things like forecast behavior, prescribe actions for dynamic groups, or find predictive components and patterns in our data.

    3) AI/ML Fit:
    Only a few algorithms (like Label Propagation) are considered by some to be ML itself.  However, improving the accuracy of machine learning predictions is a popular use of graph algorithms. We know that relationships are some of the strongest predictors of behavior. We also know that more information makes our ML models more predictive, but that data scientists rarely have all the information they want to train on. Graph algorithms help us incorporate this highly predictive information (that we already have hidden in the data!) to increase the accuracy, precision, and recall of machine learning models.

    People use graph algorithms for graph feature engineering to create scores that they can extract and use in their machine learning pipelines. In the last chapter of the book, we walk through an example and compare the predictive quality of different models as we add more “graphy” features.
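A rough sketch of what extracting such scores can look like, in plain Python rather than the Spark or Neo4j code used in the book: each node becomes one row of "graphy" features that could be joined onto an existing training table. The `social` data and the choice of features (degree, triangle count, local clustering coefficient) are assumptions for illustration.

```python
from itertools import combinations

# Hypothetical undirected graph: node -> set of neighbors.
social = {
    "Ann": {"Bob", "Cat"},
    "Bob": {"Ann", "Cat"},
    "Cat": {"Ann", "Bob", "Dan"},
    "Dan": {"Cat"},
}


def triangle_count(graph, node):
    """Count pairs of this node's neighbors that are also connected to each other."""
    return sum(
        1 for a, b in combinations(sorted(graph[node]), 2) if b in graph[a]
    )


def local_clustering(graph, node):
    """Fraction of possible neighbor pairs that are actually connected."""
    k = len(graph[node])
    if k < 2:
        return 0.0
    return triangle_count(graph, node) / (k * (k - 1) / 2)


def graph_features(graph):
    """Build one feature row per node, ready to join onto other training data."""
    return [
        {
            "node": node,
            "degree": len(graph[node]),
            "triangles": triangle_count(graph, node),
            "clustering": local_clustering(graph, node),
        }
        for node in sorted(graph)
    ]


rows = graph_features(social)
```

Each dictionary in `rows` can be written out as a table row and fed to whatever ML library the pipeline already uses.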

    Hi Everyone. Super excited to be answering questions this week. Thanks for the warm welcome!

    -AH