Feineigle.com - O'reilly Graph Databases

Home · Book Reports · 2016 · O'reilly Graph Databases

Published: October 5, 2016
Tags:  Database · Graph · Software



The book in...
One sentence:
Graph database introduction for Neo4j and Cypher.

Five sentences:
Graphs often map more naturally to the entities (nodes) and their relationships than 'traditional' RDBMS do. Business and technical sides can now communicate because the whiteboard model is the model. The questions you want to ask of your data will drive how to model your graph. Graphs, as opposed to other NOSQL databases, retain ACID transactions, fast lookups, and scalability. While there is no one-size-fits all database solution, graphs are a formidable option in any realm and really shine with densely connected datasets.

designates my notes. / designates important.


Thoughts

Easy to understand introduction to graph databases. Neo4j is, so far, useful and intuitive. In the little I’ve used neo4j (about a week fiddling around with it) it seems very straight forward to use with cypher and py2neo.

I have been wanting to try MongoDB for a while, and this book made me want to look at the newer database options more than ever. Cassandra also looked appealing.

The mapping structure does emulate how I think. The connections or relationships are intuitive. While working on CONected I was able to sketch a few ideas, each covering a particular domain, and then amalgamate them together in one graph. It is too early to tell how well it will work, but it was quick to develop and, again, is intuitively linked to the real world.

The Shakespeare example, with 3 overlapping domains, is great. In developing a graph for CONected, it seems like I'll want to model people in an organization, then multiple of these organizational subgraphs into a larger national and global graphs. A similar model can be constructed of government departments inside national governments inside global governments. Something json-like:
{"World Gov":
  {"Nato": {"USA Gov": {"CIA":["Alice", "Bob", "Charlie"],
                        "Whitehouse":["George", "George", "Bill"],
                        "TV":["Lies", "About", "Fantasies"]},
           "English Gov": {"Parliament":["Tony", "Margaret"],
                            "Monty Python": ["Holy Grail"],
                            "Tea": ["Black", "Green"]}
           }
  }
  {"EU": ["..."] }   
   "Japanese Gov": {"Whales": ["Sushi", "Dinner"],
                    "Anime": ["Luffy", "Chopper"],
                    "Sexbots": ["Beep", "Boop", "Ooh"]}
}

CHAPTER 1 - Introduction:

Page 9:

CHAPTER 2 - Options for Storing Connected Data:

page 12:
page 17:
page 18:

CHAPTER 3 - Data Modeling with Graphs:

page 25:
page 27:
page 28:
page 29:
page 37:
page 39:
START user=node:users(name = 'User 3')
MATCH (user)-[*1..5]-(asset)
WHERE asset.status! = 'down'
RETURN DISTINCT asset
page 40:
page 43:
page 45:
page 46:
START theater=node:venue(name='Theatre Royal'),
      newcastle=node:city(name='Newcastle'),
      bard=node:author(lastname='Shakespeare')
MATCH (newcastle)<-[:STREET|CITY*1..2]-(theater)
      <-[:VENUE]-()-[:PERFORMANCE_OF]->()-[:PRODUCTION_OF]->
      (play)<-[w:WROTE_PLAY]-(bard)
WHERE w.year > 1608
RETURN DISTINCT play.title AS play
page 48:
page 49:
START theater=node:venue(name='Theatre Royal'),
      newcastle=node:city(name='Newcastle'),
      bard=node:author(lastname='Shakespeare')
MATCH (newcastle)<-[:STREET|CITY*1..2]-(theater)
      <-[:VENUE]-()-[p:PERFORMANCE_OF]->()-[:PRODUCTION_OF]->
      (play)<-[:WROTE_PLAY]-(bard)
RETURN play.title AS play, count(p) AS performance_count
ORDER BY performance_count DESC
page 50:
START bard=node:author(lastname='Shakespeare')
MATCH (bard)-[w:WROTE_PLAY]->(play)
WITH play
ORDER BY w.year DESC
RETURN collect(play.title) AS plays
page 61:

CHAPTER 4 - Building a Graph Database Application

page 63:
page 65:
page 66:
page 69:
START ord=node:orders(orderid={orderId})
MATCH (ord)-[:DELIVERY_ADDRESS]->(address)
RETURN address.first_line, address.zipcode
page 71:
START timeline=node:timeline(name={timelineName})
CREATE UNIQUE (timeline)-[:YEAR]->(year{value:{year}, name:{yearName}})
              -[:MONTH]->(month{value:{month}, name:{monthName}})
              -[:DAY]->(day{value:{day}, name:{dayName}})
              <-[:BROADCAST_ON]-(n {newNode})
START timeline=node:timeline(name={timelineName})
MATCH (timeline)-[:YEAR]->(year)-[:MONTH]->(month)-[:DAY]->
      (day)<-[:BROADCAST_ON]-(n)
WHERE ((year.value > {startYear} AND year.value < {endYear})
      OR ({startYear} = {endYear} AND {startMonth} = {endMonth}
          AND year.value = {startYear} AND month.value = {startMonth}
          AND day.value >= {startDay} AND day.value < {endDay})
      OR ({startYear} = {endYear} AND {startMonth} < {endMonth}
          AND year.value = {startYear}
          AND ((month.value = {startMonth} AND day.value >= {startDay})
              OR (month.value > {startMonth} AND month.value < {endMonth})
              OR (month.value = {endMonth} AND day.value < {endDay})))
      OR ({startYear} < {endYear}
          AND year.value = {startYear}
          AND ((month.value > {startMonth})
              OR (month.value = {startMonth} AND day.value >= {startDay})))
      OR ({startYear} < {endYear}
          AND year.value = {endYear}
          AND ((month.value < {endMonth})
              OR (month.value = {endMonth} AND day.value < {endDay}))))
RETURN n
page 72:
page 83:
page 85:
page 89:
page 91:
page 92:
page 94:
page 95:

CHAPTER 5 - Graphs in the Real World

page 103:
page 104:
page 111:
MATCH p=(subject)-[:WORKED_ON]->()-[:WORKED_ON*0..2]-()
      <-[:WORKED_ON]-(person)-[:INTERESTED_IN]->(interest)
page 112:
page 117:
page 118:
page 119:
START admin=node:administrator(name={administratorName})
MATCH paths=(admin)-[:MEMBER_OF]->()-[:ALLOWED_INHERIT]->()
      <-[:CHILD_OF*0..3]-(company)<-[:WORKS_FOR]-(employee)
      -[:HAS_ACCOUNT]->(account)
WHERE NOT ((admin)-[:MEMBER_OF]->()-[:DENIED]->()<-[:CHILD_OF*0..3]-(company))
RETURN employee.name AS employee, account.name AS account
UNION
START admin=node:administrator(name={administratorName})
MATCH paths=(admin)-[:MEMBER_OF]->()-[:ALLOWED_DO_NOT_INHERIT]->()
      <-[:WORKS_FOR]-(employee)-[:HAS_ACCOUNT]->(account)
RETURN employee.name AS employee, account.name AS account
page 120:
page 122:
START admin=node:administrator(name={adminName}),
      company=node:company(resourceName={resourceName})
MATCH p=(admin)-[:MEMBER_OF]->()-[:ALLOWED_INHERIT]->()
      <-[:CHILD_OF*0..3]-(company)
WHERE NOT ((admin)-[:MEMBER_OF]->()-[:DENIED]->()
          <-[:CHILD_OF*0..3]-(company))
RETURN count(p) AS accessCount
UNION
START admin=node:administrator(name={adminName}),
      company=node:company(resourceName={resourceName})
MATCH p=(admin)-[:MEMBER_OF]->()-[:ALLOWED_DO_NOT_INHERIT]->(company)
RETURN count(p) AS accessCount
page 123:
START resource=node:resource(name={resourceName})
MATCH p=(resource)-[:WORKS_FOR|HAS_ACCOUNT*1..2]-(company)
      -[:CHILD_OF*0..3]->()<-[:ALLOWED_INHERIT]-()<-[:MEMBER_OF]-(admin)
WHERE NOT ((admin)-[:MEMBER_OF]->()-[:DENIED]->()<-[:CHILD_OF*0..3]-(company))
RETURN admin.name AS admin
UNION
START resource=node:resource(name={resourceName})
MATCH p=(resource)-[:WORKS_FOR|HAS_ACCOUNT*1..2]-(company)
      <-[:ALLOWED_DO_NOT_INHERIT]-()<-[:MEMBER_OF]-(admin)
RETURN admin.name AS admin
page 132:

CHAPTER 6 - Graph Database Internals

page 141:
page 142:
page 143:
page 150:

CHAPTER 7 - Predictive Analysis with Graph Theory

page 165:
  1. Pick the start and end nodes, and add the start node to the set of solved nodes (that is, the set of nodes with known shortest path from the start node) with value 0 (the start node is by definition 0 path length away from itself).

  2. From the starting node, traverse breadth-first to the nearest neighbors and record the path length against each neighbor node.

  3. Take the shortest path to one of these neighbors (picking arbitrarily in the case of ties) and mark that node as solved, because we now know the shortest path from the start node to this neighbor.

  4. From the set of solved nodes, visit the nearest neighbors (notice the breath-first progression) and record the path lengths from the start node against these new neighbors. Don’t visit any neighboring nodes that have already been solved, because we know the shortest paths to them already.

  5. Repeat steps 3 and 4 until the destination node has been marked solved. Efficiency of Dijkstra’s Algorithm Dijkstra’s algorithm is quite efficient because it computes only the lengths of a relatively small subset of the possible paths through the graph. When we’ve solved a node, the shortest path from the start node is then known, allowing all subsequent paths to safely build on that knowledge.

page 173:
page 175:
page 179:
page 180:

APPENDIX A - NOSQL: Overview

page 185:

{% filter markdown %}

  1. Atomic - All operations in a transaction succeed or every operation is rolled back.

  2. Consistent - On transaction completion, the database is structurally sound.

  3. Isolated - Transactions do not contend with one another. Contentious access to state is mod‐ erated by the database so that transactions appear to run sequentially.

  4. Durable - The results of applying a transaction are permanent, even in the presence of failures {% endfilter %}

{% filter markdown %}

{% filter markdown %}

  1. Basic availability - The store appears to work most of the time.

  2. Soft-state - Stores don’t have to be write-consistent, nor do different replicas have to be mutually consistent all the time.

  3. Eventual consistency - Stores exhibit consistency at some later point (e.g., lazily at read time). {% endfilter %}

{% filter markdown %}

page 186:
page 189:
page 192:
page 193:
page 196: