Neo4j
Tags:
Posted: 2016-10-05
Last Update: 2016-10-05

TODO:

UNSORTED NOTES:

INTRO

TODO:

The first major decision is which database to use. A graph database traverses relationships much faster than a typical RDBMS can join tables. After some minor research into graph databases, it seems that Neo4j is essentially the industry standard. Like other NoSQL databases, it does not have a rigid schema: you can add nodes and relationships easily, and the queries are very sane. It is powerful enough to run Walmart's recommender system and eBay's extra-fast shipping. py2neo provides a Python interface that looks very easy to use.

Sadly, it runs on Java and is server-based, so it has to be installed and run as a service first, like pretty much every database except SQLite.

The query language is called Cypher and is inspired by SQL. It looks very nice at first glance.

Native RESTful API access from Python might be better than py2neo. I need to look into what py2neo can do.

Is it better to run 2-3 simple queries or 1 more complex one? It depends on how the data will be searched. I'd favor simple over complex unless you need every bit of performance.


Learn

TODO:

How to transform a typical RDBMS schema into a graph:

TODO:
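The usual mapping from tables to a graph is: each row becomes a node, the table name becomes its label, and a foreign key becomes a relationship instead of a column. A minimal Python sketch under those assumptions; the `companies`/`employees` tables, their columns, and `rows_to_cypher` are hypothetical, and the queries use the `{param}` placeholder style used elsewhere on this page (later Neo4j versions only accept `$param`).

```python
# Hypothetical relational data: two tables linked by a foreign key.
companies = [
    {"id": 1, "name": "Silicon Cloud"},
]
employees = [
    {"id": 10, "name": "Eric", "company_id": 1},
    {"id": 11, "name": "Ann", "company_id": 1},
]


def rows_to_cypher(companies, employees):
    """Turn table rows into (query, params) pairs.

    Table name -> node label, primary key -> property MERGE matches on,
    foreign key employees.company_id -> WORK_IN relationship (not a property).
    """
    statements = []
    for c in companies:
        statements.append((
            "MERGE (c:Company {id: {id}}) SET c.Name = {name}",
            {"id": c["id"], "name": c["name"]},
        ))
    for e in employees:
        statements.append((
            "MERGE (e:Employee {id: {id}}) SET e.Name = {name}",
            {"id": e["id"], "name": e["name"]},
        ))
        # The foreign key column is replaced by a relationship.
        statements.append((
            "MATCH (e:Employee {id: {eid}}), (c:Company {id: {cid}}) "
            "MERGE (e)-[:WORK_IN]->(c)",
            {"eid": e["id"], "cid": e["company_id"]},
        ))
    return statements
```

Each pair could then be sent with py2neo 3.x as `graph.run(query, params)`. Using MERGE rather than CREATE keeps the import safe to re-run.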

From Intro_to_Graphs_and_Neo4j on YouTube:

Neo4j and Flask:

From How to build a Python web application with Flask and Neo4j - PyCon SE 2015 on YouTube:


COMMANDS

TODO:

update

2016-09-20: Installed neo4j and py2neo on my laptop. The Cypher Query Language (CQL) seems very nice indeed: its syntax is inspired by ASCII art, which makes it very readable.

Many of these queries come from here: Tutorial_-Neo4j_and_Python_for_Dummies-_PyData_Singapore

TODO:


Cypher in python:

graph.cypher.execute("cypher syntax string")  # py2neo 2.x API
graph.run("cypher syntax string")             # py2neo 3.x API: returns a Cursor; call .data() for a list of dicts

Direction:

MATCH (n1:node1)-[r:RELATIONSHIP]-(n2:node2) RETURN n1, n2;    // either direction
MATCH (n1:node1)-[r:RELATIONSHIP]->(n2:node2) RETURN n1, n2;   // outgoing: n1 -> n2
MATCH (n1:node1)<-[r:RELATIONSHIP]-(n2:node2) RETURN n1, n2;   // incoming: n2 -> n1

CREATE:

CREATE (e:Employee {Name:"Eric", Surname:"Lee", Gender:"M"});
CREATE (e:Employee {Name:"Eric"})-[:WORK_IN]->(c:Company {Name:"Silicon Cloud"});
CREATE (n:Skill {Name:"Neo4j"})<-[:KNOWS]-(e:Employee {Name:"Eric"})
       -[:WORK_IN]->(c:Company {Name:"Silicon Cloud"});
// Each CREATE makes new nodes, so running all three gives three separate "Eric" nodes.

CREATE UNIQUE:

MATCH (e:Employee {Name:"Eric"})
CREATE UNIQUE (e)-[:WORK_IN]->(c:Company {Name:"Silicon Cloud"});  // match what it can, create what is missing

MATCH:

Nodes:

MATCH (n) RETURN n;                      // returns everything
MATCH (e:Employee) RETURN e;             // returns all nodes with label "Employee"
MATCH (e:Employee) RETURN e LIMIT 100;   // at most 100 of them
MATCH (e:Employee) RETURN e.Gender;      // all Employee node genders

MATCH:

Relationships:

MATCH (e:Employee)-[:KNOWS]->(s:Skill) RETURN e.Name, s.Name;
MATCH (e:Employee)-[:KNOWS]->(s:Skill) WHERE s.Name="Neo4j" RETURN e.Name;
MATCH (e:Employee)-[r]->(s:Skill) RETURN r;   // any relationship type, bound to r

MERGE:

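MERGE (a:user {name:'Bob'})   // like MATCH, or CREATE if nothing matches
ON CREATE SET a.age=23        // runs only when the node was just created
ON MATCH SET a.age=22         // runs only when an existing node was matched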
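MERGE is easiest to think of as "MATCH, and if nothing matched, CREATE", with ON CREATE and ON MATCH choosing which SET runs. A plain-Python model of the query above; `merge_user` and the dict standing in for the graph are hypothetical, and this says nothing about how Neo4j actually implements MERGE.

```python
def merge_user(store, name, on_create_age, on_match_age):
    """Model of:
        MERGE (a:user {name: name})
        ON CREATE SET a.age = on_create_age
        ON MATCH SET a.age = on_match_age

    `store` maps a user name to that node's property dict.
    Returns (properties, created).
    """
    if name in store:
        # MATCH part: the pattern already exists, so ON MATCH SET runs.
        node = store[name]
        node["age"] = on_match_age
        return node, False
    # CREATE part: nothing matched, so create the node and run ON CREATE SET.
    node = {"name": name, "age": on_create_age}
    store[name] = node
    return node, True
```

Calling it twice with the same name creates Bob once (age 23) and then matches him (age 22); the store never holds two Bobs.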

Make Graph and Nodes (python and py2neo):

from py2neo import Graph, Node, Relationship

graph = Graph()
eric = Node("User", Name="Eric", Gender="M")
sci = Node("Company", Name="SCI")
# relationship type = WORK_IN, Since = relationship property (see below)
ericWorkInsci = Relationship(eric, "WORK_IN", sci, Since=2016)
graph.create(ericWorkInsci)   # creates both nodes and the relationship

Relationship Properties:

CREATE (:User {Name:"Eric"})-[:WORK_IN {Since:2016}]->(:Company {Name:"SCI"})  // see above

WHERE:

MATCH (e:Employee) WHERE e.Name="Eric" RETURN e;
MATCH (e:Employee) WHERE e.Name=~"E.*" RETURN e;   // regex; must match the whole string
MATCH (e:Employee) WHERE e.Name STARTS WITH "Er" RETURN e;
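Cypher's `=~` has to match the whole string, so `re.fullmatch` (not `re.search`) is the closest Python analogue, while STARTS WITH is a plain case-sensitive prefix test. A small sketch on a made-up list of names; note that Cypher uses Java regex syntax, which differs from Python's `re` in some details.

```python
import re

names = ["Eric", "Ed", "Lee", "Mr Eric"]

# WHERE e.Name =~ "E.*"  -> the pattern must cover the entire name
regex_hits = [n for n in names if re.fullmatch("E.*", n)]

# WHERE e.Name STARTS WITH "Er"  -> simple prefix check
prefix_hits = [n for n in names if n.startswith("Er")]

# A "contains" match needs .* on both sides of the pattern
substring_hits = [n for n in names if re.fullmatch(".*Eric.*", n)]
```

Here "Mr Eric" is only found by the `.*Eric.*` pattern, and "Ed" matches `E.*` but not the `Er` prefix.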

Modify properties in python:

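eric = graph.find_one("User", "Name", "Eric")   # get node
eric["Surname"] = "Lee"                         # add surname (py2neo 3.x: nodes are dict-like)
eric.properties["Surname"] = "Lee"              # same thing in py2neo 2.x
eric.push()                                     # py2neo 2.x: push changes back to the neo4j db
graph.push(eric)                                # py2neo 3.x equivalent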

ORDER BY:

MATCH (n)
RETURN n
ORDER BY n.name, n.age   // sort by name, then by age
SKIP 3 LIMIT 2;          // skip the first 3 rows, return the next 2
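ORDER BY with two keys sorts by the first and breaks ties with the second; SKIP and LIMIT then cut a "page" out of the sorted rows. The same steps in plain Python on made-up data (this sketch assumes no missing values; Cypher sorts nulls last in ascending order).

```python
people = [
    {"name": "Dan", "age": 40},
    {"name": "Ann", "age": 31},
    {"name": "Bob", "age": 25},
    {"name": "Ann", "age": 22},
    {"name": "Cat", "age": 35},
    {"name": "Eve", "age": 29},
]

# ORDER BY n.name, n.age: name first, age breaks ties (both ascending)
ordered = sorted(people, key=lambda p: (p["name"], p["age"]))

# SKIP 3 LIMIT 2: drop the first 3 rows, keep at most the next 2
skip, limit = 3, 2
page = ordered[skip:skip + limit]
```

The two Anns come out as 22 then 31, and the page is Cat (35) and Dan (40).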

UNION / UNION ALL:

MATCH (x:MALE)-[:FRIEND]->() RETURN x.name, labels(x)
UNION      // UNION ALL would keep duplicate rows
MATCH (x:FEMALE)-[:FRIEND]->() RETURN x.name, labels(x);
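The difference matters here because a MALE node with two FRIEND relationships produces two identical rows. A sketch of both behaviours on hypothetical result rows; the order of the output is just insertion order, since Cypher does not promise an order for UNION results.

```python
# Hypothetical (name, labels) rows from the two MATCH branches.
# Bob has two FRIEND relationships, so his row appears twice.
male_rows = [("Bob", ("MALE",)), ("Bob", ("MALE",)), ("Tom", ("MALE",))]
female_rows = [("Ann", ("FEMALE",))]


def union_all(*branches):
    """UNION ALL: concatenate every branch's rows, duplicates kept."""
    return [row for branch in branches for row in branch]


def union(*branches):
    """UNION: the same rows with exact duplicates removed."""
    seen, out = set(), []
    for row in union_all(*branches):
        if row not in seen:
            seen.add(row)
            out.append(row)
    return out
```

`union_all` returns four rows, `union` three. UNION ALL also skips the de-duplication work, so it is the cheaper choice when duplicates are acceptable.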

DELETE:

MATCH (n)-[r]-(q) DELETE n,r,q    // deletes every node that has a relationship, plus those relationships (isolated nodes survive)
MATCH (n) DETACH DELETE n         // deletes all nodes and their relationships

DELETE in python:

eric = graph.find_one("User", "Name", "Eric")
eric.delete()

REMOVE:

REMOVE n:Person    // remove a label from n
REMOVE n.property  // remove a property

IN:

MATCH (n)
WHERE n.name IN ["John", "Andrew"]
  AND n.age IS NOT NULL
RETURN n;

ID:

NULL:

Built in functions:

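timestamp(): current time in milliseconds since 1970-01-01 UTC
tail(): a list without its first element, e.g. tail(nodes(p)) drops the first node of path p
nodes(): the nodes of a path
relationships(): the relationships of a path
count(): aggregation; count(*) counts rows, count(expr) counts non-null values
all(): predicate, true if the condition holds for every element, e.g. all(x IN list WHERE x > 0)
reduce(): folds a list into one value, e.g. reduce(s = 0, x IN list | s + x)
length(): the length of a path (number of relationships)

shortestPath(): the path with the fewest relationships between two nodes

? need to set somehow? These do not seem to be Cypher functions; they look like the algorithm names accepted by the REST API's path-finding endpoint:
allSimplePaths()
allPaths()
dijkstra()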
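In Cypher, `shortestPath()` is used inside a pattern, e.g. `MATCH p = shortestPath((a:Person {name:"A"})-[*]-(b:Person {name:"B"})) RETURN p`, and finds the path with the fewest relationships; dijkstra instead minimises the sum of a cost on each relationship. A plain-Python sketch of both ideas on a hypothetical weighted graph (breadth-first search for fewest hops, Dijkstra's algorithm for lowest total weight); it is not Neo4j's implementation.

```python
import heapq
from collections import deque

# Hypothetical directed graph: node -> {neighbour: weight}
wadj = {"A": {"B": 1, "C": 5}, "B": {"C": 1, "D": 5}, "C": {"D": 1}, "D": {}}
# Same graph with the weights ignored
adj = {node: list(nbrs) for node, nbrs in wadj.items()}


def _walk_back(prev, node):
    """Rebuild the path from the predecessor map."""
    path = []
    while node is not None:
        path.append(node)
        node = prev[node]
    return path[::-1]


def shortest_path(adj, start, goal):
    """Fewest hops, found with breadth-first search; None if unreachable."""
    prev = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            return _walk_back(prev, node)
        for nxt in adj.get(node, ()):
            if nxt not in prev:
                prev[nxt] = node
                queue.append(nxt)
    return None


def dijkstra(wadj, start, goal):
    """Lowest total weight; returns (cost, path) or None if unreachable."""
    dist = {start: 0}
    prev = {start: None}
    heap = [(0, start)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist[node]:
            continue  # stale entry: a cheaper route was already found
        if node == goal:
            return d, _walk_back(prev, node)
        for nxt, w in wadj.get(node, {}).items():
            nd = d + w
            if nd < dist.get(nxt, float("inf")):
                dist[nxt] = nd
                prev[nxt] = node
                heapq.heappush(heap, (nd, nxt))
    return None
```

On this graph the fewest-hops path from A to D is A, B, D (2 hops, weight 6), while the cheapest path is A, B, C, D with total weight 3.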

MY EXAMPLES

Installed on GIGA. At first I could not connect for some reason; I tried the secure port (7473), then went back to the normal port (7474), and it worked. Once it was finally working...

The :help system is terrific. Lots of great (at least beginner) examples.

Initial Database class to access neo4j via py2neo:

I first tried the py2neo library. Because I wanted to learn more Cypher, I only used the run() method after connecting (and authenticating) to the graph. So far I really like that the simple stuff (querying for a particular node or adding a new node) is at least as easy to read as SQL.

When constructing some dynamic queries, where I was setting a variable number of attributes, I resorted to string construction.

TODO:

I bet there is a way around this using the py2neo library. If not, the official native Python REST API library (see the next section) will work.

import py2neo


class Database(object):
    # py2neo 3.x: register the credentials, then connect to the local server
    py2neo.authenticate("localhost:7474", "neo4j", "password")
    graph = py2neo.Graph()

    @classmethod
    def run(cls, query, fetchone=False, **kwargs):
        '''Run a Cypher query; kwargs are passed as query parameters.

        Returns a list of dicts, or None if there are no results.
        If fetchone=True, returns only the first dict (or None).
        '''
        records = cls.graph.run(query, **kwargs).data()
        if not records:
            return None
        return records[0] if fetchone else records
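About the string-construction problem above: Cypher cannot take labels or property keys as parameters, but it can take a whole map of properties as one parameter, and `SET n += {props}` copies every key of that map onto the node while leaving its other properties alone. A minimal sketch assuming that approach; `build_update` and the identifier check are hypothetical helpers, not part of py2neo.

```python
import re

# Labels and property keys cannot be query parameters, so any that are
# spliced into the query text are checked against a plain identifier pattern.
_IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def build_update(label, key, value, props):
    """Return (query, params) that sets a variable number of properties.

    The whole dict travels as ONE map parameter, so no property values
    are ever pasted into the query string.
    """
    for ident in (label, key):
        if not _IDENT.fullmatch(ident):
            raise ValueError("unsafe identifier: %r" % ident)
    query = ("MATCH (n:%s) WHERE n.%s = {value} "
             "SET n += {props} RETURN n" % (label, key))
    return query, {"value": value, "props": dict(props)}
```

With the Database class above, this could be used as `query, params = build_update("user", "name", "Bob", {"email": "bob@example.com"})` followed by `Database.run(query, **params)`.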

A few simple queries relating to my users model:

from datetime import datetime
from common.database import Database
from common.utils import Utils


class User(object):  # class name assumed; the original notes showed only its methods
    def __init__(self, username, password, email, joined):
        self.username = username
        self.password = password
        self.email = email
        self.joined = joined

    @staticmethod
    def new(username, password, email):
        '''Creates a new user node in neo4j (does nothing if the name is taken).
        '''
        # MERGE only on the name: merging on every property would create a
        # duplicate user whenever the timestamp or password differed.
        cmd = '''MERGE (u:user {name: {username}})
                 ON CREATE SET u.password = {password},
                               u.email    = {email},
                               u.joined   = {stamp}'''
        Database.run(cmd, username=username,
                     password=Utils.encrypt_password(password),
                     email=email,
                     stamp=datetime.now().strftime("%Y-%m-%d %H:%M"))

    @classmethod
    def Fetch(cls, username):
        '''Fetches a user by name; returns None if no such user exists.
        '''
        cmd = '''MATCH (u:user)
                 WHERE u.name = {username}
                 RETURN u.name     AS username,
                        u.password AS password,
                        u.email    AS email,
                        u.joined   AS joined'''
        res = Database.run(cmd, username=username, fetchone=True)
        return cls(**res) if res else None

    def updatePassword(self, newpassword):
        ''':param newpassword: plain-text password, hashed before saving
        '''
        newpassword = Utils.encrypt_password(newpassword)
        cmd = '''MATCH (u:user)
                 WHERE u.name = {username}
                 SET u.password = {newpassword}'''
        Database.run(cmd, username=self.username, newpassword=newpassword)

Native python


ACCESS

Rest API:

With curl:

curl --user username:password localhost:7474/db/data/labels/
[ "Organization", "user", "Event", "Person" ]

curl --user username:password localhost:7474/db/data/
{
  "extensions" : { },
  "node" : "http://localhost:7474/db/data/node",
  "relationship" : "http://localhost:7474/db/data/relationship",
  "node_index" : "http://localhost:7474/db/data/index/node",
  "relationship_index" : "http://localhost:7474/db/data/index/relationship",
  "extensions_info" : "http://localhost:7474/db/data/ext",
  "relationship_types" : "http://localhost:7474/db/data/relationship/types",
  "batch" : "http://localhost:7474/db/data/batch",
  "cypher" : "http://localhost:7474/db/data/cypher",
  "indexes" : "http://localhost:7474/db/data/schema/index",
  "constraints" : "http://localhost:7474/db/data/schema/constraint",
  "transaction" : "http://localhost:7474/db/data/transaction",
  "node_labels" : "http://localhost:7474/db/data/labels",
  "neo4j_version" : "3.0.6"
}

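The same calls can be made from Python's standard library without py2neo. A minimal sketch assuming the default local server and HTTP basic auth, exactly as in the curl commands; `labels_request` and `fetch_labels` are hypothetical helper names, and only `fetch_labels` actually contacts the server.

```python
import base64
import json
import urllib.request


def labels_request(base_url, user, password):
    """Build the same GET as `curl --user user:password .../db/data/labels/`."""
    credentials = ("%s:%s" % (user, password)).encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return urllib.request.Request(
        base_url.rstrip("/") + "/db/data/labels/",
        headers={"Authorization": "Basic " + token,
                 "Accept": "application/json"},
    )


def fetch_labels(base_url, user, password):
    """Send the request and decode the JSON list (needs a running server)."""
    with urllib.request.urlopen(labels_request(base_url, user, password)) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Against the server above, `fetch_labels("http://localhost:7474", "username", "password")` should return the same list the curl call printed.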
Neo4j browser:

Database info
Favorite queries
Docs
:help
:history
Keyboard shortcuts

:clear clears the stream (the frames of executed commands)

:GET /db/data

{
  "extensions": {},
  "node": "http://localhost:7474/db/data/node",
  "relationship": "http://localhost:7474/db/data/relationship",
  "node_index": "http://localhost:7474/db/data/index/node",
  "relationship_index": "http://localhost:7474/db/data/index/relationship",
  "extensions_info": "http://localhost:7474/db/data/ext",
  "relationship_types": "http://localhost:7474/db/data/relationship/types",
  "batch": "http://localhost:7474/db/data/batch",
  "cypher": "http://localhost:7474/db/data/cypher",
  "indexes": "http://localhost:7474/db/data/schema/index",
  "constraints": "http://localhost:7474/db/data/schema/constraint",
  "transaction": "http://localhost:7474/db/data/transaction",
  "node_labels": "http://localhost:7474/db/data/labels",
  "neo4j_version": "3.0.6"
}

:GET /db/data/labels

[
  "user",
  "Person",
  "node"
]

Neo4j shell:

The neo4j-shell gives command-line access to the database. This is much-needed relief from the heavy and slow browser GUI. Here it can be seen returning an entire node, a node's name, or one of a node's properties, search_history.


Performance:

Profiling:

Prefixing a query with PROFILE returns some basic information about how fast the query runs. At the bottom of the table is a value, "Total database accesses", which is useful for estimating the efficiency of your queries.

The first image shows a query that is not restricted in any way, so the search ends up looking through every node AND checking the WHERE clause for each one. This results in almost 1000 database accesses.

PROFILE
MATCH (n)
WHERE n.name = "Hillary Clinton"
RETURN n;

The second image shows a query that is restricted to "Person" nodes only. This simple restriction pares the database accesses down tenfold, to about 100.

PROFILE
MATCH (n:Person)
WHERE n.name = "Hillary Clinton"
RETURN n;

Indexing:

You can decrease the database accesses further by creating an index.

Without an index, even a search restricted to one label and a particular name still has to scan every node with that label that could fit the bill.

MATCH (n:Person)
WHERE n.name = "Hillary Clinton"
RETURN n;

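After we create an index, the database has a much faster way to find nodes. Labels are case-sensitive, so the index has to be created on the same label the query uses (:Person, not :person). The index itself does not have to be unique; a uniqueness constraint (CREATE CONSTRAINT ON (n:Person) ASSERT n.name IS UNIQUE) also creates an index, but a plain index is enough here.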

CREATE INDEX ON :Person(name)

Now you only have to scan a handful of nodes at most to get what you are looking for.

MATCH (n:Person)
USING INDEX n:Person(name)   // optional hint; the planner normally picks the index by itself
WHERE n.name = "Hillary Clinton"
RETURN n;
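The effect of the index can be mimicked in plain Python: without an index every node carrying the label is read and its property compared, while an index maps each property value straight to its nodes. This is a toy model with made-up helper names (`label_scan`, `build_index`, `index_seek`); Neo4j's real indexes are on-disk structures, and the "examined" counts only mirror the idea behind "Total database accesses".

```python
from collections import defaultdict


def label_scan(nodes, label, key, value):
    """MATCH (n:Label) WHERE n.key = value with no index:
    every node with the label is read and its property checked."""
    examined, hits = 0, []
    for node in nodes:
        if label in node["labels"]:
            examined += 1
            if node.get(key) == value:
                hits.append(node)
    return hits, examined


def build_index(nodes, label, key):
    """CREATE INDEX ON :Label(key): map each property value to its nodes.
    Labels are compared exactly, so "person" will not index :Person nodes."""
    index = defaultdict(list)
    for node in nodes:
        if label in node["labels"] and key in node:
            index[node[key]].append(node)
    return index


def index_seek(index, value):
    """Jump straight to the matching nodes; only those are read."""
    hits = index.get(value, [])
    return hits, len(hits)
```

With 101 Person nodes, `label_scan` examines all 101 to find one match, `index_seek` on an index built for "Person" examines 1, and an index built for "person" finds nothing at all.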

Backup:

Backup

sudo service neo4j stop
rsync -r /var/lib/neo4j/data/databases/CONected_01.db/ ~/db_backups/
sudo service neo4j start

Restore

sudo service neo4j stop
sudo rsync -r /home/egg/tmp/ /var/lib/neo4j/data/databases/CONected_01.db/
# the restored files must belong to the neo4j user
sudo chown -R neo4j /var/lib/neo4j/data/databases/CONected_01.db/
sudo service neo4j start

Multiple Databases

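# neo4j.conf: the name of the database to mount (only one is active at a time)
dbms.active_database=graph.db
# to switch, comment out the line above, uncomment the one below, and restart neo4j
#dbms.active_database=new_one.db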









