designates my notes. / designates important.
Uses neo4j 2.2
Although old, this book provided an overview of both Cypher and Neo4j’s technical side.
A whole book could be written demonstrating uses of Neo4j in Flask or Django. The overviews there seemed too condensed. I would rather have had more detail on the topics in chapter 7, on deployment/production options.
neo4j-shell
neo4j-shell -path
neo4j-shell -pid
neo4j-shell -readonly: This option connects to the local database in the READ ONLY mode.
neo4j-shell -c
neo4j-shell -file
neo4j-shell -config
The various phases of the Cypher query execution are listed as follows:
Parsing, validating, and generating the execution plan
Locating the initial node(s)
Selecting and traversing the relationships
Changing and/or returning the values
Let’s consider another scenario where we want to count the number of distinct names among the nodes:
MATCH (x:MALE) WHERE x.age IS NOT NULL AND x.name IS NOT NULL
RETURN count(DISTINCT x.name);
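As a rough illustration of what the query above computes, here is a minimal Python sketch over an in-memory list. The `males` data and the `count_distinct_names` helper are made up for this note; they are not from the book and do not touch Neo4j.

```python
# Hypothetical stand-in for nodes labelled MALE (illustrative data only).
males = [
    {"name": "Andrew", "age": 24},
    {"name": "Andrew", "age": 31},   # same name again: counted once
    {"name": "John", "age": None},   # dropped: age is null
    {"name": None, "age": 40},       # dropped: name is null
    {"name": "Bradley", "age": 24},
]

def count_distinct_names(nodes):
    """Mimic WHERE age/name IS NOT NULL ... RETURN count(DISTINCT name)."""
    names = {
        n["name"]
        for n in nodes
        if n.get("age") is not None and n.get("name") is not None
    }
    return len(names)

print(count_distinct_names(males))  # 2 (Andrew and Bradley)
```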
MATCH (n:MALE {name: "Andrew", age:24}) remove n.age return n;
MATCH (n)
WHERE n.name IN ["John","Andrew"] AND n.age IS NOT NULL
RETURN n;
MATCH (n)
where n.name =~"J.*"
return n;
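The `=~` operator applies a regular expression to the whole property value, so "J.*" matches names that start with J. A small Python analogue, assuming `re.fullmatch` as the closest equivalent (Cypher uses Java regex syntax, so the dialects differ in details); the names and the `names_matching` helper are illustrative only:

```python
import re

def names_matching(names, pattern):
    """Keep names that match the pattern in full, skipping missing values."""
    return [n for n in names if n is not None and re.fullmatch(pattern, n)]

# Illustrative values, not data from the book.
names = ["John", "Andrew", "Jack", "Raj", None]
print(names_matching(names, r"J.*"))  # ['John', 'Jack']
```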
MATCH (n) RETURN n ORDER BY n.name, n.age SKIP 3 LIMIT 2;
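The ORDER BY / SKIP / LIMIT combination behaves like sort-then-slice. A minimal Python sketch, with made-up rows and a hypothetical `order_skip_limit` helper (null ordering is not modelled):

```python
from operator import itemgetter

# Illustrative rows only.
people = [
    {"name": "John", "age": 30},
    {"name": "Bradley", "age": 24},
    {"name": "Annie", "age": 22},
    {"name": "John", "age": 25},
    {"name": "Andrew", "age": 24},
    {"name": "Sheena", "age": 28},
    {"name": "Matthew", "age": 35},
]

def order_skip_limit(rows, skip, limit):
    """Sort by name then age, drop the first `skip` rows, keep `limit` rows."""
    ordered = sorted(rows, key=itemgetter("name", "age"))
    return ordered[skip:skip + limit]

# Same shape as ORDER BY n.name, n.age SKIP 3 LIMIT 2
print(order_skip_limit(people, skip=3, limit=2))
```

With this data the first three sorted rows (Andrew, Annie, Bradley) are skipped and the two John rows are returned, youngest first.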
MATCH (x{ name: "Bradley" })--(y)-->()
WITH y, count(*) AS cnt
WHERE cnt> 1
RETURN y;
MATCH (x:MALE)-[:FRIEND]->() RETURN x.name, labels(x)
UNION
MATCH (x:FEMALE)-[:FRIEND]->() RETURN x.name, labels(x);
MATCH (x{name:"Bradley"})-[:FRIEND]->(friend)<-[:FRIEND]-(otherFriend)
return distinct friend.name as CommonFriend;
How are x and otherFriend identified as being friends?
The next example seems right, with the WHERE clause identifying the relationship (or lack of one) between me and otherFriend.
I am Bradley and I want to know the people who are friends of my friends but are not my friends:
Match (me{name:"Bradley"})-[r:FRIEND]-(myFriend),(myFriend)-[:FRIEND]-(otherFriend)
where NOT (me)-[:FRIEND]-(otherFriend)
return otherFriend.name as NotMyFriends;
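To check my reading of the query above, here is the same idea in plain Python over an undirected adjacency map. The `friends` data and the function name are invented for illustration. Unlike the Cypher version, the result is de-duplicated because it is a set.

```python
# Hypothetical undirected FRIEND graph as name -> set of friends.
friends = {
    "Bradley": {"John", "Annie"},
    "John": {"Bradley", "Matthew", "Annie"},
    "Annie": {"Bradley", "John", "Sheena"},
    "Matthew": {"John"},
    "Sheena": {"Annie"},
}

def friends_of_friends_not_mine(graph, me):
    """People two FRIEND hops away who are not already direct friends."""
    mine = graph.get(me, set())
    result = set()
    for my_friend in mine:                      # (me)-[:FRIEND]-(myFriend)
        for other in graph.get(my_friend, set()):  # (myFriend)-[:FRIEND]-(otherFriend)
            # Skip myself and anyone I am already friends with,
            # mirroring WHERE NOT (me)-[:FRIEND]-(otherFriend).
            if other != me and other not in mine:
                result.add(other)
    return result

print(sorted(friends_of_friends_not_mine(friends, "Bradley")))  # ['Matthew', 'Sheena']
```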
MATCH (movie:MOVIE)<-[r:HAS_RATED*0..]-(person)
return movie.name as Movie, count(person)-1 as countOfRatings
order by countOfRatings;
MATCH (x{name:"Bradley"})-[:FRIEND]->(friend)-[r:HAS_RATED]->(movie)
return friend.name as Person, r.ratings as Ratings,movie.name as Movie;
In contrast to CREATE UNIQUE, MERGE can work upon indexes and labels, and can even be used for single node creation.
The MERGE clause was introduced in Neo4j 2.0.x and may replace CREATE UNIQUE.
MATCH (f:FEMALE {name: "Sheena"})
SET f:NONVEG
return f;
MATCH (f:FEMALE {name: "Sheena"})
REMOVE f:NONVEG
SET f:VEG
return f;
We can also add multiple labels by separating them with a : symbol. For example, let’s assume we also need to add the country as a label for the node Sheena, so the previous SET statement can now be rewritten as SET f:VEG:US.
Unlike properties and labels, there is no syntax for updating relationships. The only process to update relationships is to first remove them and then create new relationships.
is this true in 3.x+?
Indexes are leveraged automatically by Cypher queries.
The following Cypher statement creates an index on label MALE and property name:
CREATE INDEX ON :MALE(name);
To list the available indexes, run the following command in neo4j-shell: schema ls
The following Cypher command can be used to delete the indexes:
DROP INDEX ON :MALE(name);
MATCH (n:MALE)
USING INDEX n:MALE(name)
where n.name="Matthew"
return n;
Index sampling is the process where we analyze and sample our indexes from time to time, and keep the statistics of our indexes updated; these keep on changing as we add, delete, or modify data in the underlying database.
We can instruct Neo4j to automatically sample our indexes from time to time by enabling the following properties in <$NEO4J_HOME>/conf/neo4j.properties:
index_background_sampling_enabled: This is a Boolean property that is by default set to False. We need to make it True for automatic sampling.
index_sampling_update_percentage: It defines the percentage size of the index which needs to be modified before Neo4j triggers sampling.
We can also manually trigger the sampling from neo4j-console by using the schema command:
schema sample -a: This will trigger the sampling of all the indexes
schema sample -l MALE -p name: This will trigger the sampling on the index defined on label MALE and the property name
Append -f to the schema command to force the sampling of all or a specific index.
We can analyze the execution plan in two different ways:
EXPLAIN: If we want to see the execution plan of our Cypher query but do not want to execute it, then we can prefix our queries with the EXPLAIN keyword and it will show the execution plan of our Cypher query but will not produce any results
PROFILE: If we want to execute our queries and also see the execution plan of our Cypher query, then we can prefix our queries with the PROFILE keyword and it will show the execution plan of our Cypher query along with the results
For example, let’s understand the execution plan of the following query, which finds a person by the name Annie:
PROFILE MATCH(n) where n.name="Annie"
return n;
How to interpret the EXPLAIN/PROFILE results
Compiler CYPHER 2.2: This tells us the version of the compiler used to generate this explain plan.
Planner COST: This tells us that Neo4j is using cost based optimizer and the next set of statements will show the execution plan of our query.
Filter: This is the starting point and it signifies that the provided query will use a filter to produce the results.
AllNodesScan: This is the second step within Filter and signifies that Cypher will be scanning all the nodes for generating the results. If you are familiar with Oracle, it is similar to the Full Table Scan (FTS) shown in the explain plan of SQL.
Operator: This shows the kind of operators used for the execution of the query. In the screenshot being discussed, it shows two operators: Filter and AllNodesScan. Depending on the given Cypher query, a different filter will be applied.
EstimatedRows: This defines the estimated number of rows that need to be scanned by a particular filter.
Rows: This defines the number of actual rows scanned by the filter.
DbHits: This is the number of actual hits (or I/O) performed for producing the results by a particular filter.
Identifiers: This refers to the identifiers defined and used for each filter.
Other: It refers to any other information associated with the filters.
PROFILE MATCH(n) where n.name="Annie"
return n;
This query uses AllNodesScan, which is not at all good to have in production systems. It is a very heavy operation that scans all the nodes, which means the complete database.
Use (n:LABEL) instead of (n) to reduce the scan range.
PROFILE MATCH(n:FEMALE) where n.name="Annie"
return n;
The filters have changed and now it is using NodeByLabelScan, which is much better as it is now filtering upon labels and then by the property in the where clause. So, no more full scans.
The EstimatedRows, Rows, and DbHits values have significantly reduced
Creating an index on FEMALE (on the name property used in the WHERE clause) makes the query even faster.
Now the query filters are again changed and it is using NodeIndexSeek, which is further leveraging our newly created Indexes.
As a result, the total DbHits value has dropped from 28 (18+10) to just 2. The book calls this an 85 percent improvement, although (28 - 2) / 28 works out to roughly 93 percent.
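To make the difference between the three access strategies concrete, here is a toy Python model that counts how many nodes each strategy has to examine. It is only a sketch: the data, the function names, and the "examined" counter are all invented for this note, and the counts are not Neo4j's DbHits.

```python
from collections import defaultdict

# Hypothetical data: 12 FEMALE nodes (one named Annie) and 8 MALE nodes.
nodes = [{"label": "FEMALE", "name": "F%d" % i} for i in range(11)]
nodes.append({"label": "FEMALE", "name": "Annie"})
nodes += [{"label": "MALE", "name": "M%d" % i} for i in range(8)]

# Built once up front, standing in for the label store and a (label, name) index.
by_label = defaultdict(list)
name_index = defaultdict(list)
for node in nodes:
    by_label[node["label"]].append(node)
    name_index[(node["label"], node["name"])].append(node)

def all_nodes_scan(name):
    """Like MATCH (n) WHERE n.name = ...: every node is examined."""
    matches = [n for n in nodes if n["name"] == name]
    return matches, len(nodes)

def label_scan(label, name):
    """Like MATCH (n:LABEL) WHERE n.name = ...: only that label is examined."""
    candidates = by_label.get(label, [])
    matches = [n for n in candidates if n["name"] == name]
    return matches, len(candidates)

def index_seek(label, name):
    """Like an index lookup: jump straight to the matching entries."""
    matches = name_index.get((label, name), [])
    return matches, len(matches)

print(all_nodes_scan("Annie")[1],          # 20 nodes examined
      label_scan("FEMALE", "Annie")[1],    # 12 nodes examined
      index_seek("FEMALE", "Annie")[1])    # 1 node examined
```

The trade-off the model hides is that the index has to be built and kept up to date on every write, which is part of why there is no universal rule for tuning.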
there is no golden rule for performance optimization
The following are a few of the common methods exposed by the py2neo.cypher.CypherTransaction class:
begin(): Starts a transaction and returns the object of py2neo.cypher.CypherTransaction.
append(): Appends the Cypher queries to the existing transactions.
commit(): Sends all the Cypher queries in a transaction to the server and marks the transaction as completed.
rollback(): Rolls back all the changes within the current transaction.
process(): Sends the statements appended so far to the server for execution and leaves the transaction open for further statements. It can be used to process a large set of statements in batches within one transaction.
Following is the code snippet for using transactions for Cypher statements. Note that this uses Neo4j 2.2 and py2neo 2.0.
def executeCypherQueryInTransaction():
    print("Start - execution of Cypher Query in Transaction")
    #Connect to Graph
    graph=connectGraph()
    #begin a transaction
    tx = graph.cypher.begin()
    #Add statements to the transaction
    tx.append("CREATE (n:Node1{name:'John'}) RETURN n")
    tx.append("CREATE (n:Node1{name:'Russell'}) RETURN n")
    tx.append("CREATE (n:Node1{name:'Smith'}) RETURN n")
    #Finally commit the transaction and get results
    results = tx.commit()
    #Iterate over results and print the results
    for result in results:
        for record in result:
            print(record.n)
    print("End - execution of Cypher Query in Transaction")
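The snippet above only uses append() and commit(). A minimal sketch of how process() and rollback() from the method list might fit in, assuming the py2neo 2.0 API used in these notes (graph.cypher.begin(), and append() accepting a parameter dictionary). The function name, the batch size, and the empty-name check are my own illustration, not from the book:

```python
def create_nodes_in_batches(graph, names, batch_size=2):
    """Create one Node1 per name inside a single transaction.

    `graph` is assumed to be a py2neo 2.0 Graph, i.e. an object exposing
    graph.cypher.begin(); nothing is imported here so the function can be
    read and exercised without a running server.
    """
    tx = graph.cypher.begin()
    for count, name in enumerate(names, start=1):
        if not name:
            # Client-side validation failed: discard everything sent so far.
            tx.rollback()
            raise ValueError("empty name; transaction rolled back")
        # Neo4j 2.x parameter syntax is {name}.
        tx.append("CREATE (n:Node1 {name: {name}}) RETURN n", {"name": name})
        if count % batch_size == 0:
            # Send the pending statements but keep the transaction open.
            tx.process()
    # Send whatever is left and finish the transaction.
    return tx.commit()
```

Usage would be something like create_nodes_in_batches(connectGraph(), ["John", "Russell", "Smith"]), where connectGraph() is the book's helper from the snippet above.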
def testIndividualNodes(self):
    #Define a Node which we need to check
    bradley = Node('MALE','TEACHER',name = 'Bradley',
                   surname = 'Green',
                   age = 24,
                   country = 'US')
    #Now get the Node from server
    # would be graph.run in newer py2neo (3.0+?)
    results = self.graph.cypher.execute('''MATCH (n)
        WHERE n.name='Bradley'
        RETURN n as bradley''')
    #Both Nodes should be equal
    self.assertEqual(results[0].bradley, bradley)
The file system should also support flush operations (fsync, fdatasync) (http://en.wikipedia.org/wiki/Sync_(Unix)), so Neo4j recommends using at least an ext4 filesystem (http://en.wikipedia.org/wiki/Ext4), but better would be to use ZFS (http://en.wikipedia.org/wiki/ZFS).
Refer to https://structr.org/blog/neo4j-performance-on-ext4 for more information on performance improvements on ext4.
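For my own reference, this is what "flush (fsync)" means at the application level. It is a generic Python sketch of a durable write, not how Neo4j itself writes its store files; the `durable_write` name and the file name are hypothetical.

```python
import os
import tempfile

def durable_write(path, data):
    """Write bytes and ask the OS to push them to the device before returning."""
    with open(path, "wb") as f:
        f.write(data)
        f.flush()              # move data from Python's buffer to the OS
        os.fsync(f.fileno())   # ask the OS to flush its cache to the device

# Try it on a throwaway file.
with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "example.log")
    durable_write(target, b"committed record\n")
    print(os.path.getsize(target))
```

How much the fsync call actually guarantees depends on the file system and the hardware underneath, which is why the file system choice matters for a database.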
caching options: file buffer and object cache
File buffer = writes to cache only, flushes to disk on logical log rotation. Improves performance by optimizing writes in batches.
Various caching config options
Object cache = uses Java objects. Faster? Uses the JVM heap.
More cache options
Java packages that make up neo4j
All endpoints are relative to http://
The default representation for all types of request (POST/PUT) and response is JSON.
To interact with the JSON interface, users need to explicitly set the request headers Accept: application/json and Content-Type: application/json.
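A minimal sketch of setting those two headers with Python's standard library. The request is only built, not sent. The URL is a placeholder and the payload is arbitrary, since these notes do not record the actual endpoint; `build_json_request` is a hypothetical helper.

```python
import json
import urllib.request

def build_json_request(url, payload, method="POST"):
    """Build (but do not send) a request with the JSON headers set explicitly."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method=method,
        headers={
            "Accept": "application/json",
            "Content-Type": "application/json",
        },
    )

# Placeholder URL and payload, for illustration only.
req = build_json_request("http://example.invalid/placeholder", {"key": "value"})
print(req.get_method(), req.get_header("Accept"))
```

Passing the resulting object to urllib.request.urlopen would actually send it.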