Complex Data Relationships with Neo4j

I found Neo4j to be an ideal solution for healthcare data analysis. Cypher queries simplified and sped up data retrieval and opened up impressive analytical possibilities.

I found myself facing a dilemma while working on a data model to incorporate medical nomenclature records (body structures, procedures, disorders, etc.), their associated relationships, and additional healthcare data. I feared that the complexity of this data would require an overly complex RDBMS or MongoDB solution, making data loading and retrieval sub-optimal. The amount of logic needed to identify and maintain the relationships and hierarchical structures rendered my initial approach too costly.

My ultimate goal: I want to be able to find every patient related by diagnosis, by body structure, by hospital, and by geographical location with cost aggregations. The key is the relationships. I need a solution designed with relationships at its heart.

This is when I came across Neo4j. Neo4j opened up analytic possibilities that were otherwise unobtainable. My complex data set was perfect for Neo4j, and vice versa. Going from concept to a functioning data load was quick and easy with Neo4j and the R package RNeo4j. It took me a few days to understand the data and then a few hours to build the R code to load Neo4j.

 

Neo4j Simplifies Healthcare Data

For this example, I am focusing only on the back body structure; going 3 relationships deep resulted in ~70,000 records. I captured Texas hospitals using R and the Google Places API (see “Using R and Google Places API Geocode Locations”). The patients were generated by R code; they are not real people.

Relationships: A County ‘is in’ a State, Hospitals (Providers) are ‘Located In’ Counties, Patients ‘Visit’ Hospitals and ‘Live In’ Counties. Patients also have a ‘Diagnosis’ based on ICD-10 / ICD-9 mappings to medical disorders, which in turn have a complex relationship structure.
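The relationships above can be sketched outside the database as plain (start, type, end) triples. The Python below is an illustration only, not the R code used for the load: the node names, the per-visit costs, and the helper functions are all hypothetical. It follows each visit to the county of the hospital and sums cost per county, which is the kind of cost aggregation described in the goal earlier.

```python
from collections import defaultdict

# Toy relationships as (start node, relationship type, end node).
# Node names are made up; relationship names follow the text above.
RELATIONSHIPS = [
    ("County:A", "Is In", "State:TX"),
    ("County:B", "Is In", "State:TX"),
    ("Hospital:1", "Located In", "County:A"),
    ("Hospital:2", "Located In", "County:B"),
    ("Patient:1", "Lives In", "County:A"),
    ("Patient:2", "Lives In", "County:A"),
    ("Patient:1", "Visit", "Hospital:1"),
    ("Patient:2", "Visit", "Hospital:1"),
    ("Patient:2", "Visit", "Hospital:2"),
]

# Hypothetical cost attached to each visit, keyed by (patient, hospital).
VISIT_COST = {
    ("Patient:1", "Hospital:1"): 100.0,
    ("Patient:2", "Hospital:1"): 200.0,
    ("Patient:2", "Hospital:2"): 50.0,
}


def targets(node, rel_type):
    """Return the end nodes reached from `node` over `rel_type`."""
    return [end for start, rel, end in RELATIONSHIPS
            if start == node and rel == rel_type]


def cost_by_county():
    """Sum visit costs per county in which the visited hospital is located."""
    totals = defaultdict(float)
    for (_patient, hospital), cost in VISIT_COST.items():
        for county in targets(hospital, "Located In"):
            totals[county] += cost
    return dict(totals)


if __name__ == "__main__":
    print(cost_by_county())
```

In a graph database this walk would likely be a single pattern match over the ‘Visit’ and ‘Located In’ relationships with a sum; the sketch only shows the shape of the traversal.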

 

Partial Relationship Graph

Above is a partial diagram of the graph data. Large red nodes are the states, which connect to the light blue counties. The green nodes are the patients, which connect to the large purple hospitals and to the counties. Patients also connect to the red diagnoses, which in turn connect to the large yellow body structures. It looks so simple and accessible in the diagram, and it is with Cypher.

Cypher Queries

It is hard to explain how awesome it is to use Cypher to match records that otherwise would have either required multiple table joins in a large SQL query or just been unrealistic to identify in a timely manner.

This might be the coolest thing ever… with regard to data analytics. Two lines of code can traverse ‘n’ relationships deep (up to 10 in the query below) to find the shortest path.

MATCH (h:Hospital {providerId: 10}), (b:body_structure {conceptId: 2748008}),
      p = shortestPath((h)-[*0..10]-(b)) RETURN p
Using ShortestPath Cypher function

 

The above query uses the Cypher ‘shortestPath’ function to find the shortest relationship path from the hospital “UT Southwestern Medical Center” (providerId 10) to the Spinal cord body structure (conceptId 2748008).

This returns only a single path. Instead, I can easily find all patients that have visited “UT Southwestern Medical Center” and received a diagnosis linked to the Spinal cord body structure (more precisely, every patient that sits on a path of that same minimum length).

MATCH (h:Hospital {providerId: 10}), (b:body_structure {conceptId: 2748008}),
      p = allShortestPaths((h)-[*0..10]-(b))
WHERE NONE (r IN relationships(p) WHERE type(r) = "Lives In")
RETURN p
using allShortestPaths Cypher function

This Cypher code uses a predicate to ensure my path does not jump to a different hospital by way of the county node and the ‘Lives In’ relationship. If I took the predicate off, a path could run through a county that Patient 11 shares with another hospital whose patients match my body structure condition.
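To see why the predicate matters, here is a small self-contained Python sketch of the same idea: a breadth-first search that returns every minimum-length path between two nodes, optionally ignoring some relationship types. The toy graph, the node names, the ‘Is A’ and ‘Finding Site’ relationship types, and the function all_shortest_paths are assumptions made for illustration. This is not how Neo4j implements allShortestPaths, and dropping the excluded type before searching is a simplification of the WHERE clause.

```python
from collections import deque

# Hypothetical toy graph as (start, relationship type, end) triples.
EDGES = [
    ("Hospital:10", "Located In", "County:A"),
    ("Hospital:20", "Located In", "County:A"),
    ("Patient:11", "Visit", "Hospital:10"),
    ("Patient:11", "Lives In", "County:A"),
    ("Patient:12", "Visit", "Hospital:20"),
    ("Patient:12", "Lives In", "County:A"),
    ("Patient:11", "Diagnosis", "Disorder:A"),
    ("Disorder:A", "Is A", "Disorder:B"),
    ("Disorder:B", "Finding Site", "Body:SpinalCord"),
    ("Patient:12", "Diagnosis", "Disorder:C"),
    ("Disorder:C", "Finding Site", "Body:SpinalCord"),
]


def all_shortest_paths(edges, start, goal, excluded_types=(), max_hops=10):
    """Return every minimum-length path from start to goal as a list of nodes."""
    # Build an undirected adjacency map, skipping excluded relationship types.
    adjacency = {}
    for a, rel, b in edges:
        if rel in excluded_types:
            continue
        adjacency.setdefault(a, []).append(b)
        adjacency.setdefault(b, []).append(a)

    # Breadth-first search that records every predecessor reaching a node
    # at its minimum depth, so tied shortest paths are all kept.
    dist = {start: 0}
    preds = {start: []}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal or dist[node] >= max_hops:
            continue
        for nxt in adjacency.get(node, []):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                preds[nxt] = [node]
                queue.append(nxt)
            elif dist[nxt] == dist[node] + 1:
                preds[nxt].append(node)

    if goal not in dist:
        return []

    # Walk the predecessor lists back from the goal to rebuild each path.
    def unwind(node):
        if node == start:
            return [[start]]
        return [path + [node] for p in preds[node] for path in unwind(p)]

    return unwind(goal)


if __name__ == "__main__":
    for path in all_shortest_paths(EDGES, "Hospital:10", "Body:SpinalCord",
                                   excluded_types=("Lives In",)):
        print(" - ".join(path))
```

In this toy graph, the unfiltered search returns two four-hop paths, one of which reaches the body structure through Patient:12, who never visited Hospital:10. With ‘Lives In’ excluded, only the path through Patient:11 remains.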

Conclusion

I am finding many different ways to analyze data that was otherwise cumbersome to bring together. So far I am impressed.

This is a very brief use case showing how I am using Neo4j with complex healthcare and medical data. Please let me know if there is interest in a deeper dive into R and Neo4j.

Regards,

Jonathan
