Graph Cache: Caching data in N Dimensional structures

  1. It's about how the data is related, which is always important to know.
  2. They are great for generating insights, as the relational metadata gives the context needed to connect the dots.
  3. You start thinking in terms of vertices and edges, which add extra dimensions to your data, as opposed to just storing key-value pairs or rows of data mapped by a primary key.
  4. A lot of data redundancy is eliminated intuitively.
  1. Query optimization options are limited, although this challenge is being addressed in newer versions.
  2. Data partitioning is a tough proposition, so horizontal scaling becomes challenging.
  3. Computationally intensive or costly queries: most real-world graphs are highly dynamic and often generate large volumes of data at a very rapid rate. One challenge is how to store the historical trace compactly while still enabling efficient execution of point queries and global or neighborhood-centric analysis tasks.
  4. Queries spanning multiple hierarchies can be time-consuming and are not suitable for real-time querying.
  5. A design is only as good as its implementation, and this is where the human element comes in. Not everyone can think in terms of graphs.

The Background

  • Node to Edges relationships
  • Caching entire subtrees against a unique index value

Example Number One:

Fig. 1
Fig. 2
  1. I have avoided repeating the values of stock name and days traded.
  2. I have also given meaning to the relationship by adding metadata to the edges.
  3. I now have temporal data spanning 7 days (I only show 4 in the image for the sake of simplicity) and can query my graph to get data for any day I require.
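
The structure described in the list above can be sketched as a small in-memory property graph. This is only an illustration under assumptions, not the implementation behind the figures: the `Graph` class, the `TRADED_ON` edge label, the ticker `ACME`, and the prices are all made up for the example.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Graph:
    """A tiny directed property graph: vertices are strings, edges carry metadata."""
    # Adjacency list: source vertex -> list of (target vertex, edge metadata).
    edges: dict = field(default_factory=lambda: defaultdict(list))

    def add_edge(self, source, target, **metadata):
        self.edges[source].append((target, metadata))

    def neighbors(self, source, label=None):
        """Yield (target, metadata) pairs, optionally filtered by edge label."""
        for target, metadata in self.edges[source]:
            if label is None or metadata.get("label") == label:
                yield target, metadata


# Hypothetical data: one stock vertex and one vertex per trading day.
graph = Graph()
closing_prices = {"Mon": 101.5, "Tue": 99.8, "Wed": 102.1, "Thu": 103.4}
for day, price in closing_prices.items():
    # The stock name and the day name are each stored once, as vertices;
    # the price lives on the edge that relates them.
    graph.add_edge("ACME", day, label="TRADED_ON", close=price)


def close_on(graph, stock, day):
    """Return the closing price of `stock` on `day`, or None if no such edge exists."""
    for target, metadata in graph.neighbors(stock, label="TRADED_ON"):
        if target == day:
            return metadata["close"]
    return None


print(close_on(graph, "ACME", "Wed"))
```

Note that answering the query still means walking the edges of the stock vertex, which is the cost that caching up front tries to avoid.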

Now what if I were to cache what I am looking for up front?

Fig. 3
  1. Flexible, intuitive querying.
  2. Data partitioning.
  3. Queries are no longer computationally intensive, as unique keys now access slices of the deeper relationships in the underlying graphs.
  4. Faster, more performant querying.
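
One way to read "caching up front" is to precompute the subtree behind each expected query and store it under a unique key, so that a read becomes a single lookup rather than a traversal. The sketch below assumes a nested-dictionary source graph and a `stock:day` key format; both are hypothetical choices for illustration.

```python
# Hypothetical source graph: stock -> day -> edge metadata for that trading day.
source_graph = {
    "ACME": {
        "Mon": {"open": 100.0, "close": 101.5},
        "Tue": {"open": 101.5, "close": 99.8},
    },
    "GLOBEX": {
        "Mon": {"open": 55.0, "close": 54.2},
    },
}


def build_cache(graph):
    """Flatten the graph into {unique key: subtree} entries ahead of query time."""
    cache = {}
    for stock, days in graph.items():
        # Cache the whole subtree for the stock under one key...
        cache[stock] = days
        for day, metadata in days.items():
            # ...and each (stock, day) slice under its own composite key.
            cache[f"{stock}:{day}"] = metadata
    return cache


cache = build_cache(source_graph)

# A point query is now a single dictionary lookup, with no traversal at read time.
print(cache["ACME:Tue"]["close"])
# The full subtree for a stock is available under the stock's own key.
print(sorted(cache["GLOBEX"]))
```

The trade-off is that the cache holds references to slices of the source graph, so a real system would also need a policy for refreshing or invalidating entries when the underlying graph changes.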

Example Number Two:

  • Route to Node/Page mapping
  • Index to Node/Page mapping
  1. Fast retrieval of page navigation routes from the server side instead of maintaining this information in the JS code. This makes the routes configurable and changeable even while the application is running, which allows dynamic restructuring of an entire website at runtime and can provide a unique experience to users even as they navigate through the site.
  2. Maintaining static and recent historical data for the site in a server-side, in-memory graph cache lends itself to near real-time retrieval of website data. The JavaScript code does not have to worry about timeouts or build intelligence around recovering from failure scenarios: when data is unavailable from an upstream system, it is already present in the cache, so the website does not suffer from a lack of data to display.
  3. Intelligently creating indexes around the data stored in the web pages lets application and website developers decide which graphs need to be cached and how deep each graph should be, so that the way information is stored in and fetched from the cache is customizable, flexible, and open to extension by development teams.
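
The two mappings named above (route to page and index to page) could look roughly like the following server-side sketch. The `PageNode` and `SiteGraphCache` names, the routes, and the `relink` method are assumptions introduced for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class PageNode:
    """One page in the site graph; `links` holds the routes reachable from it."""
    page_id: str
    title: str
    links: list = field(default_factory=list)


class SiteGraphCache:
    def __init__(self):
        self.by_route = {}  # route -> PageNode
        self.by_index = {}  # index (page id) -> PageNode

    def put(self, route, node):
        self.by_route[route] = node
        self.by_index[node.page_id] = node

    def navigation(self, route):
        """Return the titles of pages linked from `route`, skipping unknown routes."""
        node = self.by_route.get(route)
        if node is None:
            return []
        return [self.by_route[r].title for r in node.links if r in self.by_route]

    def relink(self, route, links):
        """Change a page's navigation at runtime without touching client-side code."""
        self.by_route[route].links = list(links)


site = SiteGraphCache()
site.put("/", PageNode("home", "Home", ["/products", "/about"]))
site.put("/products", PageNode("products", "Products"))
site.put("/about", PageNode("about", "About"))

print(site.navigation("/"))  # navigation as originally configured
site.relink("/", ["/about"])  # restructure while the application is running
print(site.navigation("/"))
```

Because the client only asks the server for the navigation of the current route, a change made through `relink` takes effect on the next request without redeploying any JavaScript.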

Several other use cases exist such as:

  1. Caching application dependencies when your microservice depends on several upstream and downstream applications and you want to maintain health and other metrics information in the graph.
  2. Storing data for significantly faster retrieval when training an ML algorithm that relies heavily on the relationships between data points stored in nodes.
  3. Storing JSON schemas as relationships when talking to databases like MongoDB.
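
As a rough illustration of the first use case, a dependency graph with health metadata on each node can answer "which of my dependencies are unhealthy?" with a simple traversal. The service names and the `UP`/`DOWN` statuses below are hypothetical.

```python
# Hypothetical dependency graph: service -> services it calls.
dependencies = {
    "checkout": ["payments", "inventory"],
    "payments": ["bank-gateway"],
    "inventory": [],
    "bank-gateway": [],
}

# Hypothetical health metadata stored against each node.
health = {
    "checkout": "UP",
    "payments": "UP",
    "inventory": "UP",
    "bank-gateway": "DOWN",
}


def unhealthy_dependencies(service):
    """Return every direct or transitive dependency of `service` not reporting UP."""
    seen = set()
    stack = list(dependencies.get(service, []))
    failing = []
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # guards against cycles and repeated visits
        seen.add(current)
        if health.get(current) != "UP":
            failing.append(current)
        stack.extend(dependencies.get(current, []))
    return sorted(failing)


print(unhealthy_dependencies("checkout"))
```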

Some best practices:

  1. Keep the cache in memory for the best performance.
  2. Avoid caching deep graphs, as that would hurt performance.
  3. Store the data as intuitively as possible.
  4. Ensure data is partitioned for scale at the time of storing.
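
Two of these practices, limiting graph depth and partitioning at write time, can be sketched together. This is a single-process illustration under assumptions: `PartitionedCache`, `truncate`, and the default sizes are hypothetical, and in a distributed setup each partition would presumably live on a separate cache node.

```python
import zlib


def truncate(node, max_depth):
    """Copy a nested-dict subtree, dropping everything below `max_depth` levels."""
    if not isinstance(node, dict):
        return node  # leaf values are kept as they are
    if max_depth == 0:
        return {}  # deeper relationships are cut off here
    return {key: truncate(child, max_depth - 1) for key, child in node.items()}


class PartitionedCache:
    def __init__(self, partitions=4, max_depth=2):
        self.partitions = [{} for _ in range(partitions)]
        self.max_depth = max_depth

    def _partition_for(self, key):
        # crc32 is stable across processes, unlike the built-in hash() for strings.
        index = zlib.crc32(key.encode("utf-8")) % len(self.partitions)
        return self.partitions[index]

    def put(self, key, subtree):
        # The partition is chosen at the time of storing, from the key alone.
        self._partition_for(key)[key] = truncate(subtree, self.max_depth)

    def get(self, key):
        return self._partition_for(key).get(key)


graph_cache = PartitionedCache(partitions=4, max_depth=2)
graph_cache.put("ACME", {"Mon": {"close": 101.5, "trades": {"t1": 10}}})

# The third level ("t1") is dropped because max_depth is 2.
print(graph_cache.get("ACME"))
```

Routing by a hash of the key keeps reads and writes for one key on the same partition, at the cost of reshuffling keys when the number of partitions changes.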

Sharding and Distributed In-Memory Caching

Summary

References:

  1. https://graphql.org/
  2. https://openproceedings.org/2017/conf/edbt/paper-119.pdf
  3. https://neo4j.com

Ritesh Shergill
