Possible Approaches for Allowing Rich Queries in Ethereum dApps

Blockchains are very good at storing data in an immutable way and for random access. But if you need to perform rich queries on them it can get tricky.

The easiest way to illustrate this is with CryptoKitties. Retrieving a CryptoKitty with a specific ID is easy simply use getKitty(kittyId). But if we need something more complex like all kitties with a certain attribute or even simply all kitties then it starts to get tricky.

There are a number of possible approaches to deal with this which I will detail below.

Approach 1 - Offchain DB

The first approach is to:

Use an offchain DB to store your chain data in a traditional relational database.
- This database will be keep the most up to date version of the given on chain entity. If there is ever a reconciliation issue the onchain value is the version of the truth.
- The offcain DB is purely for reads, it should never be updated directly.
  - Instead your offchain server (that interacts with the DB) should listen to all required Ethereum events and update the state of an entity accordingly.

For example with CryptoKitties we would have:

An offchain:
- API server:
  - Which would listen for the following events to update the offchain DB entities appropriately:
    - Transfer and Approval: for owner data.
    - Birth: for new kitty data.
  - This API server should be very light and have minimal business logic.
    - The smart contract code should after all enforce business rules with or without your offchain server.
Traditional database
- Which stores kitty and owner data in the following tables:
  - ownerTable
  - kittyTable
  - ownerToKittyTable

Pros

You can run rich queries on your chain data.

These queries run very fast.

Returning multiple complex objects is not a problem.
The dApp portion of this is very light.
A standard layered architecture can be used

This is very mature and allows things like caching and other techniques to deal with performance bottlenecks.

Cons

There is some centralisation with part of the dApps dependencies being moved offchain.

But business rules should never be enforced offchain.
- Assume and allow for anyone to attach a dApp to your contracts.

This can add complexity as now you need to worry about this offchain infrastructure as well as update your dApp to call the chain for some things and the offchain server for others.

Approach 2 - Frontend DB and Event Sourcing

Another way of dealing with this issue is by using a front-end DB like PouchDB or GunDB. This would be done as follows:

On first login from a given browser/device:

Play all events related to that user:
- If the event is related to an entity:
  - That does not yet exist in our frontend DB then add it to our DB.
  - That already exists in our frontend DB then update our entry to reflect the newest state.
- Save the last block number that we have the state for.

Listen to all required events in the UI from the last recorded block number:

When an event is caught:
- If the event is related to an entity:
  - That does not yet exist in our frontend DB then add it to our DB.
  - That already exists in our frontend DB then update our entry to reflect the newest state.

Render the UI based on the values stored in our frontend DB.

Pros

This is fully decentralized - there is no offchain dependency at all

Cons

This adds some complexity in terms of reconciling our frontend DB's value with the offchain value.
If there is a lot of data the initial load on a new device could be pretty long.
Large amounts of data could cause performance issues with he dApp due to the dApp being much heavier.

Conclusion

The first solution mixes traditional offchain and onchain tech which allows traditional means to be used for rich queries and dealing with performance bottlenecks. Yet this introduces some centralisation. The second approach is fully decentralised but is not ideal for dApps with plenty of data as this data needs to be stored and synced with a frontend DB.

Neither approach is necessarily right or wrong. As with many things in programing, tradeoffs with different approaches need to be considered together with the pros and cons of each solution. These tradeoffs have to be considered in the context of the problem being solved and based on this a decision on approach taken.