I have done some comprehensive research on using CQL and Spark SQL.
Cassandra is a write optimized database. Cassandra user guides suggest people to duplicate data if it optimize the read operations. CQL or Cassandra APIs does not provide support for JOIN operations across multiple tables. Which means running complex queries across multiple tables can’t be supported. This is purely based on the NoSQL concepts. As I shown earlier, spark SQL allow users to write complex SQLs to do more drill down analysis. If we use CQL, we only have ability to run CQL on single resources only.
Complex query example which selects patients based on valueQuantity on Spark SQL.
SELECT patient.id, observation.id, observation.subject, observation.valueQuantity FROM patient inner join observation where observation.subject.reference == patient.id and observation.valueQuantity.value > 15
Another limitation of Cassandra is that, WHERE clause only can be used with indexed columns. As an example, if user want to search patients by firstname, then firstname should column should be indexed. Likewise, if we needs to provide more query options based on columns, we will need to create secondary indexes. Cassandra doesn’t encourage people to create large number of indexes as it will affect the performance of write queries.
Cassandra CQL provides option to filter non indexed columns via ALLOW FILTERING option. But usage of this isn’t highly encourage.
According to the online resources, it suggest to use Spark as a viable option to execute complex queries.
@sunbiz @yashdsaraf I think we will need to go through our data model of storing patient carefully. According to my understanding with FHIR, we will need to support basic search operations based on resource attributes. If our data model doesn’t fit in, it will create performance issues when data grows up. I’m going to write a blogpost about my findings.
According to the aforesaid limitations, I think Spark is the viable option. Are we going to have a modular approach in LibreHealth? If so we might fit FHIR Analytics as a separate module and use it appropriately.