What is Caching in a Semantic Layer?
Caching refers to storing copies of data in a temporary storage location so that future requests for the same data can be processed and retrieved faster. In the context of a semantic model, the results of frequently run queries are preserved along with their semantic meaning or context. The model can then serve results from this cached data whenever a user runs a query that is identical or similar to a previous one.
The purpose of caching is to expedite data retrieval and enhance query performance. As users interact with the data model by running queries, the cache stores the results of those queries. Any subsequent query that matches a cached one retrieves its results significantly faster from the cache, bypassing the need to access the underlying data source again.
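As a rough illustration of this idea, the Python sketch below keys cached results on a normalized form of the query text so that a repeated or trivially reworded query is answered from memory instead of the data source. The normalization rule and the `run_on_data_source` callback are simplifying assumptions for the sketch, not Kyvos internals.

```python
# Minimal sketch of query-result caching (illustrative only, not Kyvos code).

def normalize(query: str) -> str:
    """Canonicalize a query so equivalent requests map to the same cache key."""
    return " ".join(query.lower().split())

class QueryResultCache:
    def __init__(self):
        self._results = {}          # normalized query -> cached result rows

    def get_or_compute(self, query: str, run_on_data_source):
        key = normalize(query)
        if key in self._results:    # cache hit: skip the data source entirely
            return self._results[key]
        result = run_on_data_source(query)   # cache miss: do the expensive work
        self._results[key] = result          # store for later matching requests
        return result

# Example usage with a stand-in for the real data source.
cache = QueryResultCache()
slow_source = lambda q: [("US", 1200), ("EU", 950)]        # pretend this is expensive
cache.get_or_compute("SELECT region, sales FROM model", slow_source)    # computed
cache.get_or_compute("select region,  sales from model", slow_source)   # served from cache
```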
As part of the semantic layer, this intelligent caching reduces the number of times the system must access a data source to get results. Since there are no external caching layers, the data stays in the same location and queries can be scaled based on the cloud platform used. Consequently, it improves query processing, resource utilization and overall system performance. Kyvos offers several types of caching mechanisms to build and consume semantic models faster and with more accuracy.
Types of Caching in Kyvos
The caching mechanism accelerates data retrieval by bypassing slower data storage and reading directly from in-memory storage. The cache is populated with data at boot-up and synced with the main repository every 15 minutes. It is also updated or invalidated whenever the underlying database changes, so results stay up to date. Through extensive caching, the system can improve query performance substantially and optimize resource utilization. Caching is applied at multiple layers across different modules, and the main types of cache in Kyvos are described in the sections that follow.
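A minimal sketch of that populate, sync and invalidate lifecycle is shown below, assuming a hypothetical `load_from_repository` callback; the class and method names are illustrative, while the 15-minute interval mirrors the sync period mentioned above.

```python
import time

SYNC_INTERVAL_SECONDS = 15 * 60   # cache is re-synced with the repository every 15 minutes

class RepositoryBackedCache:
    def __init__(self, load_from_repository):
        self._load = load_from_repository
        self._entries = {}
        self._last_sync = 0.0

    def boot(self):
        """Populate the cache once at boot-up."""
        self._entries = self._load()
        self._last_sync = time.time()

    def get(self, key):
        # Periodic sync: refresh from the repository once the interval has elapsed.
        if time.time() - self._last_sync >= SYNC_INTERVAL_SECONDS:
            self.boot()
        return self._entries.get(key)

    def invalidate(self, key):
        """Drop a stale entry when the underlying database changes."""
        self._entries.pop(key, None)
```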
BI Server Caching
The BI server cache holds pre-computed query results as well as static information, such as user permissions. Any data that is not prone to continuous change becomes part of this cache, and such static data does not take up much space. The cache stores the pre-computed results of executed queries; when a matching query is executed again, the BI server cache furnishes the results directly instead of routing the query to the query engine. If a query is run for the first time, the BI server forwards it to the query engine. Across the static and non-static data stored in the BI server cache, the individual caches are –
- Entity Cache: Within each BI server, an in-memory cache is maintained to store information about entities, such as datasets, relationships, semantic models, worksheets and workbooks. It prevents costly and resource-intensive hits on the repository. The entity cache is populated with data during BI server bootup and is updated whenever an entity changes, for example when a new dataset is created, an existing one is modified or an entity is deleted entirely.
- Folder Cache: Related entities are grouped into folders for improved collaboration and data access, and the folder cache stores this grouping. It is dynamically updated whenever a folder’s state changes and streamlines folder management by providing a fast, efficient way to retrieve folder details.
- Entity Properties Cache: A wide range of properties can be applied to entities to control their performance and resource utilization. The entity properties cache stores these properties so that entity configuration settings can be accessed quickly.
- Users’ Cache: It holds user details such as usernames, permissions and group memberships, which determine a user’s access rights and govern access restrictions. The users’ cache makes verification of user permissions faster, so security checks add virtually no latency.
- Group Cache: Organizing users into groups makes it easier to manage multiple users and execute batch operations. The group cache holds this grouping information to streamline user administration tasks.
- Access Rights Cache: The permissions assigned to shareable entities are stored in the access rights cache and loaded at bootup so that they are readily available. These permissions determine which users can view or modify a specific entity and which are denied access (a sketch of such a check appears after this list).
- Data Security Rules Cache: Data security rules can be specified to control access to individual data points. When a user browses data, these rules are served from the data security rules cache rather than being retrieved from the repository, so browsing performance is not hindered.
- Cuboid Distribution Cache: To fulfill a browsing request effectively, the system needs essential details about cuboids, such as the dimension and measure sets they belong to, the query engines where they are replicated and other filtering criteria. The cuboid distribution cache makes this information readily available, which improves browsing performance significantly.
- Result Cache: Every query incurs data processing costs at the BI server, and the query engine incurs the cost of transferring data across the network. As data cardinality rises, these costs increase. The result cache mitigates them by storing the results of previously executed browsing requests at the BI server level.
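To make the permission-related caches above more concrete, the sketch below shows how cached access rights might be consulted before an entity is served, avoiding a repository lookup on every security check. The VIEW/MODIFY/DENY permission values and the helper names are assumptions for illustration, not the Kyvos permission schema.

```python
# Illustrative sketch of permission checks served from an in-memory access rights cache.

class AccessRightsCache:
    def __init__(self):
        # (user, entity) -> permission, loaded from the repository at bootup
        self._rights = {}

    def load_at_bootup(self, rights_from_repository):
        self._rights = dict(rights_from_repository)

    def permission_for(self, user: str, entity: str) -> str:
        # Served from memory, so security checks add almost no latency.
        return self._rights.get((user, entity), "DENY")

cache = AccessRightsCache()
cache.load_at_bootup({("alice", "sales_model"): "VIEW",
                      ("bob", "sales_model"): "MODIFY"})
assert cache.permission_for("alice", "sales_model") == "VIEW"
assert cache.permission_for("carol", "sales_model") == "DENY"   # no cached right: deny access
```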
Query Engine Caches
The query engine caches optimize system performance and improve query response times. When a user runs a query, it is first sent to the BI server, which checks whether the query has been executed before. If it has, the results are served from the BI server’s result cache; if not, the query is sent to the query engine.
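A simplified sketch of that routing decision is shown below; the `BIServer` and `QueryEngine` classes are stand-ins used only to illustrate the check-then-forward flow, not actual Kyvos components.

```python
# Sketch of the BI-server-first routing described above (illustrative assumptions only).

class QueryEngine:
    def execute(self, query: str):
        # Placeholder for the expensive path: scanning cuboids, decompressing blocks, etc.
        return f"rows for: {query}"

class BIServer:
    def __init__(self, engine: QueryEngine):
        self._engine = engine
        self._result_cache = {}

    def run(self, query: str):
        if query in self._result_cache:        # answered from the BI server result cache
            return self._result_cache[query]
        result = self._engine.execute(query)   # first-time query: forward to the query engine
        self._result_cache[query] = result     # cache for subsequent matching queries
        return result

server = BIServer(QueryEngine())
server.run("total sales by region")   # goes to the query engine
server.run("total sales by region")   # served from the result cache
```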
The system checks the caches with the fastest access times first, which expedites data retrieval; this multi-layered approach retrieves data efficiently in a cascading manner (sketched after the list below). The caches in this category are –
- Look Ahead Cache: Data blocks are often used multiple times while serving a single query. The look ahead cache keeps one cuboid block in memory for the duration of that query, so a recently accessed block does not have to be read from disk and decompressed again on each access. The result is faster query performance because the cuboid block is readily available in memory.
- MOLAP Semantic Model Cache: The metadata of the MOLAP semantic model, including all its dimensions, measures and hierarchies, must be initialized before browsing operations can be performed. Because this initialization is time-consuming and memory-intensive, the model is initialized once and then stored in the cache. Browsing experience and query response time improve considerably because the model does not have to be re-initialized for subsequent browsing sessions.
- Cuboid Blocks Cache: This cache stores frequently used cuboid data blocks in off-heap memory and is automatically populated as blocks are accessed to serve queries. The result is significantly faster retrieval of information from cuboids, which enhances query performance.
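The sketch below illustrates the cascading idea on the query engine side: a per-query look ahead slot is consulted first, then a shared cache of cuboid blocks, and only then the disk. The single-slot design, the LRU eviction policy and the `read_block_from_disk` callback are simplifying assumptions; in Kyvos the cuboid blocks cache actually resides in off-heap memory.

```python
from collections import OrderedDict

# Illustrative cascading lookup for cuboid blocks: look-ahead slot -> blocks cache -> disk.

class CuboidBlockReader:
    def __init__(self, read_block_from_disk, capacity: int = 128):
        self._read_from_disk = read_block_from_disk
        self._blocks = OrderedDict()     # shared cache of recently used cuboid blocks
        self._capacity = capacity
        self._look_ahead = None          # (block_id, block) kept only for the current query

    def get_block(self, block_id):
        # 1. Look ahead cache: the block just used by this query, still in memory.
        if self._look_ahead and self._look_ahead[0] == block_id:
            return self._look_ahead[1]
        # 2. Cuboid blocks cache: recently used blocks across queries (LRU order).
        if block_id in self._blocks:
            self._blocks.move_to_end(block_id)
            block = self._blocks[block_id]
        else:
            # 3. Slowest path: read (and decompress) the block from disk.
            block = self._read_from_disk(block_id)
            self._blocks[block_id] = block
            if len(self._blocks) > self._capacity:
                self._blocks.popitem(last=False)   # evict the least recently used block
        self._look_ahead = (block_id, block)
        return block

    def end_query(self):
        """Release the per-query look ahead block once the query finishes."""
        self._look_ahead = None
```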