What this blog covers:
- The importance and role of OLAP in the age of AI
- How OLAP handles complex use cases as query volumes rise by orders of magnitude
- How OLAP improves query performance amid the rise of LLMs
Overview
In the era of data analytics, pre-aggregated OLAP (Online Analytical Processing) has stood as a stalwart of enterprise business intelligence (BI) platforms for over a quarter of a century, delivering consistently rapid “speed of thought” response times for staple slice-and-dice data requests. However, its significance has been somewhat overshadowed over the past decade by the advent and rapid rise of Big Data and highly scalable technologies such as Hadoop, in-memory processing engines like Spark, and cloud platforms like Snowflake. These newer technologies offered improved scalability and performance, particularly with large and complex data sets. That led to a shift in focus away from OLAP as the centerpiece of BI, a position it had enjoyed in the 2000s.
Even though data volumes have exploded over the last decade or so of the Big Data era, the new technologies largely kept their promise of scalability, readily handling the bulk of use cases. They could also handle the outlier cases involving many billions of rows reasonably well – most of the time. However, today, with the even more rapid rise of artificial intelligence (AI), pre-aggregated OLAP is primed for a resurgence.
Why Does OLAP Still Matter?
The proposition of this article is that the complex nature of AI query resolutions (as opposed to simple data retrieval) and their exponentially increasing volume will create the need for vastly superior query concurrency against BI data. That solution is our old friend, pre-aggregated OLAP, the data warehouse accelerator. Pre-aggregated OLAP shines brightest when the typical query load exceeds the capability of the typical BI system. I wrote about how AI will release much of what is locked in unstructured data into our BI databases in an earlier article, How AI Developments Impact BI Data Volumes. Additionally, the articles All Roads Lead to OLAP Cubes… Eventually and Why do we need OLAP on Cloud? describe how OLAP cubes substantially benefit an analytics infrastructure.
OLAP stands on the cusp of transitioning from a vital tool for human-driven data analysis to an indispensable component of the infrastructure underpinning AI operations. That’s because AI, like its human counterparts, will still require staple BI data for its “speed of AI” thinking.
The Role of OLAP in the AI World
As AI continues to evolve and exponentially expand its analytics reach, the unmatched performance of pre-aggregated OLAP becomes increasingly crucial. Pre-aggregation yields fast results by minimizing compute through a drastic reduction of data reads, which shortens response time and, most relevant to this article, frees resources to service a high concurrency of queries. That plays a key role in handling what will be a staggering volume of AI-initiated BI queries.
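To make the mechanics concrete, here is a minimal Python sketch of the difference between answering a slice-and-dice query from raw fact rows versus from a pre-aggregated store. The fact table, dimensions, and figures are made up for illustration.

```python
from collections import defaultdict

# A toy fact table: (region, product, year, sales_amount).
# In a real warehouse this would be billions of rows.
fact_rows = [
    ("CA", "Bikes", 2023, 500_000.0),
    ("CA", "Bikes", 2023, 250_000.0),
    ("WA", "Helmets", 2023, 75_000.0),
]

# Without pre-aggregation: every query re-scans the entire fact table.
def query_raw(region, product, year):
    return sum(amount for r, p, y, amount in fact_rows
               if (r, p, y) == (region, product, year))

# With pre-aggregation: scan once at build time; each query becomes a
# near-constant-time lookup, freeing compute to serve many concurrent
# callers.
cube = defaultdict(float)
for r, p, y, amount in fact_rows:
    cube[(r, p, y)] += amount

def query_cube(region, product, year):
    return cube.get((region, product, year), 0.0)

print(query_raw("CA", "Bikes", 2023))   # O(rows) per query
print(query_cube("CA", "Bikes", 2023))  # ~O(1) per query
```

The compute saved per query is exactly what a pre-aggregated system can spend serving many queries at once.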
A Few Magnitudes of Query Volume Increase
The ascension of AI is drastically accelerating the analytics process. Large Language Models (LLMs), such as ChatGPT, are becoming mainstream, leading to an unprecedented increase in query volumes in several ways.
Consider an enterprise with a few hundred (human) analysts. These analysts may spend anywhere from several seconds to minutes between queries, waiting for results and pondering the resulting data visualization. This scenario changes dramatically when an LLM-based AI comes into play, processing hundreds of queries simultaneously. For an AI model, a query is more than just a lookup. It’s a multi-step, iterative process not much different from problem-solving with our own brains.
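To put rough numbers on that shift, here is a back-of-envelope calculation. Every figure in it (analyst counts, query rates, sub-queries per prompt) is an illustrative assumption, not a measurement.

```python
# All figures below are illustrative assumptions, not measurements.
analysts = 300           # human analysts at an enterprise
human_rate = 1 / 60      # ~1 query per analyst per minute (queries/sec)

ai_users = 3_000         # information workers, once LLMs remove friction
subqueries = 30          # BI sub-queries an AI might issue per prompt
prompt_rate = 1 / 120    # one prompt per user every two minutes (per sec)

human_qps = analysts * human_rate                 # ~5 queries/sec
ai_qps = ai_users * subqueries * prompt_rate      # ~750 queries/sec

print(f"human-driven load: {human_qps:.0f} queries/sec")
print(f"AI-driven load:    {ai_qps:.0f} queries/sec")  # ~2 magnitudes up
```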
Following are short discussions of three factors, each potentially adding an order of magnitude to BI query volumes.
Opening the Door to More than Just Analysts and Data Scientists
LLMs remove significant communication friction between humans and BI databases, enabling more than just human analysts and data scientists to access BI data. Removing the need for sophisticated visualization tools such as Power BI and Tableau (a non-trivial skill in its own right) opens BI data up to all information workers, who can simply query in their spoken language.
Other forms of BI data sources (e.g., data warehouse solutions without pre-aggregation) will be strained to keep pace, revealing a new, crucial role for pre-aggregated OLAP. The rapid response time and high query concurrency pre-aggregated OLAP offers are critical for human analysts and will be just as critical for AI. Any unnecessarily long delay in response impedes the process of solving complex problems.
The enabling factor for LLM access to BI data sources is the recent emergence of LLM plugins, for example, ChatGPT Plugins, which have opened profound avenues for AI analytics by enabling access to a broad range of data sources. These plugins can harness the power of domain-specific LLMs, libraries of software functions, and even computational engines like Wolfram Alpha. For the purposes of this article, that list would include pre-aggregated OLAP cubes as reference sources, capable of delivering staple slice-and-dice values to an AI in sub-second time at high concurrency.
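As a rough illustration of the idea, the sketch below exposes a cube lookup as a tool an LLM could call. The `cube_lookup` function, its schema, and the cube contents are hypothetical stand-ins; a real plugin framework such as ChatGPT Plugins describes its tools through an OpenAPI specification rather than Python.

```python
# Hypothetical sketch: exposing a pre-aggregated cube to an LLM as a
# callable tool. The function, schema, and cube contents are stand-ins.

def cube_lookup(measure: str, filters: dict) -> float:
    """Answer a slice-and-dice request from the pre-aggregated cube."""
    # In practice this would call the OLAP server; here it is a stub.
    cube = {("sales", "CA", "Bikes"): 750_000.0}
    key = (measure, filters.get("region"), filters.get("product"))
    return cube.get(key, 0.0)

# The description the LLM sees, so it knows when and how to call the
# tool during its reasoning.
tool_spec = {
    "name": "cube_lookup",
    "description": "Sub-second aggregate values (e.g. sales) sliced "
                   "by region, product, time, and other dimensions.",
    "parameters": {"measure": "string", "filters": "object"},
}

# Simulating the LLM deciding to invoke the tool mid-answer:
args = {"measure": "sales", "filters": {"region": "CA", "product": "Bikes"}}
print(tool_spec["name"], "->", cube_lookup(**args))  # cube_lookup -> 750000.0
```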
It’s not necessarily the case that an AI will formulate insights and strategies the same way as humans, only much faster. However, it’s hard to imagine that AI would not require the same ad-hoc metric values sourced from BI data sources to make data-driven decisions. It would be a shame to witness an AI waiting for many seconds to minutes for the results of its queries.
Complex Prompting
After a few decades of the socialization of BI querying and Google searches, we’re trained to think of queries as one-shot lookups. We have a problem to solve in our heads. We require a few pieces of information to connect the dots. We obtain each piece from simple BI queries and/or web searches. The critical part, connecting those dots, happens in our human brains.
But AI isn’t just a vast and sophisticated data source. It’s a mind in its own right, even if not yet sentient. It’s capable of connecting the dots and coming to a conclusion. AI may not yet be at the point where we should automatically trust its conclusions, but those conclusions can already be rather insightful. So a query to an AI can be much more than a question-and-answer, one-shot deal.
Unlike the one-shot queries common with web search engines, more sophisticated prompting techniques like Chain of Thought and Tree of Thought have emerged for compound queries. The latter prompts the AI to explore multiple approaches to the same problem in parallel, pruning what appear to be fruitless paths along the way. If the AI employs BI data sources through plugins, this iterative and concurrent process intensifies the load on traditional BI systems that are not optimized with pre-aggregation.
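The sketch below caricatures a Tree of Thought exploration to show how a single prompt fans out into many BI calls. The branching factor, fake evidence scores, and pruning threshold are invented for illustration.

```python
# Simplified Tree-of-Thought-style exploration. The branching factor,
# fake evidence scores, and pruning threshold are illustrative only.

def bi_query(question: str) -> float:
    """Stand-in for a slice-and-dice call against the cube."""
    return sum(map(ord, question)) % 100 / 100.0   # deterministic fake

def explore(problem: str, depth: int = 0, max_depth: int = 3) -> int:
    """Explore candidate approaches; return total BI queries issued."""
    if depth == max_depth:
        return 0
    issued = 0
    for branch in range(3):                 # several approaches per node
        evidence = bi_query(f"{problem}/b{branch}@{depth}")
        issued += 1
        if evidence < 0.4:                  # prune fruitless-looking paths
            continue
        issued += explore(f"{problem}/b{branch}", depth + 1, max_depth)
    return issued

# A single user prompt can fan out into dozens of cube queries.
print(explore("why did Q3 margins drop"), "BI queries for one prompt")
```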
Pre-aggregated OLAP cubes will be best suited to handle a barrage of queries from an AI operating at “AI speed”. As mentioned, pre-aggregations drastically reduce query-time compute, which reduces query response time and frees up resources to serve many queries concurrently.
Knowledge Graphs and Strategy Maps
Structures such as knowledge graphs and strategy maps (a kind of object from the performance management domain) are sophisticated virtual models of the world. The notion of them has been around for years. However, they’ve proven elusive: hard enough to develop when things remain static, and almost impossible to maintain when things are constantly changing.
These structures would be the AI counterpart to the models of the world we have in each of our brains. They are our theories of how things work. They are built through years of experience and training.
To me, the ability to create and maintain such structures is the Holy Grail of an analytics system. With LLMs, with machine learning (ML) tools such as Azure AutoML that remove much of the drudge work, and with AI plugins, I don’t think we’re far off.
Building and maintaining such structures means diving deep into a sea of queries, pulling in data from various sources to create a comprehensive picture. For example, an AI model could drive the creation of an enterprise-wide strategy map of theoretical cause and effect: a model of how the enterprise’s strategy is supposed to work, ideally a holistic view.
Within a strategy map are gems of knowledge such as: if employee satisfaction increases, product quality may improve, leading to higher customer satisfaction, leading to improved retention, leading to consistently higher profits, and so on. That type of information is drastically more demanding to derive than a simple “My total sales in CA for bikes is $750,000.”
As mentioned, effective strategy maps have been elusive: they are hard enough to capture the first time, let alone keep up to date. An AI attempting to build and maintain such a structure will employ multiple ML experiments against various combinations of attributes sourced from BI data. That is much more than building one ML model through many iterations of different hyperparameters and feature sets; each ML model amounts to just a node or two in one of these knowledge structures.
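As a hedged sketch of that workflow, the snippet below grows a strategy map by running a stand-in “experiment” over pairs of BI attributes and keeping only the strong candidate relationships as directed edges. The attribute names, scoring function, and threshold are all invented; a real run_experiment would train an ML model on cube data, itself issuing many queries.

```python
from itertools import combinations

# Candidate attributes sourced from BI data (names are illustrative).
attributes = ["employee_satisfaction", "product_quality",
              "customer_satisfaction", "retention", "profit"]

def run_experiment(driver: str, outcome: str) -> float:
    """Stand-in for fitting an ML model on cube data and returning a
    goodness-of-fit score (e.g. R^2) for driver -> outcome."""
    return (len(driver) * len(outcome)) % 10 / 10.0   # fake score

# Keep only strong candidate cause-and-effect relationships as edges.
strategy_map = []   # directed edges: (driver, outcome, strength)
for driver, outcome in combinations(attributes, 2):
    score = run_experiment(driver, outcome)
    if score >= 0.6:
        strategy_map.append((driver, outcome, score))

for driver, outcome, score in strategy_map:
    print(f"{driver} -> {outcome} (strength {score:.1f})")
```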
The compute-optimized nature of pre-aggregated OLAP cubes serving up staple BI data to an AI goes a long way toward making such immensely valuable structures feasible to build and maintain.
Conclusion
As AI becomes more accessible to the general audience, the need for fast and reliable data sources will only grow. Pre-aggregated OLAP cubes, despite not being able to answer every type of question an LLM may pose, can provide essential slice-and-dice data rapidly. Further, the data contained within OLAP cubes is typically of vital importance to the enterprise, which has invested significant effort and expense into creating these cube objects.
The somewhat forgotten importance of pre-aggregated OLAP over the past decade of the Big Data era will re-emerge as a critical factor in the AI era. In a world where AI reduces the friction between humans and data sources, the optimization offered by pre-aggregates will be paramount for smooth, uninterrupted AI operations. In this context, OLAP’s extreme performance will become an essential resource for enterprises, handling massive query volumes at sub-second response times.