What this blog covers:
- How an NLP query interface can work for BI data sources using Azure Cognitive Services.
- An overview of the sudden impact of AI on BI and data analytics.
- Assessing a BI NLP interface with a ChatGPT-centric approach and ChatGPT plugins before finding a middle ground for mitigating communication friction between human analysts and machine-held data.
Natural Language OLAP Queries
BI NLP Interface using Azure Cognitive Services
This two-part article series describes how to implement a Natural Language Processing (NLP) query interface for Business Intelligence data sources using Azure Cognitive Services. By “Business Intelligence data sources”, I mean that the data is optimized for analytics and most likely modeled in a star/snowflake schema typically found in data warehouses and OLAP cubes.
For the most part, Business Intelligence (BI) users interact with databases through sophisticated visualization tools, such as Power BI and Tableau, or through a query language such as one of the various flavors of SQL. Using these tools and SQL effectively requires a non-trivial level of skill, and the visualization tools come at a significant cost. More importantly, the interface between human analysts and BI data is rigid compared to the rather loose way we normally communicate with each other: through spoken language.
It's not that the ability to query BI data sources in a spoken language is tremendously revolutionary; a couple of BI vendors have offered this capability for a few years. SQL encapsulates the semantics of querying data very well, in an unambiguous manner. Spoken language, by contrast, is loose and forgiving.
The real advantage of an NLP querying paradigm lies in its ability to drastically reduce the requirement for exactness. That's the appeal of large language models (LLMs) such as ChatGPT: we can ask questions in an ambiguous manner, which removes a great deal of communication friction.
This article is targeted at BI implementers and BI consumers. BI consumers access secured enterprise data to discover strategic insights and optimization opportunities for their enterprise, as opposed to the general public consuming data intended for public consumption.
The Very Sudden Impact of A.I.
I'll start this section by noting that the phrase “at the time of this writing” is implied for virtually every sentence of this article. “The time of this writing” means late March 2023. ChatGPT burst into the awareness of the general public just a few months ago, and things are changing very quickly. Already this month, GPT-4 and ChatGPT Plugins were unleashed. If I suspect that something I've written here will soon be obsolete, I'll toss an “attotw” (at the time of this writing) into the sentence.
A.I. has been a significant part of our lives for over a decade, but it has operated mostly behind the scenes. Only since the release of ChatGPT in November 2022 has A.I. been readily accessible to a wide audience, especially the majority of non-technical folks in the real world. There's not much easier for folks at home than creating a ChatGPT account and asking it whatever question they have, without much concern over the best keywords and phrasing.
A BI NLP Interface – Take 1 – ChatGPT-Centric Approach
The first inclination for a BI NLP query interface might be to simply ask ChatGPT our questions. However, the answers to most BI analysts' questions wouldn't be available to a very public resource such as ChatGPT, since the queries usually involve internal data not meant for public consumption.
Let’s explore this approach to an NLP interface. We’ll just ask ChatGPT a question. Maybe it’s more omnipotent than we think. Figure 1 shows the response to a question I might ask my business OLAP cube (a demo cube based on the publicly available Iowa Liquor Division data).
Figure 1 – Ask ChatGPT the question directly.
As you can see, it gave me its patented BAAAIL excuse (“… but as an AI language model …”). It's not yet able to piece together that I have a demo OLAP cube at some URL that could provide the answer, log onto it using the many clues I may have inadvertently provided to help guess my password, and issue a query. Or at least, so it seems.
However, as Figures 2a and 2b show, with a little bit of personalized “training” (presenting a data schema in the conversation), ChatGPT can construct very usable SQL. In Figure 2a, I provide simple schema information as JSON.
Figure 2a – Teaching ChatGPT about my cube schema.
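Since Figure 2a is a screenshot, here is a minimal sketch for anyone who wants to script this step: a schema is serialized to JSON and combined with an English question into a single prompt. The cube name, attributes, and measures in `SCHEMA`, as well as the helper `build_prompt`, are hypothetical stand-ins loosely modeled on the Iowa Liquor data, not the schema I actually gave ChatGPT.

```python
import json

# Hypothetical schema loosely modeled on the Iowa Liquor data; the names are
# illustrative assumptions, not the actual schema shown in Figure 2a.
SCHEMA = {
    "cube": "IowaLiquorSales",
    "dimensions": {
        "Store": ["County", "City", "StoreName"],
        "Item": ["Category", "Class", "Vendor"],
        "Date": ["Year", "Quarter", "Month"],
    },
    "measures": ["SalesDollars", "BottlesSold", "VolumeSoldLiters"],
}


def build_prompt(schema: dict, question: str) -> str:
    """Combine a JSON description of the schema with an English question (hypothetical helper)."""
    return (
        "Here is the schema of my OLAP cube, as JSON:\n"
        f"{json.dumps(schema, indent=2)}\n\n"
        "Write a SQL query that answers the following question using this schema.\n"
        f"Question: {question}"
    )


if __name__ == "__main__":
    print(build_prompt(SCHEMA, "What were the total sales dollars by county in 2022?"))
```

Keeping the schema as a data structure rather than hand-typed text means the description can be regenerated whenever the cube changes.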
Figure 2b shows the SQL it generated for my simple query based on the schema I provided above.
Figure 2b – SQL generated by ChatGPT for my simple query.
That's quite impressive. Note that the FROM clause doesn't include any JOINs, as it would if my data source were a relational database rather than an OLAP cube. It seems to know that an OLAP cube can be thought of as “one big flattened table”, even though I didn't mention that to ChatGPT.
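To make the JOIN point concrete, here is a small sketch using Python's built-in `sqlite3` module. The tables `dim_store` and `fact_sales` and the view `sales_cube` are hypothetical: the view merely simulates the way a cube presents itself as one flattened table, so the same question needs a JOIN against the star schema but none against the flattened view.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical star schema: a fact table keyed to a store dimension.
cur.executescript("""
CREATE TABLE dim_store (store_id INTEGER PRIMARY KEY, county TEXT);
CREATE TABLE fact_sales (store_id INTEGER, sales_dollars REAL);
INSERT INTO dim_store VALUES (1, 'Polk'), (2, 'Linn'), (3, 'Polk');
INSERT INTO fact_sales VALUES (1, 100.0), (2, 50.0), (3, 25.0), (1, 10.0);
""")

# Against the relational star schema, the query must JOIN fact to dimension.
star_sql = """
SELECT d.county, SUM(f.sales_dollars) AS total_sales
FROM fact_sales AS f
JOIN dim_store AS d ON d.store_id = f.store_id
GROUP BY d.county
ORDER BY d.county
"""

# Simulate the cube's "one big flattened table" with a view over the join.
cur.execute("""
CREATE VIEW sales_cube AS
SELECT d.county, f.sales_dollars
FROM fact_sales AS f JOIN dim_store AS d ON d.store_id = f.store_id
""")

# Against the flattened view, no JOIN is needed.
flat_sql = """
SELECT county, SUM(sales_dollars) AS total_sales
FROM sales_cube
GROUP BY county
ORDER BY county
"""

star_rows = cur.execute(star_sql).fetchall()
flat_rows = cur.execute(flat_sql).fetchall()
print(star_rows)               # [('Linn', 50.0), ('Polk', 135.0)]
print(flat_rows == star_rows)  # True
```

Both queries return the same totals; the flattened form simply leaves the joining to whatever sits behind the table.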
Let’s push ChatGPT’s mad SQL skills a little more with a tougher question shown in Figure 2c.
Figure 2c – SQL generated by ChatGPT for a tougher query.
The SQL does contain two errors: it didn't rank the classes in s2 to select just the top class for each county, nor did it include s2.class in the GROUP BY. However, it impressively managed to fix both errors, as shown in Figure 2d.
Figure 2d – ChatGPT impressively fixed two errors in the Figure 2c result.
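Both errors are classic mistakes in the “top N per group” pattern. Since the actual question and SQL live in screenshots, here is a hypothetical stand-in, assuming the question asked for the top-selling class in each county: aggregate by county and class (with both columns in the GROUP BY), then rank the classes within each county using a window function and keep rank 1. The table `sales_cube` and its rows are made up, and `ROW_NUMBER()` requires SQLite 3.25 or later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE sales_cube (county TEXT, class TEXT, sales_dollars REAL);
INSERT INTO sales_cube VALUES
  ('Polk', 'Vodka', 300.0), ('Polk', 'Whiskey', 200.0), ('Polk', 'Vodka', 50.0),
  ('Linn', 'Rum', 80.0), ('Linn', 'Whiskey', 120.0);
""")

top_class_sql = """
WITH class_totals AS (
    -- Aggregate first; every non-aggregated column must appear in GROUP BY.
    SELECT county, class, SUM(sales_dollars) AS total_sales
    FROM sales_cube
    GROUP BY county, class
),
ranked AS (
    -- Rank classes within each county by their total sales.
    SELECT county, class, total_sales,
           ROW_NUMBER() OVER (PARTITION BY county ORDER BY total_sales DESC) AS rn
    FROM class_totals
)
SELECT county, class, total_sales
FROM ranked
WHERE rn = 1
ORDER BY county
"""

rows = cur.execute(top_class_sql).fetchall()
print(rows)  # [('Linn', 'Whiskey', 120.0), ('Polk', 'Vodka', 350.0)]
```

ROW_NUMBER() picks exactly one class per county even on ties; RANK() would keep all tied classes instead.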
To me, as a Business Intelligence architect/developer, Figures 2c and 2d are both astounding and terrifying, especially since SQL is child's play for ChatGPT today! It would usually take weeks for someone without SQL experience to readily comprehend the schema and the nature of the question, and then construct almost-correct SQL.
So ChatGPT can:
- Comprehend a database schema described in a loose manner.
- Comprehend the intent of a question I pose to it in English.
- Compose SQL for complicated queries.
- Fill in commentary, advisements, and notable facts that I didn’t think of asking.
But that's as far as it can go. For better or worse, ChatGPT has guardrails that limit its capabilities. As mentioned earlier, ChatGPT can't directly connect to any data source. Thankfully, that includes our private enterprise databases.
That's a good thing, and one I conflictingly hope never changes. There is something very unsettling about ChatGPT knowing my enterprise data source schemas, having direct access to my data sources, and most importantly, learning from my query patterns and results (as it may with any interaction).
Consider, though, that whatever capabilities ChatGPT lacks attotw will more likely than not be “remedied” within months. Given the speed of change over the past few months, there's hardly anything related to A.I. that I would bet against.
Guess what? That happened a few days ago in the form of ChatGPT Plugins. ChatGPT may not be able to directly access a private database, but it can now ask a plugin to do so.
A BI NLP Interface – Take 2 – ChatGPT Plugins
Software developers can now build ChatGPT plugins to extend its reach. For example, ChatGPT has recently been interfaced with WolframAlpha, offloading heavy math tasks at which ChatGPT has been laughably poor. Other examples include asking Expedia for travel information and OpenTable for dining information.
In this case, we could develop a ChatGPT plugin that consumes SQL authored by ChatGPT and executes it on the appropriate database. The solution would look something like the sequence diagram in Figure 3.
Figure 3 – Sequence diagram of a ChatGPT-based NLP OLAP interface.
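On the plugin side of this sequence, the step that receives SQL from ChatGPT and runs it is where most of the safety concerns live. Here is a minimal sketch of such a handler, assuming a SQLite file stands in for the BI database; `run_llm_sql` is a hypothetical name, and the HTTP endpoint and plugin manifest that ChatGPT would actually talk to are omitted. It rejects anything that does not look like a query, opens the database in read-only mode, and caps the number of rows returned.

```python
import os
import sqlite3
import tempfile
from pathlib import Path


def run_llm_sql(db_path: str, sql: str, max_rows: int = 100) -> list:
    """Hypothetical plugin handler: run LLM-authored SQL read-only and cap the rows returned."""
    statement = sql.strip().rstrip(";").strip()
    # Cheap first filter: only accept statements that start like a query.
    if not statement.lower().startswith(("select", "with")):
        raise ValueError("only SELECT queries are accepted")
    # mode=ro makes SQLite refuse writes even if something slips past the filter above.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        cursor = conn.execute(statement)
        # Return at most max_rows rows so only a bounded slice of data leaves the database.
        return cursor.fetchmany(max_rows)
    finally:
        conn.close()


if __name__ == "__main__":
    # Build a throwaway database file to stand in for the BI data source.
    path = os.path.join(tempfile.mkdtemp(), "cube.db")
    setup = sqlite3.connect(path)
    setup.execute("CREATE TABLE sales_cube (county TEXT, sales_dollars REAL)")
    setup.executemany("INSERT INTO sales_cube VALUES (?, ?)", [("Polk", 135.0), ("Linn", 50.0)])
    setup.commit()
    setup.close()
    print(run_llm_sql(path, "SELECT county FROM sales_cube ORDER BY county;"))  # [('Linn',), ('Polk',)]
```

The keyword filter is deliberately naive; the read-only connection is the real safeguard, and a production database would additionally use a login with read-only permissions.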
This approach is along the lines of the sparkling new HuggingGPT, which revolves around a general LLM (ChatGPT) that delegates tasks to other, specialized A.I. models.
With this approach, the inner workings of the BI databases are decoupled from ChatGPT itself. However, we have no control over how ChatGPT will be trained in future versions. For example, we can't predict whether changes in the SQL it generates will cause mismatches that break the system, or whether policies governing the use of ChatGPT will change.
Worse, we have no control over what it learns about the internals of our enterprise from the results our plugin sends it. This currently isn't a problem, since all of the existing ChatGPT plugins (attotw) access customer-facing data (i.e., data built for public consumption). An OLAP BI interface, by contrast, is intended to access private internal enterprise data, usually by analysts. I feel quite uncomfortable with the idea of placing a very public A.I. technology at the center of such a solution.
Now, if we could feasibly support and train private/local versions of LLMs like ChatGPT, the A.I. learning on its own about the private workings of our enterprise would be more of a blessing than a concern. But that’s a whole other whopping subject for another time.
Besides, there is the (very temporary) obstacle that ChatGPT plugin development is in alpha testing and by invitation only. I didn't even bother getting on the waiting list, as I'm sure there are tens of thousands ahead of me.
A Safer Middle-ground Approach to a BI NLP Interface
Back in December 2022, before the release of ChatGPT Plugins and HuggingGPT, I developed a demo solution based on Microsoft's Azure Cognitive Services. I do wonder, though, how Microsoft's big investment and partnership with OpenAI will affect the roadmap of Azure Cognitive Services. In Part 2 of this series, I present a solution centered around the Azure Cognitive Services “stack”.
At the time of this writing, A.I. is moving faster than I believe the vast majority of us intuitively consider a safe pace. In fact, Elon Musk and other technology leaders just called for a six-month pause on training A.I. systems more powerful than GPT-4 so the world can carefully think this through. The approach I present in Part 2 is a safer middle ground for mitigating communication friction between human analysts and machine-held data by incorporating A.I. into Business Intelligence implementations.
Conclusion
In this article, and a demo in Part 2, we delved into how natural language processing (NLP) interfaces can be used for querying BI data sources (OLAP cubes or data warehouses). With easy-to-use A.I. tools such as Azure Cognitive Services, businesses can develop NLP interfaces that enable users to interact with their BI data sources using voice commands or typed queries in natural language. This mitigates communication friction between humans and data sources.
This article is the final piece of a trilogy discussing how Azure Cognitive Services can be applied to Business Intelligence. Following are links to the related articles:
- The Effect of Recent A.I. Developments on BI Data Volume – A.I. has the potential to revolutionize BI implementations by unlocking insights from unstructured data, leveraging computer vision for market basket analysis, and providing NLP interfaces for querying data warehouses or OLAP cubes.
- Basket Analysis with Computer Vision – Explore how computer vision can be used for market basket analysis. By applying image recognition and object detection capabilities, businesses can analyze images of products in a shopping cart or basket, and derive insights on customer behavior, preferences, and cross-selling opportunities. This can provide valuable inputs for optimizing pricing, promotions, and inventory management strategies.
- NLP for OLAP Semantic Layers:
  - Part 1 – Mitigating Query Friction Between Human Analysts and BI OLAP Cubes.
  - Part 2 – BI NLP Interface using Azure Cognitive Services.
By incorporating A.I. capabilities into their BI strategies, organizations can gain a competitive edge by making more informed decisions and uncovering hidden patterns and opportunities in their data. As A.I. continues to evolve, it promises to further augment and enhance data warehousing and OLAP cube implementations, leading to more advanced and sophisticated BI solutions in the future.