What this blog covers:
- Snowflake offers scale-out capabilities to process more data at lower cost, but compute costs can still explode when data volumes grow unpredictably.
- The key factors affecting Snowflake costs.
- How Kyvos reduces these costs with its AI-powered smart aggregation technology.
- Benchmarking numbers prove that the overall cost of executing BI queries on Kyvos is half that of executing them directly on Snowflake.
When someone goes on holiday, they rent a car and pay based on the distance traveled. The greater the distance, the higher the cost. The relationship between data and the cloud is much the same. In a world where data grows every minute and businesses thrive on data-driven insights, enterprises need to explore more and more data to make intelligent business decisions. Since they can’t control the influx of data, optimizing cost is vital.
Cloud data warehouses like Snowflake offer relational, scale-out capabilities with the promise of making more data accessible at a lower cost and with fewer data management troubles. But the real concern starts when the size of the data inevitably grows over time, leaving organizations with the baggage of unpredictable compute costs. Heavy analytics performed on massive amounts of data can inflate their bill, and at the end of each month they realize they have paid a fortune for a workload that could have been performed at a far lower price.
The key question is: how can they reduce Snowflake costs without impacting performance? How can they make their data accessible to business users and empower them to interact with it in a self-service way, without worrying about exploding compute costs? Is there a better way to achieve Snowflake cost optimization?
Let’s take the case where users want to perform analysis on 2 years of data – it could be billions or trillions of rows. They want to explore this data and not limit themselves in terms of the size, cardinality, or granularity and get quick answers to their business questions. There should be an optimized way of organizing and interacting with data while keeping Snowflake query costs under check.
Factors that Impact Snowflake Costs
Before we see how organizations can reduce Snowflake costs, let us first look at some of the key factors that add to them.
Resource-Intensive Queries
When users fire complex queries, costs are bound to escalate. For instance, suppose an organization makes its data available to 1,000 users, and each user submits 10 ad hoc queries a day to the data warehouse. If each query involves multiple joins or group-bys and scans billions or even trillions of rows at query time, every query is resource-intensive, and the organization’s register keeps ringing with each query fired at the data warehouse.
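The scale of that workload can be sketched with simple arithmetic. The per-query cost below is a purely illustrative assumption, not an actual Snowflake rate:

```python
def daily_query_cost(users, queries_per_user, cost_per_query):
    """Back-of-envelope daily warehouse spend for ad hoc querying.

    cost_per_query is a hypothetical average; real costs depend on
    warehouse size, query complexity, and caching.
    """
    total_queries = users * queries_per_user
    return total_queries, total_queries * cost_per_query

# 1,000 users firing 10 queries a day at an assumed $0.10 per query:
queries, cost = daily_query_cost(users=1000, queries_per_user=10,
                                 cost_per_query=0.10)
# 10,000 queries per day; roughly $1,000 per day before any caching or reuse.
```

Even at a modest assumed per-query cost, the daily total grows linearly with both the user count and the queries each user fires.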
User Concurrency – More Users, More Queries
Over time, as the number of business users grows, so does the number of queries submitted to the data warehouse. Ultimately, enterprises may face a situation where many users run heavy queries simply because the data is available to them. All of this can lead to a significant rise in Snowflake query costs.
The Impact of Cold Queries on Snowflake Costs
Data warehouse platforms like Snowflake use a partial caching mechanism: results for frequently used data are preserved so that data does not have to move from storage to compute every time users ask for it. This saves some money, but when users fire a query against non-cached data, the warehouse has to do all the heavy lifting of reading the data from storage and moving it to compute, leading to increased response times and costs.
Reduce Snowflake Costs with Kyvos’ AI-Powered Smart Aggregation Technology
Kyvos helps enterprises in Snowflake cost optimization by reducing querying costs through its AI-powered smart aggregation technology. It reads data from Snowflake and builds optimized data models that are stored in the cloud itself. The best thing is that these aggregations can be performed on extremely large datasets, regardless of the number of dimensions, measures, or granularity. Now, when a query is fired, it is served directly from the smart aggregates. This reduces compute costs significantly and the queries return fast too!
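The underlying idea of pre-aggregation can be sketched as follows. This is a simplified illustration of building group-by rollups ahead of time, not Kyvos' actual algorithm, and the table and column names are made up:

```python
from collections import defaultdict
from itertools import combinations

# Tiny stand-in fact table; a real deployment would read this from Snowflake.
rows = [
    {"region": "EU", "year": 2023, "sales": 100},
    {"region": "EU", "year": 2024, "sales": 150},
    {"region": "US", "year": 2023, "sales": 200},
]

def build_aggregates(rows, dims, measure):
    """Pre-compute SUM(measure) for every combination of dimensions."""
    aggs = {}
    for r in range(1, len(dims) + 1):
        for combo in combinations(dims, r):
            table = defaultdict(int)
            for row in rows:
                key = tuple(row[d] for d in combo)
                table[key] += row[measure]
            aggs[combo] = dict(table)
    return aggs

aggs = build_aggregates(rows, ["region", "year"], "sales")
# "Total sales per region" is now a cheap lookup instead of a full scan:
per_region = aggs[("region",)]  # {('EU',): 250, ('US',): 200}
```

The expensive scan happens once at build time; every subsequent query against a covered dimension combination is a lookup, which is why per-query compute drops.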
Optimize Resource Utilization with Price-Performant Querying Approach
Once the data model is built, minimal resources are consumed per query, enabling users to run unlimited queries on massive amounts of data without incurring additional cost. Because all the heavy lifting has been done in advance and the aggregations are already available, the query itself becomes lightweight. This makes it possible to scale in terms of the number of users and the number of queries fired concurrently. Our price-performant querying approach reduces Snowflake costs while ensuring that any number of users can fire any number of queries.
Reduce Snowflake Costs Further with Elastic Architecture
Another key advantage is that enterprises can scale up or down the querying capability through our scheduled cluster scaling feature, which allows them to increase or decrease the query engines depending on the load. This way, they can optimize the resource utilization and pay only for the resources they use.
Save Costs on Ad Hoc Analysis
Enterprises often face cost escalations while performing ad hoc analysis. If they run the same query frequently, various caching mechanisms can return the result without affecting compute cost. But what if users run cold queries with different combinations? The warehouse then has to scan the data multiple times, and they bear the cost of excessive processing at query time.
To learn more about how Kyvos reduces Snowflake costs, download our eBook.
Benchmarking Report on AdventureWorks Dataset
The benchmarking study below demonstrates how Kyvos can help save up to 50% of the cloud data warehouse cost while delivering exceptionally high performance. Snowflake charges based on warehouse size and running time. Each query that reaches Snowflake has to perform joins and aggregations, which results in higher query execution time and, eventually, higher costs. By using Kyvos as a query acceleration and semantic layer platform between Snowflake and the BI tool, costs can be reduced and query performance improved. In this scenario, the BI tool connects to Kyvos rather than Snowflake, so users experience many other benefits of Kyvos besides faster queries at lower cost.
The tests were carried out to analyze response time and cost for AdventureWorks (a standard sample dataset) queries. Two scenarios were compared: in the first case, analytical queries were executed directly on Snowflake; in the second case, analytical queries were executed on Kyvos, which was connected to Snowflake to create smart aggregates.
Key parameters for the comparison include:
- The analysis was conducted using a fact table of 5 million rows.
- The scenario accounts for a load of up to 500 concurrent queries over a 5-minute period (assuming auto-suspend is enabled after 5 minutes of Snowflake inactivity).
- A single X-Large Snowflake warehouse can give an average response time of 20-30 seconds for 60 concurrent queries. So, for 500 concurrent queries, 8 Snowflake warehouses were launched when executing analytical queries directly on Snowflake.
- In case of executing analytical queries on Kyvos, 10 large (64 cores, 256GB RAM) Kyvos nodes were used.
- A Kyvos license cost of $50,000 per large node has been considered for this exercise but this may vary based on many parameters.
- This exercise assumes queries are executed for 12 hours in a working day.
- The total cost of using Kyvos includes the hardware and Kyvos license costs, in addition to the cost of Snowflake.
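Given those parameters, the Snowflake side of the comparison can be estimated with simple arithmetic. An X-Large warehouse consumes 16 credits per hour per Snowflake's published sizing table; the $3.00 per-credit price is an assumed figure, since actual pricing varies by edition and region:

```python
def snowflake_daily_compute(warehouses, credits_per_hour, hours, price_per_credit):
    """Daily Snowflake compute cost: warehouses x credit rate x hours x price."""
    return warehouses * credits_per_hour * hours * price_per_credit

# 8 X-Large warehouses (16 credits/hour each) running 12 hours a day,
# at an assumed $3.00 per credit:
cost = snowflake_daily_compute(warehouses=8, credits_per_hour=16,
                               hours=12, price_per_credit=3.00)
# 8 * 16 * 12 * 3.00 = $4,608 per day of compute, before storage or
# cloud-services charges.
```

A fair comparison for Case 2 would set against this the amortized hardware cost of the 10 Kyvos nodes, the $50,000-per-node license, and the much smaller Snowflake usage needed only to build the aggregates.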
Dataset Details
The following table gives the row count for each table used in this exercise.
Case 1: When Queries Are Executed Directly on Snowflake
Average query response time: 15-20 seconds.
Case 2: When Queries Are Executed on Kyvos
Average query response time: <5 seconds.
Benchmarking Test Results
- The overall cost of executing analytical queries on Kyvos is half that of executing them directly on Snowflake.
- Kyvos performance is 3x better than Snowflake.
Snowflake Cost Optimization with Kyvos
In the new era of data warehousing technologies, most businesses are already in the cloud or making their way toward it. With the volume and velocity at which their data is landing, it’s imperative for businesses to harness this data for competitive advantage while ensuring that they are optimizing their analytical costs.
With the right solution like Kyvos in their data architecture, organizations can eliminate much of their Snowflake query cost while improving overall performance, concurrency, and cost predictability.
To learn more about how Kyvos enables Snowflake cost optimization, request a demo now.
FAQ
How can I reduce costs while using Snowflake as my data warehouse?
Use Kyvos to cut querying costs on Snowflake through its patented algorithms, which read data from Snowflake and build pre-processed semantic models stored in the cloud itself. These models can be built even on extremely large datasets, regardless of the number of dimensions, measures, or granularity. When a query is fired, it is served directly from the semantic models, reducing compute costs significantly.
What are the best practices for optimizing the Snowflake warehouse costs?
Organizations can optimize their Snowflake warehouse costs by investing in an elastic analytics architecture that uses heavy pre-processing to make queries lightweight. A scalable infrastructure allows any number of users to fire queries at once without sacrificing speed or performance. Kyvos offers all this and more with its patented algorithms that have a proven track record.
How can I minimize unnecessary data storage costs in Snowflake?
Using Kyvos, organizations can minimize their data storage costs in Snowflake. The platform pre-processes data to create semantic models that return query results in sub-seconds. The models are stored in the cloud, and query results are served directly from the cube, reducing storage requirements.
Are there any recommended approaches for optimizing query performance and reducing costs in Snowflake?
Using smart pre-processing can help optimize query performance and reduce Snowflake costs. Our benchmarking report shows that Kyvos is 1695x more cost-effective than Snowflake itself for running 13 SSB queries across a 100-billion-row fact table.