While engineering teams were focused on delivery (or finally taking some time off), Google Cloud rolled out a series of updates that significantly impact how cost is measured, tracked, and optimized.
From deeper visibility into Committed Use Discounts (CUDs) to new runtime efficiencies and observability in BigQuery, these changes aren’t just cosmetic — they reshape how infrastructure and platform teams interact with cost data.
If you’re managing multi-project environments or enabling AI workloads at scale, here are the most relevant recent GCP changes — and what you should start monitoring now.

1. Spend-Based Committed Use Discount (CUD) Export to BigQuery
Google Cloud now enables daily exports of Committed Use Discount (CUD) metadata to BigQuery, offering visibility into how your discounts are applied across SKUs and resources.
Before
- Discount insights were only available at the invoice level or via monthly summary files.
- It was difficult to associate discounts with specific projects or services in near-real time.
Now
- Daily CUD data can be exported via the cud_subscriptions_export table.
- Includes resource-level details: SKU, commitment amount, usage, and discount applied.
What to monitor
- SKU-level usage by project and service
- Discount amounts vs. committed values
- Blended effective rates across workloads
Impact
This is a major enabler for FinOps. With Costory, these exports are automatically ingested and contextualized — making it easier to track underutilized commitments, map savings to specific teams, and optimize future purchasing strategies.
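As a rough sketch of the kind of analysis this export unlocks, the snippet below computes per-SKU commitment utilization from exported rows. The column names (sku, commitment_amount, usage_amount) are illustrative assumptions, not the exact export schema, so check your table's fields before adapting it:

```python
# Sketch: summarizing commitment utilization from daily CUD export rows.
# Field names below are illustrative assumptions, not the exact export schema.
from collections import defaultdict

def commitment_utilization(rows):
    """Return per-SKU utilization: total usage divided by committed amount."""
    committed = defaultdict(float)
    used = defaultdict(float)
    for row in rows:
        committed[row["sku"]] += row["commitment_amount"]
        used[row["sku"]] += row["usage_amount"]
    return {
        sku: round(used[sku] / committed[sku], 3) if committed[sku] else 0.0
        for sku in committed
    }

# Two days of hypothetical export rows for two SKUs.
rows = [
    {"sku": "N2 CPU", "commitment_amount": 100.0, "usage_amount": 82.0},
    {"sku": "N2 CPU", "commitment_amount": 100.0, "usage_amount": 91.0},
    {"sku": "N2 RAM", "commitment_amount": 50.0, "usage_amount": 20.0},
]
print(commitment_utilization(rows))  # N2 RAM at 40% flags an underused commitment
```

A utilization well below 1.0, as with the RAM commitment here, is exactly the signal to surface before the next purchasing cycle.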
2. Short Query Optimizations Now Generally Available in BigQuery
Google has optimized how short-running queries consume resources in BigQuery’s Advanced Runtime, now generally available. This enhancement targets users on BigQuery Editions who run high volumes of short queries.
Before
- Even short, lightweight queries often consumed full slot bursts, leading to unnecessary cost — especially on provisioned slot plans.
- Query cost didn’t scale well with actual execution time.
Now
- Short queries are automatically optimized to reduce slot consumption.
- Runtime enhancements reduce CPU and memory overhead for fast, targeted queries.
What to monitor
- Average slot usage per query before and after adoption
- Query latency improvements for dashboards, APIs, or ML pipelines
- Editions-based slot plan efficiency for short-running workloads
Impact
For organizations using Enterprise or Enterprise Plus BigQuery Editions, this improvement can reduce slot waste and improve ROI — particularly in workloads with many small joins, lookups, or metric queries.
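To quantify the effect, compare average slot-seconds per query before and after the rollout. A minimal sketch, using the total_slot_ms field that BigQuery exposes via INFORMATION_SCHEMA.JOBS (the sample job records here are invented):

```python
# Sketch: comparing average slot consumption per query across two periods.
# In practice, pull total_slot_ms from BigQuery's INFORMATION_SCHEMA.JOBS view;
# the job records below are invented for illustration.

def avg_slot_seconds(jobs):
    """Average slot-seconds per job (total_slot_ms / 1000 / job count)."""
    if not jobs:
        return 0.0
    return sum(j["total_slot_ms"] for j in jobs) / 1000 / len(jobs)

before = [{"total_slot_ms": 4200}, {"total_slot_ms": 3800}, {"total_slot_ms": 4000}]
after = [{"total_slot_ms": 1500}, {"total_slot_ms": 1300}, {"total_slot_ms": 1400}]

savings = 1 - avg_slot_seconds(after) / avg_slot_seconds(before)
print(f"avg before: {avg_slot_seconds(before):.2f} slot-s, "
      f"after: {avg_slot_seconds(after):.2f} slot-s, saved {savings:.0%}")
```

Running this comparison per workload (dashboards vs. pipelines vs. ad hoc) shows where the short-query optimization actually pays off.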
3. On-Demand Slot Usage Visibility in BigQuery
A recent update to the BigQuery console brings more granular visibility into on-demand slot usage, directly tied to specific queries — helping users better understand how query complexity translates to cost.
Before
- On-demand billing was opaque: pricing was based on processed bytes, but actual slot resource usage wasn’t exposed.
- Optimization relied on trial and error.
Now
- The BigQuery UI now surfaces slot usage information for on-demand workloads.
- Engineers and analysts can see how many slots a query used and how long it ran — even if they’re not on a flat-rate plan.
What to monitor
- High-slot-usage queries within on-demand environments
- Misconfigured or inefficient SQL patterns
- Spike patterns tied to analyst workloads or third-party integrations
Impact
This closes a long-standing visibility gap for organizations using on-demand pricing. It’s also an opportunity to reduce spend by improving query structure — without changing plan type.
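One simple way to act on this new visibility is to flag queries whose slot consumption is out of proportion to the bytes they bill, which is often a sign of inefficient SQL (exploding joins, unfiltered scans). A sketch, where the threshold and field names are illustrative assumptions rather than a fixed BigQuery schema:

```python
# Sketch: flagging on-demand queries with disproportionate slot usage.
# The threshold and job fields below are illustrative assumptions; adapt
# them to the slot metrics your BigQuery console or job metadata exposes.

def flag_heavy_queries(jobs, slot_ms_per_gib_threshold=2000):
    """Return job ids whose slot-ms per GiB billed exceeds the threshold."""
    flagged = []
    for job in jobs:
        gib = job["total_bytes_billed"] / 2**30
        if gib and job["total_slot_ms"] / gib > slot_ms_per_gib_threshold:
            flagged.append(job["job_id"])
    return flagged

jobs = [
    {"job_id": "job_a", "total_slot_ms": 500, "total_bytes_billed": 2**30},
    {"job_id": "job_b", "total_slot_ms": 90_000, "total_bytes_billed": 2**30},
]
print(flag_heavy_queries(jobs))  # job_b burns far more slots per GiB billed
```

Queries flagged this way cost the same in on-demand billing (bytes processed), but they are the first candidates to rewrite before any move to slot-based Editions pricing.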
4. Vertex AI and Gemini Pricing Model Updates
As generative AI use expands, Google Cloud has introduced more granular pricing dimensions for Vertex AI — especially for those using Gemini models.
Before
- Pricing was largely tied to training hours and model storage.
- Cost drivers were relatively stable and predictable.
Now
Pricing includes multiple factors:
- Token usage (input/output)
- Context window size
- Provisioned throughput
- Training epochs
- Storage duration
What to monitor
- Prediction and training SKU usage
- AI workloads by project and environment
- Token growth over time across services
Impact
Token-based pricing makes real-time cost observability critical. If you’re building or scaling LLM-backed services, tracking token usage and model configuration is essential for managing volatility in spend.
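Given these pricing dimensions, even a back-of-the-envelope cost model helps when forecasting spend. A minimal sketch, with placeholder per-token rates (substitute the current Vertex AI list prices for your model):

```python
# Sketch: estimating request-level cost under token-based pricing.
# The rates below are placeholders, not actual Vertex AI prices; look up
# current list prices for the Gemini model and region you use.

INPUT_RATE_PER_1K = 0.000125   # placeholder USD per 1K input tokens
OUTPUT_RATE_PER_1K = 0.000375  # placeholder USD per 1K output tokens

def estimate_request_cost(input_tokens, output_tokens):
    """Estimated cost of one request from its input/output token counts."""
    return (input_tokens / 1000 * INPUT_RATE_PER_1K
            + output_tokens / 1000 * OUTPUT_RATE_PER_1K)

# A hypothetical day: 10,000 requests averaging 2,000 input / 500 output tokens.
daily = 10_000 * estimate_request_cost(2_000, 500)
print(f"estimated daily spend: ${daily:.2f}")
```

The useful part is the sensitivity: doubling the average context window roughly doubles the input-token term, which is why tracking token growth per service matters as much as request counts.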
Summary of Changes
| Feature | Key Cost Change | Monitoring Focus |
|---|---|---|
| CUD Export to BigQuery | Daily visibility into discount usage | SKU-level mapping, commitment utilization |
| Short Query Optimization | Reduced slot usage for fast queries | Query runtimes, slot savings, Editions ROI |
| On-Demand Slot Insights | Slot visibility in on-demand pricing | Per-query slot usage, inefficient SQL patterns |
| Vertex AI & Gemini | Token/context-based billing | Token volume, throughput, project tagging |
Closing Thoughts
Google Cloud is steadily making cost visibility more real-time, granular, and actionable. These improvements aren’t just for billing teams — they’re for infrastructure leaders, platform engineers, and developers who need to operate within budget without slowing down delivery.
What this means in practice:
- FinOps platforms like Costory can now provide more accurate, contextualized insights using CUD and BigQuery metadata.
- Infrastructure teams can optimize AI and analytics usage without waiting for end-of-month surprises.
- Slot management is no longer just about allocation — it’s about intelligent usage.
Cloud cost optimization is shifting from reporting to real-time decision-making. These updates are a step in the right direction.