GCP Cost and Billing Updates You May Have Missed This Summer (2025) ⛰️

While engineering teams were focused on delivery (or finally taking some time off), Google Cloud rolled out a series of updates that significantly impact how cost is measured, tracked, and optimized.

From deeper visibility into Committed Use Discounts (CUDs) to new runtime efficiencies and observability in BigQuery, these changes aren’t just cosmetic — they reshape how infrastructure and platform teams interact with cost data.

If you’re managing multi-project environments or enabling AI workloads at scale, here are the most relevant recent GCP changes — and what you should start monitoring now.

1. Spend-Based Committed Use Discount (CUD) Export to BigQuery

Google Cloud now enables daily exports of Committed Use Discount (CUD) metadata to BigQuery, offering visibility into how your discounts are applied across SKUs and resources.

Before

  • Discount insights were only available at the invoice level or via monthly summary files.
  • It was difficult to associate discounts with specific projects or services in near-real time.

Now

  • Daily CUD data can be exported via the cud_subscriptions_export table.
  • Includes resource-level details: SKU, commitment amount, usage, and discount applied.

What to monitor

  • SKU-level usage by project and service
  • Discount amounts vs. committed values
  • Blended effective rates across workloads

Impact
This is a major enabler for FinOps. With Costory, these exports are automatically ingested and contextualized — making it easier to track underutilized commitments, map savings to specific teams, and optimize future purchasing strategies.
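
As a starting point for this kind of monitoring, here is a minimal sketch that summarizes commitment utilization by SKU from the daily export, using the google-cloud-bigquery Python client. The dataset path and the column names (commitment_amount, discount_applied, export_time) are illustrative assumptions; check them against the actual schema of your cud_subscriptions_export table before relying on them.

```python
# Minimal sketch: summarize commitment vs. applied discount per SKU from the
# daily CUD export. The dataset path and column names are assumptions; verify
# them against the real schema of your cud_subscriptions_export table.
from google.cloud import bigquery

client = bigquery.Client(project="my-billing-project")  # assumed billing project ID

QUERY = """
SELECT
  sku,                                    -- assumed column: SKU the commitment covers
  SUM(commitment_amount) AS committed,    -- assumed column: committed spend
  SUM(discount_applied)  AS discounted    -- assumed column: discount actually applied
FROM `my-billing-project.billing_export.cud_subscriptions_export`   -- assumed dataset path
WHERE DATE(export_time) = DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)  -- assumed date column
GROUP BY sku
ORDER BY committed DESC
"""

for row in client.query(QUERY).result():
    utilization = row.discounted / row.committed if row.committed else 0.0
    print(f"{row.sku}: committed={row.committed:,.2f} utilization={utilization:.1%}")
```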

Source

2. Short Query Optimizations Now Generally Available in BigQuery

Google has optimized how short-running queries consume resources in BigQuery’s Advanced Runtime, now generally available. This enhancement targets users on BigQuery Editions who run high volumes of short queries.

Before

  • Even short, lightweight queries often consumed full slot bursts, leading to unnecessary cost — especially on provisioned slot plans.
  • Query cost didn’t scale well with actual execution time.

Now

  • Short queries are automatically optimized to reduce slot consumption.
  • Runtime enhancements reduce CPU and memory overhead for fast, targeted queries.

What to monitor

  • Average slot usage per query before and after adoption
  • Query latency improvements for dashboards, APIs, or ML pipelines
  • Editions-based slot plan efficiency for short-running workloads

Impact
For organizations using Enterprise or Enterprise Plus BigQuery Editions, this improvement can reduce slot waste and improve ROI — particularly in workloads with many small joins, lookups, or metric queries.
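
One way to quantify the before-and-after is to trend average slot consumption for short queries using BigQuery's INFORMATION_SCHEMA.JOBS_BY_PROJECT view. The sketch below is a rough starting point; the region qualifier, the 30-day lookback, and the 5-second cutoff for "short" are assumptions to adapt to your environment.

```python
# Rough sketch: trend average slot-milliseconds for short queries over the last
# 30 days using BigQuery's INFORMATION_SCHEMA.JOBS_BY_PROJECT view. The region
# qualifier and the 5-second "short query" cutoff are assumptions.
from google.cloud import bigquery

client = bigquery.Client(project="my-analytics-project")  # assumed project ID

QUERY = """
SELECT
  DATE(creation_time) AS day,
  COUNT(*)            AS short_queries,
  AVG(total_slot_ms)  AS avg_slot_ms
FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
WHERE job_type = 'QUERY'
  AND state = 'DONE'
  AND TIMESTAMP_DIFF(end_time, start_time, SECOND) < 5  -- "short" threshold is an assumption
  AND creation_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 30 DAY)
GROUP BY day
ORDER BY day
"""

for row in client.query(QUERY).result():
    print(f"{row.day}: {row.short_queries} short queries, avg {row.avg_slot_ms:,.0f} slot-ms")
```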

Source

3. On-Demand Slot Usage Visibility in BigQuery

A recent update to the BigQuery console brings more granular visibility into on-demand slot usage, directly tied to specific queries — helping users better understand how query complexity translates to cost.

Before

  • On-demand billing was opaque: pricing was based on processed bytes, but actual slot resource usage wasn’t exposed.
  • Optimization relied on trial and error.

Now

  • The BigQuery UI now surfaces slot usage information for on-demand workloads.
  • Engineers and analysts can see how many slots a query used and how long it ran — even if they’re not on a flat-rate plan.

What to monitor

  • High-slot-usage queries within on-demand environments
  • Misconfigured or inefficient SQL patterns
  • Spike patterns tied to analyst workloads or third-party integrations

Impact
This closes a long-standing visibility gap for organizations using on-demand pricing. It's also an opportunity to reduce spend by improving query structure — without changing plan type.
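
For a single query, the same statistics are also available programmatically from the job metadata. The sketch below assumes the google-cloud-bigquery client and a public sample table; the project ID is a placeholder, and field names such as slot_millis and total_bytes_billed are worth verifying against the client library version you run.

```python
# Sketch: correlate what an on-demand query bills for (bytes billed) with the
# slot time it actually consumed. Assumes the google-cloud-bigquery client;
# project ID and example SQL are placeholders.
from google.cloud import bigquery

client = bigquery.Client(project="my-analytics-project")  # assumed project ID

job = client.query(
    "SELECT word, SUM(word_count) AS n "
    "FROM `bigquery-public-data.samples.shakespeare` "
    "GROUP BY word ORDER BY n DESC LIMIT 10"
)
job.result()  # wait for completion so job statistics are populated

print(f"Bytes billed: {job.total_bytes_billed}")  # what on-demand pricing charges for
print(f"Slot ms used: {job.slot_millis}")         # the resource usage now surfaced in the UI
print(f"Cache hit:    {job.cache_hit}")           # cached queries bill zero bytes
```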

Source

4. Vertex AI and Gemini Pricing Model Updates

As generative AI use expands, Google Cloud has introduced more granular pricing dimensions for Vertex AI — especially for those using Gemini models.

Before

  • Pricing was largely tied to training hours and model storage.
  • Cost drivers were relatively stable and predictable.

Now
Pricing includes multiple factors:

  • Token usage (input/output)
  • Context window size
  • Provisioned throughput
  • Training epochs
  • Storage duration

What to monitor

  • Prediction and training SKU usage
  • AI workloads by project and environment
  • Token growth over time across services

Impact
Token-based pricing makes real-time cost observability critical. If you’re building or scaling LLM-backed services, tracking token usage and model configuration is essential for managing volatility in spend.
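
If you want to start capturing token usage per request, the Vertex AI SDK returns it alongside each response. Here is a minimal sketch; the project, region, and model name are placeholders, and SDK field names evolve, so verify usage_metadata against the library version you have installed.

```python
# Minimal sketch: log token consumption for a single Gemini call on Vertex AI.
# Project, location, and model name are placeholders; usage_metadata field names
# should be checked against your installed SDK version.
import vertexai
from vertexai.generative_models import GenerativeModel

vertexai.init(project="my-ai-project", location="us-central1")  # assumed project/region
model = GenerativeModel("gemini-1.5-flash")  # assumed model; use the one you actually deploy

# Optional pre-flight estimate before sending the request.
prompt = "Summarize our main cloud cost drivers in one sentence."
print("estimated prompt tokens:", model.count_tokens(prompt).total_tokens)

response = model.generate_content(prompt)
usage = response.usage_metadata
print("prompt tokens:", usage.prompt_token_count)
print("output tokens:", usage.candidates_token_count)
print("total tokens :", usage.total_token_count)
```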

Source

Summary of Changes

Feature | Key Cost Change | Monitoring Focus
CUD Export to BigQuery | Daily visibility into discount usage | SKU-level mapping, commitment utilization
Short Query Optimization | Reduced slot usage for fast queries | Query runtimes, slot savings, Editions ROI
On-Demand Slot Insights | Slot visibility in on-demand pricing | Per-query slot usage, inefficient SQL patterns
Vertex AI & Gemini | Token/context-based billing | Token volume, throughput, project tagging

Closing Thoughts

Google Cloud is steadily making cost visibility more real-time, granular, and actionable. These improvements aren’t just for billing teams — they’re for infrastructure leaders, platform engineers, and developers who need to operate within budget without slowing down delivery.

What this means in practice:

  • FinOps platforms like Costory can now provide more accurate, contextualized insights using CUD and BigQuery metadata.
  • Infrastructure teams can optimize AI and analytics usage without waiting for end-of-month surprises.
  • Slot management is no longer just about allocation — it’s about intelligent usage.

Cloud cost optimization is shifting from reporting to real-time decision-making. These updates are a step in the right direction.