Google Cloud C4 Boosts GPT OSS with Intel Xeon 6 Processors

New C4 virtual machines deliver significant cost and performance improvements for open-source AI models.

Google Cloud's new C4 virtual machines, powered by Intel Xeon 6 processors, dramatically improve the Total Cost of Ownership (TCO) of running OpenAI GPT open-source models. This collaboration with Intel and Hugging Face delivers a 70% TCO improvement over the previous generation, promising more efficient and affordable AI inference.

By Sarah Kline

October 16, 2025

4 min read

Key Facts

  • Google Cloud C4 VMs achieved a 70% TCO improvement for OpenAI GPT OSS models.
  • C4 VMs offer 1.7x improvement in Total Cost of Ownership over C3 VMs.
  • C4 VMs deliver 1.4x to 1.7x higher decode (TPOT) throughput per vCPU per dollar.
  • The new C4 VMs run on Intel® Xeon® 6 processors (Granite Rapids).
  • OpenAI GPT OSS models are Mixture of Experts (MoE) models, making CPU inference viable.

Why You Care

Ever wonder if running AI models could be more affordable? What if you could get significantly better performance for less money? Google Cloud, Intel, and Hugging Face just announced something big for anyone using open-source AI. They’ve achieved a 70% Total Cost of Ownership (TCO) improvement for OpenAI GPT open-source models. This means your AI projects could become much cheaper and faster to operate.

What Actually Happened

Google Cloud has introduced its new C4 Virtual Machine (VM) instances, according to the announcement. These instances run on Intel® Xeon® 6 processors, codenamed Granite Rapids (GNR). The team benchmarked text generation performance, focusing on OpenAI GPT open-source Large Language Models (LLMs). The results are impressive, the research shows: a 1.7x improvement in Total Cost of Ownership (TCO) compared to the previous-generation Google C3 VM instances.
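To get a feel for what was benchmarked, here is a minimal CPU text-generation sketch using the Hugging Face transformers library. The model id openai/gpt-oss-20b, the bfloat16 dtype, and the prompt are illustrative assumptions, not details from the announcement.

```python
# Minimal sketch: CPU text generation with a GPT OSS checkpoint via
# Hugging Face transformers. Model id and dtype are assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "openai/gpt-oss-20b"  # assumed checkpoint; any causal LM works
tokenizer = AutoTokenizer.from_pretrained(model_id)
# bfloat16 maps well onto Xeon AMX acceleration; loads on CPU by default
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

inputs = tokenizer("Explain Mixture of Experts in one sentence.", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```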

What’s more, the Google Cloud C4 VM instance offers several key advantages. It provides 1.4x to 1.7x throughput per vCPU per dollar, and a lower price per hour than C3 VMs, as detailed in the blog post. This makes AI more accessible. These C4 VMs are designed for efficiency: they use Intel's latest Granite Rapids processor architecture, which helps manage the demands of modern AI workloads.
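As a quick illustration of how that metric combines speed and price, here is a small worked example. The token rates, vCPU counts, and hourly prices below are made up for illustration; the announcement reports only the 1.4x to 1.7x ratio, not absolute figures.

```python
# Hypothetical numbers illustrating the throughput/vCPU/dollar metric.
def throughput_per_vcpu_per_dollar(tokens_per_sec, vcpus, price_per_hour):
    return tokens_per_sec / vcpus / price_per_hour

c3 = throughput_per_vcpu_per_dollar(tokens_per_sec=100.0, vcpus=96, price_per_hour=4.00)
c4 = throughput_per_vcpu_per_dollar(tokens_per_sec=160.0, vcpus=96, price_per_hour=3.80)
print(f"C4 vs C3 throughput/vCPU/dollar: {c4 / c3:.2f}x")  # ~1.68x with these made-up inputs
```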

Why This Matters to You

Imagine you’re running an AI-powered chatbot for your customer service. The cost of processing every user query adds up quickly. With these new C4 VMs, your operational costs could drop significantly. This allows you to scale your services without breaking the bank. The company reports this efficiency gain is crucial for businesses. It helps them deploy and manage large language models more effectively.

Key Improvements with Google Cloud C4 VMs:

  • Total Cost of Ownership (TCO): 1.7x improvement over C3 VMs.
  • Throughput/vCPU/dollar: 1.4x to 1.7x higher.
  • Price per hour: Lower than C3 VMs.
  • Processor: Intel® Xeon® 6 (Granite Rapids).

Think of it as getting a faster, more fuel-efficient car for your business. You travel further for less money. This directly impacts your bottom line. “The results are in, and they are impressive, demonstrating a 1.7x improvement in Total Cost of Ownership (TCO) over the previous-generation Google C3 VM instances,” the team revealed. This means you can achieve more with your existing budget. How will these cost savings impact your AI development strategy?

The Surprising Finding

Here’s the twist: even with very large AI models, CPU inference is becoming viable. This might seem counterintuitive, since many assume that only GPUs can handle complex AI tasks. However, the study finds that OpenAI GPT OSS models are Mixture of Experts (MoE) models, which activate only a small subset of experts per token. This makes CPU inference practical, according to the technical report. It challenges the common assumption that GPUs are always necessary for LLMs and opens new possibilities for cost-effective deployment. Intel and Hugging Face also collaborated on an expert execution optimization that further enhanced the efficiency of these MoE models on Intel Xeon processors, demonstrating the power of hardware-aware optimization.
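For intuition on why MoE routing changes the economics, here is a toy top-k routing sketch. This is not the GPT OSS internals or the Intel/Hugging Face optimization, just an illustration of the principle: each token activates only k of E expert networks, so per-token compute touches a small fraction of the expert weights.

```python
# Toy top-k Mixture of Experts routing: only k of E expert MLPs run per
# token, so per-token compute scales with roughly k/E of the expert
# parameters -- the property that makes CPU inference practical.
import torch

def moe_forward(x, experts, router, k=2):
    scores = router(x)                           # (tokens, num_experts)
    weights, idx = torch.topk(scores.softmax(dim=-1), k, dim=-1)
    out = torch.zeros_like(x)
    for t in range(x.size(0)):                   # route each token to its k experts
        for w, e in zip(weights[t], idx[t]):
            out[t] += w * experts[e](x[t])
    return out

hidden, num_experts = 64, 8
experts = [torch.nn.Linear(hidden, hidden) for _ in range(num_experts)]
router = torch.nn.Linear(hidden, num_experts)
y = moe_forward(torch.randn(4, hidden), experts, router, k=2)
print(y.shape)  # only 2 of 8 experts ran per token
```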

What Happens Next

This development suggests a future where AI becomes more widely accessible. We can expect to see more businesses adopting these C4 VMs in the coming months. For example, a startup building a personalized content generation system could now afford to run more models without the prohibitive costs previously associated with such endeavors. Actionable advice for you: consider evaluating C4 VMs for your existing or upcoming AI projects, especially if you are using OpenAI GPT open-source models (a provisioning sketch follows below). The industry implications are significant: broader adoption of open-source LLMs could make them more competitive against proprietary solutions. The documentation indicates that further optimizations are ongoing, promising even greater efficiencies down the line. We anticipate continued advancements in this area through late 2025 and into 2026.
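If you want to run such an evaluation yourself, a hedged starting point is provisioning a C4 instance with Google's Python client library (google-cloud-compute). The project id, zone, machine size, disk, and image below are illustrative assumptions; check current C4 availability and pricing in your region before relying on them.

```python
# Hedged sketch: create a C4 VM for a CPU-inference trial. All resource
# names and sizes here are illustrative assumptions.
from google.cloud import compute_v1

project, zone = "my-project", "us-central1-a"  # assumed project and zone

instance = compute_v1.Instance(
    name="gpt-oss-c4-eval",
    machine_type=f"zones/{zone}/machineTypes/c4-standard-8",
    disks=[
        compute_v1.AttachedDisk(
            boot=True,
            auto_delete=True,
            initialize_params=compute_v1.AttachedDiskInitializeParams(
                source_image="projects/debian-cloud/global/images/family/debian-12",
                disk_size_gb=200,
                # C4 uses Hyperdisk rather than Persistent Disk
                disk_type=f"zones/{zone}/diskTypes/hyperdisk-balanced",
            ),
        )
    ],
    network_interfaces=[compute_v1.NetworkInterface(network="global/networks/default")],
)

operation = compute_v1.InstancesClient().insert(
    project=project, zone=zone, instance_resource=instance
)
operation.result()  # block until the VM is created
```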
