WhistleBuzz – Smart News on AI, Business, Politics & Global Trends
AI

Executing AI models is turning into a memory game

By Editor-In-Chief · February 17, 2026 · 3 Mins Read


When people talk about the cost of AI infrastructure, the focus is usually on Nvidia and GPUs, but memory is becoming an increasingly important part of the picture. DRAM chip prices have jumped roughly sevenfold over the past year as hyperscalers prepare to build billions of dollars’ worth of new data centers.

At the same time, a discipline of memory orchestration is emerging: getting the right data to the right agent at the right time. Companies that master it will be able to serve the same queries with fewer tokens, which could be the difference between staying in business and going out of business.

Semiconductor analyst Doug O’Loughlin speaks with Weka’s chief AI officer, Val Bercovici, in an interesting look at the importance of memory chips on his Substack. Both are chip specialists, so their focus is on silicon rather than broader architectures, but the implications for AI software are just as important.

I was especially struck by Bercovici’s discussion of the growing complexity of Anthropic’s prompt cache documentation:

You can see this by visiting Anthropic’s prompt caching pricing page. It started as a very simple page six or seven months ago, right when Claude Code was launching. They just said, “It’s cheaper if you use the cache.” It’s now an encyclopedia of advice on exactly how many cache writes to buy in advance. There’s a 5-minute window or a 1-hour window, which is very common across the industry, and no more. That’s a really important point. And of course, you have all sorts of arbitrage opportunities around the pricing of cache reads based on the number of cache writes you’ve purchased upfront.

The question here is how long Claude keeps a prompt in cached memory. You can pay for a 5-minute window, or pay more for a 1-hour window. Reading data that is still in the cache is much cheaper, so careful data management can save a lot of money. But there is a catch: every time you add new data to a query, something else may be pushed out of the cache window.
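To make the economics concrete, here is a toy cost model in Python. The multipliers are illustrative placeholders loosely modeled on the write-surcharge/read-discount structure the quote describes; they are not Anthropic’s actual rates:

```python
# Toy sketch of prompt-cache economics. The multipliers below are
# illustrative placeholders, not Anthropic's actual prices.
BASE = 1.00         # cost per input token, normalized
WRITE_5MIN = 1.25   # surcharge to write the 5-minute cache (assumed)
WRITE_1HR = 2.00    # higher surcharge for the 1-hour window (assumed)
READ = 0.10         # cost per token read back from cache (assumed)

def cached_cost(prompt_tokens: int, calls: int, write_mult: float) -> float:
    """Cost of `calls` queries sharing one cached prompt prefix: the first
    call pays the cache write, the rest pay only cache reads."""
    return prompt_tokens * write_mult + (calls - 1) * prompt_tokens * READ

def uncached_cost(prompt_tokens: int, calls: int) -> float:
    """Cost when every call resends the full prompt at the base rate."""
    return prompt_tokens * BASE * calls

# A 10,000-token system prompt reused across 50 calls inside the window:
with_cache = cached_cost(10_000, 50, WRITE_5MIN)
without = uncached_cost(10_000, 50)
print(f"cached: {with_cache:,.0f}  uncached: {without:,.0f}  "
      f"savings: {1 - with_cache / without:.0%}")
```

Under these assumed rates the cached path costs 61,500 token-units versus 500,000 without caching: roughly an 88% saving, the kind of arbitrage the quote alludes to.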

This is complex, but the conclusion is very simple. Memory management for AI models will be a big part of the future of AI. Companies that do this well will rise to the top.

And a lot of progress is being made in this new field. Back in October, I covered a startup called Tensormesh that was working on one layer of the stack known as cache optimization.


Opportunities also exist elsewhere in the stack. For example, lower down the stack is how data centers use the different types of memory they have. (The interview includes a nice discussion about when DRAM chips are used instead of HBM, but it’s pretty deep in the hardware weeds.) Higher up the stack, end users are figuring out how to configure their model suites to take advantage of shared cache.
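As a sketch of what a shared prompt cache looks like at the application layer, here is a toy TTL cache keyed by prompt prefix. The class, its API, and the refresh-on-read behavior are all simplifying assumptions for illustration, not any provider’s actual implementation:

```python
class PrefixCache:
    """Toy TTL cache keyed by a prompt prefix (illustrative only)."""

    def __init__(self, ttl_seconds: float):
        self.ttl = ttl_seconds
        self.written_at: dict[str, float] = {}  # prefix -> last touch time

    def lookup(self, prefix: str, now: float) -> bool:
        """Return True on a cache hit. A hit refreshes the window; a miss
        simulates paying for a fresh cache write and starts a new window."""
        ts = self.written_at.get(prefix)
        hit = ts is not None and (now - ts) < self.ttl
        self.written_at[prefix] = now  # reads and writes both touch the entry
        return hit

cache = PrefixCache(ttl_seconds=300)               # 5-minute window
print(cache.lookup("system-prompt-v1", now=0.0))   # False: first call writes
print(cache.lookup("system-prompt-v1", now=120.0)) # True: hit, window refreshed
print(cache.lookup("system-prompt-v1", now=900.0)) # False: 780s since last touch
```

Two model instances configured around the same system-prompt prefix would both hit this cache, which is the point of arranging a model suite to share a common prefix.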

As companies improve their memory orchestration, they use fewer tokens and inference gets cheaper. At the same time, models are becoming more efficient at processing each token, lowering costs further. And as serving costs fall, many applications that currently look unfeasible will gradually become profitable.
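A back-of-the-envelope calculation shows how the two effects compound; both percentages below are invented for illustration:

```python
# Hypothetical inputs: orchestration trims 40% of tokens, and per-token
# inference prices fall by half. The two savings multiply.
tokens_saved = 0.40
price_drop = 0.50

relative_cost = (1 - tokens_saved) * (1 - price_drop)
print(f"new cost as a share of the old: {relative_cost:.0%}")  # prints "30%"
```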


