Alphabet's Google has unveiled TurboQuant, its KV cache quantization and compression technology, promising dramatic reductions in ...
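For context, KV cache quantization in general stores each attention layer's key/value tensors at reduced precision to shrink their memory footprint. The sketch below is a minimal per-channel int8 round-trip in Python; it illustrates the general technique only and is not TurboQuant's actual algorithm (the symmetric scaling scheme and tensor shapes are assumptions for illustration).

```python
import numpy as np

def quantize_kv(x: np.ndarray, axis: int = -1):
    """Symmetric per-channel int8 quantization of a K or V tensor.

    Illustrative only -- not TurboQuant. Returns int8 codes plus the
    float scales needed to dequantize.
    """
    scale = np.abs(x).max(axis=axis, keepdims=True) / 127.0
    scale = np.where(scale == 0, 1.0, scale)           # avoid divide-by-zero
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize_kv(q: np.ndarray, scale: np.ndarray) -> np.ndarray:
    return q.astype(np.float32) * scale

# Round-trip a toy KV block shaped [heads, seq_len, head_dim].
kv = np.random.randn(8, 1024, 128).astype(np.float32)
q, s = quantize_kv(kv)
err = np.abs(dequantize_kv(q, s) - kv).max()
print(f"int8 KV cache: 4x smaller than fp32, max abs error {err:.4f}")
```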
Batch size has a significant impact on both latency and cost in AI model training and inference. Estimating inference time ...
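As a back-of-the-envelope illustration of why batch size matters (all hardware numbers below are assumptions, not measurements), decode latency at small batches is dominated by reading the model weights from memory, so batching amortizes that cost until compute becomes the bottleneck:

```python
# Toy roofline model of decode latency vs. batch size.
# Every constant here is an illustrative assumption.
WEIGHT_BYTES = 14e9          # ~7B params at fp16
MEM_BW = 2.0e12              # bytes/s of accelerator memory bandwidth
FLOPS_PER_TOKEN = 14e9       # ~2 * params per decoded token
PEAK_FLOPS = 300e12          # accelerator peak throughput

for batch in (1, 4, 16, 64, 256):
    t_mem = WEIGHT_BYTES / MEM_BW                  # weights read once per step
    t_compute = batch * FLOPS_PER_TOKEN / PEAK_FLOPS
    step = max(t_mem, t_compute)                   # bandwidth- or compute-bound
    print(f"batch {batch:>3}: {step*1e3:6.2f} ms/step, "
          f"{step/batch*1e3:6.3f} ms/token")
```

In this toy model, per-token latency falls almost linearly with batch size until roughly batch 150, where the step becomes compute-bound and larger batches only add latency.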
Within 24 hours of the release, community members began porting the algorithm to popular local AI libraries like MLX for ...
A technical paper titled “HMComp: Extending Near-Memory Capacity using Compression in Hybrid Memory” was published by researchers at Chalmers University of Technology and ZeroPoint Technologies.
A new technical paper titled “Accelerating LLM Inference via Dynamic KV Cache Placement in Heterogeneous Memory System” was published by researchers at Rensselaer Polytechnic Institute and IBM. “Large ...
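The general idea behind heterogeneous KV cache placement is to keep frequently attended tokens' K/V blocks in fast memory (e.g., HBM) and spill colder blocks to slower, larger memory (e.g., CPU DRAM). The sketch below is an assumed, simplified hotness-based policy for two tiers, not the RPI/IBM paper's method; class and tier names are hypothetical.

```python
from collections import defaultdict

class KVPlacer:
    """Toy hotness-based placement of KV cache blocks across two tiers.

    A simplified illustration of heterogeneous placement, not the
    paper's algorithm. 'fast' might be HBM, 'slow' CPU DRAM.
    """
    def __init__(self, fast_capacity_blocks: int):
        self.fast_capacity = fast_capacity_blocks
        self.hotness = defaultdict(int)   # block_id -> access count
        self.fast, self.slow = set(), set()

    def access(self, block_id: int):
        self.hotness[block_id] += 1
        if block_id not in self.fast:     # promote on access
            self.slow.discard(block_id)
            self.fast.add(block_id)
            self._evict_if_needed()

    def _evict_if_needed(self):
        while len(self.fast) > self.fast_capacity:
            coldest = min(self.fast, key=self.hotness.__getitem__)
            self.fast.remove(coldest)     # demote to the slow tier
            self.slow.add(coldest)

placer = KVPlacer(fast_capacity_blocks=2)
for blk in [0, 1, 0, 2, 0, 3, 0, 1]:
    placer.access(blk)
print("fast tier:", sorted(placer.fast), "slow tier:", sorted(placer.slow))
```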
Large-scale applications such as generative AI, recommendation systems, big data, and HPC require both large-capacity and high-speed memory, and they are changing the power-law locality which ...
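Power-law locality refers to access patterns in which a small fraction of addresses receives most of the references, roughly following a Zipf distribution. The snippet below (skew parameter and sizes are illustrative assumptions) simulates such a pattern and shows why a small cache can still capture most accesses:

```python
import numpy as np

rng = np.random.default_rng(0)
N_ADDRESSES = 100_000
ZIPF_S = 1.2                      # skew parameter; assumed for illustration

# Draw Zipf-distributed address ranks, capped to the address space.
accesses = np.minimum(rng.zipf(ZIPF_S, size=1_000_000), N_ADDRESSES) - 1

# Hit rate of an idealized cache holding the top-k hottest addresses.
counts = np.bincount(accesses, minlength=N_ADDRESSES)
sorted_counts = np.sort(counts)[::-1]
for frac in (0.001, 0.01, 0.1):
    k = int(N_ADDRESSES * frac)
    hit = sorted_counts[:k].sum() / accesses.size
    print(f"cache holding top {frac:.1%} of addresses -> hit rate {hit:.1%}")
```

If workloads flatten this distribution, as the line above suggests, the hit rate of a small fast tier drops and capacity pressure grows.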
What happens when cache doubles across all cores? A desktop processor design focuses on reducing memory bottlenecks in ...
Unveiled at Google’s annual Next event, the pair showcased Managed Lustre as a shared cache layer across inference ...