Data Tech 2026 has ended

Friday May 15, 2026 10:15am - 10:45am CDT
Serving large language models efficiently in production is a hard problem. Traditional inference engines waste up to 80% of GPU memory through KV cache fragmentation and over-allocation, leading to poor throughput and high costs. This talk dives into PagedAttention, a key innovation that addresses this waste by borrowing virtual memory paging from operating systems, and into vLLM, the open-source engine built on top of it.
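
To make the paging analogy concrete, here is a minimal toy sketch of the idea: each sequence keeps a block table that maps logical KV-cache positions to fixed-size physical blocks, and blocks are allocated only as tokens arrive. This is an illustration, not vLLM's actual implementation; the names `BlockAllocator`, `Sequence`, and `BLOCK_SIZE`, and the block size of 16, are assumptions made for the example.

```python
from dataclasses import dataclass, field

BLOCK_SIZE = 16  # tokens per KV block (illustrative value, an assumption)


class BlockAllocator:
    """Hypothetical pool of fixed-size physical KV-cache blocks."""

    def __init__(self, num_blocks: int) -> None:
        self.free = list(range(num_blocks))

    def allocate(self) -> int:
        if not self.free:
            raise MemoryError("no free KV blocks")
        return self.free.pop()

    def release(self, block_ids: list[int]) -> None:
        # Freed blocks return to the pool and can serve any other sequence.
        self.free.extend(block_ids)


@dataclass
class Sequence:
    """One request; its block table plays the role of a page table."""

    num_tokens: int = 0
    block_table: list[int] = field(default_factory=list)

    def append_token(self, allocator: BlockAllocator) -> None:
        # Grab a new physical block only when the current one is full,
        # instead of reserving space for the maximum length up front.
        if self.num_tokens % BLOCK_SIZE == 0:
            self.block_table.append(allocator.allocate())
        self.num_tokens += 1

    def locate(self, token_index: int) -> tuple[int, int]:
        # Translate a logical token position to (physical block, offset).
        if not 0 <= token_index < self.num_tokens:
            raise IndexError(token_index)
        return self.block_table[token_index // BLOCK_SIZE], token_index % BLOCK_SIZE


allocator = BlockAllocator(num_blocks=8)
seq = Sequence()
for _ in range(40):
    seq.append_token(allocator)
print(len(seq.block_table), "blocks in use for", seq.num_tokens, "tokens")
```

In this sketch, unused space is limited to the unfilled tail of a sequence's last block, and the physical blocks do not need to be contiguous, which is how paging avoids fragmentation.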

This talk will cover the theory, walk through using vLLM in practice, and look at benchmark results showing up to 24× throughput improvements. We'll close with a look at how vLLM has been rapidly adopted across the industry and why PagedAttention has become a foundational primitive in LLM serving.
Speakers
Sona Maniyan, MS

Staff Engineer - AI/ML, Thrivent

(d) Nokomis Best Buy HQ, 7700 Knox Ave S, Richfield, MN 55423
