Name: vLLM - LLM serving with PagedAttention
Start: 2026-05-15T10:15:00-0500
End: 2026-05-15T10:45:00-0500

Friday May 15, 2026 10:15am - 10:45am CDT

Serving large language models efficiently in production is a hard problem. Traditional inference engines waste up to 80% of GPU memory through KV cache fragmentation and over-allocation, leading to poor throughput and high costs. This talk dives into PagedAttention, a key innovation that tackles this waste by borrowing virtual memory paging from operating systems, and into vLLM, the open-source engine built on top of it.

This talk will cover the theory, walk through using vLLM in practice, and review benchmark results showing up to 24× throughput improvements. We'll close with how vLLM has been rapidly adopted across the industry and why PagedAttention has become a foundational primitive in LLM serving.
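
To make the paging idea concrete, here is a minimal sketch of a block-table allocator in Python. It is not vLLM's implementation: the class name `BlockAllocator`, its methods, and the block size of 16 tokens are all assumptions chosen for illustration. It shows only the core idea, that KV cache memory is handed out in small fixed-size blocks on demand rather than reserved up front for the maximum sequence length.

```python
BLOCK_SIZE = 16  # tokens per KV block (illustrative value, an assumption)


class BlockAllocator:
    """Toy model of paged KV cache allocation (hypothetical, not vLLM code)."""

    def __init__(self, num_blocks):
        self.free = list(range(num_blocks))  # ids of free physical blocks
        self.tables = {}   # seq_id -> list of physical block ids (the "page table")
        self.lengths = {}  # seq_id -> number of tokens stored so far

    def append_token(self, seq_id):
        """Reserve room for one more token, taking a new block only when needed."""
        n = self.lengths.get(seq_id, 0)
        table = self.tables.setdefault(seq_id, [])
        if n % BLOCK_SIZE == 0:  # no block yet, or the last block is full
            if not self.free:
                raise MemoryError("no free KV blocks")
            table.append(self.free.pop())
        self.lengths[seq_id] = n + 1

    def release(self, seq_id):
        """Return all blocks of a finished sequence to the free pool."""
        self.free.extend(self.tables.pop(seq_id, []))
        self.lengths.pop(seq_id, None)

    def wasted_slots(self, seq_id):
        """Token slots reserved but unused; always less than BLOCK_SIZE."""
        return len(self.tables[seq_id]) * BLOCK_SIZE - self.lengths[seq_id]


alloc = BlockAllocator(num_blocks=8)
for _ in range(20):
    alloc.append_token("request-a")
print(alloc.tables["request-a"], alloc.wasted_slots("request-a"))
```

In this toy model a 20-token sequence holds two blocks and leaves 12 slots unused, whereas reserving a contiguous region for a hypothetical 2,048-token maximum would leave 2,028 slots idle. Because blocks need not be contiguous, freed blocks can be reused by any other sequence.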

Speakers

Sona Maniyan, MS

Staff Engineer - AI/ML, Thrivent

(d) Nokomis Best Buy HQ, 7700 Knox Ave S, Richfield, MN 55423

5 - Technical

Data Tech 2026
