Sunday, November 12, 2023

Groq scales AI inference processing

Groq, an AI start-up based in Mountain View, California, offering a Tensor Streaming Processor (TSP), announced a new performance bar of more than 300 tokens per second per user on Meta AI's Llama 2 70B LLM.

The benchmark was set using Groq's Language Processing Unit (LPU) system.

Jonathan Ross, CEO and founder of Groq commented, "When running LLMs, you can't accurately generate the 100th token until you've generated the 99th. An LPU™ system is built for the sequential and compute-intensive nature of GenAI language processing. Simply throwing more GPUs at LLMs doesn't solve for incumbent latency and scale-related issues. Groq enables the next level of AI."
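The sequential dependency Ross describes can be illustrated with a toy autoregressive loop (a minimal sketch, not Groq's or Meta's implementation; `next_token` is a hypothetical stand-in for a model's forward pass): each new token is computed from all previous tokens, so the steps for a single user cannot run in parallel, and per-user throughput is bounded by per-step latency.

```python
import time

def next_token(prev_tokens):
    # Hypothetical stand-in for an LLM forward pass: the next token
    # depends on every previously generated token, so token N cannot
    # be produced before token N-1 exists.
    return (sum(prev_tokens) + 1) % 50000

def generate(prompt_tokens, n_new):
    tokens = list(prompt_tokens)
    start = time.perf_counter()
    for _ in range(n_new):
        # Step N consumes the output of step N-1: inherently sequential.
        tokens.append(next_token(tokens))
    elapsed = time.perf_counter() - start
    # Tokens per second for this one user, the metric Groq reports.
    return tokens, n_new / elapsed

tokens, tokens_per_second = generate([1, 2, 3], 100)
```

Adding more parallel hardware speeds up the work inside each step, but the loop itself stays serial, which is why raw latency per token, not aggregate throughput, governs the per-user number.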

https://groq.com

  • Prior to founding Groq, Ross began what became Google's Tensor Processing Unit (TPU) as a 20% project, designing and implementing the core elements of the first-generation TPU chip.