MOUNTAIN VIEW, Calif., Aug. 8, 2023 /PRNewswire/ — Groq, an artificial intelligence (AI) solutions provider, today announced it now runs the Large Language Model (LLM) Llama-2 70B at more than 100 tokens per second (T/s) per user on a Groq LPU™, the newly defined category for Groq silicon architecture.

Daniel Newman, Principal Analyst and Founding Partner at The Futurum Group, commented, “While the AI gold rush has driven increased demand and long lead times from incumbent silicon providers, there is a rapidly growing market for alternative solutions. Groq running Llama-2 70B at more than 100 tokens per second demonstrates advantages in power, performance, and ease-of-use. Furthermore, with immediately available supply, Groq has a viable alternative for scaled LLM inference.”

Groq is compiling and deploying new LLMs in a matter of days using its kernel-less compiler and generating the fastest user experience for generated language responses at over 100 T/s on Groq Language Processing Unit™ systems. To contextualize this level of performance, a user could write this entire press release in roughly seven seconds or generate a 4,000-word essay in just over a minute. This ultra-low latency, real-time performance comes with improved performance per watt when compared to graphics processor-based systems.
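As a back-of-the-envelope check of those figures, the cited generation times follow directly from the throughput. The sketch below assumes roughly 1.3 tokens per English word, a common approximation; actual ratios vary by model and tokenizer, and the word counts used here are illustrative, not from the press release.

```python
# Back-of-the-envelope check of the 100 tokens/second/user claim.
# Assumption: ~1.3 tokens per English word (rough heuristic; the real
# ratio depends on the tokenizer and the text).
TOKENS_PER_WORD = 1.3
THROUGHPUT_TPS = 100  # tokens per second per user, as cited

def seconds_to_generate(word_count: int) -> float:
    """Estimated seconds to generate `word_count` words at the cited rate."""
    return word_count * TOKENS_PER_WORD / THROUGHPUT_TPS

# A press release of ~550 words (illustrative length):
print(seconds_to_generate(550))   # ≈ 7 seconds
# A 4,000-word essay:
print(seconds_to_generate(4000))  # ≈ 52 seconds, i.e. about a minute
```

Under these assumptions the numbers line up with the claims in the release to within the precision of the tokens-per-word heuristic.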

Jonathan Ross, CEO and founder of Groq, commented, “I’m really proud of our team for reaching this significant milestone for LLMs! Groq is the first company to run Llama-2 70B at more than 100 tokens per second per user, not just among the AI start-ups, but among incumbent providers as well! And there’s more performance on our roadmap using current hardware, meaning the future of Groq AI performance for customers is real-time insights and interactions.”

GroqLabs, the platform where Groq hosts product demos and reference designs, now showcases Meta AI’s Llama-2 70B LLM for customers to see. Previously, GroqLabs has successfully demonstrated several other open-source models, such as Llama 13B and 65B and Vicuna 13B and 33B, on scaled Groq Language Processing Unit systems of up to eight GroqRack™ compute clusters (over 500 GroqChip™ processors on 14nm silicon computing in unison). As mentioned in its previous press release, Groq has accelerated production for deploying models at scale without long and arduous development delays, saving customers thousands of production hours and millions of dollars.

The next wave of generative AI solutions will be language-based, with language meaning not just words, but other pattern recognition and prediction as well. For Enterprise and Government organizations, this means LLMs will serve beyond the common use cases of chat-bots or document analysis. Upcoming breakthrough models will accelerate and impact life sciences, financial services, digital media, content authoring, programming, and more, ultimately connecting humanity in ways not yet imagined.

Mark Heaps, VP of Brand and Creative at Groq, commented, “I remember how the internet of the ’90s was novel for a moment, but slow loading speeds quickly exhausted users. Today no one would tolerate that old ‘dial-up’ experience. Soon, no one will tolerate interaction with their data or device being anything less than real-time, so this is where increasing AI performance matters. Groq is changing the rules of the game.”

If you’d like to join for the first public sneak peek of Llama-2 70B running at 100T/s on Groq, register for GroqSpotlight, a virtual event airing today, August 8th, at 11:30am PDT. If you’d like to schedule an exclusive one-on-one demo, reach out to

About Groq
Groq is an AI solutions company delivering ultra-low latency inference with the first ever Language Processing Unit™. For more information, visit

Groq, the Groq logo, and other Groq marks are trademarks of Groq, Inc. Other names and brands may be claimed as the property of others. Reference to specific trade names, trademarks or otherwise, does not necessarily constitute or imply its endorsement or recommendation by Groq.

Copyright © 2023 Groq Inc. All rights reserved.

Groq, an AI solutions company, is first to reach more than 100 tokens per second per user running Meta AI's Llama-2 70B LLM on a Groq LPU™ system.

Groq logo (PRNewsfoto/Groq)

View original content to download multimedia: