Google's Gemma 4 AI models get 3x speed boost by predicting future tokens

Ars Technica ·

Google's Gemma 4 AI models get 3x speed boost by predicting future tokens

Google launched its Gemma 4 open models this spring, promising a new level of power and performance for local AI. Google’s take on edge AI could be getting even faster already with the release of …

Google launched its Gemma 4 open models this spring, promising a new level of power and performance for local AI. Google’s take on edge AI could be getting even faster already with the release of Multi-Token Prediction (MTP) drafters for Gemma. Google says these experimental models leverage a form of speculative decoding to take a guess at future tokens, which can speed up generation compared to the way models generate tokens on their own. …

Original source: Ars Technica