Google Launches Gemma 3 QAT Model, Runnable on Consumer-Grade Graphics Cards
Last month, Google released Gemma 3, the latest generation of its open models. It delivers strong performance while running on a single high-end accelerator such as the NVIDIA H100 at native BF16 precision. To make Gemma 3 more accessible, Google has announced a new version optimized with Quantization-Aware Training (QAT). QAT significantly reduces video memory (VRAM) requirements while preserving quality, so that a large model like Gemma 3 27B can run locally on a consumer-grade graphics card such as the NVIDIA RTX 3090. Rather than quantizing only after the model is fully trained, QAT builds quantization into the training process itself, which significantly reduces the quality loss that quantization would otherwise cause. The VRAM footprint of Gemma 3 27B drops from 54 GB (BF16) to just 14.1 GB (int4) while still maintaining high-quality results.
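The memory figures above follow from simple arithmetic, and the core idea of QAT can be illustrated with a "fake quantization" round trip. The following is a minimal sketch and not Google's actual training recipe, which the announcement summarized here does not describe. The function names `weight_memory_gb` and `fake_quantize_int4` are hypothetical, and the symmetric per-tensor int4 scheme is an assumption chosen for simplicity.

```python
from typing import List


def weight_memory_gb(num_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GB (10^9 bytes), ignoring any overhead."""
    return num_params * bits_per_weight / 8 / 1e9


def fake_quantize_int4(weights: List[float]) -> List[float]:
    """Snap weights to a symmetric int4 grid, then map them back to floats.

    In a QAT-style setup the forward pass would use these rounded values,
    so training can adapt the weights to the rounding error. Assumption:
    one scale per tensor; real schemes often use finer-grained scales.
    """
    max_abs = max(abs(w) for w in weights)
    if max_abs == 0.0:
        return list(weights)
    scale = max_abs / 7  # int4 holds integers -8..7; 7 is the largest positive level
    quantized = [max(-8, min(7, round(w / scale))) for w in weights]
    return [q * scale for q in quantized]


if __name__ == "__main__":
    params = 27e9  # Gemma 3 27B, treated as exactly 27 billion weights
    print("BF16 weights (GB):", weight_memory_gb(params, 16))
    print("int4 weights (GB):", weight_memory_gb(params, 4))
    print("fake-quantized:", fake_quantize_int4([0.7, -0.35, 0.1, 0.0]))
```

At 16 bits per weight the estimate matches the 54 GB quoted for BF16. At 4 bits it gives 13.5 GB, slightly below the reported 14.1 GB; the difference is presumably overhead such as quantization scales or tensors kept at higher precision, though the source does not say.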
Source: Google Blog
via Windvane Reference Express - Telegram Channel