🔥 Hot Repo: Google's Edge LLM Runtime Runs Gemma 4 in Under 1.5GB RAM

Google's LiteRT-LM, the inference engine that powers on-device AI in Chrome, Pixel Watch, and Chromebook Plus, just added Gemma 4 support. It runs multimodal, agentic LLMs on-device in under 1.5GB of RAM.

By OMC Editorial on 2026-04-07

Google quietly shipped the inference engine inside your devices. LiteRT-LM, the open-source runtime that powers on-device AI in Chrome, Chromebook Plus, and Pixel Watch, just added support for Gemma 4, Google's newest open model family, released on April 2, 2026. The google-ai-edge/LiteRT-LM repo (https://github.com/google-ai-edge/LiteRT-LM) has 2,327 stars and gained 522 in a single day as the Gemma 4 announcement went wide.

What is LiteRT-LM?

LiteRT-LM is Google's production-ready inference framework for large language models on edge devices. It sits on top of LiteRT (the successor to TensorFlow Lite, trusted by millions of Android developers) and adds GenAI-specific libraries for efficient LLM deployment. The SDK ships stable APIs in Kotlin, Python, and C++, with Swift support underway.

Gemma 4 on a Raspberry Pi

The Gemma 4 E2B (Edge 2B) model runs on a Raspberry Pi 5 using under 1.5GB of RAM, reaching 133 prefill tokens/s and 7.6 decode tokens/s on CPU alone. On hardware-accelerated targets such as the Qualcomm Dragonwing IQ8 NPU, prefill throughput jumps to 3,700 tokens/s and decode reaches 31 tokens/s. The framework achieves this through 2-bit and 4-bit weight quantization combined with memory-mapped per-layer embeddings.

LiteRT-LM supports the full Gemma 4 lineup: the compact E2B and E4B edge variants, plus the 26B mixture-of-experts and 31B dense models for desktop-class hardware.

Not Just Text: Agentic and Multimodal On-Device

Beyond raw inference speed, LiteRT-LM adds first-class function calling support, so Gemma 4 can use tools and execute agentic workflows entirely on-device without a cloud round-trip. It also handles vision and audio inputs natively. Gemma 4 supports 256K context windows, and the E2B variant now runs without a network connection.

Already in Production at Google Scale

This is not a research preview. LiteRT-LM powers real features in shipped Google products:

- Chrome: on-device AI for browser tasks
- Chromebook Plus: AI-accelerated workl