Last month Google released a compact member of the Gemma 3 large language model (LLM) family with just 270 million parameters (Gemma 3 270M). I’m impressed by its efficiency and its potential for applications such as on-device use.

I tried it a few days after the launch and found the default model’s generated text underwhelming for my prompts, which confirmed the launch blog’s explanation that the model is intended primarily as a base for fine-tuning. Other reports, such as this post, expressed similar impressions. I recently cleaned up my code and published it to this gemma3_270m GitHub repo, and my opinion of the default model’s performance has since improved, as in this example:

Model:        google/gemma-3-270m-it
Device:       mps:0
Precision:    torch.bfloat16
================================================================================
Input prompt: What causes climate change?
Climate change is caused by human activities, primarily the burning of fossil fuels.
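A run like the one above can be reproduced with a short script using the Hugging Face transformers chat pipeline. This is a minimal sketch, not the exact script in my repo; the checkpoint name and bfloat16 precision match the header above, while `max_new_tokens=64` is an assumption on my part.

```python
def build_messages(prompt: str) -> list[dict]:
    """Wrap a user prompt in the chat format the instruction-tuned model expects."""
    return [{"role": "user", "content": prompt}]

def generate(prompt: str) -> str:
    # Imports are kept inside the function so the helper above can be used
    # without torch/transformers installed.
    import torch
    from transformers import pipeline

    pipe = pipeline(
        "text-generation",
        model="google/gemma-3-270m-it",
        torch_dtype=torch.bfloat16,  # matches the precision shown above
    )
    # With chat-format input, the pipeline returns the full conversation;
    # the last message is the model's reply.
    out = pipe(build_messages(prompt), max_new_tokens=64)
    return out[0]["generated_text"][-1]["content"]

if __name__ == "__main__":
    print(generate("What causes climate change?"))
```

On Apple Silicon, transformers places the model on `mps` automatically when available, which is where the `Device: mps:0` line in the header comes from.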

I haven’t yet identified a fine-tuning task for Gemma 3 270M, but I look forward to working on that, especially now that Google has released the EmbeddingGemma model, which makes possible a direct comparison between fine-tuning and retrieval-augmented generation for a specific task targeted at local on-device implementation.

Separately, it’s been good to catch up on audio technology. “Neural codecs” are encoder–decoder models that convert audio into compact numeric representations (e.g., quantized latent tokens or continuous latents) which can be ingested and produced by large language models (LLMs). By letting an LLM predict sequences of these audio tokens, the model can generate or reconstruct audible sound. For example, Meta (formerly Facebook) described such technology in this 2022 paper and released its “EnCodec” implementation on GitHub, followed by a Hugging Face transformers version in June 2023. I tried it out, published my approach in this encodec GitHub repo, and was impressed by the compact representation and the quality at just 6 kbps.
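To see why 6 kbps is so compact, a back-of-the-envelope comparison against raw PCM helps. The sketch below assumes the figures reported for EnCodec’s 24 kHz model (75 latent frames per second, 1024-entry codebooks, i.e. 10 bits per code) and compares against 16-bit mono PCM at the same sample rate.

```python
import math

SAMPLE_RATE = 24_000             # Hz, EnCodec's 24 kHz model
PCM_BITS = 16                    # bits per sample, 16-bit mono PCM
FRAME_RATE = 75                  # latent frames per second
BITS_PER_CODE = math.log2(1024)  # 1024-entry codebook -> 10 bits per code

# Raw PCM bitrate for uncompressed mono audio.
pcm_bps = SAMPLE_RATE * PCM_BITS

# How many residual codebooks fit into a 6 kbps budget.
codebooks_at_6kbps = 6_000 / (FRAME_RATE * BITS_PER_CODE)

# Overall compression relative to raw PCM.
compression = pcm_bps / 6_000

print(f"raw PCM:          {pcm_bps} bps")
print(f"codebooks @6kbps: {codebooks_at_6kbps:.0f}")
print(f"compression:      {compression:.0f}x")
```

At 6 kbps this works out to 8 codebooks per frame and roughly a 64x reduction versus 16-bit PCM, which matches my impression of how compact the token representation is.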

I’m looking forward to more hands-on work with audio, speech and LLMs!