Liquid AI’s smallest model yet, LFM2.5-230M, beats models 4X its size at data extraction and can run ‘anywhere’

Liquid AI, founded by former MIT computer scientists, today released its smallest AI language model yet, LFM2.5-230M, and businesses would do well to consider it for data extraction and local use on smartphones, laptops and robots.
This is a 230-million-parameter base model designed specifically for on-device agent workflows, and as Liquid notes in its release blog post, that small size lets it run virtually "anywhere." According to Liquid, it also outperforms models more than 4X its size on selected benchmarks, performing significantly better at data extraction than Alibaba’s 800-million-parameter Qwen3.5-0.8B (Yala) and Google’s 1-billion-parameter Gemma 3 1B.
The model is aimed at developers and engineers building lightweight data pipelines and standalone edge systems.
Released under a dual-use commercial license, the model is free for individuals and companies with less than $10 million in annual revenue, while larger companies need a paid enterprise agreement.
The release sets itself apart from other small AI models by using the LFM2 architecture to achieve high inference speed without the large memory overhead typical of larger models.
While big AI companies such as Anthropic, OpenAI, Google, Microsoft, Meta and others push parameter counts into the hundreds of billions or trillions to achieve frontier performance, a parallel race is focused entirely on edge and local use.
Liquid AI’s introduction of LFM2.5-230M marks a significant shift toward architectural efficiency rather than raw scale. By compressing 19 trillion pre-training tokens into a 230-million-parameter model, the company shows that edge devices don’t need massive computing power or persistent cloud connections to execute complex, multi-step workflows.
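To put that training budget in perspective, the arithmetic below divides the reported 19 trillion tokens by 230 million parameters and compares the result with the roughly 20 tokens per parameter popularized by compute-optimal ("Chinchilla") scaling studies. The 20:1 figure is a widely cited rule of thumb, not something Liquid references; the sketch only shows how far past that ratio the model was trained, a common choice for small models meant to be cheap at inference time.

```python
PRETRAIN_TOKENS = 19e12  # pre-training tokens, as reported in the article
PARAMS = 230e6           # parameter count, from the model name

# Rule of thumb from compute-optimal scaling work: ~20 training tokens per parameter.
CHINCHILLA_TOKENS_PER_PARAM = 20

tokens_per_param = PRETRAIN_TOKENS / PARAMS
overtraining_factor = tokens_per_param / CHINCHILLA_TOKENS_PER_PARAM

print(f"{tokens_per_param:,.0f} tokens per parameter")        # → 82,609 tokens per parameter
print(f"~{overtraining_factor:,.0f}x the ~20:1 rule of thumb")  # → ~4,130x the ~20:1 rule of thumb
```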
How LFM2.5-230M works
The LFM2.5-230M model departs from the standard transformer design, relying instead on the LFM2 architecture. This hybrid design combines gated short-range convolutions with grouped-query attention to process information efficiently.
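To make "gated short-range convolution" concrete, here is a minimal pure-Python sketch of one generic variant on a single channel: each output looks back over only a small, fixed window of inputs (a causal convolution), and the result is scaled by a sigmoid gate computed from the current input. This illustrates the general technique only; it is not Liquid AI’s LFM2 operator, and the single-gate form, kernel size, and weights are assumptions chosen for readability.

```python
import math
from typing import List


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def gated_short_conv(x: List[float], kernel: List[float],
                     gate_w: float, gate_b: float) -> List[float]:
    """Causal short convolution over one channel, scaled by an input-dependent gate.

    Illustrative only: a generic gated-convolution pattern, not LFM2's actual block.
    Output t depends only on x[t - k + 1 .. t], where k = len(kernel).
    """
    k = len(kernel)
    out = []
    for t in range(len(x)):
        # kernel[0] weights the current input, kernel[1] the previous one, etc.;
        # positions before the start of the sequence count as zero padding.
        conv = sum(kernel[j] * x[t - j] for j in range(k) if t - j >= 0)
        # Gate in (0, 1), computed from the current input, scales the convolution output.
        gate = sigmoid(gate_w * x[t] + gate_b)
        out.append(gate * conv)
    return out


# An impulse at position 0 with a constant gate of 0.5 (gate_w = gate_b = 0).
print(gated_short_conv([1.0, 0.0, 0.0, 0.0, 0.0], [0.5, 0.3, 0.2], 0.0, 0.0))
```

The impulse affects only the next `len(kernel)` outputs and then vanishes, which is why a convolution layer needs only a fixed-size buffer of recent inputs no matter how long the sequence grows.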
For readers following the evolution of efficient architectures, Liquid’s approach shares a goal common to many efficient designs: handling long context and sequential data efficiently on local hardware without the quadratic costs of pure attention. The model supports a 32K-token context window, enough to ingest large documents or continuous streams of robot telemetry.
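A back-of-the-envelope count shows why the quadratic term matters at this context length. The sketch assumes "32K" means 32,768 tokens and compares the n × n score matrix that a naive attention implementation materializes per head with the fixed window of past inputs a causal short convolution keeps per channel. The kernel size of 3 is an assumption; the article gives no LFM2 layer dimensions.

```python
def attention_score_entries(seq_len: int) -> int:
    # Naive self-attention scores every token against every token: n * n entries per head.
    return seq_len * seq_len


def short_conv_state_entries(kernel_size: int) -> int:
    # A causal convolution with kernel size k only needs the previous k - 1 inputs
    # per channel, no matter how long the sequence is.
    return kernel_size - 1


KERNEL_SIZE = 3  # assumed for illustration; not a published LFM2 value

for n in (1_024, 8_192, 32_768):
    print(f"n={n:>6}: attention scores per head = {attention_score_entries(n):>13,}, "
          f"conv state per channel = {short_conv_state_entries(KERNEL_SIZE)}")
```

Fused attention kernels avoid storing the full matrix, but attention compute still grows quadratically and its KV cache grows linearly with context, while the convolution’s state stays constant.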
The performance charts in the release make the architecture’s efficiency clear. The model keeps its memory footprint under 400MB while delivering prefill speeds that outpace comparable models such as Gemma 3 1B IT and Granite 4.0-H-350M.
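Some simple arithmetic helps interpret the sub-400MB figure. The sketch computes weight storage alone for 230 million parameters at common precisions. At 16-bit precision the weights would already take about 460 MB, which suggests the measured footprint reflects a lower-precision (quantized) build; the article does not say which precision Liquid used, so that reading is an inference.

```python
PARAMS = 230_000_000  # parameter count, from the model name

# Bytes per weight at common storage precisions. Weights only: ignores activations,
# KV cache, runtime overhead, and any quantization metadata such as scales.
BYTES_PER_PARAM = {"fp32": 4.0, "fp16/bf16": 2.0, "int8": 1.0, "4-bit": 0.5}


def weight_megabytes(params: int, bytes_per_param: float) -> float:
    # Decimal megabytes (1 MB = 1,000,000 bytes).
    return params * bytes_per_param / 1_000_000


for name, size in BYTES_PER_PARAM.items():
    print(f"{name:>9}: {weight_megabytes(PARAMS, size):7.1f} MB")
```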
On a Samsung Galaxy S25 Ultra with a Qualcomm Snapdragon Gen4 CPU, the model reaches a decode speed of 213 tokens per second. Even on the more constrained Raspberry Pi 5, it sustains 42 tokens per second. In addition, Liquid’s internal benchmarks show its GPU inference stack delivering lower end-to-end latency than competing small models at every concurrency level.
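The decode rates translate directly into wall-clock time per response. This sketch divides a hypothetical 150-token output (roughly one extracted JSON record; the size is an assumption) by each reported rate. It ignores prefill, model loading, and rate variation, so it underestimates true end-to-end time.

```python
# Decode rates reported in the article, in tokens per second.
DECODE_TOKENS_PER_SEC = {
    "Galaxy S25 Ultra": 213.0,
    "Raspberry Pi 5": 42.0,
}

OUTPUT_TOKENS = 150  # hypothetical size of one extracted JSON record


def decode_seconds(output_tokens: int, tokens_per_sec: float) -> float:
    # Time to generate output_tokens at a steady decode rate.
    # Ignores prefill (prompt processing), model load time, and rate variation.
    return output_tokens / tokens_per_sec


for device, rate in DECODE_TOKENS_PER_SEC.items():
    print(f"{device}: ~{decode_seconds(OUTPUT_TOKENS, rate):.2f} s per record")
```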
Why it matters to businesses
To understand why a 230 million parameter model is needed, one must look at how businesses currently manage data.
Organizations have traditionally relied on rigid, rule-based Extract, Transform, Load (ETL) scripts to move and process data. However, these legacy systems are notoriously brittle: a simple change in document structure or a schema update can break the entire pipeline.
To solve this, the industry is moving toward "AI ETL," in which machine learning handles the field mapping, detects schema drift, and adapts to new versions automatically. In a modern lightweight data extraction pipeline, an AI model connects to unstructured sources such as PDFs, emails, or web forms and organizes the data into JSON-like formats without requiring hard-coded rules.
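The extraction step can be sketched as follows: prompt a model for a JSON object, parse it, and compare its keys with an expected schema so that drift is reported rather than silently breaking the pipeline. This is a minimal sketch under stated assumptions: `call_model` is a hypothetical stand-in that returns a canned reply (no real Liquid AI or inference-library API is used), and the invoice field names are invented for illustration. It covers parsing and drift detection only, not automatic adaptation.

```python
import json
from typing import Any, Dict, Set

# Hypothetical target schema for an invoice record.
EXPECTED_FIELDS: Set[str] = {"invoice_id", "vendor", "total", "currency"}


def call_model(prompt: str) -> str:
    """Hypothetical stand-in for a local model call; ignores the prompt and returns canned JSON."""
    return ('{"invoice_id": "INV-001", "vendor": "Acme", "total": 1200.5, '
            '"currency": "EUR", "po_number": "PO-77"}')


def build_prompt(document: str) -> str:
    fields = ", ".join(sorted(EXPECTED_FIELDS))
    return f"Extract these fields as a JSON object ({fields}) from:\n{document}"


def extract(document: str) -> Dict[str, Any]:
    raw = call_model(build_prompt(document))
    record = json.loads(raw)  # raises json.JSONDecodeError on malformed output
    if not isinstance(record, dict):
        raise ValueError("model output is not a JSON object")
    keys = set(record)
    # Report schema drift (missing or unexpected fields) instead of failing silently.
    return {
        "record": {k: record[k] for k in EXPECTED_FIELDS if k in record},
        "missing": sorted(EXPECTED_FIELDS - keys),
        "unexpected": sorted(keys - EXPECTED_FIELDS),
    }


result = extract("Invoice INV-001 from Acme, total EUR 1,200.50, PO-77")
print(result["missing"], result["unexpected"])  # → [] ['po_number']
```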
For businesses, using a large flagship model like Claude Opus 4.6 (which costs $5.00 per million input tokens) to parse routine invoices, format addresses, or route telemetry data is not economically viable.
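How quickly per-token pricing adds up depends on volume. The sketch below uses the $5.00-per-million input price quoted above with a hypothetical workload of 100,000 invoices a month at about 1,500 input tokens each. Both volume figures are assumptions, and output tokens, which are billed separately, are left out.

```python
INPUT_PRICE_PER_MILLION_USD = 5.00  # Claude Opus 4.6 input price cited in the article


def monthly_input_cost(docs_per_month: int, tokens_per_doc: int,
                       price_per_million: float = INPUT_PRICE_PER_MILLION_USD) -> float:
    # Input tokens only; output tokens are billed separately and not included here.
    total_tokens = docs_per_month * tokens_per_doc
    return total_tokens / 1_000_000 * price_per_million


# Hypothetical workload: 100,000 invoices a month at ~1,500 input tokens each.
print(f"${monthly_input_cost(100_000, 1_500):,.2f} per month")  # → $750.00 per month
```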
This is where models like the LFM2.5-230M become critical. Designed expressly as a lightweight extraction engine, it allows companies to automate repetitive formatting and data analysis at a fraction of the computing cost and latency, running directly on on-premise hardware rather than relying on expensive, persistent cloud API calls.
Small Model Benchmarks: LFM vs. the 3B Class
The AI industry in mid-2026 is seeing a renaissance of "small" models, but definitions of "small" vary.
Recently, the open-source community was surprised by Weibo’s VibeThinker-3B, a 3-billion-parameter model built on a Qwen2-style backbone that achieved a remarkable 94.3 on the AIME 2026 math benchmark, rivaling 600-billion-parameter models through reinforcement learning.
Similarly, Google’s Gemma 4 family, which recently surpassed 200 million downloads, pushes frontier AI to the edge with models such as E2B (two billion parameters), designed specifically for mobile and IoT deployments.
In contrast, Liquid AI’s LFM2.5-230M operates in a completely different weight class. At only 230 million parameters, it is roughly one-tenth the size of Google’s smallest Gemma 4 model and VibeThinker-3B.
Because of its extremely small footprint, LFM2.5-230M is not designed for demanding workloads such as advanced math, coding, or creative writing, a limitation Liquid AI openly acknowledges.
However, in its target areas of data extraction and tool calling, the model punches above its weight class.
Benchmarks released by Liquid AI show LFM2.5-230M scoring 43.26 on the BFCLv3 function-calling benchmark, ahead of IBM’s Granite 4.0-350M (39.58) and well ahead of larger 1-billion-parameter models like Google’s Gemma 3 1B IT.
On the CaseReportBench data-extraction benchmark, it scores 22.51, finishing ahead of Qwen3.5-0.8B (Yala).
LFM2.5-230M makes the case that while 3-billion-parameter models like VibeThinker solve advanced equations, a 230-million-parameter model is the better-optimized choice for making structured tool calls and maintaining efficient agent pipelines on constrained hardware.
Advanced robotics applications
Because it excels at tool calling, LFM2.5-230M is positioned primarily as a skill-selection layer. Liquid AI demonstrated this capability on a Unitree G1 humanoid robot.
Running entirely on-device on the robot’s NVIDIA Jetson Orin compute module, the model successfully processes complex natural-language commands.
As noted on the company’s tech blog, the model takes free-form instructions such as *"Hold still for 2 seconds, then walk forward at 1 meter per second for 3 meters, hold the forward knee of one leg for 5 seconds, then walk backward at 0.5 meters per second for 3 meters,"* and automatically translates them into a structured multi-step program that calls the pre-trained low-level skills provided by NVIDIA’s SONIC framework.
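The article does not publish the plan format or SONIC’s skill interface, so the following is a hypothetical sketch of the pattern. It assumes the model emits a JSON list of steps, each naming a skill with arguments, and that the host program checks every step against an allow-list of skills and required arguments before anything runs. The skill names (`hold_still`, `walk`, `knee_lift`) and their parameters are invented for illustration.

```python
import json
from typing import Dict, List, Set

# Hypothetical skill registry: skill name -> required argument names.
SKILLS: Dict[str, Set[str]] = {
    "hold_still": {"seconds"},
    "walk": {"direction", "speed_mps", "distance_m"},
    "knee_lift": {"seconds"},
}


def validate_plan(raw: str) -> List[dict]:
    """Parse a model-produced plan; reject unknown skills or missing arguments."""
    steps = json.loads(raw)
    if not isinstance(steps, list):
        raise ValueError("plan must be a JSON list of steps")
    for i, step in enumerate(steps):
        if not isinstance(step, dict):
            raise ValueError(f"step {i}: not a JSON object")
        name = step.get("skill")
        if name not in SKILLS:
            raise ValueError(f"step {i}: unknown skill {name!r}")
        missing = SKILLS[name] - set(step.get("args", {}))
        if missing:
            raise ValueError(f"step {i}: missing args {sorted(missing)}")
    return steps


def estimated_duration(steps: List[dict]) -> float:
    # Timed skills add their seconds; walks add distance / speed.
    total = 0.0
    for step in steps:
        args = step["args"]
        if step["skill"] == "walk":
            total += args["distance_m"] / args["speed_mps"]
        else:
            total += args["seconds"]
    return total


# A plan the model might emit for the command quoted above (hypothetical format).
plan_json = json.dumps([
    {"skill": "hold_still", "args": {"seconds": 2}},
    {"skill": "walk", "args": {"direction": "forward", "speed_mps": 1.0, "distance_m": 3}},
    {"skill": "knee_lift", "args": {"seconds": 5}},
    {"skill": "walk", "args": {"direction": "backward", "speed_mps": 0.5, "distance_m": 3}},
])

plan = validate_plan(plan_json)
print(len(plan), estimated_duration(plan))  # → 4 16.0
```

Validating against an allow-list keeps a small model’s mistakes, such as a misnamed skill or a dropped parameter, from ever reaching the robot’s controllers.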
Base and post-trained models are available now on Hugging Face, with day-one native support across the llama.cpp ecosystem (GGUF), MLX, vLLM, SGLang, and ONNX.
Dual-use, "open" custom LFM license
Liquid AI ships LFM2.5-230M under the LFM Open License v1.0. Despite the word "Open" in its name, this is not an Open Source Initiative (OSI)-approved license; it serves as a limited, dual-use framework.
For independent developers, researchers, and early-stage startups, the license works much like a standard open source license.
Users receive a perpetual, worldwide, royalty-free license to reproduce, modify, and distribute the model, provided they retain the original copyright notices and prominently identify any changes.
However, the license includes a strict "Limitation on Commercial Use." Any legal entity generating $10 million or more in annual revenue forfeits the right to use the model commercially under this agreement.
Large businesses that cross this financial threshold must negotiate a separate, paid commercial agreement with Liquid AI to deploy the model in production.
This strategy keeps the company’s intellectual property from being absorbed for free by large technology conglomerates, while still getting the model into the hands of engineers at the grassroots level.



