New agent memory framework uses 118K tokens per query. LangMem burns 3.26M.

0 0 5 minutes read

Long-horizon reasoning exposes a key weakness in AI agents: context windows fill up quickly, and retrieval pipelines return noise instead of signal.

To address this, researchers at the National University of Singapore developed MRAgent, a framework that abandons the static "retrieve-then-reason" approach. Instead, it lets the agent reconstruct its memory step by step, based on the evidence it collects.

This multi-step memory reconstruction is interleaved with the reasoning process of a large language model (LLM). Although it is not the only framework in this space, MRAgent significantly reduces token consumption and runtime costs compared to other agent memory management methods.

The limits of passive retrieval for long-horizon tasks

In classical retrieval systems, documents are retrieved through vector search or graph traversal and then passed to the LLM for processing. This passive approach falls short because it cannot interleave reasoning with memory access, which creates three major obstacles:

These systems can’t update their retrieval strategy midway. If an agent retrieves a document and finds that a key clue is missing, such as a specific date or person, it has no way to issue a new query based on what it found.
Fixed similarity scores and predefined graph expansions return high-scoring matches that fill the LLM’s context window with irrelevant noise, degrading reasoning.
Current systems rely heavily on pre-built structures such as top-k results and static similarity functions, which limits the flexibility needed to handle unpredictable, long-horizon user interactions.
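
To make the second and third obstacles concrete, here is a minimal sketch of one-shot top-k retrieval. Everything in it is an illustrative assumption: the word-overlap score stands in for a vector similarity, and the four documents are invented toy data, not taken from the paper.

```python
def overlap(query: str, doc: str) -> int:
    """Toy similarity: count of shared lowercase words (stand-in for a vector score)."""
    return len(set(query.lower().split()) & set(doc.lower().split()))


def passive_top_k(query: str, docs: list, k: int = 3) -> list:
    """One-shot retrieval: rank once, return k documents, never re-query."""
    return sorted(docs, key=lambda d: overlap(query, d), reverse=True)[:k]


# Hypothetical memory entries; the last one holds the answer in different words.
docs = [
    "nate won his third video game tournament",
    "nate entered a video game tournament last spring",
    "nate watched a video game tournament stream",
    "nate put the winnings into savings",
]
query = "how did nate spend the prize money from his third video game tournament"

context = passive_top_k(query, docs, k=3)
for line in context:
    print(line)
```

In this toy run the three tournament documents fill the context, while the one that actually answers the question scores lowest and is dropped. A retriever that could read the first hit and then search again for "winnings" would reach it; a fixed single pass cannot.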

The researchers argue that to overcome these limitations, developers must shift to an “active and integrated reconstruction process,” an idea inspired by cognitive neuroscience.

Under this paradigm, memory retrieval unfolds sequentially rather than as a single static database read. The system starts with small, specific triggers from the user’s input, such as a person’s name, an action, or a location. These initial cues point to connecting concepts or short passages rather than to large blocks of text.

By following this metadata trail, the agent collects small pieces of evidence one at a time. It uses each piece to guide its next step until it has assembled a complete and accurate picture.

How MRAgent applies active memory reconstruction

Instead of viewing memory as a static database, MRAgent (Memory Reasoning Architecture for LLM Agents) treats it as an interactive environment. When processing a complex query, the agent uses the reasoning capabilities of its backbone LLM to explore multiple candidate paths in a structured memory graph.

At each step, the LLM examines the intermediate evidence it has collected and uses it to iteratively refine its search. It sets constraints for new searches, follows the most informative paths, and prunes irrelevant branches. This allows MRAgent to uncover deeply buried information without flooding the LLM’s context with noise.

To make this active exploration efficient and scalable, the framework organizes its database using a “Cue-Tag-Content” structure. This works as a multi-layer graph with three types of nodes:

Cues: Fine-grained keywords, such as entities or contextual attributes extracted from user interactions.
Contents: The actual storage units of memory. These are divided into several granular layers, such as episodic memory of specific events and semantic memory of the user’s stable facts and preferences.
Tags: Semantic bridges that summarize the associative relationships between Cues and specific Contents.
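
The three node types can be pictured with a small data structure. This is only a sketch under stated assumptions: the class names, field names, and the `add` helper are hypothetical and are not MRAgent's actual schema, and the sample memories are invented.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Content:
    text: str
    layer: str  # assumed labels: "episodic" or "semantic"


@dataclass
class Tag:
    summary: str  # short description bridging cues and contents
    content_ids: List[int] = field(default_factory=list)


@dataclass
class MemoryGraph:
    contents: List[Content] = field(default_factory=list)
    tags: List[Tag] = field(default_factory=list)
    cue_to_tags: Dict[str, Set[int]] = field(default_factory=dict)

    def add(self, cues: List[str], tag_summary: str, text: str, layer: str) -> int:
        """Store one memory and link it as Cue -> Tag -> Content."""
        self.contents.append(Content(text, layer))
        content_id = len(self.contents) - 1
        # Reuse an existing tag with the same summary, otherwise create one.
        for tag_id, tag in enumerate(self.tags):
            if tag.summary == tag_summary:
                break
        else:
            self.tags.append(Tag(tag_summary))
            tag_id = len(self.tags) - 1
        self.tags[tag_id].content_ids.append(content_id)
        for cue in cues:
            self.cue_to_tags.setdefault(cue, set()).add(tag_id)
        return content_id


graph = MemoryGraph()
graph.add(["Nate", "video game tournament"], "Championship victory",
          "Nate won his first tournament", "episodic")
graph.add(["Nate"], "Championship victory",
          "Nate won his third tournament", "episodic")
```

The point of the layout is that a cue never links straight to heavy content. It links to a short tag, and only the tag knows which content units sit behind it.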

This structure enables a highly efficient two-stage retrieval process. The LLM starts by navigating from Cues to candidate Tags. Because the Tags explicitly summarize the semantic relationships and structural associations in the data, the agent can evaluate these short summaries to judge their relevance. The LLM identifies promising paths and discards irrelevant branches before spending compute and tokens on accessing the detailed, token-heavy Contents.
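
A single round of this two-stage process might look like the sketch below. The dictionaries are invented toy data, and `judge_tag` is a hypothetical keyword rule standing in for the LLM's relevance judgment; none of these names come from the MRAgent code.

```python
# Hypothetical toy graph: cue -> tag summaries, tag summary -> stored contents.
CUE_TO_TAGS = {
    "Nate": ["Championship victory", "Participation in the competition"],
    "video game tournament": ["Championship victory", "Participation in the competition"],
}
TAG_TO_CONTENTS = {
    "Championship victory": ["Nate won a tournament and took home prize money"],
    "Participation in the competition": ["Nate signed up for a regional tournament"],
}


def judge_tag(query: str, tag: str) -> bool:
    """Stand-in for the LLM: keep victory tags when the query is about winning."""
    return "won" in query.lower() and "victory" in tag.lower()


def two_stage_retrieve(query: str, cues: list):
    # Stage 1: walk from cues to candidate tags and judge only the short summaries.
    candidates = []
    for cue in cues:
        for tag in CUE_TO_TAGS.get(cue, []):
            if tag not in candidates:
                candidates.append(tag)
    kept = [tag for tag in candidates if judge_tag(query, tag)]
    # Stage 2: load the heavier content only for tags that survived pruning.
    contents = [text for tag in kept for text in TAG_TO_CONTENTS[tag]]
    return kept, contents


query = "How did Nate spend the prize money when he won his third video game tournament?"
kept, contents = two_stage_retrieve(query, ["Nate", "video game tournament"])
print(kept)
print(contents)
```

Only the surviving tag's content is ever read, which is where the token savings would come from in a real system where each content unit is far larger than its tag summary.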

For example, a user might ask an AI agent, "How did Nate spend the prize money when he won his third video game tournament?"

MRAgent starts by extracting fine-grained initial cues from the prompt, such as "Nate," "video game tournament," and "win."
The agent maps these cues onto the memory graph and looks for Tags linked to them. It finds the matching Tags "Championship victory" and "Participation in the competition." Because the query is only concerned with what the person did after winning, MRAgent discards the participation Tag and pursues the victory Tag.
The agent retrieves the episodic content linked to the selected Cue-Tag pair, finding three different memory episodes in which Nate won a tournament.
MRAgent examines the three memories, decides that one of them specifically matches the query, and discards the other two.
With this information, it updates its cues and begins another round of exploration and pruning. From the newly retrieved episodic memory, the agent adds “tournament earnings” to its cues and uses it to traverse new Tags and reach new memories. It repeats this process until it has gathered enough information to answer the question, which might be something like “Nate saved the money.”
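
The walkthrough above can be sketched as a loop that alternates pruning, retrieval, and cue updates. The memory entries and the three `keep_*`/`can_answer` helpers are hypothetical: simple string checks stand in for decisions that MRAgent delegates to the LLM.

```python
# Hypothetical toy memory. Each episode lists the new cues it reveals.
CUE_TO_TAGS = {
    "Nate": ["Championship victory", "Participation in the competition"],
    "tournament earnings": ["Use of tournament earnings"],
}
TAG_TO_EPISODES = {
    "Championship victory": [
        {"text": "Nate won his first tournament", "reveals": []},
        {"text": "Nate won his second tournament", "reveals": []},
        {"text": "Nate won his third tournament and received tournament earnings",
         "reveals": ["tournament earnings"]},
    ],
    "Participation in the competition": [
        {"text": "Nate entered a tournament", "reveals": []},
    ],
    "Use of tournament earnings": [
        {"text": "Nate saved the money", "reveals": []},
    ],
}


def keep_tag(tag: str) -> bool:
    """Stand-in for LLM tag pruning: the query is about winning, not taking part."""
    return "Participation" not in tag


def keep_episode(episode: dict) -> bool:
    """Stand-in for LLM episode selection: only the third win matches the query."""
    return "first" not in episode["text"] and "second" not in episode["text"]


def can_answer(evidence: list) -> bool:
    """Stand-in for the LLM's stopping check."""
    return any("saved" in text for text in evidence)


def answer_loop(cues: list, max_rounds: int = 5):
    evidence, visited = [], set()
    cues = list(cues)
    for round_no in range(1, max_rounds + 1):
        # Cues -> candidate tags, skipping visited tags and pruning irrelevant ones.
        tags = list(dict.fromkeys(
            tag for cue in cues for tag in CUE_TO_TAGS.get(cue, [])
            if tag not in visited and keep_tag(tag)
        ))
        visited.update(tags)
        new_cues = []
        for tag in tags:
            for episode in filter(keep_episode, TAG_TO_EPISODES[tag]):
                evidence.append(episode["text"])
                new_cues.extend(episode["reveals"])
        # Stop once the evidence suffices or there is nothing new to follow.
        if can_answer(evidence) or not new_cues:
            return evidence, round_no
        cues = new_cues
    return evidence, max_rounds


evidence, rounds = answer_loop(["Nate"])
print(rounds, evidence)
```

In this toy run the loop stops after the second round, having read two episodes out of five stored ones.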

MRAgent performance on industry benchmarks

MRAgent competes with several other frameworks that address agent memory. Other approaches include A-MEM, a graph-based agent memory framework, and MemoryOS, a hierarchical memory framework. Other persistent memory frameworks include LangMem and Mem0.

The researchers tested MRAgent on the LoCoMo and LongMemEval industry benchmarks. These benchmarks test agents’ ability to answer queries in long-horizon tasks and conversations that span multiple sessions and hundreds of conversational turns. The backbone models used were Gemini 2.5 Flash and Claude Sonnet 4.5. The framework was compared against standard RAG, A-MEM, MemoryOS, LangMem, and Mem0.

MRAgent consistently outperformed the baselines on both models and across all query types by a significant margin.

However, for enterprise engineers, the most important metric is often compute cost. In the LongMemEval tests, MRAgent reduced token consumption to only 118K per sample. In comparison, A-MEM consumed 632K tokens, and LangMem burned 3.26 million tokens per query. MRAgent also roughly halves runtime compared to A-MEM, from 1,122 seconds down to 586 seconds.
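
A quick calculation puts these figures side by side. The numbers are the ones reported above; the variable names are only for illustration.

```python
# Figures as reported for the LongMemEval tests.
tokens = {"MRAgent": 118_000, "A-MEM": 632_000, "LangMem": 3_260_000}
runtime_s = {"MRAgent": 586, "A-MEM": 1122}

# How many times more tokens each framework uses relative to MRAgent.
ratios = {name: round(count / tokens["MRAgent"], 1) for name, count in tokens.items()}

# MRAgent's runtime as a fraction of A-MEM's.
runtime_fraction = round(runtime_s["MRAgent"] / runtime_s["A-MEM"], 2)

print(ratios)
print(runtime_fraction)
```

By these numbers A-MEM uses about 5.4 times as many tokens as MRAgent and LangMem about 27.6 times as many, while MRAgent's runtime is about 52% of A-MEM's.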

What makes MRAgent efficient is its on-demand behavior. Checking Tags and pruning irrelevant paths before retrieving content saves money and context space. In addition, the system evaluates its collected context on its own and decides when to stop searching, avoiding unnecessary data lookups.

The implementation and deployment catch

Although MRAgent performs very well, the Cue-Tag-Content structure needs to be built before the agent can query it. Developers must figure out how to build the underlying memory database so that the LLM can efficiently navigate complex structures and prune non-essential paths without computational costs exploding.

Fortunately, developers don’t have to manually label or edit this data. The authors designed MRAgent with an automated ingestion pipeline that uses LLMs to process raw interaction histories and automatically populate the memory graph. For the developer, the task is to implement and configure this automated ingestion pipeline, rather than tagging the data manually.

You need to set up a background job or a streaming pipeline that passes raw user interactions through prompt templates to extract this metadata before storing it in your graph database.
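
As a rough sketch of what such a job could look like: the prompt wording, the JSON shape, and the `fake_llm` stub below are all assumptions made for illustration and are not MRAgent's actual templates or API. In practice `fake_llm` would be replaced by a call to a real model, and the dictionary by a graph database.

```python
import json

# Hypothetical extraction template; real prompts would be more detailed.
EXTRACTION_PROMPT = (
    "Extract memory metadata from the conversation turn below.\n"
    'Reply with JSON: {{"cues": [...], "tag": "...", "content": "..."}}\n\n'
    "Turn: {turn}"
)


def fake_llm(prompt: str) -> str:
    """Stand-in for a real LLM call: capitalized words become cues."""
    turn = prompt.split("Turn: ", 1)[1]
    cues = [word.strip(".,!?") for word in turn.split() if word[:1].isupper()]
    tag = "Championship victory" if "won" in turn else "General"
    return json.dumps({"cues": cues, "tag": tag, "content": turn})


def ingest(turns: list, llm=fake_llm) -> dict:
    """Run each raw turn through the template and store the extracted metadata."""
    graph = {"cue_to_tags": {}, "tag_to_contents": {}}
    for turn in turns:
        record = json.loads(llm(EXTRACTION_PROMPT.format(turn=turn)))
        graph["tag_to_contents"].setdefault(record["tag"], []).append(record["content"])
        for cue in record["cues"]:
            graph["cue_to_tags"].setdefault(cue, set()).add(record["tag"])
    return graph


graph = ingest(["Nate won the tournament in Austin."])
print(graph)
```

Passing the model in as a parameter keeps the job testable without network access, which is useful when the same code runs as a batch backfill and as a streaming consumer.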

However, the authors stress that this is a lightweight setup and that MRAgent intentionally keeps ingestion simple.

The authors have released the code on GitHub.

Mosegas 12 hours ago
