fix(memory): add maxMessages limit to ConversationSummaryBufferMemory #6172
octo-patch wants to merge 1 commit into FlowiseAI:main from
Conversation
Code Review
This pull request updates the ConversationSummaryBufferMemory node to version 2.0 and introduces a maxMessages parameter to limit the number of messages retrieved. While this helps manage the message count for processing, the current implementation performs the slicing in-memory after fetching the entire history from the database. Feedback suggests applying this limit at the database query level to prevent potential memory issues and improve performance.
```typescript
if (this.maxMessages && this.maxMessages > 0) {
    chatMessage = chatMessage.slice(-this.maxMessages)
}
```
The current implementation fetches the entire message history from the database into memory before applying the maxMessages limit via slice. This does not address the performance overhead or potential Out-Of-Memory (OOM) issues associated with loading a very large history, which contradicts the primary goal of this pull request. To effectively optimize the database load, the limit should be applied at the database level using TypeORM's take property. This would require changing the query order to DESC on createdDate, applying the limit, and then reversing the resulting array to maintain the chronological order expected by the subsequent logic.
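The reviewer's suggested approach can be sketched as follows. This is a minimal, self-contained illustration of the fetch-DESC-then-reverse pattern, not the actual Flowise code: `findMessages` stands in for a TypeORM `repository.find({ order: { createdDate: 'DESC' }, take: maxMessages })` call and is mocked with an in-memory array, and the `ChatMessage` shape is an assumption.

```typescript
interface ChatMessage {
    content: string
    createdDate: number
}

// Mock of a TypeORM find with order: { createdDate: 'DESC' } and take: maxMessages.
// In the real node, the limit would be pushed into the database query so the
// full history is never loaded into memory.
function findMessages(all: ChatMessage[], maxMessages?: number): ChatMessage[] {
    const desc = [...all].sort((a, b) => b.createdDate - a.createdDate)
    return maxMessages && maxMessages > 0 ? desc.slice(0, maxMessages) : desc
}

function loadRecentMessages(all: ChatMessage[], maxMessages?: number): ChatMessage[] {
    // Fetch newest-first with the limit applied, then reverse to restore the
    // chronological (ASC) order the downstream summarization logic expects.
    return findMessages(all, maxMessages).reverse()
}

const history: ChatMessage[] = [
    { content: 'a', createdDate: 1 },
    { content: 'b', createdDate: 2 },
    { content: 'c', createdDate: 3 },
    { content: 'd', createdDate: 4 }
]

console.log(loadRecentMessages(history, 2).map((m) => m.content)) // [ 'c', 'd' ]
```

With `take` applied at the query level, the database returns at most `maxMessages` rows, so memory use no longer grows with conversation length.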
Fixes #5873
Problem
ConversationSummaryBufferMemory fetches all messages from the database on every request. As a conversation grows, the token-counting and summarization steps operate on an ever-larger dataset, causing increasing latency, cost, and eventual OOM/context-exceeded errors.
The root cause is that movingSummaryBuffer is an in-memory instance variable reset on each request, so every call re-processes the entire message history from scratch.
Solution
Add an optional maxMessages parameter (additional param, no default) that caps how many messages are loaded from the database before token-based pruning is applied.
The node version is bumped from 1.0 to 2.0 to reflect the new input parameter.
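As a rough sketch, the new input might be declared in the node's `inputs` array along these lines; the label and description text here are illustrative assumptions, not the PR's exact wording:

```typescript
// Hypothetical additional-parameter definition for the maxMessages input.
// optional + no default preserves the v1.0 behavior when the field is left empty.
const maxMessagesInput = {
    label: 'Max Messages',
    name: 'maxMessages',
    type: 'number',
    description: 'Cap on how many past messages are loaded before token-based pruning',
    optional: true,
    additionalParams: true
}

console.log(maxMessagesInput.name, maxMessagesInput.optional)
```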
Testing