Ever felt like your LLM needs a memory?

LangChain felt the same thing. From full chat transcripts to summaries, entities, and vector-backed recall, it gives you several ways to make a stateless model feel like it actually remembers what matters.

Large Language Models are inherently stateless. Every request you send arrives as a blank slate with no recollection of what was discussed five minutes ago. To create a coherent conversation, the system must manually feed previous messages back into the model.

LangChain provides several distinct patterns for managing this history. Choosing the right one is a balance between providing perfect context and managing the cost of every token.

LangChain Memory Types

  • Use the Transcript Pattern for quick, high-precision support tasks.
  • Use the Window Pattern for predictable, task-oriented interactions.
  • Use the Summary Pattern for long, creative, or collaborative sessions.
  • Use the Entity Pattern for personal assistants that track user preferences.
  • Use the Vector Retrieval Pattern for knowledge intensive systems with vast histories.

The Transcript Pattern

The simplest way to maintain a conversation is through a direct buffer. This stores every word exactly as it was spoken in a sequential list.

  • Every message from the user and every response from the AI is saved verbatim.
  • The entire history is appended to the prompt for the next turn.
  • It provides the model with the most accurate and raw context possible.

from langchain.memory import ConversationBufferMemory

memory = ConversationBufferMemory()
memory.save_context({"input": "What is the capital of France?"}, {"output": "The capital of France is Paris."})
memory.load_memory_variables({})

An example of this is a customer support bot helping a user reset a password. The bot needs to remember the specific email address and the error code mentioned two sentences ago to provide a precise solution. While excellent for short interactions, this does not scale for long sessions where the prompt becomes massive.
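The mechanism behind this pattern is easy to see in plain Python. The sketch below is a simplified stand-in for what ConversationBufferMemory does internally, not LangChain's actual implementation: every turn is stored verbatim, and the full transcript is replayed into every prompt.

```python
class TranscriptMemory:
    """Stores every turn verbatim and replays all of them as context."""

    def __init__(self):
        self.turns = []  # grows without bound -- the scaling problem

    def save_context(self, user_msg, ai_msg):
        self.turns.append(("Human", user_msg))
        self.turns.append(("AI", ai_msg))

    def build_prompt(self, new_input):
        # The entire history is prepended to every request.
        history = "\n".join(f"{role}: {text}" for role, text in self.turns)
        return f"{history}\nHuman: {new_input}\nAI:"

memory = TranscriptMemory()
memory.save_context("What is the capital of France?",
                    "The capital of France is Paris.")
print(memory.build_prompt("What is its population?"))
```

Because nothing is ever dropped, every token of history is billed again on every turn, which is exactly why this pattern stops scaling.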

The Window Pattern

To solve the scaling issue of a raw buffer, we can use a sliding window. This strategy only keeps the most recent portion of the conversation.

  • The system only remembers the last few interactions, defined by a fixed count.
  • Older segments are discarded as new ones arrive.
  • This keeps the prompt size and API costs predictable.

from langchain.memory import ConversationBufferWindowMemory

memory = ConversationBufferWindowMemory(k=2)
memory.save_context({"input": "I live in London"}, {"output": "London is a great city."})
memory.save_context({"input": "What is the weather like?"}, {"output": "It is currently rainy in London."})

A weather assistant is a perfect candidate for this pattern. If you ask for the forecast in London and then ask “What about tomorrow?”, the bot only needs the most recent context to understand that you are still talking about London. It does not need to remember that you asked about the news ten minutes ago.
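The eviction behavior can be sketched with a bounded deque from the standard library. This is a toy stand-in for ConversationBufferWindowMemory, not its real implementation, but it shows the same effect: once a third exchange arrives with k=2, the oldest one silently falls off.

```python
from collections import deque

class WindowMemory:
    """Keeps only the last k exchanges; older ones are evicted automatically."""

    def __init__(self, k=2):
        self.turns = deque(maxlen=k)  # each entry is one (user, ai) exchange

    def save_context(self, user_msg, ai_msg):
        self.turns.append((user_msg, ai_msg))

    def build_prompt(self, new_input):
        history = "\n".join(f"Human: {u}\nAI: {a}" for u, a in self.turns)
        return f"{history}\nHuman: {new_input}\nAI:"

memory = WindowMemory(k=2)
memory.save_context("Any news today?", "Here are the headlines...")
memory.save_context("I live in London", "London is a great city.")
memory.save_context("What is the weather like?", "It is currently rainy in London.")
# The oldest exchange ("Any news today?") has now been evicted.
```

The deque's maxlen does all the work: prompt size is capped by construction, which is what makes the cost predictable.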

The Summary Pattern

For very long-term dialogues, a summarization strategy is more effective. Instead of saving every word, the system maintains a running overview of the discussion.

  • After each interaction, the system updates a concise summary of the key points.
  • Only this summary is sent to the primary model as context.
  • It handles massive transcripts while keeping the context size relatively flat.

from langchain.memory import ConversationSummaryMemory
from langchain_openai import OpenAI

llm = OpenAI(temperature=0)
memory = ConversationSummaryMemory(llm=llm)
memory.save_context({"input": "Explain the plot of Inception"}, {"output": "Inception is about dreams within dreams..."})

Consider a creative writing assistant helping you plot a novel. Over several hours, you might discuss dozens of characters and plot points. Instead of feeding the whole transcript, the system carries a summary that tracks the main objective and the current state of the story.
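The core loop of this pattern is "old summary + new turn → new summary". In the sketch below, the summarize function is a deterministic placeholder for the extra LLM call that ConversationSummaryMemory makes; a real summarizer would compress the text rather than append to it.

```python
def summarize(previous_summary, user_msg, ai_msg):
    # Placeholder for the LLM summarization call -- a real model would
    # fold the new exchange into a genuinely compressed overview.
    new_line = f"The human asked about {user_msg!r}; the AI responded."
    return (previous_summary + " " + new_line).strip()

class SummaryMemory:
    """Carries a running summary instead of the raw transcript."""

    def __init__(self):
        self.summary = ""

    def save_context(self, user_msg, ai_msg):
        self.summary = summarize(self.summary, user_msg, ai_msg)

    def build_prompt(self, new_input):
        return (f"Summary of conversation so far: {self.summary}\n"
                f"Human: {new_input}\nAI:")

memory = SummaryMemory()
memory.save_context("Explain the plot of Inception",
                    "Inception is about dreams within dreams...")
```

Note the trade-off this structure makes explicit: the prompt stays roughly flat in size, but verbatim detail is lost to whatever the summarizer chooses to keep.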

The Entity Pattern

Some applications require remembering specific facts about people or technical concepts without carrying the entire dialogue.

  • The system extracts key participants or topics mentioned in the chat.
  • It builds a structured knowledge base about these specific items.
  • Relevant facts are pulled from storage when the topic resurfaces.

from langchain.memory import ConversationEntityMemory
from langchain_openai import OpenAI

llm = OpenAI(temperature=0)
memory = ConversationEntityMemory(llm=llm)
memory.save_context({"input": "My name is Tomer and I use Kotlin"}, {"output": "Nice to meet you Tomer."})

An example is a personalized coding coach. If you mention that you prefer a specific library like React or a particular cloud provider, the system stores that fact. When you later ask for a code sample, it automatically applies those preferences without needing to reread the original transcript.
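The shape of this pattern is a fact store keyed by entity. In the sketch below, the capitalized-word regex is a crude stand-in for the LLM-based entity extraction that ConversationEntityMemory actually performs; the point is the structure, not the extraction quality.

```python
import re

class EntityMemory:
    """Keeps a fact store keyed by entity, instead of the whole dialogue."""

    def __init__(self):
        self.facts = {}  # entity name -> list of facts mentioning it

    def save_context(self, user_msg, ai_msg):
        # Crude stand-in for LLM entity extraction: treat capitalized
        # words as entities and record the utterance that mentioned them.
        for entity in re.findall(r"\b[A-Z][a-z]+\b", user_msg):
            self.facts.setdefault(entity, []).append(user_msg)

    def relevant_facts(self, query):
        # Pull stored facts only for entities that resurface in the query.
        return {e: f for e, f in self.facts.items() if e in query}

memory = EntityMemory()
memory.save_context("My name is Tomer and I use Kotlin",
                    "Nice to meet you Tomer.")
memory.relevant_facts("Show me a Kotlin example")
```

Only the facts whose entity resurfaces are injected into context, which is why this stays cheap even as the knowledge base grows.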

The Vector Retrieval Pattern

The most advanced method involves treating the conversation like a database. This allows the model to recall information from any point in the history based on semantic relevance.

  • Past message snippets are stored in a vector database.
  • The system performs a search based on the current user query.
  • It retrieves only the most relevant historical segments.

from langchain.memory import VectorStoreRetrieverMemory
import faiss
from langchain_community.docstore import InMemoryDocstore
from langchain_community.vectorstores import FAISS
from langchain_openai import OpenAIEmbeddings

vectorstore = FAISS(
    embedding_function=OpenAIEmbeddings().embed_query,
    index=faiss.IndexFlatL2(1536),  # 1536 is the dimensionality of OpenAI's ada-002 embeddings
    docstore=InMemoryDocstore({}),
    index_to_docstore_id={},
)
retriever = vectorstore.as_retriever(search_kwargs=dict(k=1))
memory = VectorStoreRetrieverMemory(retriever=retriever)

This is the ideal choice for an AI research assistant. If you are discussing a series of academic papers over several weeks, the model can pull a specific detail from a conversation you had ten days ago because it is semantically related to your current question.
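The retrieval step can be sketched without any external services. Here a bag-of-words vector and cosine similarity stand in for the OpenAI embeddings and FAISS index used above; the mechanics (embed every snippet, embed the query, return the nearest neighbors) are the same.

```python
import math
import re
from collections import Counter

def embed(text):
    # Stand-in for a real embedding model: a bag-of-words count vector.
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

class VectorMemory:
    """Stores snippets as vectors and retrieves by similarity to the query."""

    def __init__(self):
        self.snippets = []  # list of (vector, original text)

    def save(self, text):
        self.snippets.append((embed(text), text))

    def retrieve(self, query, k=1):
        q = embed(query)
        scored = sorted(self.snippets, key=lambda s: cosine(q, s[0]), reverse=True)
        return [text for _, text in scored[:k]]

memory = VectorMemory()
memory.save("We discussed the attention mechanism in the transformer paper")
memory.save("My favorite food is pizza")
memory.retrieve("What did the paper say about attention?")
```

Because lookup is by similarity rather than recency, the relevant snippet surfaces no matter how long ago it was stored, which is precisely what the window and buffer patterns cannot do.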