Source: Anthropic

Memory Management: Long-Term Memory for Claude Agents

Minh Tuấn, CTO, Transform Group

26/03/2026 · 7 min read

When you build AI agents that run continuously, you quickly hit two unavoidable challenges: knowledge is lost between sessions, and the context window fills up during long conversations. This article introduces Anthropic's solutions to both problems.

The two core challenges of long-running agents

Challenge 1: Losing patterns between sessions

Imagine you have a Code Review Agent. Every day it reviews dozens of pull requests for your team. But each time it starts up, it has "forgotten" everything:

The team prefers readable code over clever code
The project uses snake_case for Python variables
Security reviews must be especially strict for authentication code
Junior developer X is still learning and needs more detailed feedback

The agent has to "relearn" all of this every session, or you have to stuff the whole context into the system prompt, which keeps growing.

Challenge 2: The context window fills up

In long conversations, such as a debugging session lasting several hours or project planning over many rounds, the context window eventually fills up. At that point, without a strategy, you are left choosing between truncating (and losing older information) or having the request fail.
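One way to notice the problem before it happens is to track a rough size estimate before each request. The sketch below assumes the heuristic of about 4 characters per token, which is only an approximation. `estimate_tokens` and `is_near_limit` are hypothetical helper names, and the default limit and ratio are placeholder values to adjust for your model.

```python
def estimate_tokens(messages, chars_per_token=4):
    """Rough token estimate: total characters divided by chars_per_token."""
    total_chars = sum(len(str(m.get("content", ""))) for m in messages)
    return total_chars // chars_per_token


def is_near_limit(messages, context_limit=200_000, ratio=0.8):
    """Return True once the estimate exceeds `ratio` of the context limit."""
    return estimate_tokens(messages) > context_limit * ratio


history = [
    {"role": "user", "content": "a" * 400},
    {"role": "assistant", "content": "b" * 400},
]
print(estimate_tokens(history))                   # 800 chars -> 200 tokens
print(is_near_limit(history, context_limit=220))  # 200 > 176 -> True
```

A character-based estimate is cheap but imprecise, so it is best used as an early warning rather than an exact budget.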

Solution 1: Memory Tool for cross-session learning

Anthropic provides a memory tool (official identifier: memory_20250818), a file-based memory system that lets Claude retain important information across conversations.

How it works

In this article's examples, memories live in a text file under the /memories directory. Claude can read and write this file across conversations. This is a client-side implementation: you manage the file on your own server, not Anthropic.
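Because the storage lives on your side, it is worth making sure that any file name the model supplies cannot escape the memory directory. The following is a minimal sketch of such a guard, not part of any SDK: `resolve_memory_path` is a hypothetical helper, and the directory name is only an example.

```python
from pathlib import Path


def resolve_memory_path(base_dir, relative_name):
    """Return an absolute path inside base_dir, or raise ValueError."""
    base = Path(base_dir).resolve()
    candidate = (base / relative_name).resolve()
    try:
        # relative_to raises ValueError when candidate is outside base
        candidate.relative_to(base)
    except ValueError:
        raise ValueError(f"Path escapes memory directory: {relative_name}")
    return candidate


print(resolve_memory_path("/tmp/memories", "code_review_agent.txt"))
```

Resolving both paths first means that `..` segments and absolute paths are normalized before the containment check runs.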

import anthropic

client = anthropic.Anthropic()

# Build a system prompt that embeds the current memories
def create_agent_with_memory(memory_file_path):
    # Read the current memories
    try:
        with open(memory_file_path, 'r', encoding='utf-8') as f:
            current_memories = f.read()
    except FileNotFoundError:
        current_memories = "No memories yet."

    system_prompt = f"""You are a Code Review Agent for a Vietnamese software team.

CURRENT MEMORIES:
{current_memories}

Use your memory to provide consistent, personalized reviews.
When you learn something important about the team's preferences,
coding standards, or individual developers, save it to memory."""

    return system_prompt

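Defining the memory tool

The definition below uses the custom-tool format (name, description, input_schema) with a simplified save/clear interface. Anthropic's built-in memory tool is instead declared by its type and defines its own commands, so check the official documentation before relying on this schema.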

MEMORY_TOOL = {
    "name": "memory_20250818",
    "description": """Save important information to long-term memory.
    Use this when you learn:
    - Team coding preferences and standards
    - Project-specific conventions
    - Individual developer patterns
    - Recurring issues to watch for""",
    "input_schema": {
        "type": "object",
        "properties": {
            "action": {
                "type": "string",
                "enum": ["save", "clear"],
                "description": "save: append to memory. clear: reset memory."
            },
            "content": {
                "type": "string",
                "description": "The information to save to memory."
            }
        },
        "required": ["action", "content"]
    }
}
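Since the model produces the tool input, it can help to validate the payload before touching the file system. The sketch below mirrors the two actions in the schema above. `validate_memory_input` is a hypothetical helper, and the rule that saved content must be non-empty is an added assumption rather than something the schema enforces.

```python
ALLOWED_ACTIONS = {"save", "clear"}  # mirrors the enum in MEMORY_TOOL


def validate_memory_input(tool_input):
    """Return (ok, error_message) for a memory tool call payload."""
    if not isinstance(tool_input, dict):
        return False, "tool input must be an object"
    action = tool_input.get("action")
    if action not in ALLOWED_ACTIONS:
        return False, f"unknown action: {action!r}"
    content = tool_input.get("content")
    if not isinstance(content, str):
        return False, "content must be a string"
    if action == "save" and not content.strip():
        return False, "content must not be empty when saving"
    return True, ""


print(validate_memory_input({"action": "save", "content": "Use snake_case"}))
print(validate_memory_input({"action": "delete", "content": "x"}))
```

Returning the error message lets you pass it back in the tool result, so the model can correct its next call.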

Handling memory tool calls

def handle_memory_tool(tool_input, memory_file_path):
    from datetime import datetime

    action = tool_input["action"]
    # "clear" calls may omit content, so avoid a KeyError here
    content = tool_input.get("content", "")

    if action == "save":
        # Append a dated entry to the memory file
        with open(memory_file_path, 'a', encoding='utf-8') as f:
            timestamp = datetime.now().strftime("%Y-%m-%d")
            f.write(f"\n[{timestamp}] {content}")
        return {"success": True, "message": "Memory saved."}

    elif action == "clear":
        # Truncate the memory file
        with open(memory_file_path, 'w', encoding='utf-8') as f:
            f.write("")
        return {"success": True, "message": "Memory cleared."}

    return {"success": False, "message": "Unknown action."}
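An append-only handler will store the same preference again every time the model re-saves it. One possible refinement, sketched here under the assumption that each entry is a single line in the "[YYYY-MM-DD] text" format used above, is to skip entries that are already present. `save_unique` is a hypothetical name.

```python
import os
import tempfile
from datetime import datetime


def save_unique(memory_file_path, content):
    """Append content unless an identical entry is already stored."""
    existing = ""
    if os.path.exists(memory_file_path):
        with open(memory_file_path, "r", encoding="utf-8") as f:
            existing = f.read()
    # Entries look like "[YYYY-MM-DD] text"; compare on the text part only.
    stored = {line.split("] ", 1)[-1] for line in existing.splitlines() if line}
    if content in stored:
        return False
    timestamp = datetime.now().strftime("%Y-%m-%d")
    with open(memory_file_path, "a", encoding="utf-8") as f:
        f.write(f"\n[{timestamp}] {content}")
    return True


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "agent.txt")
    print(save_unique(path, "Use snake_case"))  # True: first time
    print(save_unique(path, "Use snake_case"))  # False: duplicate skipped
```

Exact-match deduplication will not catch paraphrases, but it keeps the file from growing with verbatim repeats.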

Demo: Code Review Agent with long-term memory

Below is a complete agent that can remember the team's preferences across sessions:

import anthropic
import json
import os

client = anthropic.Anthropic()
MEMORY_FILE = "/memories/code_review_agent.txt"
os.makedirs("/memories", exist_ok=True)

def run_code_review_session(code_to_review, feedback=""):
    """Run one review session with memory persistence."""

    # Load memories
    try:
        with open(MEMORY_FILE, 'r', encoding='utf-8') as f:
            memories = f.read() or "No prior memories."
    except FileNotFoundError:
        memories = "No prior memories."

    system = f"""You are a Code Review Agent for a Vietnamese dev team.

LONG-TERM MEMORIES:
{memories}

Review code thoroughly. When you notice consistent patterns
about team preferences, save them to memory for future sessions.
Always respond in Vietnamese."""

    # Send the code and any team feedback as a single user turn
    user_content = code_to_review
    if feedback:
        user_content += f"\n\nTeam feedback: {feedback}"
    messages = [{"role": "user", "content": user_content}]

    # Agentic loop
    while True:
        response = client.messages.create(
            model="claude-opus-4-5",
            max_tokens=2048,
            system=system,
            tools=[MEMORY_TOOL],
            messages=messages
        )

        if response.stop_reason == "end_turn":
            # Extract text response
            for block in response.content:
                if hasattr(block, 'text'):
                    return block.text
            break

        elif response.stop_reason == "tool_use":
            messages.append({"role": "assistant", "content": response.content})
            tool_results = []

            for block in response.content:
                if block.type == "tool_use":
                    result = handle_memory_tool(block.input, MEMORY_FILE)
                    tool_results.append({
                        "type": "tool_result",
                        "tool_use_id": block.id,
                        "content": json.dumps(result)
                    })

            messages.append({"role": "user", "content": tool_results})
        else:
            break

    return "Review completed."
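The loop above returns the first text block it finds. If a response carries several text blocks, a variant like the following joins them instead. This is a sketch: `extract_text` is a hypothetical helper, and the `SimpleNamespace` objects are stand-ins for SDK content blocks, used only so the example runs without an API call.

```python
from types import SimpleNamespace


def extract_text(content_blocks):
    """Concatenate the text of every text block, in order."""
    parts = [
        block.text
        for block in content_blocks
        if getattr(block, "type", None) == "text"
    ]
    return "\n".join(parts)


# Stand-ins for SDK content blocks, used here only for illustration.
blocks = [
    SimpleNamespace(type="text", text="Part one."),
    SimpleNamespace(type="tool_use", id="toolu_1", name="memory", input={}),
    SimpleNamespace(type="text", text="Part two."),
]
print(extract_text(blocks))
```

Checking the block type rather than the presence of a `text` attribute makes the intent explicit and skips tool-use blocks cleanly.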

Testing across multiple sessions

# Session 1: First code review
code_sample = """
def getUserData(userId):
    result = db.query(f"SELECT * FROM users WHERE id = {userId}")
    return result
"""

review = run_code_review_session(
    code_sample,
    feedback="Team note: we use snake_case and take SQL injection prevention very seriously"
)
print("Session 1:", review)

# Session 2: The agent can draw on preferences saved in the previous session
code_sample_2 = """
def calculateTotalPrice(items, discount):
    total = sum([item.price for item in items])
    return total * (1 - discount)
"""

review2 = run_code_review_session(code_sample_2)
print("Session 2:", review2)
# If the agent saved the preferences in session 1, it should now check
# for SQL injection and naming conventions without being told again
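To confirm that session 1 actually wrote something, you can inspect the memory file between runs. The sketch below parses the "[YYYY-MM-DD] text" entry format written by the handler in this article. `parse_memory_entries` is a hypothetical helper, and the sample text is made up for illustration.

```python
import re

ENTRY_PATTERN = re.compile(r"^\[(\d{4}-\d{2}-\d{2})\] (.*)$")


def parse_memory_entries(text):
    """Return a list of (date, content) tuples from the memory file text."""
    entries = []
    for line in text.splitlines():
        match = ENTRY_PATTERN.match(line)
        if match:
            entries.append((match.group(1), match.group(2)))
    return entries


sample = "\n[2026-03-26] Team uses snake_case\n[2026-03-26] Watch for SQL injection"
for date, content in parse_memory_entries(sample):
    print(date, "|", content)
```

Lines that do not match the pattern, such as the leading blank line, are simply skipped.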

Solution 2: Context Editing and auto-compaction strategies

When a conversation gets long, you need a deliberate strategy for managing context. Three patterns are common:

Pattern 1: Sliding Window

Keep the N most recent messages and drop the older ones:

def sliding_window_messages(messages, max_messages=20):
    """Keep only the N most recent messages."""
    if len(messages) <= max_messages:
        return messages
    # The system prompt is passed separately via the `system` parameter,
    # so it is not part of `messages` and is unaffected by trimming.
    return messages[-max_messages:]
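A plain slice can cut a conversation in the middle of a turn, leaving a window that starts with an assistant message or with a tool result whose tool call was dropped. The variant below is a sketch that trims forward to the first plain user message. The helper names are hypothetical, and it assumes dict-style messages as used in this article.

```python
def is_plain_user_message(message):
    """True for a user message that does not carry tool_result blocks."""
    if message.get("role") != "user":
        return False
    content = message.get("content")
    if isinstance(content, list):
        return not any(
            isinstance(block, dict) and block.get("type") == "tool_result"
            for block in content
        )
    return True


def sliding_window_on_turns(messages, max_messages=20):
    """Keep at most max_messages, starting at a plain user message."""
    window = messages[-max_messages:]
    for start, message in enumerate(window):
        if is_plain_user_message(message):
            return window[start:]
    return window  # No safe starting point found; return the raw window.


msgs = [
    {"role": "user", "content": "q1"},
    {"role": "assistant", "content": "a1"},
    {"role": "user", "content": "q2"},
    {"role": "assistant", "content": "a2"},
]
print(sliding_window_on_turns(msgs, max_messages=3))
```

With `max_messages=3` the raw slice would begin at the assistant message "a1"; the function skips it and returns the window starting at "q2".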

Pattern 2: Summary-based compaction

Summarize older messages instead of deleting them outright:

def compact_with_summary(messages, threshold=15000):
    """When the token estimate reaches the threshold, summarize older messages."""
    # Rough token estimate (about 4 characters per token)
    total_chars = sum(
        len(str(m.get('content', '')))
        for m in messages
    )

    if total_chars < threshold * 4 or len(messages) <= 5:
        return messages  # No compaction needed yet

    # Summarize everything except the 5 most recent messages
    old_messages = messages[:-5]
    recent_messages = messages[-5:]

    summary_prompt = "Summarize the key decisions, findings, and context from this conversation in 3-5 bullet points:"
    summary_content = "\n".join([
        str(m.get('content', '')) for m in old_messages
    ])

    summary_response = client.messages.create(
        model="claude-haiku-4-5",
        max_tokens=500,
        messages=[{
            "role": "user",
            "content": f"{summary_prompt}\n\n{summary_content}"
        }]
    )

    summary_text = summary_response.content[0].text
    summary_message = {
        "role": "user",
        "content": f"[CONVERSATION SUMMARY]\n{summary_text}"
    }

    return [summary_message] + recent_messages
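The function above joins message contents without saying who spoke, which can make the summary harder to get right. One option is to label each line with its role before sending it to the summarizer. The sketch below assumes dict-style messages whose content is either a string or a list of dict blocks. `render_transcript` is a hypothetical helper.

```python
def render_transcript(messages):
    """Render messages as 'ROLE: text' lines for a summarization prompt."""
    lines = []
    for message in messages:
        role = message.get("role", "unknown").upper()
        content = message.get("content", "")
        if isinstance(content, list):
            # Keep only the text of dict-style text blocks; skip everything else.
            content = " ".join(
                block.get("text", "")
                for block in content
                if isinstance(block, dict) and block.get("type") == "text"
            )
        lines.append(f"{role}: {content}")
    return "\n".join(lines)


conversation = [
    {"role": "user", "content": "Why does the login test fail?"},
    {"role": "assistant", "content": [{"type": "text", "text": "The token expired."}]},
]
print(render_transcript(conversation))
```

The resulting string could be used in place of `summary_content` when building the summarization request.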

Pattern 3: Selective retention

Keep high-value messages (tool results, decisions) and drop chit-chat:

def selective_retention(messages, max_tokens=50000):
    """Keep important messages and drop filler messages."""
    important_keywords = [
        'decision', 'important', 'remember', 'conclusion',
        'error', 'fix', 'solution', 'warning', 'critical'
    ]

    scored_messages = []
    for i, msg in enumerate(messages):
        content = str(msg.get('content', '')).lower()
        score = sum(1 for kw in important_keywords if kw in content)
        # More recent messages get a higher priority
        recency_bonus = i / len(messages) * 3
        scored_messages.append((score + recency_bonus, i, msg))

    # Sort by score, highest first
    scored_messages.sort(reverse=True)

    # Estimate tokens and pick the messages that fit within the budget
    retained = []
    token_count = 0
    for score, idx, msg in scored_messages:
        msg_tokens = len(str(msg.get('content', ''))) // 4
        if token_count + msg_tokens < max_tokens:
            retained.append((idx, msg))
            token_count += msg_tokens

    # Restore the original order
    retained.sort(key=lambda x: x[0])
    return [msg for _, msg in retained]
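Dropping messages by score can leave two messages with the same role next to each other. A small post-processing step, sketched below, merges such neighbors when both contents are plain strings. `merge_consecutive_roles` is a hypothetical helper and leaves list-style content untouched.

```python
def merge_consecutive_roles(messages):
    """Merge adjacent string-content messages that share the same role."""
    merged = []
    for message in messages:
        previous = merged[-1] if merged else None
        if (
            previous is not None
            and previous["role"] == message["role"]
            and isinstance(previous["content"], str)
            and isinstance(message["content"], str)
        ):
            previous["content"] += "\n\n" + message["content"]
        else:
            merged.append(dict(message))  # copy so the input is not mutated
    return merged


retained = [
    {"role": "user", "content": "The build fails with an import error."},
    {"role": "user", "content": "Decision: pin the dependency version."},
    {"role": "assistant", "content": "Pinned and verified."},
]
print(merge_consecutive_roles(retained))
```

Merging keeps the retained information while restoring a cleaner user/assistant alternation.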

Best practices: When to save vs. forget?

Type of information	Action	Reason
Team coding standards	Save to memory	Applies to every future session
Confirmed user preferences	Save to memory	Better personalization
Temporary calculation results	Forget after the session	No long-term value
Fixed bugs	Save the pattern, forget the details	The pattern matters, not the line numbers
Chit-chat, greetings	Forget	Consumes context without adding value
Discovered security vulnerabilities	Save immediately	Critical; must be remembered to avoid repeats
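The table above can also be expressed as a lookup that your own code consults before writing to memory. This is only a sketch: the category keys and the `MemoryAction` enum are hypothetical labels for the table rows, and treating unknown categories as forgettable is an assumption.

```python
from enum import Enum


class MemoryAction(Enum):
    SAVE = "save"
    SAVE_PATTERN_ONLY = "save_pattern_only"
    FORGET = "forget"


# Category keys are hypothetical labels for the rows of the table above.
RETENTION_POLICY = {
    "team_coding_standard": MemoryAction.SAVE,
    "confirmed_user_preference": MemoryAction.SAVE,
    "temporary_calculation": MemoryAction.FORGET,
    "fixed_bug": MemoryAction.SAVE_PATTERN_ONLY,
    "chit_chat": MemoryAction.FORGET,
    "security_vulnerability": MemoryAction.SAVE,
}


def decide_retention(category):
    """Look up the action for a category; unknown categories are forgotten."""
    return RETENTION_POLICY.get(category, MemoryAction.FORGET)


print(decide_retention("fixed_bug"))
print(decide_retention("weather_small_talk"))
```

Keeping the policy in data rather than in prompt text makes it easy to review and change.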

Combining memory + compaction: Complete agent pattern

class PersistentAgent:
    """Agent with both long-term memory and context compaction."""

    def __init__(self, agent_id, memory_dir="/memories"):
        self.agent_id = agent_id
        self.memory_file = f"{memory_dir}/{agent_id}.txt"
        self.conversation_history = []
        self.max_context_chars = 60000
        os.makedirs(memory_dir, exist_ok=True)

    def load_memories(self):
        try:
            with open(self.memory_file, 'r', encoding='utf-8') as f:
                return f.read()
        except FileNotFoundError:
            return "No memories yet."

    def chat(self, user_message):
        # Auto-compact when the context grows too large
        total_chars = sum(
            len(str(m.get('content', '')))
            for m in self.conversation_history
        )

        if total_chars > self.max_context_chars:
            self.conversation_history = compact_with_summary(
                self.conversation_history
            )

        self.conversation_history.append({
            "role": "user",
            "content": user_message
        })

        response = client.messages.create(
            model="claude-opus-4-5",
            max_tokens=2048,
            system=f"Your memories:\n{self.load_memories()}",
            tools=[MEMORY_TOOL],
            messages=self.conversation_history
        )

        # Handle tool calls and responses...
        # (full agentic loop as shown earlier)

        return response
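The tool-handling step is elided in the class above. One way to structure it, sketched here, is a loop that takes the model call and the tool handler as callables so it can be exercised without the network. `run_tool_loop`, its parameters, and the scripted fake responses are all hypothetical and exist only for illustration.

```python
import json
from types import SimpleNamespace


def run_tool_loop(create_message, handle_tool, messages, max_turns=10):
    """Call the model until it stops requesting tools or max_turns is hit."""
    for _ in range(max_turns):
        response = create_message(messages)
        messages.append({"role": "assistant", "content": response.content})
        if response.stop_reason != "tool_use":
            return response
        results = [
            {
                "type": "tool_result",
                "tool_use_id": block.id,
                "content": json.dumps(handle_tool(block.input)),
            }
            for block in response.content
            if block.type == "tool_use"
        ]
        messages.append({"role": "user", "content": results})
    raise RuntimeError("Tool loop exceeded max_turns")


# Fake model for illustration: one tool call, then a final answer.
scripted = iter([
    SimpleNamespace(stop_reason="tool_use", content=[
        SimpleNamespace(type="tool_use", id="t1",
                        input={"action": "save", "content": "x"}),
    ]),
    SimpleNamespace(stop_reason="end_turn", content=[
        SimpleNamespace(type="text", text="Done."),
    ]),
])
history = [{"role": "user", "content": "Review this code"}]
final = run_tool_loop(lambda msgs: next(scripted),
                      lambda tool_input: {"success": True}, history)
print(final.content[0].text, len(history))
```

The `max_turns` cap is a safeguard against a model that keeps requesting tools indefinitely.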

Summary

Memory management is the foundation for building AI agents that are genuinely useful in production:

The memory tool (memory_20250818) addresses knowledge loss between sessions
Context compaction addresses the context window filling up in long conversations
Combining the two yields agents that can run continuously, learning and improving over time

Next step: Look into automatic context compaction with the compaction_control parameter, so that context compression is managed for you instead of being hand-coded.

About the author: writer for a Claude AI knowledge platform for Vietnamese readers. Software engineer with more than 20 years of experience, passionate about AI and sharing technology knowledge.
