
Session Memory Compaction: Long Conversations Without Context Overflow

Minh Tuấn, CTO, Transform Group

26/03/2026 · 5 min read



Claude's context window is finite: claude-haiku-4-5 has 200k tokens, and claude-opus-4-5 also has 200k tokens. A long conversation will eventually reach this limit, and you have to handle it. Session Memory Compaction is a technique that "compresses" older conversation history into a summary, freeing up space so the conversation can continue.

Problem: Context Window Overflow

Failing to handle context overflow leads to:

API errors: Requests are rejected for exceeding the context limit
Information loss: Hard truncation drops important context
Higher cost: Every request pays for the entire history
Higher latency: Larger contexts take longer to process
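
To make the cost point concrete, here is a rough back-of-the-envelope sketch. It assumes every turn adds a fixed number of tokens (`tokens_per_turn` and `cap` are illustrative parameters, not measured values) and that the full history is resent with each request, so the billed input tokens grow roughly quadratically with the number of turns unless the history is capped.

```python
from typing import Optional

def cumulative_input_tokens(turns: int, tokens_per_turn: int,
                            cap: Optional[int] = None) -> int:
    """Total input tokens over `turns` requests when history is resent each time.

    cap: optional upper bound on the history size per request
         (for example, the size maintained by compaction).
    """
    total = 0
    for turn in range(1, turns + 1):
        history = turn * tokens_per_turn  # history grows linearly per turn
        if cap is not None:
            history = min(history, cap)
        total += history
    return total

uncapped = cumulative_input_tokens(100, 500)
capped = cumulative_input_tokens(100, 500, cap=10_000)
print(uncapped)  # 2525000
print(capped)    # 905000
```

Under these assumed numbers, capping the history at 10,000 tokens cuts the total input tokens to roughly a third of the uncapped figure.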

Three strategies

Strategy	Pros	Cons	When to use
Sliding window	Simple, fast	Older information is lost entirely	Small talk, short tasks
Summarization	Keeps important context	May lose details	Most applications
Hierarchical memory	Retains the most information	More complex	Long-term assistants

Strategy 1: Sliding Window (simple)

import anthropic

client = anthropic.Anthropic()

def sliding_window_chat(messages: list, new_message: str,
                        max_messages: int = 20) -> tuple:
    """
    Keep at most the last max_messages messages in the history.
    Simple, but older context is lost.
    """
    # Append the new message
    messages.append({"role": "user", "content": new_message})

    # Trim if the history exceeds the limit
    if len(messages) > max_messages:
        messages = messages[-max_messages:]
        # The Messages API expects the conversation to start with a user turn,
        # so drop any leading assistant message left over from trimming
        while messages and messages[0]["role"] != "user":
            messages = messages[1:]
        print(f"[Trimmed to last {len(messages)} messages]")

    response = client.messages.create(
        model="claude-haiku-4-5",
        max_tokens=500,
        messages=messages,
    )

    answer = response.content[0].text
    messages.append({"role": "assistant", "content": answer})

    return answer, messages

# Usage
conversation = []
while True:
    user_input = input("You: ")
    if user_input.lower() in ["quit", "exit"]:
        break
    answer, conversation = sliding_window_chat(conversation, user_input)
    print(f"Claude: {answer}\n")

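Counting messages ignores how long each message is. A possible variant, sketched below, trims by an estimated token budget instead. The helper names (`estimate_tokens`, `trim_to_token_budget`) and the 4-characters-per-token heuristic are assumptions for illustration, not part of the Anthropic SDK, and the sketch assumes every message `content` is a plain string.

```python
def estimate_tokens(text: str) -> int:
    """Very rough heuristic (about 4 characters per token), not a real tokenizer."""
    return max(1, len(text) // 4)

def trim_to_token_budget(messages: list, max_tokens: int) -> list:
    """Keep the newest messages whose estimated total fits within max_tokens."""
    kept = []
    used = 0
    # Walk from newest to oldest; the newest message is always kept
    for msg in reversed(messages):
        cost = estimate_tokens(msg["content"])
        if kept and used + cost > max_tokens:
            break
        kept.append(msg)
        used += cost
    kept.reverse()
    # Drop leading assistant turns so the history starts with a user message
    while kept and kept[0]["role"] != "user":
        kept.pop(0)
    return kept

history = [
    {"role": "user", "content": "a" * 400},       # ~100 tokens each
    {"role": "assistant", "content": "b" * 400},
    {"role": "user", "content": "c" * 400},
    {"role": "assistant", "content": "d" * 400},
    {"role": "user", "content": "e" * 400},
]
trimmed = trim_to_token_budget(history, 350)
print([m["content"][0] for m in trimmed])  # ['c', 'd', 'e']
```

The function is meant to be called right after appending the new user message, so that the newest entry it always keeps is a user turn.
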
Strategy 2: Summarization Compaction (recommended)

COMPACTION_PROMPT = """You are helping summarize a conversation history to free up memory.

Summarize the following conversation into a concise but informative paragraph.
Include:
1. The main topics discussed
2. Important decisions or conclusions
3. Important specifics (figures, names, dates)
4. The current state of the task (if any)

Conversation history:
{conversation_history}

Write a concise summary (at most 300 words), third-person perspective."""

def compact_conversation(messages: list) -> str:
    """Summarize a list of messages into one short paragraph."""

    # Format the history as plain text
    history_text = ""
    for msg in messages:
        role = "User" if msg["role"] == "user" else "Claude"
        content = msg["content"] if isinstance(msg["content"], str) else str(msg["content"])
        history_text += f"{role}: {content}\n\n"

    prompt = COMPACTION_PROMPT.format(conversation_history=history_text)

    response = client.messages.create(
        model="claude-haiku-4-5",
        max_tokens=600,  # Headroom: a 300-word summary can exceed 400 tokens
        messages=[{"role": "user", "content": prompt}],
        temperature=0.0,
    )

    return response.content[0].text

class CompactingChatbot:
    """Chatbot that compacts automatically when the context is nearly full."""

    def __init__(self, system_prompt: str = "", max_tokens_threshold: int = 150000,
                 keep_recent: int = 10):
        self.system_prompt = system_prompt
        self.max_tokens = max_tokens_threshold
        self.keep_recent = keep_recent
        self.messages = []
        self.compacted_summary = ""
        self.total_tokens_used = 0
        self.compaction_count = 0

    def _estimate_tokens(self) -> int:
        """Estimate the number of tokens sent with the next request."""
        # The system prompt and the summary are sent on every request too
        total_chars = len(self.system_prompt) + len(self.compacted_summary)
        total_chars += sum(
            len(msg["content"]) if isinstance(msg["content"], str) else 0
            for msg in self.messages
        )
        # Rough estimate: 4 chars per token (less accurate for non-English text)
        return total_chars // 4

    def _should_compact(self) -> bool:
        return self._estimate_tokens() > self.max_tokens

    def _compact(self):
        """Compact the older part of the conversation."""
        if len(self.messages) <= self.keep_recent:
            return

        # Split messages into old (to be compacted) and recent (kept as-is).
        # Advance the split point so the kept part starts with a user turn.
        split = len(self.messages) - self.keep_recent
        while split < len(self.messages) and self.messages[split]["role"] != "user":
            split += 1
        old_messages = self.messages[:split]
        recent_messages = self.messages[split:]

        # Summarize the old part
        print(f"[Compacting {len(old_messages)} messages...]")
        new_summary = compact_conversation(old_messages)

        # Merge with the previous summary, if any
        if self.compacted_summary:
            combine_prompt = f"""Combine the following two summaries into one:

OLDER SUMMARY:
{self.compacted_summary}

NEWER SUMMARY:
{new_summary}

Write the combined summary (at most 400 words):"""

            response = client.messages.create(
                model="claude-haiku-4-5",
                max_tokens=800,  # Headroom: 400 words can exceed 500 tokens
                messages=[{"role": "user", "content": combine_prompt}],
                temperature=0.0,
            )
            self.compacted_summary = response.content[0].text
        else:
            self.compacted_summary = new_summary

        # Rebuild messages: the summary lives in the system prompt, recent turns stay
        self.messages = recent_messages
        self.compaction_count += 1
        print(f"[Compaction #{self.compaction_count} done. Summary: {len(self.compacted_summary)} chars]")

    def chat(self, user_message: str) -> str:
        """Send a message and return the response, compacting automatically when needed."""

        # Check and compact if necessary
        if self._should_compact():
            self._compact()

        # Append the user message
        self.messages.append({"role": "user", "content": user_message})

        # Build the system prompt, including the summary if there is one
        system = self.system_prompt
        if self.compacted_summary:
            system += f"\n\n[CONTEXT FROM EARLIER CONVERSATION]\n{self.compacted_summary}"

        # Call the API; omit `system` entirely when there is nothing to send
        request = {
            "model": "claude-haiku-4-5",
            "max_tokens": 800,
            "messages": self.messages,
        }
        if system:
            request["system"] = system
        response = client.messages.create(**request)

        answer = response.content[0].text
        self.messages.append({"role": "assistant", "content": answer})
        self.total_tokens_used += response.usage.input_tokens + response.usage.output_tokens

        return answer

    def get_stats(self) -> dict:
        return {
            "messages_in_memory": len(self.messages),
            "has_summary": bool(self.compacted_summary),
            "compaction_count": self.compaction_count,
            "estimated_current_tokens": self._estimate_tokens(),
            "total_tokens_used": self.total_tokens_used,
        }

# Usage example
bot = CompactingChatbot(
    system_prompt="You are a project planning assistant. Help the user track progress.",
    max_tokens_threshold=50000,  # Compact above 50k tokens (low for demo purposes)
    keep_recent=6  # Keep the 6 most recent messages
)

# Simulate a long conversation.
# Note: these six short turns stay far below the threshold, so no compaction fires here.
questions = [
    "I'm working on a website project for client ABC.",
    "The deadline is 15/4. Design and the backend API are already done.",
    "The frontend needs 2 more weeks. The team has 3 frontend developers.",
    "The client wants to add a payment gateway feature.",
    "Should we use Stripe or VNPay?",
    "Summarize the current project progress for me.",
]

for q in questions:
    print(f"User: {q}")
    answer = bot.chat(q)
    print(f"Claude: {answer[:150]}...")
    print(f"Stats: {bot.get_stats()}")
    print()
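
The character-based estimate in `_estimate_tokens` can be far off, particularly for non-English text. The Anthropic Python SDK provides a token counting call, `client.messages.count_tokens`, which returns an object with an `input_tokens` field. The sketch below shows one way it might replace the heuristic; the helper name `count_conversation_tokens` and the fallback policy are assumptions for illustration, not something the original code does.

```python
def count_conversation_tokens(client, model: str, messages: list,
                              system: str = "") -> int:
    """Ask the API for the input token count; fall back to a heuristic on failure."""
    kwargs = {"model": model, "messages": messages}
    if system:
        kwargs["system"] = system
    try:
        return client.messages.count_tokens(**kwargs).input_tokens
    except Exception:
        # Fallback (assumption): about 4 characters per token
        chars = len(system) + sum(
            len(m["content"]) for m in messages if isinstance(m["content"], str)
        )
        return chars // 4

# Possible usage (requires the `anthropic` package and an API key):
#   import anthropic
#   tokens = count_conversation_tokens(
#       anthropic.Anthropic(), "claude-haiku-4-5", bot.messages,
#       system=bot.system_prompt,
#   )
```

Counting costs an extra network round trip, so one reasonable design is to use the cheap heuristic on most turns and call the API only when the estimate gets close to the threshold.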

Strategy 3: Hierarchical Memory

import json
import re

class HierarchicalMemory:
    """
    Hierarchical memory:
    - Working memory: the most recent conversation (full detail)
    - Episodic memory: summaries of previous sessions
    - Semantic memory: important facts, user preferences
    """

    def __init__(self):
        self.working_memory = []     # Most recent messages
        self.episodic_memory = []    # One summary per session
        self.semantic_memory = {}    # Key facts about user/project

    def update_semantic_memory(self, conversation: list):
        """Extract and store important facts."""
        if not conversation:
            return

        # Use the last 10 messages, each truncated to 100 characters
        history = "\n".join([
            f"{m['role'].title()}: {str(m['content'])[:100]}"
            for m in conversation[-10:]
        ])

        response = client.messages.create(
            model="claude-haiku-4-5",
            max_tokens=200,
            messages=[{
                "role": "user",
                "content": f"""From the conversation below, extract the important facts about the user/project.
Format: a JSON dict whose keys are categories.

{history}

Return only JSON:"""
            }],
            temperature=0.0,
        )

        # Pull the outermost {...} span out of the reply and parse it
        text = response.content[0].text.strip()
        match = re.search(r"\{.*\}", text, re.DOTALL)
        if match:
            try:
                new_facts = json.loads(match.group())
            except json.JSONDecodeError:
                return  # Ignore replies that are not valid JSON
            if isinstance(new_facts, dict):
                self.semantic_memory.update(new_facts)

    def build_context_for_request(self) -> str:
        """Build a context string from all memory layers."""
        context_parts = []

        if self.semantic_memory:
            facts = "\n".join(f"- {k}: {v}" for k, v in self.semantic_memory.items())
            context_parts.append(f"Known facts:\n{facts}")

        if self.episodic_memory:
            recent_episodes = self.episodic_memory[-3:]  # The 3 most recent sessions
            episodes_text = "\n\n".join(recent_episodes)
            context_parts.append(f"Previous sessions:\n{episodes_text}")

        return "\n\n".join(context_parts)

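The class above declares `episodic_memory` but does not show how it gets filled. One possible approach, sketched below as a standalone function, is to summarize the working memory when a session ends, append that summary to the episodic list, and clear the working memory. The names `end_session`, `summarize`, and `max_episodes` are hypothetical; the summarizer is passed in as a callable (it could be `compact_conversation` from Strategy 2), and the demo uses a stub so it runs offline.

```python
from typing import Callable

def end_session(working_memory: list, episodic_memory: list,
                summarize: Callable[[list], str],
                max_episodes: int = 20) -> None:
    """Move the finished session from working memory into episodic memory."""
    if not working_memory:
        return
    episodic_memory.append(summarize(working_memory))
    # Keep only the newest max_episodes summaries (the bound is an assumption)
    del episodic_memory[:-max_episodes]
    working_memory.clear()

# Offline demo with a stub summarizer instead of a real API call
working = [
    {"role": "user", "content": "The deadline is 15/4."},
    {"role": "assistant", "content": "Noted."},
]
episodes = []
end_session(working, episodes,
            summarize=lambda msgs: f"{len(msgs)} messages summarized")
print(episodes)  # ['2 messages summarized']
print(working)   # []
```
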
When to compact?

There are 3 common triggers:

Token threshold: Compact when the estimated tokens exceed X% of the context limit
Message count: Compact after every N messages (simple, predictable)
Time-based: Compact after each session or day

class SmartCompactionTrigger:
    def __init__(self, max_tokens: int = 100000, max_messages: int = 50,
                 compact_ratio: float = 0.7):
        self.max_tokens = max_tokens
        self.max_messages = max_messages
        self.compact_ratio = compact_ratio  # Compact at 70% of the limit

    def should_compact(self, messages: list, current_tokens: int) -> bool:
        return (
            current_tokens > self.max_tokens * self.compact_ratio
            or len(messages) > self.max_messages
        )
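
`SmartCompactionTrigger` covers the token and message-count triggers but not the time-based one. A possible extension is sketched below; the class name `TimedCompactionTrigger`, the `touch` method, and the one-hour idle default are all assumptions for illustration. The current time can be injected through `now` so the logic can be exercised without waiting.

```python
import time
from typing import Optional

class TimedCompactionTrigger:
    """Fires when any trigger is hit: token ratio, message count, or idle time."""

    def __init__(self, max_tokens: int = 100_000, max_messages: int = 50,
                 compact_ratio: float = 0.7, max_idle_seconds: float = 3600.0):
        self.max_tokens = max_tokens
        self.max_messages = max_messages
        self.compact_ratio = compact_ratio
        self.max_idle_seconds = max_idle_seconds  # hypothetical default: one hour
        self.last_activity = time.monotonic()

    def touch(self, now: Optional[float] = None) -> None:
        """Record activity; `now` can be injected to make the logic testable."""
        self.last_activity = time.monotonic() if now is None else now

    def should_compact(self, messages: list, current_tokens: int,
                       now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        idle_too_long = (now - self.last_activity) > self.max_idle_seconds
        return (
            current_tokens > self.max_tokens * self.compact_ratio
            or len(messages) > self.max_messages
            or idle_too_long
        )

trigger = TimedCompactionTrigger(max_idle_seconds=60)
trigger.touch(now=0.0)
print(trigger.should_compact([], 0, now=30.0))       # False
print(trigger.should_compact([], 0, now=120.0))      # True
print(trigger.should_compact([], 80_000, now=30.0))  # True
```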

Memory compaction is a foundational technique for any serious chatbot application. Combine it with Prompt Caching to reduce cost after each compaction cycle.
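
One way the combination with prompt caching might look: the Messages API accepts `system` as a list of text blocks, and a block can carry `cache_control` to mark a cache breakpoint. The sketch below keeps the static system prompt and the summary in separate blocks, so the static part can stay cached even when the summary changes at compaction time. The helper name `build_system_blocks` is hypothetical, and caching only applies above a model-specific minimum prompt length, so check the current documentation before relying on it.

```python
def build_system_blocks(system_prompt: str, summary: str = "") -> list:
    """Build `system` content blocks with cache breakpoints (a sketch)."""
    blocks = []
    if system_prompt:
        blocks.append({
            "type": "text",
            "text": system_prompt,
            "cache_control": {"type": "ephemeral"},  # static part, rarely changes
        })
    if summary:
        blocks.append({
            "type": "text",
            "text": f"[CONTEXT FROM EARLIER CONVERSATION]\n{summary}",
            "cache_control": {"type": "ephemeral"},  # changes only on compaction
        })
    return blocks

blocks = build_system_blocks("You are a project planning assistant.",
                             "The user is building a website for client ABC.")
print(len(blocks))  # 2

# Possible usage (requires the `anthropic` package and an API key);
# omit `system` when the returned list is empty:
#   client.messages.create(model="claude-haiku-4-5", max_tokens=800,
#                          system=blocks, messages=messages)
```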

