Advanced · Technical · Claude API · Source: Anthropic

Context Compaction: Automatically Compressing Context for Long Conversations

Minh Tuấn, CTO, Transform Group

26/03/2026 · 6 min read




When you build agents for long-running tasks (a customer service agent working through dozens of tickets, a coding assistant debugging for hours, a research agent analyzing documents), the context window will fill up. This is not a bug; it is a hard limit of every LLM. The question is how you handle it.

The Anthropic Agent SDK provides automatic context compaction through the compaction_control parameter, a server-side solution recommended for production workloads.

How does context compaction work?

Instead of simply truncating (dropping old messages), compaction summarizes the conversation so that the important information survives. The process has four steps:

Monitor token usage: the SDK continuously tracks how many tokens have been used in the context window
Inject summary prompt: when the window is nearly full, the SDK automatically injects a prompt asking for a summary of the whole conversation
Model generates summary: Claude writes the summary inside <summary> tags, covering all the important information
Clear history, resume with summary: the context history is cleared and replaced with the summary, and the conversation continues as normal

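The result: the agent can keep running without exhausting the context window, like an employee who remembers a summary of past events rather than every word. The trade-off is that any detail left out of the summary is no longer available to the model.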
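The four steps can be sketched in plain Python. This is a minimal illustration, not the SDK's implementation: `estimate_tokens`, `extract_summary`, `compact_if_needed`, the 4-characters-per-token heuristic, and the wording of `SUMMARY_PROMPT` are all assumptions made for the example, and `call_model` stands in for whatever function sends messages to Claude.

```python
import re
from typing import Callable

# Hypothetical prompt; the SDK's real wording is not given in this article.
SUMMARY_PROMPT = (
    "Summarize the conversation so far inside <summary></summary> tags. "
    "Keep every fact needed to continue the task."
)


def estimate_tokens(messages: list[dict]) -> int:
    """Rough estimate (assumption): about 4 characters per token."""
    return sum(len(m["content"]) for m in messages) // 4


def extract_summary(text: str) -> str:
    """Return the text between <summary> tags, or the whole reply if absent."""
    match = re.search(r"<summary>(.*?)</summary>", text, re.DOTALL)
    return match.group(1).strip() if match else text.strip()


def compact_if_needed(
    messages: list[dict],
    call_model: Callable[[list[dict]], str],
    token_limit: int,
    ratio: float = 0.5,
) -> list[dict]:
    # Step 1: monitor usage against the threshold.
    if estimate_tokens(messages) < token_limit * ratio:
        return messages
    # Step 2: inject the summary prompt after the existing history.
    request = messages + [{"role": "user", "content": SUMMARY_PROMPT}]
    # Step 3: the model writes the summary.
    reply = call_model(request)
    # Step 4: drop the history and resume from the summary alone.
    summary = extract_summary(reply)
    return [{"role": "user", "content": "Summary of earlier work:\n" + summary}]


def fake_model(messages: list[dict]) -> str:
    """Stand-in for a real model call, so the sketch runs offline."""
    return "<summary>Tickets TK-000 to TK-009 resolved.</summary>"


history = [{"role": "user", "content": "x" * 400} for _ in range(10)]
compacted = compact_if_needed(history, fake_model, token_limit=1000)
print(len(history), "->", len(compacted), "messages")
```

In a real agent, `call_model` would wrap an API call and the token count would come from the usage data in the response rather than from a character heuristic.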

Setup: Beta decorator and compaction_control

Context compaction is currently a beta feature. To enable it, use the @beta_tool decorator and set compaction_control:

from anthropic import Anthropic

client = Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# Compaction settings
# summary_context_ratio: fraction of the context window in use before compaction triggers
# (a fast model is a good choice for generating the summary)
compaction_config = {
    "type": "enabled",
    "summary_context_ratio": 0.5,  # compact once 50% of the context is used
}

# Send a request with the compaction beta enabled
def create_compaction_agent(messages):
    return client.beta.messages.create(
        model="claude-opus-4-5",
        max_tokens=4096,
        messages=messages,  # required by messages.create
        betas=["context-compaction-2025-06-01"],
        # compaction_control is passed through the beta headers
    )
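To make the ratio concrete, the sketch below converts summary_context_ratio into a token threshold. The helper names are hypothetical, and the 200,000-token context window is an assumed figure used only for the arithmetic; check the actual window of the model you use.

```python
def compaction_threshold(context_window: int, ratio: float) -> int:
    """Number of tokens at which compaction would trigger."""
    if not 0 < ratio <= 1:
        raise ValueError("ratio must be in the range (0, 1]")
    return int(context_window * ratio)


def should_compact(input_tokens: int, context_window: int, ratio: float) -> bool:
    """True once the tokens in use reach the threshold."""
    return input_tokens >= compaction_threshold(context_window, ratio)


# Assumed window of 200,000 tokens with the 0.5 ratio from the config above.
print(compaction_threshold(200_000, 0.5))    # 100000
print(should_compact(84_000, 200_000, 0.5))  # False
print(should_compact(120_000, 200_000, 0.5)) # True
```

A lower ratio compacts earlier and more often, which costs more summary steps but leaves more headroom for large tool results.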

Using the Agent SDK (recommended)

from anthropic.agents import AgentLoop

# AgentLoop handles compaction automatically
agent = AgentLoop(
    client=client,
    model="claude-opus-4-5",
    tools=[...],
    compaction_control={
        "type": "enabled",
        "summary_context_ratio": 0.5
    }
)

# Run the agent; it compacts on its own when needed
async def run_with_compaction():
    result = await agent.run(
        "Process these 25 customer service tickets: ..."
    )
    return result

Demo: a customer service agent handling 20-30 tickets

This is a typical problem: the agent receives 25 tickets, must process each one, and then summarize the results. Without compaction, the context fills up around ticket 10-15.
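A back-of-envelope estimate shows where a figure like "ticket 10-15" can come from. Every number below is an illustrative assumption, not a measurement: a 200,000-token window, 5,000 tokens of fixed overhead (system prompt and tool definitions), and about 15,000 tokens of tool calls, results, and replies per ticket. The function name is hypothetical.

```python
def tickets_until_full(
    context_window: int, fixed_overhead: int, tokens_per_ticket: int
) -> int:
    """How many tickets fit before the accumulated history fills the window."""
    if tokens_per_ticket <= 0:
        raise ValueError("tokens_per_ticket must be positive")
    usable = max(context_window - fixed_overhead, 0)
    return usable // tokens_per_ticket


# Assumed figures for illustration only.
print(tickets_until_full(200_000, 5_000, 15_000))  # 13
```

The real per-ticket cost depends on how verbose the tool results and replies are, so measure response.usage.input_tokens in your own workload before relying on an estimate like this.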

Define tools for the customer service agent

CUSTOMER_SERVICE_TOOLS = [
    {
        "name": "get_ticket_details",
        "description": "Get the details of a support ticket",
        "input_schema": {
            "type": "object",
            "properties": {
                "ticket_id": {"type": "string"}
            },
            "required": ["ticket_id"]
        }
    },
    {
        "name": "update_ticket_status",
        "description": "Update a ticket's status",
        "input_schema": {
            "type": "object",
            "properties": {
                "ticket_id": {"type": "string"},
                "status": {
                    "type": "string",
                    "enum": ["resolved", "pending", "escalated"]
                },
                "resolution": {"type": "string"}
            },
            "required": ["ticket_id", "status"]
        }
    },
    {
        "name": "send_customer_reply",
        "description": "Send a reply to the customer",
        "input_schema": {
            "type": "object",
            "properties": {
                "ticket_id": {"type": "string"},
                "message": {"type": "string"}
            },
            "required": ["ticket_id", "message"]
        }
    }
]
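Because the model chooses the tool inputs, it can help to check them against the schema before running a handler. The helper below is a hypothetical sketch that only checks the "required" list; it is not a full JSON Schema validator, and the schema shown is a copy of the update_ticket_status definition above.

```python
def missing_required_fields(tool_schema: dict, tool_input: dict) -> list[str]:
    """Return the required field names that are absent from tool_input."""
    required = tool_schema["input_schema"].get("required", [])
    return [name for name in required if name not in tool_input]


update_ticket_status = {
    "name": "update_ticket_status",
    "input_schema": {
        "type": "object",
        "properties": {
            "ticket_id": {"type": "string"},
            "status": {
                "type": "string",
                "enum": ["resolved", "pending", "escalated"],
            },
            "resolution": {"type": "string"},
        },
        "required": ["ticket_id", "status"],
    },
}

print(missing_required_fields(update_ticket_status, {"ticket_id": "TK-001"}))
# ['status']
```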

Simulate a ticket database

TICKET_DATABASE = {
    f"TK-{i:03d}": {
        "id": f"TK-{i:03d}",
        "customer": f"Nguyen Van {chr(65 + i % 26)}",
        "issue": [
            "Product became faulty after 3 days of use",
            "Order not received 7 days after purchase",
            "Refund requested because the product does not match its description",
            "Payment error at checkout",
            "Needs help installing the product"
        ][i % 5],
        "priority": ["high", "medium", "low"][i % 3],
        "status": "open"
    }
    for i in range(25)
}

def handle_tool_call(tool_name, tool_input):
    if tool_name == "get_ticket_details":
        tid = tool_input["ticket_id"]
        return TICKET_DATABASE.get(tid, {"error": "Ticket not found"})

    elif tool_name == "update_ticket_status":
        tid = tool_input["ticket_id"]
        if tid not in TICKET_DATABASE:
            # Report the failure instead of claiming success
            return {"error": "Ticket not found"}
        TICKET_DATABASE[tid]["status"] = tool_input["status"]
        TICKET_DATABASE[tid]["resolution"] = tool_input.get("resolution", "")
        return {"success": True}

    elif tool_name == "send_customer_reply":
        # In production: send a real email
        print(f"[EMAIL] To ticket {tool_input['ticket_id']}: {tool_input['message'][:50]}...")
        return {"success": True, "message_id": f"MSG-{tool_input['ticket_id']}"}

    # Unknown tool: return an error rather than None
    return {"error": f"Unknown tool: {tool_name}"}
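When a handler returns an error payload such as {"error": "Ticket not found"}, the tool_result block sent back to Claude can carry the is_error flag so the model knows the call failed. The helper below is a hypothetical sketch; treating any result with an "error" key as a failure is a convention assumed for this example.

```python
import json


def to_tool_result(tool_use_id: str, result: dict) -> dict:
    """Wrap a handler result in a tool_result block, flagging failures."""
    block = {
        "type": "tool_result",
        "tool_use_id": tool_use_id,
        "content": json.dumps(result),
    }
    # Assumed convention: handlers signal failure with an "error" key.
    if "error" in result:
        block["is_error"] = True
    return block


ok = to_tool_result("toolu_01", {"success": True})
failed = to_tool_result("toolu_02", {"error": "Ticket not found"})
print(ok)
print(failed)
```

The tool_use_id values here are placeholders; in the agent loop they come from block.id on each tool_use block.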

Agent with context compaction

import json

def run_customer_service_agent():
    ticket_ids = [f"TK-{i:03d}" for i in range(25)]
    ticket_list = ", ".join(ticket_ids)

    system_prompt = """You are a Customer Service Agent for a Vietnamese e-commerce company.
    Your task: process all assigned tickets, analyze each issue, send a reply to the customer,
    and update the status. When you are done, report a final summary."""

    messages = [{
        "role": "user",
        "content": f"Please process all {len(ticket_ids)} of the following tickets: {ticket_list}. Handle them one at a time, send an appropriate reply, and update the status."
    }]

    print(f"Starting agent with {len(ticket_ids)} tickets...")
    total_input_tokens = 0
    compaction_count = 0

    while True:
        response = client.messages.create(
            model="claude-opus-4-5",
            max_tokens=4096,
            system=system_prompt,
            tools=CUSTOMER_SERVICE_TOOLS,
            messages=messages,
            extra_headers={
                "anthropic-beta": "context-compaction-2025-06-01"
            }
        )

        # Track token usage
        total_input_tokens += response.usage.input_tokens

        # Detect compaction occurred
        if hasattr(response, 'context_compaction_metadata'):
            compaction_count += 1
            print(f"[Compaction #{compaction_count}] Context compressed successfully")

        if response.stop_reason == "end_turn":
            # Extract final report
            for block in response.content:
                if hasattr(block, 'text'):
                    print("\n=== FINAL REPORT ===")
                    print(block.text)
            break

        elif response.stop_reason == "tool_use":
            messages.append({"role": "assistant", "content": response.content})
            tool_results = []

            for block in response.content:
                if block.type == "tool_use":
                    print(f"[Tool] {block.name}({json.dumps(block.input)[:60]}...)")
                    result = handle_tool_call(block.name, block.input)
                    tool_results.append({
                        "type": "tool_result",
                        "tool_use_id": block.id,
                        "content": json.dumps(result)
                    })

            messages.append({"role": "user", "content": tool_results})
        else:
            print(f"Unexpected stop reason: {response.stop_reason}")
            break

    print(f"\nStats: {total_input_tokens:,} total tokens, {compaction_count} compactions")

run_customer_service_agent()

Sample output: compaction in action

Starting agent with 25 tickets...
[Tool] get_ticket_details({"ticket_id": "TK-000"}...)
[Tool] send_customer_reply({"ticket_id": "TK-000", "message": "Hello...")
[Tool] update_ticket_status({"ticket_id": "TK-000", "status": "resolved"...)
[Tool] get_ticket_details({"ticket_id": "TK-001"}...)
...
[Compaction #1] Context compressed successfully
[Tool] get_ticket_details({"ticket_id": "TK-014"}...)
...
[Compaction #2] Context compressed successfully
[Tool] get_ticket_details({"ticket_id": "TK-022"}...)
...

=== FINAL REPORT ===
All 25 tickets processed:
- 18 tickets resolved (medium and low priority)
- 5 tickets escalated (complex issues that need a technician)
- 2 tickets pending (waiting for a customer response)
...

Stats: 284,521 total tokens, 2 compactions

Compaction strategies: server-side vs client-side

Aspect	Server-side (compaction_control)	Client-side (manual)
Implementation	One parameter, automatic	You write the compaction logic yourself
Summary quality	High: the model writes the summary	Depends on your code
Timing control	Limited (threshold config)	Full control
Cost	Extra tokens for the summary step	Can be cheaper if implemented well
Recommended for	Production, Opus models	Cases that need fine-grained control

Anthropic recommends server-side compaction for claude-opus-4-5 and production workloads because the model writes a higher-quality summary itself and preserves more of the important context.
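For comparison, a client-side strategy might summarize only the older messages and keep the most recent ones verbatim, which is the kind of timing and content control the table refers to. This is a minimal sketch under stated assumptions: `compact_keep_recent` is a hypothetical helper, messages are plain-text dicts, and `summarize` stands in for your own summarization call.

```python
from typing import Callable


def compact_keep_recent(
    messages: list[dict],
    summarize: Callable[[list[dict]], str],
    keep_last: int = 4,
) -> list[dict]:
    """Replace all but the last keep_last messages with one summary message."""
    if keep_last <= 0:
        raise ValueError("keep_last must be positive")
    if len(messages) <= keep_last:
        return list(messages)
    older, recent = messages[:-keep_last], messages[-keep_last:]
    summary_message = {
        "role": "user",
        "content": "Summary of earlier conversation:\n" + summarize(older),
    }
    return [summary_message] + recent


def fake_summarize(older: list[dict]) -> str:
    """Stand-in for a real model call, so the sketch runs offline."""
    return f"{len(older)} earlier messages about ticket handling"


messages = [
    {"role": "user" if i % 2 == 0 else "assistant", "content": f"message {i}"}
    for i in range(10)
]
compacted = compact_keep_recent(messages, fake_summarize, keep_last=4)
print(len(messages), "->", len(compacted), "messages")
```

With real tool-use histories, the cut point also has to avoid separating a tool_use block from its tool_result, which this plain-text sketch does not handle.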

When to use compaction, and when not to

Use it when:

The agent must process many items in one session (20+ tickets, documents, records)
Debugging sessions run for many hours
A research agent has to read and synthesize many documents
Any workflow where the context window is the bottleneck

You don't need it when:

Queries are short and single-turn
Conversations have fewer than 10-15 exchanges
You need access to the exact text of the early conversation

Summary

Context compaction with the compaction_control parameter is a production-ready solution for long-running agents:

No complex code to write: one parameter turns it on
Summary quality is high because the model writes it
Especially effective with claude-opus-4-5 for complex reasoning tasks
Lets agents run long workflows without crashing when the context fills up

Next step: look into Programmatic Tool Calling to cut latency further. Instead of making round-trips through the model, Claude writes code that calls the tools directly.
