Intermediate | Tutorial | Claude API | Source: Anthropic

Building an LLM Agent from Scratch: Reference Implementation

Minh Tuấn, CTO, Transform Group

26/03/2026 · 6 min read

You have probably heard a lot about AI agents: AI systems that autonomously carry out complex tasks, call tools, and make decisions. At its core, though, an agent is just a simple loop. This article builds an agent from scratch, without LangChain, AutoGen, or any other framework, so that you can see what is actually going on.

This is the official reference implementation from Anthropic, adapted for the Vietnamese community.

What is an agent? Why not use a framework?

An LLM agent essentially consists of three things:

- Agent loop: receive input → call the LLM → execute a tool → repeat
- Tools: functions the agent can call (web search, file reading, calculation, ...)
- Memory: the conversation history that preserves context

Frameworks hide these parts, which makes it hard to debug when the agent goes wrong. Building from scratch lets you:

- Understand exactly what the agent does at each step
- Customize behavior without being constrained by an abstraction
- Debug easily when the agent makes a mistake
- Migrate to a new Claude model without waiting for a framework update

Architecture overview

The agent loop works as shown in the following diagram:

User Input
    |
    v
[LLM decides: respond or use tool?]
    |
    +---> Respond to user (DONE)
    |
    +---> Call tool
              |
              v
         [Execute tool]
              |
              v
         [Add result to memory]
              |
              v
         [Back to LLM]

The loop continues until the LLM decides to give a final answer instead of calling another tool.
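The control flow in the diagram can be tried out without any API call. The following is a minimal sketch under stated assumptions: `scripted_llm` is a hypothetical stand-in for the model that requests one tool call and then answers, and the dict shapes (`"tool_call"`, `"answer"`, the `"tool"` role) are invented for illustration and are not the Claude API format.

```python
def scripted_llm(messages):
    """Hypothetical stand-in for the model: call one tool, then answer."""
    tool_outputs = [m["content"] for m in messages if m["role"] == "tool"]
    if not tool_outputs:
        return {"type": "tool_call", "name": "add", "args": {"a": 28, "b": 15}}
    return {"type": "answer", "text": f"The result is {tool_outputs[-1]}."}


def add(a, b):
    return a + b


TOOLS = {"add": add}


def run_loop(user_input, llm, max_iterations=5):
    # Memory starts with the user input
    memory = [{"role": "user", "content": user_input}]
    for _ in range(max_iterations):
        decision = llm(memory)              # LLM decides: respond or use tool?
        if decision["type"] == "answer":    # Respond to user (DONE)
            return decision["text"], memory
        result = TOOLS[decision["name"]](**decision["args"])     # Execute tool
        memory.append({"role": "tool", "content": str(result)})  # Add result to memory
    return "Max iterations reached", memory


answer, memory = run_loop("What is 28 + 15?", scripted_llm)
print(answer)  # The result is 43.
```

Each branch of the diagram maps to one line of `run_loop`; the real implementation differs mainly in how the model's decision is encoded.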

Step 1: Define the tools

A tool is a Python function paired with a JSON schema that gives its name, a description, and the shape of its input. Claude reads the schema to learn when and how to call the tool.

import json

import anthropic

# Tool implementations
def get_weather(city: str) -> str:
    # Stub for the demo; a real tool would call a weather API here
    return f"Weather in {city}: 28°C, cloudy"

def calculate(expression: str) -> str:
    # WARNING: eval() executes arbitrary Python code. It is acceptable only
    # for a local demo; use a restricted parser before handling untrusted input.
    try:
        result = eval(expression)
        return str(result)
    except Exception as e:
        return f"Error: {e}"

# Schemas that Claude reads
tools = [
    {
        "name": "get_weather",
        "description": "Get weather information for a city",
        "input_schema": {
            "type": "object",
            "properties": {
                "city": {
                    "type": "string",
                    "description": "Name of the city to get the weather for"
                }
            },
            "required": ["city"]
        }
    },
    {
        "name": "calculate",
        "description": "Evaluate a mathematical expression",
        "input_schema": {
            "type": "object",
            "properties": {
                "expression": {
                    "type": "string",
                    "description": "Expression to evaluate, for example: 2 + 3 * 4"
                }
            },
            "required": ["expression"]
        }
    }
]

# Map each tool name to the function that implements it
tool_map = {
    "get_weather": get_weather,
    "calculate": calculate
}
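Because the `calculate` tool evaluates text chosen by the model, a restricted evaluator is safer than `eval`. The sketch below is one possible replacement, not part of the reference implementation: `safe_calculate` and its helpers are hypothetical names, and it uses Python's standard `ast` module to accept only numbers and the four basic arithmetic operators.

```python
import ast
import operator

# Whitelisted operators; every other syntax node is rejected
_BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}
_UNARY_OPS = {ast.UAdd: operator.pos, ast.USub: operator.neg}


def _evaluate(node):
    """Recursively evaluate a parsed expression, allowing arithmetic only."""
    if isinstance(node, ast.Expression):
        return _evaluate(node.body)
    if isinstance(node, ast.Constant) and type(node.value) in (int, float):
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
        return _BINARY_OPS[type(node.op)](_evaluate(node.left), _evaluate(node.right))
    if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
        return _UNARY_OPS[type(node.op)](_evaluate(node.operand))
    raise ValueError("unsupported expression")


def safe_calculate(expression: str) -> str:
    """Evaluate an arithmetic expression without executing arbitrary code."""
    try:
        tree = ast.parse(expression, mode="eval")
        return str(_evaluate(tree))
    except (SyntaxError, ValueError, ZeroDivisionError, OverflowError, RecursionError) as e:
        return f"Error: {e}"


print(safe_calculate("2 + 3 * 4"))         # 14
print(safe_calculate("__import__('os')"))  # Error: unsupported expression
```

Exponentiation is left out on purpose, since an expression such as `9 ** 9 ** 9` could stall the process. To use the sketch, `tool_map["calculate"]` would point at `safe_calculate` instead.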

Step 2: The agent loop

This is the heart of every agent. The logic is simple but powerful:

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

def run_agent(user_message: str, max_iterations: int = 10) -> str:
    """
    Main agent loop.
    - max_iterations: guards against an infinite loop
    """
    # Memory: the conversation history
    messages = [
        {"role": "user", "content": user_message}
    ]

    print(f"User: {user_message}")
    print("-" * 50)

    for iteration in range(max_iterations):
        # Call the LLM
        response = client.messages.create(
            model="claude-opus-4-5",
            max_tokens=4096,
            tools=tools,
            messages=messages
        )

        # Add the response to memory
        messages.append({
            "role": "assistant",
            "content": response.content
        })

        # Check the stop reason
        if response.stop_reason == "end_turn":
            # Claude has finished answering; no tool call is needed.
            # Join every text block so a multi-block answer is not truncated.
            final_text = "".join(
                block.text for block in response.content
                if block.type == "text"
            )
            print(f"Agent: {final_text}")
            return final_text

        elif response.stop_reason == "tool_use":
            # Claude wants to call one or more tools
            tool_results = []

            for block in response.content:
                if block.type == "tool_use":
                    tool_name = block.name
                    tool_input = block.input
                    tool_use_id = block.id

                    print(f"[Tool call] {tool_name}({tool_input})")

                    # Execute the tool
                    if tool_name in tool_map:
                        result = tool_map[tool_name](**tool_input)
                    else:
                        result = f"Unknown tool: {tool_name}"

                    print(f"[Tool result] {result}")

                    tool_results.append({
                        "type": "tool_result",
                        "tool_use_id": tool_use_id,
                        "content": result
                    })

            # Add the tool results to memory as a single user message
            messages.append({
                "role": "user",
                "content": tool_results
            })

        else:
            # For example "max_tokens": stop instead of looping on an incomplete turn
            print(f"Unexpected stop reason: {response.stop_reason}")
            return f"Stopped early: {response.stop_reason}"

    return "Max iterations reached"
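The loop reads only `stop_reason`, `content`, and a few attributes on each content block, so it can be exercised offline with a scripted stand-in for the client. The sketch below is an assumption-laden test helper, not part of the Anthropic SDK: `FakeClient`, `FakeMessages`, `text_response`, and `tool_response` are hypothetical names, and the responses are plain `SimpleNamespace` objects that mimic only the attributes the loop uses.

```python
from types import SimpleNamespace


def text_response(text):
    """Fake final answer: one text block, stop_reason 'end_turn'."""
    block = SimpleNamespace(type="text", text=text)
    return SimpleNamespace(stop_reason="end_turn", content=[block])


def tool_response(tool_id, name, tool_input):
    """Fake tool request: one tool_use block, stop_reason 'tool_use'."""
    block = SimpleNamespace(type="tool_use", id=tool_id, name=name, input=tool_input)
    return SimpleNamespace(stop_reason="tool_use", content=[block])


class FakeMessages:
    def __init__(self, scripted):
        self._scripted = list(scripted)
        self.calls = []  # keyword arguments of every create() call, for assertions

    def create(self, **kwargs):
        self.calls.append(kwargs)
        return self._scripted.pop(0)


class FakeClient:
    """Exposes client.messages.create(...) like the object the loop expects."""
    def __init__(self, scripted):
        self.messages = FakeMessages(scripted)


fake_client = FakeClient([
    tool_response("toolu_demo", "calculate", {"expression": "28 + 15"}),
    text_response("The result is 43."),
])
first = fake_client.messages.create(model="demo", max_tokens=16, messages=[])
second = fake_client.messages.create(model="demo", max_tokens=16, messages=[])
print(first.stop_reason, second.stop_reason)  # tool_use end_turn
```

Assigning such an object to `client` before calling `run_agent` would drive the loop through one tool call and one final answer without network access or an API key.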

Step 3: Try it out

Test the agent with questions that need one tool, several steps, or no tool at all:

# Test 1: Single tool
result = run_agent("What is the weather in Ha Noi today?")

# Test 2: Multi-step reasoning
result = run_agent(
    "If the temperature in Ha Noi today is 28 degrees C, "
    "what do you get when you add 15 degrees?"
)

# Test 3: No tool needed
result = run_agent("What is Claude?")

The output for Test 2 will look something like this (the exact tool calls and wording vary between runs):

User: If the temperature in Ha Noi today is 28 degrees C, what do you get when you add 15 degrees?
--------------------------------------------------
[Tool call] get_weather({'city': 'Ha Noi'})
[Tool result] Weather in Ha Noi: 28°C, cloudy
[Tool call] calculate({'expression': '28 + 15'})
[Tool result] 43
Agent: The temperature in Ha Noi today is 28°C.
       Adding 15 degrees gives 43°C.

Step 4: Conversation memory

Memory in an agent is simply the list of messages. The class below keeps that list between calls, so the agent can hold a multi-turn conversation:

class AgentWithMemory:
    def __init__(self, system_prompt: str = "", max_iterations: int = 10):
        self.client = anthropic.Anthropic()
        self.messages = []
        self.system_prompt = system_prompt
        self.max_iterations = max_iterations  # guards against an endless tool loop

    def _call_llm(self):
        """Send the full conversation history to Claude."""
        kwargs = {}
        if self.system_prompt:
            # Pass a system prompt only when one was provided
            kwargs["system"] = self.system_prompt
        return self.client.messages.create(
            model="claude-opus-4-5",
            max_tokens=4096,
            tools=tools,
            messages=self.messages,
            **kwargs
        )

    def chat(self, user_message: str) -> str:
        """Multi-turn: keep the conversation history between calls."""
        self.messages.append({
            "role": "user",
            "content": user_message
        })

        for _ in range(self.max_iterations):
            response = self._call_llm()

            # Store the full content blocks so the history stays valid
            self.messages.append({
                "role": "assistant",
                "content": response.content
            })

            if response.stop_reason != "tool_use":
                # Final answer: join all text blocks
                return "".join(
                    block.text for block in response.content
                    if block.type == "text"
                )

            # Handle the tool calls
            tool_results = []
            for block in response.content:
                if block.type == "tool_use":
                    tool = tool_map.get(block.name, lambda **x: "Unknown tool")
                    tool_results.append({
                        "type": "tool_result",
                        "tool_use_id": block.id,
                        "content": tool(**block.input)
                    })

            self.messages.append({
                "role": "user",
                "content": tool_results
            })

        return "Max iterations reached"

    def clear_memory(self):
        """Reset the conversation."""
        self.messages = []
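Another simple way to manage this list is a sliding window that drops the oldest messages. The sketch below is one possible approach, with hypothetical helper names (`is_plain_user_turn`, `trim_history`) and an arbitrary default limit. It assumes messages are dicts shaped like those in `self.messages`, and it cuts only at a plain-text user message so that a tool result is never kept without the tool call that produced it.

```python
def is_plain_user_turn(message: dict) -> bool:
    """True for a user message whose content is plain text, not tool results."""
    return message["role"] == "user" and isinstance(message["content"], str)


def trim_history(messages: list, max_messages: int = 20) -> list:
    """Keep at most max_messages recent messages, cutting only at a plain user turn."""
    if len(messages) <= max_messages:
        return messages
    start = len(messages) - max_messages
    # Walk forward to the next plain user message so no tool_result is orphaned
    while start < len(messages) and not is_plain_user_turn(messages[start]):
        start += 1
    if start == len(messages):
        return messages  # no safe cut point; leave the history untouched
    return messages[start:]


history = [
    {"role": "user", "content": "What is the weather in Ha Noi?"},
    {"role": "assistant", "content": [
        {"type": "tool_use", "id": "t1", "name": "get_weather", "input": {"city": "Ha Noi"}}
    ]},
    {"role": "user", "content": [
        {"type": "tool_result", "tool_use_id": "t1", "content": "28 C"}
    ]},
    {"role": "assistant", "content": "It is 28 C."},
    {"role": "user", "content": "Thanks. What is 28 + 15?"},
    {"role": "assistant", "content": "43"},
]
trimmed = trim_history(history, max_messages=4)
print(len(trimmed))  # 2
```

The window may end up shorter than `max_messages`, as in this example, because the cut moves forward to the nearest safe boundary. Unlike summarization, trimming discards the older context entirely.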

Step 5: Memory compression for long conversations

Problem: as the conversation grows, the context window fills up and cost increases. Solution: compress the memory by summarizing older messages and keeping only the most recent ones verbatim.

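def compress_memory(messages: list, keep_last_n: int = 6) -> list:
    """
    Keep the n most recent messages verbatim and summarize the rest.
    """
    if len(messages) <= keep_last_n:
        return messages

    # Move the cut back until the kept tail starts with a plain user message,
    # so a tool_result is never separated from its tool_use
    cut = len(messages) - keep_last_n
    while cut > 0 and not (
        messages[cut]["role"] == "user"
        and isinstance(messages[cut]["content"], str)
    ):
        cut -= 1
    if cut == 0:
        return messages  # nothing older than the tail to summarize

    # Part to summarize
    old_messages = messages[:cut]
    recent_messages = messages[cut:]

    # Create the summary; default= converts SDK content blocks to plain data
    summary_response = client.messages.create(
        model="claude-haiku-4-5",
        max_tokens=500,
        messages=[
            {"role": "user", "content": (
                "Briefly summarize the key points of the following conversation: "
                + json.dumps(
                    old_messages,
                    ensure_ascii=False,
                    default=lambda o: o.model_dump() if hasattr(o, "model_dump") else str(o),
                )
            )}
        ]
    )
    summary = summary_response.content[0].text

    # Put the summary in front of the recent messages
    compressed = [
        {"role": "user", "content": f"[Summary of earlier conversation]: {summary}"},
        {"role": "assistant", "content": "Understood. I will continue the conversation."},
        *recent_messages
    ]

    return compressed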
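Summarizing costs an extra model call, so it helps to compress only once the history is actually large. The sketch below shows one way to gate that decision. It is an illustration under assumptions: `estimate_size` and `maybe_compress` are hypothetical helpers, the character count is only a rough proxy for tokens, and the 40,000-character default is an arbitrary placeholder rather than a recommended value.

```python
import json


def estimate_size(messages: list) -> int:
    """Rough size of the history in characters (a proxy, not a token count)."""
    return len(json.dumps(messages, ensure_ascii=False, default=str))


def maybe_compress(messages: list, compress_fn, max_chars: int = 40_000) -> list:
    """Call compress_fn only when the history exceeds the character budget."""
    if estimate_size(messages) <= max_chars:
        return messages
    return compress_fn(messages)


def fake_compress(messages):
    """Stand-in for a real summarizer, used here so the example runs offline."""
    return [{"role": "user", "content": "[summary]"}] + messages[-2:]


short_history = [{"role": "user", "content": "hi"}]
long_history = [{"role": "user", "content": "x" * 100} for _ in range(10)]

print(len(maybe_compress(short_history, fake_compress, max_chars=500)))  # 1
print(len(maybe_compress(long_history, fake_compress, max_chars=500)))   # 3
```

In the agent, `compress_fn` would be the `compress_memory` function, called before each request.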

Error handling for a production-ready agent

A real agent needs to handle errors gracefully, so that a failing tool returns an error message to the model instead of crashing the loop:

def safe_tool_execute(tool_name: str, tool_input: dict) -> str:
    """Run a tool and return any failure as an error string."""
    try:
        if tool_name not in tool_map:
            return f"Error: tool '{tool_name}' does not exist"

        result = tool_map[tool_name](**tool_input)
        return str(result)

    except TypeError as e:
        # Usually wrong or missing arguments supplied by the model
        return f"Error: invalid input: {e}"
    except Exception as e:
        return f"Error: {type(e).__name__}: {e}"
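Two refinements fit naturally here: checking the model's input against the schema's `required` list before calling the tool, and marking failures with the `is_error` field that the Messages API accepts on `tool_result` blocks. The sketch below illustrates both. `missing_required` and `build_tool_result` are hypothetical helpers, and the check covers only missing fields, not types.

```python
def missing_required(schema: dict, tool_input: dict) -> list:
    """Names listed under 'required' in the schema that the input lacks."""
    return [name for name in schema.get("required", []) if name not in tool_input]


def build_tool_result(tool_use_id: str, schema: dict, tool_input: dict, run) -> dict:
    """Run the tool and wrap the outcome in a tool_result block."""
    missing = missing_required(schema, tool_input)
    if missing:
        content = f"Error: missing required input(s): {', '.join(missing)}"
        is_error = True
    else:
        try:
            content, is_error = str(run(**tool_input)), False
        except Exception as e:
            content, is_error = f"Error: {type(e).__name__}: {e}", True
    return {
        "type": "tool_result",
        "tool_use_id": tool_use_id,
        "content": content,
        "is_error": is_error,  # tells the model that the call failed
    }


weather_schema = {
    "type": "object",
    "properties": {"city": {"type": "string"}},
    "required": ["city"],
}
ok = build_tool_result(
    "toolu_demo_1", weather_schema, {"city": "Ha Noi"},
    lambda city: f"Weather in {city}: 28 C",
)
bad = build_tool_result("toolu_demo_2", weather_schema, {}, lambda city: "unused")
print(ok["is_error"], bad["is_error"])  # False True
```

Returning the error as a tool result lets the model read the message and retry with corrected input on its next turn.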

Summary: anatomy of an agent

Component	Role	Example
Agent Loop	Orchestrates the whole flow	for loop until end_turn, capped by max_iterations
Tools	Ability to interact with the outside world	get_weather, calculate, search
Memory	Keeps context across turns	Appending to the messages list
Error Handling	Robustness in production	try/except + fallback
Memory Compression	Manages the context window	Summarizing old messages

This pattern is the foundation of tool-using AI agents. Once you understand these five components, you can apply them to many kinds of agent: a coding agent, a research agent, a customer support agent, or even a multi-agent system.

Next steps: see Autonomous Coding Agent for this pattern applied to a real problem, and Computer Use Demo for an agent that controls the entire computer screen.
