What Is a Subagent? A Mental Model from Context to Architecture (Introduction to Subagents)

The previous three lessons taught you the how: creating (16.1), designing (16.2), and deciding whether to use a subagent at all (16.3). You may already have created code-quality-reviewer, applied the principle of least privilege, or declined to build a sequential pipeline.

What you will learn

Explain context window isolation, the core mechanism that makes a subagent a subagent
Describe the 2 inputs a subagent receives: a custom system prompt + a task description from the parent
Name the 3 built-in subagents of Claude Code (General purpose, Explore, Plan) and the use case for each
Understand that, to Claude, a subagent is a special kind of tool, with the tool-calling framework as the communication protocol
Place subagents in the larger picture: multi-agent architecture and the future of agentic work

What is a subagent: the technical definition

A subagent is a specialized assistant that Claude Code can delegate a task to.

Key phrases:

specialized assistant: it has a clear role and is not a general assistant
Claude Code can delegate: the main agent coordinates, the subagent executes
task: a self-contained unit of work (not a casual chat message)

In essence, each subagent runs in its own separate context window, does its work, and returns a summary to the main thread. All intermediate work (file reads, searches, tool calls) stays isolated and does not clutter the main conversation.

Anatomy: 2 inputs + 1 output

Three key points about this architecture, which the diagram that follows illustrates:

A subagent receives 2 inputs and returns 1 output. Input = system prompt (static, from config) + task description (dynamic, from the parent). Output = summary text.
Everything between input and output is "burned". The subagent's entire conversation is discarded. The parent does not see the journey.
The parent context stays lean. Instead of 28 tool calls plus file contents, the parent sees only one summary of roughly 100-200 tokens.

┌────────────────────────────────────────────────────────────────┐
│                                                                │
│   PARENT (MAIN) THREAD                                         │
│                                                                │
│   User: "Explain how authentication works in this codebase"    │
│                                                                │
│   Main agent decides: use the research subagent                │
│                                                                │
│   Main agent writes task description for subagent:             │
│   ┌──────────────────────────────────────────────────────┐     │
│   │ "Investigate authentication flow in this codebase.   │     │
│   │  Find: where JWT is validated, token lifecycle,      │     │
│   │  and test setup. Start from src/auth/ directory."    │     │
│   └──────────────────────────────────────────────────────┘     │
│                                                                │
└────────────────────────────┬───────────────────────────────────┘
                             │
                   ┌─────────┴──────────┐
                   │                    │
           INPUT 1 │           INPUT 2  │
           System  │           Task     │
           prompt  │        description │
       from config │        from parent │
                   │                    │
                   ▼                    ▼
┌────────────────────────────────────────────────────────────────┐
│                                                                │
│   SUBAGENT CONTEXT WINDOW (fresh, isolated)                    │
│                                                                │
│   system prompt:                                               │
│   "You are a research agent. Read-only. Return structured...   │
│                                                                │
│   task: "Investigate authentication flow..."                   │
│                                                                │
│   [Subagent works autonomously]                                │
│   - Glob src/auth/**/*.ts (finds 12 files)                     │
│   - Read middleware/auth.ts                                    │
│   - Grep for 'jwt.verify'                                      │
│   - Read tests/helpers/auth.ts                                 │
│   - Read services/tokenService.ts                              │
│   - ... (28 tool calls total)                                  │
│                                                                │
│   OUTPUT: structured summary                                   │
│   ┌──────────────────────────────────────────────────────┐     │
│   │ Direct Answer: JWT validation in middleware/auth.ts  │     │
│   │ Evidence: line 42 — jwt.verify(token, PUBLIC_KEY...) │     │
│   │ Related: services/tokenService.ts handles refresh    │     │
│   │ Test setup: tests/helpers/auth.ts mocks JWT          │     │
│   │ Obstacles: types file has legacy "any" for claims    │     │
│   └──────────────────────────────────────────────────────┘     │
│                                                                │
│   [Entire context window then DISCARDED]                       │
│                                                                │
└────────────────────────────┬───────────────────────────────────┘
                             │
                             ▼
┌────────────────────────────────────────────────────────────────┐
│                                                                │
│   BACK TO PARENT (MAIN) THREAD                                 │
│                                                                │
│   Main agent receives: only the structured summary             │
│   Main thread context has NOT seen 28 tool calls               │
│                                                                │
│   Main agent uses summary to answer user:                      │
│   "JWT validation happens in middleware/auth.ts at line 42..." │
│                                                                │
└────────────────────────────────────────────────────────────────┘
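The flow in the diagram can be sketched in a few lines of Python. This is only a toy model of the idea, not how Claude Code is implemented: `SubagentRun`, `delegate`, and the hard-coded steps are hypothetical names invented for illustration. What it shows is the shape of the contract: two inputs go in, a private transcript accumulates, and only the summary reaches the parent.

```python
from dataclasses import dataclass, field


@dataclass
class SubagentRun:
    """Toy model of one isolated run: two inputs in, one summary out."""
    system_prompt: str       # input 1: static, from the subagent's config
    task_description: str    # input 2: dynamic, written by the parent
    transcript: list[str] = field(default_factory=list)  # private working context

    def work(self) -> str:
        # Stand-in for the real tool calls (Glob, Read, Grep, ...).
        for step in ("Glob src/auth/**/*.ts",
                     "Read middleware/auth.ts",
                     "Grep jwt.verify"):
            self.transcript.append(step)
        return "Direct Answer: JWT validation in middleware/auth.ts"


def delegate(parent_context: list[str], system_prompt: str, task: str) -> None:
    """Run a subagent and add only its summary to the parent context."""
    run = SubagentRun(system_prompt, task)
    summary = run.work()
    parent_context.append(summary)  # only the summary crosses the boundary
    # `run` goes out of scope here, so its transcript is discarded.


parent_context = ["User: Explain how authentication works in this codebase"]
delegate(parent_context,
         "You are a research agent. Read-only.",
         "Investigate authentication flow in this codebase.")
print(parent_context)
```

After the call, `parent_context` holds the user message and the summary, and none of the intermediate steps.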

Why context window isolation matters

The problem: the context window is a finite resource

Every time you chat with Claude Code, you are adding to the main context window. Every tool call, file read, and search result is stored there.

The context window is not unlimited. Each Claude model has a specific context limit (the number varies by version, so check the current documentation for the exact figure). When the context fills up, Claude starts to lose track of the early part of the conversation, much like you forget old emails when your inbox is full.

The concrete problem: exploratory work eats context

Suppose you ask: "Which service handles refunds?"

Without a subagent, the main agent does the exploration itself: every file read and search result (15 files, roughly 23,000 tokens in this example) is stored in the main context.

With a subagent:

Main context BEFORE task:                      ~15,000 tokens
[Same as without a subagent]

Main context AFTER task:                       ~15,200 tokens
┌──────────────────────────────────────────────┐
│                                              │
│  Previous context...                         │
│                                              │
│  User: "Which service handles refunds?"      │
│  [Subagent call + summary]                   │
│  Assistant: "RefundService handles it...     │
│  ...located in src/billing/RefundService.ts" │
│                                              │
└──────────────────────────────────────────────┘

→ You burned ~200 tokens in main context.
   The 23,000 tokens of exploration stayed in the subagent
   and got discarded.

Savings: ~99% of main context for the same answer.

This does not mean subagents are free: total token usage for the whole task is roughly the same (both cases have to read the 15 files). But the main context, the one you rely on for a continuous conversation, is protected.

Tradeoff: lost visibility

Subagents are not free in terms of visibility either. In the subagent version, the main thread never sees the 15 files that were read. If you want a deep understanding of the codebase structure (not just the answer), you have lost the context those 15 files would have given you.

Rule (review from Lesson 16.3): use a subagent when you need the result, not the journey.
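The ~99% figure is simple arithmetic on the numbers in the refund example. A minimal sketch, assuming that without a subagent the full 23,000 tokens of exploration would land in the main context (the function name `main_context_savings` is made up for illustration):

```python
def main_context_savings(exploration_tokens: int, summary_tokens: int) -> float:
    """Fraction of main-context growth avoided by delegating the exploration."""
    return 1 - summary_tokens / exploration_tokens


before = 15_000        # main context before the task
exploration = 23_000   # file reads and searches done for the task
summary = 200          # what the parent actually receives from the subagent

# Assumption: without a subagent, all exploration is stored in main context.
without_subagent = before + exploration
with_subagent = before + summary

print(without_subagent, with_subagent)
print(f"{main_context_savings(exploration, summary):.1%}")
```

The total tokens spent on the task are about the same in both cases; only the amount that stays in the main context changes.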

Claude Code's built-in subagents

You do not have to create every subagent from scratch. Claude Code ships with 3 built-in subagents, ready to use:

1. General purpose subagent

Use case: multi-step tasks that need both exploration and action.

Examples:

"Refactor all uses of the legacy fetch helper to the new http.request helper across the codebase"
"Find all places where we log user PII and fix them to use the redactor"
"Set up a new Prisma model for Invoice including migration and seed data"

When to choose it: the task does not fit neatly into Explore (read-only) or Plan (research that ends in a plan) and needs a mix of reading, searching, and editing.

2. Explore subagent

Use case: fast search and navigation of the codebase.

Examples:

"How many places use the calculateTax function?"
"What's the import structure around the auth module?"
"Find all TODO comments in the billing directory"

When to choose it: you need information quickly and do not need deep analysis. Explore is optimized for speed: it does not try to "explain" the code, it only returns locations and a short summary.

3. Plan subagent

Use case: researching and analyzing the codebase before presenting an implementation plan.

Examples:

"I want to add feature flags. Research current state and propose a plan."
"We need to migrate from SQLite to Postgres. Analyze blast radius and steps."
"Propose a refactor of the checkout module to reduce coupling."

When to choose it: in Claude Code's plan mode. When you want Claude to think before it codes, the Plan subagent does the heavy lifting of the research phase.

Comparing the 3 built-ins

Built-in	Reads code?	Edits code?	Speed	Output style
General purpose	✅	✅	Medium	Mixed: findings + actions taken
Explore	✅	❌	⚡ Fastest	Fact-forward: file:line, counts, lists
Plan	✅	❌	Slowest	Structured plan with steps, risks, effort

Custom subagents: when the built-ins are not enough

The built-ins cover most common use cases. You need a custom subagent when you have any of the following additional requirements:

A custom system prompt that the built-ins do not have (copywriting voice, legal clause criteria, security audit checklist)
Encoded team conventions (coding standards, review rubrics) so that every team member applies them consistently
A specific workflow that repeats often enough to justify the investment of creating a config

This is when you create a custom subagent (Lesson 16.1) and design it well (Lesson 16.2).
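As a rough illustration of what such a config looks like, the sketch below renders a subagent definition as a Markdown file with YAML frontmatter, which is the general shape Claude Code uses for files under `.claude/agents/`. The field names (`name`, `description`, `tools`) reflect my understanding of the Claude Code sub-agents documentation and should be checked against the current docs; the `security-auditor` agent and the `render_subagent_file` helper are hypothetical.

```python
def render_subagent_file(name: str, description: str,
                         tools: list[str], system_prompt: str) -> str:
    """Render a subagent definition: YAML frontmatter + system prompt body.

    Assumption: values contain no YAML-special characters (such as ':'),
    so no quoting is attempted here.
    """
    frontmatter = [
        "---",
        f"name: {name}",
        f"description: {description}",
        f"tools: {', '.join(tools)}",   # least privilege: list only what is needed
        "---",
    ]
    return "\n".join(frontmatter) + "\n\n" + system_prompt.strip() + "\n"


# Hypothetical example: a read-only security audit subagent.
text = render_subagent_file(
    name="security-auditor",
    description="Audits changed files against the team security checklist",
    tools=["Read", "Grep", "Glob"],
    system_prompt="""
You are a security auditor. Read-only.
Check each file against the team checklist and return a structured summary.
""",
)
print(text)
```

The body after the frontmatter is the custom system prompt, which is the first of the subagent's two inputs; the second input, the task description, is written by the parent at call time.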

Subagents as a protocol: a subagent is a tool

A deeper view: to Claude, a subagent looks like a tool.

More precisely, Claude's tool-calling framework is the protocol the main agent uses to communicate with a subagent. The main agent "calls" a subagent the way it calls any other tool: pass input, receive output.

According to the Anthropic research team (see the podcast "Building the future of agents with Claude", 2025-10-02), one training focus is making Claude a good manager of its subagents: writing clear task descriptions, providing enough context, and expecting the right output. An interesting observation: Claude tends to make exactly the mistakes first-time managers make, such as giving insufficient instructions, assuming the subagent will "just understand" the context, and being surprised when the output is not what it expected.

Training has improved this over time: Claude is learning to write more verbose instructions and to share broader context with its subagents.

Practical consequences

Because a subagent is a tool from Claude's point of view, the following become possible:

Parallel subagent calls: like calling several tools at once. The main agent can spawn 5 research subagents in parallel and aggregate afterwards.
Subagents nested inside subagents (with limits): in theory, one subagent can call another. In practice Anthropic restricts the depth to avoid runaway nesting.
Tool bucketing: customers with 100-200 MCP tools often split them into groups and assign each group to 1 subagent. The main agent only needs to know "use subagent A for the finance tool bucket, subagent B for the legal tool bucket".

┌────────────────────────────────────────────────────────────────┐
│                                                                │
│   Available tools (from the main agent's point of view):       │
│                                                                │
│     ┌──── Traditional tools ───────────┐                       │
│     │  Bash(command) → output          │                       │
│     │  Read(path) → content            │                       │
│     │  Edit(path, old, new) → success  │                       │
│     │  Grep(pattern) → matches         │                       │
│     └──────────────────────────────────┘                       │
│                                                                │
│     ┌──── Subagent-as-tool ────────────┐                       │
│     │  Agent(code-quality-reviewer,    │                       │
│     │        task_description)         │                       │
│     │     → structured summary         │                       │
│     │                                  │                       │
│     │  Agent(auth-researcher,          │                       │
│     │        task_description)         │                       │
│     │     → findings                   │                       │
│     └──────────────────────────────────┘                       │
│                                                                │
└────────────────────────────────────────────────────────────────┘
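The "subagent is a tool" view can be made concrete with a toy registry in which ordinary tools and subagents sit side by side behind the same call interface. This is a sketch of the idea only: the registry layout, the `Agent:` key prefix, and every function name here are assumptions for illustration, not Claude Code's actual tool schema.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def read_tool(path: str) -> str:
    """Stand-in for a traditional tool: input in, output out."""
    return f"<content of {path}>"


def make_subagent_tool(agent_name: str) -> Callable[[str], str]:
    """Wrap a (hypothetical) subagent so it is called like any other tool."""
    def call(task_description: str) -> str:
        # A real subagent would run in its own context and return a summary.
        return f"[{agent_name}] summary for: {task_description}"
    return call


# From the main agent's point of view, both kinds of entry look the same.
TOOLS: dict[str, Callable[[str], str]] = {
    "Read": read_tool,
    "Agent:code-quality-reviewer": make_subagent_tool("code-quality-reviewer"),
    "Agent:auth-researcher": make_subagent_tool("auth-researcher"),
}


def call_tool(name: str, argument: str) -> str:
    return TOOLS[name](argument)


def call_parallel(calls: list[tuple[str, str]]) -> list[str]:
    """Issue several tool or subagent calls at once; results keep input order."""
    with ThreadPoolExecutor() as pool:
        return list(pool.map(lambda c: call_tool(*c), calls))


results = call_parallel([
    ("Agent:auth-researcher", "Find where JWT is validated"),
    ("Agent:code-quality-reviewer", "Review src/auth/"),
])
print(results)
```

Because the call shape is identical, parallel subagent calls fall out of the same mechanism as parallel tool calls.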

Multi-agent: where subagents belong

Subagents do not stand alone. They are the building blocks of multi-agent architecture, a larger pattern that is shaping agentic systems.

Workflow vs multi-agent

According to Anthropic's definitions:

Workflow of agents:

Agent 1 finishes → Agent 2 starts → Agent 3 starts
(sequential, state handoff)

Multi-agent:

      ┌─── Subagent A ───┐
      │                  │
Main ─┼─── Subagent B ───┼─── Main collects results
      │                  │
      └─── Subagent C ───┘
      (parallel, orchestrator-worker)

Multi-agent is the model that Anthropic's Deep Research uses: the main orchestrator decides → spawns N subagents in parallel to search → aggregates the results.

3 common multi-agent patterns

1. Parallelization:

Task: audit 100 contracts
    ↓
Spawn 100 subagents (one contract each)
    ↓
Wait for all
    ↓
Aggregate findings

Result: total time ≈ the time of the slowest subagent (rather than the sum of all of them).

2. MapReduce:

Map phase: spawn N subagents, each processing 1 chunk
     ↓
Reduce phase: main agent (or 1 aggregator subagent)
               consolidates outputs

Use case: processing a large document, or analyzing a large dataset split into chunks.

3. Test-time compute:

Spawn N subagents on the same task, with different approaches
     ↓
Main agent compares N outputs
     ↓
Pick best (or merge)

Use case: hard reasoning tasks. Several agents thinking independently → a better final answer (much as "a group of people thinking about the same problem often decides better than one person").
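The three patterns share one skeleton: fan out, wait, combine. The sketch below is a simplified model under stated assumptions: subagents are replaced by plain Python functions, and `fan_out`, `map_reduce`, `best_of_n`, and `wall_clock` are hypothetical helper names. Timing is modeled with arithmetic rather than measured, which ignores spawn and aggregation overhead.

```python
from concurrent.futures import ThreadPoolExecutor


def fan_out(tasks, worker):
    """Parallelization: run one worker per task, collect results in order."""
    with ThreadPoolExecutor(max_workers=8) as pool:
        return list(pool.map(worker, tasks))


def map_reduce(chunks, mapper, reducer):
    """MapReduce: map each chunk in parallel, then consolidate the outputs."""
    return reducer(fan_out(chunks, mapper))


def best_of_n(candidates, score):
    """Test-time compute: compare N independent answers and pick the best."""
    return max(candidates, key=score)


def wall_clock(durations, parallel):
    """Idealized elapsed time: slowest worker if parallel, otherwise the sum."""
    return max(durations) if parallel else sum(durations)


# Parallelization: three independent audits.
audits = fan_out(["contract-1", "contract-2", "contract-3"],
                 lambda c: f"findings for {c}")

# MapReduce: count words per chunk, then add the counts up.
total_words = map_reduce(["a b", "c d e"], lambda chunk: len(chunk.split()), sum)

# Test-time compute: here "best" is simply the longest answer (a toy score).
best = best_of_n(["short", "a longer answer"], len)

print(audits, total_words, best)
print(wall_clock([3, 5, 2], parallel=True), wall_clock([3, 5, 2], parallel=False))
```

In a real system the scoring function in `best_of_n` is the hard part; the main agent usually has to judge the candidate outputs itself.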

Examples by industry (context isolation in practice)

🛠️ Engineering: COBOL modernization (Anthropic customer case)

Situation: A company needs to document a legacy COBOL codebase (~100 files) in order to modernize it. The original developers have left, and COBOL talent is scarce.

Solution:

Create a custom subagent, cobol-doc-expert
The main agent coordinates and spawns many subagents in parallel, each documenting 1 COBOL file
Each subagent's context is isolated → the main thread is not polluted with 94 files of COBOL source code
The main agent tracks progress via TodoList to ensure no file is skipped or duplicated

Result: According to Anthropic's demo, the codebase was documented successfully thanks to parallel subagents with context isolation. A sequential approach in the main thread would overflow the context very quickly with 94 COBOL source files.

📊 Data: Parallel ETL analysis

Situation: A data team needs to review 50 Airflow DAGs for deprecation risk (an API is about to be sunset).

Solution:

50 dag-analyzer subagents in parallel, one DAG each
Output format: {dag_name, uses_deprecated_api: bool, affected_lines, migration_effort}
The main thread aggregates → a spreadsheet ranked by urgency

Illustrative result: 50 DAGs are analyzed in parallel, and total time ≈ the time of the slowest single DAG. The main thread's context stays clean and ready for planning the migration.

📣 Marketing: Multi-channel content (MapReduce)

Situation: Marketing has 1 master blog post and needs 5 formats: LinkedIn, Twitter thread, email, slide deck, Instagram carousel.

Solution (MapReduce pattern):

5 custom subagents, each with its own voice config for its channel
All receive the same master blog post as input
Output: 5 independent formats
The main thread collects them and does a final polish

Illustrative result: the 5 formats are generated in parallel instead of one after another. The message stays consistent across the 5 channels, while the style is adapted to each (long-form for LinkedIn, short for Twitter, and so on).

⚖️ Legal: Deep research multi-agent

Situation: A legal team is investigating precedent cases for a lawsuit and needs to search 5 databases (Westlaw, LexisNexis, court records, academic...).

Solution:

An orchestrator subagent, legal-research-lead
It spawns 5 specialized subagents (1 per database), each with an MCP tool that connects to that database
Output from each subagent: top 10 relevant cases + summary
The orchestrator synthesizes the cross-database findings

Illustrative result: research time drops significantly because the 5 databases are searched in parallel instead of sequentially. Cross-database coverage is more thorough than with "1 researcher checking each source in turn", because each subagent focuses entirely on its own source.

⚙️ DevOps: Tool bucketing

Situation: A platform team has 180 MCP tools (infrastructure, monitoring, CI/CD, cloud provider X/Y/Z). The main agent confuses one tool with another.

Solution:

Split the 180 tools into 6 buckets (infra-aws, infra-gcp, monitoring, ci, networking, security)
Create 6 subagents, each with access to 1 bucket (~30 tools)
The main agent only knows the "6 buckets" level and delegates each task to the right bucket

Result: Tool call accuracy improves. The main agent's context is leaner (it does not have to load 180 tool descriptions).
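The Data example depends on every subagent returning the same structured output so the main thread can aggregate mechanically. A minimal sketch of that aggregation step follows. The field names come from the output format in the example; the `"low" / "medium" / "high"` effort scale, the urgency ordering, and the sample DAG names are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class DagFinding:
    """One subagent's structured output for one DAG."""
    dag_name: str
    uses_deprecated_api: bool
    affected_lines: list[int]
    migration_effort: str  # assumed scale: "low" | "medium" | "high"


# Assumed urgency rule: higher effort first, then more affected lines.
EFFORT_RANK = {"high": 0, "medium": 1, "low": 2}


def rank_by_urgency(findings: list[DagFinding]) -> list[DagFinding]:
    """Keep only affected DAGs and order them for the migration spreadsheet."""
    affected = [f for f in findings if f.uses_deprecated_api]
    return sorted(
        affected,
        key=lambda f: (EFFORT_RANK[f.migration_effort], -len(f.affected_lines)),
    )


# Hypothetical summaries returned by three dag-analyzer subagents.
findings = [
    DagFinding("etl_billing", True, [7], "medium"),
    DagFinding("etl_users", False, [], "low"),
    DagFinding("etl_orders", True, [12, 40], "high"),
]
ranked = rank_by_urgency(findings)
print([f.dag_name for f in ranked])
```

A fixed output schema is what keeps the main thread's context small here: it only ever sees 50 short records, never the DAG source.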

Anti-patterns: when the mental model goes wrong

❌ "A subagent is a different persona of Claude"

Symptom: Newcomers think of a subagent as roleplay: "you are a different Claude with a different personality".

Why it is wrong: A subagent is still Claude (same model weights, same capabilities). The only differences are the isolated context window and the custom system prompt. There is no extra "expertise" and no extra "personality".

The right view: Subagent = same Claude, different starting context. The value comes from context isolation and the custom prompt, not from a "persona".

❌ "The more subagents, the better"

Symptom: A team creates 20 subagents, one for every small task.

Why it is wrong: Each subagent's description is added to the main system prompt. With 20 subagents, the main agent gets confused about which one to call. The overhead outweighs the value.

The right view: 3-7 well-designed subagents beat 20 overlapping ones. Audit them from time to time, merge the overlapping ones, and delete the unused ones.
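One crude way to start such an audit is to compare subagent descriptions pairwise and flag the ones that share most of their words. The sketch below uses Jaccard word overlap, which is a simple heuristic chosen here for illustration, not something the source or Claude Code prescribes; the agent names, descriptions, and the 0.5 threshold are all made up.

```python
from itertools import combinations


def word_set(text: str) -> set[str]:
    return set(text.lower().split())


def overlap(a: str, b: str) -> float:
    """Jaccard similarity of the two descriptions' word sets."""
    wa, wb = word_set(a), word_set(b)
    union = wa | wb
    return len(wa & wb) / len(union) if union else 0.0


def merge_candidates(descriptions: dict[str, str],
                     threshold: float = 0.5) -> list[tuple[str, str]]:
    """Pairs of subagents whose descriptions overlap at or above the threshold."""
    return [
        (x, y)
        for x, y in combinations(sorted(descriptions), 2)
        if overlap(descriptions[x], descriptions[y]) >= threshold
    ]


# Hypothetical subagent descriptions.
descriptions = {
    "code-reviewer": "review code for quality issues",
    "quality-checker": "review code for quality problems",
    "dag-analyzer": "analyze airflow dags for deprecated apis",
}
print(merge_candidates(descriptions))
```

A flagged pair is only a prompt for a human decision; two subagents can share vocabulary and still deserve to stay separate.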

❌ "Subagents replace the main thread"

Symptom: Every task is pushed into a subagent, and the main thread is just a dispatcher.

Why it is wrong: The main thread is the thinker and orchestrator. If all the thinking happens in subagents, you lose the ability to adjust based on intermediate findings (because you never see them).

The right view: The main thread handles interactive work and orchestration. Subagents handle detached work with clear boundaries (research, review, audit).

❌ "Parallel subagents are always faster than sequential"

Symptom: Spawning 50 subagents for a task that has sequential dependencies.

Why it is wrong: Parallelism only wins when the tasks are truly independent. With dependencies, each subagent is blocked waiting for another, so total time does not drop and may even rise because of spawn and aggregation overhead.

The right view: Use parallel for map-reduce work, and sequential (in the main thread) for dependency chains.
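A quick way to see whether parallelism can help is to group tasks into "waves" that have no unmet dependencies. The sketch below uses `graphlib.TopologicalSorter` from the Python standard library (3.9+); the helper name `parallel_waves` and the task names are illustrative assumptions.

```python
from graphlib import TopologicalSorter


def parallel_waves(dependencies: dict[str, set[str]]) -> list[list[str]]:
    """Group tasks into waves; tasks within one wave can run in parallel.

    `dependencies` maps each task to the set of tasks it must wait for.
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # everything unblocked right now
        waves.append(ready)
        sorter.done(*ready)                 # mark the whole wave as finished
    return waves


independent = {"a": set(), "b": set(), "c": set()}
chain = {"a": set(), "b": {"a"}, "c": {"b"}}

print(parallel_waves(independent))  # [['a', 'b', 'c']]
print(parallel_waves(chain))        # [['a'], ['b'], ['c']]
```

Independent tasks collapse into a single wave, so spawning subagents pays off. A dependency chain yields one task per wave, so parallel subagents add overhead without saving time.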

❌ "Subagents can nest without limit"

Symptom: Subagent A spawns subagent B, which spawns subagent C...

Why it is wrong: As depth grows, more information is lost at each handoff. Parent A knows nothing about C. Debugging becomes a nightmare.

The right view: Limit nesting to 2 levels (main → subagent). If you need 3 or more levels, redesign the architecture; there is usually a way to merge subagents closer to the root.
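If you build your own orchestration layer, the 2-level rule can be enforced with an explicit depth guard. This is a hypothetical sketch: `spawn`, `MAX_DEPTH`, and `NestingTooDeep` are names invented here and do not correspond to any Claude Code API.

```python
MAX_DEPTH = 2  # level 1 = main thread, level 2 = subagent


class NestingTooDeep(RuntimeError):
    """Raised when a spawn would exceed the allowed nesting depth."""


def spawn(task: str, depth: int = 1) -> str:
    """Spawn a child one level below the caller at `depth`.

    The main thread calls with depth=1; a subagent would call with depth=2.
    """
    child_depth = depth + 1
    if child_depth > MAX_DEPTH:
        raise NestingTooDeep(
            f"level {depth} may not spawn: max depth is {MAX_DEPTH}"
        )
    # Stand-in for running the child in its own context.
    return f"level-{child_depth} summary for: {task}"


print(spawn("audit contracts"))           # main → subagent: allowed
try:
    spawn("audit contracts", depth=2)     # subagent → sub-subagent: refused
except NestingTooDeep as err:
    print("refused:", err)
```

Failing loudly at the boundary is deliberate: a silent third level is exactly the kind of handoff that becomes hard to debug later.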

Comparing mental models: before and after this lesson

Question	Before this lesson	After this lesson
What is a subagent?	"Another Claude that runs tasks for me"	"Same Claude, isolated context, 2 inputs and 1 output, discarded when done"
Why context isolation?	"To keep main clean"	"Main context is a finite resource; subagents protect it during heavy exploration"
Subagent vs tool?	"2 different concepts"	"A subagent is a special tool; the tool-calling framework is the protocol"
When to parallelize?	"The more, the faster"	"Map-reduce over independent tasks wins; sequential dependencies stay in the main thread"
Built-in subagents?	"Didn't know they existed"	"General purpose, Explore, Plan: use them before creating a custom one"
What is multi-agent?	"A buzzword"	"A larger pattern that includes parallelization, MapReduce, and test-time compute; subagents are the building blocks"

Apply it now

Exercise 1: Draw your mental model (~10 minutes)

Step 1: Without looking back at this lesson, draw your own ASCII diagram that shows:

How the main thread and a subagent communicate
Where context isolation happens
The 2 inputs + 1 output of a subagent

Step 2: Compare it with my diagram (scroll up). Which points did you miss? Which points did you add?

Exercise 2: Classify your subagents (~10 minutes)

Step 1: List every subagent you have created (or imagine some, if you have none) and classify them:

Subagent	Research	Review	Custom prompt	Multi-agent role
code-quality-reviewer	☐	☑	☐	Worker
[agent 2]	☐	☐	☐
[agent 3]	☐	☐	☐

Step 2: For any subagent that fits none of the first 3 columns, is its existence justified?

Exercise 3 (thinking): Design a multi-agent setup for 1 use case (~15 minutes)

Pick 1 real work scenario:

A. Review 30 pull requests before the weekly release.
B. Analyze 200 customer feedback tickets to extract themes.
C. Migrate 50 services from Python 3.8 → 3.12.

Design:

What the main thread does (orchestrate/plan/interactive)
How many subagents (parallel? sequential?)
For each subagent: role, input, output format
The anti-patterns you will avoid

Write a 5-7 line description.

Advanced tips

💡 Tip 1: Think in layers

When designing a Claude Code setup for a team:

Layer 1: Main thread (orchestrate, interactive)
Layer 2: Specialized subagents (research, review, custom)
Layer 3: Built-in tools (Read, Grep, Bash)

Design top-down: what the main thread does → which tasks are offloaded to subagents → which tools each subagent uses.

💡 Tip 2: Context budget thinking

Treat the context window as a budget:

Exploration-heavy task (> 5,000 tokens anticipated) → candidate for delegation
Short, interactive task → main thread
Medium task → consider: do you need to preserve context for later turns?

💡 Tip 3: Follow Anthropic's research

Multi-agent is currently a main area of research at Anthropic, and patterns and best practices will evolve. Follow Anthropic's research blog to stay current; work on training Claude to be better at multi-agent tasks is active.

💡 Tip 4: Subagents are a form of test-time compute

When you spawn several subagents on the same task with different approaches, that is test-time compute: spending extra compute at inference to get a better answer. The technique pays off for hard reasoning, not for simple tasks.
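The budget rule of thumb from Tip 2 can be written down as a small decision helper. This is a sketch under assumptions: the 5,000-token threshold comes from the tip, but the order of the checks and the reading of "preserve context" as "keep main-context headroom for later turns" are my interpretation, and `where_to_run` is a hypothetical name.

```python
DELEGATION_THRESHOLD = 5_000  # tokens of anticipated exploration, from Tip 2


def where_to_run(estimated_tokens: int, interactive: bool,
                 preserve_main_context: bool) -> str:
    """Suggest where a task should run; a heuristic, not a hard rule."""
    if estimated_tokens > DELEGATION_THRESHOLD:
        return "subagent"       # exploration-heavy: candidate for delegation
    if interactive:
        return "main thread"    # short back-and-forth belongs in main
    # Medium task: delegate only if main-context headroom matters later.
    return "subagent" if preserve_main_context else "main thread"


print(where_to_run(23_000, interactive=False, preserve_main_context=True))
print(where_to_run(300, interactive=True, preserve_main_context=False))
print(where_to_run(3_000, interactive=False, preserve_main_context=True))
```

Estimating tokens up front is itself a guess, so treat the output as a prompt to think about the budget rather than as an automatic router.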

Lesson summary

🎯 Subagent = the same Claude in an isolated context window + a custom system prompt. It is not a persona and not an "expert mode"; it is a mechanism for separating context.

🎯 The context window is a finite resource. Subagents are a tool for protecting the main context during heavy exploration; the tradeoff is losing visibility into the journey.

🎯 A subagent receives 2 inputs and returns 1 output: system prompt from config + task description from the parent → structured summary. The subagent's conversation is discarded when it finishes.

🎯 Claude Code has 3 built-in subagents: General purpose (mixed tasks), Explore (fast search), Plan (research + plan). Use the built-ins first, custom ones second.

🎯 Subagents are the building blocks of multi-agent architecture, whose patterns include parallelization, MapReduce, and test-time compute. This is currently a main area of Anthropic research, and the future of agentic work lies here.

References

Anthropic Podcast: "Building the future of agents with Claude" (2025-10-02). Discusses multi-agent architecture, the orchestrator-subagent pattern, and training Claude to be a good manager of its subagents
Anthropic: "Building more effective AI agents" (2025-10-17). Design principles for agentic systems
Claude Code docs: Sub-agents overview
Internal: Lessons 16.1, 16.2, 16.3
