Bài tập — Thiết kế 3 system prompt production-ready — Building with the Claude API

Sai lầm của dev mới học: viết system prompt 1 lần, test 1 case, ship. Kết quả: 20% use case lỗi.

Bạn sẽ học được

Áp dụng template 5-block để viết system prompt cho use case thật
So sánh output giữa các biến thể system prompt để chọn tối ưu
Nhận diện khi nào system prompt cần iteration thêm
Tạo thư viện system prompt tái sử dụng

Đề bài

Viết 3 system prompt cho 3 scenario sau. Với mỗi cái:

Draft v1 theo template 5-block (Role, Objective, Constraints, Output, Examples)
Test với 5-10 câu hỏi mẫu (bao gồm edge case)
Iterate v2 dựa trên output sai
Báo cáo kết quả (bao nhiêu pass/fail)

Scenario 1: Email summarizer

Context

Team support nhận 50 email/ngày, muốn tóm tắt nhanh trước khi đọc chi tiết.

Input format

Output yêu cầu

From: [tên khách]
Subject: [tiêu đề]
Body: [nội dung email, 100-500 từ]

Output yêu cầu

Test cases (dùng để validate system prompt)

- Priority: High/Medium/Low
- Category: Bug/Feature Request/Billing/Other
- Summary (< 50 từ)
- Action needed: Yes/No

Test cases (dùng để validate system prompt)

Gợi ý v1

Case 1 (Bug, High priority):
From: John Smith
Subject: App crash on payment
Body: "Hi team, the app crashed twice when I tried to pay. 
I lost my cart. Really frustrating. Please fix urgently."

Case 2 (Feature request, Low):
From: Sarah
Subject: Dark mode?
Body: "Hey, just wondering if you have plans for dark mode? 
Would be nice to have."

Case 3 (Billing, Medium):
From: Acme Inc
Subject: Invoice question
Body: "We received invoice #4523 but the amount seems wrong. 
Should be $1200 not $1500. Can you check?"

Case 4 (Ambiguous):
From: Mark
Subject: Hello
Body: "Can you call me back? My number is 555-0123."

Case 5 (Spam/Other):
From: casino@spam.ru
Subject: You won $1000000
Body: "Click here to claim your prize..."

Gợi ý v1

Chạy thử

system_v1 = """Bạn là email classifier.

# Role
Phân loại email support.

# Output JSON:
{
  "priority": "High|Medium|Low",
  "category": "Bug|Feature|Billing|Other",
  "summary": "< 50 words",
  "action_needed": true/false
}
"""

Chạy thử

Điểm cần kiểm tra

Case 4 (Mark "hello"): system v1 sẽ confuse. Bạn cần:

Case 5 (spam): có thể nhầm là "Other" thay vì flag spam. Cần thêm category.

Iteration v2

Thêm:

Định nghĩa rõ priority khi no action
Handle ambiguous case
Spam category
Default Low priority khi không clear
Example ambiguous case

def classify_email(email_text: str):
    msg = client.messages.create(
        model=model,
        max_tokens=300,
        messages=[{"role": "user", "content": email_text}],
        system=system_v1,
    )
    return msg.content[0].text

for case in test_cases:
    result = classify_email(case)
    print(result)
    print("-" * 40)

Iteration v2

Self-review

[ ] 5/5 case pass?
[ ] JSON format parse được bằng json.loads()?
[ ] Output consistent qua 3 run liên tiếp cùng input?

system_v2 = """Bạn là email classifier cho customer support team.

# Output
JSON strict:
{
  "priority": "High|Medium|Low",
  "category": "Bug|Feature|Billing|Spam|Other",
  "summary": "< 50 words, neutral tone",
  "action_needed": true/false
}

# Priority rules
- High: system down, data loss, billing error > $100, urgent customer language
- Medium: feature request, clarification, refund < $100
- Low: casual question, compliment, no clear request

# Category rules
- Bug: error, crash, broken feature
- Feature: "would be nice", "wish you had", "any plans to add"
- Billing: invoice, payment, refund
- Spam: promotional, suspicious links, unknown sender domain
- Other: greetings with no context, misc

# Examples
Input: "Hi, please call me back, number 555-0123"
Output: {"priority":"Low","category":"Other","summary":"User requests callback, no context","action_needed":true}

Input: "System down! Can't login!"
Output: {"priority":"High","category":"Bug","summary":"Login outage reported","action_needed":true}
"""

Scenario 2: Interview question generator

Context

Hiring manager cần list câu hỏi phỏng vấn theo role và level.

Input

Output

Role: [Software Engineer / Product Manager / Sales]
Level: [Junior / Senior / Staff]
Focus areas: [comma-separated]

Output

Test cases

Gợi ý approach

System prompt cần:

Đây là bài self-directed — bạn viết v1, test, iterate. Không có solution key.

SWE Junior, focus: Python, React
PM Senior, focus: B2B, analytics
Sales Staff, focus: Enterprise, new market
(edge case) Role không tồn tại: "Ninja"
(edge case) Level không rõ: "Medium"
Define rõ structure output
Adapt depth theo level (Junior ≠ Staff)
Handle edge case (unknown role → yêu cầu clarify)

- 5 câu behavioral
- 3 câu technical/domain
- 2 câu assess culture fit
- Gợi ý follow-up cho mỗi câu

Scenario 3: Product description writer

Context

E-commerce, generate product description cho new SKU. Style phải khớp brand voice.

Input

Output

Product: [name]
Category: [clothing / electronics / home]
Key features: [bullet list]
Target audience: [who]
Brand voice: [friendly / luxurious / bold]

Output

Test cases

Tự làm từ đầu

Viết v1 → 5 test → iterate. Pay attention:

T-shirt organic cotton, friendly voice
Luxury watch, luxurious voice
Power drill, bold voice
(ambiguous) Skincare cream, minimal info
(edge) Product name chứa competitor brand
Brand voice matter (adjectives, sentence length)
SEO — key features nên lặp lại ở bullets
Edge case 5: system phải refuse hoặc rewrite không dùng competitor

- Headline (< 10 từ)
- Short description (2-3 câu)
- Bullet list 5 key benefits
- CTA line

Pattern chung: Iteration workflow

Đây là manual eval loop. Ở Module 4 bạn sẽ học cách automate nó qua test framework + graders.

┌─────────────────────────────────────────────────┐
│                                                 │
│   1. Viết v1 theo template                      │
│          │                                      │
│          ▼                                      │
│   2. Tạo test dataset                           │
│      (5-10 case, 20% edge)                      │
│          │                                      │
│          ▼                                      │
│   3. Run tất cả case                            │
│          │                                      │
│          ▼                                      │
│   4. Đếm pass/fail                              │
│      (nếu < 70% → must iterate)                 │
│          │                                      │
│          ▼                                      │
│   5. Analyze failures                           │
│      - Pattern nào fail?                        │
│      - Thiếu rule? Edge case? Format?           │
│          │                                      │
│          ▼                                      │
│   6. Viết v2 fix specific failures              │
│          │                                      │
│          ▼                                      │
│   7. Re-run → goto 4                            │
│                                                 │
│   Stop khi: > 90% pass + 3 lần run stable       │
│                                                 │
└─────────────────────────────────────────────────┘

Thư viện system prompt tái sử dụng

Sau khi hoàn thành 3 scenario, lưu vào file prompts.py:

Pattern production: Prompt như code — version control, review, test.

# prompts.py

EMAIL_CLASSIFIER = """Bạn là email classifier cho customer support...
..."""

INTERVIEW_GENERATOR = """Bạn là hiring manager...
..."""

PRODUCT_DESCRIPTION = """Bạn là copywriter e-commerce...
..."""

# Dùng:
# from prompts import EMAIL_CLASSIFIER
# chat(messages, system=EMAIL_CLASSIFIER)

Checklist submit bài tập

Với mỗi 3 scenario, document:

Scenario X

Không pass nếu: không có iteration, không có analysis, chỉ v1.

[ ] v1 prompt (paste full text)
[ ] Test cases (10 cases)
[ ] v1 results: X/10 pass
[ ] Failures analysis (which rules missed?)
[ ] v2 prompt (paste)
[ ] v2 results: Y/10 pass
[ ] (optional) v3
[ ] Final: Z/10 pass + stable

Tips để prompt work tốt hơn

Tip 1: Start simple, add constraints

v1 cực ngắn. Chỉ thêm rule khi thấy output sai.

Over-engineered v1 thường performance tệ hơn simple v1.

Tip 2: Put examples đúng format

Nếu bạn muốn JSON output:

Claude học từ example format.

Tip 3: Đặt constraint dương > phủ định

# Examples
Input: "..."
Output: {"key": "value"}  ← JSON thật, không phải text

Tip 3: Đặt constraint dương > phủ định

Constraint dương dễ follow hơn.

Tip 4: Cho "escape hatch"

❌ "Đừng dùng ngôn ngữ phức tạp"
✅ "Dùng ngôn ngữ đơn giản, câu < 15 từ"

Tip 4: Cho "escape hatch"

Thay vì force Claude phải pick, cho Claude option "tôi không biết".

Tip 5: Test same input 3 lần

Với temperature=1 (default), cùng input có thể khác output. Nếu output quá khác biệt qua 3 lần → prompt chưa đủ deterministic, cần rule cụ thể hơn.

Hoặc: set temperature=0 cho tác vụ cần determinism (bài 6.12).

"Nếu không chắc category → output 'Other' với confidence < 0.5"

Mẹo nâng cao

Mẹo 1: Chain-of-thought trong system prompt

Cải thiện accuracy 10-30% cho tác vụ reasoning.

Mẹo 2: Negative examples

system = """
Trước khi output final, hãy:
1. Phân tích input (output <thinking>...</thinking>)
2. Apply rules
3. Format output theo spec

Output FINAL luôn sau </thinking>.
"""

Mẹo 2: Negative examples

Claude học từ cả ví dụ âm lẫn dương.

Mẹo 3: XML section delimiter

# Examples
Good: "Summary: Bug in checkout flow"
Bad: "Well, the user is saying..."  ← verbose, nên tránh

Mẹo 3: XML section delimiter

Claude đặc biệt "quen" với XML tags (training data dùng XML). Cấu trúc rõ → output ổn định hơn. Chi tiết ở bài 6.19.

system = """<role>...</role>
<constraints>...</constraints>
<examples>
<example>
<input>...</input>
<output>...</output>
</example>
</examples>"""

Tóm tắt bài học

🎯 Viết system prompt = iteration, không phải 1 lần. v1 → test → v2 → test. Stop khi > 90% stable.

🎯 Template 5-block là starting point tốt: Role, Objective, Constraints, Output, Examples.

🎯 Edge case phải có trong test dataset — 20% test phải là unusual input.

🎯 Lưu prompt như code — prompts.py, version control, review.

🎯 Prompt dương > phủ định + escape hatch + test 3 lần cùng input.

Tài liệu tham khảo

Anthropic prompt library — 40+ prompt production-grade
Prompt engineering overview

Nội dung này có hữu ích không?