Asking Claude a complex problem:
- Explain extended thinking and when to use it
- Enable thinking in an API call
- Handle thinking blocks in the response
- Understand redacted thinking and signature security
- Evaluate the cost/latency/quality trade-off
When to use it?
✅ Use cases:
- Complex math/logic
- Multi-step reasoning
- Hard code generation (novel algorithms)
- Decision making with trade-offs
- Eval accuracy still below target after optimizing the prompt
❌ Skip when:
- The task is simple (classify, extract, translate)
- Latency is critical (thinking adds 2-10s)
- Cost is critical (thinking tokens are billed)
Rule: Try prompt engineering first. If the eval score is still not good enough, enable thinking.
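The rule above can be written down as a small decision helper. This is only a sketch: the function name, the `target_score` threshold, and the two boolean flags are hypothetical, not part of any API.

```python
def should_enable_thinking(eval_score, target_score,
                           latency_critical=False, cost_critical=False):
    """Apply the rule of thumb: prompt engineering first, thinking second.

    eval_score is the accuracy measured AFTER prompt optimization;
    target_score is the accuracy the use case needs (both hypothetical inputs).
    """
    # Hard constraints from the "Skip when" list win over accuracy gains.
    if latency_critical or cost_critical:
        return False
    # Only reach for thinking when the optimized prompt still falls short.
    return eval_score < target_score


# Example: the prompt alone reaches 0.72 but the product needs 0.85.
decision = should_enable_thinking(eval_score=0.72, target_score=0.85)
```

The helper deliberately ignores task type: per the rule, the eval score decides, not a guess about how hard the task looks.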
Enabling it in the API
Rules
- thinking_budget: minimum 1024
- max_tokens > thinking_budget (output = thinking + final answer)
- Not compatible with: message prefill, temperature != 1.0
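These rules can be checked locally before a request is sent. The sketch below assumes a hypothetical helper `build_thinking_params`. It only builds the parameter dictionary and makes no API call, and it treats "last message has role assistant" as a stand-in test for prefill.

```python
MIN_THINKING_BUDGET = 1024  # minimum stated in the rules above


def build_thinking_params(model, messages, max_tokens,
                          thinking_budget=None, temperature=None):
    """Build request parameters, rejecting combinations the rules forbid."""
    params = {"model": model, "max_tokens": max_tokens, "messages": messages}
    if temperature is not None:
        params["temperature"] = temperature
    if thinking_budget is None:
        return params  # thinking disabled: nothing else to validate

    if thinking_budget < MIN_THINKING_BUDGET:
        raise ValueError(f"thinking_budget must be >= {MIN_THINKING_BUDGET}")
    if max_tokens <= thinking_budget:
        raise ValueError("max_tokens must be greater than thinking_budget")
    if temperature is not None and temperature != 1.0:
        raise ValueError("thinking requires temperature == 1.0")
    # Assumption: a trailing assistant message means the caller is prefilling.
    if messages and messages[-1]["role"] == "assistant":
        raise ValueError("thinking is not compatible with message prefill")

    params["thinking"] = {"type": "enabled", "budget_tokens": thinking_budget}
    return params


# "example-model" is a placeholder, not a real model name.
params = build_thinking_params(
    "example-model",
    [{"role": "user", "content": "What is 12345 x 6789?"}],
    max_tokens=8000,
    thinking_budget=2000,
)
```

Failing fast in the client gives a clearer error than waiting for the request to be rejected.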

    # `client` (an Anthropic API client) and `model` are assumed to be
    # defined earlier in the course.
    def chat(messages, thinking=False, thinking_budget=1024):
        params = {
            "model": model,
            "max_tokens": 8000,  # MUST be > thinking_budget
            "messages": messages,
        }
        if thinking:
            params["thinking"] = {
                "type": "enabled",
                "budget_tokens": thinking_budget,
            }
        return client.messages.create(**params)

Response structure
The response gains a ThinkingBlock:

    response.content = [
        ThinkingBlock(
            type="thinking",
            thinking="Let me break this down step by step. 12345 × 6789...",
            signature="eyJhbGc...",  # cryptographic signature
        ),
        TextBlock(
            type="text",
            text="83,810,205",
        ),
    ]

Extract

    for block in response.content:
        if block.type == "thinking":
            print(f"Reasoning: {block.thinking}")
        elif block.type == "text":
            print(f"Answer: {block.text}")

Signature: Security
Each thinking block carries a cryptographic signature. Why:
- A developer could tamper with the thinking text when sending it back
- The signature ensures integrity
- It prevents "jailbreaks" via modified reasoning
Best practice: never modify the thinking text. In multi-turn conversations, pass the block back with its original signature.
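Redacted thinking
Sometimes the response contains a RedactedThinkingBlock in place of readable thinking.
This happens when the thinking is flagged by an internal safety system. The real content is kept in encrypted form.
How to handle it:
- Pass it back unchanged in the history (Claude decrypts it)
- Don't crash: Claude still returns an answer
- Don't log or display the redacted data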
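The handling steps above might look like this in code. This is a sketch: `summarize_blocks` and `append_assistant_turn` are hypothetical helper names, and `SimpleNamespace` objects stand in for real response blocks so the example runs without an API call.

```python
from types import SimpleNamespace


def summarize_blocks(content):
    """Return printable lines for a response without exposing redacted data."""
    lines = []
    for block in content:
        if block.type == "thinking":
            lines.append(f"Reasoning: {block.thinking}")
        elif block.type == "redacted_thinking":
            # Never print or log block.data; show a placeholder instead.
            lines.append("Reasoning: [redacted]")
        elif block.type == "text":
            lines.append(f"Answer: {block.text}")
    return lines


def append_assistant_turn(messages, content):
    """Store the assistant turn with every block exactly as received."""
    messages.append({"role": "assistant", "content": content})
    return messages


# Stand-in blocks mimicking a response that contains redacted thinking.
content = [
    SimpleNamespace(type="redacted_thinking", data="encrypted_base64..."),
    SimpleNamespace(type="text", text="42"),
]
history = append_assistant_turn([{"role": "user", "content": "Question"}], content)
display = summarize_blocks(content)
```

Display and history are kept separate on purpose: the display path may drop or mask blocks, while the history path passes them through untouched.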
The redacted block looks like this:

    RedactedThinkingBlock(
        type="redacted_thinking",
        data="encrypted_base64..."
    )

Multi-turn with thinking
When replying with thinking in the history, keep every block intact:

    # Turn 1 response
    response = chat(messages, thinking=True)
    messages.append({"role": "assistant", "content": response.content})

    # Turn 2 (add_user_message is assumed to be a helper that appends a user turn)
    add_user_message(messages, "Explain step 2 more")
    response2 = chat(messages, thinking=True)
    # response2 continues reasoning with the Turn 1 context

Interaction with tools
Thinking + tools = reasoning agents.
Claude's flow:
- Thinking block: "User asks X. I need info Y. Tool Z can give Y."
- ToolUseBlock: requests the tool
- ... the tool executes and the result is sent back
- Thinking block: "Result confirms hypothesis"
- TextBlock: final answer
Powerful for complex agents. It can improve accuracy by roughly 10-30% on reasoning tasks.
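The flow above can be sketched as a loop. The function `run_agent` and its two callables are hypothetical: in practice `create_message` would wrap the API call and `execute_tool` would dispatch to real tools. Scripted stand-in responses are used here so the sketch runs offline.

```python
from types import SimpleNamespace


def run_agent(create_message, execute_tool, messages, max_turns=5):
    """Loop until the model answers without requesting a tool.

    create_message(messages) -> response with a .content list of blocks
    execute_tool(name, tool_input) -> result string
    """
    for _ in range(max_turns):
        response = create_message(messages)
        # Append the whole content list so thinking blocks stay unmodified.
        messages.append({"role": "assistant", "content": response.content})
        tool_uses = [b for b in response.content if b.type == "tool_use"]
        if not tool_uses:
            return response  # no tool requested: this is the final answer
        results = [
            {
                "type": "tool_result",
                "tool_use_id": b.id,
                "content": execute_tool(b.name, b.input),
            }
            for b in tool_uses
        ]
        messages.append({"role": "user", "content": results})
    raise RuntimeError("no final answer within max_turns")


# Scripted stand-ins for two model turns: tool request, then final answer.
scripted = iter([
    SimpleNamespace(content=[
        SimpleNamespace(type="thinking", thinking="I need Y. Tool Z can give Y.",
                        signature="sig-1"),
        SimpleNamespace(type="tool_use", id="call_1", name="tool_z",
                        input={"query": "Y"}),
    ]),
    SimpleNamespace(content=[
        SimpleNamespace(type="thinking", thinking="Result confirms hypothesis.",
                        signature="sig-2"),
        SimpleNamespace(type="text", text="final answer"),
    ]),
])
history = [{"role": "user", "content": "X?"}]
final = run_agent(lambda msgs: next(scripted),
                  lambda name, tool_input: "Y is 42",
                  history)
```

The `max_turns` cap is a design choice of this sketch, meant to stop a tool loop that never converges.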

    response = client.messages.create(
        model=model,
        max_tokens=16000,
        thinking={"type": "enabled", "budget_tokens": 4000},
        tools=[tool_schema],
        messages=messages,
    )

Real-world example: Code debug

    messages = [{"role": "user", "content": """
    This function returns wrong result for large inputs. Find bug:
    def fib(n):
        a, b = 0, 1
        for i in range(n):
            a, b = b, a + b
        return a
    fib(100) returns wrong value.
    """}]

    response = chat(messages, thinking=True, thinking_budget=2000)
    for block in response.content:
        if block.type == "thinking":
            print("Reasoning:\n" + block.thinking + "\n")
        elif block.type == "text":
            print("Answer:\n" + block.text)

The thinking might look like this:

    Let me trace through:
    fib(1): a=0, loop once, return a=1 ✓
    fib(2): a=0→1→1, return 1 ✓
    fib(5): a=0→1→1→2→3→5, return 5 ✓
    Actually this seems correct. But user says wrong for large.
    fib(100): should be 354224848179261915075
    No Python int overflow issue...
    Let me re-check the iteration...
    Actually wait, it returns 'a' but 'a' was just assigned to old 'b'.
    Let me re-trace more carefully...

Claude investigates the bug report by reasoning step by step, a skill learned from its training data. Note that this fib is in fact correct in Python (integers do not overflow), which is what the trace starts to uncover.
Cost & latency
| | Without thinking | With thinking |
|---|---|---|
| Latency | 2s | 5-10s |
| Cost | 1x | 1.5-3x (thinking tokens) |
| Accuracy on hard tasks | 60-70% | 80-90% |
| Accuracy on easy tasks | 95% | 95% (no gain) |
ROI: pay roughly 2x the cost plus extra latency for about +20% accuracy on hard tasks. Worth it for critical use cases.
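The cost multiplier can be estimated from token counts. The sketch below assumes thinking tokens are billed at the output-token rate. The function name, the token counts, and the per-million-token prices are all placeholders, not real pricing.

```python
def estimate_cost(input_tokens, answer_tokens, thinking_tokens,
                  input_price_per_mtok, output_price_per_mtok):
    """Estimate one request's cost, counting thinking tokens as output tokens."""
    output_tokens = answer_tokens + thinking_tokens
    total = (input_tokens * input_price_per_mtok
             + output_tokens * output_price_per_mtok)
    return total / 1_000_000


# Placeholder prices (per million tokens) and token counts.
PRICES = {"input_price_per_mtok": 3.0, "output_price_per_mtok": 15.0}
without_thinking = estimate_cost(2000, 1000, 0, **PRICES)
with_thinking = estimate_cost(2000, 1000, 1500, **PRICES)
multiplier = with_thinking / without_thinking  # about 2x for these numbers
```

Because output is usually priced higher than input, the multiplier depends mostly on how many thinking tokens are spent relative to the final answer.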
Anti-patterns
❌ Thinking for easy tasks
Classifying "positive/negative" with thinking burns tokens for no benefit.
Fix: Eval first. Enable thinking only if eval accuracy is low.
❌ thinking_budget too small
A budget of 1024 on a complex task cuts Claude off mid-reasoning.
Fix: Budget 2000-8000 for complex tasks.
❌ max_tokens <= thinking_budget

    max_tokens=1000, thinking_budget=2000  # INVALID

Fix: max_tokens must exceed thinking_budget plus room for a reasonable final answer.
❌ Modifying the thinking text
A user could tamper with the thinking to jailbreak the model.
Fix: Never modify it. Pass back the original signature.
Apply it now
Exercise 1: Compare with/without thinking (20 minutes)
Test 5 hard problems:
- Complex math
- Multi-step logic
- Algorithm design
- Code debug
- Decision analysis
For each one: run it with and without thinking. Score the accuracy.
Exercise 2: Eval-driven decision (20 minutes)
Take the eval dataset you already have (Module 4) and run it with thinking. Compare the scores.
Decide: is enabling thinking worth it or not?
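For the first exercise, a small harness keeps the comparison fair by scoring both modes on the same problems. This is a sketch: `compare_accuracy` is a hypothetical helper, exact string match is a deliberately crude scorer, and `fake_solve` stands in for a function that would call the model.

```python
def compare_accuracy(problems, solve):
    """Score the same problems with thinking off and on.

    problems: list of (question, expected_answer) pairs
    solve(question, thinking) -> answer string (hypothetical callable)
    Returns {False: accuracy_without, True: accuracy_with}.
    """
    scores = {}
    for thinking in (False, True):
        correct = sum(
            1 for question, expected in problems
            if solve(question, thinking).strip() == expected
        )
        scores[thinking] = correct / len(problems)
    return scores


# Stand-in solver: pretends thinking is needed for the second problem.
def fake_solve(question, thinking):
    answers = {"2 + 2": "4", "12345 * 6789": "83810205" if thinking else "83000000"}
    return answers[question]


problems = [("2 + 2", "4"), ("12345 * 6789", "83810205")]
scores = compare_accuracy(problems, fake_solve)
```

For open-ended problems such as algorithm design or decision analysis, exact match will not work, and the scorer would need to be replaced by a rubric or a grader.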
Summary
🎯 Extended thinking = a "scratchpad" for complex reasoning tasks.
🎯 Enable: thinking={"type": "enabled", "budget_tokens": N} with N >= 1024.
🎯 The response contains a ThinkingBlock with text plus a signature for security.
🎯 Trade-off: 2-3x cost plus latency for about +20% accuracy on hard tasks.
🎯 Eval-driven: try prompt engineering first, then thinking if that is not enough.