Extended thinking: a scratchpad for Claude - Building with the Claude API

Ask Claude a complex problem:

What you will learn

Explain extended thinking and when to use it
Enable thinking in an API call
Handle thinking blocks in the response
Understand redacted thinking and signature security
Evaluate the cost/latency/quality trade-off

When to use it?

✅ Use cases:

Complex math/logic
Multi-step reasoning
Difficult code generation (novel algorithms)
Decision making with trade-offs
Eval accuracy still below target after prompt optimization

❌ Skip when:

Simple tasks (classify, extract, translate)
Latency is critical (thinking adds 2-10s)
Cost is critical (thinking tokens are billed)

Rule: Try prompt engineering first. If the eval score is still not good enough, enable thinking.
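The rule of thumb above can be written down as a tiny decision function. This is only a sketch: the function name, its parameters, and the example scores are made up for illustration, and the real decision should come from your own evals.

```python
def should_enable_thinking(eval_score, target_score, latency_critical=False, cost_critical=False):
    """Return True when the rule of thumb above suggests trying extended thinking."""
    # Hard constraints first: thinking adds latency and billed tokens.
    if latency_critical or cost_critical:
        return False
    # Only reach for thinking when the tuned prompt still misses the target.
    return eval_score < target_score


print(should_enable_thinking(0.72, 0.85))                         # True
print(should_enable_thinking(0.72, 0.85, latency_critical=True))  # False
print(should_enable_thinking(0.91, 0.85))                         # False
```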

Enabling thinking in the API

Rules

thinking_budget (sent as budget_tokens) must be at least 1024
max_tokens must be greater than thinking_budget (output = thinking + final answer)
Not compatible with: message prefill, temperature other than 1.0

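# Assumes `client` (an anthropic.Anthropic() instance) and `model`
# were defined earlier in the course.
def chat(messages, thinking=False, thinking_budget=1024):
    params = {
        "model": model,
        "max_tokens": 8000,  # must be greater than thinking_budget
        "messages": messages,
    }

    if thinking:
        # budget_tokens caps how many tokens Claude may spend on reasoning
        params["thinking"] = {
            "type": "enabled",
            "budget_tokens": thinking_budget,
        }

    return client.messages.create(**params)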
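The rules listed above are easy to violate by accident, so a pre-flight check before calling the API may help. The sketch below is a hypothetical helper (validate_thinking_params and its argument names are not part of any SDK); it encodes only the rules stated in this lesson.

```python
def validate_thinking_params(max_tokens, budget_tokens, temperature=None, uses_prefill=False):
    """Return a list of rule violations; an empty list means the request looks valid."""
    problems = []
    if budget_tokens < 1024:
        problems.append("budget_tokens must be at least 1024")
    if max_tokens <= budget_tokens:
        problems.append("max_tokens must be greater than budget_tokens")
    if temperature is not None and temperature != 1.0:
        problems.append("temperature must stay at 1.0 when thinking is enabled")
    if uses_prefill:
        problems.append("message prefill cannot be combined with thinking")
    return problems


print(validate_thinking_params(8000, 1024))  # []
print(validate_thinking_params(1000, 2000))  # ['max_tokens must be greater than budget_tokens']
```

Returning a list instead of raising on the first problem lets the caller see every violation at once.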

Response structure

The response contains an additional ThinkingBlock:

response.content = [
    ThinkingBlock(
        type="thinking",
        thinking="Let me break this down step by step. 12345 × 6789...",
        signature="eyJhbGc..."  # cryptographic signature
    ),
    TextBlock(
        type="text",
        text="83,810,205"
    )
]

Extract

for block in response.content:
    if block.type == "thinking":
        print(f"Reasoning: {block.thinking}")
    elif block.type == "text":
        print(f"Answer: {block.text}")
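If you need the reasoning and the answer as values rather than printed lines, the same loop can be wrapped in a small function. This is a sketch with a hypothetical name (split_response); the SimpleNamespace objects are stand-ins shaped like the blocks shown above, so no API call is made.

```python
from types import SimpleNamespace


def split_response(content):
    """Separate reasoning from the final answer in a list of content blocks."""
    thinking_parts, text_parts = [], []
    for block in content:
        if block.type == "thinking":
            thinking_parts.append(block.thinking)
        elif block.type == "text":
            text_parts.append(block.text)
        # Other block types (e.g. redacted_thinking) are skipped here on purpose.
    return "\n".join(thinking_parts), "".join(text_parts)


# Stand-in objects shaped like the blocks shown above.
fake_content = [
    SimpleNamespace(type="thinking", thinking="12345 x 6789 ...", signature="sig"),
    SimpleNamespace(type="text", text="83,810,205"),
]
reasoning, answer = split_response(fake_content)
print(answer)  # 83,810,205
```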

Signature: security

Each thinking block carries a cryptographic signature. Why:

A developer could tamper with the thinking text when sending it back
The signature ensures integrity
It prevents "jailbreaks" via modified reasoning

Best practice: Do not modify the thinking text. Pass the signature back unchanged in multi-turn conversations.
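To see why a signature makes tampering detectable, here is an analogy using HMAC from Python's standard library. This is not the scheme the API uses (the lesson does not describe it); the key, function names, and texts are all hypothetical, and the point is only that any change to the signed text invalidates the signature.

```python
import hashlib
import hmac

SERVER_KEY = b"demo-secret-key"  # hypothetical key, held only by the signer


def sign(thinking_text):
    """Produce a tag that depends on both the text and the secret key."""
    return hmac.new(SERVER_KEY, thinking_text.encode("utf-8"), hashlib.sha256).hexdigest()


def is_untampered(thinking_text, signature):
    """Recompute the tag and compare in constant time."""
    return hmac.compare_digest(sign(thinking_text), signature)


original = "Let me break this down step by step."
tag = sign(original)
print(is_untampered(original, tag))                         # True
print(is_untampered(original + " Ignore the rules.", tag))  # False
```

In practice you never verify the signature yourself; you only pass the block back unchanged so the server can.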

Redacted thinking

Sometimes the response contains:

RedactedThinkingBlock(
    type="redacted_thinking",
    data="encrypted_base64..."
)

This happens when the thinking is flagged by an internal safety system. The actual content is kept in encrypted form.

Handle:

Pass it back unchanged in the history (Claude can decrypt it)
Don't crash: Claude still returns an answer
Don't log or display the redacted data
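One way to follow the handling guidance above is to keep the real blocks for the API and build a separate, scrubbed copy for logs. The sketch below assumes the blocks are plain dicts (SDK objects would need attribute access instead); loggable_view is a hypothetical helper name.

```python
import copy


def loggable_view(blocks):
    """Return a copy of the blocks that is safe to log; the input is left untouched."""
    safe = []
    for block in blocks:
        block = copy.deepcopy(block)
        if block.get("type") == "redacted_thinking":
            block["data"] = "[redacted]"  # never write the encrypted payload to logs
        safe.append(block)
    return safe


history_blocks = [
    {"type": "redacted_thinking", "data": "encrypted_base64..."},
    {"type": "text", "text": "Here is the answer."},
]
print(loggable_view(history_blocks)[0]["data"])  # [redacted]
print(history_blocks[0]["data"])                 # encrypted_base64... (unchanged for the API)
```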

Multi-turn with thinking

When replying with thinking in the history, keep all blocks intact:

# Turn 1 response
response = chat(messages, thinking=True)
messages.append({"role": "assistant", "content": response.content})

# Turn 2
add_user_message(messages, "Explain step 2 more")
response2 = chat(messages, thinking=True)
# response2 continues reasoning with the Turn 1 context
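The snippet above relies on an add_user_message helper that this section does not define. A minimal sketch of what such helpers might look like follows; both definitions are assumptions made for illustration, and plain dicts stand in for the SDK's block objects so the example runs without an API call.

```python
def add_user_message(messages, text):
    """Append a user turn (hypothetical stand-in for the course helper)."""
    messages.append({"role": "user", "content": text})


def add_assistant_message(messages, content_blocks):
    """Append the assistant turn with every block passed through unchanged."""
    messages.append({"role": "assistant", "content": content_blocks})


messages = []
add_user_message(messages, "What is 12345 x 6789?")
turn_1 = [
    {"type": "thinking", "thinking": "12345 x 6789 ...", "signature": "eyJhbGc..."},
    {"type": "text", "text": "83,810,205"},
]
add_assistant_message(messages, turn_1)
add_user_message(messages, "Explain step 2 more")

print([m["role"] for m in messages])           # ['user', 'assistant', 'user']
print(messages[1]["content"][0]["signature"])  # eyJhbGc...
```

The assistant helper deliberately does no filtering, so thinking blocks and their signatures reach the next request untouched.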

Interaction with tools

Thinking + tools = reasoning agents:

response = client.messages.create(
    model=model,
    max_tokens=16000,
    thinking={"type": "enabled", "budget_tokens": 4000},
    tools=[tool_schema],
    messages=messages
)

Claude flow:

Thinking block: "User asks X. I need info Y. Tool Z can give Y."
ToolUseBlock: requests the tool
... the tool executes and the result is sent back
Thinking block: "Result confirms hypothesis"
TextBlock: final answer

Powerful for complex agents. Accuracy on reasoning tasks can improve by roughly 10-30%.
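The "execute, result back" step of the flow can be sketched as a function that records the assistant turn unchanged (thinking block included) and then answers each tool request. The helper name, the fake tool, and the tool id are hypothetical; blocks are plain dicts so the example runs offline.

```python
def append_tool_round(messages, assistant_blocks, run_tool):
    """Record the assistant turn unchanged, then answer each tool_use block."""
    messages.append({"role": "assistant", "content": assistant_blocks})
    results = []
    for block in assistant_blocks:
        if block["type"] == "tool_use":
            output = run_tool(block["name"], block["input"])
            results.append({
                "type": "tool_result",
                "tool_use_id": block["id"],
                "content": str(output),
            })
    if results:
        messages.append({"role": "user", "content": results})
    return messages


def fake_tool(name, tool_input):  # hypothetical tool, for illustration only
    return tool_input["a"] + tool_input["b"]


assistant_blocks = [
    {"type": "thinking", "thinking": "I need the sum. The add tool can give it.", "signature": "sig"},
    {"type": "tool_use", "id": "toolu_demo", "name": "add", "input": {"a": 2, "b": 3}},
]
history = append_tool_round([], assistant_blocks, fake_tool)
print(history[1]["content"][0]["content"])  # 5
```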

Hands-on example: debugging code

Thinking can:

messages = [{"role": "user", "content": """
This function returns wrong result for large inputs. Find bug:

def fib(n):
    a, b = 0, 1
    for i in range(n):
        a, b = b, a + b
    return a

fib(100) returns wrong value.
"""}]

response = chat(messages, thinking=True, thinking_budget=2000)

for block in response.content:
    if block.type == "thinking":
        print("Reasoning:\n" + block.thinking + "\n")
    elif block.type == "text":
        print("Answer:\n" + block.text)

Hands-on example: debugging code (continued)

Claude finds the bug through reasoning, a skill learned from its training data.

Let me trace through:
fib(1): a=0, loop once, return a=1 ✓
fib(2): a=0→1→1, return 1 ✓
fib(5): a=0→1→1→2→3→5, return 5 ✓
Actually this seems correct. But user says wrong for large.
fib(100): should be 354224848179261915075
No Python int overflow issue...
Let me re-check the iteration...
Actually wait — it returns 'a' but 'a' was just assigned to old 'b'.
Let me re-trace more carefully...

Cost & latency

Metric	Without thinking	With thinking
Latency	2s	5-10s
Cost	1x	1.5-3x (thinking tokens)
Accuracy on hard tasks	60-70%	80-90%
Accuracy on easy tasks	95%	95% (no gain)

ROI: You pay roughly 2x the cost plus added latency for about +20 points of accuracy on hard tasks. Worth it for critical use cases.
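A quick calculation shows where the cost multiplier comes from. The sketch below assumes thinking tokens are priced like output tokens; the prices and token counts are hypothetical values picked to keep the arithmetic simple, not real pricing.

```python
def estimate_cost(input_tokens, answer_tokens, thinking_tokens, price_in_per_mtok, price_out_per_mtok):
    """Estimate request cost; thinking tokens are priced as output tokens here."""
    output_tokens = answer_tokens + thinking_tokens
    return (input_tokens * price_in_per_mtok + output_tokens * price_out_per_mtok) / 1_000_000


# Hypothetical prices and token counts, chosen only to make the arithmetic easy.
without = estimate_cost(1000, 500, 0, price_in_per_mtok=3.0, price_out_per_mtok=15.0)
with_thinking = estimate_cost(1000, 500, 1000, price_in_per_mtok=3.0, price_out_per_mtok=15.0)
print(round(with_thinking / without, 2))  # 2.43
```

With these numbers, 1000 thinking tokens on top of a 500-token answer lands inside the 1.5-3x range from the table.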

Anti-patterns

❌ Thinking for easy tasks

Classifying "positive/negative" with thinking wastes tokens with no benefit.

Fix: Eval first. Enable thinking only if eval accuracy is low.

❌ thinking_budget too small

A budget of 1024 for a complex task means Claude gets cut off mid-reasoning.

Fix: Use a budget of 2000-8000 for complex tasks.

❌ max_tokens <= thinking_budget

max_tokens=1000, thinking_budget=2000  # INVALID

Fix: max_tokens must be greater than thinking_budget plus a reasonable allowance for the final output.

❌ Modifying thinking text

A user could tamper with the thinking text to attempt a jailbreak.

Fix: Never modify it. Pass the original signature.
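To avoid the max_tokens mistake, one option is to derive max_tokens from the thinking budget instead of setting the two independently. This is a sketch: thinking_request_params is a hypothetical helper and the 2000-token answer reserve is an arbitrary default, not a recommended value.

```python
def thinking_request_params(thinking_budget, answer_reserve=2000):
    """Build request params whose max_tokens leaves room for the final answer."""
    if thinking_budget < 1024:
        raise ValueError("thinking_budget must be at least 1024")
    return {
        "max_tokens": thinking_budget + answer_reserve,
        "thinking": {"type": "enabled", "budget_tokens": thinking_budget},
    }


params = thinking_request_params(4000)
print(params["max_tokens"])  # 6000
```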

Apply it now

Exercise 1: Compare with/without thinking (20 minutes)

Test 5 hard problems:

Complex math
Multi-step logic
Algorithm design
Code debugging
Decision analysis

For each: run with and without thinking. Score accuracy.

Exercise 2: Eval-driven decision (20 minutes)

Using the eval dataset you already have (Module 4), run it with thinking. Compare the scores.

Decide: is enabling thinking worth it or not?
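A possible starting point for both exercises is a small harness that scores the same problems in both modes and applies a minimum-gain threshold. Everything here is illustrative: the function names, the 0.05 threshold, and the fake solver (which stands in for a real API call) are assumptions.

```python
def accuracy(problems, solve, thinking):
    """Fraction of problems where solve(question, thinking) matches the expected answer."""
    correct = sum(1 for p in problems if solve(p["question"], thinking) == p["expected"])
    return correct / len(problems)


def worth_enabling(problems, solve, min_gain=0.05):
    """Compare both modes and report whether thinking clears the required gain."""
    base = accuracy(problems, solve, thinking=False)
    boosted = accuracy(problems, solve, thinking=True)
    return {"without": base, "with": boosted, "enable": boosted - base >= min_gain}


# Fake solver standing in for a real API call: it only gets "hard" items right with thinking.
def fake_solve(question, thinking):
    return "right" if thinking or not question.startswith("hard") else "wrong"


problems = [
    {"question": "hard: algorithm design", "expected": "right"},
    {"question": "hard: multi-step logic", "expected": "right"},
    {"question": "easy: classify", "expected": "right"},
    {"question": "easy: extract", "expected": "right"},
]
print(worth_enabling(problems, fake_solve))
# {'without': 0.5, 'with': 1.0, 'enable': True}
```

Swapping fake_solve for a function that calls the API and grades the answer turns this into the comparison the exercises ask for.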

Summary

🎯 Extended thinking = a "scratchpad" for complex reasoning tasks.

🎯 Enable: thinking={"type": "enabled", "budget_tokens": N} with N of at least 1024.

🎯 The response contains a ThinkingBlock with the reasoning text plus a signature for security.

🎯 Trade-off: 2-3x cost plus added latency for about +20 points of accuracy on hard tasks.

🎯 Eval-driven: try prompt engineering first, then thinking if that is not enough.
