Extended thinking: a scratchpad for Claude

6. Advanced features | Intermediate | 20 minutes

Ask Claude a complex problem:

What you will learn
  • Explain extended thinking and when to use it
  • Enable thinking in an API call
  • Handle thinking blocks in the response
  • Understand redacted thinking and signature security
  • Evaluate the cost/latency/quality trade-off

When to use it?

✅ Use cases:

  • Complex math/logic
  • Multi-step reasoning
  • Hard code generation (new algorithms)
  • Decision making with trade-offs
  • Eval accuracy still below target after prompt optimization

❌ Skip when:

  • The task is simple (classify, extract, translate)
  • Latency is critical (thinking adds 2-10s)
  • Cost is critical (thinking tokens are billed)

Rule: try prompt engineering first. If the eval score is still too low, enable thinking.

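Enabling thinking in the API

Rules

  • thinking_budget must be at least 1024
  • max_tokens > thinking_budget (output = thinking + final answer)
  • Not compatible with: message prefill, temperature != 1.0

# Assumes `client` (an anthropic.Anthropic instance) and `model` are already defined.
def chat(messages, thinking=False, thinking_budget=1024, max_tokens=8000):
    params = {
        "model": model,
        "max_tokens": max_tokens,  # must be greater than thinking_budget
        "messages": messages,
    }

    if thinking:
        # Fail early instead of sending a request the API will reject
        if max_tokens <= thinking_budget:
            raise ValueError("max_tokens must be greater than thinking_budget")
        params["thinking"] = {
            "type": "enabled",
            "budget_tokens": thinking_budget,
        }

    return client.messages.create(**params)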
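
The rules in this section can be checked locally before a request is sent. The sketch below is one possible way to do that; `validate_thinking_params` and its argument names are hypothetical helpers for illustration, not part of the SDK.

```python
def validate_thinking_params(max_tokens, thinking_budget, temperature=1.0, has_prefill=False):
    """Return a list of rule violations (an empty list means the request looks valid)."""
    problems = []
    if thinking_budget < 1024:
        problems.append("thinking_budget must be at least 1024")
    if max_tokens <= thinking_budget:
        problems.append("max_tokens must be greater than thinking_budget")
    if temperature != 1.0:
        problems.append("temperature must stay at 1.0 when thinking is enabled")
    if has_prefill:
        problems.append("message prefill cannot be combined with thinking")
    return problems


# A valid configuration, then the invalid one from the anti-patterns list
print(validate_thinking_params(max_tokens=8000, thinking_budget=2000))
print(validate_thinking_params(max_tokens=1000, thinking_budget=2000))
```

Returning a list rather than raising on the first problem makes it easy to report every violation at once.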

Response structure

The response includes an additional ThinkingBlock:

response.content = [
    ThinkingBlock(
        type="thinking",
        thinking="Let me break this down step by step. 12345 × 6789...",
        signature="eyJhbGc..."  # cryptographic signature
    ),
    TextBlock(
        type="text",
        text="83,810,205"
    )
]

Extract

for block in response.content:
    if block.type == "thinking":
        print(f"Reasoning: {block.thinking}")
    elif block.type == "text":
        print(f"Answer: {block.text}")
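
The same loop can be wrapped in a small helper that returns the reasoning and the answer separately. This is a minimal sketch: `split_blocks` is a hypothetical name, and the `SimpleNamespace` objects only mimic the shape shown above so the example runs without an API call.

```python
from types import SimpleNamespace


def split_blocks(content):
    """Separate reasoning text from the final answer text."""
    reasoning, answer = [], []
    for block in content:
        if block.type == "thinking":
            reasoning.append(block.thinking)
        elif block.type == "text":
            answer.append(block.text)
        # Other block types (for example redacted_thinking) are skipped here
    return "\n".join(reasoning), "".join(answer)


# Stand-in content; a real response.content comes from the API
fake_content = [
    SimpleNamespace(type="thinking", thinking="12345 x 6789 ...", signature="sig"),
    SimpleNamespace(type="text", text="83,810,205"),
]
reasoning, answer = split_blocks(fake_content)
print(answer)
```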

Signature and security

Each thinking block carries a cryptographic signature. Reasons:

  • A developer could tamper with the thinking text when it is sent back
  • The signature ensures integrity
  • It prevents "jailbreaks" through modified reasoning

Best practice: do not modify the thinking text. Pass the signature back unchanged in multi-turn conversations.

Redacted thinking

Sometimes the response contains:

RedactedThinkingBlock(
    type="redacted_thinking",
    data="encrypted_base64..."
)

This happens when the thinking is flagged by an internal safety system. The real content is kept in encrypted form.

How to handle it:

  • Pass the block back unchanged in the history (Claude decrypts it)
  • Don't crash: Claude still returns an answer
  • Don't log or display the redacted data
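
One way to follow the "don't log redacted data" advice is to build log-safe summaries of the blocks. The sketch below assumes the blocks have already been converted to plain dicts; `loggable_blocks` is a hypothetical helper name.

```python
def loggable_blocks(content):
    """Build log-safe summaries that never include redacted data or signatures."""
    summaries = []
    for block in content:
        block_type = block["type"]
        if block_type == "redacted_thinking":
            # Record that the block exists, but drop the encrypted payload
            summaries.append({"type": block_type, "note": "content omitted"})
        elif block_type == "thinking":
            summaries.append({"type": block_type, "chars": len(block["thinking"])})
        elif block_type == "text":
            summaries.append({"type": block_type, "text": block["text"]})
        else:
            summaries.append({"type": block_type})
    return summaries


example = [
    {"type": "redacted_thinking", "data": "encrypted_base64..."},
    {"type": "text", "text": "Here is the answer."},
]
print(loggable_blocks(example))
```

The original blocks stay untouched, so they can still be passed back in the history exactly as received.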

Multi-turn with thinking

When replying with thinking in the history, keep all blocks unchanged:

# Turn 1: keep every block (thinking, signature, text) exactly as returned
response = chat(messages, thinking=True)
messages.append({"role": "assistant", "content": response.content})

# Turn 2 (add_user_message is assumed to be a helper defined elsewhere)
add_user_message(messages, "Explain step 2 more")
response2 = chat(messages, thinking=True)
# response2 continues reasoning with the Turn 1 context

Interaction with tools

Thinking plus tools gives you agents that reason between tool calls:

response = client.messages.create(
    model=model,
    max_tokens=16000,
    thinking={"type": "enabled", "budget_tokens": 4000},
    tools=[tool_schema],
    messages=messages
)

Claude's flow:

  • Thinking block: "User asks X. I need info Y. Tool Z can give Y."
  • ToolUseBlock: requests the tool
  • ... the tool executes and the result is sent back
  • Thinking block: "Result confirms hypothesis"
  • TextBlock: final answer

This is powerful for complex agents, with accuracy gains of roughly 10-30% on reasoning tasks.
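
The flow in this section can be sketched as a loop that runs requested tools and sends the results back. Here `create_message` is a hypothetical callable that wraps `client.messages.create` with thinking and tools enabled, and `tool_functions` is an assumed mapping from tool names to Python functions. The assistant content is appended unchanged so thinking blocks and their signatures stay intact.

```python
def run_with_tools(create_message, messages, tool_functions, max_rounds=5):
    """Call the model until it stops requesting tools, then return the final response."""
    for _ in range(max_rounds):
        response = create_message(messages)
        # Keep thinking and tool_use blocks exactly as returned
        messages.append({"role": "assistant", "content": response.content})
        if response.stop_reason != "tool_use":
            return response
        results = []
        for block in response.content:
            if block.type == "tool_use":
                output = tool_functions[block.name](**block.input)
                results.append({
                    "type": "tool_result",
                    "tool_use_id": block.id,
                    "content": str(output),
                })
        # Tool results go back to the model as a user message
        messages.append({"role": "user", "content": results})
    raise RuntimeError("tool loop did not finish within max_rounds")
```

Injecting `create_message` keeps the loop testable with a fake client; `max_rounds` is a safety limit chosen for this sketch.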

Worked example: code debugging

Thinking can help with a request like this:

messages = [{"role": "user", "content": """
This function returns wrong result for large inputs. Find bug:

def fib(n):
    a, b = 0, 1
    for i in range(n):
        a, b = b, a + b
    return a

fib(100) returns wrong value.
"""}]

response = chat(messages, thinking=True, thinking_budget=2000)

for block in response.content:
    if block.type == "thinking":
        print("Reasoning:\n" + block.thinking + "\n")
    elif block.type == "text":
        print("Answer:\n" + block.text)

Sample thinking output. Claude investigates the bug report by reasoning step by step, a skill learned from training data:

Let me trace through:
fib(1): a=0, loop once, return a=1 ✓
fib(2): a=0→1→1, return 1 ✓
fib(5): a=0→1→1→2→3→5, return 5 ✓
Actually this seems correct. But user says wrong for large.
fib(100): should be 354224848179261915075
No Python int overflow issue...
Let me re-check the iteration...
Actually wait — it returns 'a' but 'a' was just assigned to old 'b'.
Let me re-trace more carefully...

Cost & latency

                          Without thinking   With thinking
Latency                   2s                 5-10s
Cost                      1x                 1.5-3x (thinking tokens)
Accuracy on hard tasks    60-70%             80-90%
Accuracy on easy tasks    95%                95% (no gain)

ROI: you pay roughly 2x the cost plus added latency for about +20% accuracy on hard tasks. Worth it for critical use cases.
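
The cost multiplier can be estimated with simple arithmetic, since thinking tokens are billed as output tokens. The prices and token counts below are placeholder assumptions for illustration only; substitute the current prices for your model.

```python
def estimate_cost(input_tokens, answer_tokens, thinking_tokens=0,
                  price_in_per_mtok=3.0, price_out_per_mtok=15.0):
    """Estimate request cost in dollars (prices are per million tokens, placeholders)."""
    output_tokens = answer_tokens + thinking_tokens  # thinking is billed as output
    total = input_tokens * price_in_per_mtok + output_tokens * price_out_per_mtok
    return total / 1_000_000


base = estimate_cost(input_tokens=1000, answer_tokens=500)
with_thinking = estimate_cost(input_tokens=1000, answer_tokens=500, thinking_tokens=1000)
print(f"cost multiplier: {with_thinking / base:.2f}x")
```

With these assumed numbers the multiplier lands inside the 1.5-3x range from the table; a larger thinking budget pushes it higher.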

Anti-patterns

❌ Thinking for easy tasks

A "positive/negative" classification with thinking burns tokens for no benefit.

Fix: eval first. Enable thinking only if eval accuracy is low.

❌ thinking_budget too small

A 1024 budget on a complex task cuts Claude off mid-reasoning.

Fix: use a budget of 2000-8000 for complex tasks.

❌ max_tokens <= thinking_budget

max_tokens=1000, thinking_budget=2000  # INVALID

Fix: max_tokens must be greater than thinking_budget plus a reasonable final output.

❌ Modifying the thinking text

Tampered thinking text can be used to attempt a jailbreak.

Fix: never modify it. Pass the original signature back.

Apply it now

Exercise 1: Compare with/without thinking (20 minutes)

Test 5 hard problems:

  • Complex math
  • Multi-step logic
  • Algorithm design
  • Code debug
  • Decision analysis

For each one, run it with and without thinking, then score accuracy.

Exercise 2: Eval-driven decision (20 minutes)

Take the eval dataset you already have (Module 4) and run it with thinking. Compare the scores.

Decide whether enabling thinking is worth it.
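
Both exercises come down to scoring two variants on the same dataset. The sketch below is one possible harness: `solve_plain` and `solve_thinking` are hypothetical callables that take a question and return the model's answer as a string, and the `min_gain` threshold is an assumption to tune for your use case.

```python
def accuracy(solve, dataset):
    """Fraction of (question, expected) pairs where solve(question) matches expected."""
    correct = sum(1 for question, expected in dataset
                  if solve(question).strip() == expected)
    return correct / len(dataset)


def compare(solve_plain, solve_thinking, dataset, min_gain=0.05):
    """Score both variants and report whether thinking clears the required gain."""
    plain = accuracy(solve_plain, dataset)
    thinking = accuracy(solve_thinking, dataset)
    return {
        "plain": plain,
        "thinking": thinking,
        "enable_thinking": thinking - plain >= min_gain,
    }
```

Exact string matching is the simplest grader; harder problems may need a more tolerant comparison.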

Summary

🎯 Extended thinking = a "scratchpad" for complex reasoning tasks.

🎯 Enable: thinking={"type": "enabled", "budget_tokens": 1024+}.

🎯 The response contains a ThinkingBlock with the reasoning text plus a signature for integrity.

🎯 Trade-off: 1.5-3x cost plus added latency, for about +20% accuracy on hard tasks.

🎯 Eval-driven: try prompt engineering first, then thinking if that is not enough.
