Caching is opt-in: you must explicitly mark which content to cache with the cache_control field.
- Set cache breakpoints on text blocks, the system prompt, tools, and messages
- Understand cache ordering: tools → system → messages
- Know that any content change invalidates the cache
- Use multiple breakpoints (max 4) for multi-tier caching
Cache breakpoint syntax
Text block with cache
Shorthand (a plain string as content) does not support caching:
# ❌ Not cacheable
{"role": "user", "content": "Long text..."}
Longhand (a list of content blocks) does:
# ✅ Cacheable
{
    "role": "user",
    "content": [
        {
            "type": "text",
            "text": "Long text...",
            "cache_control": {"type": "ephemeral"}  # ← cache marker
        }
    ]
}
Rule
Everything BEFORE the breakpoint (including the marked block itself) is cached. A follow-up request hits the cache only if its content is identical up to and including the breakpoint.
Cache system prompt
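The system prompt is usually stable, which makes it a prime candidate for caching.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# model, long_system_prompt and messages are assumed to be defined earlier
response = client.messages.create(
    model=model,
    max_tokens=1000,
    system=[  # ← a list of blocks, not a plain string
        {
            "type": "text",
            "text": long_system_prompt,
            "cache_control": {"type": "ephemeral"}
        }
    ],
    messages=messages
)
Cache tools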
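Put the marker on the last tool in the list. All tools up to and including the last one are cached.
# Copy the list so the shared tools definition is not mutated
tools_with_cache = tools.copy()
tools_with_cache[-1] = {**tools[-1], "cache_control": {"type": "ephemeral"}}
response = client.messages.create(
    model=model,
    max_tokens=1000,
    tools=tools_with_cache,
    messages=messages,
)
Cache messages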
A breakpoint on the last block of a message caches the whole history up to and including that message.
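Conversation history is often stored with plain-string content, and a plain string cannot carry cache_control. The sketch below shows a helper that converts the last message to block form and marks it. It returns a copy and leaves the caller's list unchanged. The name `cache_last_message` is hypothetical and is not part of the SDK.

```python
import copy


def cache_last_message(messages):
    """Return a deep copy of `messages` with a cache breakpoint on the last message."""
    result = copy.deepcopy(messages)  # never mutate the caller's history
    last = result[-1]
    if isinstance(last["content"], str):
        # Shorthand string content cannot carry cache_control, so convert it to a text block
        last["content"] = [{"type": "text", "text": last["content"]}]
    # Mark the final block: everything up to and including it becomes the cached prefix
    last["content"][-1]["cache_control"] = {"type": "ephemeral"}
    return result
```

For the conversation below, calling `cache_last_message` on a history whose last turn is `"Second turn"` yields the same structure as the hand-written literal.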
messages = [
    {"role": "user", "content": "First turn"},
    {"role": "assistant", "content": "Response"},
    {"role": "user", "content": [
        {
            "type": "text",
            "text": "Second turn",
            "cache_control": {"type": "ephemeral"}
        }
    ]}
]
Ordering matters
Claude processes a request in this order:
- Tools
- System prompt
- Messages
Cache breakpoints follow the same order: a breakpoint caches everything that comes before it in this sequence.
Example
Users type a different query each time, so the messages differ, but the tools + system cache still hits.
Optimal: tools + system are stable → cache them. The user query is dynamic → leave it uncached.
# Setup (system must use the list-of-blocks form, not a plain string)
tools[-1]["cache_control"] = {"type": "ephemeral"}  # breakpoint 1
system[0]["cache_control"] = {"type": "ephemeral"}  # breakpoint 2
# Cached: tools + system prompt
# Not cached: messages (the user query varies)
Invalidation
Changing a single character → cache MISS.
The cache requires a strictly identical match.
Implications
- Don't interpolate dynamic content into the system prompt
- Parameter substitution breaks the cache
- Whitespace matters
- Content order matters (e.g., the order of tools)
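A single changed character forces a miss. It can therefore help to fingerprint the prefix you intend to cache and log whenever the fingerprint changes between requests. The sketch below is a local debugging aid under that assumption. It is not how the API computes cache matches, and `prefix_fingerprint` is a hypothetical name.

```python
import hashlib
import json


def prefix_fingerprint(tools, system):
    """SHA-256 of the serialized tools + system prefix, for local change detection only."""
    # Key order is kept as-is (no sort_keys): reordered keys can also change what is sent
    payload = json.dumps({"tools": tools, "system": system}, ensure_ascii=False)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


v1 = prefix_fingerprint([], "You are a helpful assistant.")
v2 = prefix_fingerprint([], "You are a helpful assistant")  # missing period
print(v1 == v2)  # False: this prefix would not hit the cache
```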
# Request 1
system_v1 = "You are a helpful assistant."
# Cache created
# Request 2
system_v2 = "You are a helpful assistant"  # Missing period!
# Cache MISS: the full prompt is reprocessed
Best practice
Cache the stable portion and accept that the dynamic portion stays uncached.
# ✅ Keep the static portion separate
STABLE_SYSTEM = "You are a helpful assistant..."  # 3000 tokens, cached
# Dynamic context goes in the user message (not cached)
user_msg = f"Current user: {user_id}\n\n{question}"
Multiple breakpoints (max 4)
Multiple breakpoints add flexibility for multi-tier caching.
Partial cache hit:
- Tools match → read from cache
- System matches → read from cache
- First 10 messages match → read from cache
- Newer messages → processed fresh
# Breakpoint 1: tools
tools[-1]["cache_control"] = {"type": "ephemeral"}
# Breakpoint 2: system
system[0]["cache_control"] = {"type": "ephemeral"}
# Breakpoint 3: stable history (first 10 messages; index 9 is the 10th)
# Assumes messages[9]["content"] is a list of blocks, not a plain string
messages[9]["content"][-1]["cache_control"] = {"type": "ephemeral"}
# No breakpoint in newer messages (they change)
Cross-message caching
A single breakpoint spans every message before it:
Placing it on the fifth message caches all 5 messages. This is useful as the base of a long conversation.
messages = [
    {"role": "user", "content": "Question 1"},
    {"role": "assistant", "content": "Answer 1"},
    {"role": "user", "content": "Question 2"},
    {"role": "assistant", "content": "Answer 2"},
    {"role": "user", "content": [
        {
            "type": "text",
            "text": "Question 3",
            "cache_control": {"type": "ephemeral"}
        }
    ]}
]
Cache control on image/PDF
Document and image blocks support cache_control as well:
Use case: a 50K-token PDF queried many times → cache the PDF and save most of the repeated input cost.
{
    "type": "document",
    # data must be a base64 string, e.g. base64.standard_b64encode(pdf_bytes).decode("utf-8")
    "source": {"type": "base64", "media_type": "application/pdf", "data": pdf_b64},
    "cache_control": {"type": "ephemeral"}
}
Example: Production setup
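3 breakpoints: tools, system, PDF. On the first request, writing the cache costs 25% more than normal input tokens. Every later query on the same PDF (within the cache lifetime) reads the cached prefix at a 90% discount.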
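The quick arithmetic below shows how those two percentages combine. It assumes a 5-minute cache write costs 1.25× the base input price and a cache read costs 0.10×; check current pricing for your model. It also assumes every query sends an identical prefix before the cache expires. `average_prefix_cost` is a hypothetical helper, and it ignores the uncached user question.

```python
def average_prefix_cost(n_queries, write_multiplier=1.25, read_multiplier=0.10):
    """Average per-query cost of a cached prefix, relative to sending it uncached (= 1.0).

    Assumes the first query writes the cache and every later query reads it.
    """
    if n_queries < 1:
        raise ValueError("n_queries must be at least 1")
    total = write_multiplier + (n_queries - 1) * read_multiplier
    return total / n_queries


for n in (1, 2, 10):
    print(n, round(average_prefix_cost(n), 3))
# 1 1.25
# 2 0.675
# 10 0.215
```

A single query costs more than not caching at all. From the second query onward, the cached prefix is already cheaper on average.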
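import base64

# other_tools, last_tool and SYSTEM_PROMPT are assumed to be defined at module level.
# conversation_history: earlier turns as full {"role": ..., "content": ...} messages
# (assumed to start and end with an assistant turn so roles keep alternating).
def build_cached_request(user_question, conversation_history, pdf_bytes):
    # The API expects base64 text, not raw bytes
    pdf_b64 = base64.standard_b64encode(pdf_bytes).decode("utf-8")
    return {
        "model": "claude-sonnet-5-20260205",
        "max_tokens": 2000,
        # Tools cached (breakpoint 1)
        "tools": [
            *other_tools,
            {**last_tool, "cache_control": {"type": "ephemeral"}}
        ],
        # System cached (breakpoint 2)
        "system": [{
            "type": "text",
            "text": SYSTEM_PROMPT,
            "cache_control": {"type": "ephemeral"}
        }],
        "messages": [
            # PDF cached (breakpoint 3): placed in the first user turn, ahead of the history
            {
                "role": "user",
                "content": [
                    {
                        "type": "document",
                        "source": {"type": "base64", "media_type": "application/pdf", "data": pdf_b64},
                        "cache_control": {"type": "ephemeral"}
                    }
                ]
            },
            # History (could add a breakpoint if stable)
            *conversation_history,
            # User question: dynamic, no cache
            {"role": "user", "content": user_question}
        ]
    }
Cache ordering bugs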
Gotcha 1: Tool reordering
Fix: keep the tool order consistent across requests.
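One way to make the order deterministic is to sort tools by name before marking the last one. This is an assumption of the sketch: any fixed order works, as long as it never changes. `stable_tool_order` is a hypothetical helper.

```python
def stable_tool_order(tools):
    """Sort tool definitions by name, then put the breakpoint on the last one."""
    ordered = sorted(tools, key=lambda tool: tool["name"])
    # Mark after sorting so the breakpoint always lands on the same tool;
    # a new dict is created so the caller's tool definitions are not mutated
    ordered[-1] = {**ordered[-1], "cache_control": {"type": "ephemeral"}}
    return ordered
```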
# Request 1
tools = [tool_a, tool_b, tool_c]  # last = tool_c, carries the breakpoint
# Request 2
tools = [tool_c, tool_a, tool_b]  # order changed!
# Cache MISS: the tools prefix no longer matches
Gotcha 2: Text whitespace
Fix: Normalize whitespace.
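The sketch below is one minimal normalization, using the hypothetical helper `normalize_prompt`. It trims the ends, collapses runs of spaces and tabs, and strips trailing spaces on each line. It only helps if the same function runs on every request, including the first. The sketch assumes that collapsing internal spaces is acceptable for your prompt; it would, for example, alter indented code embedded in the prompt.

```python
import re


def normalize_prompt(text):
    """Trim the ends, collapse runs of spaces/tabs, and drop trailing spaces per line."""
    lines = [re.sub(r"[ \t]+", " ", line).rstrip() for line in text.strip().splitlines()]
    return "\n".join(lines)
```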
# Request 1
system = "You are helpful."
# Request 2
system = "You are helpful. "  # trailing space
# Cache MISS
Gotcha 3: Param interpolation
Fix: move dynamic values out of the cached portion.
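The sketch below shows that split. The system block is identical for every user and carries the breakpoint. The user name travels in the uncached user turn. `STATIC_SYSTEM` and `build_request` are hypothetical. Only the system/messages part of the request is shown, and a real static prompt would need to reach the minimum cacheable length.

```python
# Hypothetical static prompt; a real one must be long enough to be cacheable
STATIC_SYSTEM = "You are a support assistant. Follow the policies below..."


def build_request(user_name, question):
    """Return the system/messages part of a request; model and max_tokens are omitted."""
    return {
        # Identical for every user, so all requests can share one cached prefix
        "system": [
            {"type": "text", "text": STATIC_SYSTEM, "cache_control": {"type": "ephemeral"}}
        ],
        # Per-user values come after the breakpoint, in the uncached user turn
        "messages": [
            {"role": "user", "content": f"User name: {user_name}\n\n{question}"}
        ],
    }
```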
system = f"You are an assistant for {user_name}."
# Each user gets a unique prefix, so the cache is never reused
Anti-patterns
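❌ Caching content under 1024 tokens
A prefix below the minimum cacheable length is simply not cached, so the extra code adds complexity for nothing.
Fix: combine content so the cached prefix reaches at least 1024 tokens.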
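For a rough pre-check before adding a breakpoint, a characters-per-token heuristic is often enough. The sketch below uses about 4 characters per token, which is an assumption that fits English text only approximately. The threshold is the 1024 cited above, though the real minimum depends on the model. For an exact count, the Anthropic SDK offers a token-counting call (`client.messages.count_tokens`). `probably_cacheable` is a hypothetical helper.

```python
# Minimum cited in this lesson; the exact minimum depends on the model
MIN_CACHEABLE_TOKENS = 1024


def probably_cacheable(prefix_text, chars_per_token=4.0, minimum=MIN_CACHEABLE_TOKENS):
    """Rough pre-check using a characters-per-token heuristic (not a real tokenizer)."""
    estimated_tokens = len(prefix_text) / chars_per_token
    return estimated_tokens >= minimum
```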
❌ Dynamic content in a cached block
Templating makes every request unique → cache miss.
Fix: put only the static portion in the cached block.
❌ Forgetting cache_control on follow-up requests
Request 1 sets the breakpoint, request 2 omits it → miss.
Fix: send the same cache_control markers on the same content in every request.
❌ Using all 4 breakpoints by default
Over-fragmentation → complexity without benefit.
Fix: start with 1-2 breakpoints (system, tools). Add more only if needed.
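A small pre-send check can keep breakpoint usage visible and within the limit of 4. The sketch below counts cache_control markers on tools, system blocks and message content blocks, and raises if there are too many. The helper names are hypothetical, and it assumes requests use the dict shapes shown in this lesson.

```python
MAX_BREAKPOINTS = 4  # limit on cache_control markers per request


def _marked(blocks):
    """Count blocks that carry a cache_control marker."""
    return sum(1 for block in blocks if "cache_control" in block)


def count_breakpoints(tools, system, messages):
    """Count cache breakpoints across tools, system blocks and message content blocks."""
    count = _marked(tools)
    if isinstance(system, list):  # a plain-string system prompt cannot carry a marker
        count += _marked(system)
    for message in messages:
        if isinstance(message["content"], list):
            count += _marked(message["content"])
    return count


def check_breakpoints(tools, system, messages):
    """Raise before sending if the request uses more breakpoints than allowed."""
    count = count_breakpoints(tools, system, messages)
    if count > MAX_BREAKPOINTS:
        raise ValueError(f"{count} cache breakpoints; the limit is {MAX_BREAKPOINTS}")
    return count
```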
Apply it now
Exercise 1: Add caching to your code (20 minutes)
Take the existing chatbot (lesson 6.8) → add cache_control to the system prompt.
Log cache_creation_input_tokens and cache_read_input_tokens for every request. Verify the hit rate.
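For logging, the sketch below summarizes a response's `usage` object. It assumes that `input_tokens` counts only the tokens after the last breakpoint, so total input is the sum of the three fields. The hit rate here is an assumed definition: cache-read tokens over total input. `cache_report` is a hypothetical helper. Missing or `None` fields are treated as 0.

```python
def cache_report(usage):
    """Summarize the prompt-caching fields of a Messages API `usage` object."""
    created = getattr(usage, "cache_creation_input_tokens", 0) or 0  # written to cache
    read = getattr(usage, "cache_read_input_tokens", 0) or 0  # served from cache
    uncached = getattr(usage, "input_tokens", 0) or 0  # after the last breakpoint
    total = created + read + uncached
    hit_rate = read / total if total else 0.0
    return {"created": created, "read": read, "uncached": uncached, "hit_rate": hit_rate}

# Usage after each call: print(cache_report(response.usage))
```

On the first request, expect `created` to be large and `read` to be 0. Repeating the same prefix within the cache lifetime should flip those numbers.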
Exercise 2: Multiple breakpoints (30 minutes)
Update it to use 3 breakpoints: tools, system, and the long conversation history (first 10 messages).
Test scenario: a 20-turn conversation. Monitor the cache metrics.
Summary
🎯 Caching is opt-in: add a cache_control: {"type": "ephemeral"} breakpoint.
🎯 Everything BEFORE the breakpoint is cached. Follow-up content must match it exactly.
🎯 Ordering: tools → system → messages. Breakpoints follow that order.
🎯 Max 4 breakpoints for multi-tier caching.
🎯 Invalidation = a single-character difference. Keep the cached portion truly static.