Rules of prompt caching — Breakpoints + invalidation

6. Advanced features · Advanced · 20 minutes

Caching is opt-in. You must explicitly mark which content to cache with the cache_control field.

What you will learn
  • Set cache breakpoints on text, system, tools, and messages
  • Understand cache ordering: tools → system → messages
  • Know which content changes invalidate the cache
  • Use multiple breakpoints (max 4) for multi-tier caching

Cache breakpoint syntax

Text block with cache

The shorthand form (content as a plain string) has no place to attach cache_control:

# ❌ Not cacheable
{"role": "user", "content": "Long text..."}

The longhand form (content as a list of blocks) does:

# ✅ Cacheable
{
    "role": "user",
    "content": [
        {
            "type": "text",
            "text": "Long text...",
            "cache_control": {"type": "ephemeral"}  # ← cache marker
        }
    ]
}

Rule

Everything up to and including the breakpoint is cached. On a follow-up request, the content must be identical up to and including the breakpoint to get a cache hit.
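The conversion from shorthand to longhand can be automated. The following is a minimal sketch, not part of any SDK: `to_cacheable` is a hypothetical helper name, and it assumes messages are plain dicts shaped like the ones in this lesson.

```python
import copy


def to_cacheable(message: dict) -> dict:
    """Return a copy of `message` in longhand form with a breakpoint on its last block."""
    msg = copy.deepcopy(message)  # never mutate the caller's message
    content = msg["content"]
    if isinstance(content, str):
        # Shorthand string -> a single text block
        content = [{"type": "text", "text": content}]
    if not content:
        raise ValueError("message has no content blocks to mark")
    content[-1]["cache_control"] = {"type": "ephemeral"}
    msg["content"] = content
    return msg


shorthand = {"role": "user", "content": "Long text..."}
longhand = to_cacheable(shorthand)
print(longhand["content"][0]["cache_control"])  # {'type': 'ephemeral'}
```

The helper copies its input so the same shorthand message can be reused elsewhere without carrying a marker.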

Cache system prompt

The system prompt is usually stable, which makes it a prime candidate for caching.

response = client.messages.create(
    model=model,
    max_tokens=1000,
    system=[  # ← List, not string
        {
            "type": "text",
            "text": long_system_prompt,
            "cache_control": {"type": "ephemeral"}
        }
    ],
    messages=messages
)

Cache tools

Put the breakpoint on the last tool in the list. Every tool up to and including the last one is then cached.

# Shallow-copy the list so the original tool definitions are not mutated
tools_with_cache = tools.copy()
tools_with_cache[-1] = {**tools[-1], "cache_control": {"type": "ephemeral"}}

response = client.messages.create(
    tools=tools_with_cache,
    ...
)

Cache messages

The whole history up to and including this message is cached.

messages = [
    {"role": "user", "content": "First turn"},
    {"role": "assistant", "content": "Response"},
    {"role": "user", "content": [
        {
            "type": "text",
            "text": "Second turn",
            "cache_control": {"type": "ephemeral"}
        }
    ]}
]
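In a growing conversation, one possible approach is to keep a single message-level breakpoint and move it to the newest message on every turn, so the number of markers never grows. The sketch below assumes dict-shaped messages; `mark_latest_turn` is a hypothetical helper, and whether the previous prefix is still read after the marker moves should be verified against the usage metrics rather than taken from this example.

```python
import copy


def mark_latest_turn(messages: list) -> list:
    """Copy `messages`, drop older message-level markers, and mark the last block of the last message."""
    out = copy.deepcopy(messages)
    for msg in out:
        if isinstance(msg["content"], list):
            for block in msg["content"]:
                block.pop("cache_control", None)  # remove stale markers
    last = out[-1]
    if isinstance(last["content"], str):
        # Shorthand cannot carry cache_control, so convert to longhand first
        last["content"] = [{"type": "text", "text": last["content"]}]
    last["content"][-1]["cache_control"] = {"type": "ephemeral"}
    return out


history = [
    {"role": "user", "content": "First turn"},
    {"role": "assistant", "content": "Response"},
    {"role": "user", "content": [
        {"type": "text", "text": "Second turn", "cache_control": {"type": "ephemeral"}}
    ]},
    {"role": "assistant", "content": "Response 2"},
    {"role": "user", "content": "Third turn"},
]
marked = mark_latest_turn(history)
```

Only `marked` carries the moved breakpoint; `history` is left untouched, so it can keep accumulating turns in shorthand form.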

Ordering matters

Claude processes a request in this order:

  • Tools
  • System prompt
  • Messages

Cache breakpoints respect this order: a change in an earlier section invalidates the cache for every section after it.

Example

# Setup
tools[-1]["cache_control"] = {"type": "ephemeral"}     # breakpoint 1
system[0]["cache_control"] = {"type": "ephemeral"}     # breakpoint 2

# Cached: tools + system prompt
# Non-cached: messages (user query varies)

The user types a different query each time, so the messages differ, but the tools + system cache still hits.

Optimal: tools + system are stable, so cache them. The user query is dynamic, so leave it uncached.
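The effect of the tools → system → messages order can be modelled in a few lines. This is a toy model of prefix matching under that ordering, not the service's actual implementation; `still_cached` is a hypothetical name.

```python
ORDER = ["tools", "system", "messages"]


def still_cached(changed: set) -> list:
    """Sections whose cached prefix remains valid after the sections in `changed` were modified."""
    valid = []
    for section in ORDER:
        if section in changed:
            break  # this section and everything after it must be reprocessed
        valid.append(section)
    return valid


print(still_cached({"messages"}))  # ['tools', 'system']
print(still_cached({"system"}))    # ['tools']
print(still_cached({"tools"}))     # []
```

The last line shows why tool definitions deserve the most care: editing them discards every cached section behind them.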

Invalidation

A single changed character → cache MISS.

The cache requires a strictly identical match.

# Request 1
system_v1 = "You are a helpful assistant."
# Cache created

# Request 2
system_v2 = "You are a helpful assistant"  # Missing period!
# Cache MISS: the full prompt is reprocessed

Implications

  • Don't interpolate dynamic content into the system prompt
  • Parameter substitution breaks the cache
  • Whitespace matters
  • Comment order matters

Best practice

Cache the stable portion and accept that the dynamic portion stays uncached.

# ✅ Keep the static portion separate
STABLE_SYSTEM = "You are a helpful assistant..."  # 3000 tokens, cached

# Dynamic context goes in the user message (not cached)
user_msg = f"Current user: {user_id}\n\n{question}"
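Exact matching can be illustrated locally with a hash. The sketch below uses SHA-256 purely as a stand-in for "identical or not"; it makes no claim about how the service compares prefixes, and `prefix_fingerprint` is a hypothetical helper.

```python
import hashlib


def prefix_fingerprint(text: str) -> str:
    """Hex digest that changes whenever a single character of `text` changes."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


a = prefix_fingerprint("You are a helpful assistant.")
b = prefix_fingerprint("You are a helpful assistant")    # missing period
c = prefix_fingerprint("You are a helpful assistant. ")  # trailing space

print(a == b, a == c)  # False False
```

Logging such a fingerprint next to each request is one cheap way to notice that a supposedly static prompt has drifted between deployments.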

Multiple breakpoints (max 4)

More breakpoints give more flexibility for multi-tier caching:

# Breakpoint 1: tools
tools[-1]["cache_control"] = {"type": "ephemeral"}

# Breakpoint 2: system
system[0]["cache_control"] = {"type": "ephemeral"}

# Breakpoint 3: stable history (first 10 messages)
# Assumes messages[9]["content"] is a list of blocks (longhand), not a string
messages[9]["content"][-1]["cache_control"] = {"type": "ephemeral"}

# No breakpoint in newer messages (they change)

Partial cache hit:

  • Tools match → read cache
  • System match → read cache
  • First 10 messages match → read cache
  • Newer messages → process fresh
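Because the limit is 4 breakpoints per request, a small guard can catch accidental over-marking before a request is sent. This is a sketch with hypothetical helper names, assuming tools, system blocks, and messages are plain dicts as in this lesson.

```python
MAX_BREAKPOINTS = 4  # limit stated in this lesson


def count_breakpoints(tools: list, system, messages: list) -> int:
    """Count cache_control markers across tools, system blocks, and message blocks."""
    count = sum(1 for tool in tools if "cache_control" in tool)
    if isinstance(system, list):  # a plain-string system prompt cannot carry a marker
        count += sum(1 for block in system if "cache_control" in block)
    for msg in messages:
        if isinstance(msg["content"], list):
            count += sum(1 for block in msg["content"] if "cache_control" in block)
    return count


def check_breakpoints(tools: list, system, messages: list) -> int:
    """Raise ValueError when the request carries more markers than allowed."""
    n = count_breakpoints(tools, system, messages)
    if n > MAX_BREAKPOINTS:
        raise ValueError(f"{n} breakpoints found, maximum is {MAX_BREAKPOINTS}")
    return n


marker = {"type": "ephemeral"}
demo_tools = [{"name": "search"}, {"name": "lookup", "cache_control": marker}]
demo_system = [{"type": "text", "text": "Static instructions", "cache_control": marker}]
demo_messages = [
    {"role": "user", "content": [{"type": "text", "text": "Old turn", "cache_control": marker}]},
    {"role": "assistant", "content": "Reply"},
    {"role": "user", "content": "Newest question"},
]
print(check_breakpoints(demo_tools, demo_system, demo_messages))  # 3
```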

Cross-message caching

A breakpoint covers every message before it:

messages = [
    {"role": "user", "content": "Question 1"},
    {"role": "assistant", "content": "Answer 1"},
    {"role": "user", "content": "Question 2"},
    {"role": "assistant", "content": "Answer 2"},
    {"role": "user", "content": [
        {
            "type": "text",
            "text": "Question 3",
            "cache_control": {"type": "ephemeral"}
        }
    ]}
]

The cache covers all 5 messages. Useful as the base of a long conversation.

Cache control on image/PDF

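Document and image blocks support cache_control:

{
    "type": "document",
    # For a base64 source, "data" must be the base64-encoded PDF as a string, not raw bytes
    "source": {"type": "base64", "media_type": "application/pdf", "data": pdf_bytes},
    "cache_control": {"type": "ephemeral"}
}

Use case: a 50K-token PDF queried multiple times. Cache the PDF once and save substantially on every later query.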
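Building the document block from raw file bytes takes one encoding step. The helper below is a minimal sketch; `pdf_document_block` is a hypothetical name, and the block layout is copied from the example in this lesson.

```python
import base64


def pdf_document_block(pdf_bytes: bytes, cache: bool = True) -> dict:
    """Wrap raw PDF bytes in a document block, optionally with a cache breakpoint."""
    block = {
        "type": "document",
        "source": {
            "type": "base64",
            "media_type": "application/pdf",
            # JSON cannot carry raw bytes, so encode to a base64 string
            "data": base64.standard_b64encode(pdf_bytes).decode("ascii"),
        },
    }
    if cache:
        block["cache_control"] = {"type": "ephemeral"}
    return block


block = pdf_document_block(b"%PDF-1.4 placeholder bytes")
print(type(block["source"]["data"]).__name__)  # str
```

Base64 encoding is deterministic, so the same file always yields the same string, which is what an identical-prefix match needs.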

Example: Production setup

3 breakpoints: tools, system, PDF. The first request pays roughly a 25% premium on the cached tokens (the cache write); every subsequent query on the same PDF reads them at roughly a 90% discount.

def build_cached_request(user_question, conversation_history, pdf_bytes):
    return {
        "model": "claude-sonnet-5-20260205",
        "max_tokens": 2000,
        
        # Tools cached
        "tools": [
            *other_tools,
            {**last_tool, "cache_control": {"type": "ephemeral"}}
        ],
        
        # System cached
        "system": [{
            "type": "text",
            "text": SYSTEM_PROMPT,
            "cache_control": {"type": "ephemeral"}
        }],
        
        "messages": [
            {
                "role": "user",
                "content": [
                    # PDF cached ("data" must be a base64-encoded string)
                    {
                        "type": "document",
                        "source": {"type": "base64", "media_type": "application/pdf", "data": pdf_bytes},
                        "cache_control": {"type": "ephemeral"}
                    },
                    # History as content blocks, not role/content messages (could add breakpoint if stable)
                    *conversation_history,
                    # User question: dynamic, no cache
                    {"type": "text", "text": user_question}
                ]
            }
        ]
    }
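The savings can be estimated with simple arithmetic. The sketch below assumes the multipliers quoted in this lesson (a 25% premium on the first request, a 90% discount afterwards), counts only the cached portion of the prompt, and assumes every later request actually hits the cache; `relative_cost` is a hypothetical helper.

```python
WRITE_MULTIPLIER = 1.25  # first request: 25% premium on the cached tokens
READ_MULTIPLIER = 0.10   # later requests: 90% discount on the cached tokens


def relative_cost(cached_tokens: int, n_requests: int) -> tuple:
    """Return (with_cache, without_cache) in base-input-token equivalents."""
    if n_requests < 1:
        raise ValueError("n_requests must be at least 1")
    without_cache = cached_tokens * n_requests
    with_cache = cached_tokens * (WRITE_MULTIPLIER + READ_MULTIPLIER * (n_requests - 1))
    return with_cache, without_cache


with_cache, without_cache = relative_cost(50_000, 10)
print(round(with_cache), without_cache)  # 107500 500000
```

For a 50K-token PDF queried 10 times, the cached portion costs the equivalent of about 107,500 base tokens instead of 500,000. With a single request the cache costs more than it saves (62,500 versus 50,000), so caching pays off from the second hit onward.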

Cache ordering bugs

Gotcha 1: Tools reorder

# Request 1
tools = [tool_a, tool_b, tool_c]  # last = tool_c cached

# Request 2
tools = [tool_c, tool_a, tool_b]  # order changed!
# Cache MISS: the tool sequence no longer matches the cached prefix

Fix: Consistent ordering.

Gotcha 2: Text whitespace

# Request 1
system = "You are helpful."

# Request 2
system = "You are helpful. "  # trailing space
# Cache MISS

Fix: Normalize whitespace.

Gotcha 3: Param interpolation

system = f"You are assistant for {user_name}."
# Each user = unique cache, useless

Fix: Move dynamic content out of the cached portion.
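The first two fixes (consistent ordering, normalized whitespace) can be applied mechanically before a request is built. This is a sketch with hypothetical helper names; it assumes each tool definition is a dict with a "name" key.

```python
def stable_tools(tools: list) -> list:
    """Sort tool definitions by name so the order never depends on how the list was assembled."""
    return sorted(tools, key=lambda tool: tool["name"])


def normalize_prompt(text: str) -> str:
    """Strip trailing whitespace on every line and blank space around the whole prompt."""
    return "\n".join(line.rstrip() for line in text.strip().splitlines())


tool_a, tool_b, tool_c = {"name": "a"}, {"name": "b"}, {"name": "c"}
print(stable_tools([tool_c, tool_a, tool_b]) == stable_tools([tool_a, tool_b, tool_c]))  # True
print(normalize_prompt("You are helpful. ") == normalize_prompt("You are helpful."))     # True
```

Sort the tools first and attach cache_control to the last one afterwards; marking before sorting could leave the breakpoint in the middle of the list.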

Anti-patterns

❌ Caching content < 1024 tokens

Breakpoints on content below the minimum cacheable length are ignored, so the extra code buys nothing. The minimum depends on the model (1024 tokens on many models, higher on some).

Fix: Make the combined cacheable block ≥ 1024 tokens.

❌ Dynamic content in a cached block

Templating makes every request unique → cache miss.

Fix: Keep only the static portion in the cached block.

❌ Forgetting cache_control on the follow-up

Request 1 with cache_control, request 2 without → miss.

Fix: Put cache_control on the same blocks in every request.

❌ Using all 4 breakpoints by default

Over-fragmentation adds complexity without benefit.

Fix: Start with 1-2 breakpoints (system, tools). Add more only if needed.

Apply it now

Exercise 1: Add caching to your code (20 minutes)

Take the existing chatbot (lesson 6.8) and add cache_control to the system prompt.

Log cache_creation_input_tokens and cache_read_input_tokens for each request. Verify the hit rate.
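A possible starting point for the logging step is sketched below. It reads the two fields named above from a usage object; the `input_tokens` field (tokens that were neither written to nor read from the cache) is an assumption not covered in this lesson, and `cache_stats` is a hypothetical helper. A `SimpleNamespace` stands in for `response.usage` so the example runs without an API call.

```python
from types import SimpleNamespace


def cache_stats(usage) -> dict:
    """Summarize cache metrics from a usage object; missing or None fields count as 0."""
    created = getattr(usage, "cache_creation_input_tokens", 0) or 0
    read = getattr(usage, "cache_read_input_tokens", 0) or 0
    uncached = getattr(usage, "input_tokens", 0) or 0  # assumed field name
    total = created + read + uncached
    return {
        "created": created,
        "read": read,
        "uncached": uncached,
        "hit_rate": read / total if total else 0.0,
    }


# Stand-in for response.usage on a follow-up request
usage = SimpleNamespace(input_tokens=50, cache_creation_input_tokens=0, cache_read_input_tokens=3000)
stats = cache_stats(usage)
print(f"hit rate: {stats['hit_rate']:.1%}")  # hit rate: 98.4%
```

On the very first request, expect `created` to be large and `read` to be 0; from the second request on, the two should swap.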

Exercise 2: Multi-breakpoint (30 minutes)

Update it to use 3 breakpoints: tools, system, long conversation history (first 10 messages).

Test scenario: a 20-turn conversation. Monitor the cache metrics.

Summary

🎯 Caching is opt-in: a cache_control: {type: "ephemeral"} marker sets a breakpoint.

🎯 Everything up to and including the breakpoint is cached. Follow-up requests must match that content exactly.

🎯 Ordering: tools → system → messages. Breakpoints respect this order.

🎯 Max 4 breakpoints for multi-tier caching.

🎯 A single character difference invalidates the cache. Keep the cached portion truly static.
