A user asks a question about a document you provided, and Claude answers. But:
- Is Claude making things up (hallucinating)?

You will learn to:
- Enable citations for PDF and plain-text documents
- Handle citation metadata in the response
- Build a UI with interactive citations
- Understand the difference between document citations and web_search citations
Enable citations
Modify the document block by adding two fields:
- title: a human-readable name
- citations set to {"enabled": True}, which turns on citation tracking
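{
    "type": "document",
    "source": {
        "type": "base64",
        "media_type": "application/pdf",
        "data": file_bytes,  # must be a base64-encoded string, not raw bytes
    },
    "title": "earth.pdf",            # ← Add
    "citations": {"enabled": True},  # ← Add
}
Response structure
Without citations, the answer is plain text.
With citations enabled, the answer is split into several text blocks, and each block that relies on the document carries its own citations list: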
For plain text (characters)
response.content = [
    TextBlock(
        type="text",
        text="Earth's atmosphere formed over billions of years",
        citations=[
            CitationCharLocation(
                type="char_location",   # plain-text source
                cited_text="The atmosphere formed 4.5 billion years ago...",
                document_index=0,       # 0-indexed among the request's documents
                document_title="earth.txt",
                start_char_index=1234,  # 0-indexed
                end_char_index=1380,    # exclusive
            )
        ],
    ),
]
For PDF (pages)
CitationPageLocation(
    type="page_location",
    cited_text="...",
    document_index=0,
    document_title="earth.pdf",
    start_page_number=3,  # 1-indexed
    end_page_number=4,    # exclusive: covers page 3 only
)
Extract citations
def extract_with_citations(response):
    citations = []
    for block in response.content:
        if block.type != "text":
            continue
        print(block.text, end="")
        # Citations are attached to the text block they support (None if uncited)
        for c in block.citations or []:
            citations.append({
                "cited_text": c.cited_text,
                "document": c.document_title,
                "page": getattr(c, "start_page_number", None),      # PDFs (page_location)
                "char_start": getattr(c, "start_char_index", None),  # plain text (char_location)
            })
    print("\n=== Citations ===")
    for c in citations:
        print(f"- {c['cited_text'][:80]}... (page {c['page']})")
    return citations
Plain text document
For a plain text file (not a PDF), send the raw string as the source.
Citations then return character indices (start_char_index, end_char_index) instead of page numbers.
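Because these offsets index into the exact string you sent, a plain-text citation can be checked mechanically. The sketch below assumes each citation is available as a plain dict shaped like the API's char_location JSON; check_char_citation and the sample text are hypothetical, and both sides are whitespace-stripped in case a span boundary includes a trailing space.

```python
def check_char_citation(source_text: str, citation: dict) -> bool:
    """Return True if the cited span of source_text matches cited_text."""
    # Only char_location citations carry character offsets
    if citation.get("type") != "char_location":
        raise ValueError("expected a char_location citation")
    start = citation["start_char_index"]  # 0-indexed
    end = citation["end_char_index"]      # exclusive
    # Strip whitespace: the span may include a trailing space after the sentence
    return source_text[start:end].strip() == citation["cited_text"].strip()


# Hypothetical document text and citation
article_text = "Intro sentence. The grass is green. The sky is blue."
citation = {
    "type": "char_location",
    "cited_text": "The grass is green.",
    "document_index": 0,
    "document_title": "article_v1",
    "start_char_index": 16,
    "end_char_index": 36,
}
print(check_char_citation(article_text, citation))
```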
{
    "type": "document",
    "source": {
        "type": "text",
        "media_type": "text/plain",
        "data": article_text,  # raw string
    },
    "title": "article_v1",
    "citations": {"enabled": True},
}
UI pattern: Interactive citations
Render the answer with inline markers in the frontend:
"Earth's atmosphere formed over billions of years [1].
Oxygen levels rose significantly around 2.4 billion years ago [2]."
Hovering over [1] shows the source:
┌────────────────────────────────────────┐
│ earth.pdf, page 3                      │
│ "The atmosphere formed 4.5 billion     │
│ years ago from volcanic outgassing..." │
└────────────────────────────────────────┘
Clicking [1] opens the PDF at page 3 with the cited text highlighted.
Implementation
def render_with_citations(blocks, citations_list):
    output = ""
    citation_num = 0
    for block in blocks:
        if block.type != "text":
            continue
        output += block.text
        # One [n] marker per citation attached to this text block
        for c in block.citations or []:
            citation_num += 1
            output += f" [{citation_num}]"
            citations_list.append({
                "num": citation_num,
                "text": c.cited_text,
                "page": getattr(c, "start_page_number", None),  # None for plain-text sources
            })
    return output, citations_list
Example: Contract Q&A
A paralegal can verify the answer immediately, without reading all 50 pages.
import base64

import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

with open("service_agreement.pdf", "rb") as f:
    pdf_bytes = base64.standard_b64encode(f.read()).decode("utf-8")

messages = [{
    "role": "user",
    "content": [
        {
            "type": "document",
            "source": {"type": "base64", "media_type": "application/pdf", "data": pdf_bytes},
            "title": "service_agreement.pdf",
            "citations": {"enabled": True},
        },
        {
            "type": "text",
            "text": "What is the termination notice period?",
        },
    ],
}]

response = client.messages.create(
    model="claude-sonnet-5-20260205",
    max_tokens=1000,
    messages=messages,
)
# Output:
# "The contract requires 30-day written notice [1] for termination."
# Citations: [1] "Either party may terminate this agreement with thirty (30) days..." (page 7)
Multi-document citations
Pass multiple documents and Claude cites the right one.
Each citation carries document_index=0 (v1) or document_index=1 (v2), so keep track of which document each citation points to.
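A minimal sketch of that tracking, assuming the citations have already been collected as plain dicts (for example from the raw JSON response); group_by_document and the sample snippets are hypothetical. Since document_index follows the order of the document blocks in the request, a parallel list of titles maps indices back to names.

```python
from collections import defaultdict


def group_by_document(citations: list[dict]) -> dict[int, list[str]]:
    """Map each document_index to the cited_text snippets taken from that document."""
    grouped = defaultdict(list)
    for c in citations:
        grouped[c["document_index"]].append(c["cited_text"])
    return dict(grouped)


# Hypothetical citations from a v1-vs-v2 comparison answer
citations = [
    {"document_index": 0, "document_title": "contract_v1.pdf", "cited_text": "Notice period: 30 days."},
    {"document_index": 1, "document_title": "contract_v2.pdf", "cited_text": "Notice period: 60 days."},
    {"document_index": 0, "document_title": "contract_v1.pdf", "cited_text": "Section 5.2 applies to renewals."},
]

# document_index follows the order of the document blocks in the request
titles = ["contract_v1.pdf", "contract_v2.pdf"]
for index, snippets in sorted(group_by_document(citations).items()):
    print(titles[index], snippets)
```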
messages = [{
    "role": "user",
    "content": [
        {"type": "document", "source": {...}, "title": "contract_v1.pdf", "citations": {"enabled": True}},
        {"type": "document", "source": {...}, "title": "contract_v2.pdf", "citations": {"enabled": True}},
        {"type": "text", "text": "What changed in section 5.2 between v1 and v2?"},
    ],
}]
Document vs Web search citations
The two can be combined: upload your corporate documents (document citations) and use web search to fact-check external claims.
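| | Document citations | Web search citations |
|---|---|---|
| Source | Your uploaded documents | Internet pages |
| Control | You choose the content | Claude runs the searches |
| Reliability | As good as your curation | Depends on source quality |
| Block type | char_location / page_location / content_block_location entries in a text block's citations list | web_search_tool_result blocks plus web_search_result_location citations |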
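When both are enabled, the citation type tells you which column a citation belongs to. A minimal sketch over response content as plain dicts; split_citations and the sample content are hypothetical, and the type names are the documented ones as far as we know, so check them against the current API reference.

```python
# Citation type names per source (assumed current; verify against the API reference)
DOCUMENT_CITATION_TYPES = {"char_location", "page_location", "content_block_location"}
WEB_CITATION_TYPES = {"web_search_result_location"}


def split_citations(content: list[dict]) -> tuple[list[dict], list[dict]]:
    """Separate document citations from web search citations in a response's content list."""
    doc_cites: list[dict] = []
    web_cites: list[dict] = []
    for block in content:
        # Only text blocks carry citations; tool blocks such as web_search_tool_result are skipped
        if block.get("type") != "text":
            continue
        for c in block.get("citations") or []:
            if c["type"] in DOCUMENT_CITATION_TYPES:
                doc_cites.append(c)
            elif c["type"] in WEB_CITATION_TYPES:
                web_cites.append(c)
    return doc_cites, web_cites


# Hypothetical response content mixing an uploaded policy and a web source
content = [
    {"type": "text", "text": "Our policy grants 20 days of leave", "citations": [
        {"type": "page_location", "cited_text": "Employees receive 20 days...",
         "document_index": 0, "document_title": "hr_policy.pdf",
         "start_page_number": 2, "end_page_number": 3}]},
    {"type": "web_search_tool_result", "tool_use_id": "srvtoolu_example", "content": []},
    {"type": "text", "text": ", above the statutory minimum", "citations": [
        {"type": "web_search_result_location", "url": "https://example.com/labor-law",
         "title": "Labor law summary", "cited_text": "The statutory minimum is..."}]},
]
docs, web = split_citations(content)
print(len(docs), len(web))
```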
When to use citations
✅ Use when
- Users need to verify information (legal, medical, financial)
- You are building trust in AI-assisted research
- Content requires source attribution
- Compliance requires an audit log
⚠️ Skip when
- Casual chatbots
- Generation tasks (creative writing)
- Quick internal queries where the user already knows the source
Anti-patterns
❌ Enabling citations but not rendering them
Citations arrive in the response, but if you don't display them the user never sees them and the value is lost.
Fix: the UI layer must extract and render them.
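As a sketch of that rendering step, the citations_list built by render_with_citations can be turned into footnote lines under the answer. format_footnotes and its max_len truncation are hypothetical choices; page is None for plain-text sources, so the page suffix is omitted in that case.

```python
def format_footnotes(citations_list: list[dict], max_len: int = 80) -> str:
    """Render one footnote line per [n] marker: quoted snippet plus page if known."""
    lines = []
    for c in citations_list:
        text = c["text"]
        # Truncate long snippets so footnotes stay readable
        snippet = text if len(text) <= max_len else text[:max_len].rstrip() + "..."
        where = f", page {c['page']}" if c.get("page") is not None else ""
        lines.append(f'[{c["num"]}] "{snippet}"{where}')
    return "\n".join(lines)


# Hypothetical entries in the shape render_with_citations appends
citations_list = [
    {"num": 1, "text": "The atmosphere formed 4.5 billion years ago.", "page": 3},
    {"num": 2, "text": "Oxygen rose sharply about 2.4 billion years ago.", "page": None},
]
print(format_footnotes(citations_list))
```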
❌ Using citations with only one small document
Over-engineering: a simple text prompt is enough.
Fix: citations shine with documents of 50+ pages and with multi-document comparisons.
❌ Assuming Claude cites every statement
Claude cites only when it directly references a document; general-knowledge statements stay uncited.
Fix: if you need strict coverage, prompt explicitly: "cite every fact-based claim".
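To see whether a stricter prompt actually worked, you can list the text blocks that came back without any citations. A minimal sketch over the response content as plain dicts; uncited_text, its min_chars filter (for skipping short connective fragments), and the sample content are hypothetical.

```python
def uncited_text(content: list[dict], min_chars: int = 1) -> list[str]:
    """Return the stripped text of text blocks that carry no citations."""
    out = []
    for block in content:
        if block.get("type") == "text" and not block.get("citations"):
            text = block["text"].strip()
            # Skip fragments shorter than min_chars (e.g. "According to the document,")
            if len(text) >= min_chars:
                out.append(text)
    return out


# Hypothetical response content
content = [
    {"type": "text", "text": "According to the document, "},
    {"type": "text", "text": "the notice period is 30 days", "citations": [
        {"type": "page_location", "cited_text": "...", "document_index": 0,
         "document_title": "service_agreement.pdf",
         "start_page_number": 7, "end_page_number": 8}]},
    {"type": "text", "text": ". This is a common industry standard."},
]
print(uncited_text(content, min_chars=30))
```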
Apply it now
Exercise 1: Cited Q&A (15 minutes)
Upload any PDF (a research paper, a manual). Ask 3 questions and observe the citations.
Verify that the citations are accurate by opening the PDF at each cited page.
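For this check, note that page_location citations use 1-indexed pages with an exclusive end_page_number, so start_page_number=3, end_page_number=4 means page 3 only. A minimal sketch, assuming that documented convention and citations as plain dicts; cited_pages is a hypothetical helper.

```python
def cited_pages(citation: dict) -> list[int]:
    """Pages covered by a page_location citation (1-indexed, end exclusive)."""
    # Plain-text (char_location) citations have no page numbers
    if citation.get("type") != "page_location":
        return []
    return list(range(citation["start_page_number"], citation["end_page_number"]))


print(cited_pages({"type": "page_location", "start_page_number": 7, "end_page_number": 9}))
```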
Exercise 2: Multi-document cited comparison (20 minutes)
Take 2 documents on a similar topic (for example, 2 articles about the same event).
Ask Claude to compare them and check that it attributes each claim to the correct document.
Summary
🎯 Citations = trust through transparency: users verify claims via cited_text and the page.
🎯 Enable: "citations": {"enabled": True} in the document block.
🎯 Text blocks in the response carry a citations list with cited_text, document_title, and a page or character range.
🎯 UI pattern: inline [1] markers; hovering shows the source.
🎯 Essential for legal, medical, financial, and research domains.