Citations: Trust through transparency

6 · Advanced features · Intermediate · 15 min

A user asks about a document you provided, and Claude answers. But did Claude make something up (hallucinate)?

What you will learn
  • Enable citations for PDF / text documents
  • Handle citation metadata in the response
  • Build a UI with interactive citations
  • Understand the difference between document citations and web_search citations

Enable citations

Add two fields to the document block:

  • title: a readable name (it comes back as document_title on each citation)
  • "citations": {"enabled": True}: turns on citation tracking
{
    "type": "document",
    "source": {
        "type": "base64",
        "media_type": "application/pdf",
        "data": pdf_b64,  # base64-encoded str, not raw bytes
    },
    "title": "earth.pdf",           # ← Add
    "citations": {"enabled": True}   # ← Add
}
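Reading the file and encoding it is an easy place to slip (the API expects a base64 string, not raw bytes). A minimal sketch of a helper that builds the block above; make_pdf_document_block is a hypothetical name, not part of the Anthropic SDK:

```python
import base64


def make_pdf_document_block(pdf_bytes: bytes, title: str) -> dict:
    # Hypothetical helper: wrap raw PDF bytes in a citations-enabled document block
    # The API wants the base64 payload as a str, so decode the encoded bytes
    data = base64.standard_b64encode(pdf_bytes).decode("utf-8")
    return {
        "type": "document",
        "source": {"type": "base64", "media_type": "application/pdf", "data": data},
        "title": title,  # returned later as document_title on each citation
        "citations": {"enabled": True},  # opt in to citation tracking
    }


# Usage: read the file once, then build the block
# with open("earth.pdf", "rb") as f:
#     block = make_pdf_document_block(f.read(), "earth.pdf")
```

Taking bytes instead of a path keeps the helper free of file I/O, so it also works for uploads that never touch disk.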

Response structure

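Without citations, Claude returns plain text.

With citations enabled, the answer is split into several text blocks, and each block that draws on the document carries a citations list: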

For plain text documents (character ranges)

response.content = [
    TextBlock(
        type="text",
        text="Earth's atmosphere formed over billions of years",
        citations=[  # citations are attached to the text block they support
            CitationCharLocation(
                type="char_location",
                cited_text="The atmosphere formed 4.5 billion years ago...",
                document_index=0,
                document_title="earth.txt",
                start_char_index=1234,  # 0-indexed
                end_char_index=1380,    # exclusive
            )
        ],
    ),
    # ...more TextBlocks, with or without citations
]

For PDF documents (pages)

CitationPageLocation(
    type="page_location",
    cited_text="...",
    document_index=0,
    document_title="earth.pdf",
    start_page_number=3,  # 1-indexed
    end_page_number=4     # exclusive: this citation covers page 3 only
)
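To show a location next to a claim, these fields have to become a human-readable label. A sketch over citations as plain JSON dicts, assuming (as Anthropic's citations docs describe) that start_page_number is 1-indexed and end_page_number is exclusive; describe_location is a hypothetical helper:

```python
def describe_location(citation: dict) -> str:
    """Build a short label such as 'earth.pdf, page 3' for one citation dict."""
    title = citation.get("document_title") or f"document {citation['document_index']}"
    if citation["type"] == "page_location":
        first = citation["start_page_number"]   # 1-indexed
        last = citation["end_page_number"] - 1  # end_page_number is exclusive
        pages = f"page {first}" if first == last else f"pages {first}-{last}"
        return f"{title}, {pages}"
    if citation["type"] == "char_location":
        # Slice notation: start is inclusive, end is exclusive
        return f"{title}, chars {citation['start_char_index']}:{citation['end_char_index']}"
    return title  # other citation types: fall back to the document name
```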

Extract citations

def extract_with_citations(response):
    citations = []
    for block in response.content:
        if block.type != "text":
            continue
        print(block.text, end="")
        # Citations hang off the text block they support (None if unsupported)
        for cit in block.citations or []:
            citations.append({
                "cited_text": cit.cited_text,
                "document": cit.document_title,
                "page": getattr(cit, "start_page_number", None),      # PDFs only
                "char_start": getattr(cit, "start_char_index", None)  # plain text only
            })

    print("\n=== Citations ===")
    for c in citations:
        print(f"- {c['cited_text'][:80]}... (page {c['page']})")
    return citations
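If you call the Messages API over HTTP instead of through the Python SDK, the response is plain JSON and the same walk works on dicts. A sketch; collect_citations is a hypothetical helper, and the sample payload is illustrative only, trimmed to the fields used here:

```python
import json


def collect_citations(response_json: dict) -> list:
    """Flatten every citation in a raw Messages API response into simple rows."""
    rows = []
    for block in response_json.get("content", []):
        if block.get("type") != "text":
            continue  # skip tool-use and other non-text blocks
        # "citations" is missing or null on text blocks with no source backing
        for cit in block.get("citations") or []:
            rows.append({
                "claim": block["text"],
                "cited_text": cit.get("cited_text"),
                "document": cit.get("document_title"),
                "page": cit.get("start_page_number"),       # PDFs only
                "char_start": cit.get("start_char_index"),  # plain text only
            })
    return rows


# Illustrative payload, trimmed to the fields used above
sample = json.loads("""
{"content": [
  {"type": "text", "text": "According to the document, "},
  {"type": "text", "text": "the atmosphere formed 4.5 billion years ago",
   "citations": [{"type": "page_location",
                  "cited_text": "The atmosphere formed 4.5 billion years ago...",
                  "document_index": 0, "document_title": "earth.pdf",
                  "start_page_number": 3, "end_page_number": 4}]},
  {"type": "text", "text": "."}
]}
""")

for row in collect_citations(sample):
    print(f"{row['claim']!r} <- {row['document']}, page {row['page']}")
```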

Plain text documents

For a text file (not a PDF), pass the raw string as a text source.

Citations then return character indices (start_char_index, end_char_index; 0-indexed, end exclusive) instead of page numbers:

{
    "type": "document",
    "source": {
        "type": "text",
        "media_type": "text/plain",
        "data": article_text  # raw string
    },
    "title": "article_v1",
    "citations": {"enabled": True}
}
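Because character indices point into the string you sent, you can spot-check a citation by slicing your own copy of the text. A sketch assuming cited_text is quoted verbatim and the indices are 0-indexed with an exclusive end; char_citation_matches is a hypothetical helper, and a mismatch is a cue to inspect the citation by hand, not proof that it is wrong:

```python
def char_citation_matches(document_text: str, citation: dict) -> bool:
    """Check that a char_location citation points at the text it claims to quote."""
    if citation.get("type") != "char_location":
        return False  # only character-range citations can be checked this way
    start = citation["start_char_index"]  # 0-indexed
    end = citation["end_char_index"]      # exclusive, like a Python slice
    return document_text[start:end] == citation["cited_text"]


# Usage: article_text is the same string you sent as the document's data
# ok = char_citation_matches(article_text, citation_dict)
```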

UI pattern: Interactive citations

Frontend render:

"Earth's atmosphere formed over billions of years [1].
 Oxygen levels rose significantly around 2.4 billion years ago [2]."

Hover [1]:

┌────────────────────────────────────────┐
│ earth.pdf, page 3                      │
│ "The atmosphere formed 4.5 billion     │
│  years ago from volcanic outgassing..."│
└────────────────────────────────────────┘

Click [1] → open the PDF at page 3 and highlight the cited text.

Implementation

def render_with_citations(blocks, citations_list):
    output = ""
    citation_num = 0

    for block in blocks:
        if block.type != "text":
            continue
        output += block.text
        # Each text block lists the citations that support it
        for cit in block.citations or []:
            citation_num += 1
            output += f" [{citation_num}]"
            citations_list.append({
                "num": citation_num,
                "text": cit.cited_text,
                # Only page_location citations (PDFs) have page numbers
                "page": getattr(cit, "start_page_number", None)
            })

    return output, citations_list
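For a web frontend, the same walk can emit HTML: the title attribute gives a native hover tooltip, and a data-page attribute (PDF citations only) gives a click handler a page to open. A sketch over JSON dicts; render_html, the cite class, and data-page are hypothetical names your frontend would define:

```python
import html


def render_html(content: list) -> str:
    """Render response text blocks as HTML with numbered, hoverable citation markers."""
    parts = []
    num = 0
    for block in content:
        if block.get("type") != "text":
            continue
        parts.append(html.escape(block["text"]))
        for cit in block.get("citations") or []:
            num += 1
            # title= gives the browser's native hover tooltip with the quoted source
            tip = html.escape(f'{cit.get("document_title")}: "{cit["cited_text"]}"')
            page = cit.get("start_page_number")
            # data-page (PDF citations only) lets a click handler open the cited page
            page_attr = f' data-page="{page}"' if page is not None else ""
            parts.append(f'<sup class="cite"{page_attr} title="{tip}">[{num}]</sup>')
    return "".join(parts)
```

Escaping both the answer text and the quoted source matters here: cited text comes from documents you may not fully control.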

Example: Contract Q&A

A paralegal can verify the answer immediately, without reading all 50 pages.

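import base64

import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

with open("service_agreement.pdf", "rb") as f:
    pdf_bytes = base64.standard_b64encode(f.read()).decode("utf-8")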

messages = [{
    "role": "user",
    "content": [
        {
            "type": "document",
            "source": {"type": "base64", "media_type": "application/pdf", "data": pdf_bytes},
            "title": "service_agreement.pdf",
            "citations": {"enabled": True}
        },
        {
            "type": "text",
            "text": "What is the termination notice period?"
        }
    ]
}]

response = client.messages.create(
    model="claude-sonnet-5-20260205",
    max_tokens=1000,
    messages=messages
)

# Output:
# "The contract requires 30-day written notice [1] for termination."
# Citations: [1] "Either party may terminate this agreement with thirty (30) days..." — page 7

Multi-document citations

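Pass multiple documents and Claude cites the correct one:

Each citation carries document_index=0 (v1) or document_index=1 (v2), the position of that document in the request. Use it to attribute each claim to the right version.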

messages = [{
    "role": "user",
    "content": [
        {"type": "document", "source": {...}, "title": "contract_v1.pdf", "citations": {"enabled": True}},
        {"type": "document", "source": {...}, "title": "contract_v2.pdf", "citations": {"enabled": True}},
        {"type": "text", "text": "What changed in section 5.2 between v1 and v2?"}
    ]
}]
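To see which version each claim came from, group citations by document_index, using a list of titles in the same order the documents were sent. A sketch; group_by_document is a hypothetical helper and the sample content is made up:

```python
from collections import defaultdict


def group_by_document(content: list, titles: list) -> dict:
    """Map each document title to the cited snippets drawn from it."""
    grouped = defaultdict(list)
    for block in content:
        for cit in block.get("citations") or []:
            # document_index is the position of the document among those sent
            grouped[titles[cit["document_index"]]].append(cit["cited_text"])
    return dict(grouped)


# Made-up response content for illustration
content = [
    {"type": "text", "text": "v1 allows 30 days",
     "citations": [{"document_index": 0, "cited_text": "thirty (30) days"}]},
    {"type": "text", "text": " while v2 allows 60 days",
     "citations": [{"document_index": 1, "cited_text": "sixty (60) days"}]},
    {"type": "text", "text": "."},
]
print(group_by_document(content, ["contract_v1.pdf", "contract_v2.pdf"]))
```

Grouping by index rather than by document_title still works when two uploads share a title.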

Document vs Web search citations

Combine both: upload your corporate docs with citations enabled, and add web search to fact-check external claims.

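|               | Document citations | Web search citations |
|---------------|--------------------|----------------------|
| Source        | Your uploaded docs | Internet pages |
| Control       | You choose the content | Claude runs the searches |
| Reliability   | You curate it | Depends on source quality |
| Citation type | char_location / page_location, attached to text blocks | web_search_result_location, attached to text blocks (results arrive in web_search_tool_result blocks) |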
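When you combine both sources, the citation type tells you where a claim came from. A sketch over JSON dicts; source_label is a hypothetical helper, and it assumes web citations carry url and title fields as described in Anthropic's web search tool docs:

```python
def source_label(citation: dict) -> str:
    """Say whether a citation points at an uploaded document or a web page."""
    kind = citation.get("type")
    if kind == "web_search_result_location":
        # Web citations carry the page URL (and usually a title) instead of a document index
        return f"{citation.get('title') or citation['url']} <{citation['url']}>"
    if kind in ("char_location", "page_location"):
        name = citation.get("document_title") or "untitled"
        return f"{name} (uploaded doc #{citation['document_index']})"
    return "unknown source"
```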

When to use citations

✅ Use when

  • Users need to verify info (legal, medical, financial)
  • Building trust for AI-assisted research
  • Content requires source attribution
  • Audit-log compliance

⚠️ Skip when

  • Casual chatbot
  • Generation tasks (creative writing)
  • Internal quick queries where the user already knows the source

Anti-patterns

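❌ Enabling citations but not rendering them

The citations are in the response, but if you don't display them, the user never sees them and the feature loses its value.

Fix: The UI layer must extract and render them.

❌ Using citations for a single small document

This is over-engineering; a simple text prompt is enough.

Fix: Save citations for long documents (50+ pages) and multi-document comparisons, where they shine.

❌ Assuming Claude cites every statement

Claude cites only when it draws directly on the document. Statements from general knowledge are left uncited.

Fix: If you need strict coverage, say so explicitly in the prompt, e.g. "cite every fact-based claim".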
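To find statements that went uncited, scan for text blocks without citations and ignore short connective fragments. A sketch; uncited_segments, the 20-character threshold, and the prompt wording are assumptions to tune, not an official check:

```python
# Hypothetical wording; pass it as system=STRICT_CITATION_PROMPT to client.messages.create
STRICT_CITATION_PROMPT = (
    "Answer only from the provided documents and cite every fact-based claim."
)


def uncited_segments(content: list, min_chars: int = 20) -> list:
    """Return text blocks long enough to be claims but carrying no citations."""
    flagged = []
    for block in content:
        if block.get("type") != "text" or block.get("citations"):
            continue  # non-text blocks and cited blocks are fine
        text = block["text"].strip()
        # Short connective fragments ("Per the contract, ") are expected to be uncited
        if len(text) >= min_chars:
            flagged.append(text)
    return flagged
```

Flagged segments are candidates for review, not errors: some may be harmless framing, others may be general knowledge the user should not trust as document-backed.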

Try it now

Exercise 1: Cited Q&A (15 min)

Upload any PDF (research paper, manual). Ask 3 questions and observe the citations.

Verify that the citations are accurate by opening the PDF at the cited page.

Exercise 2: Multi-doc cited comparison (20 min)

Take 2 documents on a similar topic (e.g. 2 articles about the same event).

Ask Claude to compare them. Check that Claude attributes each point to the correct document.

Summary

🎯 Citations = trust through transparency. Users verify claims via cited_text plus the page or character location.

🎯 Enable: "citations": {"enabled": True} in the document block.

🎯 Text blocks in the response carry a citations list; each citation has cited_text, document_title, and a page or character location.

🎯 UI pattern: inline [1] markers; hover to show the source.

🎯 Essential for legal, medical, financial, and research domains.
