Image support: Claude reads images (Vision) - Building with the Claude API

All Claude 4+ models support vision. Send an image together with a question, and Claude analyzes it.

What you will learn

Encode an image as base64 and send it in a message
Know the limits: size, count, dimensions, token cost
Apply prompt engineering to images (chain-of-thought, few-shot)
Build a case study: fire risk assessment from satellite images

Limits

Property	Limit
Images per request	100
Size per image	5 MB
Dimensions (single image)	8000 × 8000 px max
Dimensions (requests with more than 20 images)	2000 × 2000 px max
Format	PNG, JPEG, GIF, WEBP
Source	base64 or URL
Token cost	approximately (width × height) / 750 tokens

Example: a 1000×1000 image costs about 1333 tokens. Budget accordingly.
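
The limits and the token formula above can be checked before a request is sent. The following is a minimal sketch, not part of the Anthropic SDK: the names `estimate_image_tokens` and `check_image_limits` are hypothetical, the 5 MB limit is assumed to mean 5 × 1024 × 1024 bytes, and the caller is assumed to pass the stricter `max_side=2000` when the multi-image dimension limit applies.

```python
MAX_IMAGE_BYTES = 5 * 1024 * 1024  # assumption: "5 MB" interpreted as 5 MiB
TOKENS_DIVISOR = 750               # tokens are roughly (width x height) / 750


def estimate_image_tokens(width: int, height: int) -> int:
    """Approximate input tokens for one image using the (w x h) / 750 rule."""
    return round(width * height / TOKENS_DIVISOR)


def check_image_limits(width: int, height: int, size_bytes: int,
                       max_side: int = 8000) -> list:
    """Return a list of problems found; an empty list means the image passes."""
    problems = []
    if size_bytes > MAX_IMAGE_BYTES:
        problems.append(f"file is {size_bytes} bytes, limit is {MAX_IMAGE_BYTES}")
    if width > max_side or height > max_side:
        problems.append(f"{width}x{height} px exceeds {max_side}x{max_side} px")
    return problems


print(estimate_image_tokens(1000, 1000))            # 1333
print(check_image_limits(9000, 4000, 200_000))      # one dimension problem
```

Returning a list of problems rather than raising makes it easy to report every violation for an image at once.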

Sending an image: Base64

import base64
from anthropic import Anthropic

# The client reads the API key from the ANTHROPIC_API_KEY environment variable.
client = Anthropic()

# Read the file as raw bytes and encode it as a base64 string.
with open("image.png", "rb") as f:
    image_b64 = base64.standard_b64encode(f.read()).decode("utf-8")

messages = [{
    "role": "user",
    "content": [
        {
            "type": "image",
            "source": {
                "type": "base64",
                "media_type": "image/png",  # must match the actual file format
                "data": image_b64
            }
        },
        {
            "type": "text",
            "text": "What do you see in this image?"
        }
    ]
}]

response = client.messages.create(
    model="claude-sonnet-5-20260205",
    max_tokens=1000,
    messages=messages
)

# The first content block of the response holds the text answer.
print(response.content[0].text)
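
When many files need to be sent, the base64 steps above can be wrapped in a small helper. This is a sketch under stated assumptions: `image_block_from_file` and `MEDIA_TYPES` are hypothetical names, and the media type is inferred from the file extension only, so a mislabeled file would still be sent with the wrong type.

```python
import base64
from pathlib import Path

# Extension -> media type for the formats listed in the limits table.
MEDIA_TYPES = {
    ".png": "image/png",
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".gif": "image/gif",
    ".webp": "image/webp",
}


def image_block_from_file(path: str) -> dict:
    """Build an image content block (base64 source) from a local file."""
    p = Path(path)
    suffix = p.suffix.lower()
    if suffix not in MEDIA_TYPES:
        raise ValueError(f"Unsupported image format: {suffix!r}")
    data = base64.standard_b64encode(p.read_bytes()).decode("utf-8")
    return {
        "type": "image",
        "source": {
            "type": "base64",
            "media_type": MEDIA_TYPES[suffix],
            "data": data,
        },
    }
```

The returned dict can be placed directly in the `content` list of a user message, next to a text block.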

Sending an image: URL

The URL must be publicly accessible, because Anthropic fetches the image server-side.

messages = [{
    "role": "user",
    "content": [
        {
            "type": "image",
            "source": {
                "type": "url",
                "url": "https://example.com/image.jpg"
            }
        },
        {"type": "text", "text": "Describe this."}
    ]
}]

Multiple images

Up to 100 images per request. Useful for:

Before/after comparison
Product variants
Batch analysis

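# img1_source and img2_source are placeholders for source dicts
# (base64 or URL) built as in the sections above.
messages = [{
    "role": "user",
    "content": [
        {"type": "image", "source": img1_source},
        {"type": "image", "source": img2_source},
        {"type": "text", "text": "Compare these two images."}
    ]
}]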
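
For more than a couple of images, the content list can be built in a loop. The sketch below is an illustration only: `build_multi_image_content` is a hypothetical helper, and the "Image N:" text labels are an optional convention added here so the question can refer to images by number; the lesson itself does not require them.

```python
def build_multi_image_content(sources: list, question: str,
                              max_images: int = 100) -> list:
    """Interleave 'Image N:' labels with image blocks, then append the question."""
    if not sources:
        raise ValueError("at least one image source is required")
    if len(sources) > max_images:
        raise ValueError(f"{len(sources)} images exceeds the limit of {max_images}")
    content = []
    for i, source in enumerate(sources, start=1):
        content.append({"type": "text", "text": f"Image {i}:"})
        content.append({"type": "image", "source": source})
    content.append({"type": "text", "text": question})
    return content


# Usage with two URL sources (example.com URLs are placeholders).
sources = [
    {"type": "url", "url": "https://example.com/before.jpg"},
    {"type": "url", "url": "https://example.com/after.jpg"},
]
messages = [{
    "role": "user",
    "content": build_multi_image_content(sources, "Compare Image 1 and Image 2."),
}]
```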

Prompting for images

The same rules apply as for text (Module 3):

Clear & direct
Specific
Examples
XML structure

Simple prompt: poor results

text = "How many marbles?"

Claude may miscount in a dense image.

Structured prompt: great results

text = """Count marbles in the image using this methodology:

1. Identify each unique marble one at a time, numbering as you go
2. Verify by counting differently: bottom-left to top-right, row by row

What is the exact count?"""

Accuracy on the counting task rises from 60% to 95%.

Few-shot with images

Use an image as an example:

# reference_source and target_source are placeholders for source dicts
# (base64 or URL) built as in the sections above.
messages = [{
    "role": "user",
    "content": [
        {"type": "image", "source": reference_source},
        {"type": "text", "text": "This image has 12 marbles. The pattern is 3x4 grid."},
        {"type": "image", "source": target_source},
        {"type": "text", "text": "How many marbles in this second image?"}
    ]
}]

Claude learns the pattern from the reference image and applies it to the target image.
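
The structured counting prompt in this section follows a repeatable shape: a task statement, numbered methodology steps, and a closing question. A minimal sketch of a builder for that shape is shown below. The function name `build_structured_prompt` and the `<methodology>` tag are assumptions chosen for illustration, in line with the "XML structure" rule listed above.

```python
def build_structured_prompt(task: str, steps: list, question: str) -> str:
    """Assemble a task line, numbered steps inside an XML tag, and a final question."""
    numbered = "\n".join(f"{i}. {step}" for i, step in enumerate(steps, start=1))
    return f"{task}\n\n<methodology>\n{numbered}\n</methodology>\n\n{question}"


prompt_text = build_structured_prompt(
    task="Count marbles in the image using this methodology:",
    steps=[
        "Identify each unique marble one at a time, numbering as you go",
        "Verify by counting differently: bottom-left to top-right, row by row",
    ],
    question="What is the exact count?",
)
print(prompt_text)
```

Keeping the steps in a list makes it easy to reuse the same builder for other image tasks by swapping in a different methodology.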

Case study: Fire risk assessment

Scenario

An insurance company wants automated fire risk ratings for properties, based on satellite images.

Naive prompt

"Rate fire risk 1-10"

Output: inconsistent, subjective.

Production prompt

prompt = """Analyze satellite image of property with these steps:

1. **Residence identification**: Locate primary residence:
   - Largest roofed structure
   - Typical residential features (driveway, geometry)

2. **Tree overhang analysis**: For trees near residence:
   - Identify branches over roof
   - Estimate % roof coverage (0-25%, 25-50%, 50-75%, 75%+)
   - Note density

3. **Fire risk factors**:
   - Ember catch points
   - Fuel paths from wildland to structure
   - Proximity to chimneys/vents

4. **Defensible space**:
   - Continuous canopy?
   - Fuel ladders (ground → tree → roof)?

5. **Final rating** (1-4):
   - 1 (Low): no overhang, good defensible space
   - 2 (Moderate): <25% overhang, some separation
   - 3 (High): 25-50% overhang, connected canopies
   - 4 (Severe): >50% overhang, dense vegetation

For each item 1-5, write ONE sentence finding. End with rating number.
"""

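# satellite_source is a placeholder for a source dict (base64 or URL)
# built as in the sections above.
messages = [{
    "role": "user",
    "content": [
        {"type": "image", "source": satellite_source},
        {"type": "text", "text": prompt}
    ]
}]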

Output

1. Residence: Single-story ranch home, ~2000 sqft, central.
2. Tree overhang: 5 trees within 10m, 30% roof coverage from SE oak.
3. Fire risk: Eaves exposed, branch touches roof (bridge point).
4. Defensible space: Trees form continuous canopy N→W of house.
5. Rating: 3 (High Risk)

Consistent, auditable, scalable. 1000 properties/day automated.
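
To feed these reports into an automated pipeline, the final rating has to be extracted from the text. The sketch below assumes the model writes the rating in the form `Rating: N`, as in the sample output above; `parse_fire_rating` and `RATING_LABELS` are hypothetical names, and a real deployment would need to handle other phrasings or ask for a stricter output format.

```python
import re

# Labels follow the 1-4 scale defined in the production prompt.
RATING_LABELS = {1: "Low", 2: "Moderate", 3: "High", 4: "Severe"}


def parse_fire_rating(report: str):
    """Return (rating, label) from the last 'Rating: N' in the report, or None."""
    matches = re.findall(r"Rating:\s*([1-4])\b", report, flags=re.IGNORECASE)
    if not matches:
        return None  # missing or out-of-range rating: route to manual review
    rating = int(matches[-1])
    return rating, RATING_LABELS[rating]


sample_report = """1. Residence: Single-story ranch home, ~2000 sqft, central.
2. Tree overhang: 5 trees within 10m, 30% roof coverage from SE oak.
3. Fire risk: Eaves exposed, branch touches roof (bridge point).
4. Defensible space: Trees form continuous canopy N→W of house.
5. Rating: 3 (High Risk)"""

print(parse_fire_rating(sample_report))  # (3, 'High')
```

Returning `None` instead of guessing keeps malformed reports visible, which supports the auditability goal of the case study.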

Case studies by industry

🏥 Healthcare — X-ray second opinion

Prompt: "Identify abnormalities. Rate severity. Flag for radiologist review."

Disclaimer: Not diagnostic. Assists human radiologist.

🏭 Manufacturing — Defect detection

Factory camera + Claude vision → detect visual defects on assembly line.

🏪 Retail — Planogram compliance

Store shelf photos → Claude verifies that products are placed correctly per the planogram.

📄 Legal — Document OCR + analysis

Scanned contract → extract clauses, flag issues. Can work better than traditional OCR on handwriting or poor-quality scans.

🎨 Design — Style match

Brand guidelines image + new creative asset → "Does this match brand style?"

Optimization

Resize before sending

A smaller image means fewer tokens and lower cost. Balance this against the quality the task needs.

from PIL import Image

img = Image.open("large.jpg")
img.thumbnail((1024, 1024))  # resize in place, keeping aspect ratio; longest side at most 1024 px
img.save("resized.jpg", quality=85)

Cost estimation

1000×1000 image = ~1333 input tokens

Cost on Sonnet at $3 per 1M input tokens: 1333 × $3 / 1M ≈ $0.004 per image

Processing 10,000 images/day ≈ $40/day. Reasonable.

Cache common context

If the prompt instructions are long and reused, enable prompt caching (lessons 6.47-6.49): cached reads get a 90% discount.
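
The cost estimation in this section can be reproduced with a few lines of arithmetic. This is a rough sketch: `estimate_daily_cost` is a hypothetical helper, the default price of $3 per million input tokens is the Sonnet figure quoted above, and the result covers image input tokens only, not the text prompt or the output tokens.

```python
def estimate_daily_cost(width: int, height: int, images_per_day: int,
                        usd_per_million_input_tokens: float = 3.0) -> float:
    """Estimate daily spend on image input tokens using (w x h) / 750."""
    tokens_per_image = width * height / 750
    cost_per_image = tokens_per_image * usd_per_million_input_tokens / 1_000_000
    return cost_per_image * images_per_day


daily = estimate_daily_cost(1000, 1000, 10_000)
print(f"${daily:.2f} per day")  # $40.00 per day

# Halving each side cuts the pixel count, and so the cost, to a quarter.
daily_resized = estimate_daily_cost(500, 500, 10_000)
print(f"${daily_resized:.2f} per day after resizing to 500x500")
```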

Anti-patterns

❌ Giant images

5MB raw → slow upload, expensive tokens.

Fix: Resize before. 1024px usually enough.

❌ Low resolution for detail work

256×256 for OCR → text is illegible.

Fix: Use appropriate resolution for task.

❌ Generic "what's this?" prompt

Claude describes the image in general terms, not specifics.

Fix: Guide Claude with a methodology (steps).

❌ Trust numerical count blindly

Claude may undercount dense objects (> 20).

Fix: Use step-by-step prompt + verify pattern.

Apply it now

Exercise 1: Describe 3 images (20 minutes)

Take 3 different images (personal photos). Prompts:

Simple: "What's in this image?"
Structured: Methodology-based
Extract: "List all objects with counts"

Compare output quality.

Exercise 2: Chart analysis (15 minutes)

Take a screenshot of a chart from the web. Ask Claude to:

Extract data points
Identify trend
Suggest 3 insights

Test with a structured prompt.

Summary

🎯 Vision is available on Claude 4+. Send images as base64 or by URL.

🎯 Limits: 100 images/request, 5 MB each; dimension limits depend on how many images are sent.

🎯 Tokens ≈ (w × h) / 750. Resize to save cost.

🎯 Structured prompts → 95% accuracy on counting and analysis tasks.

🎯 Case studies: satellite, medical, retail, legal. Vision unlocks visual workflows.
