All Claude 4+ models support vision: send an image plus a question, and Claude analyzes it.
- Encode an image as base64 and send it in a message
- Know the limits: file size, image count, dimensions, token cost
- Apply prompt engineering to images (chain-of-thought, few-shot)
- Build a case study: fire risk assessment from satellite images
Limits

| Limit | Value |
|---|---|
| Images per request | 100 |
| Size per image | 5 MB |
| Max dimensions (up to 20 images per request) | 8000 × 8000 px |
| Max dimensions (more than 20 images per request) | 2000 × 2000 px |
| Formats | JPEG, PNG, GIF, WebP |
| Source | base64 or URL |
| Token cost | ≈ (width × height) / 750 tokens |

Example: a 1000×1000 image costs about 1,333 tokens (1,000,000 / 750). Budget accordingly.
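The limits above can be checked before a request is sent. This is a minimal pre-flight sketch. It assumes the caller already knows each image's width, height and byte size (for example from Pillow's `Image.size` and `len(data)`). `estimate_tokens` and `check_request` are hypothetical helpers, not SDK functions, and reading "5 MB" as 5 × 1024 × 1024 bytes is an assumption:

```python
# Pre-flight checks against the limits table (hypothetical helpers, not SDK calls).
MAX_IMAGES = 100
MAX_BYTES = 5 * 1024 * 1024   # 5 MB, assumed to mean 5 * 1024 * 1024 bytes
MAX_SIDE_DEFAULT = 8000       # px, up to 20 images per request
MAX_SIDE_MANY = 2000          # px, more than 20 images per request


def estimate_tokens(width: int, height: int) -> int:
    """Approximate input tokens for one image: (width * height) / 750."""
    return round(width * height / 750)


def check_request(images: list[tuple[int, int, int]]) -> list[str]:
    """images: (width_px, height_px, size_bytes) per image. Returns problems found."""
    problems = []
    if len(images) > MAX_IMAGES:
        problems.append(f"{len(images)} images exceeds {MAX_IMAGES} per request")
    # The per-image dimension cap depends on how many images share the request
    max_side = MAX_SIDE_MANY if len(images) > 20 else MAX_SIDE_DEFAULT
    for i, (w, h, size) in enumerate(images, start=1):
        if size > MAX_BYTES:
            problems.append(f"image {i}: {size} bytes exceeds 5 MB")
        if w > max_side or h > max_side:
            problems.append(f"image {i}: {w}x{h} exceeds {max_side}x{max_side} px")
    return problems


print(estimate_tokens(1000, 1000))                          # 1333
print(check_request([(3000, 3000, 1_000_000)]))             # []
print(len(check_request([(3000, 3000, 1_000_000)] * 21)))   # 21
```

A single 3000×3000 image passes. The same image fails once 21 of them share one request, because the cap drops to 2000 px.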
Sending an image: base64
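The example that follows hardcodes `media_type` as `image/png`. If the bytes are really a JPEG (for example, a file renamed to `.png`), the declared type no longer matches the data, and the request can be rejected. This minimal sketch reads the type from the file's leading bytes, using the standard PNG, JPEG, GIF and WebP signatures. `sniff_media_type` is a hypothetical helper, not part of the SDK:

```python
def sniff_media_type(data: bytes) -> str:
    """Return the media_type for PNG, JPEG, GIF or WebP bytes, based on magic numbers."""
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return "image/png"
    if data.startswith(b"\xff\xd8\xff"):
        return "image/jpeg"
    if data[:6] in (b"GIF87a", b"GIF89a"):
        return "image/gif"
    # WebP files are RIFF containers with "WEBP" at byte offset 8
    if data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return "image/webp"
    raise ValueError("unsupported image format (expected PNG, JPEG, GIF or WebP)")


print(sniff_media_type(b"\x89PNG\r\n\x1a\n" + b"\x00" * 8))  # image/png
```

In practice, pass it the raw file bytes before encoding (`sniff_media_type(f.read())`) and use the result as `media_type`.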
```python
import base64
from anthropic import Anthropic

client = Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# Read the file and base64-encode it as a UTF-8 string
with open("image.png", "rb") as f:
    image_data = base64.standard_b64encode(f.read()).decode("utf-8")

messages = [{
    "role": "user",
    "content": [
        {
            "type": "image",
            "source": {
                "type": "base64",
                "media_type": "image/png",  # must match the actual file format
                "data": image_data,
            },
        },
        {
            "type": "text",
            "text": "What do you see in this image?",
        },
    ],
}]

response = client.messages.create(
    model="claude-sonnet-5-20260205",
    max_tokens=1000,
    messages=messages,
)
print(response.content[0].text)
```

Sending an image: URL
The URL must be publicly accessible, because Anthropic fetches the image server-side.
```python
messages = [{
    "role": "user",
    "content": [
        {
            "type": "image",
            "source": {
                "type": "url",
                "url": "https://example.com/image.jpg",  # fetched by Anthropic
            },
        },
        {"type": "text", "text": "Describe this."},
    ],
}]
```

Multiple images
Up to 100 images per request. Useful for:
- Before/after comparison
- Product variants
- Batch analysis
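Anthropic's vision documentation shows a variant where a short text label ("Image 1:", "Image 2:") precedes each image, so the question can refer to the images by label. This is a minimal sketch of building such a content list. `labeled_content` is a hypothetical helper and the URLs are placeholders:

```python
def labeled_content(sources: list[dict], question: str) -> list[dict]:
    """Interleave "Image N:" text labels with image blocks, then append the question."""
    content = []
    for i, source in enumerate(sources, start=1):
        content.append({"type": "text", "text": f"Image {i}:"})
        content.append({"type": "image", "source": source})
    content.append({"type": "text", "text": question})
    return content


before = {"type": "url", "url": "https://example.com/before.jpg"}  # hypothetical URL
after = {"type": "url", "url": "https://example.com/after.jpg"}    # hypothetical URL
messages = [{
    "role": "user",
    "content": labeled_content([before, after], "What changed between Image 1 and Image 2?"),
}]
print(len(messages[0]["content"]))  # 5
```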
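```python
import base64

def b64_source(path, media_type="image/png"):
    # Build a base64 image source dict for one local file
    with open(path, "rb") as f:
        data = base64.standard_b64encode(f.read()).decode("utf-8")
    return {"type": "base64", "media_type": media_type, "data": data}

messages = [{
    "role": "user",
    "content": [
        {"type": "image", "source": b64_source("before.png")},  # hypothetical file names
        {"type": "image", "source": b64_source("after.png")},
        {"type": "text", "text": "Compare these two images."},
    ],
}]
```

Prompting for images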
Same rules as text (Module 3):
- Clear & direct
- Specific
- Examples
- XML structure

Simple prompt: poor results
Claude's count may be wrong on dense images.
```python
text = "How many marbles?"
```

Structured prompt: great results
Accuracy rises from 60% to 95% on the counting task.
```python
text = """Count the marbles in the image using this methodology:
1. Identify each unique marble one at a time, numbering as you go.
2. Verify by counting a different way: bottom-left to top-right, row by row.
What is the exact count?"""
```

Few-shot with images
Use an image as an example. Claude learns the pattern from the reference and applies it to the target.
```python
# reference_source and target_source are image source dicts (base64 or URL),
# built as in the examples above
messages = [{
    "role": "user",
    "content": [
        {"type": "image", "source": reference_source},
        {"type": "text", "text": "This image has 12 marbles. The pattern is a 3x4 grid."},
        {"type": "image", "source": target_source},
        {"type": "text", "text": "How many marbles are in this second image?"},
    ],
}]
```

Case study: Fire risk assessment
Scenario
An insurance company wants automated fire risk ratings for properties, based on satellite images.
Naive prompt
"Rate fire risk 1-10"
Output: inconsistent, subjective.

Production prompt
```python
prompt = """Analyze this satellite image of a property using these steps:
1. **Residence identification**: Locate the primary residence:
   - Largest roofed structure
   - Typical residential features (driveway, geometry)
2. **Tree overhang analysis**: For trees near the residence:
   - Identify branches over the roof
   - Estimate % roof coverage (0-25%, 25-50%, 50-75%, 75%+)
   - Note density
3. **Fire risk factors**:
   - Ember catch points
   - Fuel paths from wildland to structure
   - Proximity to chimneys/vents
4. **Defensible space**:
   - Continuous canopy?
   - Fuel ladders (ground → tree → roof)?
5. **Final rating** (1-4):
   - 1 (Low): no overhang, good defensible space
   - 2 (Moderate): <25% overhang, some separation
   - 3 (High): 25-50% overhang, connected canopies
   - 4 (Severe): >50% overhang, dense vegetation
For each item 1-5, write a one-sentence finding. End with the rating number.
"""

# satellite_source: an image source dict (base64 or URL), built as in the examples above
messages = [{
    "role": "user",
    "content": [
        {"type": "image", "source": satellite_source},
        {"type": "text", "text": prompt},
    ],
}]
```

Output
Consistent, auditable, scalable: 1,000 properties/day, fully automated.
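Because the prompt fixes the shape of the answer, the rating can be extracted and validated in code before it is stored. This check is part of what keeps a batch run auditable. The following is a minimal sketch that assumes each reply contains exactly one `Rating: N` line, like the sample below. `parse_rating` is a hypothetical helper:

```python
import re

# Assumes exactly one "Rating: N" line with N in 1-4, as in the sample output.
RATING_RE = re.compile(r"Rating:\s*([1-4])\b")


def parse_rating(reply: str) -> int:
    """Extract the 1-4 fire risk rating; raise if it is missing, out of range or ambiguous."""
    matches = RATING_RE.findall(reply)
    if len(matches) != 1:
        raise ValueError(f"expected one 'Rating: N' (N = 1-4), found {len(matches)}")
    return int(matches[0])


print(parse_rating("5. Rating: 3 (High Risk)"))  # 3
```

Replies that raise can be routed to a human reviewer instead of being stored with a guessed rating.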
Sample output:
```text
1. Residence: Single-story ranch home, ~2000 sqft, central.
2. Tree overhang: 5 trees within 10m, 30% roof coverage from SE oak.
3. Fire risk: Eaves exposed, branch touches roof (bridge point).
4. Defensible space: Trees form continuous canopy N→W of house.
5. Rating: 3 (High Risk)
```

Case studies by industry
🏥 Healthcare — X-ray second opinion
Prompt: "Identify abnormalities. Rate severity. Flag for radiologist review."
Disclaimer: not diagnostic. It assists a human radiologist.
🏭 Manufacturing — Defect detection
Factory camera + Claude vision → detect visual defects on the assembly line.
🏪 Retail — Planogram compliance
Store shelf photos → Claude verifies that products are placed according to the planogram.
📄 Legal — Document OCR + analysis
Scanned contract → extract clauses, flag issues. Often handles handwriting and poor scans better than traditional OCR.
🎨 Design — Style match
Brand guidelines image + new creative asset → "Does this match brand style?"
Optimization
Resize before sending
Smaller image = fewer tokens = lower cost. Balance this against the detail the task needs.
Cost estimation
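A 1000×1000 image ≈ 1,333 input tokens.
Cost on Sonnet at $3 per million input tokens: 1,333 × $3 / 1,000,000 ≈ $0.004 per image.
Processing 10,000 images/day ≈ $40/day for the image tokens alone (instruction and output tokens are billed on top). Reasonable.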
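The same arithmetic can be written as a small calculator. It assumes the (width × height) / 750 rule and takes the price per million input tokens as a parameter. $3 is the Sonnet figure quoted above, and prices vary by model. The calculator counts image tokens only:

```python
def image_tokens(width: int, height: int) -> float:
    """Approximate input tokens for one image."""
    return width * height / 750


def daily_image_cost(width: int, height: int, images_per_day: int,
                     usd_per_mtok: float = 3.0) -> float:
    """Input-token cost of the images alone; prompt text and output tokens are extra."""
    return image_tokens(width, height) * images_per_day * usd_per_mtok / 1_000_000


print(round(daily_image_cost(1000, 1000, 1), 4))       # 0.004
print(round(daily_image_cost(1000, 1000, 10_000), 2))  # 40.0
# e.g. a 4:3 photo after thumbnail((1024, 1024)) is 1024x768
print(round(daily_image_cost(1024, 768, 10_000), 2))   # 31.46
```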
Cache common context
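If the prompt instructions are long and reused across requests, enable prompt caching (lessons 6.47-6.49). Cache reads are billed at about 10% of the normal input price, which is a 90% discount on the cached part.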
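This minimal sketch shows where the cache marker goes. It assumes the long, reused instructions (such as the fire-risk prompt) are moved into the `system` parameter and marked with `cache_control`. `build_request` is a hypothetical helper that only assembles the keyword arguments. The model ID is the one this lesson uses, and the instruction text is a placeholder. Caching also has a minimum cacheable prompt length per model, so a short prompt may not be cached at all. Check the prompt caching docs for your model:

```python
def build_request(instructions: str, image_source: dict,
                  model: str = "claude-sonnet-5-20260205") -> dict:
    """Assemble keyword arguments for client.messages.create()."""
    return {
        "model": model,
        "max_tokens": 1000,
        "system": [{
            "type": "text",
            "text": instructions,
            # Marks the end of the cacheable prefix (everything up to this block)
            "cache_control": {"type": "ephemeral"},
        }],
        "messages": [{
            "role": "user",
            "content": [
                {"type": "image", "source": image_source},  # changes per request, not cached
                {"type": "text", "text": "Assess this property."},
            ],
        }],
    }


req = build_request(
    "Analyze this satellite image of a property using these steps: ...",  # placeholder
    {"type": "url", "url": "https://example.com/parcel-001.jpg"},          # hypothetical URL
)
print(req["system"][0]["cache_control"])  # {'type': 'ephemeral'}
```

`client.messages.create(**req)` sends the request. The response's `usage` then reports `cache_creation_input_tokens` and `cache_read_input_tokens`, which show whether the cache was written or hit.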
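Resizing with Pillow (the "Resize before sending" step above):
```python
from PIL import Image

img = Image.open("large.jpg")
img.thumbnail((1024, 1024))  # shrink in place to fit within 1024x1024, keeping aspect ratio
img.save("resized.jpg", quality=85)  # JPEG quality 85
```

Anti-patterns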
❌ Giant images
A 5 MB raw image → slow upload and high token cost.
Fix: resize before sending; 1024 px is usually enough.
❌ Low resolution for detail work
256×256 for OCR → the text is illegible.
Fix: use a resolution that fits the task.
❌ Generic prompts like "What's this?"
Claude describes the image in general terms instead of answering what you need.
Fix: guide it with a methodology (explicit steps).
❌ Trusting numerical counts blindly
Claude may undercount dense objects (more than ~20).
Fix: use a step-by-step prompt plus a verification pass.
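One way to apply this fix is to combine the step-by-step prompt with the XML-structure rule from Module 3. Ask for two counts made in different orders, each inside its own tag, and accept the answer only when they agree. This is a hypothetical sketch: the tag names and helpers are illustrative. Agreement between two passes is a sanity check, not a guarantee:

```python
import re
from typing import Optional, Tuple

COUNT_PROMPT = """Count the marbles in the image twice, using two different orders:
1. One at a time, numbering each marble as you go. Put the total in <count_a></count_a>.
2. Row by row, bottom-left to top-right. Put the total in <count_b></count_b>.
Then give your final answer in <count></count>."""


def read_tag(reply: str, tag: str) -> Optional[int]:
    """Return the integer inside <tag>...</tag>, or None if the tag is missing."""
    match = re.search(rf"<{tag}>\s*(\d+)\s*</{tag}>", reply)
    return int(match.group(1)) if match else None


def verified_count(reply: str) -> Tuple[Optional[int], bool]:
    """Return (final_count, needs_review); review is flagged unless all three counts agree."""
    a, b, final = (read_tag(reply, t) for t in ("count_a", "count_b", "count"))
    needs_review = None in (a, b, final) or not (a == b == final)
    return final, needs_review


print(verified_count("<count_a>23</count_a> <count_b>23</count_b> <count>23</count>"))  # (23, False)
print(verified_count("<count_a>23</count_a> <count_b>21</count_b> <count>22</count>"))  # (22, True)
```

In a real run, send `COUNT_PROMPT` with the image and pass `response.content[0].text` to `verified_count`.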
Apply it now
Exercise 1: Describe 3 images (20 minutes)
Take 3 different images (personal photos) and prompt each one:
- Simple: "What's in this image?"
- Structured: methodology-based
- Extract: "List all objects with counts"
Compare the output quality.
Exercise 2: Chart analysis (15 minutes)
Take a screenshot of a chart from the web. Ask Claude to:
- Extract data points
- Identify the trend
- Suggest 3 insights
Test with a structured prompt.
Summary
🎯 Vision is available on Claude 4+. Send images as base64 or URL.
🎯 Limits: 100 images/request, 5 MB each; the dimension cap drops from 8000 px to 2000 px when a request has more than 20 images.
🎯 Tokens ≈ (w × h) / 750. Resize to save.
🎯 Structured prompt → 95% accuracy on the counting task (vs. 60% with a simple prompt).
🎯 Case studies: satellite, medical, retail, legal. Vision unlocks visual workflows.