When Properties Collide — Chẩn đoán khi 2 thuộc tính giao thoa — Năng lực & Giới hạn của AI

Tuần trước, đây là task thực tế:

Bạn sẽ học được

Nhận ra hầu hết AI failures liên quan 2 hoặc nhiều thuộc tính tương tác, không phải 1
Chẩn đoán các failure pattern phổ biến (hallucinated citations, long-conversation drift, confidently wrong math, agreeable bad premises) bằng cách đặt tên properties liên quan
Áp dụng targeted fix dựa vào thuộc tính limiting (nút cổ chai)
Xây diagnostic habit: trước khi reach for prompt fix, hỏi "which properties am I looking at?"

Four properties — nhắc lại nhanh

Trước khi diagnose collision, nhắc 4 properties (nếu cần ôn, quay lại Bài 17.3, 17.6, 17.8, 17.10):

🔮 Next Token Prediction — "Generates what sounds right"
    Predicts text-shape, không phải lookup facts.

🌐 Knowledge — "Knows what it was trained on"
    Broad but uneven, frozen at cutoff.

🧠 Working Memory — "Attends to what's nearby"
    Fixed context window, cliff failure.

🎚️ Steerability — "Follows the loudest instruction"
    Pattern-matches instructions, not intent.

5 collision patterns phổ biến

Collision 1: Next Token Prediction × Knowledge = Hallucinated specifics

Symptom: Hỏi về niche topic → get paper title, author names, journal, DOI — none real.

Cơ chế:

Ví dụ mạch lạc:

Fix (targeted):

Collision 2: Working Memory × Steerability = Long-conversation drift

Symptom: Set careful constraints at start; 20 messages later, half are being ignored.

Cơ chế:

Ví dụ:

NTP: Model đang generate text fitting "what a plausible citation looks like"
Knowledge: Có gap (niche or post-cutoff)
Model không biết phân biệt what it knows vs what it's inventing
✅ Source grounding: force retrieve from real document corpus → model can only cite retrieved
✅ Explicit "tag fabricated": "For each citation, tag HIGH confidence (verified in your training) or LOW (inferring)"
✅ Use web search + verify: Research mode Claude, Perplexity
❌ Don't: prompt "don't fabricate" — model doesn't know it's fabricating
Working Memory: Early context fading (lost in middle as chat grows)
Steerability: Follows whatever instructions most salient right now
Early constraints → distant → less salient → not applied

User: "Cite 3 papers on X (niche topic)"
Model: 
  - [Plausible-sounding citation 1]     ← maybe real
  - [Plausible-sounding citation 2]     ← fabricated
  - [Plausible-sounding citation 3]     ← fabricated

Collision 2: Working Memory × Steerability = Long-conversation drift

Fix (targeted):

Collision 3: Next Token Prediction × Steerability = Letter-over-spirit at scale

Symptom: Instructions honored literally, intent consistently missed.

Cơ chế:

Ví dụ:

✅ Re-supply critical context: restate constraints near current message
✅ Start fresh với essentials up front (better than long drifted chat)
✅ Use system prompt for standing constraints (doesn't dilute)
✅ Use Projects/Custom Instructions — persists across chats
❌ Don't: assume "early instructions still apply"
NTP: Pattern-match on words
Steerability: Follow instruction without understanding goal
Model picks easiest-to-match interpretation

Turn 1: "Always respond in markdown. Never use first person. 
         Max 200 words."

Turns 2-30: Normal chat, various topics.

Turn 31: Response in prose, uses "I think", 400 words.
Constraints still in context, but attention weight diluted.

Collision 3: Next Token Prediction × Steerability = Letter-over-spirit at scale

Fix (targeted):

Collision 4: Knowledge × Steerability = Confidently wrong on niche instructions

Symptom: Give domain-specific instruction → AI follows but produces wrong output because lacks domain knowledge.

Cơ chế:

Ví dụ:

✅ State intent per cluster: "These 15 emails: more formal. These 5: warmer."
✅ IPO framework (Bài 17.10)
✅ Chain-of-thought: ask model reason about each case
❌ Don't: rely on single instruction for heterogeneous task
Knowledge: Gap in specialized domain
Steerability: Follows instruction confidently anyway
Result: Confident wrong output (no signal of knowledge gap)

User: "Make these 20 emails more professional."
Model: Adds formal language to all 20.

But: 5 of the 20 had the opposite problem — too formal, 
    losing warmth with long-term customers.
    
Model applied one interpretation uniformly.

Collision 4: Knowledge × Steerability = Confidently wrong on niche instructions

Fix (targeted):

Collision 5: All 4 — Long multi-step task with specific domain

Symptom: Task is long (Working Memory), domain-specific (Knowledge), multi-step (Steerability drift), requires specifics (NTP hallucination). Everything fails.

Ví dụ: "Review 50-page medical contract between pharma company and CRO, flag compliance issues under 21 CFR Part 11."

Fix (layered):

✅ Provide domain context: paste FrameX docs into prompt
✅ Ask to verify knowledge: "Do you know FrameX? If not, flag."
✅ Show example first: give 1-2 examples of correct FrameX tests
❌ Don't: assume model "probably knows"
Working Memory: 50 pages stretching context
Knowledge: 21 CFR Part 11 niche + regulatory
Steerability: Multi-step review drifts
NTP: Specifics (sections, regulatory references) hallucination risk
✅ Chunking (WM fix)
✅ RAG with 21 CFR docs (Knowledge fix)
✅ Checkpoints per chunk (Steerability fix)
✅ "Cite exact section from contract" (NTP fix)
✅ Human legal review of output (all fix)

User: "Write a unit test for this Python function using our 
       internal test framework 'FrameX'."
       
Model: [writes test with pytest-like syntax that LOOKS like 
       FrameX but isn't actually FrameX]
       
Model didn't flag: "I'm not familiar with FrameX."

Diagnostic framework: "Which 2 collided?"

Trước khi reach for prompt fix, ask:

┌──────────────────────────────────────────────────────────┐
│                                                          │
│  DIAGNOSTIC CHECKLIST:                                   │
│                                                          │
│  1. What property question was most relevant?            │
│     - Where do answers come from? (NTP)                  │
│     - What does it know? (Knowledge)                     │
│     - What's it attending to? (WM)                       │
│     - How much am I in control? (Steerability)           │
│                                                          │
│  2. What was the FAILURE MODE?                           │
│     - Fabricated specifics? → NTP                        │
│     - Stale / wrong facts? → Knowledge                   │
│     - Ignored earlier info? → Working Memory             │
│     - Literal interpretation miss intent? → Steerability │
│                                                          │
│  3. PAIR them up — most failures involve 2:              │
│     - Fabricated + niche domain → NTP × Knowledge        │
│     - Drifted + long session → WM × Steerability         │
│     - Wrong output + specific instruction → K × S        │
│     - Literal + varied cases → NTP × Steerability        │
│                                                          │
│  4. Apply TARGETED fix:                                  │
│     - NTP: verify specifics, use grounding               │
│     - Knowledge: add context, use RAG/search             │
│     - WM: re-supply context, chunk, start fresh          │
│     - Steerability: restate goal, checkpoints, examples  │
│                                                          │
└──────────────────────────────────────────────────────────┘

Diagnostic table: Common symptoms → properties → fixes

Symptom	Properties likely at play	Best fix
AI cited paper that doesn't exist	NTP × Knowledge	Source grounding or verify specifics
After 30 messages AI forgot rules	WM × Steerability	Restate constraints or fresh chat
AI confidently wrong on niche term	Knowledge × Steerability	Provide domain context upfront
"Made shorter" but cut key info	NTP × Steerability	State goal alongside instruction (Bài 17.10)
Wrong math despite clear problem	NTP × Steerability	Code execution
AI agreed with wrong premise	Fingerprint (sycophancy) × Steerability	Neutral framing + "disagree if I'm wrong"
Long doc analysis missing middle	WM × NTP	Chunk + front-load critical
Policy answer from outdated info	Knowledge × WM	Provide current policy in context
Multi-step plan goes off rails	Steerability drift + WM	Checkpoints + re-verify goal
AI doesn't ask clarifying question	Steerability (+ over-caution)	Instruct: "Ask if ambiguous before proceeding"

Ví dụ theo ngành

💼 Sales Manager — "AI biến một deal thành 2 deals"

Situation: Sales manager dùng Claude để summarize 20 sales call transcripts cho team review.

Output: Claude cho 15 deal summaries — but 2 của chúng hợp nhất thành 1 deal mới bịa ra, và split 1 real deal thành 2 riêng biệt.

Diagnostic:

Collision: WM × NTP

Fix:

Kết quả: 0 lỗi merge/split. Process 2x faster.

🔍 Research Analyst — "Lit review bịa 3 paper"

Situation: Academic researcher nhờ Claude review 40 papers về emerging field.

Output: Excellent synthesis — but 3/40 paper names cited don't exist.

Diagnostic:

Collision: Knowledge × NTP (classic)

Fix:

Kết quả: 100% citation accuracy.

⚖️ Legal Counsel — "50-page contract review — AI missed 3 critical clauses"

Situation: Senior counsel review 50-page M&A contract via Claude.

Output: Found 8 issues — but missed 3 critical indemnification clauses buried in sections 4-6 (middle).

Diagnostic:

Collision: WM × Steerability

Fix:

Kết quả: Caught all 11 issues (including 3 missed earlier). Deal saved $1.5M risk.

💰 Finance Analyst — "Wrong IRR in memo"

Situation: Analyst asks Claude: "Calculate IRR for project with these cashflows: -100, 30, 35, 45, 50. Write investment memo with IRR + NPV at 10% discount."

Output: "IRR = 18.4%, NPV = $35.72M" (both approximate, off from actual).

Diagnostic:

Collision: NTP × Steerability (brittle arithmetic)

Fix:

Kết quả: Precision 100%. Memo credible.

🎧 Customer Success Manager — "Chat got worse over 30 messages"

Situation: CSM starts chat với Claude to draft renewal proposals for 10 clients. Long session, refining each.

Output: First 5 proposals — excellent, on-brand. Last 5 — drift: inconsistent tone, mixing brand styles, missing structure element from earlier.

Diagnostic:

Collision: WM × Steerability

Fix:

Kết quả: Consistency 10/10 proposals.

🏥 Clinical Research Coordinator — "AI gave wrong drug protocol"

Situation: CRC asks Claude about study drug dosing protocol for Phase III trial.

Output: Claude confidently states protocol. Later, checking with PI: several details wrong.

Diagnostic:

Collision: Knowledge × Steerability

Fix:

Kết quả: 0 errors. CRC can trust outputs.

Working Memory: 20 transcripts ~80K tokens, pushed into context length limits
NTP: Generating summary with normalized structure — model "filled pattern" instead of preserving unique deals
Chunk: 5 transcripts per pass
Per pass: extract structured output {deal_id, summary, status}
Aggregate at end
Verify count (20 in, 20 out)
Knowledge: emerging field → thin coverage
NTP: generating citations in academic format → fabrication pattern
Upload 40 PDFs into Claude Projects
Prompt: "Synthesize based ONLY on uploaded papers. Cite by [Paper #]. If unsure, flag [NEEDS VERIFY]"
After draft: verify every [Paper #] references uploaded doc
No citations from "training memory"
Working Memory: "lost in the middle" — sections 4-6 deep in attention dead zone
Steerability: instruction "find issues" was generic, drifted during long review
Chunk 50 pages into 5 × 10-page sections
Per chunk: specific instruction — "Find all indemnification, limitation of liability, termination clauses. Flag any deviation from market standard."
Aggregate at end with dedupe
NTP: arithmetic weakness (approximating by patterns)
Steerability: instruction "calculate" was literal but model lacked mechanism for precision
Code execution: model write Python, execute, use exact number
Prompt: "Compute using Python. Present exact number. Don't approximate."
Working Memory: early style guide + examples (Turn 1-3) now deep in history
Steerability: latest instruction salience trumps earlier structure
Minor: fingerprint verbosity leaking in as attention diffuses
Setup Claude Project với style guide + brand examples as standing docs
Per proposal: new chat trong Project (context clean, style reference loaded)
If long session: handoff doc every 5 proposals
Knowledge: Phase III trials are specific; protocol details niche
Steerability: Claude followed "tell me the protocol" without flagging uncertainty
Over-caution fingerprint absent when it should've been present (ironically)
RAG: upload study protocol PDF
Prompt: "Answer ONLY from uploaded protocol. If info not in protocol, say 'Not in uploaded documents.'"
Never ask protocol questions without upload

# Output from code execution:
IRR = 18.73%
NPV = $36.23M

Anti-patterns

❌ "It's just hallucination" — single-property diagnosis

Tại sao sai: Too generic → fix too generic.

Cách đúng: Name 2 properties. "NTP × Knowledge gap on niche topic" → targeted fix (RAG + grounding).

❌ "Re-prompt until it works"

Tại sao sai: Random prompting. Same failure likely repeat. Waste time.

Cách đúng: Diagnose first. Different failure types need different fixes.

❌ "Model got better, same prompt works now"

Tại sao sai: Maybe. But limitations shifted, not disappeared. Collision patterns still exist.

Cách đúng: Keep diagnostic habit. Test periodically. Boundaries move.

❌ "Single fix = solves all my problems"

Tại sao sai: Different tasks have different property profiles. One fix (e.g., "always use RAG") helps some tasks, hurts others.

Cách đúng: Per-task diagnosis. Right tool for right collision.

❌ "AI will eventually ask me if confused"

Tại sao sai: Model often picks interpretation without asking (Steerability + NTP collision).

Cách đúng: Instruct upfront: "If ambiguous, ask before proceeding."

Mẹo nâng cao

Mẹo 1: "Failure log" với property tagging

Maintain log:

Over weeks, patterns emerge. Apply preemptively.

Mẹo 2: Task complexity matrix before starting

Date | Task | Failure | Properties | Fix applied | Result

2026-03-01 | Lit review | Bịa 3 citations | NTP × Knowledge | 
  RAG with PDFs | 100% accurate

2026-03-05 | Long contract review | Missed middle clauses | 
  WM × Steerability | Chunking | Caught all

2026-03-10 | Email redraft | Letter-over-spirit | NTP × Steerability | 
  IPO framework | Better

Mẹo 2: Task complexity matrix before starting

Mẹo 3: "Stress test" bằng cách introduce property issues

To test new prompt:

For each task, rate (1-5):
  - Specificity required (NTP risk)
  - Domain niche-ness (Knowledge risk)
  - Content length (WM risk)
  - Instruction complexity (Steerability risk)

Total > 12 → plan for collision. Use RAG + chunking + checkpoints.
Total < 8 → low risk, standard prompt.

Mẹo 3: "Stress test" bằng cách introduce property issues

Mẹo 4: "Property-aware review" habit

After every AI output:

Test 1 (WM stress): Add 30 irrelevant turns before.
Test 2 (Knowledge stress): Ask about obscure niche variant.
Test 3 (Steerability stress): Give contradicting constraints.
Test 4 (NTP stress): Ask for 5 very specific facts.

See which breaks first → know your weakest property for this task.

Mẹo 4: "Property-aware review" habit

Mẹo 5: Cross-property failure cascades

Some failures cascade:

Quick 30-second review:
  ✓ Specifics (names, numbers, citations): verified?
  ✓ Knowledge currency: up-to-date?
  ✓ All input context attended: nothing missed?
  ✓ Intent (not just instruction): achieved?

If no → identify collision → fix specifically.

Mẹo 5: Cross-property failure cascades

Prevent cascades: catch failures early, one layer at a time.

Initial: Bịa specifics (NTP)
  → Include in longer doc
  → Long doc → WM limit
  → User skim, miss that specifics bịa (NTP amplified)
  → Act on wrong info

Áp dụng ngay

Bài tập 1: Failure Diagnosis (~25 phút)

Lý do: Most real-world AI failures aren't 1 property acting up. They're 2 properties meeting. Naming which 2 changes the fix entirely.

Step 1: Nhìn lại experience với AI (kể cả quan sát trong khóa học này). Identify 2-3 lần AI output thực sự làm bạn disappointed hoặc surprised.

For each, describe trong 1-2 câu: what bạn asked, what bạn got, what was disappointing/surprising.

Step 2: Walk through each event with the AI:

Step 3: Evaluate AI's diagnosis against what bạn giờ biết. Do you agree? If not, push back.

Step 4: For each diagnosis:

If possible, test the adjustment right now on a similar task.

Step 5: Nhìn lại task list từ Bài 17.0 với tất cả annotations:

For tasks gave you trouble most, name which 2 properties were colliding. Write diagnosis next to each.

Bài tập 2: Preemptive collision mapping (optional)

For your top 5 recurring AI tasks, complete:

Apply setups. Track whether collision avoided.

Property tags (Bài 17.1)
Verification scores (Bài 17.3)
Knowledge flags (Bài 17.5)
Context needs (Bài 17.7)
Goal statements (Bài 17.9)

Task	Primary property risk	Secondary property risk	Expected collision	Preemptive setup
1.
2.
3.
4.
5.

Failure 1: ________________________________
Failure 2: ________________________________
Failure 3: ________________________________

Suy ngẫm bài học

Naming the property pair có change fix bạn'd reach for không? Before this course, bạn có chọn different (less effective) fix không?
Which property pairing bạn nghĩ you'll gặp most in day-to-day work?
Có failure nào từ trước mà giờ bạn'd gọi tên differently?

Tóm tắt bài học

🎯 Real-world failures usually involve 2 properties interacting, not 1. Naming both → pointed fix.

🎯 5 diagnostic pairs to recognize:

🎯 Diagnostic habit: before reaching for prompt fix, ask "which properties am I looking at?"

🎯 Targeted fix matrix:

🎯 This diagnostic move is Discernment applied. Bạn evaluate better when bạn know what kind of wrong you're looking at.

🎯 It feeds Delegation too: repeated compound failures on same task-type = signal about what to restructure, break up, or keep cho yourself.

NTP × Knowledge (hallucinated specifics)
Working Memory × Steerability (long-conversation drift)
NTP × Steerability (letter-over-spirit at scale)
Knowledge × Steerability (confidently wrong niche)
All 4 (long multi-step specific-domain)
NTP → grounding, verification
Knowledge → RAG, search, context
Working Memory → chunk, re-supply, fresh chat
Steerability → IPO framework, checkpoints, examples

Tài liệu tham khảo

Bài 17.3, 17.6, 17.8, 17.10 — 4 properties in detail
Anthropic — AI Fluency course — Complementary 4D Framework
Bài 17.12 — Putting it all together

Nội dung này có hữu ích không?