Thực hành — Context Degradation (Khi "thêm context" làm tệ hơn) — Năng lực & Giới hạn của AI

Khi dùng AI, bản năng tự nhiên là đưa cho nó mọi thứ. Paste nguyên document.

Bạn sẽ học được

Trải nghiệm trực tiếp hiệu ứng "lost in the middle" qua một memory test trên chính bạn
Giải thích serial position effect — primacy và recency — và tại sao LLM biểu hiện cùng pattern
Nhận ra context engineering không chỉ là "cái gì include" mà còn là "đặt ở đâu"
Áp dụng 3 quy tắc vàng: place critical instructions at edges, repeat what matters, curate ruthlessly

Phần 1: Memory Test trên chính bạn (5 phút)

Cách làm

Dưới đây là 15 từ. Đọc chúng một lần, theo thứ tự, mỗi từ dành 1.5 giây. Không quay lại.

Sau khi đọc hết, che xuống và cố nhớ nhiều từ nhất có thể (theo thứ tự bất kỳ). Ghi chúng ra giấy trước khi scroll xuống.

Sẵn sàng? Bắt đầu đọc chậm, 1 từ mỗi 1.5 giây:

...

OK, dừng. Che xuống. Viết ra tất cả từ bạn nhớ.

Kết quả

Đếm số từ bạn nhớ. Kiểm tra vị trí chúng trong list:

Dự đoán cao: Bạn nhớ đa số đầu + cuối. Giữa biến mất.

Mặt trời
Máy tính
Con mèo
Cửa sổ
Ô tô
Cây cối
Ly nước
Đồng hồ
Quyển sách
Điện thoại
Đèn bàn
Chuột máy tính
Kính mắt
Sô cô la
Cà phê
Bạn có nhớ từ 1-3 không? (Mặt trời, máy tính, con mèo)
Bạn có nhớ từ 13-15 không? (Kính mắt, sô cô la, cà phê)
Từ 6-9 (giữa list) có nhớ không? (Cây cối, ly nước, đồng hồ, quyển sách)

Phần 2: U-Shaped Curve — Serial Position Effect

Cái bạn vừa trải nghiệm có tên: Serial Position Effect. Nhà tâm lý học đã nghiên cứu 100+ năm. Mô tả qua 2 hiện tượng:

Primacy (đầu list)

Items ở đầu nhận nhiều rehearsal hơn (bạn đọc, rồi đọc lại trong đầu khi đọc tiếp). Chuyển sang long-term memory nhiều hơn.

Recency (cuối list)

Items ở cuối vẫn tươi trong short-term memory — chưa bị "đẩy ra".

Giữa

Không được cả hai lợi thế. Bị cover lên bởi cái mới, không đủ thời gian rehearse. Biến mất.

   Recall    ▲
   probability│
              │ Primacy                 Recency
           ●  │    ●                       ●     ●
              │         ●                ●
    high      │              ●        ●
              │                   ●
              │              Lost in the middle
    low       │
              │
              └──────────────────────────────────▶
                 1   2   3   4   5   6   7   8   9  ... 15
                                Position in list

Phần 3: LLMs biểu hiện cùng pattern

Đây là phần shock: Large language models cho thấy đặc điểm giống hệt.

Research Stanford 2023

Researchers tại Stanford test: đặt 1 fact quan trọng ở các vị trí khác nhau trong một context window dài. Rồi hỏi câu hỏi có đáp án là fact đó.

Kết quả:

Không phải một quirk. Đó là structural. Transformer attention patterns tự nhiên weight edges nhiều hơn.

Tại sao?

Trong transformer self-attention, mỗi token attend tới tất cả tokens khác với trọng số khác nhau. Vì complex lý do kỹ thuật (absolute positional encoding, softmax normalization, training data distribution), attention mass tập trung ở:

Giữa: không có lợi thế structural → attention yếu.

Beginning: token đầu được attend bởi mọi token sau
End: token cuối được attend khi generating next token

Accuracy của retrieval:
   ▲
   │ ████                                           ████
   │ ████                                           ████
   │ ████                                           ████
   │ ████         ██                 ██             ████
   │ ████   ██   ████   ██   ██   ████   ██        ████
   │ ████   ██   ████   ██   ██   ████   ██        ████
   └─────────────────────────────────────────────────────▶
    đầu      giữa context (lost)                    cuối

   Accuracy cao 80-90% ở đầu
   Accuracy drop > 30% ở giữa
   Accuracy cao lại 85-90% ở cuối

Phần 4: Điều này có ý nghĩa gì cho prompting?

Thực tế: nếu bạn paste 20-page document

Bạn paste 20-page doc vào prompt + ask câu hỏi về cái gì đó trên page 11. Model dễ miss hơn nếu cùng thứ ở page 1 hay page 20.

Hệ quả thật:

Dangerous pattern (cần tránh)

Safer pattern

┌──────────────────────────────────────────────────────┐
│  ✓ SAFER PATTERN                                      │
├──────────────────────────────────────────────────────┤
│                                                      │
│  System prompt                                       │
│  ★ Key instruction (up front) ★                      │
│  Chat message 1                                      │
│  Chat message 2                                      │
│  ...                                                 │
│  Chat message 18                                     │
│  Latest user message                                 │
│  ★ Key instruction (repeated) ★                      │
│                                                      │
│  → Critical info at BOTH edges                       │
│                                                      │
└──────────────────────────────────────────────────────┘

Task:     Review 100-page contract, find clause 47 issues
Setup:    Paste toàn contract, ask "Review contract. 
          Any issues với clause 47?"
Reality:  Model focus pages 1-5 và 95-100. 
          Clause 47 (ở page 43-45) → nhận attention yếu 
          → miss issues tinh tế

3 quy tắc vàng context engineering

Quy tắc 1: Place critical info at edges

Đầu (primacy): System prompt, style guide, role, format requirements, critical constraints.

Cuối (recency): Current question, most urgent instruction, output format reminder.

Giữa: Reference material, examples, conversation history — cái không cần attend precisely.

Quy tắc 2: Repeat what matters

Nếu instruction là critical, restate nó near user message ở cuối:

Quy tắc 3: Curate ruthlessly

Quality > Quantity. 5 page relevant reference > 50 pages "all the info I have".

Mọi piece of context bạn add đẩy pieces khác vào middle — attention dead zone.

System prompt:
"You are Claude. Always respond in markdown. Max 200 words.
 Never recommend medications."

[... middle of prompt with lots of content ...]

User:
"My friend has a headache, what should I do?

REMINDER: Output in markdown, max 200 words, no medication 
recommendations."

Context engineering vs prompt engineering

Prompt engineering: Cách viết instruction để có output tốt.

Context engineering: Cách chọn + cấu trúc + đặt vị trí information để model attend đúng chỗ.

Cả hai đi đôi. Nhưng context engineering quan trọng hơn khi:

Task có long reference material
Conversation dài
Multiple constraints cần giữ nguyên
Accuracy > creativity

Ví dụ theo ngành

⚖️ Legal — Redesign contract review prompt

Trước (dangerous pattern):

Sau (safer pattern):

"You're a legal counsel. Review this contract for issues.

[Paste 60-page contract]

What do you think?"

⚖️ Legal — Redesign contract review prompt

Kết quả: Coverage từ 70% → 95%. Issues phát hiện ở sections 8-12 (middle) tăng 3x.

💰 Finance — 10-K analysis prompt

SYSTEM:
You are senior legal counsel specializing in M&A. For each contract,
find: (1) ambiguous termination clauses, (2) uncapped indemnification,
(3) missing IP assignments, (4) non-compete scope issues.

Output format: Table with columns [Issue #, Severity H/M/L, 
Section/Clause ref, Description, Suggested fix]

--- CONTRACT START ---
[Paste 60-page contract with clear section headers]
--- CONTRACT END ---

REMINDER: Output must be table. Severity levels per above. 
Focus on the 4 categories listed. Scan ALL sections, 
including the middle sections where ambiguity often hides.

💰 Finance — 10-K analysis prompt

Sau (với anchors):

"Read this 10-K and tell me about the company's financial health."
[Paste full 10-K, 200 pages]

Ví dụ theo ngành (tiếp)

Kết quả: Discovery rate of material items increase 40%. Ít bị miss Risk Factors (historically ở middle).

📝 Content Marketing — Brief dài

SYSTEM:
Analyze 10-K for:
  1. Revenue growth trajectory (YoY, QoQ)
  2. Margin trends (gross, operating, net)
  3. Balance sheet health (cash, debt, working capital)
  4. Risk factors NEW or MATERIALLY CHANGED from prior year

Output: Executive summary (< 300 words) + detailed findings table.

--- 10-K CONTENT START ---
[full 10-K]
--- 10-K CONTENT END ---

REMINDER: Focus on the 4 areas above. Specifically check 
Item 1A (Risk Factors), Item 7 (MD&A), and Notes to Financial 
Statements — which often contain material info in the middle 
of the document. Cite specific page/section for each finding.

📝 Content Marketing — Brief dài

[Dán 20 tài liệu reference — brand guidelines, examples, 
campaign briefs, persona docs, competitive analyses]

"Write a campaign brief for new product launch."

Ví dụ theo ngành (tiếp)

Kết quả: Campaign brief on-brand 95% of time. Reduced editing from 45 minutes to 10.

🎧 Customer Support — Ticket analysis

Pain: "Paste 50 tickets + ask 'find common issues' → model miss patterns ở giữa."

Solution:

SYSTEM:
You are creating a campaign brief for product launch.

CRITICAL REQUIREMENTS:
  - Tone: [brand voice tone - 1 sentence]
  - Audience: [persona - 2 sentences]
  - Goal: [campaign objective - 1 sentence]
  - Format: [required template structure]

REFERENCE MATERIAL:
[Brand guidelines - summary only, not full doc]
[2-3 best past examples, not all]
[Competitive snapshot - 1 page]

User request: Create campaign brief for [Product X launch].

REMINDER: Follow the format template. Match tone and persona 
from the critical requirements above. Reference past examples 
for structure inspiration.

🎧 Customer Support — Ticket analysis

Kết quả: Coverage bằng nhau across ticket positions. Pattern detection 2x better.

TASK: Identify top 5 recurring issues in customer tickets.

Output format:
  Issue | Frequency | Example ticket IDs | Severity

METHODOLOGY:
  1. Read all tickets
  2. Cluster by issue type
  3. Rank by frequency × severity
  
--- TICKETS START ---
[50 tickets, chronological order]
--- TICKETS END ---

REMINDER: Focus on CLUSTERING, not individual ticket summary. 
Check ticket IDs 20-30 (middle batch) same as 1-10 and 41-50.
Output must be table format.

Anti-patterns

❌ "Big context dump → AI tự figure out cái quan trọng"

Tại sao sai: AI không có priority labeling. Mọi token nhận attention (với weight khác nhau). Dump = middle attention loss.

Cách đúng: Bạn làm priority labeling cho model qua cấu trúc prompt.

❌ "Repeat critical info là redundant"

Tại sao sai: Không. Trong context dài, repeat ở cuối = counteract "lost in middle". Attention cao ở cuối.

Cách đúng: Repeat critical constraint ở đầu (system prompt) và cuối (near user message). Không phải nonsense repetition — purposeful.

❌ "Chat dài vẫn OK vì model handle 200K tokens"

Tại sao sai: Capacity khác attention quality. Model có thể fit 200K, nhưng attend đều thì không. Lost in middle vẫn xảy ra.

Cách đúng: Khi chat > 50 turns, checkpoint + start fresh với summary.

❌ "Copy-paste tất cả tin nhắn Slack hôm qua vào cho AI context"

Tại sao sai: Signal-to-noise ratio thấp. 95% tin nhắn irrelevant → dilute attention cho 5% relevant.

Cách đúng: Extract + summarize tin nhắn relevant trước khi paste. Hoặc dùng RAG.

Mẹo nâng cao

Mẹo 1: "XML tags" làm attention anchors

Model Claude được huấn luyện attention đặc biệt với XML-like tags:

Tags giúp Claude parse structure → attend đúng.

Mẹo 2: "Table of contents" cho long prompts

Add TOC ở đầu long prompt:

<task>
Review contract for 4 types of issues.
</task>

<critical_requirements>
- Output format: table
- Coverage: all sections
- Severity rating: H/M/L
</critical_requirements>

<contract>
[long contract content]
</contract>

<reminder>
Verify table format. Ensure middle sections covered.
</reminder>

Mẹo 2: "Table of contents" cho long prompts

Model "plan" attention với structure đã biết.

Mẹo 3: "Highlight" critical instruction

Dùng markdown emphasis + capitalization + punctuation:

DOCUMENT STRUCTURE:
1. System instructions (section 1, crucial)
2. Background context (section 2, reference only)
3. Past examples (section 3, for style reference)
4. Current task (section 4, main question)
5. Output requirements (section 5, crucial)

--- SECTION 1: SYSTEM INSTRUCTIONS ---
...

--- SECTION 5: OUTPUT REQUIREMENTS ---
...

Mẹo 3: "Highlight" critical instruction

Visual prominence → attention draw.

Mẹo 4: Adaptive chunking thresholds

Chunk nhỏ hơn khi cần precision.

Mẹo 5: Cross-chunk synthesis pattern

Khi chunk long doc:

Task complexity	Chunk size (pages)
Simple summarization	20-30
Issue detection	10-15
Detailed analysis	5-10
Line-by-line review	3-5

⚠️ **CRITICAL**: All monetary figures must include currency code. 
NEVER output "$100" — always "$100 USD" or "100 EUR".

[long content]

⚠️ **REMINDER**: Currency codes required on all figures.

Mẹo 5: Cross-chunk synthesis pattern

Final pass catches cross-chunk patterns.

Per chunk: Save findings to running notes

After all chunks:
  "Below are findings from 10 chunks of a 100-page document.
  Synthesize into final report. Check for:
  - Contradictions across chunks
  - Patterns only visible cross-chunk
  - Missing coverage
  
  [notes from all chunks]"

Áp dụng ngay

Bài tập 1: Lost-in-the-middle probe (~15 phút)

Đo hiệu ứng trực tiếp trên tool bạn dùng:

Setup:

Test A (instruction at start):

Observe: Claude có bắt đầu với "The secret word is ROSE" không?

Test B (instruction in middle):

Copy 3 đoạn text dài (khoảng 300 từ mỗi đoạn) từ Wikipedia của 3 chủ đề khác nhau (VD: Mars, Photosynthesis, World War II)
Tổng ~900 từ = ~1200 tokens — không quá dài, vừa đủ test

CRITICAL INSTRUCTION: When you finish reading the 3 articles below, 
your first sentence must be "The secret word is ROSE."

Article 1: Mars...
Article 2: Photosynthesis...
Article 3: World War II...

Now summarize the 3 articles in 3 bullets.

Bài tập 1: Lost-in-the-middle probe (~15 phút)

Observe: Claude có bắt đầu với "The secret word is TULIP" không? Có thể bỏ qua.

Test C (instruction at start + end):

Article 1: Mars...
Article 2: Photosynthesis...

CRITICAL INSTRUCTION: When you finish reading the 3 articles below, 
your first sentence must be "The secret word is TULIP."

Article 3: World War II...

Now summarize the 3 articles in 3 bullets.

Áp dụng ngay (tiếp)

Observe: Reliability cao hơn?

Compare: Position determines attention quality. Ghi lại cái thấy được.

Bài tập 2: Audit 1 long prompt của bạn

Pick 1 prompt bạn dùng recurring với long context:

Ghi ra structure hiện tại (system → docs → instruction)
Identify critical instructions → đang ở đâu?
Redesign với:
Critical ở đầu (system)
Reference material ở middle
Critical reminder ở cuối (before user message)
Test cả 2 versions → compare output quality

CRITICAL INSTRUCTION: When you finish, first sentence = 
"The secret word is LILY."

Article 1...
Article 2...
Article 3...

REMINDER: First sentence = "The secret word is LILY."

Now summarize...

Suy ngẫm bài học

Kết quả memory test của chính bạn có feel familiar không? (Cái giữa biến mất.)
Có prompt nào trong workflow của bạn đang rơi vào dangerous pattern?
Context engineering có làm bạn thay đổi cách nghĩ về "prompt" không?

Tóm tắt bài học

🎯 Serial position effect (primacy + recency) — con người nhớ đầu + cuối tốt hơn giữa.

🎯 LLMs show same pattern — Stanford 2023 research: fact ở giữa context → accuracy giảm > 30% vs đầu/cuối.

🎯 Đây là structural, không phải quirk. Transformer attention weight edges naturally.

🎯 3 quy tắc vàng:

🎯 Context engineering = cái bạn include + ở đâu + cái loại ra. Không chỉ prompt.

🎯 Core takeaway: "The fix isn't more — it's smarter."

Place critical instructions at edges (đầu + cuối)
Repeat what matters
Curate ruthlessly — more context ≠ better results

Tài liệu tham khảo

Liu et al. (2023) — "Lost in the Middle: How Language Models Use Long Contexts" (Stanford)
Miller (1956) — "The Magical Number Seven, Plus or Minus Two" — seminal work on working memory
Anthropic — "Prompting with XML tags" docs
Bài 17.7 — Working Memory lý thuyết
Bài 17.9 — Steerability (next)

Nội dung này có hữu ích không?