Advanced | Guide | Claude Chat | Source: Anthropic

Claude for Bio Research: Developing Nextflow pipelines with nf-core

Minh Tuấn, CTO, Transform Group

26/03/2026, 3-minute read

Bench scientists often have to rely on a bioinformatics team to analyze their sequencing data, a bottleneck that slows research down. With Claude and the nf-core framework, researchers can run RNA-seq, variant calling, or ATAC-seq analyses themselves without deep programming expertise.

Three supported nf-core pipelines

Data type	Pipeline	Purpose
RNA-seq	`nf-core/rnaseq` v3.22.2	Gene expression, differential expression
WGS/WES	`nf-core/sarek` v3.7.1	Variant calling (germline & somatic)
ATAC-seq	`nf-core/atacseq` v2.1.2	Chromatin accessibility analysis
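The table above can be captured as a small lookup, which is handy when a script needs to pick the pipeline and pinned version from the data type. This is only a sketch: the pipeline names and versions come from the table, while the dictionary keys and the `choose_pipeline` helper are illustrative names, not part of nf-core.

```python
# Pipeline names and versions are copied from the table above.
# The keys and the helper name are illustrative assumptions.
PIPELINES = {
    "rnaseq": ("nf-core/rnaseq", "3.22.2"),
    "wgs": ("nf-core/sarek", "3.7.1"),
    "wes": ("nf-core/sarek", "3.7.1"),
    "atacseq": ("nf-core/atacseq", "2.1.2"),
}


def choose_pipeline(data_type: str) -> tuple[str, str]:
    """Return (pipeline, version) for a data type such as 'RNA-seq' or 'WGS'."""
    # Normalize 'RNA-seq', 'rna_seq', 'RNAseq' to the same key.
    key = data_type.lower().replace("-", "").replace("_", "")
    if key not in PIPELINES:
        raise ValueError(f"No supported pipeline for data type: {data_type!r}")
    return PIPELINES[key]


pipeline, version = choose_pipeline("RNA-seq")
print(pipeline, version)  # nf-core/rnaseq 3.22.2
```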

Full workflow checklist

Fetch the data (if it comes from GEO/SRA)
Check the environment (must pass)
Choose a pipeline
Run the test profile (must pass)
Create the samplesheet
Configure and run the pipeline
Verify the output
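The "must pass" steps in the checklist act as gates: nothing after them should run if they fail. A minimal sketch of that gating logic follows; the `run_checklist` function and the example checks are hypothetical and stand in for the real scripts and commands used later in this guide.

```python
from typing import Callable

# Each step is (name, check, mandatory). A check returns True on success.
Step = tuple[str, Callable[[], bool], bool]


def run_checklist(steps: list[Step]) -> list[str]:
    """Run steps in order and stop at the first failed mandatory step."""
    log = []
    for name, check, mandatory in steps:
        ok = check()
        log.append(f"{name}: {'pass' if ok else 'fail'}")
        if not ok and mandatory:
            log.append(f"stopped: '{name}' must pass before continuing")
            break
    return log


# Placeholder checks for illustration only.
example = [
    ("check environment", lambda: True, True),
    ("run test profile", lambda: False, True),
    ("create samplesheet", lambda: True, False),
]
for line in run_checklist(example):
    print(line)
```

With the placeholder checks above, the run stops after the failed test profile and never reaches the samplesheet step.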

Step 0: Download data from GEO/SRA (if needed)

If you already have FASTQ files locally, skip this step. To reanalyze a public dataset from NCBI GEO:

# View the study information
python scripts/sra_geo_fetch.py info GSE110004

# Download (interactive mode lets you choose a subset)
python scripts/sra_geo_fetch.py download GSE110004 -o ./fastq -i

# Generate the samplesheet automatically
python scripts/sra_geo_fetch.py samplesheet GSE110004 --fastq-dir ./fastq -o samplesheet.csv

After reviewing the study information, confirm it with Claude:

I want to analyze dataset GSE110004, an RNA-seq study of T cells.
The study information shows 12 samples (6 control, 6 treated).
Please suggest a suitable genome and pipeline.

Step 1: Check the environment

The pipeline will fail if the environment does not meet its requirements. Run the check first:

python scripts/check_environment.py

Common problems and how to fix them:

Problem	Solution
Docker not installed	Install from https://docs.docker.com/get-docker/
Docker permission denied	`sudo usermod -aG docker $USER`, then log in again
Docker daemon not running	`sudo systemctl start docker`
Nextflow not installed	`curl -s https://get.nextflow.io | bash && mv nextflow ~/bin/`
Nextflow too old (< 23.04)	`nextflow self-update`
Java missing or older than v11	`sudo apt install openjdk-11-jdk`
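The contents of `scripts/check_environment.py` are not shown in this guide, so the following is only a sketch of the kind of checks such a script might perform: confirm each tool is on `PATH` and compare its reported version against the minimums from the table. The function names and the version-parsing approach are assumptions.

```python
import re
import shutil
import subprocess
from typing import Optional


def parse_version(text: str) -> tuple[int, ...]:
    """Extract the first version-like number, e.g. 'version 23.04.1' -> (23, 4, 1)."""
    match = re.search(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError(f"No version number found in: {text!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def check_tool(name: str, version_args: list[str],
               minimum: Optional[tuple[int, ...]] = None) -> str:
    """Return 'ok', 'missing', or 'too old' for one command-line tool."""
    if shutil.which(name) is None:
        return "missing"
    if minimum is None:
        return "ok"
    result = subprocess.run([name, *version_args], capture_output=True, text=True)
    # java prints its version on stderr, so inspect both streams.
    version = parse_version(result.stdout + result.stderr)
    return "ok" if version >= minimum else "too old"


# Example calls (minimum versions taken from the table above):
#   check_tool("docker", ["--version"])
#   check_tool("nextflow", ["-version"], minimum=(23, 4))
#   check_tool("java", ["-version"], minimum=(11,))
```

This sketch does not cover the Docker permission or daemon problems in the table; detecting those would require actually talking to the Docker daemon.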

Step 2: Run the test profile

Before processing real data, run the test profile on its small bundled dataset to confirm that the environment works:

# RNA-seq
nextflow run nf-core/rnaseq -r 3.22.2 -profile test,docker --outdir test_rnaseq

# Sarek (variant calling)
nextflow run nf-core/sarek -r 3.7.1 -profile test,docker --outdir test_sarek

# ATAC-seq
nextflow run nf-core/atacseq -r 2.1.2 -profile test,docker --outdir test_atacseq

Confirm that the test succeeded:

ls test_rnaseq/multiqc/multiqc_report.html
grep "Pipeline completed successfully" .nextflow.log
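The two checks above can be combined into one function so that a script can gate the real run on the test result. This is a sketch under the assumptions already used in this guide: the MultiQC report lives at `multiqc/multiqc_report.html` under the output directory, and the log contains the string "Pipeline completed successfully". The function name `run_succeeded` is illustrative.

```python
from pathlib import Path

SUCCESS_MARKER = "Pipeline completed successfully"


def run_succeeded(outdir: str, log_file: str = ".nextflow.log") -> bool:
    """True if the MultiQC report exists and the log reports success."""
    report = Path(outdir) / "multiqc" / "multiqc_report.html"
    log = Path(log_file)
    if not report.is_file() or not log.is_file():
        return False
    # errors="replace" avoids crashing on stray non-UTF-8 bytes in the log.
    return SUCCESS_MARKER in log.read_text(errors="replace")


# Example: run_succeeded("test_rnaseq")
```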

Step 3: Create the samplesheet

Generate the samplesheet automatically from a FASTQ directory:

python scripts/generate_samplesheet.py /path/to/fastq rnaseq -o samplesheet.csv
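The internals of `generate_samplesheet.py` are not shown here. As a rough sketch of the idea for the rnaseq case, the code below pairs files by name and writes the four columns used in this guide. It assumes files are named `<sample>_R1.fq.gz` / `<sample>_R2.fq.gz` (or `.fastq.gz`); real sequencing output often uses other naming schemes, so the pattern would need adjusting.

```python
import csv
import re
from pathlib import Path

# Assumed naming convention: <sample>_R1.fq.gz / <sample>_R2.fastq.gz
READ_PATTERN = re.compile(r"^(?P<sample>.+)_R(?P<read>[12])\.(?:fq|fastq)\.gz$")
COLUMNS = ["sample", "fastq_1", "fastq_2", "strandedness"]


def build_rnaseq_rows(fastq_dir: str) -> list[dict[str, str]]:
    """Group FASTQ files by sample name and build one row per sample."""
    pairs: dict[str, dict[str, str]] = {}
    for path in sorted(Path(fastq_dir).iterdir()):
        match = READ_PATTERN.match(path.name)
        if match:
            pairs.setdefault(match["sample"], {})[match["read"]] = str(path.resolve())
    rows = []
    for sample, reads in sorted(pairs.items()):
        if "1" not in reads:
            raise ValueError(f"Sample {sample} has an R2 file but no R1 file")
        rows.append({
            "sample": sample,
            "fastq_1": reads["1"],
            "fastq_2": reads.get("2", ""),  # left empty for single-end data
            "strandedness": "auto",
        })
    return rows


def write_samplesheet(rows: list[dict[str, str]], out_path: str) -> None:
    """Write the rows as a CSV file with a header line."""
    with open(out_path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=COLUMNS)
        writer.writeheader()
        writer.writerows(rows)


# Example: write_samplesheet(build_rnaseq_rows("/path/to/fastq"), "samplesheet.csv")
```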

Samplesheet format for each pipeline:

rnaseq:

sample,fastq_1,fastq_2,strandedness
CONTROL_REP1,/data/ctrl1_R1.fq.gz,/data/ctrl1_R2.fq.gz,auto
TREATED_REP1,/data/treat1_R1.fq.gz,/data/treat1_R2.fq.gz,auto

sarek (tumor/normal pairs):

patient,sample,lane,fastq_1,fastq_2,status
patient1,tumor,L001,/data/tumor_R1.fq.gz,/data/tumor_R2.fq.gz,1
patient1,normal,L001,/data/normal_R1.fq.gz,/data/normal_R2.fq.gz,0

atacseq:

sample,fastq_1,fastq_2,replicate
CONTROL,/data/ctrl_R1.fq.gz,/data/ctrl_R2.fq.gz,1
TREATMENT,/data/treat_R1.fq.gz,/data/treat_R2.fq.gz,1
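A quick header check before launching can catch a samplesheet written for the wrong pipeline. The sketch below uses only the columns shown in the three examples above; the pipelines themselves perform their own, stricter validation and may accept additional columns, so treat `REQUIRED_COLUMNS` and `validate_samplesheet` as illustrative.

```python
import csv

# Columns taken from the example samplesheets in this guide.
REQUIRED_COLUMNS = {
    "rnaseq": ["sample", "fastq_1", "fastq_2", "strandedness"],
    "sarek": ["patient", "sample", "lane", "fastq_1", "fastq_2", "status"],
    "atacseq": ["sample", "fastq_1", "fastq_2", "replicate"],
}


def validate_samplesheet(path: str, pipeline: str) -> list[str]:
    """Return a list of problems; an empty list means the basic checks passed."""
    problems = []
    with open(path, newline="") as handle:
        reader = csv.DictReader(handle)
        header = reader.fieldnames or []
        missing = [c for c in REQUIRED_COLUMNS[pipeline] if c not in header]
        if missing:
            return [f"missing columns: {', '.join(missing)}"]
        # Data starts on line 2 because line 1 is the header.
        for line_no, row in enumerate(reader, start=2):
            if not row["sample"]:
                problems.append(f"line {line_no}: empty sample name")
            if not row["fastq_1"]:
                problems.append(f"line {line_no}: fastq_1 is required")
    return problems


# Example: validate_samplesheet("samplesheet.csv", "rnaseq")
```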

Step 4: Configure and run the pipeline

Check the reference genome first:

python scripts/manage_genomes.py check GRCh38
# If it is not available yet:
python scripts/manage_genomes.py download GRCh38

Commonly used genomes: GRCh38 (human), GRCm39 (mouse), R64-1-1 (yeast)

Run the full RNA-seq pipeline:

nextflow run nf-core/rnaseq \
    -r 3.22.2 \
    -profile docker \
    --input samplesheet.csv \
    --outdir results \
    --genome GRCh38 \
    --aligner star_salmon \
    -resume

Important flags:

-r: Pins the pipeline version so results are reproducible
-profile docker: Runs with Docker (use singularity on HPC systems)
-resume: Continues from the last checkpoint if the run was interrupted
--max_cpus 8 --max_memory '32.GB': Caps resource usage if needed
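When the same run has to be launched for several datasets, it can help to assemble the command from parameters instead of editing a long shell line. The sketch below rebuilds the RNA-seq command shown above as an argument list; `build_command` is a hypothetical helper and uses only the flags that appear in this guide.

```python
import shlex
from typing import Optional


def build_command(pipeline: str, version: str, samplesheet: str, outdir: str,
                  genome: str, profile: str = "docker",
                  extra: Optional[dict[str, str]] = None,
                  resume: bool = True) -> list[str]:
    """Assemble a `nextflow run` command as an argument list."""
    cmd = [
        "nextflow", "run", pipeline,
        "-r", version,            # pin the pipeline version
        "-profile", profile,      # container engine profile
        "--input", samplesheet,
        "--outdir", outdir,
        "--genome", genome,
    ]
    # Pipeline-specific parameters, e.g. {"aligner": "star_salmon"}.
    for key, value in (extra or {}).items():
        cmd += [f"--{key}", str(value)]
    if resume:
        cmd.append("-resume")
    return cmd


cmd = build_command("nf-core/rnaseq", "3.22.2", "samplesheet.csv", "results",
                    "GRCh38", extra={"aligner": "star_salmon"})
print(shlex.join(cmd))
```

The list form can be passed directly to `subprocess.run(cmd, check=True)`, which avoids shell quoting problems with paths that contain spaces.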

Step 5: Verify and read the output

RNA-seq output:

# Gene counts (input for DESeq2, edgeR)
results/star_salmon/salmon.merged.gene_counts.tsv

# TPM values (for visualization)
results/star_salmon/salmon.merged.gene_tpm.tsv

# MultiQC report
results/multiqc/multiqc_report.html
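Before handing the gene counts to DESeq2 or edgeR, a quick sanity check is to sum the counts per sample and look for libraries that are unexpectedly small. The sketch below assumes the merged counts file is tab-separated with `gene_id` and `gene_name` columns followed by one column per sample; the function name `library_sizes` is illustrative.

```python
import csv

# Annotation columns assumed to precede the per-sample count columns.
ANNOTATION_COLUMNS = ("gene_id", "gene_name")


def library_sizes(counts_path: str) -> dict[str, float]:
    """Sum the counts in each sample column of a merged gene counts TSV."""
    with open(counts_path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        samples = [c for c in (reader.fieldnames or []) if c not in ANNOTATION_COLUMNS]
        totals = dict.fromkeys(samples, 0.0)
        for row in reader:
            for sample in samples:
                # Salmon-derived counts can be fractional, hence float.
                totals[sample] += float(row[sample])
    return totals


# Example: library_sizes("results/star_salmon/salmon.merged.gene_counts.tsv")
```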

Sarek output (variant calling):

# VCF files
results/variant_calling/*/

# Recalibrated BAM
results/preprocessing/recalibrated/

ATAC-seq output:

# Peak calls
results/macs2/narrowPeak/

# BigWig tracks for a genome browser
results/bwa/mergedLibrary/bigwig/

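Resuming after a pipeline failure

If the pipeline stops partway through, rerun the same command with -resume; Nextflow automatically skips the steps that already completed:

nextflow run nf-core/rnaseq -r 3.22.2 -profile docker --input samplesheet.csv --outdir results --genome GRCh38 --aligner star_salmon -resume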

Claude can help diagnose the error if you paste the contents of the .nextflow.log file and describe the problem you ran into.
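The full log can be very long, so it is often more practical to paste only the lines around each error. The following sketch pulls lines containing "ERROR" plus a few lines of surrounding context; the function name, the context size, and the line cap are arbitrary choices for illustration.

```python
from pathlib import Path


def extract_errors(log_path: str = ".nextflow.log", context: int = 5,
                   max_lines: int = 200) -> str:
    """Return ERROR lines with surrounding context, capped at max_lines."""
    lines = Path(log_path).read_text(errors="replace").splitlines()
    keep: set[int] = set()
    for index, line in enumerate(lines):
        if "ERROR" in line:
            start = max(0, index - context)
            stop = min(len(lines), index + context + 1)
            keep.update(range(start, stop))
    selected = [lines[i] for i in sorted(keep)]
    # Keep the most recent lines if the excerpt is still too long.
    return "\n".join(selected[-max_lines:])


# Example: print(extract_errors(".nextflow.log"))
```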

A note on citations

When you publish results, cite the pipeline you used. Citation details are in the CITATIONS.md file of each nf-core repository, for example: https://github.com/nf-core/rnaseq/blob/3.22.2/CITATIONS.md.
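The example URL follows a regular pattern, so the link for the exact version you ran can be built from the pipeline name and version. This sketch assumes the other nf-core repositories use the same layout and that release tags match the version strings, which is true of the example above but is not verified here for every pipeline.

```python
def citation_url(pipeline: str, version: str) -> str:
    """Build the CITATIONS.md URL for a pinned pipeline version."""
    # Pattern taken from the rnaseq example in this guide.
    return f"https://github.com/{pipeline}/blob/{version}/CITATIONS.md"


print(citation_url("nf-core/rnaseq", "3.22.2"))
# https://github.com/nf-core/rnaseq/blob/3.22.2/CITATIONS.md
```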

Next steps

The nf-core pipelines are among the most capable bioinformatics pipelines available today. Once you have gene counts from RNA-seq, the usual next step is differential expression analysis with DESeq2 or edgeR. Explore more guides in the Applications collection.
