{"product_id":"claude-cho-bio-research-phat-triển-nextflow-pipelines-với-nf-core","title":"Claude for Bio Research: Building Nextflow pipelines with nf-core","description":"\n\u003cp\u003eBench scientists often depend on a bioinformatics team to analyze sequencing data, a bottleneck that slows research. With Claude and the nf-core framework, researchers can run RNA-seq, variant calling, or ATAC-seq analyses themselves without deep programming expertise.\u003c\/p\u003e\n\n\u003ch2\u003eThree supported nf-core pipelines\u003c\/h2\u003e\n\u003ctable\u003e\n  \u003cthead\u003e\u003ctr\u003e\n\u003cth\u003eData type\u003c\/th\u003e\n\u003cth\u003ePipeline\u003c\/th\u003e\n\u003cth\u003ePurpose\u003c\/th\u003e\n\u003c\/tr\u003e\u003c\/thead\u003e\n  \u003ctbody\u003e\n    \u003ctr\u003e\n\u003ctd\u003eRNA-seq\u003c\/td\u003e\n\u003ctd\u003e\n\u003ccode\u003enf-core\/rnaseq\u003c\/code\u003e v3.22.2\u003c\/td\u003e\n\u003ctd\u003eGene expression, differential expression\u003c\/td\u003e\n\u003c\/tr\u003e\n    \u003ctr\u003e\n\u003ctd\u003eWGS\/WES\u003c\/td\u003e\n\u003ctd\u003e\n\u003ccode\u003enf-core\/sarek\u003c\/code\u003e v3.7.1\u003c\/td\u003e\n\u003ctd\u003eVariant calling (germline \u0026amp; somatic)\u003c\/td\u003e\n\u003c\/tr\u003e\n    \u003ctr\u003e\n\u003ctd\u003eATAC-seq\u003c\/td\u003e\n\u003ctd\u003e\n\u003ccode\u003enf-core\/atacseq\u003c\/code\u003e v2.1.2\u003c\/td\u003e\n\u003ctd\u003eChromatin accessibility analysis\u003c\/td\u003e\n\u003c\/tr\u003e\n  \u003c\/tbody\u003e\n\u003c\/table\u003e\n\n\u003ch2\u003eFull workflow checklist\u003c\/h2\u003e\n\u003col\u003e\n  \u003cli\u003eFetch the data (if it comes from GEO\/SRA)\u003c\/li\u003e\n  \u003cli\u003eCheck the environment (must pass)\u003c\/li\u003e\n  \u003cli\u003eChoose a pipeline\u003c\/li\u003e\n  \u003cli\u003eRun the test profile (must pass)\u003c\/li\u003e\n  \u003cli\u003eCreate the samplesheet\u003c\/li\u003e\n  \u003cli\u003eConfigure and run the pipeline\u003c\/li\u003e\n  
\u003cli\u003eVerify the output\u003c\/li\u003e\n\u003c\/ol\u003e\n\n\u003ch2\u003eStep 0: Download data from GEO\/SRA (if needed)\u003c\/h2\u003e\n\u003cp\u003eIf you already have local FASTQ files, skip this step. To reanalyze a public dataset from NCBI GEO:\u003c\/p\u003e\n\u003cpre\u003e\u003ccode\u003e# Show the study information\npython scripts\/sra_geo_fetch.py info GSE110004\n\n# Download (interactive mode: choose a subset)\npython scripts\/sra_geo_fetch.py download GSE110004 -o .\/fastq -i\n\n# Generate the samplesheet automatically\npython scripts\/sra_geo_fetch.py samplesheet GSE110004 --fastq-dir .\/fastq -o samplesheet.csv\u003c\/code\u003e\u003c\/pre\u003e\n\n\u003cp\u003eAfter reviewing the study information, confirm the plan with Claude:\u003c\/p\u003e\n\u003cpre\u003e\u003ccode\u003eI want to analyze dataset GSE110004, RNA-seq of T cells.\nThe study information shows 12 samples (6 control, 6 treated).\nPlease suggest a suitable genome and pipeline.\u003c\/code\u003e\u003c\/pre\u003e\n\n\u003ch2\u003eStep 1: Check the environment\u003c\/h2\u003e\n\u003cp\u003eThe pipeline will fail if the environment does not meet the requirements. 
Run the check first:\u003c\/p\u003e\n\u003cpre\u003e\u003ccode\u003epython scripts\/check_environment.py\u003c\/code\u003e\u003c\/pre\u003e\n\n\u003cp\u003eCommon problems and how to fix them:\u003c\/p\u003e\n\u003ctable\u003e\n  \u003cthead\u003e\u003ctr\u003e\n\u003cth\u003eProblem\u003c\/th\u003e\n\u003cth\u003eFix\u003c\/th\u003e\n\u003c\/tr\u003e\u003c\/thead\u003e\n  \u003ctbody\u003e\n    \u003ctr\u003e\n\u003ctd\u003eDocker not installed\u003c\/td\u003e\n\u003ctd\u003eInstall from https:\/\/docs.docker.com\/get-docker\/\u003c\/td\u003e\n\u003c\/tr\u003e\n    \u003ctr\u003e\n\u003ctd\u003eDocker permission denied\u003c\/td\u003e\n\u003ctd\u003e\n\u003ccode\u003esudo usermod -aG docker $USER\u003c\/code\u003e, then log out and back in\u003c\/td\u003e\n\u003c\/tr\u003e\n    \u003ctr\u003e\n\u003ctd\u003eDocker daemon not running\u003c\/td\u003e\n\u003ctd\u003e\u003ccode\u003esudo systemctl start docker\u003c\/code\u003e\u003c\/td\u003e\n\u003c\/tr\u003e\n    \u003ctr\u003e\n\u003ctd\u003eNextflow not installed\u003c\/td\u003e\n\u003ctd\u003e\u003ccode\u003ecurl -s https:\/\/get.nextflow.io | bash \u0026amp;\u0026amp; mv nextflow ~\/bin\/\u003c\/code\u003e\u003c\/td\u003e\n\u003c\/tr\u003e\n    \u003ctr\u003e\n\u003ctd\u003eNextflow too old (\u0026lt; 23.04)\u003c\/td\u003e\n\u003ctd\u003e\u003ccode\u003enextflow self-update\u003c\/code\u003e\u003c\/td\u003e\n\u003c\/tr\u003e\n    \u003ctr\u003e\n\u003ctd\u003eJava not installed or older than v11\u003c\/td\u003e\n\u003ctd\u003e\u003ccode\u003esudo apt install openjdk-11-jdk\u003c\/code\u003e\u003c\/td\u003e\n\u003c\/tr\u003e\n  \u003c\/tbody\u003e\n\u003c\/table\u003e\n\n\u003ch2\u003eStep 2: Run the test profile\u003c\/h2\u003e\n\u003cp\u003eBefore processing real data, run the test profile on a small dataset to confirm that the environment works:\u003c\/p\u003e\n\u003cpre\u003e\u003ccode\u003e# RNA-seq\nnextflow run nf-core\/rnaseq -r 3.22.2 -profile test,docker --outdir test_rnaseq\n\n# Sarek (variant calling)\nnextflow run nf-core\/sarek -r 3.7.1 
-profile test,docker --outdir test_sarek\n\n# ATAC-seq\nnextflow run nf-core\/atacseq -r 2.1.2 -profile test,docker --outdir test_atacseq\u003c\/code\u003e\u003c\/pre\u003e\n\n\u003cp\u003eConfirm that the test succeeded:\u003c\/p\u003e\n\u003cpre\u003e\u003ccode\u003els test_rnaseq\/multiqc\/multiqc_report.html\ngrep \"Pipeline completed successfully\" .nextflow.log\u003c\/code\u003e\u003c\/pre\u003e\n\n\u003ch2\u003eStep 3: Create the samplesheet\u003c\/h2\u003e\n\u003cp\u003eGenerate a samplesheet automatically from a FASTQ directory:\u003c\/p\u003e\n\u003cpre\u003e\u003ccode\u003epython scripts\/generate_samplesheet.py \/path\/to\/fastq rnaseq -o samplesheet.csv\u003c\/code\u003e\u003c\/pre\u003e\n\n\u003cp\u003eSamplesheet format for each pipeline:\u003c\/p\u003e\n\n\u003cp\u003e\u003cstrong\u003ernaseq:\u003c\/strong\u003e\u003c\/p\u003e\n\u003cpre\u003e\u003ccode\u003esample,fastq_1,fastq_2,strandedness\nCONTROL_REP1,\/data\/ctrl1_R1.fq.gz,\/data\/ctrl1_R2.fq.gz,auto\nTREATED_REP1,\/data\/treat1_R1.fq.gz,\/data\/treat1_R2.fq.gz,auto\u003c\/code\u003e\u003c\/pre\u003e\n\n\u003cp\u003e\u003cstrong\u003esarek (tumor\/normal pairs):\u003c\/strong\u003e\u003c\/p\u003e\n\u003cpre\u003e\u003ccode\u003epatient,sample,lane,fastq_1,fastq_2,status\npatient1,tumor,L001,\/data\/tumor_R1.fq.gz,\/data\/tumor_R2.fq.gz,1\npatient1,normal,L001,\/data\/normal_R1.fq.gz,\/data\/normal_R2.fq.gz,0\u003c\/code\u003e\u003c\/pre\u003e\n\n\u003cp\u003e\u003cstrong\u003eatacseq:\u003c\/strong\u003e\u003c\/p\u003e\n\u003cpre\u003e\u003ccode\u003esample,fastq_1,fastq_2,replicate\nCONTROL,\/data\/ctrl_R1.fq.gz,\/data\/ctrl_R2.fq.gz,1\nTREATMENT,\/data\/treat_R1.fq.gz,\/data\/treat_R2.fq.gz,1\u003c\/code\u003e\u003c\/pre\u003e\n\n\u003ch2\u003eStep 4: Configure and run the pipeline\u003c\/h2\u003e\n\u003cp\u003eCheck the reference genome first:\u003c\/p\u003e\n\u003cpre\u003e\u003ccode\u003epython scripts\/manage_genomes.py check GRCh38\n# If it is missing:\npython scripts\/manage_genomes.py download 
GRCh38\u003c\/code\u003e\u003c\/pre\u003e\n\n\u003cp\u003eCommonly used genomes: \u003ccode\u003eGRCh38\u003c\/code\u003e (human), \u003ccode\u003eGRCm39\u003c\/code\u003e (mouse), \u003ccode\u003eR64-1-1\u003c\/code\u003e (yeast)\u003c\/p\u003e\n\n\u003cp\u003eRun the full RNA-seq pipeline:\u003c\/p\u003e\n\u003cpre\u003e\u003ccode\u003enextflow run nf-core\/rnaseq -r 3.22.2 -profile docker --input samplesheet.csv --outdir results --genome GRCh38 --aligner star_salmon -resume\u003c\/code\u003e\u003c\/pre\u003e\n\n\u003cp\u003eImportant flags:\u003c\/p\u003e\n\u003cul\u003e\n  \u003cli\u003e\n\u003ccode\u003e-r\u003c\/code\u003e: Pins the pipeline version so results are reproducible\u003c\/li\u003e\n  \u003cli\u003e\n\u003ccode\u003e-profile docker\u003c\/code\u003e: Runs with Docker (or use \u003ccode\u003esingularity\u003c\/code\u003e on HPC)\u003c\/li\u003e\n  \u003cli\u003e\n\u003ccode\u003e-resume\u003c\/code\u003e: Continues from the last checkpoint if the run was interrupted\u003c\/li\u003e\n  \u003cli\u003e\n\u003ccode\u003e--max_cpus 8 --max_memory '32.GB'\u003c\/code\u003e: Caps resources if needed (check the parameter list for your pinned release, since newer nf-core releases set these limits through Nextflow's \u003ccode\u003eresourceLimits\u003c\/code\u003e directive in a custom config instead)\u003c\/li\u003e\n\u003c\/ul\u003e\n\n\u003ch2\u003eStep 5: Verify and read the output\u003c\/h2\u003e\n\n\u003cp\u003e\u003cstrong\u003eRNA-seq output:\u003c\/strong\u003e\u003c\/p\u003e\n\u003cpre\u003e\u003ccode\u003e# Gene counts (input for DESeq2, edgeR)\nresults\/star_salmon\/salmon.merged.gene_counts.tsv\n\n# TPM values (for visualization)\nresults\/star_salmon\/salmon.merged.gene_tpm.tsv\n\n# MultiQC report\nresults\/multiqc\/multiqc_report.html\u003c\/code\u003e\u003c\/pre\u003e\n\n\u003cp\u003e\u003cstrong\u003eSarek output (variant calling):\u003c\/strong\u003e\u003c\/p\u003e\n\u003cpre\u003e\u003ccode\u003e# VCF files\nresults\/variant_calling\/*\/\n\n# Recalibrated BAM\nresults\/preprocessing\/recalibrated\/\u003c\/code\u003e\u003c\/pre\u003e\n\n\u003cp\u003e\u003cstrong\u003eATAC-seq output:\u003c\/strong\u003e\u003c\/p\u003e\n\u003cpre\u003e\u003ccode\u003e# Peak 
calls\nresults\/macs2\/narrowPeak\/\n\n# BigWig tracks for a genome browser\nresults\/bwa\/mergedLibrary\/bigwig\/\u003c\/code\u003e\u003c\/pre\u003e\n\n\u003ch2\u003eResuming after a pipeline failure\u003c\/h2\u003e\n\u003cp\u003eIf the pipeline stops partway, rerun it from the same launch directory with \u003ccode\u003e-resume\u003c\/code\u003e, and Nextflow will skip the steps that already completed. Keep the remaining options (\u003ccode\u003e-r\u003c\/code\u003e, \u003ccode\u003e-profile\u003c\/code\u003e, \u003ccode\u003e--input\u003c\/code\u003e, \u003ccode\u003e--outdir\u003c\/code\u003e, and so on) identical to the original run; the command here is shortened:\u003c\/p\u003e\n\u003cpre\u003e\u003ccode\u003enextflow run nf-core\/rnaseq -resume\u003c\/code\u003e\u003c\/pre\u003e\n\n\u003cp\u003eClaude can help diagnose errors when you paste the contents of \u003ccode\u003e.nextflow.log\u003c\/code\u003e and describe the problem you hit.\u003c\/p\u003e\n\n\u003ch2\u003eA note on citation\u003c\/h2\u003e\n\u003cp\u003eWhen you publish results, cite the pipeline you used. Citation details are in the \u003ccode\u003eCITATIONS.md\u003c\/code\u003e file of each nf-core repository, for example: \u003ccode\u003ehttps:\/\/github.com\/nf-core\/rnaseq\/blob\/3.22.2\/CITATIONS.md\u003c\/code\u003e.\u003c\/p\u003e\n\n\u003ch2\u003eNext steps\u003c\/h2\u003e\n\u003cp\u003enf-core pipelines are community-maintained and versioned, which makes them a solid base for reproducible analysis. Once you have gene counts from RNA-seq, the usual next step is differential expression analysis with DESeq2 or edgeR. 
Explore more guides in the \u003ca href=\"\/collections\/ung-dung\"\u003eApplications collection\u003c\/a\u003e.\u003c\/p\u003e\n\n\n\u003chr\u003e\n\u003ch3\u003eRelated articles\u003c\/h3\u003e\n\u003cul\u003e\n\u003cli\u003e\u003ca href=\"\/products\/claude-cho-bio-research-b%E1%BA%AFt-d%E1%BA%A7u-d%E1%BB%B1-an-nghien-c%E1%BB%A9u-sinh-h%E1%BB%8Dc\"\u003eClaude for Bio Research: Starting a biology research project\u003c\/a\u003e\u003c\/li\u003e\n\u003cli\u003e\u003ca href=\"\/products\/claude-nghien-c%E1%BB%A9u-sinh-h%E1%BB%8Dc-h%C6%B0%E1%BB%9Bng-d%E1%BA%ABn-k%E1%BA%BFt-n%E1%BB%91i-cong-c%E1%BB%A5\"\u003eClaude for Biology Research: A guide to connecting tools\u003c\/a\u003e\u003c\/li\u003e\n\u003cli\u003e\u003ca href=\"\/products\/claude-cho-bio-research-chuy%E1%BB%83n-d%E1%BB%95i-d%E1%BB%AF-li%E1%BB%87u-thi%E1%BA%BFt-b%E1%BB%8B-sang-d%E1%BB%8Bnh-d%E1%BA%A1ng-allotrope\"\u003eClaude for Bio Research: Converting instrument data to the Allotrope format\u003c\/a\u003e\u003c\/li\u003e\n\u003cli\u003e\u003ca href=\"\/products\/claude-cho-data-validation-va-data-quality\"\u003eClaude for Data: Validation and data quality\u003c\/a\u003e\u003c\/li\u003e\n\u003cli\u003e\u003ca href=\"\/products\/claude-cho-engineering-debug-va-x%E1%BB%AD-ly-l%E1%BB%97i\"\u003eClaude for Engineering: Debugging and error handling\u003c\/a\u003e\u003c\/li\u003e\n\u003c\/ul\u003e","brand":"Minh Tuấn","offers":[{"title":"Default Title","offer_id":47722090987732,"sku":null,"price":0.0,"currency_code":"VND","in_stock":true}],"thumbnail_url":"\/\/cdn.shopify.com\/s\/files\/1\/0821\/0264\/9044\/files\/claude-cho-bio-research-phat-tri_n-nextflow-pipelines-v_i-nf-core_e178a73d-ac38-4d80-b25d-5535ccb5c084.jpg?v=1774521907","url":"https:\/\/claude.vn\/products\/claude-cho-bio-research-phat-tri%e1%bb%83n-nextflow-pipelines-v%e1%bb%9bi-nf-core","provider":"CLAUDE.VN","version":"1.0","type":"link"}