Ting-You Wang

Nanopore direct RNA data analysis

Posted on January 20, 2022

An Introduction to Nanopore direct RNA data analysis.

Software preparation

# Install Guppy CPU version
wget -c  https://cdn.oxfordnanoportal.com/software/analysis/ont-guppy-cpu_6.5.7_linux64.tar.gz
tar zxvf ont-guppy-cpu_6.5.7_linux64.tar.gz

# Install Guppy GPU version
wget -c https://cdn.oxfordnanoportal.com/software/analysis/ont-guppy_6.5.7_linux64.tar.gz
tar zxvf ont-guppy_6.5.7_linux64.tar.gz

# Add ont-guppy-cpu/bin to $PATH in your .bashrc file
export PATH=/path/to/ont-guppy-cpu/bin:$PATH

# Install minimap2 and samtools
conda install -c bioconda minimap2 # paftools.js will be installed automatically.
conda install -c bioconda samtools
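Before moving on, it can help to confirm that the tools installed above are actually visible on $PATH. The following is a minimal sketch: missing_tools is a hypothetical helper name (not part of Guppy, minimap2, or samtools) that prints every command it cannot find.

```shell
# Hypothetical helper: print each given command that is NOT found on $PATH.
# Prints nothing when every command is available.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Example call, using the tool names from the installation steps above:
#   missing_tools guppy_basecaller minimap2 samtools paftools.js
```

An empty result means all listed commands were found; any name that is printed still needs to be installed or added to $PATH.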

Annotation preparation

wget -c https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_39/gencode.v39.annotation.gff3.gz
gunzip gencode.v39.annotation.gff3.gz

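# Convert the annotation to BED12 for minimap2 --junc-bed
# (the output is plain BED12 text, not a binary bigBed file, despite the file name)
paftools.js gff2bed gencode.v39.annotation.gff3 > hg38.bigbed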
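A quick sanity check on the converted annotation is to confirm that every line has the same number of columns. This sketch assumes the gff2bed output is tab-separated BED12, so the only count reported should be 12; bed_column_counts is a hypothetical helper name used for illustration.

```shell
# Hypothetical helper: print the distinct column counts found in a
# tab-separated BED file, one count per line.
bed_column_counts() {
    awk -F '\t' '{ print NF }' "$1" | sort -n | uniq
}

# Example call on the file produced above:
#   bed_column_counts hg38.bigbed
```

If more than one number is printed, or the number is not 12, the conversion probably did not finish cleanly and is worth rerunning.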

Step1: Basecalling

CPU-based basecalling

guppy_basecaller --input_path ./fast5 --save_path ./guppy_output --flowcell FLO-MIN106 --kit SQK-RNA002 --calib_detect --num_callers 16 --cpu_threads_per_caller 8 --compress_fastq --reverse_sequence --u_substitution

GPU-based basecalling

guppy_basecaller --input_path ./fast5 --save_path ./guppy_output --flowcell FLO-MIN106 --kit SQK-RNA002 --calib_detect --num_callers 16 --gpu_runners_per_device 80 -x "cuda:all" --compress_fastq --reverse_sequence --u_substitution

WARNING: Use the RNA-specific parameters --calib_detect, --reverse_sequence, and --u_substitution.

Options

--input_path  # The location of FAST5 files
--save_path # The location of output FASTQ files. It has three subfolders (pass, fail, and calibration_strands).
--calib_detect  # Enable RNA calibration strand (RCS) detection and filtering.
--reverse_sequence # Reverse the called sequence.
--u_substitution # Substitute 'U' for 'T' in the called sequence.
--compress_fastq # Compress fastq output files with gzip
--flowcell # flowcell name
--kit # kit name
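Guppy typically writes many FASTQ files into the pass subfolder, while the alignment step expects a single input file. Here is a minimal sketch of merging them, assuming --compress_fastq was used so the files end in .fastq.gz. Both function names are hypothetical helpers. Concatenated gzip files form a valid gzip stream, so no decompression is needed, and minimap2 can read gzip-compressed FASTQ directly.

```shell
# Hypothetical helper: concatenate all gzipped FASTQ files in a folder
# into one gzipped FASTQ file.
# $1 = folder containing *.fastq.gz, $2 = output file (keep it OUTSIDE $1,
# otherwise the output would match the glob on a second run)
merge_pass_reads() {
    cat "$1"/*.fastq.gz > "$2"
}

# Hypothetical helper: count reads in a gzipped FASTQ file,
# assuming the standard 4 lines per record.
count_reads() {
    gzip -dc "$1" | awk 'END { print NR / 4 }'
}

# Example calls, using the save path from the commands above:
#   merge_pass_reads ./guppy_output/pass sample.fastq.gz
#   count_reads sample.fastq.gz
```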

List supported flowcells and kits:

guppy_basecaller --print_workflows

Alternatively, you can specify a config file:

guppy_basecaller --input_path ./fast5 --save_path ./guppy_output -c rna_r9.4.1_70bps_hac.cfg --calib_detect --num_callers 16 --cpu_threads_per_caller 8 --compress_fastq --reverse_sequence --u_substitution

What is RNA Calibration Strand (RCS)?

The RNA Calibration Strand (RCS) is the Enolase II (ENO2) transcript from the yeast gene YHR174W, supplied at a concentration of 50 ng/μL. The reference FASTA file for YHR174W ENO2 is available at ont-guppy-cpu/data/YHR174W.fasta. RCS is included in the Direct RNA Sequencing Kit (SQK-RNA002) and the PCR-cDNA Barcoding Kit (SQK-PCB109).

Step2: Align to Genome

We currently recommend using minimap2 to align to the reference genome.

minimap2 -Y -t 8 -R "@RG\tID:Sample\tSM:hs\tLB:ga\tPL:ONT" --MD -ax splice -uf -k14 --junc-bed hg38.bigbed hg38.fasta sample.fastq | samtools sort -@ 8 -O BAM -o aligned.bam -
samtools index aligned.bam
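Once the BAM file is indexed, a quick check of how many reads aligned can catch problems early. This is a minimal sketch rather than a full QC step: mapped_percent is a hypothetical helper that counts primary records with samtools view -c and reports the percentage that are mapped. Secondary (0x100) and supplementary (0x800) records are excluded so that each read is counted once.

```shell
# Hypothetical helper: percentage of reads with a primary alignment.
# $1 = BAM file
mapped_percent() {
    # All primary records, mapped or unmapped (one per read)
    total=$(samtools view -c -F 0x900 "$1")
    # Primary records that are also mapped (additionally excludes 0x4)
    mapped=$(samtools view -c -F 0x904 "$1")
    awk -v m="$mapped" -v t="$total" \
        'BEGIN { if (t > 0) printf "%.2f\n", 100 * m / t; else print "NA" }'
}

# Example call on the file produced above:
#   mapped_percent aligned.bam
```

For a more detailed breakdown, samtools flagstat aligned.bam reports similar counts in one pass.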

Published in categories tutorial  Tagged with Nanopore  ONT  Long-reads  analysis