Ting-You Wang

Nanopore direct RNA data analysis

Posted on January 20, 2022

An Introduction to Nanopore direct RNA data analysis.

Software preparation

# Install Guppy CPU version
wget -c https://mirror.oxfordnanoportal.com/software/analysis/ont-guppy-cpu_6.0.1_linux64.tar.gz
tar zxvf ont-guppy-cpu_6.0.1_linux64.tar.gz
# add ont-guppy-cpu/bin to $PATH in the .bashrc file
export PATH=/path/to/ont-guppy-cpu/bin:$PATH

# install minimap2 and samtools
conda install -c bioconda minimap2 # paftools.js will be installed automatically.
conda install -c bioconda samtools
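Before moving on, it can help to confirm that every tool is actually reachable from the shell. The following is a minimal sketch, not part of the original workflow: `check_tools` is a hypothetical helper name, and the tool list in the example call simply mirrors the installs above.

```shell
# Report which of the given tools can be found on $PATH.
# check_tools is a hypothetical helper, not part of Guppy, minimap2 or samtools.
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found:   $tool -> $(command -v "$tool")"
        else
            echo "missing: $tool"
            missing=1
        fi
    done
    # Return 0 only if every tool was found.
    return "$missing"
}

# Example call (prints one line per tool):
# check_tools guppy_basecaller minimap2 paftools.js samtools
```

If a tool is reported as missing, re-open the terminal (so that .bashrc is re-read) or check that the conda environment is activated.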

Annotation preparation

wget -c https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_39/gencode.v39.annotation.gff3.gz
gunzip gencode.v39.annotation.gff3.gz

paftools.js gff2bed gencode.v39.annotation.gff3 > hg38.bigbed
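Despite the .bigbed file name used here, the output of paftools.js gff2bed should be a plain-text BED12 file (12 tab-separated columns per transcript), not a binary bigBed file. A quick sanity check, sketched under that assumption, is to count records that do not have 12 columns; `count_non_bed12` is a hypothetical helper name.

```shell
# Count lines in a BED file that do not have exactly 12 tab-separated fields.
# Prints 0 when every record looks like BED12.
count_non_bed12() {
    awk -F'\t' 'NF != 12 { bad++ } END { print bad + 0 }' "$1"
}

# Example call on the annotation created above:
# count_non_bed12 hg38.bigbed
```

A non-zero result suggests the conversion did not produce the expected format and is worth inspecting with head before alignment.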

Step1: Basecalling

guppy_basecaller --input_path ./fast5 --save_path ./guppy_output --flowcell FLO-MIN106 --kit SQK-RNA002 --calib_detect --num_callers 16 --cpu_threads_per_caller 8 --client_id 300 --compress_fastq

Options

--input_path  # The location of FAST5 files
--save_path # The location of the output FASTQ files. It has three subfolders (pass, fail, and calibration_strands).
--calib_detect  # Enable RNA calibration strand (RCS) detection and filtering.
--compress_fastq # Compress fastq output files with gzip
--flowcell # flowcell name
--kit # kit name

List supported flowcells and kits:

guppy_basecaller --print_workflows

Alternatively, you can specify a config file:

guppy_basecaller --input_path ./fast5 --save_path ./guppy_output -c rna_r9.4.1_70bps_hac.cfg --calib_detect --num_callers 16 --cpu_threads_per_caller 8 --client_id 300 --compress_fastq
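Guppy usually writes many FASTQ files into the pass subfolder, while the alignment step in this post reads a single file. One way to bridge the two, sketched here under the assumption that the files are named *.fastq.gz (because of --compress_fastq), is to concatenate them. The function names and the output name sample.fastq.gz are illustrative, not part of Guppy.

```shell
# Concatenate all gzipped FASTQ files from a "pass" folder into one file.
# Concatenated gzip streams form a valid gzip file, so no re-compression is needed.
# Write the output outside the pass folder so a re-run does not pick it up.
merge_pass_fastq() {
    pass_dir=$1   # e.g. ./guppy_output/pass
    out_file=$2   # e.g. sample.fastq.gz (hypothetical name)
    cat "$pass_dir"/*.fastq.gz > "$out_file"
}

# Count FASTQ records in a gzipped file (4 lines per record).
count_fastq_gz() {
    gzip -dc "$1" | awk 'END { print NR / 4 }'
}

# Example calls:
# merge_pass_fastq ./guppy_output/pass sample.fastq.gz
# count_fastq_gz sample.fastq.gz
```

minimap2 reads gzipped FASTQ directly, so the merged file can be passed to it without decompressing.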

What is RNA Calibration Strand (RCS)?

The RNA Calibration Strand (RCS, also written RNA CS) is Enolase II (ENO2) from YHR174W, supplied at a concentration of 50 ng/μL. The reference FASTA file for YHR174W ENO2 is available at ont-guppy-cpu/data/YHR174W.fasta. RCS is included in the Direct RNA Sequencing Kit (SQK-RNA002) and the PCR-cDNA Barcoding Kit (SQK-PCB109).
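To see how many reads --calib_detect set aside as calibration strands, the three output subfolders can be counted side by side. This is a rough sketch that assumes each subfolder holds *.fastq.gz files; `summarize_guppy_output` is a hypothetical helper name.

```shell
# Print the number of FASTQ records in each Guppy output subfolder.
# A subfolder that is missing or empty is reported with a count of 0.
summarize_guppy_output() {
    save_path=$1   # e.g. ./guppy_output
    for sub in pass fail calibration_strands; do
        n=0
        for f in "$save_path/$sub"/*.fastq.gz; do
            [ -e "$f" ] || continue          # glob matched nothing
            lines=$(gzip -dc "$f" | wc -l)
            n=$(( n + lines / 4 ))           # 4 lines per FASTQ record
        done
        printf '%s\t%s\n' "$sub" "$n"
    done
}

# Example call:
# summarize_guppy_output ./guppy_output
```

Reads in calibration_strands are kept out of pass, so they do not need to be filtered again before alignment.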

Step2: Align to Genome

We currently recommend using minimap2 to align to the reference genome.

minimap2 -Y -t 8 -R "@RG\tID:Sample\tSM:hs\tLB:ga\tPL:ONT" --MD -ax splice -uf -k14 --junc-bed hg38.bigbed hg38.fasta sample.fastq > aligned.sam
samtools sort -@ 8 -O BAM -o aligned.sort.bam aligned.sam
samtools index aligned.sort.bam
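After alignment, a quick look at how many reads mapped is a useful sanity check. The sketch below counts each read once by skipping secondary (0x100) and supplementary (0x800) records and treats flag bit 0x4 as unmapped; `sam_mapping_summary` is a hypothetical helper that reads SAM text from stdin, so it can be fed by samtools view.

```shell
# Summarize SAM records read from stdin.
# Only primary records are counted, so each read contributes once.
sam_mapping_summary() {
    awk -F'\t' '
        /^@/ { next }   # skip header lines
        {
            flag = $2
            # skip secondary (256) and supplementary (2048) alignments
            if (int(flag / 256) % 2 || int(flag / 2048) % 2) next
            total++
            # bit 4 set means the read is unmapped
            if (int(flag / 4) % 2 == 0) mapped++
        }
        END { printf "reads\t%d\nmapped\t%d\n", total, mapped }
    '
}

# Example call on the sorted BAM produced above:
# samtools view -h aligned.sort.bam | sam_mapping_summary
```

The mapped count should match samtools view -c -F 0x904 aligned.sort.bam, and samtools flagstat gives a more detailed breakdown.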
