Nanopore raw data visualization using squigualiser
Posted on June 18, 2024
An Introduction to Nanopore raw data visualization using squigualiser
Software preparation
Tools for converting raw data to BLOW5 format
- If the raw data is in POD5 format, blue-crab is required.
- If the raw data is in FAST5 format, slow5tools is required.
# Create environment
mamba create --name ont python=3.9
mamba activate ont
# Install blue-crab
mamba install zstd
python3 -m pip install --upgrade pip
pip install blue-crab
# Install slow5tools
mamba install hdf5
mamba install slow5tools
Aligning raw signals to basecalled reads using F5C
mamba install f5c=1.4
Signal-to-read visualization using squigualiser
pip install squigualiser
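Before moving on, it can help to confirm that every tool installed above is actually on the PATH of the active environment. The following is a minimal sketch; `check_tools` is a hypothetical helper name, not part of any of the packages, and the command names are taken from the commands used later in this post.

```shell
# Report which of the given commands are missing from PATH.
# check_tools is a hypothetical helper, not part of any package above.
check_tools() {
    missing=""
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            missing="$missing $tool"
        fi
    done
    # Print the missing tools (if any) on one line, space-separated.
    echo "${missing# }"
    # Succeed only when nothing is missing.
    [ -z "$missing" ]
}

check_tools blue-crab slow5tools f5c squigualiser || echo "install the tools listed above first"
```

If the first output line is empty, all four commands were found.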
Step1: Converting data format
For multiple FAST5 files
slow5tools f2s ./fast5_dir -d blow5_dir # convert multiple FAST5 files to multiple BLOW5 files
slow5tools merge blow5_dir -o data.blow5 # merge BLOW5 into one
slow5tools get data.blow5 -l read_ids.txt --to blow5 -o target.blow5 # extract the records whose read IDs are listed in read_ids.txt (one ID per line)
slow5tools index target.blow5 # index BLOW5 file
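The read_ids.txt file used above has to come from somewhere. One option, sketched below, is to take the IDs from the basecalled FASTQ. This assumes that target.fastq already exists, that it uses the standard 4-line record layout, and that its read IDs match those stored in the BLOW5 file. `fastq_read_ids` is a hypothetical helper name.

```shell
# Print the read ID of every record in a 4-line-per-record FASTQ file.
# fastq_read_ids is a hypothetical helper name.
fastq_read_ids() {
    # Header lines are lines 1, 5, 9, ...; strip the leading "@" and keep
    # only the first whitespace-separated field (drops the description).
    awk 'NR % 4 == 1 { sub(/^@/, "", $1); print $1 }' "$1"
}

# Only write read_ids.txt when the FASTQ is present.
if [ -f target.fastq ]; then
    fastq_read_ids target.fastq > read_ids.txt
fi
```

Selecting header lines by position rather than with `grep '^@'` avoids picking up quality lines that happen to start with `@`.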
For single POD5 file
blue-crab p2s data.pod5 -o data.blow5
Step2: Aligning raw signals to basecalled reads
f5c resquiggle -c --rna --pore r9 -o target.paf target.fastq target.blow5
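A quick sanity check after this step is to compare the number of reads that received an alignment with the number of reads in the FASTQ. This is a rough sketch that assumes the read ID is in the first column of the PAF output and that the FASTQ uses 4-line records. The helper names are hypothetical.

```shell
# Count distinct read IDs in column 1 of a PAF file.
paf_read_count() {
    cut -f1 "$1" | sort -u | wc -l | tr -d ' '
}

# Count records in a 4-line-per-record FASTQ file.
fastq_record_count() {
    awk 'END { print int(NR / 4) }' "$1"
}

# Only report when both files from the steps above are present.
if [ -f target.paf ] && [ -f target.fastq ]; then
    echo "aligned reads: $(paf_read_count target.paf) of $(fastq_record_count target.fastq)"
fi
```

A large gap between the two numbers suggests that some reads could not be resquiggled and will not appear in the plots.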
Step3: Signal-to-read visualization
squigualiser plot -f target.fastq -s target.blow5 -a target.paf -o out_dir --save_svg
# Plot the whole read sequence; the output is an HTML file
squigualiser plot -f target.fastq -s target.blow5 -a target.paf -o target_dir --rna --sig_scale znorm --fixed_width --base_limit 20000 --sig_plot_limit 99999999
# Plot only the region 200-500 (--region), without sample points or base colours, and also save the plot as SVG
squigualiser plot -f target.fastq -s target.blow5 -a target.paf -o target_dir --rna --sig_scale znorm --fixed_width --base_limit 20000 --sig_plot_limit 99999999 --no_samples --no_colours --save_svg --region 200-500 --xrange 6000 --plot_limit 9000
Options
-r read_id # plot only the read with this read ID
--rna # required for RNA reads
--fixed_width # plot with fixed base width
--base_limit # maximum number of bases to plot
--sig_plot_limit # maximum number of signal samples to plot
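To plot several reads one at a time, the -r option can be combined with a list of read IDs. The sketch below only prints the commands (a dry run) so they can be inspected first. `plot_cmds` is a hypothetical helper, the per-read output directory name `plots_<read_id>` is an arbitrary choice, and the file names are the ones used in the steps above.

```shell
# Print one squigualiser command per read ID listed in a file (dry run).
# plot_cmds is a hypothetical helper; pipe its output to sh to execute.
plot_cmds() {
    while IFS= read -r read_id; do
        [ -n "$read_id" ] || continue   # skip blank lines
        echo "squigualiser plot -f target.fastq -s target.blow5 -a target.paf" \
             "-o plots_$read_id -r $read_id --rna --fixed_width --base_limit 20000"
    done < "$1"
}

# Only print commands when the read ID list is present.
if [ -f read_ids.txt ]; then
    plot_cmds read_ids.txt
fi
```

Once the printed commands look right, `plot_cmds read_ids.txt | sh` runs them in order.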