For more information contact us at bina.rd@roche.com

Publication [Open access]

If you use LongISLND in your work, please cite the following:
Bayo Lau, Marghoob Mohiyuddin, John C. Mu, Li Tai Fang, Narges Bani Asadi, Carolina Dallett, and Hugo Y.K. Lam
LongISLND: In silico Sequencing of Lengthy and Noisy Datatypes
Bioinformatics first published online September 25, 2016 doi:10.1093/bioinformatics/btw602

Introduction

LongISLND is a read simulator that profiles the characteristics of third-generation, single-molecule sequencing technologies and simulates reads accordingly. The general software architecture is easily extensible, as demonstrated by the emulation of Pacific Biosciences (PacBio) multi-pass sequencing with P5 and P6 chemistries, producing data in FASTQ, H5, and the latest PacBio BAM format. Read on for application examples with PacBio and Oxford Nanopore (ONT) data.

Download LongISLND

Github repository: https://github.com/bioinform/longislnd

System Requirements

The following must be installed:

Installing LongISLND

From Source

From Binary Download

Running LongISLND

The following convenient Python scripts run the Java JAR and demonstrate its usage.

simulate.py
sample.py

Troubleshooting

Usage Examples

These examples demonstrate the full LongISLND workflow: downloading data, alignment, learning, and simulation.

They have been tested only on Linux because of various dependencies.

Installing Aligners

  1. PacBio
     This sets up PacBio's SMRTAnalysis2.3 if it is not already available to you. All download_and_align.sh scripts in the PacBio examples assume an installation location of sampling_example/smrtanalysis; change those scripts if you want to use your own version of SMRTAnalysis.

    1. Change working directory to sampling_example.
    2. Execute setup_smrt23.sh
      • download and build PacBio's SMRTAnalysis2.3
      • (to avoid complications) select NONE when asked "What job management system will you be using?"

  2. ONT
     This sets up GraphMap (Sovic, I. et al. (2016). Fast and sensitive mapping of nanopore sequencing reads with GraphMap. Nat Commun, 7.) and Samtools (Li, H. et al. (2009). The Sequence Alignment/Map format and SAMtools. Bioinformatics, 25(16)) if they are not already available to you. sampling_example/ont_ecoli/download_and_align.sh assumes installation locations of sampling_example/graphmap and sampling_example/samtools-1.2; change the script if you want to use your own versions of the aligners.

    1. Change working directory to sampling_example.
    2. Execute setup_ont.sh
      • download and build GraphMap
      • download and build Samtools
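
Before running the examples, it can help to confirm that the aligners sit where the download_and_align.sh scripts expect them. The following is a minimal Python sketch, assuming only the directory names quoted above; the helper name missing_aligner_dirs is hypothetical and not part of LongISLND.

```python
from pathlib import Path

# Install locations quoted in this section, relative to sampling_example/.
EXPECTED_DIRS = {
    "pacbio": ["smrtanalysis"],
    "ont": ["graphmap", "samtools-1.2"],
}


def missing_aligner_dirs(sampling_example, platform):
    """Return the expected aligner directories that do not exist yet.

    Hypothetical helper for illustration; it only checks that the
    directories exist, not that the tools inside them were built.
    """
    root = Path(sampling_example)
    return [name for name in EXPECTED_DIRS[platform] if not (root / name).is_dir()]
```

Calling missing_aligner_dirs("sampling_example", "ont") returns an empty list when both directories are present. Any names it returns suggest that the setup script should be run again or that the paths in download_and_align.sh need to be changed.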

PacBio P6 E. coli

  1. Change working directory to sampling_example/ecoli.
  2. Execute download_and_align.sh
  3. Execute learn_and_simulate.sh

ONT R7.3 E. coli

  1. Change working directory to sampling_example/ont_ecoli.
  2. Execute download_and_align.sh
  3. Execute learn_and_simulate.sh

PacBio P6 CHM1

This example is storage-, I/O-, and compute-intensive because it uses human-scale sequencing data.

  1. Change working directory to sampling_example/p6_chm1.
  2. Execute download_and_align.sh
  3. Execute learn.sh
  4. Use simulate.py as in sampling_example/ecoli/learn_and_simulate.sh to simulate.
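
The three examples above share one pattern: change into the example directory and run its scripts in order. The following is a minimal Python sketch of that pattern. The function name run_example and the choice to invoke each script through bash are assumptions made for illustration, not part of LongISLND. The simulation step for p6_chm1 is left out because it requires simulate.py arguments that must be taken from sampling_example/ecoli/learn_and_simulate.sh.

```python
import subprocess
from pathlib import Path

# Script order per example directory, as listed in this section.
EXAMPLES = {
    "ecoli": ["download_and_align.sh", "learn_and_simulate.sh"],
    "ont_ecoli": ["download_and_align.sh", "learn_and_simulate.sh"],
    "p6_chm1": ["download_and_align.sh", "learn.sh"],
}


def run_example(name, sampling_example="sampling_example", dry_run=True):
    """Run (or, by default, only list) the scripts of one example in order.

    Returns a list of (working_directory, command) pairs.
    Running the scripts via bash is an assumption of this sketch.
    """
    workdir = Path(sampling_example) / name
    plan = [(str(workdir), ["bash", script]) for script in EXAMPLES[name]]
    if not dry_run:
        for cwd, cmd in plan:
            # check=True raises at the first failing script, so later
            # steps never run on incomplete output.
            subprocess.run(cmd, cwd=cwd, check=True)
    return plan
```

With the default dry_run=True the function only returns the planned steps, which is useful for reviewing what would run before committing to the long p6_chm1 download. Pass dry_run=False to execute them.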