Region Of Interest complex example of miRNA experiment

This example shows some of the features of the ROI (region of interest) concept in the context of miRNA data exploration.

The pre-executed workflow with sample data can be found here:


seqan-roi tools: this includes an R package (ngsroi) and an R script to calculate some transformations:

. This should be renamed into KNIME and the NGS plugin.


miRNAs and their precursor RNAs belong to the family of small RNAs. Mature miRNAs are about 21 nucleotides (nt) long, and precursor molecules are around 120 nt. (For more detailed information, see the recent literature.)

In this example we look at a subset of regions, some of which can be associated with known miRNAs and some of which are not annotated. The aim is to show that such data can be explored with KNIME and that these regions can be identified visually quite well.

We use three ROIs, but this can easily be extended to any number of ROIs because the workflow uses the appropriate nodes to loop over any number of files. Handling the file names inside this loop adds some complexity to the workflow.

The input files are:

  • ROIs overlapping the annotation:
      1. Align the reads to hg19 and produce a position-sorted SAM/BAM file.
      2. Apply bam2roi (bam2roi -if input.bam -of output.roi [--strand-specific] [--ignore-pairing] [--link-over-skipped])
      3. ROI feature projection (roi_feature_projections --in-roi output.roi --out-roi miRNA.hg19.bowtie.roi --in-features hg19.sorted.gff [--strand-specific] --gff-type exon --gff-group-by ID)
  • ROIs derived from alignments to the reference genome (here hg19):
      1. Align the reads to hg19 and produce a position-sorted SAM/BAM file.
      2. Apply bam2roi (bam2roi -if input.bam -of output.roi [--strand-specific] [--ignore-pairing] [--link-over-skipped])
      3. (optional) Apply an R script to compute some characteristics (output.roi, miRNA1[345].hg19.bowtie.roi).
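
The internal algorithm of bam2roi is not described here. As a rough mental model only, an ROI can be thought of as a maximal stretch of one reference sequence that is covered by overlapping reads, together with its per-base coverage (the kind of profile that can later be plotted per ROI). The minimal Python sketch below follows that assumption. It takes reads as plain (reference, start, end) tuples instead of parsing a BAM file and ignores strand, pairing and spliced alignments (what the --strand-specific, --ignore-pairing and --link-over-skipped options control). The names Roi and reads_to_rois are hypothetical and not part of seqan-roi.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Roi:
    """Simplified ROI: a maximal covered stretch of one reference plus its per-base coverage."""
    ref: str
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive
    coverage: List[int] = field(default_factory=list)


def reads_to_rois(reads: List[Tuple[str, int, int]]) -> List[Roi]:
    """Merge position-sorted (ref, start, end) alignments into ROIs.

    Hypothetical helper, not the seqan-roi implementation. Overlapping or
    directly adjacent reads on the same reference end up in the same ROI.
    Strand, pairing and spliced alignments are ignored.
    """
    rois: List[Roi] = []
    current: Optional[Roi] = None
    for ref, start, end in reads:
        if current is None or ref != current.ref or start > current.end:
            # A gap or a new reference sequence starts a new ROI.
            current = Roi(ref, start, end, [0] * (end - start))
            rois.append(current)
        elif end > current.end:
            # The read reaches past the current ROI: extend it with zero coverage first.
            current.coverage.extend([0] * (end - current.end))
            current.end = end
        # Count this read once at every base it covers.
        for pos in range(start, end):
            current.coverage[pos - current.start] += 1
    return rois


if __name__ == "__main__":
    # Made-up read intervals, for illustration only.
    reads = [("chr1", 100, 121), ("chr1", 105, 126), ("chr1", 300, 321), ("chr2", 50, 71)]
    for roi in reads_to_rois(reads):
        print(roi.ref, roi.start, roi.end, "max coverage:", max(roi.coverage))
```

The input must be sorted by reference and position, which is why the first step produces a position-sorted SAM/BAM file.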


IGV link

The following picture shows how to link from a specific table row to the corresponding view in IGV.
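
How the workflow builds this link is only visible in the screenshot. As one possible way to reproduce the idea, IGV can be controlled through a local port (by default 60151) with a goto command that takes a locus. The Python sketch below only builds such a URL from the coordinates of a table row; it does not contact IGV. It assumes the row holds 0-based, half-open coordinates, which should be checked against the actual ROI files; igv_locus and igv_goto_url are hypothetical names.

```python
from urllib.parse import urlencode

# Assumption: IGV is running locally with port-based control enabled
# on its default port, 60151.
IGV_PORT = 60151


def igv_locus(ref: str, start: int, end: int) -> str:
    """Turn a 0-based, half-open region into IGV's 1-based, inclusive locus string."""
    return f"{ref}:{start + 1}-{end}"


def igv_goto_url(ref: str, start: int, end: int, port: int = IGV_PORT) -> str:
    """Build a URL asking a running IGV instance to jump to the given region."""
    query = urlencode({"locus": igv_locus(ref, start, end)})
    return f"http://localhost:{port}/goto?{query}"


if __name__ == "__main__":
    # One hypothetical table row: chromosome, start and end of an ROI.
    print(igv_goto_url("chr1", 100, 126))
```

Opening the printed URL in a browser while IGV is running should move IGV to that region.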

Collection line plot

The ROIs can be visualized using the Collection Line Plot node:


The RegionOverlapp node can be used to join different regions. Because this has to be done for each chromosome separately, it has to be encapsulated in a loop structure:
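
The exact join semantics of the RegionOverlapp node are not spelled out here. Assuming that "join" means pairing every region of one table with every overlapping region of another, the per-chromosome structure can be sketched in Python as follows. The outer loop over chromosomes mirrors the KNIME loop; inside a chromosome the sketch simply compares all pairs. Regions are assumed to be (chromosome, start, end) tuples with 0-based, half-open coordinates, and overlap_join is a hypothetical name.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Region = Tuple[str, int, int]  # (chromosome, start, end), 0-based half-open


def overlap_join(left: List[Region], right: List[Region]) -> List[Tuple[Region, Region]]:
    """Pair each region in `left` with every overlapping region in `right`.

    Hypothetical helper. The outer loop runs over chromosomes, mirroring the
    per-chromosome loop in the workflow; within a chromosome all pairs are compared.
    """
    right_by_chrom: Dict[str, List[Region]] = defaultdict(list)
    for region in right:
        right_by_chrom[region[0]].append(region)

    pairs: List[Tuple[Region, Region]] = []
    for chrom in sorted({region[0] for region in left}):
        for a in (region for region in left if region[0] == chrom):
            for b in right_by_chrom.get(chrom, []):
                # Half-open intervals overlap if each one starts before the other ends.
                if a[1] < b[2] and b[1] < a[2]:
                    pairs.append((a, b))
    return pairs


if __name__ == "__main__":
    # Made-up regions from two hypothetical ROI files.
    sample_a = [("chr1", 100, 126), ("chr1", 300, 321), ("chr2", 50, 71)]
    sample_b = [("chr1", 120, 140), ("chr2", 71, 90), ("chr2", 60, 65)]
    for pair in overlap_join(sample_a, sample_b):
        print(pair)
```

Splitting by chromosome keeps each comparison small; a production version would sort the regions and sweep through them instead of comparing all pairs.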

The Java Snippet node (node 85 in the screenshot) gives each sample a name so that we can identify the sample later on. The name is based on the file name and is created with a few simple Java statements that add a new cell/column.
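
The Java code of node 85 is not reproduced on this page. The same idea, expressed in Python rather than in the snippet's Java, is sketched below. The naming rule (keep the file name up to its first dot, so miRNA13.hg19.bowtie.roi becomes miRNA13) is an assumption based on the file names used above, and sample_name, add_sample_column and the /data/ paths are hypothetical.

```python
from pathlib import PurePosixPath
from typing import Dict, List


def sample_name(path: str) -> str:
    """Derive a sample name from a file path: the file name up to its first dot.

    Assumed naming rule, based on names like miRNA13.hg19.bowtie.roi.
    """
    return PurePosixPath(path).name.split(".", 1)[0]


def add_sample_column(rows: List[Dict], path: str) -> List[Dict]:
    """Return copies of the rows with an extra 'sample' column taken from the file name."""
    name = sample_name(path)
    return [{**row, "sample": name} for row in rows]


if __name__ == "__main__":
    # Loop over any number of ROI files (hypothetical paths), tagging each row with its sample.
    files = ["/data/miRNA13.hg19.bowtie.roi", "/data/miRNA14.hg19.bowtie.roi"]
    for f in files:
        print(add_sample_column([{"ref": "chr1", "start": 100}], f))
```

Tagging every row with its sample before the tables are combined is what allows the samples to be told apart after the loop.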

Scatter Plot

Individual miRNAs can be displayed in a scatter plot using KNIME's standard features:

Display all results

The lower part of the workflow shows how multiple experiments can be displayed. Here we aligned the reads against the hairpins, i.e. the reference sequences are the individual hairpin sequences from miRBase.

We can show the experiments in the rows:

or the miRNAs in the rows:

Again, the workflow looks complicated because we have to extract the experiment name from the file name and propagate this information into the tables. Transposing the table is also not as straightforward as some might believe. In the end, however, we have interactive tables whose rows could also be colored with the Color Manager according to the experimental design.
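
The two table layouts (experiments in the rows or miRNAs in the rows) are two orientations of the same matrix. The Python sketch below illustrates this under two assumptions: each record already carries the experiment name extracted from its file name, and a hairpin without a record in an experiment counts as 0. The functions to_matrix and transpose and the example records are hypothetical.

```python
from typing import Dict, List

Matrix = Dict[str, Dict[str, int]]


def to_matrix(records: List[Dict], row_key: str, col_key: str, value_key: str) -> Matrix:
    """Pivot long-format records into {row: {col: value}}; missing cells become 0."""
    rows = sorted({r[row_key] for r in records})
    cols = sorted({r[col_key] for r in records})
    matrix = {row: {col: 0 for col in cols} for row in rows}
    for r in records:
        matrix[r[row_key]][r[col_key]] = r[value_key]
    return matrix


def transpose(matrix: Matrix) -> Matrix:
    """Swap rows and columns of a {row: {col: value}} matrix."""
    cols = sorted({c for row in matrix.values() for c in row})
    return {c: {r: matrix[r].get(c, 0) for r in matrix} for c in cols}


if __name__ == "__main__":
    # Made-up counts; "experiment" would come from the file name.
    records = [
        {"experiment": "exp1", "hairpin": "hairpin_1", "count": 5},
        {"experiment": "exp1", "hairpin": "hairpin_2", "count": 2},
        {"experiment": "exp2", "hairpin": "hairpin_1", "count": 7},
    ]
    experiments_in_rows = to_matrix(records, "experiment", "hairpin", "count")
    hairpins_in_rows = transpose(experiments_in_rows)
    print(experiments_in_rows)
    print(hairpins_in_rows)
```

Filling missing cells explicitly is one reason transposing is less trivial than it looks: an experiment in which a hairpin has no reads produces no record at all.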

Please let me know if there are any problems: bernd dot jagla at pasteur dot fr.