A simple workflow that intersects annotation from a UCSC flat file with regions of interest.


Here we read in the regions of interest (ROI) that were produced in the "getRegions" workflow and the annotation from UCSC mm9 (ensGene.txt.gz).
Since the chromosome names have to match exactly, we need to add "chr" to the chromosome names in the ROI table. This is done in the Java snippet (node 48). We also rename the columns of the mm9 annotation according to the names given in the .sql file from UCSC.
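
The two preparation steps can be sketched outside KNIME as follows. This is only an illustrative Python sketch, not the code of the Java snippet node: the function names are hypothetical, and the column list is the one we would expect to find in UCSC's ensGene.sql, so it should be checked against the downloaded file.

```python
# Column names as we would expect them in UCSC's ensGene.sql
# (assumption: verify against the .sql file you actually downloaded).
ENSGENE_COLUMNS = [
    "bin", "name", "chrom", "strand", "txStart", "txEnd",
    "cdsStart", "cdsEnd", "exonCount", "exonStarts", "exonEnds",
    "score", "name2", "cdsStartStat", "cdsEndStat", "exonFrames",
]


def add_chr_prefix(chrom):
    """Return a UCSC-style chromosome name, e.g. '1' -> 'chr1'.

    Hypothetical helper; names that already start with 'chr' are left alone.
    """
    chrom = str(chrom).strip()
    return chrom if chrom.startswith("chr") else "chr" + chrom


def name_annotation_row(fields):
    """Turn one tab-split line of ensGene.txt into a dict keyed by column name."""
    if len(fields) != len(ENSGENE_COLUMNS):
        raise ValueError(
            "expected %d fields, got %d" % (len(ENSGENE_COLUMNS), len(fields))
        )
    return dict(zip(ENSGENE_COLUMNS, fields))
```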
The Value Counter, RowID and Joiner nodes determine the chromosomes that are common to both tables. We then combine all this information in the meta node, where we loop over the chromosomes (last input), inject each chromosome name into the other input tables and filter them for that chromosome name.
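
As a rough illustration of what these nodes achieve together, the sketch below finds the chromosomes present in both tables and then yields the per-chromosome subsets. It assumes each table is a list of dicts with a "chrom" key; the function names are hypothetical and the KNIME nodes work differently internally.

```python
def common_chromosomes(roi_rows, annotation_rows):
    """Return the sorted chromosome names that occur in both tables."""
    roi_chroms = {row["chrom"] for row in roi_rows}
    annotation_chroms = {row["chrom"] for row in annotation_rows}
    return sorted(roi_chroms & annotation_chroms)


def split_by_chromosome(roi_rows, annotation_rows):
    """Yield (chromosome, ROI subset, annotation subset) per shared chromosome."""
    for chrom in common_chromosomes(roi_rows, annotation_rows):
        # Filter both tables down to the current chromosome.
        roi_subset = [row for row in roi_rows if row["chrom"] == chrom]
        annotation_subset = [row for row in annotation_rows if row["chrom"] == chrom]
        yield chrom, roi_subset, annotation_subset
```

Chromosomes that appear in only one of the two tables are skipped, since they cannot produce any overlap.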


Then the RegionOverlapp node calculates all overlapping regions (minimum overlap of 1 nucleotide).
The results for all chromosomes are then combined at the end of the meta node. We have now annotated our regions of interest.
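
The overlap criterion itself can be written down in a few lines. The following Python sketch is not the implementation of the overlap node. It assumes 0-based, half-open coordinates (start inclusive, end exclusive) for both tables, hypothetical "start" and "end" keys for the regions of interest, and "txStart", "txEnd" and "name" keys for the annotation rows.

```python
def overlaps(start_a, end_a, start_b, end_b):
    """True if two half-open intervals share at least one nucleotide."""
    return start_a < end_b and start_b < end_a


def annotate_regions(rois, genes):
    """Pair every region of interest with each annotation entry it overlaps.

    Simple quadratic scan, adequate for an illustration on one chromosome.
    """
    annotated = []
    for roi in rois:
        for gene in genes:
            same_chrom = roi["chrom"] == gene["chrom"]
            if same_chrom and overlaps(
                roi["start"], roi["end"], gene["txStart"], gene["txEnd"]
            ):
                # Copy the ROI row and attach the name of the overlapping entry.
                annotated.append(dict(roi, gene=gene["name"]))
    return annotated
```

With closed (1-based, inclusive) coordinates the comparison would need `<=` instead of `<`, so it is worth checking which convention the ROI table uses.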

