Gene expression analysis is widely used in bioinformatics because it lets researchers find gene products whose synthesis is increased or decreased in individuals with, for example, a particular disease. These analyses typically yield many differentially expressed genes. To narrow that set down to the genes of interest, scientists investigate the functional annotations of those genes and of genes with similar expression patterns.
The first step in gene expression is transcription, during which DNA is transcribed into RNA. In this use case, RNA-Seq data from tumors and matched normal tissue from three patients with oral squamous cell carcinoma are analyzed. Differentially expressed genes are identified using the R integration in KNIME and then displayed in an interactive view. All statistically significant over- or underexpressed genes are examined, and interesting ones are selected based on their functional annotations. Hierarchical clustering is then used to select a cluster of similarly expressed genes. Their biological function is investigated through a shared component that allows researchers to perform a pathway enrichment analysis. Lastly, researchers can search for compounds that target the gene products by querying bioactivity data from Google BigQuery.
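To make the pathway enrichment step more concrete, here is a minimal Python sketch of one common approach, an over-representation test based on the hypergeometric distribution. This is an illustration only: the text does not say which method the shared component uses, and the function names, the gene identifiers (G1 to G10), and the pathway names are all hypothetical placeholders rather than real annotations.

```python
from math import comb


def enrichment_p_value(universe_size, pathway_size, selected_size, overlap):
    """Probability of seeing at least `overlap` pathway genes when drawing
    `selected_size` genes at random from a universe of `universe_size` genes,
    of which `pathway_size` belong to the pathway (hypergeometric upper tail)."""
    max_overlap = min(pathway_size, selected_size)
    total = comb(universe_size, selected_size)
    tail = sum(
        comb(pathway_size, k) * comb(universe_size - pathway_size, selected_size - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / total


def enrich(selected_genes, pathways, universe):
    """Score each pathway against the selected gene cluster.

    Returns (pathway_name, overlap_count, p_value) tuples, smallest p-value first.
    Genes outside the universe are ignored."""
    selected = set(selected_genes) & set(universe)
    results = []
    for name, members in pathways.items():
        members = set(members) & set(universe)
        overlap = len(selected & members)
        p = enrichment_p_value(len(universe), len(members), len(selected), overlap)
        results.append((name, overlap, p))
    return sorted(results, key=lambda row: row[2])


# Hypothetical toy data: placeholder gene IDs, not real annotations.
universe = {f"G{i}" for i in range(1, 11)}
pathways = {
    "pathway_A": {"G1", "G2", "G3", "G4", "G5"},
    "pathway_B": {"G6", "G7", "G8", "G9", "G10"},
}
cluster = {"G1", "G2", "G3", "G4", "G5"}

for name, overlap, p in enrich(cluster, pathways, universe):
    print(f"{name}: overlap={overlap}, p={p:.4g}")
```

In a real analysis, many pathways are tested at once, so the raw p-values would normally be adjusted for multiple testing before any pathway is called enriched; the sketch omits that step for brevity.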
Why KNIME Software
The open source KNIME Analytics Platform enables data scientists and researchers to mix and match tools and create reproducible workflows in one platform. In this case, researchers were able to use their favorite R library, query data from Google BigQuery, and use shared components to customize the analysis, all in one reusable workflow. Every step of the analysis can also be performed on the KNIME WebPortal using the interactive views from the components.