Recent Advance in Biomedical Omics Data Analysis | Biomedgrid llc

Sequencing technology is dominant in the study of complex human genetic diseases. Identified risk genes and other biomarkers make genetic counselling, risk assessment and avoidance, and disease diagnosis and treatment possible. Thanks to the biomarkers discovered so far, many diseases could be accurately diagnosed, prevented, and treated. Nevertheless, the molecular mechanisms of many neuropsychiatric disorders remain unclear, and no effective treatments exist for them. With the rapid development of whole genome sequencing technology, large amounts of genetic data are being generated from larger cohorts with diverse neuropsychiatric disorders [1], making it possible to detect various types of genetic variants, namely single nucleotide variants (SNVs), insertions and deletions (InDels), and structural variants (SVs). Genetic research on these disorders has focused on genome-wide association studies (GWAS), pedigree analysis, twin family studies, trio family studies, etc. Over the last decades, GWAS have successfully identified several hundred loci for neuropsychiatric disorders [2,3]. Whole exome/genome sequencing has greatly facilitated the identification of risk de novo variants [4]. In addition, as patient cohorts grow, more and more risk mutations are being discovered [5].

However, there is still a deep gap between the thousands of risk loci identified by GWAS in past years and an explanation of the underlying molecular mechanisms. For example, it has been reported that 142 risk loci contribute to schizophrenia, yet most of these loci do not contain any genes, and it is not clear how they relate to the disorder. Traditional single-dimensional genomic studies are less useful in explaining the mechanisms of these disorders [6,7]. With the decreasing cost of sequencing, however, large amounts of omics data have become available through RNA-seq [8], ChIP-seq [9], Methyl-seq [10], Hi-C [11–12], ATAC-seq [13–14], etc. Whole exome/genome sequencing provides the sequence information of genes. RNA-seq can be used for genotyping and identification of transcription factor binding sites, as well as for generating gene expression data. ChIP-seq (chromatin immunoprecipitation followed by sequencing) can be used for genome-wide profiling of DNA-binding proteins, histone modifications, or nucleosomes. Methyl-seq can be used for detecting and quantifying DNA methylation. Hi-C (high-throughput chromatin conformation capture) is a method for studying the three-dimensional architecture of genomes and can be used to comprehensively detect chromatin interactions in the mammalian nucleus. ATAC-seq (assay for transposase-accessible chromatin with high-throughput sequencing) can be used for mapping chromatin accessibility genome-wide. With all these technologies, we can detect mutations in gene sequences, gene regulation and epigenetic mechanisms, the reactivity of methylated and non-methylated cytosines, chromatin interactions, and variations in gene expression under complex disease phenotypes.

Recently, the accumulation of multi-omics data and single-cell sequencing data has greatly promoted the development of machine learning algorithms, statistical learning methods, and deep learning methods. Integrative analysis of multi-omics data generated by various sequencing technologies helps in screening for robust disease-related biomarkers. Diagnosis and treatment should be further personalized according to multidimensional information [15]. Applications of machine learning and deep learning methods are becoming ubiquitous in biology and encompass not only risk gene identification but also protein binding prediction, disease subtype classification, and characterization of transcriptional regulatory networks [16,17]. In the near future, these intelligent computational methods should be applied further to the integrated analysis of multi-omics data.
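
To make the idea of integrative multi-omics analysis concrete, the following is a minimal sketch of one simple strategy, often called early (concatenation-based) integration: each feature in each omics layer is standardized, the layers are pooled into one feature set, and features are ranked by their absolute Pearson correlation with a phenotype. This is an illustration under stated assumptions, not the method of any study cited here. The function names (`zscore`, `integrate_layers`, `rank_features`), the feature names, and all numbers are hypothetical toy values.

```python
from statistics import mean, pstdev


def zscore(values):
    """Standardize one feature across samples; constant features map to 0."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(x - mu) / sigma for x in values]


def integrate_layers(layers):
    """Early integration: z-score every feature and pool all layers.

    `layers` maps a layer name to {feature name: values per sample}.
    Samples are assumed to be in the same order in every layer.
    """
    combined = {}
    for layer_name, features in layers.items():
        for feature_name, values in features.items():
            combined[f"{layer_name}:{feature_name}"] = zscore(values)
    return combined


def rank_features(combined, phenotype):
    """Rank pooled features by absolute Pearson correlation with the phenotype."""
    pheno_z = zscore(phenotype)
    n = len(phenotype)
    scores = {}
    for name, z in combined.items():
        # With population z-scores, Pearson r is the mean of the products.
        scores[name] = abs(sum(a * b for a, b in zip(z, pheno_z)) / n)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy data: four samples, two omics layers (not real measurements).
layers = {
    "expression": {
        "GENE_A": [1.0, 2.0, 3.0, 4.0],
        "GENE_B": [2.0, 2.0, 2.0, 2.0],
    },
    "methylation": {
        "CpG_1": [0.9, 0.1, 0.8, 0.2],
    },
}
phenotype = [0, 0, 1, 1]  # hypothetical labels: 0 = control, 1 = case

combined = integrate_layers(layers)
ranked = rank_features(combined, phenotype)
for name, score in ranked:
    print(f"{name}\t{score:.3f}")
```

In this toy example the expression feature `GENE_A` ranks first, because it is the only feature that tracks the case/control labels. Real analyses would need far larger cohorts, correction for multiple testing and confounders, and typically more expressive models than a per-feature correlation.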

Article by Xue Jiang