 
New machine learning model fills in missing information about our individual cells
UBC Computer Science researchers create new artificial intelligence model to help scientists integrate and generate new multiomics data
With technological advances in the last decade, scientists can now probe individual cells in our bodies at different levels to understand how they function and work together. For example, some techniques give us information about the genes in our DNA, while other techniques give us information about gene activity or how tightly packed different regions of our genome are.
While each of these separate methods provides detailed, in-depth information about our cells, researchers can also use a combined approach, known as “multiomics,” which probes cells at multiple levels simultaneously. This approach gives us a bigger picture of how our cells function and what happens to different cells as we age or experience different health conditions. However, data from multiomics experiments are complex, making it challenging for scientists to integrate the data to uncover patterns in our biology. Conducting these experiments is also expensive, so multiomics data are sparse.
In a new study published in Cell Reports Methods, UBC Computer Science researchers used machine learning to integrate data from separate methods, mimicking a multiomics approach to get both breadth and depth of information from individual cells. Their deep learning model, scPairing, learns from existing multiomics data and generates new artificial multiomics data from unimodal data (data from just one of these separate methods) by pairing cells together.
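To make the idea of "pairing cells together" concrete, here is a minimal sketch in Python. It is not the scPairing implementation. It assumes, purely for illustration, that cells measured by two separate methods have already been mapped into a shared numerical space, and it pairs each cell from one method with its most similar cell from the other. The function name `pair_cells` and the toy data are hypothetical.

```python
import numpy as np

def pair_cells(emb_a, emb_b):
    """Pair each cell in dataset A with its most similar cell in dataset B.

    emb_a, emb_b: arrays of shape (n_cells, n_dims) holding cell embeddings
    that are ASSUMED to live in the same shared space (an illustrative
    assumption, not a detail taken from the study).
    Returns (index of best match in B for each A cell, cosine similarity).
    """
    # Normalize rows so that the dot product equals cosine similarity.
    a = emb_a / np.linalg.norm(emb_a, axis=1, keepdims=True)
    b = emb_b / np.linalg.norm(emb_b, axis=1, keepdims=True)
    similarity = a @ b.T
    best = similarity.argmax(axis=1)
    return best, similarity[np.arange(len(a)), best]

# Toy data: five cells measured by method A, and the same five cell states
# measured by method B, shuffled and slightly noisy.
rng = np.random.default_rng(0)
emb_rna = rng.normal(size=(5, 8))
perm = rng.permutation(5)
emb_atac = emb_rna[perm] + 0.01 * rng.normal(size=(5, 8))

matches, scores = pair_cells(emb_rna, emb_atac)
# Each pair (i, matches[i]) can be treated as one artificial "multiomics" cell.
```

A greedy nearest-neighbour match like this can assign the same cell to several partners; a one-to-one matching would need an assignment step, which is left out here for brevity.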
“scPairing presents an opportunity to leverage the wealth of unimodal single-cell data to generate high-quality artificial multiomics data,” says Jeffrey Niu, first author of the study and Ph.D. student in UBC’s Department of Computer Science. “This approach can help us discover new and important relationships in individual cells.”
Their new model is based on a type of deep learning model called a variational autoencoder, which compresses and then reconstructs the data, picking out the most important features and discarding the irrelevant ones. This is similar to having several bags of mixed Halloween treats and picking out the chocolates while discarding the candy corn and peanuts. Variational autoencoders can also reconstruct data and fill in missing information or fix the corrupted parts of files; in this case, that means adding different types of candy bars that may be missing in different treat bags.
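For readers curious about what "compress and then reconstruct" looks like in practice, the following is a rough sketch of a single pass through a variational autoencoder in Python. It is a simplified illustration, not the scPairing model: the encoder and decoder are single linear layers with random, untrained weights, the data are random numbers, and all names and sizes are assumptions chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes (illustrative assumptions): 4 cells, 20 measured features,
# compressed down to a 3-number summary per cell.
n_cells, n_features, n_latent = 4, 20, 3
x = rng.normal(size=(n_cells, n_features))

# Random, untrained weights; a real model would learn these from data.
W_mu = rng.normal(scale=0.1, size=(n_features, n_latent))
W_logvar = rng.normal(scale=0.1, size=(n_features, n_latent))
W_dec = rng.normal(scale=0.1, size=(n_latent, n_features))

def encode(x):
    """Compress each cell into the mean and log-variance of a latent Gaussian."""
    return x @ W_mu, x @ W_logvar

def reparameterize(mu, logvar, rng):
    """Sample a latent code z = mu + sigma * noise."""
    eps = rng.normal(size=mu.shape)
    return mu + np.exp(0.5 * logvar) * eps

def decode(z):
    """Reconstruct the full set of features from the compressed code."""
    return z @ W_dec

def negative_elbo(x, x_hat, mu, logvar):
    """Training objective: reconstruction error plus a KL regularizer."""
    recon = 0.5 * np.sum((x - x_hat) ** 2, axis=1)
    kl = -0.5 * np.sum(1 + logvar - mu**2 - np.exp(logvar), axis=1)
    return recon + kl, kl

mu, logvar = encode(x)
z = reparameterize(mu, logvar, rng)
x_hat = decode(z)
loss, kl = negative_elbo(x, x_hat, mu, logvar)
```

Training would adjust the weights to make `loss` small, which forces the compressed code `z` to keep the features that matter most for rebuilding the data.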
The researchers tested scPairing on fifteen datasets, using the model to generate multiomics data for different types of cells with limited data, including human retinal cells, a rare type of immune cell and kidney cancer cells.
“Our model allows us to widen the scope and scale of what we can study in terms of our genome, transcriptome, epigenome, and more, ultimately providing new insights into fundamental biological processes and diseases,” says Dr. Jiarui Ding, Assistant Professor in UBC’s Department of Computer Science and senior author of the study.