Science News Hubb

An artificial intelligence tool, scGPT, can identify cell types, predict the effects of disrupting genes, and pinpoint which genes interact with each other.

By admin
August 27, 2023



Scientists investigate disease targets by studying gene expression data often obtained by assaying entire cell populations. For instance, researchers used bulk RNA sequencing to discover druggable cancer-associated protein targets1 and to uncover potential blood-based biomarkers for the early diagnosis of Alzheimer’s disease.2 

More recently, scientists have turned to single-cell RNA sequencing (scRNA-seq), which reveals how gene expression varies between individual cells.3 Scientists typically analyze scRNA-seq data using machine learning tools that have each been built from scratch to carry out one specific task.

Bo Wang, a computational biologist, and his team of computer scientists and cell biologists at the University of Toronto have built a new artificial intelligence (AI) model called single-cell generative pretrained transformer, or scGPT, which can be fine-tuned to carry out a diverse range of tasks using scRNA-seq data. These tasks include predicting the effects of manipulating specific genes and merging distinct batches of data to reveal otherwise undetectable cell types.

scGPT is a foundation model: the core model can be built upon and tweaked into distinct versions that carry out a range of downstream tasks. The increasingly popular AI chatbot ChatGPT works in much the same way. While the chatbot generates the next words in a sentence, scGPT predicts the expression levels of genes in a cell.

According to Wang, employing a single base model to perform many downstream tasks is beneficial because using various computational models to carry out different tasks can cause a misalignment when comparing data from the distinct analyses. Each computational approach might make different assumptions regarding the structure of the same data depending on how it was built, and this can lead to less accurate conclusions.

In their recent preprint study, Wang’s team showed that scGPT analyzes scRNA-seq data better than standard approaches.4 They first trained scGPT for four days by feeding the model scRNA-seq data from more than 10.3 million blood and bone marrow cells, spanning more than 50 cell types. This allowed the model to learn fundamental links between the expression of genes within and across cells. Because not all genes are expressed in a given cell, and some genes are expressed at levels undetectable by current sequencing technology, each cell provided information on only a few thousand of the roughly 20,000 genes in the human genome. Across the whole training set, however, the model saw expression data for nearly all of the genes in the genome.
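To see how sparse individual cells can still add up to near-complete coverage, consider the toy sketch below. It is only an illustration of the arithmetic, not anything from the scGPT codebase: the function name `gene_coverage` and the tiny 10-gene "genome" are hypothetical.

```python
def gene_coverage(cells, n_genes):
    """Fraction of genes detected in at least one cell.

    cells: iterable of sets, each holding the gene indices detected in one cell.
    n_genes: total number of genes in the (toy) genome.
    """
    seen = set()
    for detected in cells:
        seen.update(detected)  # union of detected genes across all cells
    return len(seen) / n_genes


# Hypothetical data: each cell detects only 3 of 10 genes.
cells = [{0, 1, 2}, {2, 3, 4}, {5, 6, 0}]
coverage = gene_coverage(cells, n_genes=10)
print(coverage)
```

Each cell here covers 30 percent of the genes, yet the three cells together cover more than twice that; with millions of cells, the union approaches the whole genome.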


One task that the team fine-tuned the foundation model to perform was merging 10 distinct batches of scRNA-seq data that had previously been collected from human immune cells. Using a portion of the data from each batch, they taught the model to group the same cell types across the datasets into common clusters. scGPT also learned to adjust for differences between batches caused by nonbiological factors, such as the day the experiment was carried out or how the cells were collected. Pooling datasets in this way, a process known as batch integration, boosts the amount of data on each cell type, allowing scientists to better detect and characterize rare cell types that could play a role in health or disease.
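The idea of correcting for nonbiological batch differences can be illustrated with a deliberately simple stand-in: subtracting each batch's average expression from its cells. This is not how scGPT integrates batches (the article describes a fine-tuned transformer), and the function `center_batches` and the data are hypothetical. The sketch also assumes every batch contains the same mix of cell types, which real data rarely guarantees.

```python
from statistics import mean


def center_batches(cells):
    """Subtract each batch's per-gene mean from the cells in that batch.

    cells: list of (batch_id, expression_vector) pairs.
    Returns a new list in the same order with corrected vectors.
    """
    by_batch = {}
    for batch, expr in cells:
        by_batch.setdefault(batch, []).append(expr)

    # Per-gene mean within each batch (zip(*rows) iterates over genes).
    batch_means = {
        batch: [mean(gene_values) for gene_values in zip(*rows)]
        for batch, rows in by_batch.items()
    }

    return [
        (batch, [x - m for x, m in zip(expr, batch_means[batch])])
        for batch, expr in cells
    ]


# Hypothetical data: two cell types measured on two days.
# The "day2" batch carries a constant offset of +10 on every gene.
cells = [
    ("day1", [1.0, 5.0]),
    ("day1", [3.0, 1.0]),
    ("day2", [11.0, 15.0]),
    ("day2", [13.0, 11.0]),
]
corrected = center_batches(cells)
for batch, expr in corrected:
    print(batch, expr)
```

After centering, the first cell of each batch has the same corrected profile, so matching cell types from different days would fall into the same cluster.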

The researchers then tested how well the fine-tuned version of scGPT and three of the most popular methods used for this task merged the remaining, previously unseen data. scGPT grouped cell types from different batches together five percent more effectively than the standard models and corrected for nonbiological effects about as well as the widely used methods.

The team also tested how well a different fine-tuned version of scGPT and a standard model called GEARS predicted the effects of perturbing more than 80 genes, either alone or in pairs, on the activity of other genes.5 By focusing on the expression of the 20 genes that were most affected by each genetic manipulation, Wang and his colleagues found that scGPT came out on top.
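A comparison restricted to the most affected genes might look something like the sketch below. The article does not say which error metric the team used, so the mean squared error here is an assumption, as are the function names, the two stand-in "models", and the toy data (which uses the top 2 of 5 genes rather than the top 20).

```python
def top_k_affected(control, perturbed, k):
    """Indices of the k genes whose expression changed most after perturbation."""
    changes = [p - c for c, p in zip(control, perturbed)]
    ranked = sorted(range(len(changes)), key=lambda i: abs(changes[i]), reverse=True)
    return ranked[:k]


def mse_on_genes(predicted, observed, genes):
    """Mean squared error between predicted and observed expression on chosen genes."""
    return sum((predicted[g] - observed[g]) ** 2 for g in genes) / len(genes)


# Hypothetical expression levels for five genes.
control = [1.0, 2.0, 3.0, 4.0, 5.0]   # unperturbed cells
observed = [1.0, 6.0, 3.5, 0.0, 5.0]  # measured after the perturbation
model_a = [1.0, 5.0, 3.0, 1.0, 5.0]   # one model's prediction
model_b = [1.0, 3.0, 3.0, 3.0, 5.0]   # another model's prediction

genes = top_k_affected(control, observed, k=2)
error_a = mse_on_genes(model_a, observed, genes)
error_b = mse_on_genes(model_b, observed, genes)
print(genes, error_a, error_b)
```

Scoring only the genes that actually moved keeps the many unaffected genes from diluting the comparison; in this toy case the first model has the lower error on those genes.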

“Do these improvements really result in additional biological knowledge? Are they useful in generating new hypotheses?” questioned Ahmed Mahfouz, a computational biologist at Leiden University Medical Center in the Netherlands who was not involved in the study. 

While the findings are promising, Mahfouz cautioned that these models have millions of parameters and require a lot of data to train. As a result, they use a lot of energy and have a huge carbon footprint. Given this high energy demand during training, and because researchers will need some familiarity with machine learning to supervise the fine-tuning process, it is unclear how widely cell biologists will adopt scGPT.

Nevertheless, “the fine-tuning is extremely efficient,” said Wang. “For a dataset of let’s say 10,000 or 20,000 cells, you only need five to ten minutes.” The team hopes that this will make scGPT widely accessible. “We have made the code and model available to everyone, and we are working really hard to create educational websites, providing lots of tutorials with concrete examples for every task it can solve,” he said.  

Wang’s team plans to continue working on scGPT. While the original version of the model is useful for analyses of bone marrow and immune cells, the team recently released an updated version of scGPT that was trained on 33 million cells including brain, blood, pancreas, lung, heart, kidney, cancer, and gut cells.6

Recently, other foundational models similar to scGPT have been released, making it only a matter of time before it is known which, if any, gain traction in research.7,8,9 Mahfouz thinks that models such as scGPT will likely provide answers to important biological questions in the near future, although this can only be proven with time. “It is an exciting time. By the end of the year, I think you will have a very different picture than what we see now,” said Mahfouz.

References

  1. Stransky N, et al. The landscape of kinase fusions in cancer. Nat Commun. 2014;5:4846.
  2. Shigemizu D, et al. Identification of potential blood biomarkers for early diagnosis of Alzheimer’s disease through RNA sequencing analysis. Alzheimers Res Ther. 2020;12(1):87.
  3. Li X, Wang CY. From bulk, single-cell to spatial RNA sequencing. Int J Oral Sci. 2021;13(1):36.
  4. Cui H, et al. scGPT: Towards building a foundation model for single-cell multi-omics using generative AI. bioRxiv. 2023.
  5. Roohani Y, et al. GEARS: Predicting transcriptional outcomes of novel multi-gene perturbations. bioRxiv. 2022.
  6. Cui H, et al. scGPT: Towards building a foundation model for single-cell multi-omics using generative AI. bioRxiv. 2023.
  7. Theodoris CV, et al. Transfer learning enables predictions in network biology. Nature. 2023;618:616-624.
  8. Yang F, et al. scBERT as a large-scale pretrained deep language model for cell type annotation of single-cell RNA-seq data. Nat Mach Intell. 2022;4:852-866.
  9. Shen H, et al. Generative pretraining from large-scale transcriptomes for single-cell deciphering. iScience. 2023;26(5):106536.

Note: This article was updated to better represent Ahmed Mahfouz’s profession.



© Science News Hubb All rights reserved.
