According to Oliver Fiehn, a metabolomics researcher at the University of California, Davis, people are much more than meets the eye. “We are not just human,” he said. “We are an ecosystem.”
The ecosystem in question houses trillions of bacteria and fungi that live in and on humans and make up the human microbiome. These microscopic organisms carry on their own complex lives in tandem with ours, consuming and excreting molecules that affect our cells. Most of the microbes live in our digestive systems, and they influence diseases such as Crohn’s disease, anxiety, cardiovascular disease, and cancer.
Peering into a person’s labyrinthine gut is one way to find microbes, but there’s an easier way, as long as researchers are not squeamish: studying human feces. These waste products are like samples from the site of an ancient civilization. Stool contains chemical artifacts of the microbes in the gut, and sometimes even pieces of the microbes themselves. Scientists, akin to archaeologists, can dig into the samples to gather clues about the inhabitants.
Over the past decade, researchers have dissected these complex ecosystems, and their results have upturned the conventional understanding of human health that centers on our own cells and organs, and in the process, introduced the potential for new microbe-targeting diagnostics and therapies. But not every idea has borne out.
In one highly publicized example from 2020, scientists announced that the blood and tissue microbiome could be used to predict whether healthy individuals would develop cancer.1 But last year, another group of scientists claimed that the 2020 study had errors that made its findings not only irreproducible, but also incorrect.2 While the jury’s still out on this case, lack of reproducibility has been a much broader problem for the field of gut microbiome research. Scientists are now clamoring for some standards to ward off spurious results. “This has become crucial to move the field forward and to help ultimately come up with treatments for chronic conditions,” said David Wishart, a biochemist at the University of Alberta.
That’s where the National Institute of Standards and Technology (NIST) comes in. The institute finds ways to use the science of measurement to improve the way that everything in daily life operates, from clocks to computer chips. Now, they’re taking on their newest challenge. Scott Jackson, leader of the microbiology group at NIST, said that they’re making “the most well-characterized fecal material on Earth.”
A point of embarrassment
Corey Broeckling believes that standard reference materials could help metabolomics core facilities like the one he runs at Colorado State University.
John Cline, Colorado State University
Corey Broeckling leads an analytic chemistry core facility at Colorado State University that helps researchers measure molecules in their stool samples. He knows firsthand just how inconsistent these measurements can be.
“One of the challenges that we’re left with at the end of the day is convincingly demonstrating data quality, not just to ourselves, but also to the researchers who we’re doing work for and the broader community that they’re presenting their work to,” Broeckling said.
Part of the challenge comes from the methods used to analyze the microbes living in the gut. The two most popular are metagenomics, which maps fragments of DNA to specific microbial species that may be in the gut, and metabolomics, which measures the proteins, fats, and other molecules in the stool that may have been produced by gut microbes. “[Bacteria] are basically little chemical factories,” Wishart said. “Some of them will produce good things, and some of them will produce bad things.”
Being able to tease apart these good and bad emissions from gut microbes can help researchers understand microbes’ effects on their human host’s health. For example, one study revealed that higher levels of the short-chain fatty acid butyrate in stool were associated with better glucose tolerance.3
Accurate measurements of these metabolite levels will enable reliable diagnostics and clinical tools. But current measurements are typically not very accurate, and the data often fail to convey the whole story. For starters, a stool sample contains thousands of molecules, and typical untargeted metabolomics approaches can’t identify and measure all of them. “It is an incompletely characterized sample at best,” Broeckling said. Different labs may use different methods to prepare and analyze samples, and even the same machine’s calibration can change over time. There’s also no ground truth for what molecules or species to even expect in the stool.
That means that two labs can analyze the exact same sample and end up with completely different measurements. Even the same lab can analyze the same sample one week apart and get different measurements. “In principle, we should all get very similar results,” Wishart said. “In reality, we don’t, and this has been a point of embarrassment for the metabolomics community.”
Solving reproducibility issues is essential for the development of microbiome-based drugs, according to David Wishart at the University of Alberta.
David Wishart
A reference material can help. It typically comes in an unassuming one-milliliter tube—in this case, a tube of poop. There are thousands of tubes all containing the same material, prepared in the same consistent way, such that it is effectively identical across tubes.4 Labs can then include a tube of the reference material in each gut microbiome experiment that they conduct. Even if there are subtle differences between experiments, the reference material serves as a baseline, and any results can be reported relative to the reference material and compared across experiments. For example, the levels of short-chain fatty acids might be measured in one lab at a level 100x higher than in the reference material, and in another lab at a level 1000x higher than in the reference material, making it 10x higher than in the first lab’s study. “It’s a yardstick that we can compare against,” said Fiehn.
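The arithmetic behind that comparison can be sketched in a few lines of Python. This is only an illustration: the function name and the raw instrument readings are hypothetical, chosen to reproduce the 100x and 1000x figures in the example above, and real studies would involve many metabolites and replicates.

```python
def relative_to_reference(sample_signal: float, reference_signal: float) -> float:
    """Express a sample's level as a multiple of the reference material's level."""
    if reference_signal <= 0:
        raise ValueError("reference signal must be positive")
    return sample_signal / reference_signal


# Hypothetical raw readings. Each lab's instrument reports in its own arbitrary
# units, so the raw numbers cannot be compared directly across labs.
lab_a = relative_to_reference(sample_signal=5_000.0, reference_signal=50.0)
lab_b = relative_to_reference(sample_signal=20_000.0, reference_signal=20.0)

# Once both results are expressed relative to the same reference material,
# the two studies can be compared with each other.
fold_between_labs = lab_b / lab_a

print(f"Lab A: {lab_a:.0f}x reference")
print(f"Lab B: {lab_b:.0f}x reference")
print(f"Lab B vs. Lab A: {fold_between_labs:.0f}x")
```

Because each lab divides by its own reading of the same material, differences in instrument units or calibration cancel out of the ratio.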
Scientists can go one step further to create standard reference materials (SRM), where they not only create many tubes of identical material, but they also precisely measure and report the levels of each molecule in the tubes. This provides researchers with a benchmark to compare their measurements. “If you don’t have a reference material where you know what’s in it, you don’t know what you’re missing,” said Martha Carlin, founder of The BioCollective, a company working to improve stool sample storage, processing, and shipping for microbiome research.
The measurement experts
SRM are NIST’s bread and butter: They have been making and selling SRM for more than a century. Their first SRM, released in 1910, was a type of limestone created for the limestone industry to measure the levels of trace minerals. More recently, NIST has begun making SRM for the nascent fields of metabolomics and genomics. For example, SRM 1950 is a blood plasma SRM made by blending 100 individuals’ plasma together and measuring the levels of about 100 molecules.5 It has become popular among metabolomics researchers.
The idea of creating a human stool SRM first occurred to Jackson 10 years ago, when he arrived at NIST’s Gaithersburg, Maryland headquarters. Five years later, his team officially started working on the project. They were immediately faced with difficult questions: What consistency should the material have? How could they ensure homogeneity across 1,000 tubes? How could they make sure the sample wouldn’t degrade, even if it was kept on the shelf for five years? Over the following years, they tackled each of these questions, settling on a liquid formulation produced by blending together multiple stool samples and diluting the mix with water.
Scott Jackson, leader of NIST’s Complex Microbial Systems Group, has been working on developing standard reference materials for the gut microbiome for about a decade.
Riley Wilson, NIST
The early test batches were small: only 500 tubes. NIST hired a contractor to recruit six vegans and six omnivores—two groups with very different diets and gut microbiomes to facilitate comparisons for calibration—and collect and freeze their stool samples over the course of multiple days. Then, they thawed the samples and blended them together while adding water to achieve the right final consistency, and then froze one-milliliter tubes of the final material.
The process may sound simple, but there were plenty of “little things learned the hard way,” Jackson said. For example, in an early effort, they learned that they couldn’t stick labels onto frozen tubes, so they started prelabeling the tubes before freezing.
These lessons have helped NIST get to what they hope will be their final batch, a whopping 10,000 aliquots from the stool of vegetarians and omnivores that they’re currently testing for stability over time. Having a large batch is important to make sure that many people can get and use the reference material; once NIST runs out, they will have to recruit a new batch of people to provide stool, make another homogeneous product, and test it again.
That’s a paradox of reference materials: as products developed for a rapidly changing scientific landscape, even the standards have to change over time. The current batch, which NIST aims to release to the research community in 2024, is just version 1.0, and at approximately one-year intervals, they hope to release a version 2.0, then 3.0, and so on.
Scientists at NIST combine stool from multiple donors to make a homogeneous reference material that can be frozen and distributed.
Deb Ellisor, NIST
In the interim, they will perform a more in-depth characterization of the molecules and species in the sample. Jackson isn’t sure yet if they will release official values of each metabolite, known as certified values, to make the material a bona fide SRM. Measuring certified values is an expensive undertaking: for SRM 1950, it cost $10 million and 10 years of work to measure the levels of approximately 100 metabolites. It all comes down to what the research community wants. “I feel like the worst thing you could do is spend a lot of time and money to develop these things, and then nobody uses them,” Jackson said.
Community needs
Taking the temperature of the research community is an important part of Jackson’s job. He led an early outreach effort in 2019, convening a conference of researchers and organizations concerned with microbiome research standards to discuss priorities for the stool reference material.6 At this meeting, the consensus was that the time was ripe for NIST to start making a stool reference material.
Many of these researchers remained involved with NIST’s efforts, and in 2020, NIST shared tubes of an early version of the reference material produced from the stool of vegans and omnivores with many of their labs. Each lab was tasked with analyzing the samples with whatever method of metabolomic profiling they typically employed and identifying the top 100 metabolites whose levels differed between vegans and omnivores. Jackson’s goal was to see how much consistency there was across labs, and sure enough, “The results were all over the place,” he said. “It’s shocking.”
Fiehn’s group was involved in this test and, with NIST’s permission, ultimately published their findings. Led by Raquel Cumeras, now a metabolomics researcher at the Institute of Health Research Pere Virgili, they found nearly 1,000 molecules with at least four-fold differences between vegans and omnivores.7
Tests like this are sometimes called ring trials, and they can turn an unflattering mirror on the microbiome community’s lack of standardization. In a 2019 study, researchers across 14 labs measured the metabolites in SRM 1950 and found vast disagreement over the values.8 Some molecules’ levels varied by nearly 40 percent across the tests from different labs. But these trials can also point to the best practices for metabolomics. The ring trial of SRM 1950 revealed that groups using the right controls and high-resolution methods tended to arrive at more accurate measurements.
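One common way to summarize this kind of between-lab spread is the coefficient of variation: the standard deviation of the labs' values divided by their mean. The article does not say how the roughly 40 percent figure was calculated, so the sketch below is an assumption about method, and the lab values in it are invented for illustration rather than taken from the ring trial.

```python
from statistics import mean, stdev


def percent_cv(values: list[float]) -> float:
    """Coefficient of variation, in percent, using the sample standard deviation."""
    if len(values) < 2:
        raise ValueError("need at least two measurements")
    average = mean(values)
    if average == 0:
        raise ValueError("mean must be nonzero")
    return 100.0 * stdev(values) / average


# Hypothetical concentrations of one metabolite in the same reference
# material, as reported by three different labs (same units).
lab_values = [8.0, 10.0, 12.0]

spread = percent_cv(lab_values)
print(f"Between-lab CV: {spread:.1f}%")
```

A lower value means the labs agree more closely on that metabolite.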
Pieter Dorrestein, a chemist at the University of California, San Diego, has also worked with NIST to help characterize reference materials. In one of the samples that he is working on right now, his team found that they could only identify around 15 percent of the molecules detected.9 He thinks that this points to an ongoing limitation of metabolomics: You can only identify molecules if you know what they look like. He also believes that there should be as much high-quality data available about reference materials as possible, and he plans to make his ongoing measurements publicly accessible when they are completed.
For many microbiome researchers, analyzing NIST samples and advocating for SRM is a service to the broader community. It’s a way to promote the values that they aspire to for their field. “Science wouldn’t be science if we didn’t have standards and consistent measurement concepts,” Wishart said.
NIST hasn’t been the only group rallying the community to develop gut microbiome standards, though. Pharmaceutical companies have also become alarmed by the lack of standardization across the academic microbiome studies they used to identify therapeutic targets. In 2017, Janssen partnered with NIST to launch a competition called the Mosaic Standards Challenge, where they assessed how accurately and precisely more than 50 teams could measure microbial genomic data from a set of standard stool samples produced by The BioCollective.10
The technology that Carlin’s team developed for the Mosaic Challenge has since been licensed to Zymo Research as a product called TruMatrix that can be used as a reference material for metagenomics studies of stool samples. The BioCollective produced around 2 million vials of the material, projecting that it would last 10-15 years. But Carlin recognizes that this is just the beginning. “This was planting the seed for the need for this reference material,” Carlin said.
Spreading the word
NIST’s efforts are coming to fruition just as the field of microbiome research reaches a crucial milestone. Last year, the FDA approved the first microbiome drug: Rebyota, used to treat C. difficile infections. But these success stories are not the norm.
“We now have a decade of microbiome research without good reference materials and without consistent methods,” Carlin said. NIST’s reference material has a chance to change the tide for microbiome research—that is, if people use it.
This is at the front of Jackson’s mind. Developing a reference material is a labor of love but still a costly one. His nightmare is that the 10,000 tubes will just sit on NIST’s shelf for years, collecting dust. “Everyone was clamoring for this,” he said. “Well, I hope they buy it because it was their idea.”
Wishart remembers that when SRM 1950 was released, uptake was slow simply because of lack of awareness. “It just didn’t make a splash because no one knew about it,” he said. Another limitation is the cost. Reference materials can cost between $1,000 and $2,000 per box of five one-milliliter tubes. “Laboratories that are under-resourced will not be able to afford it,” Dorrestein said.
These obstacles have already hamstrung existing standards and references in metabolomics studies. Broeckling is involved in a standards-setting group called the metabolomics Quality Assurance and Quality Control Consortium (mQACC), and in a literature review that he is currently leading, he found that very few people describe the standards and references that they use in their publications, and many don’t use them at all. Fiehn admitted that he is sometimes guilty of this. Even though he tries his best to always have SRM 1950 close at hand, sometimes his lab doesn’t have enough for a large-scale study, or they prioritize focusing on other quality controls that journals mandate.
Oliver Fiehn’s group at the University of California, Davis has published some of their findings about NIST’s stool reference material.
Oliver Fiehn
So how can the field make sure that the new stool reference material doesn’t fall by the wayside?
“It becomes a social initiative, really,” Broeckling said. “It has to be sort of a social pressure to raise the bar on quality control.” To him, that means everyone has to play a part. Scientists have to make an effort to use reference materials and highlight them in their papers. Regulators must start mandating the use of reference materials for preclinical studies. Journal editors and peer reviewers should start incorporating reference materials into their evaluations of a study’s rigor.
Jackson agreed that it will take more than just creating the reference materials to really make a change. He has already been involved in conversations with major journals and the FDA to discuss the role that SRM should play in evaluating studies.
Ultimately, the people with the most influence on scientists are their peers. Grassroots metabolomics communities like mQACC are compiling lists of best practices that include using standards and reference materials. A like-minded group of microbiome researchers put together a checklist called STORMS: Strengthening the Organization and Reporting of Microbiome Studies.11 Chloe Mirzayi, the lead author of the checklist and a public health data scientist at the City University of New York, knew that NIST’s reference material would soon be released, so she included the recommendation that researchers include positive and negative controls. She recognizes that reference materials add a burden to already busy researchers, so the messaging is important.
“You want to create something that is seen as helpful and less prescriptive,” she said. “Create some kind of tool or system that people want to use because it adds something… and gives a paper more clout.”
References
- Poore GD, et al. Microbiome analyses of blood and tissues suggest cancer diagnostic approach. Nature. 2020;579(7800):567-574.
- Gihawi A, et al. Major data analysis errors invalidate cancer microbiome findings. mBio. 2023;14(5):e0160723.
- Sanna S, et al. Causal relationships between gut microbiome, short-chain fatty acids and metabolic diseases. Nat Genet. 2019;51(4):600-605.
- Lippa KA, et al. Reference materials for MS-based untargeted metabolomics and lipidomics: a review by the metabolomics quality assurance and quality control consortium (mQACC). Metabolomics. 2022;18(4):24.
- Phinney KW, et al. Development of a Standard Reference Material for Metabolomics Research. Anal Chem. 2013;85(24):11732–11738.
- Mandal R, et al. Workshop report: Toward the development of a human whole stool reference material for metabolomic and metagenomic gut microbiome measurements. Metabolomics. 2020;16(11):119.
- Cumeras R, et al. Differences in the Stool Metabolome between Vegans and Omnivores: Analyzing the NIST Stool Reference Material. Metabolites. 2023;13(8):921.
- Thompson JW, et al. International Ring Trial of a High Resolution Targeted Metabolomics and Lipidomics Platform for Serum and Plasma Analysis. Anal Chem. 2019;91(22):14407-14416.
- Gauglitz JM, et al. Enhancing untargeted metabolomics using metadata-based source annotation. Nat Biotechnol. 2022;40(12):1774-1779.
- Forry SP, et al. Variability and Bias in Microbiome Metagenomic Sequencing: an Interlaboratory Study Comparing Experimental Protocols. bioRxiv. DOI: 10.1101/2023.04.28.538741.
- Mirzayi C, et al. Reporting guidelines for human microbiome research: the STORMS checklist. Nat Med. 2021;27(11):1885-1892.