Among the vast contents of the human genome, geneticists are most interested in the tiny fraction—about 1.5 percent—that contains instructions for building proteins. Protein building is DNA's main function, and these complex molecules are essential for development, growth and reproduction across the entire body.

But we don't know what most of these protein-coding genes actually do. Only about 20 percent of human coding genes are well studied, leaving the function of the other 80 percent (about 16,000 genes, along with the proteins they make) largely a mystery. This is because of a long-standing bias in genetics research: scientists more often study genes and proteins already known to have important functions. These high-profile projects, such as studying genes with known implications for cancer, are the ones that seem “sexy” to funders, says University of Oxford cell biologist Matthew Freeman.

Freeman and his colleagues have dubbed the well of untapped genetic potential the “unknome,” and they have been working for 10 years to create a database that compiles and catalogs these understudied genes. It ranks them by “knownness” and tracks which of the genes appear in various other species' DNA. Their research tool and accompanying paper in PLOS Biology were recently released online.

The ability to filter for genes found across various species sets this project apart from others with similar aims, says bioinformatician Avi Ma'ayan of the Icahn School of Medicine at Mount Sinai in New York City, who was not involved in the new work. “The concept of the unknome is not a new one,” Ma'ayan says, but with so much undiscovered, researchers might not know which genes to prioritize. That's why the interspecies comparison can be so helpful. When genes are conserved across many species, that's a good hint that they play “an essential role in the organism,” Ma'ayan says. The unknome database allows scientists to search, for example, for understudied genes that exist only in invertebrates, that are found in all living cells, or that are predicted to be found only in the cell membrane. As Freeman says, “it's very tunable.”
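For readers who think in code, that kind of tuning can be pictured as a filter over a gene table. The Python sketch below is only an illustration: the record fields, gene names, scores and species sets are made up, and it does not reflect the real unknome database's interface or its actual knownness scoring.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GeneRecord:
    name: str            # hypothetical gene identifier
    knownness: float     # hypothetical score; lower means less studied
    species: frozenset   # species in which a counterpart gene is found


def understudied_conserved(records, max_knownness, required_species):
    """Return little-studied genes present in every required species,
    least-known first."""
    required = frozenset(required_species)
    hits = [
        r for r in records
        if r.knownness <= max_knownness and required <= r.species
    ]
    return sorted(hits, key=lambda r: r.knownness)


# Made-up records, for illustration only.
RECORDS = [
    GeneRecord("gene_a", 0.5, frozenset({"human", "fruit fly", "mouse"})),
    GeneRecord("gene_b", 12.0, frozenset({"human", "fruit fly"})),
    GeneRecord("gene_c", 1.0, frozenset({"human"})),
    GeneRecord("gene_d", 0.0, frozenset({"human", "fruit fly"})),
]

shortlist = understudied_conserved(
    RECORDS, max_knownness=2.0, required_species={"human", "fruit fly"}
)
print([r.name for r in shortlist])  # ['gene_d', 'gene_a']
```

Raising or lowering `max_knownness`, or changing the required species, is the toy equivalent of the tuning Freeman describes.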

To test the unknome database's utility, Freeman and his team isolated 260 unknown fruit fly genes that are also present in humans. Knocking out many of those genes in the flies either made the insects unviable or gave them various defects. The results show that “within these ‘unknown’ genes and proteins, there are some that are critical for our development and could potentially have important clinical implications,” says Eduard Porta Pardo, a computational biologist at the Barcelona Supercomputing Center, who was not involved in the work. With such resources and technological advances, the researchers hope the unknome will be one knowledge base that only shrinks with time.*

*Editor’s Note (10/17/23): This paragraph was edited after posting to include Eduard Porta Pardo’s comments.