Researchers fully sequence the Y chromosome for the first time
A team led by the National Human Genome Research Institute (NHGRI), including researchers at the National Institute of Standards and Technology (NIST) and many other organizations, used advanced sequencing technologies to read out the full DNA sequence of the Y chromosome, the portion of the genome that typically drives male reproductive development. The study, published in Nature, shows that this advance improves DNA sequencing accuracy for the chromosome, which could help identify certain genetic disorders and potentially uncover the genetic roots of others.
DNA sequencing isn’t as simple as reading genetic material from a genome’s beginning to its end. DNA gets chopped up when it is extracted from cells, and even the best sequencing equipment can only handle relatively small bits of DNA at a time. So researchers and clinicians rely on special software to piece together fragments of sequenced code in the correct order, like a puzzle.
A reference genome is a separate, already pieced-together genome that serves as a guide, similar to the pictures on the front of puzzle boxes. And because 99.9% of our species’ genetic code is shared, any human genome would closely match a reference.
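As a rough illustration of how a reference can guide assembly, the sketch below slides each sequenced fragment along a reference and keeps the position where the fewest letters disagree. The sequences and the function names `hamming` and `place_read` are invented for this example, and real alignment software is far more sophisticated than this toy.

```python
def hamming(a, b):
    """Count positions where two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def place_read(read, reference):
    """Slide a read along the reference.

    Returns (fewest_mismatches, list_of_start_positions_with_that_score).
    """
    best = None
    positions = []
    for start in range(len(reference) - len(read) + 1):
        window = reference[start:start + len(read)]
        d = hamming(read, window)
        if best is None or d < best:
            best, positions = d, [start]
        elif d == best:
            positions.append(start)
    return best, positions


# Toy data (hypothetical): a 16-letter "reference" and three fragments.
reference = "ACGTTGCAAGGCTTAC"
reads = ["GTTGCA", "AGGCTT", "CAAGGA"]  # the last read differs by one letter

for read in reads:
    mismatches, where = place_read(read, reference)
    print(read, "-> mismatches:", mismatches, "positions:", where)
```

The third fragment does not match the reference exactly, yet it still lands in a single best position. That mirrors the idea that a new genome only needs to resemble the reference closely, not match it letter for letter.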
Last year, a team from the Telomere-to-Telomere (T2T) consortium, which is made up of experts from dozens of organizations, including NIST, generated the most complete reference genome at the time by using new sequencing technologies to crack previously indecipherable regions of the genome. But the cells used in that work did not contain the most puzzling chromosome of all, the Y chromosome.
“Chromosomes all contain sections of very repetitive DNA, but well over half of the Y chromosome is like that,” said study co-author Justin Zook, who leads NIST’s Genome in a Bottle (GIAB) consortium. “If you use the puzzle analogy, a lot of the Y chromosome looks like the backgrounds often do, where all the pieces look really similar.”
With this new endeavor, T2T was not starting from zero, as GIAB had already gotten the ball rolling.
GIAB’s mission is to produce test materials, or benchmarks, that can be used to evaluate sequencing technologies or methods. The materials themselves are highly accurate readouts of specific genes that can act as an answer key for checking the results of a particular sequencing method.
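To make the answer-key idea concrete, here is a minimal sketch that scores a set of genetic differences reported by some sequencing method against a trusted benchmark set. Everything in it is assumed for illustration: the tuple layout, the example values and the function name `score_against_benchmark` do not come from GIAB, and real benchmark comparisons are considerably more involved.

```python
def score_against_benchmark(calls, benchmark):
    """Compare reported differences with a trusted answer key.

    Each item is a (chromosome, position, reference_letter, observed_letter)
    tuple. Returns counts plus precision and recall.
    """
    calls, benchmark = set(calls), set(benchmark)
    true_pos = len(calls & benchmark)    # reported and in the answer key
    false_pos = len(calls - benchmark)   # reported but not in the answer key
    false_neg = len(benchmark - calls)   # in the answer key but missed
    precision = true_pos / (true_pos + false_pos) if calls else 0.0
    recall = true_pos / (true_pos + false_neg) if benchmark else 0.0
    return {
        "true_pos": true_pos,
        "false_pos": false_pos,
        "false_neg": false_neg,
        "precision": precision,
        "recall": recall,
    }


# Hypothetical answer key and hypothetical output from a method under test.
benchmark = [
    ("chrY", 1200, "A", "G"),
    ("chrY", 3400, "C", "T"),
    ("chrY", 5600, "G", "A"),
    ("chrY", 7800, "T", "C"),
]
calls = [
    ("chrY", 1200, "A", "G"),
    ("chrY", 3400, "C", "T"),
    ("chrY", 5600, "G", "A"),
    ("chrY", 9100, "A", "T"),  # not in the answer key
]

result = score_against_benchmark(calls, benchmark)
print(result)
```

In this made-up case the method finds three of the four benchmark entries and reports one that the answer key does not contain, so both precision and recall come out to 0.75.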
NIST has rigorously analyzed several individual human genomes to create its benchmarks. While GIAB has not yet produced a benchmark for the Y chromosome specifically, the consortium has studied one genome extensively, accumulating the largest collection of Y chromosome data prior to the new study.
That data served as a jumping-off point for the new study’s authors, who focused their analysis on the best-understood GIAB Y chromosome. They examined the sample with a combination of cutting-edge technologies, namely high-fidelity and nanopore sequencing, that make the DNA fragment puzzle pieces larger and thus easier to assemble.
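A toy example can show why larger pieces help with repetitive DNA. The sketch below builds an invented sequence with a repeated stretch in the middle and checks where a short fragment and a longer fragment could fit. The sequence, the fragment lengths and the function name `exact_positions` are assumptions made for illustration only.

```python
def exact_positions(read, sequence):
    """Return every start index where the read occurs exactly in the sequence."""
    return [
        i
        for i in range(len(sequence) - len(read) + 1)
        if sequence.startswith(read, i)
    ]


# Hypothetical 20-letter sequence: unique start, repeated "AT" block, unique end.
sequence = "GGCT" + "AT" * 6 + "CCGA"

short_read = "ATAT"          # lies entirely inside the repeat
long_read = sequence[2:18]   # spans the repeat plus unique letters on each side

print("short read fits at:", exact_positions(short_read, sequence))
print("long read fits at: ", exact_positions(long_read, sequence))
```

The short fragment fits equally well at five places inside the repeat, while the longer fragment reaches the distinctive letters on either side and fits at only one.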
A machine-learning analysis tool and a gamut of other advanced programs helped the team identify and assemble the pieces of the chromosome. More than 62 million letters of genetic code later, the authors had spelled out the GIAB Y chromosome front to back.
The researchers pitted their complete Y chromosome sequence, named T2T-Y, against the most widely used reference genome’s Y chromosome parts, which are riddled with stretches of absent code. Using them both as guides for sequencing a diverse group of over 1,200 separate genomes, they found that T2T-Y drastically improved the outcomes.
T2T-Y, in combination with the group’s previous reference genome, T2T-CHM13, represents the world’s first complete genome for the half of the population with a Y chromosome.
The newest addition could be useful in identifying and diagnosing the few known conditions related to genes in the Y chromosome. What’s more, the new reference has the potential to shed light on new genes and their functions.
“There are certainly aspects of fertility and some genetic disorders that are connected to genes in the Y chromosome,” Zook said. “But because it’s been so hard to analyze up to this point, we may not even know yet just how important the Y chromosome is.”
At NIST, Zook and his fellow GIAB researchers have developed a new benchmark based on the X and Y chromosomes assembled by T2T to help translate the potential impact of the new reference material into reality.