Human diversity captured in new reference ‘pangenome’
The human reference genome has just got an upgrade: it is now a “pangenome” to represent more diversity between various individuals and populations.

The reference genome is an open-access resource that is key to the study of genetics and genomics, to a better understanding of the causes of various diseases, and to their treatment. But it has had its limitations: its data so far came from just 20 individuals, and about 70% of it from a single man of African-European ancestry.
The new pangenome reference includes the genome sequences of 47 individuals. It’s still a work in progress, with researchers aiming to extend that to 350 individuals by the middle of 2024.
The upgrade from scientists on the Human Pangenome Reference Consortium, funded by the US National Human Genome Research Institute and involving researchers from American and European institutions, has been described in a set of six papers published in Nature, Genome Research, Nature Biotechnology, and Nature Methods.
The existing reference
A genome is the complete set of DNA instructions that helps an organism function, and its sequence differs only slightly from one individual of a species to another. In humans, any two individuals' genomes are, on average, more than 99% identical. The remaining 0.2% to 1% that differs is what makes every person unique, and these differences can also provide insights into their health.
The first draft of the human genome, published in 2003, has been upgraded a number of times over the years as researchers corrected errors and technology advanced. Last year, a milestone was reached when the last 8% of the genome was sequenced, resulting in a nearly complete genome.
Despite these advances, however, the reference genome has remained imperfect, especially when it comes to the 0.2% to 1% of DNA that varies between people and so captures human diversity.
“Today, virtually all genomic analyses compare genome sequencing data to one ‘reference genome’. The big limitation being that it is just one sequence chosen quite arbitrarily. That does not properly reflect human diversity and entails problems analysing parts of a genome that is not very close to the reference sequence,” Professor Tobias Marschall of Heinrich Heine University, Düsseldorf, said in an email. He is part of the Steering Committee of the Human Pangenome Reference Consortium and one of the senior authors on the paper in Nature.
The consortium was launched in 2019 to address this problem. Its objective has been to develop a pangenome and help migrate common genomic analyses to it, so that diverse populations are better represented.
“The reference pangenome is a paradigm shift in proposing to use a collection of diverse sequences as a basis for comparison,” Marschall said.
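To make the idea of a "collection of diverse sequences as a basis for comparison" concrete, here is a toy sketch under loudly stated assumptions: the DNA strings are invented, and a sample is compared to references by counting short substrings (k-mers) it shares with them. This is not how the consortium's tools work (real pangenome analyses use graph-based alignment); it only illustrates why a sample that differs from a single reference can still be well covered by a collection of references.

```python
# Toy illustration only: all sequences are invented, not real human DNA,
# and k-mer matching stands in for real pangenome alignment.

def kmers(seq: str, k: int) -> set[str]:
    """Return every substring of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def unmatched_fraction(sample: str, references: list[str], k: int = 4) -> float:
    """Fraction of the sample's k-mers that appear in none of the references."""
    sample_kmers = kmers(sample, k)
    # Pool the k-mers of every reference sequence into one set.
    known = set().union(*(kmers(ref, k) for ref in references))
    missing = sample_kmers - known
    return len(missing) / len(sample_kmers)

single_reference = "ACGTACGTTAGC"                              # one "reference genome"
pangenome = [single_reference, "ACGTTTGCAAGC", "ACGGACGTTAGC"]  # a small collection
sample = "ACGTTTGCAAGC"                                        # a new individual

print(unmatched_fraction(sample, [single_reference]))  # most k-mers unmatched
print(unmatched_fraction(sample, pangenome))           # every k-mer matched
```

Against the single reference, 7 of the sample's 9 k-mers find no match; against the collection, all of them do.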
What’s new
The 47 people whose genome sequences have been assembled come from around the world. In the resulting reference pangenome, 99% of each sequence is rendered with high accuracy, and these sequences revealed nearly 120 million DNA base pairs that had not been seen before.
“This complex genomic collection represents significantly more accurate human genetic diversity than has ever been captured before. With a greater breadth and depth of genetic data at their disposal, and greater quality of genome assemblies, researchers can refine their understanding of the link between genes and disease traits, and accelerate clinical research,” Erich D Jarvis, one of the principal investigators, said in a statement released by the Rockefeller University.
The analysis of 47 individuals yielded 94 distinct genome sequences (one for each of the two sets of chromosomes every person carries), besides the Y sex chromosome in males. Computational analysis of these sequences, in turn, revealed the 120 million DNA base pairs that were previously unseen (or were in a different location in the previous reference). Of these, about 90 million derive from what are called “structural variations”.
Structural variations refer to differences that arise between DNA of individuals when chunks of their chromosomes are moved, deleted, or duplicated. Structural variants are understood to play a major role in human health, as well as in population-specific diversity.
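The three kinds of change described above can be pictured with a toy model, a sketch under stated assumptions: a chromosome is written as an ordered list of labelled segments, and the labels and edits below are hypothetical rather than real genomic coordinates. Each function produces one structural variant of a "reference" chromosome.

```python
# Toy model only: a chromosome is an ordered list of labelled segments.
# Segment labels and edits are hypothetical, for illustration.
Segments = list[str]

def delete(chrom: Segments, seg: str) -> Segments:
    """Deletion: drop the first copy of a segment."""
    out = list(chrom)
    out.remove(seg)
    return out

def duplicate(chrom: Segments, seg: str) -> Segments:
    """Duplication: insert a second copy right after the first."""
    i = chrom.index(seg)
    return chrom[: i + 1] + [seg] + chrom[i + 1 :]

def move(chrom: Segments, seg: str, new_index: int) -> Segments:
    """Move: take a segment out and reinsert it at another position."""
    out = list(chrom)
    out.remove(seg)
    out.insert(new_index, seg)
    return out

reference = ["A", "B", "C", "D", "E"]
print(delete(reference, "C"))     # ['A', 'B', 'D', 'E']
print(duplicate(reference, "B"))  # ['A', 'B', 'B', 'C', 'D', 'E']
print(move(reference, "B", 3))    # ['A', 'C', 'D', 'B', 'E']
```

In real genomes such chunks can span thousands of base pairs or more, which is why they add up to tens of millions of base pairs across the new assemblies.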
The pangenome also fills in gaps in regions containing duplicated genes. The Rockefeller University cites the example of MHC, a cluster of genes that code for proteins that, in turn, help the immune system recognise antigens, such as those from SARS-CoV-2. Using the older sequencing methods, it was impossible to study MHC diversity, Jarvis said in the university statement.