Annotating human genes: where do we stand now?
In a perspective article published in Nature, Piero Carninci and colleagues discuss the status of human gene annotation and how high-throughput RNA sequencing technologies have accelerated the discovery of non-coding RNA genes with unknown functions and will promote the completion of the human gene catalogue.
Mapping the location of human genes and understanding their function are essential for investigating human evolution and gene-disease associations. International consortia such as the Human Genome Project (HGP) and the Telomere-to-Telomere (T2T) consortium have contributed to mapping the human genome sequence over the past 20 years and to identifying thousands of protein-coding genes. This work has been facilitated by new high-throughput RNA sequencing technologies, which have also enabled the identification of numerous non-coding RNA (ncRNA) genes that are not translated into proteins but rather possess regulatory functions.
Piero Carninci – Head of the Functional Genomics Research Centre at Human Technopole (HT) – and Steven Salzberg – Director of the Center for Computational Biology at Johns Hopkins University (USA) – organised a meeting at the Banbury Center (Cold Spring Harbor Laboratory, USA), where leading experts in the field gathered to review the progress made so far towards obtaining a complete list of human genes and to discuss the technology needed to complete gene annotation. The outcome of the discussion is summarised in an article now published in the international journal Nature.
The researchers propose several steps to complete the annotation of protein-coding genes in the coming years, including the identification of gene isoforms and pseudogenes. They also discuss the role of new technologies in expanding the list of non-coding RNA genes (one of the main goals of the Carninci Group at HT) and in annotating their functions. In addition, they emphasise the importance of identifying medically important genes and gene variants and their association with specific disorders, and they advocate adopting a standardised description of disease-associated genetic variants to help avoid inconsistencies and misunderstandings in the field.
Finally, Carninci and colleagues suggest that exploring the genomic landscape of different human populations – the aim of the so-called human pangenome reference, an international project to which HT researchers from the Population and Medical Genomics Research Centre are also contributing – will provide a more comprehensive view of the gene content of our genome.