Eighty million years ago, humans, rats, and mice shared the same mammalian ancestor.
More recently, researchers made the astonishing discovery that our genomes still contain close to 500 segments that have remained completely unchanged since then. These segments are called ultra-conserved elements (UCEs). Nearly all the UCEs are also highly conserved in the chicken and dog genomes, and many are significantly conserved in fish, too.
What biological constraint has kept the UCEs intact in so many different genomes for tens of millions of years?
For a long time, scientists had no idea; their ignorance on this question was considered profound.

But this year, a team made a breakthrough. Researchers led by David J. Elliott of Newcastle University in the UK reported in the EMBO Journal that a UCE in a mouse gene plays an important role in limiting production of the protein that the gene encodes.
Using genetic engineering, they deleted this UCE in mouse testes and found that these mice overproduced the corresponding protein in their testes. The overproduction killed the sperm-producing cells and left the mice infertile.
This result suggests that any change to the UCE that interfered with its role in limiting the protein’s levels would cause a loss of sperm production. The altered UCE would then not be passed on to the next generation, which would explain why the UCE has been maintained across species.
From gene to protein
DNA is a double-helix molecule made of two strands. Each strand is a chain built from four kinds of bases. The double helix is held together because each base on one strand bonds with a partner base on the other, and each such bonded pair is called a base-pair. A gene is a relatively short stretch of the DNA molecule, typically only a few thousand base-pairs long.
When a gene is ‘expressed’, the cell copies its sequence of bases into a messenger RNA (mRNA) molecule, which is loaded onto a cellular machine called the ribosome. There, the ribosome reads the mRNA three bases at a time, and each three-base ‘codon’ specifies the next amino acid to be stitched on to make the protein encoded by the gene.
The mRNA also contains one of three short base sequences called stop codons. When the ribosome encounters a stop codon, it stops adding amino acids and releases the newly synthesised protein.
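To make the reading process concrete, here is a toy sketch in Python. It assumes the ribosome starts at the first AUG (the usual ‘start’ codon, which the article does not discuss) and uses only a handful of the 64 codons. The function name translate and the small codon table are illustrative inventions, not part of any real tool.

```python
# Toy model of translation: read an mRNA one codon (three bases) at a time,
# add an amino acid for each codon, and stop at the first stop codon.
# Assumption: only a few of the 64 codons are listed, for illustration.
CODON_TABLE = {
    "AUG": "Met", "UUU": "Phe", "GGC": "Gly", "AAA": "Lys", "GCU": "Ala",
}
STOP_CODONS = {"UAA", "UAG", "UGA"}


def translate(mrna: str) -> list[str]:
    """Return the amino acids encoded from the first AUG up to the first stop codon."""
    start = mrna.find("AUG")  # assumed start point of the ribosome
    if start == -1:
        return []
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        codon = mrna[i:i + 3]
        if codon in STOP_CODONS:
            break  # stop codon: release the finished protein
        # "?" marks codons missing from this toy table
        protein.append(CODON_TABLE.get(codon, "?"))
    return protein


print(translate("GGAUGUUUGGCAAAUAAGCU"))  # ['Met', 'Phe', 'Gly', 'Lys']
```

The bases after the stop codon (here GCU) are never read, which is why stop codons matter for the poison exon story below.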
Our genome contains about 20,000 genes that code for proteins and roughly another 20,000 used to make RNAs that influence the expression of other genes.
(Some DNA sequences, called promoters and enhancers, are bound by regulatory proteins that specify when and where other genes are copied into mRNA. These too are called genes because changes in their sequence can have visible effects on the organism.)
The poison exon
Researchers first identified UCEs as DNA sequences longer than 200 base-pairs that are perfectly identical across the human, rat, and mouse genomes. That is, they hadn’t tolerated even a single base-pair change in 80 million years.
Most UCEs don’t code for proteins. Researchers initially thought the UCEs might be exceptionally long enhancers, and a subset did show enhancer activity in mice. However, mice carrying altered versions of UCEs didn’t show significantly perturbed enhancer function, so an enhancer role couldn’t account for the extreme conservation.
The fact that some UCEs could be deleted from the genome with no observable consequence only deepened the mystery.
After RNA is copied from a gene’s DNA, the cell subjects it to a maturation step called splicing, which removes, or splices out, segments called introns from the newly made RNA. The segments retained in the mature mRNA are called exons.
For some genes, an intron is removed from only some of the mRNA molecules and retained in the rest. This gives rise to alternative forms of mRNA called splice variants, which differ in whether they contain that intron.
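A toy sketch of splicing, treating a freshly copied RNA as a list of labelled segments. The sequences and the splice function are hypothetical. They are chosen only to show how dropping or keeping one intron yields two different splice variants from the same starting RNA.

```python
# Toy model of splicing: each segment of the freshly copied RNA is
# labelled "exon" or "intron" (sequences are made up for illustration).
pre_mrna = [
    ("exon", "AUGGCU"),
    ("intron", "GUAAGU"),
    ("exon", "AAAUAA"),
]


def splice(segments, keep_introns=frozenset()):
    """Join exons in order; an intron is kept only if its index is in keep_introns."""
    return "".join(
        seq for i, (kind, seq) in enumerate(segments)
        if kind == "exon" or i in keep_introns
    )


mature = splice(pre_mrna)                     # intron spliced out
variant = splice(pre_mrna, keep_introns={1})  # intron retained
print(mature)   # AUGGCUAAAUAA
print(variant)  # AUGGCUGUAAGUAAAUAA
```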

The mouse Tra2b gene has eight introns and nine exons. Interestingly, the Tra2β protein that the gene encodes is itself involved in splicing. A UCE is embedded within the first intron of the Tra2b gene. Once the level of Tra2β protein in the cell rises above a certain threshold, the protein recognises this UCE as an additional exon, which is then included in a new splice variant of the gene’s mRNA.
The new exon does not add any new protein-coding sequence. Instead, it contains multiple stop codons that cause protein synthesis to terminate early, the mRNA to fall off the ribosome, and the mRNA to then enter a degradation pathway.
Effectively, the new exon prevents further accumulation of the Tra2β protein, which is why it is called a poison exon.
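The poison exon works as a negative-feedback loop, which a few lines of Python can mimic. Everything here is an assumption made for illustration: the threshold, the production and degradation rates, and the simplification that an mRNA carrying the poison exon simply yields no protein. None of these numbers come from the study.

```python
# Toy negative-feedback model of a poison exon (all numbers are hypothetical).
THRESHOLD = 100.0             # protein level above which the poison exon is included
MRNA_PER_STEP = 20.0          # protein made per step if no mRNA is poisoned
POISON_FRACTION_ABOVE = 0.9   # share of mRNA carrying the poison exon above threshold
DEGRADATION = 0.05            # fraction of existing protein lost per step


def simulate(steps: int, has_poison_exon: bool = True) -> float:
    """Return the protein level after a number of discrete time steps."""
    protein = 0.0
    for _ in range(steps):
        # Above the threshold, most new mRNA includes the poison exon and is wasted.
        poisoned = has_poison_exon and protein > THRESHOLD
        poison = POISON_FRACTION_ABOVE if poisoned else 0.0
        productive_mrna = MRNA_PER_STEP * (1 - poison)
        protein += productive_mrna - DEGRADATION * protein
    return protein


print(round(simulate(200)))                         # hovers near THRESHOLD
print(round(simulate(200, has_poison_exon=False)))  # 400
```

With the feedback in place, the protein level hovers around THRESHOLD. Setting has_poison_exon=False mimics deleting the UCE: the level then climbs toward the much higher value set by production and degradation alone (20 / 0.05 = 400 in this toy).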
In sum, the UCE in the Tra2b gene’s first intron acts as a poison exon that limits the production of the Tra2β protein.
A precise intervention
A protein named Cre can recognise certain short DNA sequences, called loxP sites, and bind to them. When two such sites flank a stretch of DNA, Cre cuts out the DNA located between them.
The researchers inserted these short sequences into the first intron of the Tra2b gene, one on either side of the UCE. Next, they engineered mice to express Cre only in the sperm-producing cells of the testes. These cells thus lost the poison exon and were unable to limit their production of the Tra2β protein, which led to their death.
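Cre-mediated excision can be sketched as a simple string operation. The 34-base-pair sequence below is the standard loxP site that Cre recognises. When two copies point the same way, Cre removes the DNA between them and leaves one copy behind. The function cre_excise and the stand-in ‘UCE’ made of N’s are hypothetical and for illustration only.

```python
# Standard 34-bp loxP site recognised by Cre.
LOXP = "ATAACTTCGTATAGCATACATTATACGAAGTTAT"


def cre_excise(dna: str, site: str = LOXP) -> str:
    """Cut out the DNA between the first two same-orientation sites, leaving one site."""
    first = dna.find(site)
    if first == -1:
        return dna  # no site: nothing for Cre to act on
    second = dna.find(site, first + len(site))
    if second == -1:
        return dna  # a single site is not enough for excision
    return dna[:first] + site + dna[second + len(site):]


# Hypothetical intron: a UCE stand-in (20 N's) flanked by two loxP sites.
intron = "GTAAGT" + LOXP + "N" * 20 + LOXP + "TTTCAG"
print(cre_excise(intron) == "GTAAGT" + LOXP + "TTTCAG")  # True
```

In the experiment, this cut happens only in cells that make Cre, so the UCE stays intact in the rest of the mouse.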
Admittedly, cutting out a UCE entirely is not the same as changing one or a few of its base-pairs. At present, we know of no biological function that depends on the exact sequence of a 200-base-pair stretch of DNA. If altering even a single base-pair of the UCE were shown to prevent its inclusion as a poison exon in the Tra2b splice variant and to render the mouse infertile, an 80-million-year-old mystery would be cracked.
The new study represents a big step towards achieving this goal.
D.P. Kasbekar is a retired scientist.
Published – February 26, 2025 05:30 am IST