New tRNA genes can emerge through multiplication by transposable elements.
Transfer RNAs (tRNAs) are an underappreciated group of molecules. Apart from transporting amino acids to the protein-synthesizing ribosomes, they fulfil a range of other important functions in the cell (nicely summarized in this review). During a six-month postdoc at the Karolinska Institute (Stockholm, Sweden), I entered the wonderful world of tRNA genes under the expert guidance of Claudia Kutter. The tRNA team further consisted of PhD student Keyi Geng and transposon guru Alexander Suh. Taking advantage of the increasing number of available bird genomes, we explored the evolutionary dynamics of tRNA genes in our feathered friends. And we were in for some surprises! But before we dive into the results, let’s refresh some basic molecular biology.
The Genetic Code
DNA contains the instructions to make proteins. But the DNA alphabet consists of four letters (A, T, G and C), while the language of the proteins has twenty letters (i.e. amino acids). How does the cell translate DNA language into protein language? The Russian physicist George Gamow looked at the problem through a mathematical lens. How can you combine four letters so that each of the twenty amino acids has a unique DNA code? A code based on two DNA letters does not work, because it only yields sixteen combinations. Not enough for the twenty amino acids. But a code with three DNA letters is possible. That code results in 64 different letter combinations, more than enough for twenty amino acids.
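Gamow's arithmetic is easy to check for yourself. The snippet below is a minimal Python sketch that lists every possible DNA word of one, two and three letters and compares the count with the twenty amino acids. The function name `words` is made up for this illustration.

```python
from itertools import product

DNA_LETTERS = "ATGC"
N_AMINO_ACIDS = 20


def words(length):
    """Return every possible DNA word of the given length."""
    return ["".join(letters) for letters in product(DNA_LETTERS, repeat=length)]


for length in (1, 2, 3):
    n_words = len(words(length))
    verdict = "enough" if n_words >= N_AMINO_ACIDS else "not enough"
    print(f"{length}-letter code: {n_words} words, {verdict} for {N_AMINO_ACIDS} amino acids")
```

The number of words is simply four to the power of the word length, so three letters is the shortest code that can cover all twenty amino acids.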
However, this solution led to another problem. Which combination of three DNA letters codes for which amino acid? In 1954 the American biologist James Watson founded the RNA Tie Club with Gamow to crack this genetic code. This club had twenty members (one for each amino acid) and four honorary members. Each member was given a woolen tie with the double helix embroidered on it. One of its members, South African biologist Sydney Brenner, suggested the term “codon” to refer to a combination of three DNA letters. Sixty-four codons and twenty amino acids. Who could solve this puzzle?
The first codons were deciphered with simple experiments. Scientists made long strands containing a single DNA letter in the lab. These strands were then translated by a cell into a chain of amino acids. When a strand of A’s was used, a long chain of the amino acid lysine was formed. And if a strand of C’s was translated, the chain consisted only of the amino acid proline. The conclusion was crystal clear: AAA codes for lysine and CCC codes for proline. With further experiments, researchers deciphered the entire genetic code. The code even turned out to contain start and stop signs. The codon ATG, which codes for methionine, marks the starting point of a protein, while the codons TAA, TGA and TAG signal the end.
The genetic code. Notice that the DNA-letter T (thymine) has been replaced with the RNA-letter U (uracil).
Redundancy and Wobbling
With sixty-four codons and twenty amino acids, it is obvious that multiple codons code for the same amino acid. For example, AAT and AAC both refer to asparagine. This phenomenon, known as the redundancy of the genetic code, provides the cell with some protection against mutations. A mutation in a codon can lead to a different amino acid or stop character in the protein chain. Suppose the codon AAA (which codes for lysine) mutates into TAA (a stop codon). This mutation puts a stop codon in the wrong place in the protein and the production of that protein is stopped prematurely with possible negative consequences for the cell. A detailed look at the genetic code shows that many mutations, however, have no harmful effect. Take the amino acid alanine, which is encoded by the codons GCT, GCC, GCA and GCG. A mutation at the third position of these codons, for example from GCT to GCC, does not lead to a change in amino acid, as both GCT and GCC refer to alanine. After this mutation, the cell still produces the same protein.
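The examples in this section can be replayed with a few lines of Python. This is a minimal sketch: it builds the standard genetic code (in DNA letters, with `*` marking a stop codon) from a compact string and then labels a codon change as synonymous, nonsense or missense. The names `CODON_TABLE` and `classify_mutation` are made up for this illustration.

```python
from itertools import product

# Standard genetic code, written compactly: codons are enumerated in the
# order TTT, TTC, TTA, TTG, TCT, ... and matched to one-letter amino acids.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_mutation(before, after):
    """Label the effect of changing one codon into another."""
    old, new = CODON_TABLE[before], CODON_TABLE[after]
    if old == new:
        return "synonymous"  # same amino acid, same protein
    if new == "*":
        return "nonsense"    # premature stop codon
    return "missense"        # any other change (lumped together for brevity)


print(classify_mutation("AAA", "TAA"))  # lysine to stop
print(classify_mutation("GCT", "GCC"))  # alanine to alanine
print(sorted(c for c, aa in CODON_TABLE.items() if aa == "A"))  # all alanine codons
```

The first call reports the harmful lysine-to-stop mutation, the second the harmless change within the alanine codons.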
Each organism has a certain number of tRNAs per codon available. For example, humans have 44 tRNAs for lysine: 24 for the codon AAG and 20 for the codon AAA. Certain tRNAs are also missing in the human genome. You will not encounter any tRNAs for GGT (glycine), CGC (arginine) or CAT (histidine). Fortunately, there are other tRNAs that provide these amino acids. The absence of certain isoacceptors is explained by wobble base pairing, in which the first anticodon position (position 34, which pairs with the third codon position) can deviate from the standard Watson-Crick base pairing, allowing for the translation of multiple synonymous codons by a single tRNA. In addition, modifications at certain positions in the anticodon loop can improve translational efficiency. For example, in the G34 anticodon sparing strategy, an enzyme converts adenine-34 to inosine-34 in specific isoacceptors. This conversion enables position 34 to wobble with adenine, cytosine and uracil. One tRNA molecule can thus be used for multiple codons in the mRNA.
The wobble position in the anticodon (which pairs with the third codon position) can wobble, allowing it to bind with both C and U on the mRNA molecule. Hence, multiple codons can be translated by a single tRNA.
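To see how one tRNA can serve several codons, here is a minimal Python sketch of wobble pairing. It assumes the simplified textbook pairing rules for position 34 (real pairing also depends on other base modifications), uses RNA letters (U instead of T), and writes the anticodon in the 5'-to-3' direction so that position 34 comes first. The names `WOBBLE` and `codons_read` are made up for this illustration.

```python
# Strict Watson-Crick partners, used for anticodon positions 35 and 36.
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}

# Simplified wobble rules for anticodon position 34 (I = inosine).
# Assumption: classic textbook pairing, ignoring other modifications.
WOBBLE = {"A": "U", "C": "G", "G": "CU", "U": "AG", "I": "UCA"}


def codons_read(anticodon):
    """List the mRNA codons (5'->3') that an anticodon (5'->3') can read."""
    pos34, pos35, pos36 = anticodon
    # Pairing is antiparallel: position 36 meets the first codon base,
    # position 35 the second, and the wobble position 34 the third.
    first = COMPLEMENT[pos36]
    second = COMPLEMENT[pos35]
    return [first + second + third for third in WOBBLE[pos34]]


print(codons_read("GUU"))  # asparagine tRNA with G at position 34
print(codons_read("IGC"))  # alanine tRNA with inosine at position 34
print(codons_read("CUU"))  # lysine tRNA with C at position 34
```

With inosine at position 34, a single alanine tRNA covers the codons ending in U, C and A, which is why a genome can do without a separate tRNA gene for each of them.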
Genome Size Reduction
Now that we have refreshed our knowledge about the genetic code, we can finally dive into the results of the paper, which appeared in Genome Biology and Evolution. Comparing the total number of tRNA genes between avian genomes and other vertebrates revealed a striking pattern. On average, birds have about 169 tRNA genes, which is significantly fewer than reptiles (466), amphibians (1229), mammals (579) and fish (813). This reduction could be a by-product of an evolutionary trend towards smaller genomes in birds through deletions of non-coding DNA. Interestingly, the tRNA gene repertoire in birds still contains the tRNA genes needed for all twenty amino acids. Moreover, when we investigated the expression of tRNA genes in chicken (Gallus gallus) and zebra finch (Taeniopygia guttata), we found that all twenty amino acids were represented by at least one tRNA gene. Hence, the reduction in tRNA gene number and complexity in birds occurred within the functional constraints on efficient protein translation mechanisms.
An overview of the total number of tRNA genes in the genomes of birds (brown), reptiles (green), mammals (orange), amphibians (yellow), fish (blue) and yeast (black). From: Ottenburghs et al. (2021) Genome Biology and Evolution.
At the very start of the project, we noticed that some bird species had an overrepresentation of certain tRNA genes when we did not apply a quality filter. Why would the Dalmatian pelican (Pelecanus crispus) need almost 600 tRNA genes for isoleucine? And what does the bar-tailed trogon (Apaloderma vittatum) do with 2750 valine tRNA genes? Close inspection of these tRNA genes uncovered the presence of transposable elements: SINEs (short interspersed elements) to be precise. These selfish and highly active genetic elements incorporate themselves into genomes by a copy-paste mechanism, continuously giving rise to new genomic loci. If a tRNA gene is associated with such a transposable element, it can quickly increase in copy number. Our detailed analyses of these SINEs pointed to several known elements, such as TguSINE1 in Eupasseres (i.e. all passerines except the New Zealand wrens) and ManaSINE1 in manakins. But we also discovered some new SINEs, including PeleSINE1 in the Dalmatian pelican and ApalSINE1 in the bar-tailed trogon.
And now for my personal favorite fact in our paper: the evolutionary dynamics of transposable elements in the golden-collared manakin (Manacus vitellinus) genome. This small songbird houses two SINEs that have been active at different times during its evolution: TguSINE1 was jumping around about 30 million years ago, while the activity of ManaSINE1 is more recent (about 5 million years ago). When these transposable elements become inactivated by the cell, they get “stuck” in a genomic location and start accumulating mutations. Because TguSINE1 was active millions of years before ManaSINE1, we expected that copies of this transposable element would have accumulated more mutations.
We could test this prediction by using the Cove score, which is calculated by the tRNAscan-SE program. This score is based on the ability of a given sequence to form tRNA stem-loop structures and the presence of particular promoter and terminator sequences. Active tRNA genes will have high Cove scores, whereas inactive tRNA genes might be decaying into pseudogenes, leading to lower Cove scores. So, our prediction was straightforward: TguSINEs should have lower Cove scores compared to ManaSINEs. And that is exactly what we found (see figure below)! Isn’t it great when you find support for a hypothesis?
The evolution of transposable elements in the golden-collared manakin genome. Based on past activity patterns, we expected that TguSINEs would have lower quality scores compared to ManaSINEs. And that is exactly what we see. From: Ottenburghs et al. (2021) Genome Biology and Evolution.
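For readers who want to try this kind of comparison on their own tRNAscan-SE output, here is a minimal Python sketch. It is not the analysis from the paper: the scores below are invented toy numbers, and the helper `fraction_lower` is a made-up name. The sketch compares the medians of two groups of scores and counts in what fraction of all possible pairs the older element scores lower than the younger one.

```python
from statistics import median

# Toy Cove scores, invented purely for illustration (NOT data from the paper).
tgu_scores = [22.1, 25.4, 30.0, 27.3]   # copies of the older SINE
mana_scores = [41.8, 38.2, 29.5, 44.0]  # copies of the younger SINE


def fraction_lower(older, younger):
    """Fraction of (older, younger) pairs in which the older copy scores lower.

    Ties count as half. A value of 0.5 means no difference between groups,
    a value near 1.0 means the older copies score consistently lower.
    """
    pairs = [(o, y) for o in older for y in younger]
    wins = sum(1.0 if o < y else 0.5 if o == y else 0.0 for o, y in pairs)
    return wins / len(pairs)


print("median older SINE:  ", median(tgu_scores))
print("median younger SINE:", median(mana_scores))
print("fraction of pairs with older < younger:", fraction_lower(tgu_scores, mana_scores))
```

The pairwise fraction is the quantity behind rank-based tests such as the Mann-Whitney U test, which is a natural choice when scores are not normally distributed.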
Based on the findings of our work (and there is much more that I did not cover in this blog post), we formulated a model for the coevolution of tRNA genes and transposable elements. Some parts of this model are strongly supported by our results, while others still need to be investigated further.
The model consists of three phases. First, a TE recruits a copy of a tRNA gene for its own mobilization and increases its copy number in the genome. Second, the TE is silenced by epigenetic control mechanisms. Third, some TE-associated tRNA genes decay into pseudogenes, while others remain transcriptionally active and become coopted for their original tRNA function.
The last phase in the model – transposable elements that become active tRNA genes – is difficult to prove because the activity of these genes can be due to functioning as an actual tRNA gene or simply as an active TE. In zebra finch, we noticed that the correlation between codon usage in proteins and the available tRNA genes improved if we included transposable elements. This suggests that the activity of transposable elements can help shape the tRNA gene repertoire of an individual and potentially improve the efficiency of protein translation.
The correlation between codon usage and tRNA genes in the zebra finch genome improved if we included transposable elements (notice how rho increases and the p-value decreases). From: Ottenburghs et al. (2021) Genome Biology and Evolution.
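The rho in the figure is a Spearman rank correlation. Below is a minimal Python sketch of how such a value is computed, written out by hand so that each step is visible. All numbers are invented toy values, not the zebra finch data, and the function names are made up for this illustration.

```python
def average_ranks(values):
    """Rank values from 1 upwards; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman correlation: the Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mean_x, mean_y = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Toy values, invented for illustration: usage of five codons and the
# number of matching tRNA genes, counted without and with TE-derived copies.
codon_usage = [0.5, 0.9, 1.4, 2.1, 3.0]
genes_without_te = [2, 1, 4, 3, 5]
genes_with_te = [1, 2, 4, 3, 6]

print("rho without TEs:", spearman_rho(codon_usage, genes_without_te))
print("rho with TEs:   ", spearman_rho(codon_usage, genes_with_te))
```

Because only the ranks matter, the correlation is insensitive to how skewed the gene counts are, which helps when a few tRNA families have been heavily multiplied.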
An Extra Acknowledgement
I would like to end this long blog post with a personal note. The story behind this paper started back in January 2016 when I attended the Plant and Animal Genomics (PAG) conference in San Diego. I was invited to give a talk at the “Avian Genomics” workshop (organized by Robert Kraus) where I met another invited speaker: Alexander Suh. Fast-forward to the spring of 2017: I was getting ready to move to Uppsala to start a postdoc, studying the genomics of hybridization in geese. I had already signed the contract and secured an apartment when my postdoc supervisor informed me that the sequencing of the samples was delayed. He wanted me to start a few months later so that I could work on the data during my entire postdoc. I tried to convince him that I could work on other projects while I waited for the goose data, but he would not give in. To fill the unexpected gap in my schedule, I contacted several people who might need help with a small project. Alex replied quickly and told me about a potential collaboration with Claudia Kutter in Stockholm. We discussed the project over Skype and a few weeks later I was on a plane to Sweden. I thoroughly enjoyed my six months in Claudia’s group and I am happy that we managed to turn my work into a nice paper. But more importantly, I made countless new friends during my Swedish adventures.
All invited speakers at the Avian Genomics workshop in San Diego where I (third from the left) met Alexander Suh (second from the left). The workshop was organized by Robert Kraus (right).
Ottenburghs, J., Geng, K., Suh, A. & Kutter, C. (2021) Genome size reduction and transposon activity impact tRNA gene diversity while ensuring translational stability in birds. Genome Biology and Evolution. Early Online. https://doi.org/10.1093/gbe/evab016
Featured image © Claudia Kutter.