When it was first discovered, the coelacanth caused a lot of excitement. It was a living example of a group of fish that was thought to only exist as fossils. And not just any group of fish. With their long, stalk-like fins, coelacanths and their kin are thought to include the ancestors of all vertebrates that aren’t fish—the tetrapods, or vertebrates with four limbs. Meaning, among a lot of other things, us.
Since then, however, evidence has piled up that we’re more closely related to lungfish, which live in freshwater and are found in Africa, Australia, and South America. But lungfish are a bit weird. The African and South American species have seen the limb-like fins of their ancestors reduced to thin, floppy strands. And getting some perspective on their evolutionary history has proven difficult because they have the largest genomes known in animals, with the South American lungfish genome containing over 90 billion base pairs. That’s 30 times the amount of DNA we have.
But new sequencing technology has made tackling that sort of challenge manageable, and an international collaboration has now completed the largest genome ever sequenced, one where all but one chromosome carry more DNA than is found in the entire human genome. The work points to a history where the South American lungfish has been adding 3 billion extra bases of DNA every 10 million years for the last 200 million years, all without adding a significant number of new genes. Instead, it seems to have lost the ability to keep junk DNA in check.
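The figures quoted so far can be checked with a few lines of arithmetic. The sketch below is only a back-of-the-envelope illustration: the 3-billion-base human genome size is an assumed round number (it is what "30 times" implies), and the function and constant names are hypothetical.

```python
# Figures quoted in the article, plus one assumed round number.
HUMAN_GENOME_BP = 3_000_000_000       # assumption: roughly 3 billion base pairs
LUNGFISH_GENOME_BP = 90_000_000_000   # "over 90 billion base pairs"
GROWTH_BP_PER_STEP = 3_000_000_000    # 3 billion extra bases...
STEP_YEARS = 10_000_000               # ...every 10 million years...
SPAN_YEARS = 200_000_000              # ...for the last 200 million years


def size_ratio(genome_bp, reference_bp=HUMAN_GENOME_BP):
    """How many times larger one genome is than a reference genome."""
    return genome_bp / reference_bp


def dna_added(bp_per_step, step_years, span_years):
    """Total bases gained at a steady rate over the whole time span."""
    return bp_per_step * (span_years // step_years)


ratio = size_ratio(LUNGFISH_GENOME_BP)
added = dna_added(GROWTH_BP_PER_STEP, STEP_YEARS, SPAN_YEARS)

print(f"Lungfish genome is about {ratio:.0f}x the human genome")
print(f"Bases added at the quoted rate: {added:,}")
print(f"Share of today's genome that growth would explain: {added / LUNGFISH_GENOME_BP:.0%}")
```

Taken at face value, the quoted growth rate would account for roughly two-thirds of the present-day genome, which fits the picture of a genome that expanded mostly by accumulating extra DNA rather than new genes.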
Going long
The work was enabled by a technology generically termed “long-read sequencing.” Most earlier genomes were assembled from short reads, typically around 100–200 base pairs long. The secret was to do enough sequencing that, on average, every base in the genome would be sequenced multiple times. Given that, a cleverly designed computer program could figure out where two bits of sequence overlapped and register that as a single, longer piece of sequence, repeating the process until the computer spit out long strings of contiguous bases.
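The overlap-and-merge idea can be sketched in a few lines. What follows is a toy greedy assembler, not the software any sequencing project actually uses: the function names, the 14-base "genome," and the minimum overlap of 3 are all made up for illustration, and real assemblers also have to cope with sequencing errors, both DNA strands, and billions of reads.

```python
def overlap_length(left, right, min_overlap):
    """Length of the longest suffix of `left` that matches a prefix of `right`."""
    for size in range(min(len(left), len(right)), min_overlap - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def greedy_assemble(reads, min_overlap=3):
    """Repeatedly merge the pair of sequences with the longest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_size, best_i, best_j = 0, None, None
        for i, left in enumerate(contigs):
            for j, right in enumerate(contigs):
                if i != j:
                    size = overlap_length(left, right, min_overlap)
                    if size > best_size:
                        best_size, best_i, best_j = size, i, j
        if best_size == 0:
            return contigs  # nothing left that overlaps
        merged = contigs[best_i] + contigs[best_j][best_size:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)


def mean_coverage(reads, genome_length):
    """Average number of times each base is sequenced."""
    return sum(len(read) for read in reads) / genome_length


# Hypothetical toy genome, cut into overlapping 6-base reads every 2 bases.
genome = "ATGCGTACGTTAGC"
reads = [genome[i:i + 6] for i in range(0, len(genome) - 5, 2)]

print(reads)
print(f"coverage: {mean_coverage(reads, len(genome)):.2f}x")
print(greedy_assemble(reads))
```

With this repeat-free toy sequence, the five reads merge back into the original string. The approach breaks down once the same stretch of bases appears in more than one place, since an overlap then no longer points to a single location.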
The problem is that most non-microbial species have stretches of repeated sequence (think hundreds of copies of the bases G and A in a row) that run longer than a few hundred bases, along with nearly identical sequences that show up at multiple locations in the genome. These are impossible to match to a unique location, so the output of the genome assembly software has lots of gaps of unknown length and sequence.
This makes genomes like the lungfish's extremely difficult to assemble, since they are filled with non-functional “junk” DNA, most of which is repetitive. For them, the software tends to produce a genome that’s more gap than sequence.
Long-read technology gets around that by doing exactly what its name implies. Rather than being limited to fragments of 200 bases or so, it can generate sequences that are thousands of base pairs long, easily covering an entire repeat that would otherwise have created a gap. One early version of long-read technology involved stuffing long DNA molecules through pores and watching for changes in the electrical current through the pore as different bases passed through it. Another had a DNA-copying enzyme make a duplicate of a long strand while the instrument watched for fluorescence changes as different bases were added. These early versions tended to be a bit error-prone but have since been improved, and several newer competing technologies are now on the market.
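Why read length matters can be shown with a toy example. The sketch below builds a made-up 64-base sequence containing two copies of a 20-base GA repeat, then asks what fraction of possible reads of a given length would match more than one position. The sequence, the function name, and the read lengths are all hypothetical, chosen only to make the effect visible at a small scale.

```python
from collections import Counter


def ambiguous_fraction(genome, read_length):
    """Fraction of possible reads whose sequence occurs at more than one position."""
    windows = [
        genome[i:i + read_length]
        for i in range(len(genome) - read_length + 1)
    ]
    counts = Counter(windows)
    ambiguous = sum(1 for window in windows if counts[window] > 1)
    return ambiguous / len(windows)


# Hypothetical toy genome: unique flanking sequence around two copies
# of the same 20-base repeat.
REPEAT = "GA" * 10
GENOME = "ACCTGTCA" + REPEAT + "TTCGCATG" + REPEAT + "CGGTAACT"

for length in (6, 30):
    share = ambiguous_fraction(GENOME, length)
    print(f"{length}-base reads that cannot be placed uniquely: {share:.0%}")
```

In this toy case, the short reads that fall inside the repeat could have come from several places, while every 30-base read reaches past the repeat into unique flanking sequence and so can be placed in exactly one spot. Real genomes have repeats that are thousands of bases long, which is why reads of a comparable length are needed.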
Back in 2021, researchers used this technology to complete the genome of the Australian lungfish—the one that maintains the limb-like fins of the ancestors that gave rise to tetrapods. Now they’re back with the genomes from African and South American species. These species seem to have gone their separate ways during the breakup of the supercontinent Gondwana, a process that started nearly 200 million years ago. And having the genomes of all three should give us some perspective on the features that are common to all lungfish species, and thus are more likely to have been shared with the distant ancestors that gave rise to tetrapods.