All insights
EngineeringJune 30, 20265 min read

Using AI to mine your genome for hidden virus-like elements

An open-source tool for detecting LINE-1 insertions in whole-genome sequencing data

In the 1940s, Barbara McClintock discovered jumping genes - virus-like genetic elements that activate in response to stress and copy themselves across the genome. She did this through years of painstaking research under a microscope. It took over 20 years for her work to be widely accepted. She was awarded the Nobel Prize in 1983

Barbara McClintock in 1947

Jumping genes, now known as retrotransposons, helped us evolve by reshaping our genomes - but this came at a cost. They are normally silenced in adults, but activate under stress in diseased cells. They contribute to cancer, neurodegeneration, autoimmunity, and neurodevelopmental disorders

They do this by damaging DNA. They also trigger an immune response, as our cells think they are infected by a virus. In fact, some long-lived animal species have completely inactivated retrotransposons through mutation. Crucially, there are ways to slow or stop retrotransposons. The challenge is identifying which patients to treat, and when. For that, we need to collect and analyze genomic sequences from diseased tissue

It is hard to detect retrotransposons in genomic data, even today. Using AI, we were able to develop a new tool that makes this process easier. We applied it to Illumina whole-genome sequencing data. The tool can mine the genome for candidate insertions and even detects ones that were not reported in samples sequenced with multiple technologies by large academic studies

A few years ago, prototyping this might have taken months. With AI, we had a working version in two weeks

Screenshots from random sections of chromosome 22 in a healthy individual. Grey bars represent unmutated DNA, and colors indicate either a mutation or sequencing errors. The final screenshot shows a barcode-like signature indicating a retrotransposon insertion - one that had not previously been reported in this individual in published studies using the same data

AI is very human-like in its limitations, but super-human in its strengths. It took shortcuts, made assumptions, introduced bugs, caused regressions, and used inefficient or bad practices. Yet the final product was far beyond what we could have achieved alone

Knowledge and experience help guide the AI in the right direction. You can't one-shot new science (yet) - this took thousands of prompts, multiple agents, and 40 thousand lines of code

The demand for code only increases as it gets easier. Jevons' Paradox is real. And vibe-coding is more art than science. We look forward to using the next models as soon as they are available

This tool is one step towards developing therapeutics for retrotransposons, and it is open sourced on GitHub. It was built primarily using GPT-5 Codex, Cursor, Amazon Web Services, JupyterLab, and specialized bio tools.

Have a question about this piece, or a collaboration in mind? Get in touch

Further reading

All insights