Using AI to mine your genome for hidden virus-like elements
An open-source tool for detecting LINE-1 insertions in whole-genome sequencing data
In the 1940s, Barbara McClintock discovered jumping genes: virus-like genetic elements that activate in response to stress and copy themselves across the genome. She did this through years of painstaking work at the microscope. It took more than 20 years for her work to be widely accepted, and she was awarded the Nobel Prize in 1983.
Retrotransposons, the class of jumping genes that includes LINE-1, helped us evolve by reshaping our genomes, but this came at a cost. They are normally silenced in adults, but activate under stress in diseased cells. They contribute to cancer, neurodegeneration, autoimmunity, and neurodevelopmental disorders.
They do this by damaging DNA and by triggering an immune response, because our cells react as if they were infected by a virus. Notably, some long-lived animal species have completely inactivated retrotransposons through mutation. Crucially, there are ways to slow or stop retrotransposons. The challenge is identifying which patients to treat, and when. For that, we need to collect and analyze genomic sequences from diseased tissue.
Even today, it is hard to detect retrotransposons in genomic data. Using AI, we developed a new tool that makes this process easier, and we applied it to Illumina whole-genome sequencing data. The tool mines the genome for candidate insertions, and it even finds insertions that large academic studies did not report in samples they had sequenced with multiple technologies.
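This post does not describe how the tool works internally, so the following is only an illustrative sketch of one widely used idea in insertion calling, not the tool's actual method. Short reads that span an insertion breakpoint tend to align only partially, leaving a soft-clipped tail, and several reads clipped at nearly the same reference position hint at a candidate site. The function names, the thresholds (`min_clip`, `window`, `min_support`), and the toy reads below are all hypothetical; a real caller would also check that the clipped sequence matches LINE-1 and would use discordant read pairs.

```python
import re
from collections import defaultdict

# One CIGAR operation: a length followed by a SAM operation code.
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def clip_breakpoints(pos, cigar, min_clip=20):
    """Return 1-based reference positions where a read is soft-clipped.

    `pos` is the leftmost aligned reference position (as in the SAM POS field).
    Hard clips are dropped first so that a soft clip next to them is still seen.
    """
    ops = [(int(n), op) for n, op in CIGAR_RE.findall(cigar) if op != "H"]
    if len(ops) < 2:
        return []  # fully aligned (or unusable) read: no clip to report
    points = []
    if ops[0][1] == "S" and ops[0][0] >= min_clip:
        points.append(pos)  # left clip: breakpoint at the first aligned base
    # Operations that consume reference bases: M, D, N, =, X.
    ref_len = sum(n for n, op in ops if op in "MDN=X")
    if ops[-1][1] == "S" and ops[-1][0] >= min_clip:
        points.append(pos + ref_len - 1)  # right clip: last aligned base
    return points


def candidate_sites(reads, window=10, min_support=3):
    """Cluster clip breakpoints that lie within `window` bp of each other.

    `reads` is an iterable of (chrom, pos, cigar) tuples. Returns a list of
    (chrom, start, end, n_supporting_breakpoints) for clusters that reach
    `min_support`.
    """
    by_chrom = defaultdict(list)
    for chrom, pos, cigar in reads:
        by_chrom[chrom].extend(clip_breakpoints(pos, cigar))

    sites = []
    for chrom, points in sorted(by_chrom.items()):
        points.sort()
        cluster = []
        for p in points:
            # A gap larger than `window` closes the current cluster.
            if cluster and p - cluster[-1] > window:
                if len(cluster) >= min_support:
                    sites.append((chrom, cluster[0], cluster[-1], len(cluster)))
                cluster = []
            cluster.append(p)
        if len(cluster) >= min_support:
            sites.append((chrom, cluster[0], cluster[-1], len(cluster)))
    return sites


# Toy input (hypothetical reads, not real data).
toy_reads = [
    ("chr1", 1000, "60M40S"),  # right-clipped, last aligned base at 1059
    ("chr1", 1005, "55M45S"),  # right-clipped, last aligned base at 1059
    ("chr1", 1060, "30S70M"),  # left-clipped, first aligned base at 1060
    ("chr1", 5000, "100M"),    # fully aligned, ignored
    ("chr2", 200, "25S75M"),   # a lone clipped read, below min_support
]
print(candidate_sites(toy_reads))
```

In practice the (chrom, pos, cigar) tuples would come from an aligned BAM or CRAM file rather than a hand-written list, and the thresholds would need tuning to sequencing depth.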
A few years ago, prototyping this might have taken months. With AI, we had a working version in two weeks.
AI is very human-like in its limitations, but superhuman in its strengths. It took shortcuts, made unchecked assumptions, introduced bugs, caused regressions, and fell back on inefficient or poor practices. Yet the final product was far beyond what we could have achieved alone.
Knowledge and experience help guide the AI in the right direction. You can't one-shot new science (yet): this took thousands of prompts, multiple agents, and 40,000 lines of code.
The demand for code only increases as code gets easier to write. Jevons' Paradox is real. And vibe-coding is more art than science. We look forward to using the next models as soon as they are available.
This tool is one step towards developing therapeutics that target retrotransposons, and it is available as open source on GitHub. It was built primarily using GPT-5 Codex, Cursor, Amazon Web Services, JupyterLab, and specialized bioinformatics tools.
Have a question about this piece, or a collaboration in mind? Get in touch
