Google open-sources AI tool that makes sense of your genome

It has been almost 15 years since scientists first sequenced the human genome, yet making sense of the enormous amount of data that encodes human life remains a formidable challenge. But it is also precisely the sort of problem that machine learning excels at.

Google created DeepVariant, a tool that uses artificial intelligence techniques to identify genetic variants in sequencing data more accurately.

On Monday, the Google Brain team announced an open-source version of DeepVariant to encourage its use and collaboration. DeepVariant was developed by the Google Brain team, which pairs open-ended machine learning research with systems engineering and Google-scale computing resources, together with Verily, another Alphabet subsidiary focused on the life sciences.

“To further this goal, we partnered with Google Cloud Platform to deploy DeepVariant workflows on GCP, available today, in configurations optimized for low-cost and fast turnarounds using scalable GCP technologies like the Pipelines API,” they wrote in a Google blog post. “This paired set of releases provides a smooth ramp for users to explore and evaluate the capabilities of DeepVariant in their current compute environment while providing a cloud-based solution to satisfy the needs of even the largest genomics datasets.”

DeepVariant can automatically identify single-base-pair mutations, small insertions, and deletions in sequencing data. High-throughput sequencing, in wide use since the 2000s, provides only a limited, error-prone snapshot of a full genome.
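To make these three categories concrete, here is a minimal Python sketch (not DeepVariant code) that labels a variant from its reference and alternate alleles. It assumes VCF-style notation, in which an insertion or deletion is written with a shared leading anchor base; the function name classify_variant is hypothetical.

```python
def classify_variant(ref: str, alt: str) -> str:
    """Label a variant by comparing its reference and alternate alleles.

    Assumes VCF-style alleles where indels share a leading anchor base,
    e.g. ref="A", alt="AT" is a one-base insertion of "T".
    """
    # One base replaced by a different base: a single-base-pair change.
    if len(ref) == 1 and len(alt) == 1:
        return "SNP" if ref != alt else "no change"
    # Alternate allele extends the reference: bases were inserted.
    if len(alt) > len(ref) and alt.startswith(ref):
        return "insertion"
    # Reference extends the alternate allele: bases were deleted.
    if len(ref) > len(alt) and ref.startswith(alt):
        return "deletion"
    # Anything else (e.g. multi-base substitutions) is lumped together here.
    return "complex"
```

For example, classify_variant("A", "AT") labels a one-base insertion, while classify_variant("ATG", "A") labels a two-base deletion.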

Tools such as GATK, VarDict, and FreeBayes have been used to interpret these readouts. However, these programs rely on simpler statistical and machine-learning approaches, identifying mutations by attempting to rule out read errors.
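As a rough illustration of what "ruling out read errors" means, the toy Python sketch below asks how likely it is that all the mismatching reads at one position are just sequencing errors, using a simple binomial model. This is not the algorithm GATK, VarDict, or FreeBayes actually implement; the uniform per-read error rate, the significance threshold, and the function names are assumptions chosen for illustration.

```python
from math import comb


def error_only_p_value(depth: int, alt_reads: int, error_rate: float) -> float:
    """Probability of seeing at least alt_reads mismatches among depth reads
    if every mismatch were a sequencing error (binomial upper tail).

    Simplification: each read errs independently with the same error_rate.
    """
    return sum(
        comb(depth, k) * error_rate**k * (1 - error_rate) ** (depth - k)
        for k in range(alt_reads, depth + 1)
    )


def looks_like_variant(depth: int, alt_reads: int,
                       error_rate: float = 0.01, alpha: float = 1e-6) -> bool:
    """Call a variant when read errors alone are an implausible explanation."""
    return error_only_p_value(depth, alt_reads, error_rate) < alpha
```

With 30 reads and a 1% error rate, a single mismatching read is easily explained by error, whereas 15 mismatching reads are not; the hard cases the article alludes to lie between these extremes and in error-prone regions where a single fixed error rate is a poor model.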

“One of the challenges is in difficult parts of the genome, where each of the [tools] has strengths and weaknesses,” said Brad Chapman, a research scientist at Harvard’s School of Public Health who tested an early version of DeepVariant.

The Verily team collected high-throughput reads and fully sequenced genomes from the Genome in a Bottle (GIAB) project, fed the data to a deep-learning system, and painstakingly tweaked the model's parameters until it learned to interpret sequenced data with a high level of accuracy.

“We started with GIAB reference genomes, for which there is high-quality ground truth (or the closest approximation currently possible). Using multiple replicates of these genomes, we produced tens of millions of training examples in the form of multi-channel tensors encoding the HTS instrument data, and then trained a TensorFlow-based image classification model to identify the true genome sequence from the experimental data produced by the instruments. Although the resulting deep learning model, DeepVariant, had no specialized knowledge about genomics or HTS, within a year it had won the highest SNP accuracy award at the precisionFDA Truth Challenge, outperforming state-of-the-art methods. Since then, we’ve further reduced the error rate by more than 50%,” the Google Brain team said.
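The quote describes turning instrument data into multi-channel tensors that an image classifier can read. The Python sketch below shows one simplified way to build such a tensor from a pileup of aligned reads: each row is the reference or one read, each column a genome position, and each cell a small vector of channels. The five channels (base, base quality, mapping quality, strand, and whether the base matches the reference), their scaling, and the Read type are assumptions for illustration; DeepVariant's real encoding differs in detail, and the classification model itself is not shown.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical numeric value per base; unknown bases (e.g. "N") map to 0.0.
BASE_VALUES = {"A": 0.25, "C": 0.5, "G": 0.75, "T": 1.0}
NUM_CHANNELS = 5  # base, base quality, mapping quality, strand, matches reference
MAX_PHRED = 60    # assumed cap used to scale Phred scores into [0, 1]


@dataclass
class Read:
    start: int             # offset of the read's first base within the window
    bases: str
    base_quals: List[int]  # per-base Phred quality scores
    mapping_qual: int      # Phred mapping quality of the whole read
    reverse_strand: bool


def encode_pileup(reference: str, reads: List[Read],
                  max_reads: int) -> List[List[List[float]]]:
    """Build a (max_reads + 1) x len(reference) x NUM_CHANNELS nested list."""
    width = len(reference)
    # Start with all-zero cells; rows with no read stay zero as padding.
    image = [[[0.0] * NUM_CHANNELS for _ in range(width)]
             for _ in range(max_reads + 1)]

    # Row 0 encodes the reference sequence itself at full confidence.
    for col, base in enumerate(reference):
        image[0][col] = [BASE_VALUES.get(base, 0.0), 1.0, 1.0, 0.0, 1.0]

    # Each following row encodes one read; reads beyond max_reads are dropped.
    for row, read in enumerate(reads[:max_reads], start=1):
        for i, base in enumerate(read.bases):
            col = read.start + i
            if not 0 <= col < width:
                continue  # skip bases that fall outside the window
            image[row][col] = [
                BASE_VALUES.get(base, 0.0),
                min(read.base_quals[i], MAX_PHRED) / MAX_PHRED,
                min(read.mapping_qual, MAX_PHRED) / MAX_PHRED,
                1.0 if read.reverse_strand else 0.0,
                1.0 if base == reference[col] else 0.0,
            ]
    return image
```

For a reference window "ACGT" and one forward-strand read "ACCT", the "matches reference" channel is 0.0 only at the third column, which is exactly the kind of visual pattern a convolutional image classifier can learn to associate with a true variant or a read error.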

“DeepVariant is the first of what we hope will be many contributions that leverage Google’s computing infrastructure and machine learning to both better understand the genome and to provide deep learning-based genomics tools to the community,” Google Brain’s Mark DePristo and Ryan Poplin said.

Google Press Release
