Just joining us? Check out Part I and Part II of the series.
Synthetic biology is often touted as a software-driven and automation-empowered approach to biological research. Buzzwords like artificial intelligence and robotics are commonly used alongside the phrase synthetic biology, leading many to assume synthetic biologists spend their time coding and automating huge experiments. This is not quite the case. While technologies like automation and machine/deep learning play an increasing role in our field, they have not yet come to dominate it.
These 21st-century technologies have helped rapidly accelerate modern R&D by improving the quality and quantity of data. Machine/deep learning algorithms were originally built to simulate human learning, detect patterns, and play complex games. When combined with high-throughput liquid handlers, artificial intelligence has proven useful in advancing molecular biology techniques by complementing the engineering strategies that lie at the core of our field. The perceived value of these technologies within the industry has become so great that graduate programs are now training the next generation of synbio researchers in both biology and coding skills. As part of our continuing blog series on demystifying synthetic biology, we look at how automation and software have advanced synthetic biology R&D, what the current limitations are, and where they make the most impact today.
Data science as a cornerstone
Synthetic biology is built around the Design-Build-Test-Learn (DBTL) cycle. Connecting Design and Learn in the R&D process is arguably the most important part of synthetic biology, as this is where we learn from the data to generate an improved design. It’s no surprise that software to capture and wrangle data, mine databases, and design experiments plays a major role at the junction between the Design and Learn phases.
Computational tools have become essential for today’s biological research, from identifying new protein targets to comparing gene variations. Vast databases of -omics data – collectively containing an organism’s known genes (genome), mRNA (transcriptome), and proteins (proteome) – exist for thousands of organisms and microorganisms. By mining these databases, synbio researchers can start to make predictions. Common prediction targets include the function of an enzyme or group of enzymes in a pathway, the conserved areas of genes likely necessary for their product’s functionality, and regions more amenable to mutation and engineering.
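To make this concrete, here is a minimal sketch of one such prediction – scoring how conserved each position is across a set of aligned homologous enzyme sequences. The input filename, the use of Biopython, and the 90%/50% thresholds are illustrative assumptions (the alignment itself would come from a tool like Clustal Omega or MAFFT); this is a sketch, not a prescribed workflow.

```python
# Minimal sketch: flag conserved vs. variable positions in a set of homologous
# enzyme sequences. Assumes a pre-aligned FASTA file (the filename is
# hypothetical) and Biopython installed.
from collections import Counter

from Bio import AlignIO

alignment = AlignIO.read("enzyme_homologs_aligned.fasta", "fasta")
num_seqs = len(alignment)

for position in range(alignment.get_alignment_length()):
    column = alignment[:, position]                      # residues at this position
    most_common_residue, count = Counter(column).most_common(1)[0]
    conservation = count / num_seqs                      # fraction sharing that residue
    if conservation >= 0.9:
        label = "conserved - likely required for function"
    elif conservation <= 0.5:
        label = "variable - candidate site for engineering"
    else:
        label = "intermediate"
    print(f"position {position + 1}: {most_common_residue} ({conservation:.0%}) {label}")
```

Even a simple score like this hints at which residues a protein cannot do without and which ones leave room for engineering.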
Comparing these sequences alone is not enough to draw insightful conclusions; the genotype data must be related to a meaningful phenotype, that is, the physical effect genetic variations have on the organism and its behavior. Knowing which mutations are associated with increased production of a compound, for example, means we can rationally design a pathway for the organism relatively quickly using related datasets. But this approach is limited by the quality of the data. Robust, high-quality data are essential for drawing insightful conclusions and making biology a little more predictable. Data science underpins synthetic biology, helping iterate organism design through the DBTL cycle.
Automating research with robotics
Automation enables high-throughput research while reducing human error, dramatically scaling the amount of high-quality data that can be produced. Robotics in modern laboratories looks similar to the automation components used in car factories but on a smaller scale. For example, chemical reagents, DNA, buffers, and even cell cultures are often stored as liquids. Specialized liquid handling robots are adept at consistently and accurately preparing hundreds of samples simultaneously, compared to just dozens of samples by manual preparation.
Many of the molecular biology techniques used in the Build phase are amenable to automation. The most common involve performing molecular cloning techniques like PCR, assembling DNA sequencing libraries, and transforming target organisms like E. coli or yeast. Automation also plays a role in the Test phase, when small-scale fermentation and cell culture can be performed and monitored, helping to screen up to hundreds of engineered strains coming from a high-throughput Build step. The increased throughput and consistency afforded by automation help synthetic biologists standardize their research and boost the statistical power of data to help draw meaningful conclusions.
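As a small illustration of what programming the Build phase can look like, the sketch below generates a generic liquid-handler worklist – a simple CSV of source-to-destination transfers – for setting up a 96-well plate of PCR reactions. The labware names, volumes, and CSV columns are illustrative assumptions rather than any particular instrument's worklist format.

```python
# Minimal sketch: generate a generic liquid-handler worklist (CSV) for
# dispensing PCR master mix and DNA templates into a 96-well plate.
# Labware names, volumes, and the CSV layout are illustrative assumptions,
# not a specific vendor's format.
import csv
import string

MASTER_MIX_UL = 18.0   # master mix per reaction
TEMPLATE_UL = 2.0      # DNA template per reaction

# 96-well plate coordinates: rows A-H, columns 1-12
wells = [f"{row}{col}" for row in string.ascii_uppercase[:8] for col in range(1, 13)]

with open("pcr_setup_worklist.csv", "w", newline="") as handle:
    writer = csv.writer(handle)
    writer.writerow(["source_labware", "source_well", "dest_labware", "dest_well", "volume_ul"])
    for well in wells:
        # One shared master-mix reservoir, plus a matching template plate well
        writer.writerow(["reagent_reservoir", "A1", "pcr_plate", well, MASTER_MIX_UL])
        writer.writerow(["template_plate", well, "pcr_plate", well, TEMPLATE_UL])
```

A robot executing a worklist like this prepares all 96 reactions with the same volumes and timing every run, which is exactly the consistency manual pipetting struggles to match.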
However, automation is not a magic bullet for enabling R&D. Every experiment must be tested and optimized, no single machine can perform every step, and not everything can be automated. Programming and optimizing new experimental steps across several devices requires a lot of time and specific skill sets uncommon among biologists. Bottlenecks in the R&D process may occur where certain steps simply cannot be automated.
While automation is not essential for synthetic biology, its benefits of consistency and accuracy have led to widespread adoption across the field. In addition, automated platforms can also capture valuable metadata such as ambient temperature and precise timings of each step, creating a robust data set for each experiment and helping identify deviations.
Artificial intelligence in synthetic biology
Data is more available than ever. Thanks to automation, we have mountains of internally generated biological data, as well as publicly available research and -omics databases. How can we sort through it all to derive meaningful insights? Often, there is so much information that we cannot know exactly where to look, or we end up making predictions based only on the most obvious patterns. So, how do we investigate the non-obvious and get the most from data? This is where artificial intelligence software like machine learning (ML) and deep learning (DL) tools come into play for synthetic biologists, capitalizing on the sheer amount of data now available in the field.
ML and DL algorithms can play a role in the Design phase, for example, by suggesting iterations on previous experimental conditions or proposing beneficial genetic modifications that may improve product yield. These algorithms can point out non-obvious correlations in the data to identify connections between genotype and phenotype that would otherwise escape our notice.
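As a hedged example of what this can look like in code, the sketch below fits a simple model that maps the presence or absence of a few candidate genetic modifications (a toy genotype matrix) to measured product titers, ranks which modifications the model found most predictive, and scores a proposed new design. The feature names, data, and model choice (a scikit-learn random forest) are illustrative assumptions, not a recommended pipeline.

```python
# Minimal sketch: learn genotype-to-phenotype relationships from screening data.
# Each row is an engineered strain; columns are 0/1 flags for candidate genetic
# modifications, and the target is a measured product titer (g/L).
# All names and numbers are made up for illustration.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

modifications = ["promoter_swap", "gene_A_overexpression", "gene_B_knockout", "rbs_variant"]

# Toy genotype matrix (8 strains x 4 modifications) and measured titers.
X = np.array([
    [0, 0, 0, 0],
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 1],
    [0, 1, 1, 0],
    [1, 1, 1, 1],
])
y = np.array([0.2, 0.3, 0.9, 0.25, 1.1, 0.5, 1.0, 1.3])

model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)

# Rank modifications by how much they help the model predict titer.
for name, importance in sorted(zip(modifications, model.feature_importances_),
                               key=lambda pair: pair[1], reverse=True):
    print(f"{name}: {importance:.2f}")

# Score a proposed new design before building it in the lab.
candidate = np.array([[0, 1, 0, 1]])
print("predicted titer (g/L):", model.predict(candidate)[0])
```

In practice, the genotype matrix would come from thousands of automated Build and Test runs rather than eight hand-written rows, and the model's suggestions would feed directly into the next Design round.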
To spot these trends and patterns within datasets, the data must have enough statistical power to be meaningful – after all, you don’t want to chase a perceived pattern in what turns out to be random noise. High-throughput and high-quality data capture provide statistically powerful data for modeling, making predictions, and further analysis with ML and DL algorithms. Good-quality data are therefore essential for feeding into these algorithms and obtaining meaningful outputs. As the old computer science axiom goes: garbage in, garbage out.
Artificial intelligence is surging in popularity across all fields, synthetic biology included. Though its value and potential impact are clear, artificial intelligence remains a relatively inaccessible technology for many biologists – most biologists are not software engineers and most software engineers don’t understand the intricacies of biological systems. Biology is incredibly complex and these tools are only as good as the data fed into them, which must be validated to further train and improve their capability. Synthetic biologists must become fluent in both disciplines and work in a cross-disciplinary environment to maximize the impact of this very new, highly valuable, and rapidly evolving technology.
Faster solutions to world problems
Data is the Alpha and Omega of synthetic biology, driving forward novel and sustainable bio-based solutions to global challenges. High-quality datasets inform everything from our experimental design to the tools we use to capture new data. Automation tools help standardize molecular biology and allow a higher throughput of data gathering, while artificial intelligence software can help us make sense of more data than ever before.
The synthetic biology industry is increasingly adopting cross-discipline teams to make the most of these technologies, bringing together biologists, software engineers, data specialists, and more. Future synthetic biologists will be better equipped with the skills to use these technologies, as budding biologists are now encouraged to learn to code and to build a deeper grounding in data and statistics. These technologies allow researchers to test more hypotheses accurately in less time, leading to more cost-effective and robust innovation of bio-based solutions to global challenges.
Next Time …
In the next edition of our Demystifying Synthetic Biology series, we’ll dig into fermentation and the many variables synthetic biologists encounter at this stage of the process. We’ll discuss the history of fermentation and how a centuries-old technology once used exclusively for brewing beer and fermenting foods has been reimagined and reapplied to solve some of the most pressing challenges of our lifetime.
Follow Antheia on LinkedIn to stay up to date on our newest demystifying synthetic biology blogs.