In today’s blog, we are pleased to introduce Ben Kotopka, Director of Data Science at Antheia. Ben’s work enables teams across the business to track and make sense of more than a decade of R&D data. We sat down with Ben to learn more about his background and what inspires him about working at Antheia:
Tell us about your background and career journey to Antheia.
I’ve been interested in genetic engineering since reading Jurassic Park as a child. This ultimately sparked my interest in metabolic engineering in high school and then led to studying molecular biology in undergrad at Princeton, followed by a PhD in bioengineering at Stanford.
During graduate school, I worked in our CEO and co-founder Dr. Christina Smolke’s lab. I wasn’t directly involved with the scientific work that led to Antheia’s founding, but I was an observer of the earlier parts of the story, and that drove me to rejoin the team later in my career so I could be part of the next chapter. I’m originally from eastern Nebraska, and being from the Midwest, I’m very interested in new applications for the kind of agricultural work we do here.
After graduating, I collaborated with a classmate on a brain MRI segmentation project that evolved into a startup called BrainKey. I was the co-founder and Chief Scientific Officer, which offered me a tremendous opportunity to see what it takes to build a startup – we were part of Y Combinator in 2019 and experienced the fundraising process firsthand. I learned that responsibility and flexibility are crucial. As a founder, there’s no room to say, “That’s not my job,” and I’ve kept that mindset throughout my journey.
When the pandemic hit, my career perspective shifted and I found myself wanting to get back into wet lab biology, which brought me to Antheia, first as a consultant and then full-time as the head of data science in 2021.
Can you talk more about the field of data science and how you got into it?
Data science is still an emerging field. A lot of people in this line of work got started because they had a problem to solve that required them to pick up the skill set. We’re now seeing universities offer master’s degrees in data science, but your typical data scientist – myself included – likely does not have data science in the education section of their resume.
While at Stanford, I was working on ways to design artificial promoters, which are short DNA sequences that control the behavior of genes. The approach I took was measuring the activity of about a million randomly generated artificial promoters and then using that data to inform the design of new ones that would do what we wanted them to. This project was my entry point into the broader data science field.
In 2017, a friend of mine shared an article about recent progress using neural networks for images and I realized I could apply those techniques to the problem I was trying to solve at the time. I set up a basic experimental network, threw my data at it, and for the first time, I ended up with a system that seemed to understand what was going on in my artificial promoters.
What does your day-to-day work as Director of Data Science look like at Antheia?
Data science is like a pyramid. At the base, you have data integrity: ensuring we are not losing experimental information that we spend a lot of time and money collecting. In the middle is the ability to easily access and visualize data, and at the top of the pyramid is AI or an automated system pulling out insights from the data with less human intervention.
As the director of data science at Antheia, I’m responsible for all levels of that pyramid. On any given day, I could be writing code, coordinating a project, analyzing data, building a visualization, constructing a database, or sequencing the DNA of a strain. Since every aspect of our R&D team both generates and uses data, I work closely with biology, fermentation, bioanalytics, and downstream processing (DSP) to ensure that efficient systems are in place to best support our goals.
How important is data science throughout the R&D process?
Data science is critical. We’ve done tens of thousands of different experiments at the company over the years. Without putting some real thought and effort into a data strategy, those experiments would at best be logged in a stack of hundreds of Excel spreadsheets with random names and storage methods, making it impossible to know what was done in 2018, for example, or how it informs what we’re doing today.
For the biology team, we need to keep track of what yeast strains they’ve built and how they’re related to each other genetically. They need to be able to see the difference between two strains, so I’ve created tools to help them do that more efficiently and effectively.
The fermentation team runs 60 tanks a week, requiring many experimental and analysis steps, so software has made a big difference in orchestrating that work for fermentation and bioanalytics operators. It also allows them to effectively receive, store, and display the resulting data.
The bioanalytics team takes hundreds of samples per week of our fermentation broth and analyzes the chemical composition. That data needs to be stored in an organized way so we can easily find it and leverage it in the future.
The DSP team, which is responsible for refining the fermentation broth into a finished product, has the largest number of experiments and processes. DSP has to split and recombine samples in complex and varied ways as they test different refining strategies, and analyzing these results requires many different instruments. As a result, they need careful accounting of what’s been done over time.
At every level, data science tools are being used, developed, and maintained, and that’s where I come in. Without a data science team, you end up with a messy situation. Scaling a new technology and growing a business – especially in the sciences – is impossible without these systems in place.
What made you say yes to Antheia, first on a contractor basis and then eventually full-time? What inspires you about the work you’re doing?
I always wanted to work in metabolic engineering, and when I learned about Christina’s lab at Stanford, I knew this was what I wanted to work on. As a reader and a lover of stories, it’s a privilege to feel that my work is part of a wonderful, potentially triumphant story. It’s a huge opportunity and, personally, it’s something I have to see through.
I think a lot about self-reliance and sovereignty, especially given how fragile the global supply chain has become. As a country, we could run into shortages of vital drugs if we don’t have the ability to produce them more efficiently here in the U.S. There’s a big opportunity for Antheia to help solve this if (or when) that becomes an issue. Moderna’s innovation and swift action during the COVID crisis are an excellent example of how advanced technologies like synthetic biology can – and should – be a part of our national strategy. You never wish for a crisis, but with climate change and geopolitics, more issues are arising that impact the availability of critical drugs through traditional routes. Knowing that Antheia is building the solution for these shortages is a big deal and I’m proud to be part of that.
How has your data science work impacted the path toward Antheia’s recent commercial milestones?
The team that figured out the strain and the process required to reach those milestones did so based on the results of thousands of experiments. Those results, the visualizations used to make decisions, and the insights that drove progress were all produced with data science tools. While the team members are the main actors in the play, I’m the crew backstage making sure the stage lights are pointed in the right direction and the microphone is on – if I’m doing my job right, you won’t even know I’m there.
What are the hot topics and trends in data science and AI that you’re following?
One big area of progress is that the AI hype cycle is starting to calm down and we’re seeing a more balanced appraisal of the technology. We are beginning to view AI systems not as gods but as tools, which I think is a healthy shift. Currently, I’m interested in applications that help us handle unstructured data. Often, you have a “notes” field in experiment descriptions where scientists can add in context about their work. It’s an important feature that allows them to explain things in more detail, but that unstructured data is hard to analyze computationally years down the road. Generative AI might give us the ability to pull out valuable information from those notes at scale.
I’m also excited about the potential for AI in protein design, which is complementary to our work in pathway engineering. We’re trying to build a bridge from glucose to a target molecule, and it seems to be becoming more tractable to use AI to design individual enzymes for that process, some of which may not even exist yet.
I don’t believe there’s a future where we give an entire data set to a machine and it replaces a team of scientists, but AI can complement the work we do. AI can’t match a human’s flexibility and ability to understand broad context, but we are seeing places where it can push us forward and I look forward to seeing how we can leverage it in our workflows.
What would you say makes Antheia a great place to work?
As a remote employee, there’s always been a great deal of trust. I’ve never felt micromanaged, and the leadership is confident and trusting, which inspires me to put my best foot forward. We have meetings where people engage with other departments with real interest, showing that everyone is not just focused on their piece of the puzzle, but on our overall mission. Maintaining the spirit of a small company with 50+ people speaks volumes about what’s been built at Antheia.