The Tree of Life Is Rooted in Math
Claudia Solís-Lemus reveals a clearer picture of the evolutionary interconnectedness of organisms by modeling data, both big and small
Claudia Solís-Lemus has always loved numbers. Throughout her schooling in Mexico City, she found mathematics classes to be the easiest and most interesting — perhaps because the logic from which math is derived naturally made sense to her. She’s also a puzzle enthusiast, and math exercises felt like puzzles waiting to be solved.
Now an assistant professor with the Department of Plant Pathology and the Wisconsin Institute for Discovery, Solís-Lemus is a statistician embedded in a life sciences department. She solves puzzles for biologists; in turn, biologists inspire the statistical tools she builds. She is pursuing two main lines of work.
First, in the field of evolutionary biology, she aims to improve how we determine which organisms are most closely related to one another and how we estimate networks of life, work that can have practical applications in the life sciences. These phylogenetic networks resemble standard evolutionary trees, which link species to one another and to common ancestors, but hers are more complicated because they also account for gene flow or hybridization between species.
Second, Solís-Lemus studies microbiomes: specifically, how microbial communities help or harm plants through their interactions with pathogens and their effects on growth and yield. By using mathematical modeling, she can study many variables at once to determine how different species of microbes may be interacting with characteristics of a field, such as fertilizer use or drainage. By connecting these elements, she is creating a tool that will allow growers to input variables related to microbial abundance and environmental conditions and come away with a network that will tell them how connected those components are and, ultimately, how to manage their plants.
What is it like being a statistician in a plant pathology department?
I have worked on applications related to biology ever since my Ph.D. program, where I focused on developing methods for evolutionary biology. Being in a biological department is mutually beneficial, I think. I get to collaborate with really smart people who have many biological questions they want to explore, and they collect a lot of data. I bring my expertise as a data scientist to help them analyze their data, and it ultimately inspires the statistical tools that I develop. I think it’s a really nice symbiosis.
What is big data, and how do the tools you’ve developed address big data?
We’re collecting massive amounts of data, and it’s important to be able to handle that data. In biology, because the cost of genetic sequencing is dropping, for example, people are sequencing more whole genomes for different species. The standard method for reconstructing phylogenetic networks could only handle 10 species at the most. We simplified the process so that we were approximating the answer instead of trying to find the exact network. We proved that this was accurate enough. But it was much more scalable and much, much faster.
Now we are able to handle 25 or 50 species with our current tools. Even with just 52 species, you have more possible phylogenetic trees than there are atoms in the universe. But biologists would love to have networks for hundreds or thousands of taxa [i.e., groups of organisms], so we’re still investigating ways to approach that.
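The comparison with atoms can be checked with a little arithmetic. The Python sketch below assumes that "trees" means rooted, bifurcating tree shapes with labeled tips, whose number for n species is the double factorial (2n - 3)!! = 1 × 3 × 5 × ... × (2n - 3), and it uses the common rough estimate of 10^80 atoms in the observable universe. Neither assumption is stated in the interview, and the function name is illustrative.

```python
from math import prod  # Python 3.8+


def num_rooted_trees(n: int) -> int:
    """Count distinct rooted, bifurcating tree topologies with n labeled tips.

    Uses the classic double-factorial formula (2n - 3)!! = 1 * 3 * 5 * ... * (2n - 3).
    """
    if n < 2:
        raise ValueError("need at least two species")
    # range(1, 2n - 2, 2) yields the odd numbers 1, 3, ..., 2n - 3.
    return prod(range(1, 2 * n - 2, 2))


ATOMS_IN_UNIVERSE = 10**80  # rough, commonly quoted order-of-magnitude estimate

for n in (10, 25, 50, 51, 52):
    count = num_rooted_trees(n)
    print(n, f"{count:.2e}", count > ATOMS_IN_UNIVERSE)
```

Under these assumptions, 52 is the smallest number of species for which the count passes 10^80. Unrooted trees number (2n - 5)!!, about 100 times fewer, so the exact threshold depends on which kind of tree is being counted.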
There is a balance with big data. You either go the exact route, which is not scalable, or you come up with ways to approximate the answer, in which case you lose a little bit of accuracy. So there is a trade-off between accuracy and scalability.
Why is it also important to think about smaller data sets?
Everybody talks about big data. But there’s also the problem on the other end. For certain machine learning methods, you need a lot of data for them to work. But some of my collaborators do not have that many replicates [repeats of an experiment] because they’re doing the experiments in their lab. We don’t necessarily always have gigantic data sets, and so we still want to be able to draw conclusions from smaller data sets. We want to have methods for both.
In what ways are you seeing the tools you develop being used?
Recently, people have been using a lot of phylogenetic techniques to track and understand the evolution of the coronavirus. That’s one area where people are using the tools that I’m creating. This has been a learning opportunity. Many of our methods were not designed for viral evolution. They are meant for eukaryotes, such as plants or animals. Viruses have completely different ways of evolving, so some of our methods are not suitable to explain their evolution. That’s why I think it’s important that we continue developing phylogenetic networks.
For the microbiome work, if you’re a farmer, you want to protect crops from pathogens, right? The standard way is to use chemicals — fungicides, for example. But pathogens evolve and become resistant. Some of the work that we’re doing is trying to not rely on chemicals but on the microbiomes — the plant microbiome, the soil microbiome. We’re asking what it is about the microbial communities that makes some plants stronger than others.
When making phylogenetic networks, why is the microbiome more difficult to study than specific species?
When we’re reconstructing the tree of, say, fish, you grab one fish and sequence it. You know that this sample is coming from this fish. But when you’re studying the microbiome, you cannot just grab one microbe. You get a sample of the whole soil, and then you sequence everything in there. You don’t know which microbe the sequences are coming from. You have a collection of genomes.
That means that we first need to use statistics to cluster these pieces of genetic information. Then, after we cluster them, we can identify what is specific to a given microbe. But it’s an extra step that we need to do with microbiome studies.
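The interview does not say which statistical method is used for this clustering step. To make the idea concrete, the Python sketch below uses one simple, generic approach: describe each sequence by its set of short substrings (k-mers), measure how many k-mers two sequences share, and group a sequence with the first existing cluster whose founding sequence is similar enough. The function names, k = 4, the 0.5 similarity threshold, and the toy reads are hypothetical choices for illustration, not the lab's pipeline.

```python
from __future__ import annotations


def kmers(seq: str, k: int = 4) -> set[str]:
    """Return the set of all overlapping substrings of length k."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a: set[str], b: set[str]) -> float:
    """Share of k-mers two sequences have in common (0 = none, 1 = identical sets)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def greedy_cluster(seqs: list[str], k: int = 4, threshold: float = 0.5) -> list[list[int]]:
    """Put each sequence into the first cluster whose seed (its first member)
    shares enough k-mers with it; otherwise start a new cluster seeded by it.
    Returns clusters as lists of sequence indices."""
    seeds: list[set[str]] = []
    clusters: list[list[int]] = []
    for idx, seq in enumerate(seqs):
        profile = kmers(seq, k)
        for c, seed in enumerate(seeds):
            if jaccard(profile, seed) >= threshold:
                clusters[c].append(idx)
                break
        else:  # no existing cluster was similar enough
            seeds.append(profile)
            clusters.append([idx])
    return clusters


# Toy "reads" from a mixed sample: two pairs of near-identical sequences.
reads = ["ACGTACGTAC", "ACGTACGTAA", "TTTTGGGGCC", "TTTTGGGGCA"]
print(greedy_cluster(reads))
```

Here the four reads fall into two groups, each standing in for sequences that probably come from the same microbe. Greedy, threshold-based grouping is fast but depends on the order of the input and on the chosen threshold, which is one reason more careful statistical clustering is used in practice.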
What are your favorite parts of your work?
I really enjoy the process of writing scientific papers, and I also like programming. Learning to program is learning to communicate with the computer, and once you’ve learned one language, it is easy to translate to other languages. I think that knowing how to program opens doors, regardless of your field. Right now, in my lab, we are producing many different programs that are meant to allow biologists to analyze their data.
But there’s another aspect that I really enjoy, and that’s mentoring students. I’m just starting to have students in my lab, and mentoring is something you have to learn. Just getting a Ph.D. doesn’t make you a good mentor. So I’ve had to learn a lot about how to mentor students and best support people from many different backgrounds. That’s something I really like about my job.
What do you enjoy most about being at CALS and UW–Madison?
I was born and raised in Mexico City. So when I first came here, I was not sure if I would like living in a smaller city. But I really loved it. I felt so instantly welcome here at the university and the city in general. And that was so valuable to me. The people are so smart, and they do such great research, and they’re so nice and friendly. Plus, I like to be outdoors. I like to run or bike, and it’s a great city for that. In the summer, I like to go to Devil’s Lake State Park and swim. That’s my happy place.