A DECADE AGO, CALS GENETICS researchers Nicole Perna and Jeremy Glasner were part of the team that sequenced the genome of the bacterium E. coli, at the time one of the most complex organisms to have its genome fully decoded. The breakthrough uncovered what was then considered a treasure trove of new genetic information: some 4,200 genes, 40 percent of which had no known function.
But the decoding didn’t stop there. Counting unique genes across all of the strains of E. coli that have been sequenced, “we’re up to about 25,000 genes now,” says Perna, an associate professor of genetics. “So we’re starting to approach the number of genes in a eukaryotic species.”
New genetic information is coming faster than ever, thanks to meteoric advances in sequencing technology. In the infancy of genomics, scientists used hands-on lab work to read DNA by eye, viewing a few hundred base pairs at a time. Completing the sequence of even simple forms of life required years of effort. Now machines read the sequences, spitting out ever-larger chunks of data. The newest generation of sequencing equipment in UW-Madison’s Biotechnology Center can generate the entire genome of a bacterium (roughly four to 12 million base pairs) in a single run.
Speed has opened a new frontier in genomics: the freedom to hunt. In the old days, the cost and labor of sequencing meant that only obviously important forms of life were sequenced. Today scientists can scour the genomes of many organisms in hopes of finding surprises. Perna and Glasner are comparing several strains of E. coli to find the genes most responsible for key differences in behavior, such as why some strains are pathogenic to humans while others are harmless.
“We’re finding out that we actually had very little information before this,” laughs Josh Hyman, who directs the DNA sequencing facility in the Biotech Center. “I mean, a tiny amount of information.” By comparison, he says, today’s rush of data can seem like “trying to drink out of a fire hose.”