

From an experimental point of view, the largest challenges are cost, time, and expertise. Solving structures using crystallography and NMR requires extremely specialized training, a high degree of skill, and a lot of luck. The cost of solving a new, unique structure is on the order of $100,000. Given the difficulty of solving an experimental structure, and considering the rate at which new protein sequences are discovered, it has become clear that with today's technology, we will not solve structures for all the new proteins being identified and sequenced.

Comparing the number of protein sequences in UniProt to the number of known structures in the PDB (Figure 1), we see over 1700 times more sequences than structures. When this article was first published 5 years ago, that difference was only 400-fold. Since the number of new sequences continues to grow far faster than the number of solved structures, that gap is here to stay and will only become wider over time.

Finding alternative ways to predict a protein structure becomes more and more important as this gap increases. Many tools for protein structure prediction rely on homology modeling. This works by using sequence alignment to identify proteins in the Protein Data Bank that have a high degree of sequence similarity to the target. These methods work well for proteins with at least 70% sequence identity. But relying on sequence similarity alone has its weaknesses. As you get closer to 50% sequence identity, it becomes difficult to select templates. And as you get closer to the 30% sequence identity level, or the "twilight zone," it becomes exceedingly difficult, because any two randomly chosen proteins can have this level of sequence identity. The benefit of NovaFold is that it takes a hybrid approach that uses protein threading to select templates.
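
To make the sequence-identity levels discussed in this article (70%, 50%, and 30%) more concrete, here is a minimal Python sketch that computes percent identity for two sequences that have already been aligned, then labels the result with a rough regime. This is an illustration only, not how NovaFold or any particular homology modeling tool selects templates. The function names `percent_identity` and `template_regime`, the regime labels, and the choice to count identity over ungapped columns are all assumptions made for this example; other tools define the denominator differently, and the example sequences are made up.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over columns where neither sequence has a gap.

    Assumption: both inputs come from the same pairwise alignment, so they
    have equal length and use '-' as the gap character.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")

    compared = 0  # columns with a residue in both sequences
    matches = 0   # columns where those residues are identical
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue
        compared += 1
        if a == b:
            matches += 1

    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


def template_regime(identity: float) -> str:
    """Map a percent identity to a rough regime.

    The cutoffs follow the levels named in the article; the labels are
    hypothetical and only meant as a reading aid.
    """
    if identity >= 70.0:
        return "reliable"        # homology modeling tends to work well
    if identity >= 50.0:
        return "intermediate"    # template selection starts to get harder
    if identity > 30.0:
        return "difficult"       # approaching the twilight zone
    return "twilight zone"       # identity alone no longer separates homologs


if __name__ == "__main__":
    # Made-up aligned sequences, for illustration only.
    query = "MKTAYIAKQR-QISFVKSHFS"
    template = "MKTAYLAKQRGQLSFVK-HFS"
    identity = percent_identity(query, template)
    print(f"{identity:.1f}% identity: {template_regime(identity)}")
```

In practice the alignment itself would come from a dedicated alignment tool. The point of the sketch is only that a single identity number falls into very different regimes of template-selection difficulty.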
