Predicting protein structure just got a little 'easier'


Image from the website of Meta's ESMFold model. / Meta


Meta's enhanced language model, called ESMFold, can predict protein structures 60 times faster than other algorithms

Elena Martin Lopez

Predicting the structure of proteins is an extremely complex task, but one of great scientific importance. This knowledge makes it possible, among other things, to understand the function and role of these molecules in biological processes, to study the evolution of organisms and to develop more effective drugs. This Thursday, the company Meta (formerly Facebook) published a study in the journal 'Science' about new software, called ESMFold, capable of predicting the structure of proteins 60 times faster than other similar algorithms, such as AlphaFold, developed by Google's DeepMind and distributed through a database built with the European Bioinformatics Institute, while maintaining comparable resolution and accuracy.

It is estimated that the human body contains around 20,000 different proteins. Proteins are made up of long chains of amino acids (organic molecules) that interact with one another to fold into a specific three-dimensional structure. A protein can adopt many possible three-dimensional structures depending on how its amino acids interact, and even small variations in the amino acid sequence can produce large differences in the protein's final structure. In addition, the cellular environment influences the folding process. All of this makes it very difficult to predict accurately what the final shape of a protein will be.

The new model (the third version Meta has presented) includes predictions of some 617 million protein structures, more than 225 million of which are predictions with a high degree of reliability. “The quality of the results is convincing enough. The first difference from the previous proposals based on deep neural networks (AlphaFold and RoseTTAFold) is that the new models are much easier to compute and much faster (between one and two orders of magnitude),” said Alfonso Valencia, professor at the Catalan Institution for Research and Advanced Studies (ICREA) and director of Life Sciences at the Barcelona Supercomputing Center (BSC), in statements collected by the Science Media Center (SMC).

Competition between companies

ESMFold's predictions include structures (more than 10% of the total) of some of the least understood proteins on Earth. “This makes the new methodology directly applicable to predicting the consequences of point mutations, something that was beyond the scope of previous methods and has a direct impact on applications in biomedicine,” says Valencia. The authors also used ESMFold to predict the structure of non-natural proteins, that is, proteins modified in the laboratory to have properties not found in proteins produced naturally by living organisms, which has very interesting applications in biotechnology and biomedicine.

To determine these structures, ESMFold relies on language models, that is, statistical methods used to analyze large sets of natural language data and predict the probability of a sequence of words. “The principle is the same as in the now-popular ChatGPT, applied in this case to the chains of amino acids (a 20-letter code) that make up proteins instead of to the characters of a human language,” explains Valencia.
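Valencia's analogy can be made concrete with a deliberately simplified sketch. This is not ESMFold or the ESM-2 model behind it, which is a large transformer trained by hiding residues and predicting them; it is a toy bigram model, chosen here only as an illustrative assumption. It estimates the probability of each amino acid given the one before it from a few example sequences, then uses those probabilities to score a whole chain. The function names and the training sequences are hypothetical.

```python
import math
from collections import Counter

# The 20 standard amino acids, written with their one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def _check(seq):
    """Reject letters outside the 20-letter amino acid alphabet."""
    bad = set(seq) - set(AMINO_ACIDS)
    if bad:
        raise ValueError(f"non-standard residues: {sorted(bad)}")


def train_bigram(sequences, alpha=1.0):
    """Toy model: estimate P(next residue | previous residue) from example
    sequences, with add-alpha smoothing so unseen pairs keep a small probability."""
    pair_counts = Counter()
    prev_counts = Counter()
    for seq in sequences:
        _check(seq)
        for prev, nxt in zip(seq, seq[1:]):
            pair_counts[(prev, nxt)] += 1
            prev_counts[prev] += 1
    vocab = len(AMINO_ACIDS)

    def prob(prev, nxt):
        return (pair_counts[(prev, nxt)] + alpha) / (prev_counts[prev] + alpha * vocab)

    return prob


def log_prob(seq, prob):
    """Log-probability of a chain: sum of log P(residue_i | residue_{i-1}).
    The first residue is not scored in this simplified model."""
    _check(seq)
    return sum(math.log(prob(p, n)) for p, n in zip(seq, seq[1:]))


if __name__ == "__main__":
    # Hypothetical toy "training data"; real models learn from vast databases.
    model = train_bigram(["MKTAYIAKQR", "MKTAYIAKQL"])
    print(log_prob("MKTAYIAK", model))  # resembles the training data
    print(log_prob("MKWAYIAK", model))  # same chain with one residue swapped
```

A chain that resembles the training data receives a higher score than the same chain with an unfamiliar residue swapped in, which is the sense in which a language model "predicts the probability of a sequence." Real protein language models capture far richer, long-range patterns than this one-step lookback.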

The professor adds: “It is very surprising that big technology companies are investing so much effort in a subject that was considered niche and theoretical. It is easy to think that this is a competition between Meta and Google/DeepMind. In this sense, it is interesting that both companies have developed software and that the results are openly available, something not very common at these companies.” Another possible reason is that protein structure prediction is the most useful test case for refining the predictions of text-based language models.