Proteins Assemble Like Lego Bricks!

The importance of proteins in the diet is gaining widespread acknowledgement. Aptly called the ‘building blocks’ of life, they participate in nearly every metabolic function of the body, such as hormone production, immune response, and nutrient transport. They are also responsible for building skin, blood, and muscles.

Proteins are made up of chains called peptides, which in turn break down into amino acids, the basic units of proteins.

Peptide-based materials find use in regenerative medicine, drug delivery, adhesives, and electronic materials. An important aspect of these materials is their inherent ability to self-assemble into well-defined nanostructures. This ability can alter the functionality of peptide materials, making them useful in medicine and well worth detailed study.

Peptides can be organized at four structural levels – primary, secondary, tertiary, and quaternary structures, based on the complexity of their molecular arrangement.

Of these, the secondary structure of peptides poses a major challenge because of complex molecular interactions, sequence-specific behaviour, and environmental factors.

Secondary structures are usually found in two forms – α helices (alpha helices), and β sheets (beta sheets). Computational methods have been more successful at predicting α helices; β sheets remain a problem.

This is because of the inability of the models to fully capture the complexity of the self-assembly process as well as the influence of factors such as pH, temperature, and solvation. This is further complicated by the small size of the available experimental dataset, which prevents effective training as well as prediction of diverse, noncrystalline structures that peptides often form.

Also, traditional design strategies for determining and evaluating secondary structures can be biased, and so tend to miss interesting, diverse, and unconventional peptides with the desired nanostructure assembly.

Further, computational methods alone cannot help in the prediction of nonintuitive sequences that give rise to β sheet structures due to the vast chemical space and the underlying limitations of the theoretical models used.

This gap highlights the necessity of combining experimental data with artificial intelligence (AI) workflows that can learn directly from scarce datasets.

In this study, the authors have used an integrated high-throughput experimental workflow and an artificial intelligence-driven active learning framework to improve the prediction accuracy of self-assembly using β sheet formation in pentapeptides (chains of five amino acids) as a case study.

Of the 268 pentapeptides synthesised and tested, 96 were identified as β sheet assemblies, including unconventional sequences not predicted by traditional methods.
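
To put these numbers in context, the arithmetic can be sketched in a few lines of Python. The 20-letter alphabet below is an assumption (the canonical amino acids); the article does not say which residues the study allowed. The counts 268 and 96 are taken from the text above. Under that assumption there are 20^5 = 3.2 million possible pentapeptides, so the tested set is a tiny fraction of the space, while roughly 36% of the tested peptides formed β sheets.

```python
# Assumption: the 20 canonical amino acids, written as one-letter codes.
# The study's actual alphabet may differ.
CANONICAL_AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def sequence_space_size(length, alphabet=CANONICAL_AMINO_ACIDS):
    """Number of distinct linear sequences of the given length."""
    return len(alphabet) ** length


total = sequence_space_size(5)  # 20 ** 5
tested = 268                    # peptides synthesised and tested (from the article)
hits = 96                       # beta-sheet assemblies found (from the article)

print(f"possible pentapeptides: {total:,}")
print(f"fraction of the space tested: {tested / total:.4%}")
print(f"hit rate among tested peptides: {hits / tested:.1%}")
```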

The machine learning (ML) models used here outperformed conventional β sheet propensity tables, revealing useful chemical design rules. A web interface was provided to facilitate community access to these models.

The following are the authors of this paper:

Mr. Y. Nissi Talluri from the Department of Metallurgical and Materials Engineering, Indian Institute of Technology (IIT) Madras, Chennai, India.
Dr. Subramanian KRS Sankaranarayanan from the Center for Nanoscale Materials, Argonne National Laboratory, Lemont, USA. Dr. Subramanian is also affiliated with the Department of Mechanical and Industrial Engineering, University of Illinois, Chicago, USA.
Dr. H. Christopher Fry from the Center for Nanoscale Materials, Argonne National Laboratory, Lemont, USA.
Dr. Rohit Batra from the Department of Metallurgical and Materials Engineering, Indian Institute of Technology (IIT) Madras, Chennai, India. Dr. Rohit is also affiliated with the Center for Nanoscale Materials, Argonne National Laboratory, Lemont, USA, and the Center for Atomistic Modelling and Materials Design, IIT Madras.

Prof. Arun Kumar Mannodi Kanakkithodi, who is an Assistant Professor of Materials Engineering at Purdue University, Indiana, USA, explained the work done by the authors and lauded their efforts and spirit of open-science with the following comments: “In this work, the authors developed a framework that combines existing experimental data, machine learning (ML)-based predictive models, and active learning-driven new data generation, to drive the discovery of novel pentapeptide materials that self-assemble into β-sheet networks. Data-driven strategies are essential to navigate the vast and complex chemical space of peptides. Molecular simulations are often inadequate in predicting “non-intuitive sequences” that lead to β-sheet formation, thus making it vital to learn from experimental characterization data. ML models were trained on compiled experiments to directly predict the ratio of infrared absorbance at two different wavenumbers, which proves to be an effective surrogate for the degree of formation of β-sheets. The choice of material descriptors is crucial for ML model accuracy and generalizability: here, the authors combined high-level fingerprints that include seven properties derived for any pentapeptide sequence and cheminformatics-based fingerprints that include information about atom types, bonds, and functional groups, among other things.

Active learning (AL), which involves intelligently generating new data based on iterative ML model improvement and identification of the most promising materials, played a major role in this work. The authors used a unique strategy of focusing the AL loops on regions of the peptide chemical space where predicted β-sheet formation is very different from any known values, thus ultimately leading to the identification of completely novel and unique peptides. Multiple AL loops were used to expand the experimental dataset and retrain ML models at every step, and finally, nearly 100 novel peptide chemistries were discovered that form β-sheet assemblies. This work is a wonderful example of how AI/ML combined with systematic experiments can dramatically reduce the materials discovery time without compromising on prediction accuracy. In the spirit of open-science, the authors have released their final ML models as a web-interface available to the community.”
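
The loop described in the comments above can be illustrated with a toy sketch. This is only one possible reading of the strategy, not the authors' implementation: the single descriptor (mean Kyte-Doolittle hydropathy), the straight-line surrogate model, and the `fake_measurement` function standing in for the laboratory readout are all assumptions made for illustration. In each round the sketch fits the model to what has been measured so far, then picks the untested peptides whose predicted value lies farthest from every value already measured.

```python
import random

# Kyte-Doolittle hydropathy scale: a standard published scale, used here
# only as a toy descriptor (not necessarily one used in the paper).
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
AMINO_ACIDS = sorted(HYDROPATHY)


def descriptor(seq):
    """Toy one-number descriptor: mean hydropathy of the residues."""
    return sum(HYDROPATHY[aa] for aa in seq) / len(seq)


def fake_measurement(seq):
    """Hypothetical stand-in for the experiment. NOT real data."""
    aromatic = sum(seq.count(aa) for aa in "FWY")
    return 1.0 + 0.05 * descriptor(seq) + 0.1 * aromatic


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:  # all descriptors identical: fall back to the mean
        return 0.0, mean_y
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def novelty(prediction, known_values):
    """Distance from a prediction to the closest measured value."""
    return min(abs(prediction - y) for y in known_values)


def active_learning(pool, n_initial=10, batch=5, loops=3, seed=0):
    rng = random.Random(seed)
    pool = list(pool)
    rng.shuffle(pool)
    # Start from a small random set of "measured" peptides.
    measured = {s: fake_measurement(s) for s in pool[:n_initial]}
    untested = pool[n_initial:]
    for _ in range(loops):
        known = list(measured.values())
        slope, intercept = fit_line([descriptor(s) for s in measured], known)
        # Rank untested peptides by how unlike the known values they look.
        untested.sort(
            key=lambda s: novelty(slope * descriptor(s) + intercept, known),
            reverse=True,
        )
        chosen, untested = untested[:batch], untested[batch:]
        # "Run the experiment" on the chosen batch and grow the dataset.
        measured.update({s: fake_measurement(s) for s in chosen})
    return measured


rng = random.Random(42)
pool = {"".join(rng.choice(AMINO_ACIDS) for _ in range(5)) for _ in range(500)}
results = active_learning(sorted(pool))
print(len(results), "peptides measured after 3 loops")
```

In the real workflow the measurement step is a laboratory experiment and the surrogate is a trained ML model with far richer descriptors; the sketch only shows the shape of the loop: fit, rank, measure, retrain.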

Prof. Tell Tuttle, who is Head of Department, Pure and Applied Chemistry, at the University of Strathclyde, Glasgow, United Kingdom, also acknowledged the importance of the authors’ work with the following comments: “This paper presents a compelling advancement in peptide material discovery by creatively combining machine learning with iterative experimental validation. Its central achievement lies in uncovering non-intuitive pentapeptide sequences that defy traditional design heuristics, highlighting the untapped potential in unconventional chemical spaces. The active learning framework not only strengthens the predictive capability of the machine learning models but also provides insights into previously overlooked physicochemical properties driving self-assembly. While there remains scope for further model refinement, particularly in handling valine-rich sequences, the contribution represents a meaningful leap forward in guiding rational peptide design. This approach will undoubtedly inspire continued innovation at the intersection of computational chemistry and experimental material science.”

Article by Akshay Anantharaman