
April 16, 2025

AI uses extrapolative learning to master materials prediction beyond existing data

Learning the learning method for extrapolative prediction using the E2T algorithm. Credit: The Institute of Statistical Mathematics

A research group has developed an innovative machine learning technology that enables predictions beyond the distribution of training data and demonstrated its effectiveness in materials research. The team includes Kohei Noda, a researcher at JSR Corporation, and Professor Ryo Yoshida at the Institute of Statistical Mathematics.

The ultimate goal of materials science is to discover new materials in unexplored domains where no data exists. However, predictions made by machine learning are generally interpolative, with their applicability typically limited to regions close to the distribution of existing data. Additionally, in materials research, the high cost of data acquisition makes it difficult to obtain sufficient training data, necessitating exploration beyond the range of available data.

To address this challenge, the research group developed a machine learning algorithm called E2T (extrapolative episodic training). In E2T, a model known as a meta-learner is trained using a large number of artificially generated extrapolative tasks derived from the available dataset. As a result, the model autonomously learns a learning method to perform extrapolative predictions.

In this study, E2T was applied to material property prediction tasks, demonstrating high predictive accuracy even for materials with elemental and structural features not present in the training data. Furthermore, it was revealed that models trained on a large number of extrapolative tasks could rapidly acquire predictive capabilities in unknown domains with only a small amount of additional data.

These research findings were published in Communications Materials on February 22, 2025.


Research outcomes

In recent years, the application of machine learning has led to remarkable progress in the discovery and development of new materials. At the core of this progress lies property prediction technology driven by machine learning. By leveraging this technology, we can explore millions or even billions of candidate materials to identify those with desired properties from vast search spaces.

However, many studies face the challenge of limited data availability, which restricts the range of applications for machine learning. Furthermore, the ultimate goal of materials science is to uncover unknown materials with groundbreaking properties.

Despite this, machine learning's predictive capabilities are generally confined to regions near the training data, making it difficult to explore uncharted territories. For instance, even generative AI, such as the large language models that have revolutionized AI in recent years, is inherently interpolative: it replicates tasks that humans have encountered before. Developing AI technologies capable of predicting beyond existing data represents a grand challenge not only for materials science but also for advancing next-generation AI.

Band gap prediction of organic-inorganic hybrid perovskites using E2T. Credit: The Institute of Statistical Mathematics

In the field of machine learning, various methodologies have been explored to achieve extrapolative predictions.

This study introduces a novel meta-learning approach that enables models to directly acquire broadly applicable learning methods for extrapolative predictions.

In this study, a neural network equipped with an attention mechanism was employed to train a model capable of learning the methods required for achieving extrapolative predictions. Specifically, a training dataset $D$ and an input-output pair $(x, y)$, extrapolatively related to $D$, were sampled from a given dataset. Here, $x$ represents a material, and $y$ represents its properties. These three components together form an "episode," which can be generated arbitrarily.

Using a large number of artificially generated episodes, a meta-learner $y = f(x, D)$ was trained to predict $y$ from $x$. The trained model learns what function is required to predict $(x, y)$ in an extrapolative relationship with any training dataset $D$. The research group named this novel learning algorithm E2T (extrapolative episodic training).
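
The episode generation described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `sample_extrapolative_episode`, the toy data, and the splitting rule (the query's property value lies above a quantile threshold, while the training set is drawn from below it) are all assumptions made here for clarity. The article does not specify how the extrapolative relationship is constructed.

```python
import numpy as np

def sample_extrapolative_episode(X, y, rng, n_support=20, quantile=0.8):
    """Draw one episode: a small training set and a query pair outside its range.

    Assumption (not from the article): "extrapolative" means the query's
    property value exceeds a quantile threshold, while every training
    example in the episode lies at or below that threshold.
    """
    threshold = np.quantile(y, quantile)
    low_idx = np.flatnonzero(y <= threshold)   # candidates for the training set
    high_idx = np.flatnonzero(y > threshold)   # candidates for the query pair
    support = rng.choice(low_idx, size=n_support, replace=False)
    query = rng.choice(high_idx)
    return (X[support], y[support]), (X[query], y[query])

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))        # toy descriptors, not real materials data
y = X @ np.arange(1.0, 6.0)          # toy property values

# Many episodes can be generated from one dataset by repeated sampling.
episodes = [sample_extrapolative_episode(X, y, rng) for _ in range(1000)]
```

Each element of `episodes` pairs a training set with a query whose property value is larger than anything in that training set, so a model trained across many such episodes is repeatedly asked to predict outside the data it is conditioned on.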

The research group applied E2T to over 40 property prediction tasks involving polymeric and inorganic materials to evaluate its performance. The results showed that, in almost all cases, models trained with E2T outperformed conventional machine learning models in terms of extrapolative accuracy. Additionally, in predictive performance near the training data, E2T demonstrated accuracy equivalent to or greater than that of traditional machine learning.

However, the extrapolative performance of E2T did not reach that of an ideal model (called an oracle) trained on the entire dataset, including the extrapolative region. In other words, while E2T consistently improved prediction accuracy in extrapolative regions, it fell short of achieving "ultimate extrapolative capability."
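
To make the oracle comparison concrete, the sketch below contrasts a model that sees only the in-distribution region with an oracle that also sees part of the extrapolative region. Everything here is a hypothetical toy setup chosen for illustration (a synthetic one-dimensional function and a linear fit). It does not reproduce the study's models, data, or results.

```python
import numpy as np

def rmse(pred, target):
    """Root-mean-square error between predictions and targets."""
    return float(np.sqrt(np.mean((pred - target) ** 2)))

def true_property(x):
    """Synthetic target function, for illustration only."""
    return x ** 2

x_in = np.linspace(0.0, 1.0, 50)     # region covered by the training data
x_out = np.linspace(1.0, 2.0, 50)    # extrapolative region
x_out_train, x_out_test = x_out[::2], x_out[1::2]

# Baseline: a linear fit that only sees the in-distribution region.
baseline = np.polyfit(x_in, true_property(x_in), deg=1)

# Oracle: same model class, but also trained on part of the extrapolative region.
x_oracle = np.concatenate([x_in, x_out_train])
oracle = np.polyfit(x_oracle, true_property(x_oracle), deg=1)

# Both models are scored on held-out points from the extrapolative region.
baseline_rmse = rmse(np.polyval(baseline, x_out_test), true_property(x_out_test))
oracle_rmse = rmse(np.polyval(oracle, x_out_test), true_property(x_out_test))
print(f"baseline RMSE: {baseline_rmse:.3f}, oracle RMSE: {oracle_rmse:.3f}")
```

The oracle serves as a reference for how well a model could do with access to data from the target region, which is why it is a natural yardstick for any extrapolative method.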

A particularly noteworthy finding was that models trained on a large number of extrapolative tasks demonstrated the ability to quickly adapt to new extrapolative tasks through fine-tuning with a limited amount of data. Remarkably, these models achieved comparable performance to an oracle model trained on extrapolative regions, despite requiring significantly less data.

In humans, rapid adaptability is believed to result not only from innate traits but also from extensive training and experience. This study revealed that a similar phenomenon may occur in the learning processes of AI, where adaptability is enhanced through systematic exposure to diverse tasks.

Future outlook

The ultimate goal of materials science lies in exploring uncharted material spaces where no data currently exists. For instance, researchers aim to investigate the properties of materials formed by combinations of elements or raw materials that have never been tested before, or of materials produced under significantly altered sample fabrication protocols.

This study began with a fundamental question: Can models trained to achieve extrapolation with existing datasets acquire extrapolative capabilities and adaptability to unknown environments? The researchers presented a remarkably simple solution to this question. While the current evidence is limited to specific cases, if the learning capability of E2T proves to be universal, its impact could extend beyond materials science, influencing a wide range of fields within AI for Science.

One particularly exciting prospect is the application of E2T to the development of foundation models. Foundation models are trained on large-scale, versatile datasets and are expected to exhibit the ability to adapt to a wide variety of downstream tasks.

By fine-tuning these models for specific downstream tasks, it is possible to reduce the amount of data required while achieving high predictive accuracy. The extrapolative performance and domain adaptability of E2T have the potential to drive groundbreaking innovations in the development of foundation models, significantly advancing the broader scientific landscape.

More information: Kohei Noda et al, Advancing extrapolative predictions of material properties through learning to learn using extrapolative episodic training, Communications Materials (2025).

Source code for E2T:

Journal information: Communications Materials

Provided by Research Organization of Information and Systems

