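Materials informatics, exemplified by materials genome engineering, brings artificial intelligence into materials research. It promises to transform and accelerate materials discovery and is currently a hot research topic. At present, however, the shortage of databases is the biggest bottleneck in the development of materials informatics. Rapidly obtaining materials data that are massive in volume, standardized, and targeted at different material properties remains a major challenge.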
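Professor Cole and colleagues at the University of Cambridge, UK, combined natural language processing with machine learning to propose an effective method that extracts large amounts of data from the scientific literature and integrates them into machine-learning tools, enabling property prediction for new materials. The method first uses an advanced natural language processing toolkit to extract information such as chemical names and phase-transition temperatures from scientific papers; it then automatically standardizes these data to form a database; finally, machine-learning methods predict the properties of new materials on the basis of the constructed database. Impressively, the natural language processing step automatically extracted and generated as many as 20,400 valid database entries on magnetic and superconducting materials from a large body of scientific literature (74,000 papers). Using these databases, and taking oxide perovskites and phosphide materials as examples, the authors not only reconstructed the ferromagnetic, antiferromagnetic, and superconducting phase diagrams of the relevant materials but also predicted the superconducting transition temperatures of new materials from the existing data. These results agree well with phase diagrams obtained by direct measurement, demonstrating the validity of the method. The authors have made the resulting magnetic materials database freely available online at http://magneticmaterials.org. The proposed framework obtains databases directly from the scientific literature and applies them to machine learning. It represents an important methodological advance in materials genome engineering and may be extended to database construction and property prediction for other material properties.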
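The first two steps described above (extraction, then standardization into database records) can be illustrated with a toy sketch. This is not the authors' pipeline or their NLP toolkit: it is a simple rule-based stand-in using regular expressions, and every name in it (`FORMULA`, `KIND`, `TEMPERATURE`, `extract_record`, `build_database`, `EXAMPLE_SENTENCES`) as well as the example sentences and their values are hypothetical, chosen only for illustration. The machine-learning prediction step is not shown.

```python
import re

# Assumption: a chemical formula is two or more element-like tokens
# (capital letter, optional lowercase letter, optional number), e.g. LaMnO3.
FORMULA = re.compile(r"\b(?:[A-Z][a-z]?\d*(?:\.\d+)?){2,}\b")

# Assumption: the transition type is signalled by one of these keywords.
KIND = re.compile(r"Curie|N[ée]el|superconducting")

# A number followed by a temperature unit (kelvin or degrees Celsius).
TEMPERATURE = re.compile(r"(?P<value>-?\d+(?:\.\d+)?)\s*(?P<unit>K|°C)")

# Illustrative sentences only; these are not taken from the paper's corpus.
EXAMPLE_SENTENCES = [
    "LaMnO3 has a Néel temperature of 140 K.",
    "The Curie temperature of Fe3O4 is 585 °C.",
    "MgB2 shows a superconducting transition temperature of 39 K.",
    "Phase transitions matter in many devices.",
]


def extract_record(sentence):
    """Return one standardized record, or None if the sentence has no data."""
    formula = FORMULA.search(sentence)
    kind = KIND.search(sentence)
    temp = TEMPERATURE.search(sentence)
    if not (formula and kind and temp):
        return None
    value = float(temp.group("value"))
    if temp.group("unit") == "°C":
        value += 273.15  # standardize every temperature to kelvin
    return {
        "compound": formula.group(0),
        "transition": kind.group(0),
        "temperature_K": round(value, 2),
    }


def build_database(sentences):
    """Collect the records of all sentences that yield one."""
    records = (extract_record(s) for s in sentences)
    return [r for r in records if r is not None]


if __name__ == "__main__":
    for row in build_database(EXAMPLE_SENTENCES):
        print(row)
```

A real literature-mining system has to handle far more than this sketch does, for example multiple compounds per sentence, value ranges, and document-level context. The point here is only the shape of the data flow: free text in, unit-normalized records out, ready to serve as training data for a property-prediction model.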
The paper was recently published in npj Computational Materials 6: 18 (2020). Its English title and abstract follow, and the PDF is freely available at https://www.nature.com/articles/s41524-020-0287-8.
Magnetic and superconducting phase diagrams and transition temperatures predicted using text mining and machine learning
Callum J. Court & Jacqueline M. Cole
Predicting the properties of materials prior to their synthesis is of great importance in materials science. Magnetic and superconducting materials exhibit a number of unique properties that make them useful in a wide variety of applications, including solid oxide fuel cells, solid-state refrigerants, photon detectors and metrology devices. In all these applications, phase transitions play an important role in determining the feasibility of the materials in question. Here, we present a pipeline for fully integrating data extracted from the scientific literature into machine-learning tools for property prediction and materials discovery. Using advanced natural language processing (NLP) and machine-learning techniques, we successfully reconstruct the phase diagrams of well-known magnetic and superconducting compounds, and demonstrate that it is possible to predict the phase-transition temperatures of compounds not present in the database. We provide the tool as an online open-source platform, forming the basis for further research into magnetic and superconducting materials discovery for potential device applications.