PREDICTION OF FORMATION ENERGY USING TWO-STAGE MACHINE LEARNING BASED ON CLUSTERING

  • Xingyue Fan, School of Computer Engineering and Science, Shanghai University, Shanghai 200444, China
Keywords: ABO3-type perovskites, formation energy, hierarchical clustering, regression model

Abstract

The formation energy (Hf) is an important property associated with the thermodynamic stability of ABO3-type perovskites. In this work, a two-stage machine-learning approach based on hierarchical clustering and regression was designed to improve the prediction of the density-functional theory (DFT) Hf of ABO3-type perovskites. A global dataset was partitioned into Cluster 1 and Cluster 2 using the Calinski-Harabasz index (CHI). To compare the prediction performance for Hf, decision tree regression (DTR), gradient boosted regression trees (GBRT), random forest regression (RFR) and extra tree regression (ETR) were each applied to build models of Cluster 1, Cluster 2 and the global dataset. All four regression models of Cluster 1 had a higher R2 and a lower MSE and MAE than the corresponding models of the global dataset, while the models of Cluster 2 performed worse. The GBRT model of Cluster 1 achieved an R2 of 0.917, with an MSE of 0.033 eV/atom and an MAE of 0.125 eV/atom. We further validated and compared the generalization ability of the models by predicting the Hf of ABO3-type perovskites not present in the training set. The two-stage machine-learning models proposed here can provide useful guidance for accelerating the exploration of materials with desired properties.
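The quantities named in the abstract (CHI for judging a partition, and R2, MSE and MAE for judging a regression model) can be illustrated with a minimal sketch. The formulas below are the standard textbook definitions, not taken from the paper itself, and the function names `calinski_harabasz` and `regression_metrics` as well as the toy data are hypothetical. The sketch uses only the Python standard library and does not reproduce the clustering or the tree-based regressors used in the study.

```python
from statistics import fmean


def calinski_harabasz(points, labels):
    """Standard CHI: [B / (k - 1)] / [W / (n - k)].

    B is the between-cluster and W the within-cluster sum of squared
    distances; points are equal-length tuples of descriptor values.
    """
    n = len(points)
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    k = len(clusters)
    if k < 2 or k >= n:
        raise ValueError("CHI needs 2 <= number of clusters < number of points")

    def centroid(pts):
        return [fmean(column) for column in zip(*pts)]

    def sqdist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    overall = centroid(points)
    between = within = 0.0
    for pts in clusters.values():
        c = centroid(pts)
        between += len(pts) * sqdist(c, overall)
        within += sum(sqdist(p, c) for p in pts)
    if within == 0.0:
        return float("inf")  # every cluster is a single repeated point
    return (between / (k - 1)) / (within / (n - k))


def regression_metrics(y_true, y_pred):
    """Return R2, MSE and MAE for paired true and predicted values."""
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    mse = sum(r * r for r in residuals) / n
    mae = sum(abs(r) for r in residuals) / n
    mean_true = fmean(y_true)
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0.0:
        raise ValueError("R2 is undefined when all true values are equal")
    return {"R2": 1.0 - (mse * n) / ss_tot, "MSE": mse, "MAE": mae}


# Toy data (hypothetical): two well-separated groups in a 2-D descriptor space.
points = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
good_labels = [0, 0, 0, 1, 1, 1]   # matches the visible grouping
mixed_labels = [0, 1, 0, 1, 0, 1]  # ignores the grouping

print("CHI, good partition :", calinski_harabasz(points, good_labels))
print("CHI, mixed partition:", calinski_harabasz(points, mixed_labels))
print(regression_metrics([1.0, 2.0, 3.0, 4.0], [1.5, 2.0, 2.5, 4.0]))
```

A larger CHI indicates compact, well-separated clusters, so candidate partitions can be ranked by it. In a two-stage workflow of the kind described above, `regression_metrics` would be evaluated separately for the model of each cluster and for the model of the global dataset, and the resulting values compared.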

Published
2021-04-15
How to Cite
Fan X. PREDICTION OF FORMATION ENERGY USING TWO-STAGE MACHINE LEARNING BASED ON CLUSTERING. MatTech [Internet]. 2021 Apr 15 [cited 2024 Nov 9];55(2):263-8. Available from: https://mater-tehnol.si/index.php/MatTech/article/view/127