March 9th, 2022

Application of Machine Learning (ML) and Deep Learning (DL) Techniques in Accelerating Materials Discovery

In recent years, the application of ML and DL techniques across many fields of science has enabled researchers to uncover interesting and useful insights. In materials science specifically, researchers are constantly working to design new materials for various end-use applications. Enormous amounts of data on a wide variety of materials are available in the public domain, and these data can be used to accelerate the materials discovery process. Improving the performance of a specific material for a given application requires optimizing several factors or parameters that influence that performance, which is tedious to repeat for every new material being designed. Moreover, in some cases the number of materials that can be generated is practically unlimited, so it is not feasible to screen all of them and identify the best-performing material for the application. This is where ML and DL techniques come in handy. This article discusses one such application: using ML techniques to predict the gas separation and gas storage performance of porous materials.

Metal-organic frameworks (MOFs) are a class of porous materials that are highly promising for industrial applications such as separation of gas mixtures, storage of natural gas and hydrogen, carbon capture, and catalysis. These materials can be tailor-made from a variety of secondary building blocks, which are essentially metal nodes and organic linkers. In principle, recombining these building blocks can give rise to a practically unlimited number of MOF structures. Several databases of computationally generated MOF structures are reported in the literature, and more can be generated. As these databases grow, screening them with molecular simulations becomes difficult because of the huge computational cost. It would therefore be very helpful to have predictive models that can estimate the performance of a newly designed MOF structure with reasonable accuracy. A typical process for predicting new MOF structures for a specific target application is shown in Figure 1.

Storing enough hydrogen on board to run a vehicle is a challenging problem. MOFs are potential candidates for hydrogen storage; however, reaching the required storage capacity demands an appropriately designed MOF structure. In an early investigation, Anderson and coworkers demonstrated how to predict the hydrogen storage capacity of a MOF structure by training an artificial neural network [1]. Their network predicted hydrogen loadings in MOFs at different temperatures and pressures fairly accurately. It was trained on simulated data for a MOF database that the authors generated. The descriptors were structural and chemical properties of the MOFs: void fraction, framework density, largest cavity diameter, pore limiting diameter, volumetric surface area, alchemical catecholate site number density, and the epsilon for the interaction of hydrogen with the alchemical sites. These descriptors significantly influence the hydrogen storage capacity of a MOF. The authors explored different network architectures using a systematic grid search and selected one with two hidden layers and twenty nodes per hidden layer. This network was then trained and validated using 5-fold cross-validation repeated 10 times. The resulting neural network reproduced the molecular simulation data with reasonable accuracy.
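To make the validation scheme concrete, here is a minimal sketch of how 5-fold cross-validation repeated 10 times partitions a dataset. It is not the authors' code: the function name `repeated_kfold`, the seed, and the sample count of 23 are assumptions chosen for illustration, and only index splitting is shown, not the training of the network itself.

```python
import random

def repeated_kfold(n_samples, n_splits=5, n_repeats=10, seed=0):
    """Yield (train_idx, test_idx) pairs for repeated k-fold cross-validation.

    Hypothetical helper for illustration; not taken from the cited study.
    """
    rng = random.Random(seed)
    for _ in range(n_repeats):
        indices = list(range(n_samples))
        rng.shuffle(indices)  # a fresh shuffle for every repeat
        # Spread any remainder over the first folds so fold sizes differ by at most 1.
        base, extra = divmod(n_samples, n_splits)
        start = 0
        for fold in range(n_splits):
            stop = start + base + (1 if fold < extra else 0)
            test_idx = indices[start:stop]
            train_idx = indices[:start] + indices[stop:]
            yield train_idx, test_idx
            start = stop

# 23 samples is an arbitrary, assumed dataset size.
splits = list(repeated_kfold(n_samples=23))
print(len(splits))  # number of train/test splits generated
```

Within each repeat, every sample is used for testing exactly once, and the model would be retrained on each training subset. Averaging the score over all splits gives a less noisy accuracy estimate than a single 5-fold run. Libraries such as scikit-learn provide the same scheme as `RepeatedKFold`.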

Another important gas storage problem that researchers have been focusing on is storing enough methane for use as an automobile fuel. Here, too, MOFs are promising adsorbent materials. Researchers have been trying to identify or design the MOF structure that can store enough methane to run a vehicle over a sufficiently long distance. This is again a challenging task, because various factors influence the methane storage capacity of a MOF structure, and ML models have been developed to predict the performance of newly designed structures. In one investigation, Fernandez and coworkers employed ML tools to predict the performance of a MOF from its descriptors, which are structural and chemical properties of the MOF structure [2]. They considered a hypothetical MOF database containing ~130000 structures and developed several models: multilinear regression (MLR) models, decision trees (DTs), and nonlinear support vector machines (SVMs). The authors used ~10000 MOFs to train the models and the rest to test them and evaluate their accuracy. The MLR models were constructed using six descriptors: volumetric surface area, void fraction, dominant pore diameter, gravimetric surface area, maximum pore size, and the density of the structure. However, it was found that methane storage at 35 bar and 100 bar can be predicted well using only the three most influential descriptors: void fraction (VF), dominant pore diameter (DP), and gravimetric surface area (Sg). The model that they constructed to predict the methane storage at 35 bar is given by

$$U_{35} = 391.6180\,\mathrm{VF} - 9.3361\,\mathrm{DP} - 0.0161\,S_g + 1.4954 \tag{1}$$

where $U_{35}$ is the methane uptake at a pressure of 35 bar.

The coefficient of determination of this model was R² = 0.795. Similarly, the model to predict the methane storage at 100 bar is given by

$$U_{100} = 390.9582\,\mathrm{VF} - 6.1908\,\mathrm{DP} - 0.0044\,S_g - 3.2607 \tag{2}$$

where $U_{100}$ is the methane uptake at a pressure of 100 bar.
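Equations (1) and (2) are simple enough to evaluate directly. The sketch below does so in Python. The function names and the example descriptor values are assumptions made for illustration; the text does not state the units of the descriptors or of the uptake, so inputs must be supplied in whatever units the original study used.

```python
def uptake_35_bar(void_fraction, dominant_pore_diameter, gravimetric_surface_area):
    """Equation (1): MLR estimate of methane uptake at 35 bar."""
    return (391.6180 * void_fraction
            - 9.3361 * dominant_pore_diameter
            - 0.0161 * gravimetric_surface_area
            + 1.4954)

def uptake_100_bar(void_fraction, dominant_pore_diameter, gravimetric_surface_area):
    """Equation (2): MLR estimate of methane uptake at 100 bar."""
    return (390.9582 * void_fraction
            - 6.1908 * dominant_pore_diameter
            - 0.0044 * gravimetric_surface_area
            - 3.2607)

# Hypothetical descriptor values, chosen only to exercise the formulas.
vf, dp, sg = 0.80, 10.0, 4000.0
print(uptake_35_bar(vf, dp, sg))
print(uptake_100_bar(vf, dp, sg))
```

In both models the void fraction carries a positive coefficient, while the dominant pore diameter and gravimetric surface area carry negative ones. The surface area term has a much smaller coefficient at 100 bar than at 35 bar.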

The coefficient of determination of the 100 bar model is R² = 0.917. These MLR models, however, could not provide many insights into the rational design of new MOFs. Therefore, Fernandez et al. constructed a decision tree regression model from simple binary rules that can be followed during MOF design to reach a desired property target. From the binary graphs of the optimum DT regressions at 35 bar and 100 bar, the authors derived two general rules for high methane storage capacity in MOFs. According to their DT model, the rule of thumb for high methane storage at 35 bar is a density greater than 0.43 g cm⁻³ and a void fraction greater than 0.52; the rule of thumb for high methane storage at 100 bar is a density greater than 0.33 g cm⁻³ and a void fraction greater than 0.62. Although the MLR and DT models are reasonably accurate, nonlinear multivariate regression is a more powerful predictive approach. The authors trained nonlinear SVMs using dominant pore diameter, void fraction, and gravimetric surface area as descriptors. These models predicted the methane storage at 1 bar, 35 bar, and 100 bar with R² values of 0.721, 0.851, and 0.941, respectively.
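The two decision-tree rules of thumb above can be written as a simple pre-screening filter. The sketch below is an illustration only: the function name `meets_rule_of_thumb` and the three candidate structures are hypothetical, and the thresholds are the ones quoted in the text (density in g cm⁻³, void fraction dimensionless).

```python
# Thresholds quoted in the text: pressure (bar) -> (minimum density, minimum void fraction).
THRESHOLDS = {35: (0.43, 0.52), 100: (0.33, 0.62)}

def meets_rule_of_thumb(density, void_fraction, pressure_bar):
    """Return True if a MOF satisfies the rule of thumb for the given pressure."""
    if pressure_bar not in THRESHOLDS:
        raise ValueError("rules are only quoted for 35 bar and 100 bar")
    min_density, min_void_fraction = THRESHOLDS[pressure_bar]
    return density > min_density and void_fraction > min_void_fraction

# Hypothetical candidates: name -> (density in g cm^-3, void fraction).
candidates = {
    "MOF-A": (0.50, 0.60),
    "MOF-B": (0.40, 0.70),
    "MOF-C": (0.30, 0.80),
}

for pressure in (35, 100):
    passing = [name for name, (rho, vf) in candidates.items()
               if meets_rule_of_thumb(rho, vf, pressure)]
    print(pressure, passing)
```

A filter like this is a coarse first pass. It can discard unpromising structures cheaply before a regression model or a molecular simulation is run on the remainder.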

Hexane isomer separation has long been a challenging problem for the petrochemical industry. This separation is crucial for improving the quality of automobile fuel, because the doubly branched isomers raise the octane number. ML can play a crucial role in identifying the best membrane for separating the hexane isomers into their individual components. An ML-based screening process for the separation of hexane isomers is shown schematically in Figure 2. There are many such instances in which ML and DL models can accelerate the discovery of new materials.

**References:**

[1] Anderson G., Schweitzer B., Anderson R., Gómez-Gualdrón D. A., *J. Phys. Chem. C*, 2019, **123**, 120–130

[2] Fernandez M., Woo T. K., Wilmer C. E., Snurr R. Q., *J. Phys. Chem. C*, 2013, **117**, 7681–7689