Trihalomethane prediction model for water supply system based on machine learning and Log-linear regression

This study compared three modeling approaches for predicting the concentrations of total trihalomethanes (T-THMs), bromodichloromethane (BDCM), and dibromochloromethane (DBCM): a random forest regression (RFR) model, a support vector regression (SVR) model, and a Log-linear regression model. All three used nine water quality parameters as input variables. The models were developed and tested on a dataset of 175 samples collected from a water treatment plant. With its optimal parameter combination, the RFR model outperformed the Log-linear regression model in predicting T-THM concentrations (N25 = 82–88%, rp = 0.70–0.80). The SVR model performed slightly better than the RFR model in predicting BDCM concentrations (N25 = 85–98%, rp = 0.70–0.97). For T-THMs and DBCM, the RFR model performed better than both of the other models. The study concludes that the RFR model is superior overall to the SVR and Log-linear regression models and could be used to monitor THM concentrations in water supply systems.
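The abstract reports model accuracy with two scores, N25 and rp, without defining them. The sketch below shows one plausible way to compute them. It assumes two things the abstract does not state: that N25 is the percentage of samples whose predicted concentration falls within ±25% of the measured value, and that rp is the Pearson correlation coefficient between measured and predicted values. The function names `n25` and `pearson_r` are hypothetical, and the sample values are made up for illustration. They are not data from the study.

```python
"""Hypothetical helpers for the two accuracy scores quoted in the abstract.

Assumptions (not stated in the abstract):
  * N25 = percentage of samples whose prediction lies within +/-25% of the
    measured concentration.
  * rp  = Pearson correlation coefficient between measured and predicted values.
"""
from math import sqrt
from typing import Sequence


def n25(measured: Sequence[float], predicted: Sequence[float], tol: float = 0.25) -> float:
    """Return the percentage of predictions within +/-tol (relative) of the measurement."""
    if len(measured) != len(predicted) or len(measured) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    # A sample with a measured value of 0 has no meaningful relative error,
    # so it is counted as a miss here (an assumption of this sketch).
    hits = sum(
        1
        for m, p in zip(measured, predicted)
        if m != 0 and abs(p - m) <= tol * abs(m)
    )
    return 100.0 * hits / len(measured)


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Return the Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)


if __name__ == "__main__":
    # Made-up illustrative concentrations (ug/L), NOT data from the study.
    measured = [40.0, 55.0, 62.0, 30.0]
    predicted = [44.0, 50.0, 80.0, 31.0]
    print(f"N25 = {n25(measured, predicted):.0f}%")
    print(f"rp  = {pearson_r(measured, predicted):.2f}")
```

With these illustrative values, three of the four predictions fall inside the ±25% band, so the script prints `N25 = 75%`. Reporting N25 alongside rp is useful because the two scores capture different things. A model can track the trend well (high rp) while still missing individual samples by more than 25%, and the reverse can also happen.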
Source: Environmental Geochemistry and Health - Category: Environmental Health Source Type: research