Comparative Analysis of Machine Learning Models and Explainable Artificial Intelligence for Predicting Wastewater Treatment Plant Variables
Fuad Bin Nasir †, Jin Li †
Department of Civil and Environmental Engineering, University of Wisconsin-Milwaukee, Milwaukee, WI 53211, USA
† These authors contributed equally to this work.
Academic Editor: José L. Segovia-Juárez
Special Issue: Artificial Intelligence in Environmental Research
Received: July 30, 2024 | Accepted: October 13, 2024 | Published: October 17, 2024
Adv Environ Eng Res 2024, Volume 5, Issue 4, doi:10.21926/aeer.2404020
Recommended citation: Nasir FB, Li J. Comparative Analysis of Machine Learning Models and Explainable Artificial Intelligence for Predicting Wastewater Treatment Plant Variables. Adv Environ Eng Res 2024; 5(4): 020; doi:10.21926/aeer.2404020.
© 2024 by the authors. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium or format, provided the original work is correctly cited.
Abstract
Increasing urban wastewater and rigorous discharge regulations pose significant challenges for wastewater treatment plants (WWTPs) to meet regulatory compliance while minimizing operational costs. This study explores the application of several machine learning (ML) models, specifically Artificial Neural Networks (ANN), Gradient Boosting Machines (GBM), Random Forests (RF), eXtreme Gradient Boosting (XGBoost), and a hybrid RF-GBM model, in predicting important WWTP variables such as Biochemical Oxygen Demand (BOD), Total Suspended Solids (TSS), Ammonia (NH₃), and Phosphorus (P). Several feature selection (FS) methods were employed to identify the most influential WWTP variables. To enhance the interpretability of the ML models and to understand the impact of variables on prediction, two widely used explainable artificial intelligence (XAI) methods, Local Interpretable Model-Agnostic Explanations (LIME) and SHapley Additive exPlanations (SHAP), were investigated. Results derived from the FS and XAI methods were compared to explore their reliability. The model performance results revealed that ANN, GBM, XGBoost, and RF-GBM have great potential for variable prediction, with low error rates and strong correlation coefficients (R² values up to 1.00 on the training set and 0.98 on the test set). The study also revealed that the XAI methods identify common influential variables in each model's prediction. This is a novel attempt to obtain an overview of both LIME and SHAP explanations of ML models for WWTP variable prediction.
Keywords
Machine learning; wastewater; explainable artificial intelligence; local interpretable model-agnostic explanations; Shapley additive explanations
1. Introduction
Wastewater treatment plants (WWTPs) play an essential role in safeguarding the aquatic environment by processing municipal and industrial sewage. Increasing amounts of urban wastewater and demands for clean water present substantial challenges to WWTP operators in meeting regulatory effluent standards and reducing operating costs [1,2,3,4]. Moreover, the complexity of the treatment process demands a high level of precision to achieve the required standard limits for various variables. To enhance effluent quality and comply with regulatory standards while minimizing operation and maintenance costs, the implementation of advanced technologies is crucial. WWTPs can improve their decision-making processes and optimize resource allocation by utilizing machine learning (ML), a subfield of artificial intelligence (AI), ultimately helping to achieve a sustainable treatment system. The application of ML in predicting WWTP variables has been effective [4,5,6,7,8,9,10,11,12,13,14,15,16,17,18]. ML models have also been used to regulate WWTP operation, resulting in notable energy savings [19]. According to studies [20,21], ML can process substantial datasets with impressive precision.
As WWTPs are complex and comprise several concurrent nonlinear mechanisms, researchers have investigated a wide range of variables, such as water quality, water quantity, and meteorological data, in predicting WWTP variables using various ML models [14]. Biochemical Oxygen Demand (BOD) and Total Suspended Solids (TSS) are among the most influential variables in a WWTP. They are commonly investigated together because they share many similarities, including the difficulty of their measurement, the limited information that can be obtained from them, the potential for complex model nonlinearity, and their importance in prediction models [10]. Other common pollutants in wastewater are ammonia (NH₃) and phosphorus (P), both of which need to be reduced to the required levels before being released into the environment [22]. A thorough understanding of influent and effluent nutrient characteristics is essential for optimizing treatment operations [23,24]. Therefore, accurate prediction of influent and effluent variables (BOD, NH₃, P, and TSS) through ML can facilitate efficient adjustment of operational parameters, such as aeration rates or chemical dosages, to meet effluent quality standards.
ML-based approaches are being employed specifically for the monitoring and design of complex nonlinear processes at WWTPs [25]. Traditional methods of variable measurement in WWTPs involve time-consuming laboratory analysis. Advancements in sensor technologies and online monitoring systems have introduced real-time alternatives. The difficulty of measuring BOD online and the time required for laboratory measurements highlight the importance of developing predictive models that can reduce the need for manual measurements. ML methods extract correlations between variables from historical data, relying on the relationships learned between the input and output datasets. Previous studies applying various ML models to predict WWTP variables report large variability in results, with R² ranging from 0.48 to 0.99 for BOD, 0.63 to 0.98 for TSS, 0.32 to 0.84 for NH₃, and 0.28 to 0.93 for P [2,6,13,16,26,27,28,29,30,31,32,33,34]. Moreover, relying solely on ML models without understanding the context of their predictions is not ideal. The recent trend toward using ML models for variable prediction requires explainability in addition to prediction accuracy. This is especially important in WWTPs, where operators need to understand the reasons behind model predictions to increase their confidence in real-world applications. Questions about the rationale behind ML predictions, the basis for trusting them, and methods for error correction are particularly relevant in WWTPs, where the reliability of ML practices is critical. While many studies have focused on predicting WWTP variables using ML, research on implementing explainable artificial intelligence (XAI) is still developing. Some recent studies have integrated XAI to interpret ML output [14,35,36]. However, investigation of multiple XAI methods across various ML models is lacking. Therefore, this study is a novel attempt to investigate multiple XAI approaches to enhance the interpretability of ML model applications in WWTPs.
This study applied XAI methods to improve the interpretability of ML models in predicting influential variables of a WWTP. Various feature selection and XAI methods were employed to identify the importance of input variables in ML model performance. We collected a broad range of WWTP variables, encompassing water quality, water quantity, and electrical data. The performance of several standalone ML models, i.e., artificial neural network (ANN), gradient boosting machine (GBM), random forest (RF), and eXtreme gradient boosting (XGBoost), and a hybrid RF-GBM model was tested and compared on historical datasets in predicting influent and effluent BOD, NH₃, P, and TSS. This study provides a better understanding of ML model performance in predicting WWTP variables with the help of XAI, which aids in making informed decisions to optimize treatment plant performance.
2. Materials and Methods
2.1 Data Collection
The data were collected from a WWTP in Milwaukee, Wisconsin, USA, that treats wastewater from industrial, municipal, and domestic sources. Water quality, water quantity, and electrical data (daily and hourly) were collected from 1 January 2019 to 31 December 2023. After data processing, the following variables were considered in the study: Influent BOD (BODᵢ), Effluent BOD (BODₑ), Influent Flow (Flowᵢ), Effluent Flow (Flowₑ), Influent Ammonia (NH₃)ᵢ, Effluent Ammonia (NH₃)ₑ, Influent TSS (TSSᵢ), Effluent TSS (TSSₑ), Influent Phosphorus (Pᵢ), Effluent Phosphorus (Pₑ), TSSₑ Removed, BODₑ Removed, Primary Sludge, Iron Dose, Detention Time, Aeration (Aer) Basin Temp, DO Set Pt, Sludge Volume Index (SVI), Mean Cell Residence Time (MCRT), Waste Activated Sludge (WAS), WAS Flow, pHₑ, Tempₑ, Total Residual Chlorine (TRC), Gravity Belt Thickening (GBT) Polymer Used, Fecal Coliforms, E. coli, Total Electricity (Elec) Generated, and Total Blower Elec Used. Time series of the variables can be found in Figure 1. A list of abbreviations is shown in Table S1 of the additional materials.
Figure 1 Time series of variables (top left: Flow; top right: (NH₃); middle left: BOD; middle right: TSS; bottom left: P; bottom right: BOD and TSS removed (%)).
2.2 Data Pre-Processing
Typically, sensor-collected data contain anomalies related to the recording process. During examination of the dataset for missing or inaccurate data, several anomalies were identified through human observation and were replaced with average values. Any remaining missing values were likewise filled with the average value of the respective variable, and hourly variables were converted to daily variables. The variables Flowᵢ and Flowₑ exhibited a high correlation (0.9); to minimize multicollinearity, only Flowᵢ was included in the study. Consequently, 28 of the 29 collected variables, comprising 51,128 data entries, were considered for the analysis. Statistical properties of the data are presented in Table 1. Although eliminating redundant or irrelevant features that do not significantly affect the prediction lowers noise and enhances model performance [28], it is crucial to consider the context in which the model is used. Variables such as DO set points controlled by blowers may have a more indirect impact on prediction accuracy, as they influence the performance of the overall treatment process rather than correlating directly with the target variables. Moreover, we did not identify or remove outliers, in order to capture the whole picture of the analysis, as suggested by other studies [37]. Therefore, we considered the full WWTP dataset, which includes the most common input variables found in relevant papers, to run the ML models [38].
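The pre-processing steps above can be expressed as a short pandas sketch. This is illustrative only: the file name `wwtp_daily_2019_2023.csv` and the column labels `Date`, `Flow_i`, and `Flow_e` are hypothetical stand-ins, since the plant's actual records are not public.

```python
import pandas as pd

# Hypothetical file and column names; the MMSD dataset itself is not public.
df = pd.read_csv("wwtp_daily_2019_2023.csv", parse_dates=["Date"], index_col="Date")

# Replace missing (and manually flagged anomalous) entries with column means.
df = df.fillna(df.mean(numeric_only=True))

# Flow_i and Flow_e are highly correlated (~0.9); keep only Flow_i
# to minimize multicollinearity, leaving 28 of the 29 variables.
if df["Flow_i"].corr(df["Flow_e"]) > 0.85:
    df = df.drop(columns=["Flow_e"])
```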
Table 1 Data set statistical properties.
2.3 Feature Selection
Several feature selection (FS) methods were employed to identify the most significant variables for predicting the target variables: analysis of variance (ANOVA), least absolute shrinkage and selection operator (LASSO), mutual information (MI), random forest (RF), and Pearson correlation (PC) [14,22]. ANOVA F-values are non-negative and can theoretically range from 0 to infinity. LASSO coefficients can be negative or positive, whereas PC scores range from -1 to 1. MI scores range upward from 0, which indicates no shared information. RF generates feature importance scores from 0 to 1, where 0 means the feature was not used in the prediction and 1 means the feature alone perfectly predicts the output. These traditional FS methods were chosen so that their results could be compared with the XAI method outputs.
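As a sketch of how the five FS scores could be computed side by side with scikit-learn, assuming the pre-processed DataFrame `df` from Section 2.2; the target column `BOD_i` and the LASSO penalty `alpha=0.1` are illustrative choices, not the study's settings:

```python
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import f_regression, mutual_info_regression
from sklearn.linear_model import Lasso

X, y = df.drop(columns=["BOD_i"]), df["BOD_i"]  # example target variable

scores = pd.DataFrame({
    "ANOVA": f_regression(X, y)[0],              # F-values, >= 0
    "LASSO": Lasso(alpha=0.1).fit(X, y).coef_,   # signed coefficients
    "MI": mutual_info_regression(X, y),          # >= 0
    "RF": RandomForestRegressor(n_estimators=200, random_state=0)
          .fit(X, y).feature_importances_,       # in [0, 1]
    "PC": X.corrwith(y),                         # in [-1, 1]
}, index=X.columns)

print(scores.sort_values("RF", ascending=False).head())  # top-ranked features
```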
2.4 SHapley Additive exPlanations
SHapley Additive exPlanations (SHAP) analysis is a recently developed XAI method based on game theory that interprets the behavior of ML models [14,38,39,40]. It explains a model's predictions by showcasing the relative influence of the input variables on model performance [35]. Using Shapley values from game theory, a value is attributed to each feature [41,42,43] as follows:

\[ \phi_i=\sum_{s\subseteq N\setminus\{i\}}\frac{|s|!\,(n-|s|-1)!}{n!}\left[f_x(s\cup\{i\})-f_x(s)\right] \tag{1} \]

where $\phi_i$ is the SHAP value of the $i$-th input feature, $N$ is the set of all $n$ input features, $s$ is a feature subset not containing feature $i$, $|s|$ is the number of elements in $s$, $f_x(s\cup\{i\})$ is the model trained with that feature present, and $f_x(s)$ is the model trained with the feature withheld.
Features at higher positions in the SHAP summary plot have greater importance for the model's performance. A positive SHAP value indicates that increasing the feature's value typically boosts the model's prediction, whereas a negative value implies that increasing the feature's value tends to reduce it. SHAP summary plots are used in WWTPs to interpret model output [14]. In this study, we chose the commonly used SHAP summary plot to investigate how the top features in the dataset impact the models' output.
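A minimal sketch of generating a SHAP summary plot with the `shap` library, assuming a fitted tree-based regressor (e.g., the tuned XGBoost or GBM model from Section 2.6) named `model` and a test feature matrix `X_test`; both names are placeholders:

```python
import shap

# TreeExplainer computes exact SHAP values efficiently for tree ensembles.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X_test)

# Beeswarm summary plot: features ranked by mean |SHAP value|, each dot
# colored by the feature's value (red = high, blue = low), as in Figure 3.
shap.summary_plot(shap_values, X_test)
```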
2.5 Local Interpretable Model-Agnostic Explanation
Local Interpretable Model-Agnostic Explanation (LIME) is an XAI tool that interprets black-box ML models by fitting a local, interpretable model to explain each individual prediction [36]. The LIME explanation is obtained from the following equation:
\[ \xi(x)=\underset{g\in G}{\operatorname{argmin}}\;\mathcal{L}(f,g,\pi_x)+\Omega(g) \tag{2} \]
where $\mathcal{L}$ is the fidelity function, $G$ is the family of candidate explanation models, and $\Omega(g)$ is a complexity measure; $g$ is the explanation model for instance $x$, $\pi_x$ is the proximity measure defining the locality around $x$, and $f$ is the original model.
LIME identifies the top features contributing most to the model's predictions, associating each feature with a weight that indicates its impact on the prediction. Features with positive weights have a positive effect on the prediction, while those with negative weights have a negative effect. The magnitude of the weight reflects the strength of the feature's influence on the prediction. Features are ranked by their importance, with the most influential ones listed first. A detailed explanation of LIME is provided by [44].
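A corresponding sketch with the `lime` package, explaining a single regression prediction (the 50th test instance, as in Figure 2). The names `model`, `X_train`, `X_test`, and `feature_names` are assumed from the earlier steps, with the feature matrices as NumPy arrays:

```python
from lime.lime_tabular import LimeTabularExplainer

explainer = LimeTabularExplainer(
    X_train, feature_names=feature_names, mode="regression")

# Explain one prediction; LIME fits a local surrogate around this instance.
exp = explainer.explain_instance(X_test[50], model.predict, num_features=10)

# Signed local weights: positive pushes the prediction up, negative down.
for feature, weight in exp.as_list():
    print(f"{feature}: {weight:+.3f}")
```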
2.6 ML Models
To predict BODᵢ, BODₑ, (NH₃)ᵢ, (NH₃)ₑ, Pᵢ, Pₑ, TSSᵢ, and TSSₑ, several ML models, i.e., ANN, GBM, RF, RF-GBM, and XGBoost, were applied. These models were chosen because of their widespread application in water quality variable prediction. An ANN consists of layered networks of interconnected nodes, with multiple hidden layers that allow the identification of intricate relationships and patterns in the data [45,46]; however, it requires substantial data and careful hyperparameter tuning. In this study, we explored different configurations of hidden layers, activation functions, and optimization strategies to train the ANN model. GBM is a boosting approach that combines several weak prediction models, typically decision trees, to produce a powerful predictive model [47]; it iteratively adds new models to fix the errors made by the previous trees. RF is an ensemble learning technique that uses several decision trees to produce predictions [48,49,50,51,52]. For both GBM and RF, we tested different values for the learning rate (GBM only), number of trees, tree depth, minimum samples per leaf, and minimum samples per split to identify the combination that optimized model performance on the training data. RF-GBM is a hybrid model that combines the principles, and thereby the advantages, of RF and GBM to improve prediction performance. XGBoost, a newly developed version of the gradient boosting decision tree algorithm, has the potential to reduce overfitting and increase robustness [38]; several of its hyperparameters were also tuned to find the optimal configuration. For all ML models, GridSearchCV was employed to identify the best combination of hyperparameters by testing multiple combinations using cross-validation.
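The tuning loop and one possible hybridization are sketched below with scikit-learn, assuming training arrays `X_train`, `y_train` from the split described in Section 2.7. The grids are illustrative (the study's exact search spaces are not listed), and the stacking combination is only one plausible reading of the RF-GBM hybrid, whose exact combination rule is not specified here.

```python
from sklearn.ensemble import (GradientBoostingRegressor, RandomForestRegressor,
                              StackingRegressor)
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import GridSearchCV

# Illustrative GBM grid covering the hyperparameters named above.
gbm_grid = {"learning_rate": [0.05, 0.1], "n_estimators": [100, 300],
            "max_depth": [3, 5], "min_samples_leaf": [1, 5],
            "min_samples_split": [2, 10]}
gbm = GridSearchCV(GradientBoostingRegressor(random_state=0),
                   gbm_grid, cv=5, scoring="r2").fit(X_train, y_train)

rf_grid = {"n_estimators": [100, 300], "max_depth": [None, 10],
           "min_samples_leaf": [1, 5], "min_samples_split": [2, 10]}
rf = GridSearchCV(RandomForestRegressor(random_state=0),
                  rf_grid, cv=5, scoring="r2").fit(X_train, y_train)

# One plausible RF-GBM hybrid: stack the two tuned base learners
# under a simple linear meta-model.
hybrid = StackingRegressor(
    estimators=[("rf", rf.best_estimator_), ("gbm", gbm.best_estimator_)],
    final_estimator=LinearRegression()).fit(X_train, y_train)
```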
2.7 Model Training and Evaluation
The dataset was divided into training and testing sets so that the models were trained on a representative subset of the data and evaluated on unseen data, providing a reliable measure of their generalization capability [53]. Two commonly recommended train-test split ratios (90:10 and 80:20) were used, as suggested by other relevant studies [14,37,52]. The training set was used to train the ML models, while the testing set served as an independent dataset to assess their performance. Validation is a crucial step of the model development process to ensure that the developed model is accurate enough for its intended use [54,55,56]. For this purpose, 5-fold cross-validation was implemented, dividing the dataset into five equal parts [57]; in each iteration, a different fold was used as the test set, while the remaining folds constituted the training set. Because model performance depends on the hyperparameters used during training, a grid search was employed to find the hyperparameter configuration that delivered the best performance.
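A sketch of the split and validation protocol described above (90:10 shown; `test_size=0.2` gives the 80:20 variant). Here `X`, `y`, and `model` are placeholders for the feature matrix, a target variable, and one of the configured estimators:

```python
from sklearn.model_selection import KFold, cross_val_score, train_test_split

# Hold out 10% of the data as an independent test set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.1, random_state=0)

# 5-fold cross-validation on the training portion: each fold serves once
# as the validation set while the other four folds are used for fitting.
cv = KFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(model, X_train, y_train, cv=cv, scoring="r2")
print(f"CV R2: {scores.mean():.3f} +/- {scores.std():.3f}")
```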
To evaluate a regression model's performance, several metrics can be used depending on the specific task, data characteristics, and circumstances [58,59,60]. In this study, three widely used assessment metrics, R-squared (R²), Root Mean Squared Error (RMSE), and Mean Absolute Error (MAE), were used to evaluate the performance of the ML models. MAE measures the average magnitude of the errors between predicted and actual values (eq 3). R² quantifies the proportion of variance explained by the model (eq 4), whereas RMSE denotes the average size of the residuals (eq 5). Together, these metrics reveal the precision, goodness-of-fit, and accuracy of the produced ML models. Higher values of R² and lower values of the error measures indicate better prediction performance and accuracy [61].
\[ MAE=\frac{\sum_{i=1}^n|\hat{y}_i-y_i|}n \tag{3} \]
\[ R^2=1-\frac{\sum_{i=1}^n(\hat{y}_i-y_i)^2}{\sum_{i=1}^n(y_i-\bar{y})^2} \tag{4} \]
\[ RMSE=\sqrt{\frac1n\sum\nolimits_{i=1}^n(\hat{y}_i-y_i)^2} \tag{5} \]
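These three metrics map directly onto scikit-learn helpers; a sketch assuming the fitted `model` and the held-out `X_test`, `y_test` from the split above:

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

y_pred = model.predict(X_test)
mae = mean_absolute_error(y_test, y_pred)           # eq. (3)
r2 = r2_score(y_test, y_pred)                       # eq. (4)
rmse = np.sqrt(mean_squared_error(y_test, y_pred))  # eq. (5)
print(f"MAE={mae:.2f}, R2={r2:.2f}, RMSE={rmse:.2f}")
```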
3. Results
3.1 ML Model Performance
The performance of ML models, including ANN, GBM, RF, XGBoost, and a hybrid RF-GBM, was evaluated using 90:10 and 80:20 train-test splits. The comparison between training and test performance helps to evaluate the models’ generalization ability. An exceptionally high training performance relative to the test performance could be a sign of overfitting. Table 2 shows the model performance metrics for BOD prediction. The performance metrics for all target variables for 90:10 and 80:20 train-test splits are shown in Table S2 of additional materials.
Table 2 Model performance metrics for BOD prediction (90:10 and 80:20 train-test splits).
3.1.1 Train-Test Split (90:10)
The ANN model showed good performance for BODᵢ on the training set but higher errors on the test set. The GBM model exhibited excellent training and test performance, while the RF model demonstrated good training performance but higher test set errors (MAE of 35.51, R² of 0.88, RMSE of 51.07). XGBoost and RF-GBM showed good training and test performance. For BODₑ, the ANN model achieved nearly perfect results on the training set (MAE of 0.09, R² of 1.00, and RMSE of 0.14) and maintained strong performance on the test set (MAE of 0.60, R² of 0.96, and RMSE of 1.25). The GBM model had slightly higher test set errors than ANN, and the RF model had higher test set errors than the other models. XGBoost and RF-GBM maintained good performance on the test set.
The ANN model had reasonable training performance for (NH₃)ᵢ but higher test set errors (MAE of 10.22, R² of 0.82, and RMSE of 3.20). The GBM model showed excellent training performance and moderate test set errors (MAE of 7.52, R² of 0.87, and RMSE of 2.74). The RF model had higher test set errors than GBM, while XGBoost and RF-GBM had the lowest test set errors among the models. For (NH₃)ₑ, the ANN model had good training performance but poor test set performance (MAE of 0.53, R² of 0.33, and RMSE of 1.06). The GBM model showed better test set results (MAE of 0.48, R² of 0.60, and RMSE of 0.82), and the RF, XGBoost, and RF-GBM models showed results similar to GBM.
For Pᵢ, the ANN model had good training results (MAE of 0.45, R² of 0.94, and RMSE of 0.62) and moderate test set errors (MAE of 0.64, R² of 0.84, and RMSE of 0.50). The GBM model maintained good performance on both sets (test MAE of 0.57, R² of 0.86, and RMSE of 0.89), while RF, XGBoost, and RF-GBM showed moderate test set performance. For Pₑ, the ANN model had reasonable training performance but poor test set performance (test MAE of 0.14, R² of 0.42, and RMSE of 0.18). The GBM model had better test set results (MAE of 0.10, R² of 0.65, and RMSE of 0.14), and the RF model had moderate performance with slightly higher test set errors (MAE of 0.11, R² of 0.61, and RMSE of 0.15). XGBoost and RF-GBM showed good performance on both sets.
For TSSᵢ, the ANN model showed good performance on both sets (test MAE of 5.75, R² of 0.99, and RMSE of 9.65). The GBM model had higher test set errors (MAE of 12.20, R² of 0.96, and RMSE of 20.43). The RF model showed significantly higher errors on the test set (MAE of 26.71, R² of 0.82, and RMSE of 41.55). For TSSₑ, the ANN model showed excellent training performance and strong test set results (test MAE of 0.52, R² of 0.95, and RMSE of 0.91). The GBM model had better test set results (MAE of 0.38, R² of 0.97, and RMSE of 0.70). The RF model showed moderate performance with higher test set errors (MAE of 0.60, R² of 0.90, and RMSE of 1.27). XGBoost and RF-GBM maintained good performance for both TSSᵢ and TSSₑ.
3.1.2 Train-Test Split (80:20)
For BODᵢ, the ANN model performed well on the training set, with an MAE of 5.70, R² of 1.00, and RMSE of 8.61, but exhibited higher errors on the test set, with an MAE of 12.21, R² of 0.97, and RMSE of 25.73. The GBM model also showed higher errors on the test set. The RF model demonstrated good training performance but significantly higher errors on the test set (MAE of 38.46, R² of 0.86, and RMSE of 56.95). The XGBoost and RF-GBM models, similar to GBM, had excellent training performance but higher errors on the test set. For BODₑ, the ANN model achieved almost perfect results on the training set (MAE of 0.09, R² of 1.00, and RMSE of 0.14) and maintained strong performance on the test set (MAE of 0.60, R² of 0.96, and RMSE of 1.25). The GBM model exhibited slight errors on the test set (MAE of 0.68, R² of 0.94, and RMSE of 1.35), while the RF model had higher test set errors than the other models. The XGBoost and RF-GBM models maintained good performance on the test set.
For (NH₃)ᵢ, the ANN model had reasonable training performance but higher test set errors (MAE of 2.30, R² of 0.81, and RMSE of 3.16). The GBM model showed excellent training performance and moderate test set errors (MAE of 1.99, R² of 0.85, and RMSE of 2.82), and the RF model had moderate performance with higher test set errors (MAE of 2.09, R² of 0.84, and RMSE of 2.91). The XGBoost and RF-GBM models, similar to GBM, had lower test set errors. For (NH₃)ₑ, the ANN model had good training performance (MAE of 0.11, R² of 0.99, and RMSE of 0.16) but poor test set performance (MAE of 0.46, R² of 0.35, and RMSE of 0.89). The GBM model had better test set results (MAE of 0.47, R² of 0.45, and RMSE of 0.82), and the RF, XGBoost, and RF-GBM models performed well on the test set.
For Pᵢ, the ANN model had good training results (MAE of 0.45, R² of 0.95, and RMSE of 0.62) and moderate test set errors (MAE of 0.62, R² of 0.86, and RMSE of 0.89). The GBM model maintained good performance on both sets (test MAE of 0.57, R² of 0.87, and RMSE of 0.85), and the RF, XGBoost, and RF-GBM models showed performance similar to GBM. For Pₑ, the ANN model had reasonable training and test performance. The GBM model had better test set results (test MAE of 0.11, R² of 0.55, and RMSE of 0.17), and the RF, XGBoost, and RF-GBM models performed well on both sets.
For TSSᵢ, the ANN model performed well on both sets. The GBM model had higher test set errors (MAE of 11.97, R² of 0.95, and RMSE of 20.17), and the RF model had significantly higher test set errors (MAE of 23.06, R² of 0.85, and RMSE of 36.11). The XGBoost model maintained good performance (test MAE of 10.82, R² of 0.96, and RMSE of 18.31), while the RF-GBM model showed balanced results with errors similar to GBM and XGBoost. For TSSₑ, the ANN model had excellent training performance and strong test set results (MAE of 0.56, R² of 0.94, and RMSE of 0.96). The GBM model had better test set results (MAE of 0.33, R² of 0.97, and RMSE of 0.71), and the RF model had moderate performance with higher test set errors (MAE of 0.54, R² of 0.91, and RMSE of 1.24). The XGBoost and RF-GBM models maintained good performance on both sets (MAE of 0.42, R² of 0.96, and RMSE of 0.82).
3.2 Feature Selection Methods
Various FS methods were employed to identify the most significant variables affecting the concentrations of BODᵢ, BODₑ, (NH₃)ᵢ, (NH₃)ₑ, Pᵢ, Pₑ, TSSᵢ, and TSSₑ in the WWTP. Table 3 shows the common features shared by the FS methods for the target variables. For BODᵢ, Pᵢ consistently emerges as the most influential variable across methods; other important variables include BOD Removed (%), TSSᵢ, and (NH₃)ᵢ, which are identified by multiple methods. For BODₑ, TSSₑ is identified as the most significant variable across multiple methods, with BOD Removed (%) frequently highlighted as important. For (NH₃)ₑ, Flowᵢ is the most significant variable across methods, and BODₑ is consistently significant in all methods. For Pᵢ, BODᵢ and TSSᵢ rank first and second across all FS methods, and for Pₑ, TSSₑ and BODₑ rank first and second across all methods. For TSSᵢ, Pᵢ is most significant in multiple methods, and BODᵢ is consistently identified in all methods. For TSSₑ, BOD and TSS Removed (%) are most significant in all methods. The consistency across different FS methods strengthens the reliability of these findings, providing a robust basis for further research and practical applications. The top five most strongly correlated variables for each target variable, based on the five FS methods, are shown in Figure S1 of the additional materials.
Table 3 Common features selected by FS methods.
3.3 XAI
The results of the LIME and SHAP analyses for the target variables revealed the order of feature influence and their effects on the ML models. Figure 2 shows one of the LIME plots: Figure 2(i) and Figure 2(ii) show the variables and their contributions (blue as negative, orange as positive) to BODᵢ and BODₑ, respectively, for the RF-GBM model at the 50th instance. The predicted values of the 50th instance for BODᵢ and BODₑ are 305.95 mg/L and 15.29 mg/L, respectively. According to the figure, BODₑ and TSSₑ show the strongest positive effects on the BODᵢ and BODₑ predictions, respectively. Figure 3 shows one of the SHAP summary plots: Figure 3(i) and Figure 3(ii) show the variables and their contributions to BODᵢ and BODₑ, respectively, for the RF-GBM model. The figure shows that higher values of TSSᵢ (red dots) tend to contribute positively to the BODᵢ prediction, whereas lower values (blue dots) contribute negatively. While BODₑ and (NH₃)ₑ have the highest importance for the BODᵢ and BODₑ predictions, respectively, the SHAP summary plots are indecisive about the direction of influence for multiple variables.
Figure 2 LIME explanation for the RF-GBM model. (i) BODᵢ; the LIME predicted value is 305.95, with a range between 93.84 and 854.07. BODₑ, with a value of 16.00 mg/L, significantly contributes to the predicted BODᵢ. BOD Removed % (94.48), Pᵢ (6.60 mg/L), and TSSᵢ (240.00 mg/L) all play a role in reducing the predicted BODᵢ value. (ii) BODₑ; the LIME predicted value is 15.29, with a range between 4.20 and 57.02. TSSₑ, with a value of 16.00 mg/L, is the most significant feature positively influencing the BODₑ prediction. (NH₃)ₑ, BODᵢ, and Aer Basin Temp negatively affect the prediction.
Figure 3 SHAP explanation for RF-GBM model. (i) BODᵢ (ii) BODₑ.
Multiple variables were shared by both LIME and SHAP. The full variable lists for LIME and SHAP, in order of influence, are shown in Table S3 of the additional materials. Since LIME explicitly provides the positive and negative impacts of variables, signs (positive or negative) are given next to the variable names; the SHAP summary plot was indecisive regarding positive or negative influence in many cases, so no signs are given next to the SHAP-identified variables. For predicting BODᵢ, LIME and SHAP both identified BODₑ, BOD Removed (%), Pᵢ, and TSSᵢ as key variables in all models, although the specific order varied slightly. For BODₑ, both analyses consistently identified TSSₑ and BODₑ Removed (%) as influential variables across all models, with some discrepancies in the order of influence. For TSSᵢ, the two analyses identified similar key variables but differed in their ordering. For TSSₑ, both indicated TSS Removed (%), BODₑ, and Pₑ as significant variables. For (NH₃)ᵢ, both identified Flowᵢ, Pᵢ, and GBT Polymer Used as influential variables. For (NH₃)ₑ, LIME consistently highlighted BODₑ and E. coli as significant variables, with varying impact directions across the models; SHAP's results were less consistent, with BODₑ and E. coli appearing in differing orders of importance. In the prediction of Pᵢ, both analyses identified BODᵢ, (NH₃)ᵢ, and TSS Removed (%) as the key variables, though the order and direction of influence differed. For Pₑ, both identified TSSₑ, BODₑ, Tempₑ, and Aer Basin Temp as important features, with varying order of influence.
4. Discussion
The study investigated the performance of multiple ML models, i.e., ANN, GBM, RF, XGBoost, and RF-GBM, in predicting several influential influent and effluent water quality variables in a WWTP. ANN, GBM, and XGBoost demonstrated significant potential for variable prediction, producing low error rates and strong correlation coefficients (R²). Table 4 shows the model(s) with the highest R² for each target variable, including cases where multiple models achieved the same performance.
Table 4 Model(s) with the highest R² for each target variable.
Based on our findings, GBM can capture the complex interactions among various WWTP variables; for example, it performed particularly well in predicting variables such as BOD and NH₃. This agrees with another study in which GBM outperformed ANN in WWTP variable prediction [22]. Although RF performed very well on training data, overfitting caused poor performance on the test set (unseen data). As an alternative to the RF model, the hybrid RF-GBM model increased accuracy, particularly for predicting BOD and P levels, by utilizing the advantages of both models. Overall, the hybrid RF-GBM model provided a flexible approach that can be tailored to specific prediction challenges within WWTPs. ANN provided a competitive alternative, while GBM, XGBoost, and RF-GBM stood out as superior performers. The performance of XGBoost is consistent with other researchers' findings [38,62]; XGBoost uses gradient boosting to sequentially create an ensemble of weak prediction models and correct errors, leading to greater overall performance [63].
The LIME and SHAP analyses showed strong agreement with the FS results. Table 5 compares the shared feature(s) chosen by the FS methods with the features chosen by LIME and SHAP for the ML models. Traditionally, FS methods are used to identify the most suitable input data from a dataset to increase model accuracy [22]. While FS methods do not consider the ML model when selecting influential variables for a target variable, XAI tools such as LIME and SHAP show the significance of influential variables for each model's prediction. Our study revealed that FS and XAI identified several common influential variables, regardless of the choice of model or FS method, in predicting the target variables.
Table 5 Comparison of the shared feature(s) chosen by the FS methods with the features chosen by LIME and SHAP.
It is also interesting that although ML models operate without knowledge of the real-world impact of input variables on the target variable, some of the common variables significantly impact certain models according to XAI. For instance, BODₑ, TSSᵢ, and Pᵢ were all shown by both LIME and SHAP to be significant to BODᵢ predictions. LIME explicitly reported positive and negative impacts, while the SHAP summary plot displayed varying importance without an apparent direction of influence. Based on our findings, LIME and SHAP can help in understanding the importance of variables in ML-based prediction and can thereby support targeted interventions in WWTP operation.
5. Conclusions
This study compared several XAI tools in predicting key WWTP variables using various ML models. Based on the findings of this study, the following conclusions are reached:
- Among the ML models, ANN, GBM, XGBoost, and RF-GBM consistently outperformed the others, exhibiting strong prediction ability with lower errors and higher R² values.
- The use of SHAP and LIME enhances the interpretability of ML models by revealing the impact of input variables on the model outputs.
- The reliability of XAI tools in identifying important WWTP factors is supported by the agreement of results between FS approaches and XAI tools.
Future research should focus on incorporating diverse case studies from various WWTPs and operational conditions to enhance the adaptability and generalization of the models. The effects of different variable sets on model performance, as well as dimension-reduction strategies, can also be further investigated. By leveraging the interpretations provided by XAI and using robust ML models, WWTPs can optimize operations and reduce costs while mitigating environmental impacts.
Acknowledgments
The authors acknowledge the Milwaukee Metropolitan Sewage District (MMSD) for providing data for the study.
Author Contributions
The original concept and supervision of the research and editing of the article by Jin Li; figures, data analysis, and writing of the article by Fuad Bin Nasir.
Competing Interests
The authors have declared that no competing interests exist.
Additional Materials
The following additional materials are uploaded to the page of this paper.
- Table S1: List of abbreviations.
- Table S2: Model performance metrics for 90:10 and 80:20 train-test splits.
- Figure S1: (i, ii, iii, iv) Top five strongly correlated variables related to target variables.
- Table S3: LIME and SHAP selected top five influential variables in descending order of influence.
References
- Torregrossa D, Schutz G, Cornelissen A, Hernández-Sancho F, Hansen J. Energy saving in WWTP: Daily benchmarking under uncertainty and data availability limitations. Environ Res. 2016; 148: 330-337. [CrossRef]
- Abba SI, Elkiran G. Effluent prediction of chemical oxygen demand from the wastewater treatment plant using artificial neural network application. Procedia Comput Sci. 2017; 120: 156-163. [CrossRef]
- Bernardelli A, Marsili-Libelli S, Manzini A, Stancari S, Tardini G, Montanari D, et al. Real-time model predictive control of a wastewater treatment plant based on machine learning. Water Sci Technol. 2020; 81: 2391-2400. [CrossRef]
- Zhang S, Wang H, Keller AA. Novel machine learning-based energy consumption model of wastewater treatment plants. ACS ES T Water. 2021; 1: 2531-2540. [CrossRef]
- Guo H, Jeong K, Lim J, Jo J, Kim YM, Park JP, et al. Prediction of effluent concentration in a wastewater treatment plant using machine learning models. J Environ Sci. 2015; 32: 90-101. [CrossRef]
- Wang D, Thunéll S, Lindberg U, Jiang L, Trygg J, Tysklind M, et al. A machine learning framework to improve effluent quality control in wastewater treatment plants. Sci Total Environ. 2021; 784: 147138. [CrossRef]
- El-Rawy M, Abd-Ellah MK, Fathi H, Ahmed AK. Forecasting effluent and performance of wastewater treatment plant using different machine learning techniques. J Water Process Eng. 2021; 44: 102380. [CrossRef]
- Li G, Ji J, Ni J, Wang S, Guo Y, Hu Y, et al. Application of deep learning for predicting the treatment performance of real municipal wastewater based on one-year operation of two anaerobic membrane bioreactors. Sci Total Environ. 2022; 813: 151920. [CrossRef]
- Zhu J, Jiang Z, Feng L. Improved neural network with least square support vector machine for wastewater treatment process. Chemosphere. 2022; 308: 136116. [CrossRef]
- Zhu JJ, Borzooei S, Sun J, Ren ZJ. Deep learning optimization for soft sensing of hard-to-measure wastewater key variables. ACS ES T Eng. 2022; 2: 1341-1355. [CrossRef]
- Aghdam E, Mohandes SR, Manu P, Cheung C, Yunusa-Kaltungo A, Zayed T. Predicting quality parameters of wastewater treatment plants using artificial intelligence techniques. J Clean Prod. 2023; 405: 137019. [CrossRef]
- Shyu HY, Castro CJ, Bair RA, Lu Q, Yeh DH. Development of a soft sensor using machine learning algorithms for predicting the water quality of an onsite wastewater treatment system. ACS Environ Au. 2023; 3: 308-318. [CrossRef]
- Wei X, Yu J, Tian Y, Ben Y, Cai Z, Zheng C. Comparative performance of three machine learning models in predicting influent flow rates and nutrient loads at wastewater treatment plants. ACS ES T Water. 2023; 4: 1024-1035. [CrossRef]
- Xu Y, Wang Z, Nairat S, Zhou J, He Z. Artificial intelligence-assisted prediction of effluent phosphorus in a full-scale wastewater treatment plant with missing phosphorus input and removal data. ACS ES T Water. 2023; 4: 880-889. [CrossRef]
- Yu J, Tian Y, Jing H, Sun T, Wang X, Andrews CB, et al. Predicting regional wastewater treatment plant discharges using machine learning and population migration big data. ACS ES T Water. 2023; 3: 1314-1328. [CrossRef]
- Alsulaili A, Refaie A. Artificial neural network modeling approach for the prediction of five-day biological oxygen demand and wastewater treatment plant performance. Water Supply. 2021; 21: 1861-1877. [CrossRef]
- Nasir FB, Li J. Understanding machine learning predictions of wastewater treatment plant sludge with explainable artificial intelligence. Water Environ Res. 2024; 96: e11136. [CrossRef]
- Fan M, Hu J, Cao R, Ruan W, Wei X. A review on experimental design for pollutants removal in water treatment with the aid of artificial intelligence. Chemosphere. 2018; 200: 330-343. [CrossRef]
- Adibimanesh B, Polesek-Karczewska S, Bagherzadeh F, Szczuko P, Shafighfard T. Energy consumption optimization in wastewater treatment plants: Machine learning for monitoring incineration of sewage sludge. Sustain Energy Technol Assess. 2023; 56: 103040. [CrossRef]
- Keerio HA, Shah SA, Ali Z, Panhwar S, Solangi GS, Ali A, et al. A fascinating exploration into nitrite accumulation into low concentration reactors using cutting-edge machine learning techniques. Process Biochem. 2024; 146: 160-168. [CrossRef]
- Solangi GS, Ali Z, Bilal M, Junaid M, Panhwar S, Keerio HA, et al. Machine learning, water quality index, and GIS-based analysis of groundwater quality. Water Pract Technol. 2024; 19: 384-400. [CrossRef]
- Bagherzadeh F, Mehrani MJ, Basirifard M, Roostaei J. Comparative study on total nitrogen prediction in wastewater treatment plant and effect of various feature selection methods on machine learning algorithms performance. J Water Process Eng. 2021; 41: 102033. [CrossRef]
- Wu Z, Duan H, Li K, Ye L. A comprehensive carbon footprint analysis of different wastewater treatment plant configurations. Environ Res. 2022; 214: 113818. [CrossRef]
- Keerio HA, Bae W, Park J, Kim M. Substrate uptake, loss, and reserve in ammonia-oxidizing bacteria (AOB) under different substrate availabilities. Process Biochem. 2020; 91: 303-310. [CrossRef]
- Singh NK, Yadav M, Singh V, Padhiyar H, Kumar V, Bhatia SK, et al. Artificial intelligence and machine learning-based monitoring and design of biological wastewater treatment systems. Bioresour Technol. 2023; 369: 128486. [CrossRef]
- Zhao LJ, Chai TY, Yuan DC. Selective ensemble extreme learning machine modeling of effluent quality in wastewater treatment plants. Int J Autom Comput. 2012; 9: 627-633. [CrossRef]
- Bagheri M, Mirbagheri SA, Ehteshami M, Bagheri Z, Kamarkhani AM. Analysis of variables affecting mixed liquor volatile suspended solids and prediction of effluent quality parameters in a real wastewater treatment plant. Desalin Water Treat. 2016; 57: 21377-21390. [CrossRef]
- Sharghi E, Nourani V, AliAshrafi A, Gökçekuş H. Monitoring effluent quality of wastewater treatment plant by clustering based artificial neural network method. Desalin Water Treat. 2019; 164: 86-97. [CrossRef]
- Khatri N, Khatri KK, Sharma A. Prediction of effluent quality in ICEAS-sequential batch reactor using feedforward artificial neural network. Water Sci Technol. 2019; 80: 213-222. [CrossRef]
- Al-Ghazawi Z, Alawneh R. Use of artificial neural network for predicting effluent quality parameters and enabling wastewater reuse for climate change resilience-A case from Jordan. J Water Process Eng. 2021; 44: 102423. [CrossRef]
- Elmaadawy K, Abd Elaziz M, Elsheikh AH, Moawad A, Liu B, Lu S. Utilization of random vector functional link integrated with manta ray foraging optimization for effluent prediction of wastewater treatment plant. J Environ Manage. 2021; 298: 113520. [CrossRef]
- Nourani V, Asghari P, Sharghi E. Artificial intelligence based ensemble modeling of wastewater treatment plant using jittered data. J Clean Prod. 2021; 291: 125772. [CrossRef]
- Ly QV, Truong VH, Ji B, Nguyen XC, Cho KH, Ngo HH, et al. Exploring potential machine learning application based on big data for prediction of wastewater quality from different full-scale wastewater treatment plants. Sci Total Environ. 2022; 832: 154930. [CrossRef]
- Dantas MS, Christofaro C, Oliveira SC. Artificial neural networks for performance prediction of full-scale wastewater treatment plants: A systematic review. Water Sci Technol. 2023; 88: 1447-1470. [CrossRef]
- Mahanna H, El-Rashidy N, Kaloop MR, El-Sapakh S, Alluqmani A, Hassan R. Prediction of wastewater treatment plant performance through machine learning techniques. Desalin Water Treat. 2024; 319: 100524. [CrossRef]
- Park J, Lee WH, Kim KT, Park CY, Lee S, Heo TY. Interpretation of ensemble learning to predict water quality using explainable artificial intelligence. Sci Total Environ. 2022; 832: 155070. [CrossRef]
- Hu Y, Wei R, Yu K, Liu Z, Zhou Q, Zhang M, et al. Exploring sludge yield patterns through interpretable machine learning models in China's municipal wastewater treatment plants. Resour Conserv Recycl. 2024; 204: 107467. [CrossRef]
- Shao S, Fu D, Yang T, Mu H, Gao Q, Zhang Y. Analysis of machine learning models for wastewater treatment plant sludge output prediction. Sustainability. 2023; 15: 13380. [CrossRef]
- Shafighfard T, Kazemi F, Asgarkhani N, Yoo DY. Machine-learning methods for estimating compressive strength of high-performance alkali-activated concrete. Eng Appl Artif Intell. 2024; 136: 109053. [CrossRef]
- Shafighfard T, Kazemi F, Bagherzadeh F, Mieloszyk M, Yoo DY. Chained machine learning model for predicting load capacity and ductility of steel fiber-reinforced concrete beams. Comput Aided Civ Infrastruct Eng. 2024. doi: 10.1111/mice.13164. [CrossRef]
- Lundberg SM, Lee SI. A unified approach to interpreting model predictions. Adv Neural Inf Process Syst. 2017. doi: 10.48550/arXiv.1705.07874.
- Lundberg SM, Erion GG, Lee SI. Consistent individualized feature attribution for tree ensembles. ArXiv. 2018. doi: 10.48550/arXiv.1802.03888.
- Li R, Feng K, An T, Cheng P, Wei L, Zhao Z, et al. Enhanced insights into effluent prediction in wastewater treatment plants: Comprehensive deep learning model explanation based on SHAP. ACS ES T Water. 2024; 4: 1904-1915. [CrossRef]
- Ribeiro MT, Singh S, Guestrin C. "Why should I trust you?" Explaining the predictions of any classifier. Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining; 2016 August 13-17; San Francisco, CA, USA. New York, NY: Association for Computing Machinery. pp. 1135-1144. [CrossRef]
- Ye Z, Yang J, Zhong N, Tu X, Jia J, Wang J. Tackling environmental challenges in pollution controls using artificial intelligence: A review. Sci Total Environ. 2020; 699: 134279. [CrossRef]
- Matheri AN, Ntuli F, Ngila JC, Seodigeng T, Zvinowanda C. Performance prediction of trace metals and cod in wastewater treatment using artificial neural network. Comput Chem Eng. 2021; 149: 107308. [CrossRef]
- Konstantinov AV, Utkin LV. Interpretable machine learning with an ensemble of gradient boosting machines. Knowl Based Syst. 2021; 222: 106993. [CrossRef]
- Tyralis H, Papacharalampous G, Langousis A. A brief review of random forests for water scientists and practitioners and their recent history in water resources. Water. 2019; 11: 910. [CrossRef]
- Nafsin N, Li J. Prediction of total organic carbon and E. coli in rivers within the Milwaukee River basin using machine learning methods. Environ Sci Adv. 2023; 2: 278-293. [CrossRef]
- Jiang M, Wang J, Hu L, He Z. Random forest clustering for discrete sequences. Pattern Recognit Lett. 2023; 174: 145-151. [CrossRef]
- Szomolányi O, Clement A. Use of random forest for assessing the effect of water quality parameters on the biological status of surface waters. GEM. 2023; 14: 20. [CrossRef]
- Sun Z, Wang G, Li P, Wang H, Zhang M, Liang X. An improved random forest based on the classification accuracy and correlation measurement of decision trees. Expert Syst Appl. 2024; 237: 121549. [CrossRef]
- Yadav P, Chandra M, Fatima N, Sarwar S, Chaudhary A, Saurabh K, et al. Predicting influent and effluent quality parameters for a UASB-based wastewater treatment plant in Asia covering data variations during COVID-19: A machine learning approach. Water. 2023; 15: 710. [CrossRef]
- Xie Y, Chen Y, Lian Q, Yin H, Peng J, Sheng M, et al. Enhancing real-time prediction of effluent water quality of wastewater treatment plant based on improved feedforward neural network coupled with optimization algorithm. Water. 2022; 14: 1053. [CrossRef]
- Sargent RG. Verification and validation of simulation models. Proceedings of the 2010 Winter Simulation Conference; 2010 December 05-08; Baltimore, MD, USA. Piscataway, NJ: IEEE. pp. 166-183. [CrossRef]
- Tsioptsias N, Tako A, Robinson S. Model validation and testing in simulation: A literature review. Proceedings of the 5th Student Conference on Operational Research (SCOR 2016); 2016 April 08-10; Nottingham, UK. Wadern, Germany: Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik.
- Zhang X, Liu CA. Model averaging prediction by K-fold cross-validation. J Econom. 2023; 235: 280-301. [CrossRef]
- Kazemi F, Asgarkhani N, Shafighfard T, Jankowski R, Yoo DY. Machine-learning methods for estimating performance of structural concrete members reinforced with fiber-reinforced polymers. Arch Comput Methods Eng. 2024. doi: 10.1007/s11831-024-10143-1. [CrossRef]
- Bagherzadeh F, Shafighfard T, Khan RM, Szczuko P, Mieloszyk M. Prediction of maximum tensile stress in plain-weave composite laminates with interacting holes via stacked machine learning algorithms: A comparative study. Mech Syst Signal Process. 2023; 195: 110315. [CrossRef]
- Shafighfard T, Bagherzadeh F, Rizi RA, Yoo DY. Data-driven compressive strength prediction of steel fiber reinforced concrete (SFRC) subjected to elevated temperatures using stacked machine learning algorithms. J Mater Res Technol. 2022; 21: 3777-3794. [CrossRef]
- Safder U, Kim J, Pak G, Rhee G, You K. Investigating machine learning applications for effective real-time water quality parameter monitoring in full-scale wastewater treatment plants. Water. 2022; 14: 3147. [CrossRef]
- Zhang Y, Wu H, Xu R, Wang Y, Chen L, Wei C. Machine learning modeling for the prediction of phosphorus and nitrogen removal efficiency and screening of crucial microorganisms in wastewater treatment plants. Sci Total Environ. 2024; 907: 167730. [CrossRef]
- Chen T, Guestrin C. Xgboost: A scalable tree boosting system. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining; 2016 August 13-17; San Francisco, CA, USA. New York, NY: Association for Computing Machinery. pp. 785-794. [CrossRef]