
Soft Sensing via XGBoost on Wastewater Treatment Plant data

Abstract

A set of indicators that incorporate environmental, societal, and economic sustainability were developed and used to investigate the sustainability of different wastewater treatment technologies, for plant capacities of <5 million gallons per day (MGD) or 18.9×10³ m³/day. The technologies evaluated were mechanical (i.e., activated sludge with secondary treatment), lagoon (facultative, anaerobic, and aerobic), and land treatment systems (e.g., slow rate irrigation, rapid infiltration, and overland flow). The economic indicators selected were capital, operation and management, and user costs because they determine the economic affordability of a particular technology to a community. Environmental indicators include energy use, because it indirectly measures resource utilization, and performance of the technology in removing conventional wastewater constituents such as biochemical oxygen demand, ammonia nitrogen, phosphorus, and pathogens. These indicators also determine the reuse potential of the treated wastewater. Societal indicators capture cultural acceptance of the technology through public participation and also measure whether there is improvement in the community from the specific technology through increased job opportunities, better education, or an improved local environment. While selection of a set of indicators is dependent on the geographic and demographic context of a particular community, the overall results of this study show that there are varying degrees of sustainability with each treatment technology.

Introduction

Wastewater collection systems (i.e., sewer networks) and centralized and decentralized treatment systems are designed and managed primarily to protect human and environmental health. Though their benefits are widely recognized, there are other aspects of this infrastructure and associated technologies that are not so obvious and hence less acknowledged, yet they impact communities and the surrounding environment. For example, a positive aspect of the sewer network is the collection and transport of wastewater to appropriate treatment facilities, whereby pathogens and chemical constituents such as oxygen-depleting organic matter and phosphorus are removed before the treated water is returned to the environment. A negative aspect of such a network is that it can create an imbalance in water and nutrient fluxes and therefore distort natural hydrological and ecological regimes. For instance, the discharge of large volumes of treated wastewater that contains low concentrations of chemical constituents may still lead to an excessive input of nutrients in a receiving water body, thus leading to a water quality problem.

Transport of water and wastewater across watershed boundaries not only increases the embodied energy of the water and requires extensive infrastructure, but it may also result in adverse changes in an ecosystem’s hydrology. In addition, treatment facilities, while they treat wastewater to a quality deemed safe for discharge, also consume considerable energy during their operational life, and consequently contribute to atmospheric CO2 emissions. These impacts, whether positive or negative, greatly affect local and global sustainability, be it in the construction, operation, or dismantling life stage, and hence deserve discussion. In an era where there is growing concern about the local and global impact of our current environmental management strategies, and the need to reduce sanitation problems, disease, and poverty, there is a greater need to develop more environmentally responsible, appropriate wastewater treatment technologies whose performance is balanced by environmental, economic, and societal sustainability.

Sustainability of wastewater treatment systems can be assessed through different assessment tools such as exergy analysis, economic analysis, and life cycle assessment. For this study, the use of a balanced set of indicators that provides a holistic assessment was chosen for evaluating the sustainability of the different wastewater treatment technologies. These wastewater treatment technologies include mechanical systems, lagoon systems, and land treatment systems. Mechanical systems such as activated sludge utilize physical, chemical, and biological mechanisms to remove nutrients, pathogens, metals, and other toxic compounds. Lagoon systems primarily use physical and biological processes to treat wastewater, while land treatment systems utilize soil and plants, without significant need for reactors and operational labor, energy, and chemicals.

Methodology

Ensemble methods are often a crucial component of winning entries in online ML competitions such as those on Kaggle. Ensemble learning is based on a simple philosophy: committee wisdom can be better than an individual’s wisdom! In this post, we will look into how this works and what makes ensembles so powerful. We will study popular ensemble methods like random forests and XGBoost. The base constituent models in random forests and XGBoost are decision trees, which are simple yet versatile ML algorithms suitable for both regression and classification tasks. Decision trees can fit complex, nonlinear datasets yet enjoy the enviable quality of providing interpretable results. We will look at all these features in detail. Specifically, we will cover the following topics:

  • Introduction to decision trees and random forests
  • Soft sensing application of random forests in the concrete construction industry
  • Introduction to ensemble learning techniques (bagging, AdaBoost, gradient boosting)
  • Effluent quality prediction using XGBoost in a wastewater treatment plant
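The "committee wisdom" idea can be made concrete with a small calculation. The sketch below is not taken from the project; it assumes an idealised committee of classifiers whose errors are fully independent and that each have the same chance of being right, and it computes how often a majority vote is correct. Real ensembles only approximate this independence, so the numbers are an upper-bound intuition and not a performance claim. The function name `majority_vote_accuracy` is made up for this illustration.

```python
from math import comb


def majority_vote_accuracy(n_models: int, p_correct: float) -> float:
    """Probability that a strict majority of n independent models is correct.

    Assumption: every model is right with the same probability p_correct
    and the models' errors are independent of each other.
    """
    if n_models % 2 == 0:
        raise ValueError("use an odd number of models to avoid tied votes")
    votes_needed = n_models // 2 + 1
    # Sum the binomial probabilities of getting at least a majority right.
    return sum(
        comb(n_models, k) * p_correct**k * (1 - p_correct) ** (n_models - k)
        for k in range(votes_needed, n_models + 1)
    )


if __name__ == "__main__":
    # Each model alone is right 60% of the time (an arbitrary example value).
    for n in (1, 3, 11, 51):
        print(n, "models ->", round(majority_vote_accuracy(n, 0.6), 3))
```

With models that are individually better than chance, the majority becomes more reliable as the committee grows. When each model is no better than a coin flip, adding more of them does not help.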

Key Performance Indicators

The following 38 variables were identified as the key performance indicators (KPIs) for the machine-learning project, which aims to predict the effluent pH, BOD, COD, total suspended solids, and volatile suspended solids.

  1. Q-E (input flow to plant)
  2. ZN-E (input Zinc to plant)
  3. PH-E (input pH to plant)
  4. DBO-E (input Biological demand of oxygen to plant)
  5. DQO-E (input chemical demand of oxygen to plant)
  6. SS-E (input suspended solids to plant)
  7. SSV-E (input volatile suspended solids to plant)
  8. SED-E (input sediments to plant)
  9. COND-E (input conductivity to plant)
  10. PH-P (input pH to primary settler)
  11. DBO-P (input Biological demand of oxygen to primary settler)
  12. SS-P (input suspended solids to primary settler)
  13. SSV-P (input volatile suspended solids to primary settler)
  14. SED-P (input sediments to primary settler)
  15. COND-P (input conductivity to primary settler)
  16. PH-D (input pH to secondary settler)
  17. DBO-D (input Biological demand of oxygen to secondary settler)
  18. DQO-D (input chemical demand of oxygen to secondary settler)
  19. SS-D (input suspended solids to secondary settler)
  20. SSV-D (input volatile suspended solids to secondary settler)
  21. SED-D (input sediments to secondary settler)
  22. COND-D (input conductivity to secondary settler)
  23. PH-S (output pH)
  24. DBO-S (output Biological demand of oxygen)
  25. DQO-S (output chemical demand of oxygen)
  26. SS-S (output suspended solids)
  27. SSV-S (output volatile suspended solids)
  28. SED-S (output sediments)
  29. COND-S (output conductivity)
  30. RD-DBO-P (performance input Biological demand of oxygen in primary settler)
  31. RD-SS-P (performance input suspended solids to primary settler)
  32. RD-SED-P (performance input sediments to primary settler)
  33. RD-DBO-S (performance input Biological demand of oxygen to secondary settler)
  34. RD-DQO-S (performance input chemical demand of oxygen to secondary settler)
  35. RD-DBO-G (global performance input Biological demand of oxygen)
  36. RD-DQO-G (global performance input chemical demand of oxygen)
  37. RD-SS-G (global performance input suspended solids)
  38. RD-SED-G (global performance input sediments)
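One way to organise the variables listed above for a soft-sensing model is to separate candidate inputs from prediction targets. The sketch below is only an illustration and makes two assumptions that the post does not state: that the five targets are the plant-output readings PH-S, DBO-S, DQO-S, SS-S and SSV-S, and that the inputs are the raw readings taken upstream of the outlet (suffixes -E, -P and -D), leaving out the RD-* performance variables. The helper name `is_upstream_measurement` is hypothetical.

```python
# Variable names copied from the list above (38 in total).
KPI_NAMES = """
Q-E ZN-E PH-E DBO-E DQO-E SS-E SSV-E SED-E COND-E
PH-P DBO-P SS-P SSV-P SED-P COND-P
PH-D DBO-D DQO-D SS-D SSV-D SED-D COND-D
PH-S DBO-S DQO-S SS-S SSV-S SED-S COND-S
RD-DBO-P RD-SS-P RD-SED-P RD-DBO-S RD-DQO-S
RD-DBO-G RD-DQO-G RD-SS-G RD-SED-G
""".split()

# Assumption: the quantities to predict are the plant-output readings.
TARGETS = ["PH-S", "DBO-S", "DQO-S", "SS-S", "SSV-S"]

# Assumption: only readings taken before the plant outlet are used as inputs.
UPSTREAM_STAGES = {"E", "P", "D"}


def is_upstream_measurement(name: str) -> bool:
    """True for raw readings at the plant inlet or either settler inlet."""
    if name.startswith("RD-"):
        return False  # performance variables are left out of the inputs
    stage = name.rsplit("-", 1)[1]
    return stage in UPSTREAM_STAGES


FEATURES = [name for name in KPI_NAMES if is_upstream_measurement(name)]

if __name__ == "__main__":
    print(len(KPI_NAMES), "variables,", len(FEATURES), "inputs,", len(TARGETS), "targets")
```

Keeping the split explicit makes it easy to check that no target accidentally ends up among the inputs. A different project could reasonably choose another set of inputs.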

XGBoost Results

If you have been active in the ML world recently, you have probably heard the term XGBoost. It stands for eXtreme Gradient Boosting and is a popular library that implements several tricks and heuristics to make gradient boosting-based model training very effective, especially for large datasets. XGBoost uses decision trees (DTs) as base models.
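To show what "gradient boosting with decision trees as base models" means, here is a minimal sketch in plain Python. It is not XGBoost and not the model used in this project: it boosts one-split trees (stumps) on a single input with squared-error loss, and it leaves out the extra tricks the library adds. The data are synthetic, and all function names and parameter values are chosen only for this illustration.

```python
import math


def fit_stump(x, residuals):
    """Fit a one-split regression tree: pick the threshold with lowest squared error."""
    best = None
    for threshold in sorted(set(x))[:-1]:
        left = [r for xi, r in zip(x, residuals) if xi <= threshold]
        right = [r for xi, r in zip(x, residuals) if xi > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = sum((r - left_mean) ** 2 for r in left)
        sse += sum((r - right_mean) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    if best is None:  # all x values identical: fall back to a constant
        mean = sum(residuals) / len(residuals)
        return x[0], mean, mean
    return best[1], best[2], best[3]


def predict_stump(stump, xi):
    threshold, left_mean, right_mean = stump
    return left_mean if xi <= threshold else right_mean


def fit_boosted_stumps(x, y, n_rounds=50, learning_rate=0.1):
    """Each round fits a stump to the current residuals and adds a damped copy."""
    base = sum(y) / len(y)  # start from the mean prediction
    predictions = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        # For squared error, the negative gradient is simply the residual.
        residuals = [yi - pi for yi, pi in zip(y, predictions)]
        stump = fit_stump(x, residuals)
        stumps.append(stump)
        predictions = [
            pi + learning_rate * predict_stump(stump, xi)
            for xi, pi in zip(x, predictions)
        ]
    return base, stumps, learning_rate


def predict(model, xi):
    base, stumps, learning_rate = model
    return base + learning_rate * sum(predict_stump(s, xi) for s in stumps)


def training_mse(model, x, y):
    return sum((yi - predict(model, xi)) ** 2 for xi, yi in zip(x, y)) / len(y)


# Synthetic one-dimensional data, used only to exercise the code.
X_DEMO = [i / 10 for i in range(50)]
Y_DEMO = [math.sin(xi) for xi in X_DEMO]

if __name__ == "__main__":
    for rounds in (0, 5, 50):
        model = fit_boosted_stumps(X_DEMO, Y_DEMO, n_rounds=rounds)
        print(rounds, "rounds -> training MSE", training_mse(model, X_DEMO, Y_DEMO))
```

Each stump on its own is a very weak model, but because every new one is fitted to what the previous ones still get wrong, the training error shrinks round by round. A low training error alone does not show that the model generalises; that has to be checked on held-out data.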

The XGBoost model scored high percentages on the measures of accuracy and predictability, which supports its validity for this task.
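The post does not say which measures were used. For a regression task like this one, two common choices are the coefficient of determination (R², often quoted as a percentage) and the root-mean-square error (RMSE). The sketch below shows how they could be computed. Treating them as the measures used here is an assumption, and the numbers in the demo are invented placeholders, not project results.

```python
from math import sqrt


def rmse(y_true, y_pred):
    """Root-mean-square error: typical size of a prediction error, in the target's units."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return sqrt(sum(squared_errors) / len(squared_errors))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 is a perfect fit, 0 is no better than the mean."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


if __name__ == "__main__":
    # Placeholder values for illustration only (not measured plant data).
    measured = [7.5, 7.8, 7.6, 7.9, 7.7]
    predicted = [7.6, 7.7, 7.6, 8.0, 7.6]
    print("RMSE:", rmse(measured, predicted))
    print("R^2 :", r_squared(measured, predicted))
```

Both measures are most informative when computed on data the model did not see during training.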

Phase 1 Conclusion

Given sufficient, reliable data, machine learning offers an unparalleled opportunity to build well-trained, in-depth analytical models that anyone can use for ad hoc and what-if scenarios. We conclude that less experienced people in the same domain can use the ML-enabled model to carry on observing, recording, and making business recommendations.

 
