
Doctoral thesis

Synergy of Earth Observation and Machine Learning for Air Quality Monitoring in Europe

Air pollution is a major environmental and public health challenge, particularly in urban areas where nitrogen dioxide (NO2) and fine particulate matter (PM2.5) contribute significantly to adverse health effects. Accurate, high-resolution estimates of surface air pollution are essential for exposure assessments and relevant policy interventions. While regulatory air quality monitoring networks provide reliable measurements, their spatial coverage and density are limited, necessitating alternative data sources such as satellite retrievals, chemical transport models (CTMs), and low-cost sensors (LCS). However, when used individually, these sources have inherent limitations relating to spatial resolution, the columnar nature of satellite measurements, data quality, and computational constraints. This thesis develops a synergistic approach under the “Satellite and Machine Learning (ML)-based Estimation of Surface air quality at High resolution” (S-MESH) framework, applying machine learning to integrate multiple data sources for improved NO2 and PM2.5 surface estimations across Europe at both urban and continental scales.

For NO2, the thesis explores the fusion of satellite observations, a range of reanalysis meteorological variables, and ground-based measurements to generate daily 1 km resolution surface NO2 estimates over Europe for the period 2019-2021. Key satellite-derived inputs include the tropospheric vertical column density (VCD) of NO2 from the Sentinel-5 Precursor TROPOspheric Monitoring Instrument (TROPOMI), nightlight radiance from the Suomi National Polar-orbiting Partnership’s Visible Infrared Imaging Radiometer Suite (VIIRS) instrument, and vegetation indices from the Moderate Resolution Imaging Spectroradiometer (MODIS), while meteorological variables are obtained from ERA5 and ERA5-Land. In addition, NO2 observations from reference-grade air quality monitoring stations serve as the ground-truth data for model training and evaluation. The ML modeling is conducted using an eXtreme Gradient Boosting (XGBoost) model within the S-MESH framework. The model achieves a median bias of 6%, with the most reliable NO2 predictions in the medium pollution range (10-40 μg/m³). A comprehensive SHapley Additive exPlanations (SHAP) analysis reveals the strong influence of the TROPOMI NO2 column and VIIRS nightlight data in capturing both the spatial and temporal pollution variations, while also quantifying their impact on model uncertainty. The findings highlight the critical role of TROPOMI NO2 in high-resolution surface air quality mapping and during unexpected events, particularly when complemented by ground-based and ancillary datasets.

For PM2.5, the thesis investigates the downscaling of the Copernicus Atmosphere Monitoring Service (CAMS) regional (0.1° × 0.1° resolution) forecasts to 1 km resolution hindcasts through a synergy with satellite aerosol optical depth (AOD), meteorological data, and other variables. PM2.5 is estimated over the period 2021-2022 as daily means, which makes it relevant within the EU’s monitoring and regulatory framework. Unlike in the NO2 model, where the TROPOMI NO2 tropospheric VCD was by far the most important predictor variable, satellite-based AOD exhibits lower predictive importance but remains essential for capturing episodic events such as dust transport. This downscaling approach improves CAMS forecast-based estimates, particularly over Poland and Eastern Europe. In these regions, it effectively corrects the known wintertime underestimation in the CAMS European interim reanalysis. For concentrations exceeding 20 μg/m³, S-MESH outperforms the interim reanalysis, achieving a lower median bias (-7.3 μg/m³ vs. -10.3 μg/m³ for the reanalysis) while effectively capturing high pollution events with improved spatial and temporal accuracy. These findings demonstrate S-MESH’s utility as a timely alternative to the CAMS interim reanalysis, which otherwise becomes available with a significant delay of up to 15 months. A key innovation in this study is the use of stacked XGBoost models, which enable spatially continuous PM2.5 predictions even in the presence of spatiotemporal gaps in satellite AOD retrievals.
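The median bias quoted for concentrations above 20 μg/m³ can be illustrated with a minimal Python sketch. It assumes that bias is defined as predicted minus observed and that the threshold is applied to the observed (station) values; the abstract does not state either convention. The function name `median_bias` and the sample values are hypothetical and are not data from the thesis.

```python
from statistics import median


def median_bias(observed, predicted, min_observed=None):
    """Median of (predicted - observed) over paired daily values.

    If min_observed is given, only pairs whose observed value exceeds it
    are used (assumed convention: the threshold applies to observations).
    """
    pairs = [
        (obs, pred)
        for obs, pred in zip(observed, predicted)
        if min_observed is None or obs > min_observed
    ]
    if not pairs:
        raise ValueError("no samples left after applying the threshold")
    return median(pred - obs for obs, pred in pairs)


# Made-up daily mean PM2.5 values in ug/m3, for illustration only.
station_obs = [12.0, 25.0, 31.0, 48.0, 22.0]
model_pred = [14.0, 20.0, 27.0, 39.0, 21.0]

bias_all = median_bias(station_obs, model_pred)
bias_high = median_bias(station_obs, model_pred, min_observed=20.0)
```

A negative value under this convention indicates that the model underestimates the observed concentrations, which is the sense in which the figures above are read.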

The third component of this thesis evaluates the possible role of low(er)-cost sensor data, particularly from citizen science initiatives, for PM2.5 estimation, focusing on Central Europe, where LCS networks are densest. Two integration strategies are studied: (i) training an ML model with LCS as a substitute for regulatory stations (LCST model), and (ii) incorporating LCS data as an additional feature via a convolutional layer approach (LCSI model). These are compared against a Baseline model trained without LCS data. Results show that the convolution-based LCSI model achieves the highest accuracy, outperforming the Baseline model, particularly in urban areas where LCS density is highest. The Baseline model, while not as locally accurate, demonstrates strong generalization across diverse settings. In contrast, the LCST model tends to underestimate absolute concentrations but captures temporal pollution dynamics effectively at daily mean scales. The inclusion of LCS data as an input feature plays a crucial role in resolving small-scale and localized variations of PM2.5 that are often missed by satellites or coarser models. With advancements in sensor technology and growing adoption of LCS for PM2.5 monitoring, these networks provide increasingly dense observations that can further complement traditional monitoring approaches.
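One possible reading of "incorporating LCS data as an additional feature via a convolutional layer approach" is sketched below in plain Python: sensor readings are averaged per grid cell, and a small kernel then spreads them into a gap-tolerant neighbourhood feature. This is an assumption made for illustration, not the thesis's implementation; the function names, the 3 × 3 box kernel, and the sample readings are all hypothetical.

```python
def grid_mean(points, n_rows, n_cols):
    """Average sensor readings per grid cell; cells without sensors are None."""
    sums = [[0.0] * n_cols for _ in range(n_rows)]
    counts = [[0] * n_cols for _ in range(n_rows)]
    for row, col, value in points:
        sums[row][col] += value
        counts[row][col] += 1
    return [
        [sums[r][c] / counts[r][c] if counts[r][c] else None for c in range(n_cols)]
        for r in range(n_rows)
    ]


def convolve_ignore_gaps(grid, kernel):
    """Kernel-weighted neighbourhood mean, renormalised over cells with data."""
    n_rows, n_cols = len(grid), len(grid[0])
    half = len(kernel) // 2
    out = [[None] * n_cols for _ in range(n_rows)]
    for r in range(n_rows):
        for c in range(n_cols):
            total = 0.0
            weight = 0.0
            for dr in range(-half, half + 1):
                for dc in range(-half, half + 1):
                    rr, cc = r + dr, c + dc
                    inside = 0 <= rr < n_rows and 0 <= cc < n_cols
                    if inside and grid[rr][cc] is not None:
                        w = kernel[dr + half][dc + half]
                        total += w * grid[rr][cc]
                        weight += w
            if weight > 0:
                out[r][c] = total / weight  # stays None if no sensor is nearby
    return out


# Hypothetical readings as (row, col, PM2.5 in ug/m3) on a 3 x 3 grid.
readings = [(0, 0, 10.0), (0, 0, 14.0), (2, 2, 20.0)]
box_kernel = [[1.0] * 3 for _ in range(3)]

lcs_grid = grid_mean(readings, 3, 3)
lcs_feature = convolve_ignore_gaps(lcs_grid, box_kernel)
```

In this sketch the resulting `lcs_feature` grid would be passed to the model as one more predictor column alongside the satellite and meteorological inputs; cells with no sensor within the kernel footprint remain missing.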

This thesis highlights the added value of multi-source synergies in ML-based monitoring of air pollution. By systematically exploring data fusion methodologies and integration strategies, it demonstrates the computational efficiency and scalability of ML-based approaches, particularly the suitability of XGBoost for large-scale applications. The findings provide new insights into the spatial representativity and applicability of diverse data sources for air quality assessment, supporting the development of enhanced monitoring strategies and data-driven policy frameworks across Europe and beyond.

Publication details

Journal: 2025

Doctoral thesis

Year: 2025

Series: Series of dissertations submitted to the Faculty of Mathematics and Natural Sciences, University of Oslo, 2569

Publisher: Universitetet i Oslo

Language: English
