Application of modern statistical techniques to air quality data

Dr David Carslaw (WACL, University of York), Dr Sarah Moller (WACL, University of York), Dr Andrea Wiencierz (Department of Mathematics, University of York)

The impacts of air pollution are many and wide-ranging, including effects on human health, ecosystems, historical buildings, food security and the economy. It is estimated that in the UK ~40,000 premature deaths per year can be attributed to air pollution, at a cost to the economy of more than £20 billion [1]. In recent years this issue has gained increasing attention from Government, local authorities, the media and the public. With this increased attention comes a demand for action to improve air quality, generating a need for better information from the research community to support decision-making in central Government and local authorities.

Air pollution measurement data is collected both routinely by long-term monitoring networks and during specific short-term measurement campaigns. Much of this data is openly accessible, but its current exploitation is limited. Traditionally, the way the atmospheric chemistry community has used the available data has been constrained by the tools and techniques that it had the expertise to apply effectively. In some cases this has meant that time series of pollutant concentrations from specific field campaigns have been used primarily to constrain air quality models and to identify interesting periods that could be analysed in more detail as case studies.

Figure 1: A map of some of the Department for Environment, Food and Rural Affairs (Defra) air pollution monitoring sites. Taken from

Air pollution models are used to gain insights into the factors controlling pollutant concentrations at a specific location. However, reproducing measured atmospheric concentrations and interpreting the data is non-trivial because of the complexity of the processes that influence atmospheric concentrations. Traditional air pollution models aim to describe the emission or creation, transport and destruction of pollutants using chemical and physical equations. This sort of model depends on precise measurements of chemical and physical fluxes as well as pollutant concentrations, all of which can be difficult to obtain at sufficient temporal and spatial resolution. Statistical analysis of air pollution data offers an alternative way to explore the data, to investigate dependencies and dominant processes, and to predict pollutant concentrations.

An area of particular interest is the assessment of the efficacy of air pollution interventions, that is, whether changes made to try to reduce air pollution have had any discernible effect on ambient pollutant concentrations. Conclusions about the ability of measures to reduce air pollutant concentrations are often based on results from modelling studies which assume a certain impact on emissions [2]. The ability to draw conclusions from measured data is limited by the complexity and interdependence of the factors controlling pollutant concentrations. For example, changes in pollutant concentration due to regional meteorology, local weather and seasonal cycles can mask the impacts of a specific intervention. Where measured data is used, simple statistical techniques are often applied which do not take sufficient account of the complexity of the system [3]. This limits the ability to draw meaningful conclusions from the data and leads to weak conclusions with considerable caveats.
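The masking effect described above can be illustrated with a small sketch on synthetic data. It is not a method proposed by the project: the baseline, seasonal amplitude, noise level, intervention date and step size are all assumptions chosen for illustration, and the function names are hypothetical. The sketch compares a naive before/after difference with a seasonally matched difference that uses the same calendar window one year earlier as the reference period.

```python
import math
import random
from statistics import mean


def simulate_daily_series(n_days=730, step_day=635, step=-5.0, seed=0):
    """Synthetic daily concentrations: baseline + annual cycle + step change + noise.

    All numbers are illustrative assumptions, not measured values.
    """
    rng = random.Random(seed)
    series = []
    for day in range(n_days):
        # Annual cycle with its peak at day 0 (assumed winter maximum).
        seasonal = 10.0 * math.cos(2 * math.pi * day / 365)
        # Assumed intervention: a constant reduction from step_day onwards.
        effect = step if day >= step_day else 0.0
        series.append(40.0 + seasonal + effect + rng.gauss(0.0, 3.0))
    return series


def naive_change(series, step_day, window=90):
    """Mean of the window after the intervention minus the window just before it."""
    before = series[step_day - window:step_day]
    after = series[step_day:step_day + window]
    return mean(after) - mean(before)


def seasonally_matched_change(series, step_day, window=90, period=365):
    """Mean after the intervention minus the same calendar window one year earlier."""
    after = series[step_day:step_day + window]
    start = step_day - period
    same_season_last_year = series[start:start + window]
    return mean(after) - mean(same_season_last_year)


series = simulate_daily_series()
naive = naive_change(series, step_day=635)
matched = seasonally_matched_change(series, step_day=635)
print(f"naive before/after change:  {naive:+.1f}")
print(f"seasonally matched change:  {matched:+.1f}")
```

Because the assumed intervention falls in autumn, the seasonal rise in concentration outweighs the simulated reduction in the naive comparison, whereas the matched comparison recovers a value close to the assumed step. Real data would also require accounting for meteorology and longer-term trends, which this sketch ignores.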

The Natural Environment Research Council (NERC) invests heavily in the collection of environmental data and the British Atmospheric Data Centre manages around 2 PB of data [4], while Defra and the Environment Agency have around 300 active monitoring stations around the country [5]. This represents a huge financial investment in producing air pollution data. Modern statistical techniques offer the opportunity to look at this data in a different way, to gain new insight and to increase the scientific value of the vast amounts of data collected. They have the necessary sophistication to deal with the complexity of atmospheric composition data and can provide the interpretability to improve our understanding of this complex system.


In this project, you will work with scientists from the chemistry and mathematics departments at the University of York to look at the potential of various modern data analytic methods in gaining valuable insights from air quality data. The project aims to more fully exploit the air pollution data collected, for example data from previous NERC field campaigns as well as long-term data sets collected at NERC’s atmospheric observatories and on the Defra air quality monitoring networks.

According to your particular research interests, the studentship could include:

  1. The use of machine-learning techniques for air quality data, for example neural networks and classification and regression trees. Related statistical learning techniques could also be included (e.g. random forests, boosted generalised additive models (GAMs)).
  2. The application of recent time series methods to gain additional insight from the available data.
  3. Use of statistical models to characterise the chemical climatology of field measurement sites.
  4. Application of statistical methods to assess the impact of events that are expected to alter air pollutant concentrations, for example interventions aimed at lowering traffic emissions.
  5. Evaluation of the applicability of the various techniques used to studies of air quality data.
  6. Exploring the use of statistical techniques to inform the design of air quality studies.
  7. Analysis of the impact of uncertainties, errors and necessary assumptions associated with the decisions made around air quality monitoring networks and monitoring study design, for example the species and parameters measured, the location of the nearest meteorological measurement station, the location of air quality instrumentation, the length of the monitoring period.

Potential for high impact outcome

Air quality management has become an important issue for many local authorities. It has been asserted that where action is taken to try to reduce pollutant concentrations it is very difficult to demonstrate an impact on ambient concentrations, and therefore to assess which interventions are most effective (ref). Identifying methods that can assist in designing the monitoring of interventions or in demonstrating an effect has direct policy relevance, particularly for local authorities. Dr Sarah Moller works closely with Defra as part of her NERC Knowledge Exchange Fellowship, so work from this project would be discussed with and informed by members of the Air Quality Data and Analysis Team in Defra. If analysis techniques can be shown to be effective at demonstrating the impact (or lack thereof) of events and interventions on air quality, this research could have applications well beyond the UK.

The research is expected to lead to high-quality research papers.


The student will work under the supervision of Dr David Carslaw and Dr Sarah Moller within the Wolfson Atmospheric Chemistry Laboratories (WACL) and Dr Andrea Wiencierz in the Department of Mathematics.

This project provides specialist scientific training in the analysis of in-situ measurements of atmospheric trace gases and statistical modelling. The student would also develop skills and techniques to allow interpretation of air quality data and delivery of clearly expressed policy-relevant information to policymakers.

The successful PhD student will have access to a broad range of training workshops put on by the University of York. The studentship is offered as part of the SPHERES Doctoral Training Program, which will provide additional training. Through the Department of Chemistry, University of York, and SPHERES training there is a wide range of activities, including courses aimed at specific scientific objectives, improving your transferable skills, completing your PhD and putting your work into a wider scientific context. Dr Moller is part of the National Centre for Atmospheric Science (NCAS), and thus you will have access to the wider resources that NCAS provides, including training such as the Arran Atmospheric Measurement Summer School and the Introduction to Atmospheric Science course.

Student profile

The student should have a strong background in a quantitative science (e.g. mathematics, statistics, physics). The student should have knowledge of and a strong interest in statistical modelling techniques and machine learning. Experience of programming and scientific computing is essential. 


  1. Royal College of Physicians. Every breath we take: the lifelong impact of air pollution. Report of a working party. London: RCP, 2016.

  2. Holman, C., Harrison, R. & Querol, X. 2015, 'Review of the efficacy of low emission zones to improve urban air quality in European cities', Atmospheric Environment, vol. 111, pp. 161-169, doi:10.1016/j.atmosenv.2015.04.009.

  3. Air Quality Consultants and Aether 2013, Review of Effectiveness of Local Authority Action Plans and Future Policy Options for LAQM. Report produced for Defra.

  4. Centre for Environmental Data Archival (CEDA) Annual Report 2015. Available at


Related undergraduate subjects:

  • Chemistry
  • Mathematics
  • Physics
  • Statistics