There is nothing as practical as a good assessment of uncertainty

Today, I would like to give you my two cents on uncertainty quantification of environmental models, meaning the assessment of various model uncertainties. This is more or less an attempt to summarize what I’ve learned in the past 10 years or so.

I would like to walk you through a didactic example at a really, really high level and then draw some conclusions regarding the role of uncertainties in environmental decision support. To make things even simpler for me, I refer to two of my favorite papers as often as I can (no, not my own papers, don’t worry):

Reckhow, K.H. (1994) Importance of scientific uncertainty in decision making. Environmental Management 18(2), 161–166.

Reichert, P. (2012) Conceptual and practical aspects of quantifying uncertainty in environmental modelling and decision support. Proceedings of the 2012 International Congress on Environmental Modelling and Software (iEMSs), Sixth Biennial Meeting, Leipzig, Germany.

Off we go, fasten your seatbelts.

How to best operate a complex system?

As an example adapted from Reckhow (1994), imagine your task is to optimally operate a wastewater treatment plant (WWTP), which produces “pollution” (Figure 2). Figure 1 shows how the costs are related to the “pollution” level: lower pollution levels require more costly treatment, but if you exceed the permissible pollution level ‘2’ you have to pay a (large) fine.


Let’s assume two scenarios: In A (red line), treatment is comparably cheap, but the WWTP is shut down if the pollution exceeds ‘2’, which is represented as infinite cost (the vertical segment of the red line). In B, treatment (below ‘2’) is comparably more expensive, but still cheaper than getting fined (above ‘2’). The question is: How would you operate the WWTP in each scenario if your only goal is to minimize costs?

If you knew the pollution level without uncertainty (blue arrow), you would obviously operate the WWTP exactly at pollution level ‘2’.

Uncertainty comes at a price

Unfortunately, in the real world, your observed pollution from the WWTP is not exact (Figure 2), but contains uncertainty from incomplete knowledge, for example from measurement or sampling errors, as well as natural variability, for example due to different weather conditions.

Therefore, your decision regarding operational cost depends on two different things: 1) how you value the pollution, the so-called decision “attribute”, as shown in Figure 1, and 2) the uncertainty of your predicted performance regarding this attribute. These two are fundamentally independent. As a result, for the so-called decision “alternative” A, you operate the WWTP more conservatively and invest more into reducing pollution, because you want to avoid the brutal shut-down when you exceed ‘2’ pollution levels. For alternative B, you would accept exactly that probability of fines which minimizes overall cost.
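This reasoning can be sketched numerically: pick the operating set point that minimizes the expected cost under the pollution uncertainty. A minimal sketch, with invented cost curves and an assumed Gaussian prediction error (none of the numbers come from Reckhow’s paper):

```python
import numpy as np

rng = np.random.default_rng(42)
noise = rng.normal(0.0, 0.2, 10_000)  # prediction uncertainty around the set point

def cost_A(p):
    # cheap treatment below the limit '2', plant shut-down (huge cost) above it
    return np.where(p <= 2.0, 3.0 - p, 1e6)

def cost_B(p):
    # more expensive treatment below '2', finite fine above it
    return np.where(p <= 2.0, 2.0 * (3.0 - p), 2.0 * (3.0 - p) + 5.0)

def expected_cost(cost, target):
    # Monte Carlo average of the cost over the uncertain realised pollution
    return cost(target + noise).mean()

targets = np.linspace(0.5, 2.0, 151)
best_A = targets[np.argmin([expected_cost(cost_A, t) for t in targets])]
best_B = targets[np.argmin([expected_cost(cost_B, t) for t in targets])]
print(best_A, best_B)
```

By construction, alternative A ends up operating well below the limit to keep the shut-down probability negligible, while alternative B accepts exactly the probability of fines that minimizes overall cost.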

For me, Ken Reckhow’s example nicely illustrates four major points:

  1. Having an accurate estimate of your prediction uncertainty is relevant for practice, not just scientifically interesting: the spread of the bell curve in Figure 2, i.e. the scientific uncertainty, ultimately determines the cost of operation.
  2. Without an estimate of prediction uncertainty, we are not able to optimally operate our systems. This is simply because, without it, it is difficult to justify whether we are investing too much or too little.
  3. Reducing the scientific uncertainty is valuable. This can be seen in Figure 2, where narrowing the curves will avoid excessive cost due to both over-treatment (left branch) and fines (right branch).
  4. It is important to separate the prediction of pollution from the quantification of your preferences for different pollution levels. As a scientist and environmental engineer, my task is rather to come up with reliable predictions of pollution levels for the different alternatives A and B. The valuation concerns constructing the cost function and, with multiple decision attributes, quantifying the relative preferences for each attribute. Although practical engineers do this implicitly for their clients all the time, this is not our traditional core discipline and could probably be performed better by stakeholders and environmental economists.

So far, so good. But, as we are happy to delegate the valuation to others, for us engineers the grand challenge still remains:

How can we construct reliable prediction intervals of environmental models?

To answer this, I’d like to summarize the main suggestions of Reichert (2012), beefed-up with my personal ideas, in the following statements:

  1. Scientific knowledge is currently best formulated as inter-subjective probabilities, using the mathematical framework of probability theory. Counter-arguments mainly concern the fact that humans are uncertain about their own beliefs and that different people may quantify their beliefs differently. Unfortunately, alternative theories such as fuzzy logic, possibility theory or interval analysis do not solve this problem either, and they lack a similarly solid mathematical foundation as a representation of uncertainty. While not perfect, we think that probability theory is currently the best game in town.
  2. More “informal” approaches to expressing uncertainty should be avoided. These informal approaches, some of which have been developed by hydrologists, are based either on arbitrary modifications of the likelihood function or on the construction of uncertainty intervals from the coverage frequency of observations, such as runoff. Several formal arguments have been made for why they should be avoided. The one that speaks loudest to me is that those approaches have not been adopted by disciplines where learning from data is key: statistics, physics, mathematics, information theory, the medical sciences or machine learning.
  3. We need (computer) models to predict the environmental outcome of different alternatives. Environmental systems are so complex that we usually cannot guess “what will happen if”. Models often give much better predictions than guessing, as they are built on mathematical descriptions of the different physical phenomena. Nevertheless, by definition, any model is still a very simplified description of reality. The challenge is to find a compromise between a too simple model, which delivers uncertain predictions because it does not include all the important mechanisms, and a too complex one, which has many parameters that are not known exactly and thus cause large prediction uncertainties.
  4. We should consider error-generating processes realistically and where they occur. With purely deterministic models, without random influences, we cannot construct prediction intervals when single best estimates are used as model parameters. Commonly, randomness has only been considered as additive i.i.d. (independent and identically distributed) Gaussian errors. This can be interpreted as assuming that the error-generating processes only concern observation noise, for example in flow data. Unfortunately, our experience shows that this is not realistic.
    One step further is to capture the analyst’s incomplete knowledge about credible model parameter values as probability distributions, which leads to Bayesian inference. A logical next step is to also formulate model structure deficits explicitly as additive stochastic processes, representing the “model bias”. While it does not provide much information on the origin of the uncertainties, this is at least a practically feasible “crutch” to deal with structurally deficient models and other “known unknowns”. More desirable is probably a so-called “total uncertainty analysis”, which takes into account uncertainty in i) the model inputs, ii) model structure deficits, iii) incomplete knowledge of the model parameters and iv) output observation errors. In this regard, as written above, we should start to consider the randomness where it is appropriate. Often, this will lead us to stochastic models, possibly with time-dependent parameters and, in rainfall-runoff modelling, stochastic components for the “true” catchment-average rainfall. For me as an engineer, the concept of time-dependent parameters is especially compelling, because it might very well be that many of “our” parameters depend on external drivers. Examples could be the imperviousness, the stock of particulate matter or the groundwater infiltration into sewers, which depend on the time (of year), soil moisture, air temperature or other external drivers. Introducing this flexibility would make it possible to actually learn about those dependencies from data, which ultimately leads to a better understanding of our systems. In a similar fashion, it is rather crude to assume that rainfall data, often observed by point rain gauges, are error-free and representative of a catchment of several hectares or more. Multipliers, or even stochastic error processes, might lead to more reliable predictions and avoid over-confidence.
In summary, if the variability in your model predictions is caused by rain gauge errors, or by seasonal variability of your groundwater levels, you better have a good description for this. If not, you incorrectly map the variability to other parameters (from both your mechanistic model and error model), consequently infer the wrong things from your data and have to accept large prediction uncertainties. And, ultimately, build too expensive systems (point 1).
  5. Always consider model structure deficits. This is because we fundamentally need to simplify our models, as mentioned above. We can detect such mismatch by residual analysis, which often shows autocorrelation that cannot be explained by observation errors alone. To generate reliable predictions, it is important to consider this bias explicitly, for example using the statistical bias description technique (see Section 3.2 in Reichert, 2012).
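Such a residual check takes only a few lines. A sketch with synthetic data, where the structural deficit is mimicked by a made-up AR(1) “bias” process:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

def lag1_autocorr(x):
    # lag-1 sample autocorrelation of a residual series
    x = x - x.mean()
    return (x[:-1] * x[1:]).sum() / (x * x).sum()

# Case 1: only i.i.d. observation noise -> residuals look white
resid_white = rng.normal(0, 0.1, n)

# Case 2: a structural deficit, mimicked by an AR(1) "bias" process
bias = np.zeros(n)
for i in range(1, n):
    bias[i] = 0.95 * bias[i - 1] + rng.normal(0, 0.05)
resid_biased = bias + rng.normal(0, 0.1, n)

print(lag1_autocorr(resid_white))   # close to zero
print(lag1_autocorr(resid_biased))  # clearly positive
```

If your residuals look like case 2, an i.i.d. error model is not telling the whole story.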
  6. Employing a statistical bias description practically requires Bayesian inference. Modeling the system response with a mechanistic model and an additive stochastic process for bias creates an identifiability problem between these two components, because the data can be modeled with either one of them. This identifiability problem can be resolved by adopting Bayesian inference, where we constrain credible parameter values by assigning a “prior” parameter distribution based on our knowledge or belief.
  7. Separate the description of the error-generating processes from the numerical algorithm used to estimate parameter values. Effective learning from the data, i.e. calibrating our models, or inferring parameters, requires a good description of the mechanisms and error-generating processes, and efficient numerical techniques. The first concerns the mathematical formulation of the so-called likelihood function, which describes how likely it is to observe the measured data given the model hypothesis and a set of parameters (see Reichert, 2012); the second is the algorithmic method for finding likely parameter values given the likelihood objective function. Mixing both leads to confusion: for example, reporting only “Bayesian data analysis”, “Ensemble Kalman Filter” or “Markov Chain Monte Carlo” as the calibration method does not permit evaluating whether the assumptions on the error-generating processes are reasonable.
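The separation can be made explicit in code: the likelihood function states the error-model assumptions, while the numerical method that maximises (or samples) it is an exchangeable choice. A toy sketch with an invented linear model and i.i.d. Gaussian errors:

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(0, 1, 50)
data = 2.5 * x + rng.normal(0, 0.1, x.size)  # synthetic observations

def model(theta, x):
    return theta * x  # the mechanistic hypothesis

def log_likelihood(theta, x, y, sigma=0.1):
    # the error-model assumption is stated here: additive i.i.d. Gaussian errors
    r = y - model(theta, x)
    return -0.5 * np.sum((r / sigma) ** 2) - r.size * np.log(sigma * np.sqrt(2 * np.pi))

# the numerical method is a separate, exchangeable choice (here: grid search)
thetas = np.linspace(0, 5, 1001)
theta_hat = thetas[np.argmax([log_likelihood(th, x, data) for th in thetas])]
print(theta_hat)  # close to the true slope 2.5
```

Swapping the grid search for an optimiser or an MCMC sampler would not change a single line of the likelihood, which is exactly the point.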
  8. When your model is slow, adaptive samplers and emulators can help. With Bayesian parameter estimation, the goal is to estimate the posterior given the prior (which encodes our prior knowledge) and the assumed likelihood function (which “contains” the model and the data). Numerically, this is often done with Markov Chain Monte Carlo (MCMC) techniques because they give complete freedom in specifying prior distributions. MCMC methods can be challenging when the posterior has many dimensions and is highly correlated, and when the model is slow. Scale-invariant or adaptive samplers can increase the efficiency, and emulators of either the mechanistic model or the posterior can speed up the analysis.
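For intuition, here is a minimal random-walk Metropolis sampler on a one-parameter toy problem (deliberately not adaptive or scale-invariant; the data and prior are invented):

```python
import numpy as np

rng = np.random.default_rng(2)
data = rng.normal(3.0, 1.0, 100)  # synthetic observations, true mean 3

def log_posterior(mu):
    log_prior = -0.5 * (mu / 10.0) ** 2        # weak N(0, 10) prior
    log_lik = -0.5 * np.sum((data - mu) ** 2)  # i.i.d. N(mu, 1) likelihood
    return log_prior + log_lik

chain = np.empty(5000)
mu, lp = 0.0, log_posterior(0.0)
for i in range(chain.size):
    prop = mu + rng.normal(0, 0.5)             # random-walk proposal
    lp_prop = log_posterior(prop)
    if np.log(rng.uniform()) < lp_prop - lp:   # Metropolis accept/reject
        mu, lp = prop, lp_prop
    chain[i] = mu

print(chain[1000:].mean())  # posterior mean after burn-in, close to 3
```

The fixed proposal scale of 0.5 is exactly what adaptive samplers tune automatically; with a high-dimensional, correlated posterior this naive version would mix very slowly.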
  9. When your likelihood function is analytically intractable, Approximate Bayesian Computation can help. With complicated error-generating processes, i.e. stochastic models, it might not be possible to sample directly from the posterior, because evaluating the likelihood function takes too much time (e.g. it may involve high-dimensional integrals). Approximate Bayesian Computation, so-called “ABC”, can then be a strategy to obtain an approximate sample from the posterior. Contrary to several reports, ABC methods are not “likelihood-free”, because they still need an explicit formulation of the error-generating processes.
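The core idea of ABC rejection sampling fits in a few lines. A toy sketch; note that the stochastic error-generating process must still be written down explicitly (here inside `simulate`), even though no likelihood is ever evaluated:

```python
import numpy as np

rng = np.random.default_rng(3)
observed = rng.normal(2.0, 0.5, 200)  # "field" data, true mean 2
obs_summary = observed.mean()         # summary statistic

def simulate(theta):
    # explicit stochastic error-generating process (no likelihood evaluated)
    return rng.normal(theta, 0.5, 200).mean()

# ABC rejection: keep prior draws whose simulated summary is close to the data
accepted = []
while len(accepted) < 500:
    theta = rng.uniform(0, 5)                      # draw from the prior
    if abs(simulate(theta) - obs_summary) < 0.05:  # tolerance on the summary
        accepted.append(theta)

print(np.mean(accepted))  # approximate posterior mean, close to 2
```

The quality of the approximation hinges on the choice of summary statistics and tolerance, which is where most of the practical ABC literature lives.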
  10. We need more research on good likelihood functions. Last, but not least, it has been reported that (rainfall-runoff) models which have been calibrated manually with due care give more reliable predictions than models calibrated by Bayesian inference with a statistically sound likelihood function that accounts for model bias… This is not a methodological problem, but rather reflects the fact that the engineer has preferences which she cannot (yet) formulate mathematically. A simple example: in manual calibration she instinctively values peak flows more highly than base flows, whereas in formal parameter estimation she did not explicitly assign different uncertainties to the different regimes.
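One way to make such preferences explicit is a heteroscedastic likelihood, in which the assumed error standard deviation differs between flow regimes. A sketch; the 90%-quantile threshold and both sigmas are illustrative assumptions, not recommendations:

```python
import numpy as np

def log_likelihood(residuals, flows, sigma_base=0.5, sigma_peak=0.1):
    # smaller assumed error sd for peak flows -> peak misfits are penalised more;
    # the quantile threshold and the two sigmas are illustrative choices
    sigma = np.where(flows > np.quantile(flows, 0.9), sigma_peak, sigma_base)
    return -0.5 * np.sum((residuals / sigma) ** 2 + np.log(2 * np.pi * sigma**2))

flows = np.array([1.0] * 9 + [10.0])          # nine base-flow values, one peak
miss_peak = np.zeros(10); miss_peak[9] = 1.0  # a 1.0 residual on the peak
miss_base = np.zeros(10); miss_base[0] = 1.0  # the same residual on base flow
print(log_likelihood(miss_peak, flows) < log_likelihood(miss_base, flows))  # True
```

The same misfit costs far more likelihood on the peak than on base flow, which is roughly what the careful manual calibrator does by instinct.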

[Deep silence.]



If this was too fast, or too rough, I encourage you to take a plunge into the two short papers. You can also shoot me an email if you have specific questions. If you want to get hands-on training, the SIAM Summer School in Environmental Systems Analysis is the place to be.

Don’t hide (your) errors

In summary, I think we as scientists can support (cost-)effective decisions by always reporting an uncertainty estimate with our model predictions. And we should strive to be as transparent as possible about the error-generating processes. We should also find mathematicians who listen to our needs and help us develop appropriate likelihood functions and numerical techniques. Unfortunately, there is no easy way out, and we should be very skeptical of informal approaches which are not adopted by disciplines that traditionally consider uncertainty in their data analysis. Finally, how to best use uncertain predictions in decision support (Figure 1) is the concern of decision theory, an entire field of research of its own, which has probably yet to be fully adapted for integrated catchment studies. At the end of the day, the sensitive influence factors in the real-world decision problem may not be those of our likelihood functions or numerical algorithms, but how we frame the decision problems and construct useful objective hierarchies. In a similar fashion, dealing with staged decisions or even “unknown unknowns” requires very different methods.

Happy holidays!

Jörg Rieckermann


How extreme was Storm Angus?

On 21st November 2016, heavy rainfall battered the UK. This was Storm Angus, which brought severe flooding to different parts of the UK, in particular the South West of England (SWE). According to the newspapers, nearly 70 flood warnings were in place, a river burst its banks in Devon, heavy rain flooded the railway between Exeter and Bristol, several locations were flooded in and around Bristol (Backwell, Whitchurch, Fishponds), cars were reported under water in South Bristol, and flooding was reported in the Chew Valley. So, how extreme was this rainfall event and why did it cause so much damage in the SWE?
Figure 1a shows the average November rainfall for the last 25 years [1]. Figure 1b shows the daily rainfall amounts recorded by the Met Office raingauge network on 21st Nov 2016 [2]; the data have been interpolated to show the spatial distribution of precipitation across the UK. The actual rainfall depths recorded by the raingauges are also shown in the same figure. It is interesting to see that the rainfall depths recorded around Bristol (42.4 mm) and other places in the SWE on 21st Nov 2016 were equivalent to more than 50% of the expected average rainfall for November. A rainfall sensor installed at the University of Bristol recorded 47.8 mm of rain during the same period. This sensor is a disdrometer, which measures the raindrop size distribution, from which rainfall intensities can be derived at 1-minute temporal resolution. The largest rainfall depth was recorded in Devon, with 63.2 mm of rain on the same day. Figure 1c shows the daily rainfall depths recorded by the Met Office weather radar network at 1 km resolution on the same day [3]. The spatial distribution of precipitation is similar for radar rainfall and raingauge measurements, but the radar clearly shows regions with very heavy localised rainfall (not observed by the raingauge network) that caused flash flooding in different places across the SWE.

Figure 1. Average November rainfall (a); Raingauge rainfall (b) and Radar rainfall (c)

Figure 2 shows the depth-duration-frequency (DDF) rainfall curves (colour solid lines) for storms with different return periods (T) for south Bristol, using the DDF model proposed in the Flood Estimation Handbook [4]. The figure also shows an estimate of the storm return period for south Bristol using raingauge, disdrometer and radar rainfall for different storm durations. Note that the storm started in the evening of 20th Nov and finished in the afternoon of 22nd Nov, which is why some durations are longer than 24 h. There are two important points to make about Figure 2. The first is that the return period of the storm is sensitive to the rainfall measurement used in the analysis, with maximum return periods of 3, 5 and 5 years when using raingauge, disdrometer and radar observations respectively. It is fair to say that the measurements were not all taken at exactly the same location and that the rainfall sensors have different error characteristics, but the results highlight the uncertainty in the estimation of the storm return period when the spatial variability of precipitation is so high. The second point is that the storm return period changes depending upon the selected duration of the storm. For instance, with the disdrometer measurements, a duration of about 10 h gives a storm with a return period of less than 2 years, but a duration of about 20 h gives a 5-year return period storm. This result highlights the uncertainty in the estimation of the storm return period for a given storm duration. Urban sewer systems are often designed to cope with storms with return periods of 1–2 years, and in many cases return periods of 5 years are adopted in areas vulnerable to flooding; more recent guidelines suggest designing systems for storms with return periods of up to 30 years in order to prevent surface flooding [5].
The results from radar and disdrometer measurements indicate that the storm had a return period of 5 years for South Bristol. However, given the large spatial variability of the storm and the severe localised flooding that occurred in some locations in Bristol, it is very likely that the return period of this storm was actually higher. It is also fair to say that the catchment initial conditions might have increased the risk of surface flooding, with factors such as catchment wetness and drainage blockages (e.g. due to autumn dead leaves) having important effects.
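For readers who want to experiment with return periods themselves, the generic recipe can be sketched as follows. This is not the FEH DDF model used above; it fits a Gumbel distribution (method of moments) to a made-up series of annual maximum daily rainfall:

```python
import numpy as np

# hypothetical annual-maximum daily rainfall series (mm) -- illustrative only
annual_max = np.array([28.1, 35.4, 31.0, 42.7, 25.9, 38.2, 30.5, 45.1,
                       33.6, 29.8, 40.3, 36.7, 27.4, 32.9, 39.5])

# method-of-moments Gumbel fit
beta = np.sqrt(6) * annual_max.std(ddof=1) / np.pi  # scale parameter
mu = annual_max.mean() - 0.5772 * beta              # location parameter

def return_period(depth_mm):
    # T = 1 / P(annual max > depth), Gumbel CDF = exp(-exp(-(x - mu) / beta))
    p_exceed = 1.0 - np.exp(-np.exp(-(depth_mm - mu) / beta))
    return 1.0 / p_exceed

print(return_period(42.4))  # return period (years) of a 42.4 mm day, toy fit only
```

With only 15 made-up years of data, the fitted parameters (and hence the return period) are themselves highly uncertain, which echoes the point of the analysis above.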

Figure 2. Rainfall Depth-Duration-Frequency curves

We live in a world with environmental uncertainty and the EU QUICS project will tackle some of the important issues related to the quantification of uncertainty in integrated catchment studies.


[1] Keller et al. (2015) CEH-GEAR: 1 km resolution daily and monthly areal rainfall estimates for the UK for hydrological and other applications, Earth Syst. Sci. Data, 7, 143-155.
[2] Met Office (2006): MIDAS UK Hourly Rainfall Data. NCAS British Atmospheric Data Centre, 2016.
[3] Met Office (2003): 1 km Resolution UK Composite Rainfall Data from the Met Office Nimrod System. NCAS BADC, 2016.
[4] Faulkner D (1999) Flood Estimation Handbook, Vol 2, Institute of Hydrology, ISBN 0948540907.
[5] Butler D and Davies JW (2011) Urban Drainage, Spon Press.

Miguel Angel Rico-Ramirez, University of Bristol.


My lab experiments in Coimbra or: How I became healthier

‘Ohh… good luck, but I would never think about doing lab experiments during a secondment!’

This was one colleague’s “motivational” response when I told him about my plans for my secondment at the University of Coimbra. Well, he was being pessimistic, but he was not completely wrong. Completing lab experiments during a secondment can be tricky, mainly because of the limited time and the unfamiliar lab environment, especially if you are new to the place and cannot speak a single word of the local language, which was exactly my situation. To make it even harder, the one and only technician in the water lab speaks only Portuguese. But thanks to Rita, most things were already prepared for my experiments even before I came to Coimbra, so my feelings in the first few days of my secondment here were a mix of excitement and nervousness.

Now it has been almost two and a half months. I am behind my original schedule, but I am now more familiar with all the lab facilities and the experimental setup, so progress is faster. I am hoping to finish all the remaining experiments in less than a month’s time.

I have not yet mentioned the aim of my lab experiments. I am investigating sediment wash-off from an impervious road surface using artificial rainfall. The main aim is to study the effects of (a) rainfall intensity, (b) the slope of the surface, and (c) the initial load on sediment wash-off from the road surface.

Here are some pictures from my lab experiments

Me and my lab (batman) costume to not get wet in the artificial rain. P.S: My office mates call my experimental space Gotham city and Nazmul, who helps me a lot with my experiments, Robin!
Placing sand uniformly over the experimental surface of 1 sqm
Experimental setup during a test

Finally, some results. But do not worry, they have nothing to do with sediment wash-off. Thanks to the health application on my mobile, I found a good relationship between two parameters even before starting to analyse my experimental data. The number of steps I walk per day correlates well with the number of experiments I carry out per day, so I finish with a health tip: do lab experiments and stay healthy! 😉

