Today, I would like to give you my two cents on uncertainty quantification for environmental models, meaning the assessment of the various model uncertainties. This is more or less an attempt to summarize what I’ve learned in the past 10 years or so.

I would like to walk you through a didactic example at a really, really high level and then draw some conclusions regarding the role of uncertainties in environmental decision support. To make things even simpler for me, I refer to two of my favorite papers as often as I can (no, not my own papers, don’t worry):

Kenneth H. Reckhow (1994). Importance of scientific uncertainty in decision making. Environmental Management, 18(2), 161–166.

Peter Reichert (2012). Conceptual and Practical Aspects of Quantifying Uncertainty in Environmental Modelling and Decision Support. In: 2012 International Congress on Environmental Modelling and Software, Sixth Biennial Meeting, International Environmental Modelling and Software Society (iEMSs), Leipzig, Germany.

Off we go, fasten your seatbelts.

#### How to best operate a complex system?

As an example adapted from Reckhow (1994), imagine your task is to optimally operate a wastewater treatment plant (WWTP), which produces “pollution” (Figure 2). Figure 1 shows how the costs are related to the “pollution” level: lower pollution levels require more costly treatment, but if you exceed the permissible pollution level ‘2’ you have to pay a (large) fine.

Let’s assume two scenarios: In A (red line), treatment is comparatively cheap, but the WWTP is shut down if the pollution exceeds ‘2’, which is represented as infinite cost (the vertical segment of the red line). In B, treatment (below ‘2’) is comparatively more expensive, but still cheaper than getting fined (above ‘2’). The question is: how would you operate the WWTP in each scenario if your only goal is to minimize costs?

If you knew the pollution level without uncertainty (blue arrow), you would obviously operate the WWTP exactly at pollution level ‘2’.

#### Uncertainty comes at a price

Unfortunately, in the real world, the observed pollution from your WWTP is not exact (Figure 2): it contains uncertainty from incomplete knowledge, for example measurement or sampling errors, as well as from natural variability, for example due to different weather conditions.

Therefore, your decision regarding operational cost depends on two different things: 1) how you value the pollution (the so-called decision “attribute”), as shown in Figure 1, and 2) the uncertainty of your predicted performance regarding this attribute. These two are fundamentally independent. As a result, for the so-called decision “alternative” A, you operate the WWTP more conservatively and invest more in reducing pollution, because you want to avoid the brutal shut-down when you exceed pollution level ‘2’. For alternative B, you would accept exactly that probability of fines which minimizes the overall cost.
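To make the trade-off concrete, here is a minimal sketch of the expected-cost calculation. Reckhow’s paper contains no such numbers; the treatment costs, fine, uncertainty and the simple linear cost shape are all made up for illustration:

```python
from statistics import NormalDist

LIMIT = 2.0  # permissible pollution level, as in Figure 1

def expected_cost(setpoint, sigma, treat_slope, penalty):
    """Expected cost when the realized pollution is Normal(setpoint, sigma).

    Treatment cost grows linearly as the setpoint is pushed below LIMIT;
    exceeding LIMIT incurs `penalty` (a fine, or a huge number
    approximating a shut-down)."""
    p_exceed = 1.0 - NormalDist(setpoint, sigma).cdf(LIMIT)
    treatment = treat_slope * (LIMIT - setpoint)
    return treatment + penalty * p_exceed

def best_setpoint(sigma, treat_slope, penalty):
    # brute-force search over candidate operating points in [0, 2]
    grid = [0.01 * i for i in range(201)]
    return min(grid, key=lambda s: expected_cost(s, sigma, treat_slope, penalty))

sigma = 0.3  # spread of the bell curve in Figure 2
# Alternative A: cheap treatment, near-infinite penalty (shut-down)
sp_a = best_setpoint(sigma, treat_slope=1.0, penalty=1e6)
# Alternative B: expensive treatment, finite fine
sp_b = best_setpoint(sigma, treat_slope=5.0, penalty=50.0)

# A backs far away from the limit; B accepts some priced-in risk of fines.
assert sp_a < sp_b < LIMIT
```

Whatever the exact numbers, the pattern is the same: the wider the uncertainty, the further alternative A has to retreat from the limit, and the more both alternatives cost.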

For me, Ken Reckhow’s example nicely illustrates four major points:

**Having an accurate estimate of your prediction uncertainty is relevant for practice.** It is not only scientifically interesting: the spread of the bell curve in Figure 2, i.e. the scientific uncertainty, ultimately determines the cost of operation.
**Without an estimate of prediction uncertainty, we are not able to optimally operate our systems.** This is simply because, without it, we cannot justify whether we are investing too much or too little.
**Reducing the scientific uncertainty is valuable.** This can be seen in Figure 2, where narrowing the curves will avoid excessive cost due to both over-treatment (left branch) and fines (right branch).
**It is important to separate the prediction of pollution from the quantification of your preferences for different pollution levels.** As a scientist and environmental engineer, my task is rather to come up with reliable predictions of pollution levels for the different alternatives A and B. The valuation concerns constructing the cost function and, with multiple decision attributes, quantifying the relative preferences for each attribute. Although practical engineers do this implicitly for their clients all the time, it is not our traditional core discipline and could probably be performed better by stakeholders and environmental economists.

So far, so good. But, as we are happy to delegate the valuation to others, for us engineers the grand challenge still remains:

#### How can we construct reliable prediction intervals of environmental models?

To answer this, I’d like to summarize the main suggestions of Reichert (2012), beefed up with my personal ideas, in the following statements:

**Scientific knowledge is currently best formulated as inter-subjective probabilities, using the mathematical framework of probability theory.** Counter-arguments mainly concern the fact that humans are uncertain about their own beliefs and that different people may quantify their beliefs differently. Unfortunately, alternative theories such as fuzzy logic, possibility theory or interval analysis do not solve this problem either, and they lack a similarly good mathematical foundation as a representation of uncertainty. While not perfect, probability theory is currently the best game in town.
**More “informal” approaches to expressing uncertainty should be avoided.** These informal approaches, some of which have been developed by hydrologists, are either based on arbitrary modifications of the likelihood function or on the construction of uncertainty intervals from the coverage frequency of observations, such as runoff. Several formal arguments have been made for why these should be avoided. The one which speaks loudest to me is that those approaches have not been adopted by disciplines where learning from data is key: statistics, physics, mathematics, information theory, the medical sciences or machine learning.
**We need (computer) models to predict the environmental outcome of different alternatives.** Environmental systems are so complex that we usually cannot guess “what will happen if”. Models often give much better predictions than guessing because they are built on mathematical descriptions of the different physical phenomena. Nevertheless, by definition, any model is still a very simplified description of reality. The challenge is to find a compromise between a too simple model, which delivers uncertain predictions because it does not include all important mechanisms, and a too complex one, whose many parameters are not known exactly and thus cause large prediction uncertainties.
**We should consider error-generating processes realistically and where they occur.** With purely deterministic models, i.e. without random influences, we cannot construct prediction intervals when single best estimates are used as model parameters. Commonly, randomness has only been considered as additive i.i.d. (independent and identically distributed) Gaussian errors. This can be interpreted to mean that the error-generating processes concern only observation noise, for example in flow data. Unfortunately, our experience shows that this is not realistic.

One step further is to capture the analyst’s incomplete knowledge of credible model parameter values as probability distributions, which leads to Bayesian inference. A logical next step is to also formulate model structure deficits explicitly as additive stochastic processes, representing the “model bias”. While it does not provide much information on the origin of uncertainties, this at least is a practically feasible “crutch” to deal with structurally deficient models and other “known unknowns”. More desirable is probably a so-called “total uncertainty analysis”, which takes into account uncertainty in i) the model inputs, ii) model structure deficits, iii) incomplete knowledge of the model parameters and iv) output observation errors.

In this regard, as written above, we should start considering the randomness where it actually occurs. Often, this will lead us to stochastic models, possibly with time-dependent parameters and, in rainfall-runoff modelling, stochastic components for the “true” catchment-average rainfall. For me as an engineer, the concept of time-dependent parameters is especially compelling, because it might very well be that many of “our” parameters depend on external drivers. Examples could be the imperviousness, the stock of particulate matter or the groundwater infiltration into sewers, which depend on the time of year, soil moisture, air temperature or other external drivers. Introducing this flexibility would make it possible to actually learn about those dependencies from data, ultimately leading to a better understanding of our systems.

In a similar fashion, it is rather crude to assume that rainfall data, often observed by point rain gauges, are error-free and representative of a catchment of several hectares or more. Multipliers, or even stochastic error processes, might lead to more reliable predictions and avoid over-confidence.
In summary, if the variability in your model predictions is caused by rain gauge errors, or by seasonal variability of your groundwater levels, you had better have a good description of it. If not, you incorrectly map the variability onto other parameters (from both your mechanistic model and your error model), consequently infer the wrong things from your data and have to accept large prediction uncertainties. And, ultimately, you build too expensive systems (point 1).
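As a toy illustration of how an ignored input error gets mapped onto the wrong parameter, consider a deliberately over-simple linear rainfall-runoff model (the model, the 20 % gauge error and all numbers are invented for this sketch):

```python
import random

def runoff(rain_series, runoff_coeff):
    # toy linear rainfall-runoff model (illustrative only)
    return [runoff_coeff * r for r in rain_series]

# Suppose the gauge systematically under-reads the true catchment rain by 20 %:
random.seed(2)
true_rain = [random.uniform(0.0, 10.0) for _ in range(100)]
gauge_rain = [0.8 * r for r in true_rain]
observed_q = runoff(true_rain, runoff_coeff=0.5)

# Least-squares fit of the runoff coefficient from the raw gauge data,
# with no rainfall multiplier available to absorb the input error:
num = sum(g * q for g, q in zip(gauge_rain, observed_q))
den = sum(g * g for g in gauge_rain)
fitted_coeff = num / den

# The input error is silently mapped onto the wrong parameter:
# the coefficient is inflated from its true value 0.5 to 0.5 / 0.8 = 0.625.
assert abs(fitted_coeff - 0.5 / 0.8) < 1e-9
```

An explicit rainfall multiplier, inferred jointly with the runoff coefficient, would let the data disentangle the two effects instead of hiding the input error in the mechanistic parameter.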
**Always consider model structure deficits.** This is because we fundamentally need to simplify our models, as mentioned above. We can detect such mismatches by residual analysis, which often shows autocorrelation that cannot be explained by observation errors alone. To generate reliable predictions, it is important to consider this bias explicitly, for example using the statistical bias description technique (see Section 3.2 in Reichert, 2012).
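A simple residual diagnostic along these lines is the lag-1 autocorrelation. The sketch below (plain Python, with invented residual series) shows how the smoothly drifting residuals of a biased model differ from i.i.d. observation noise:

```python
import math
import random

def lag1_autocorrelation(residuals):
    """Lag-1 autocorrelation of model residuals; values well above zero
    hint at structure deficits that i.i.d. observation noise cannot
    explain."""
    n = len(residuals)
    mean = sum(residuals) / n
    cov = sum((residuals[i] - mean) * (residuals[i + 1] - mean)
              for i in range(n - 1))
    var = sum((r - mean) ** 2 for r in residuals)
    return cov / var

# Residuals of a biased model drift smoothly instead of scattering randomly:
biased = [math.sin(0.1 * i) for i in range(200)]
assert lag1_autocorrelation(biased) > 0.9

random.seed(3)
white = [random.gauss(0.0, 1.0) for _ in range(200)]  # pure observation noise
assert abs(lag1_autocorrelation(white)) < 0.3
```

In practice one would inspect the whole autocorrelation function and the residual plots, but even this one number often exposes a structurally deficient model.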
**Employing a statistical bias description practically requires Bayesian inference.** Modeling the system response with a mechanistic model and an additive stochastic process for bias creates an identifiability problem between these two components, because the data can be modeled with either one of them. This identifiability problem can be resolved by adopting Bayesian inference, where we constrain credible parameter values by assigning a “prior” parameter distribution based on our knowledge or belief.
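In code, the resolution is as simple as it sounds: the log-prior is added to the log-likelihood, so parameter values that contradict our prior knowledge are penalized even when the bias process could fit the data equally well. A minimal sketch with made-up prior values:

```python
def log_prior(params):
    """Prior knowledge on credible parameter values: here independent
    Gaussian beliefs around assumed literature values (numbers made up
    for illustration)."""
    a, b = params
    return -0.5 * ((a - 2.0) / 0.5) ** 2 - 0.5 * ((b - 1.0) / 0.5) ** 2

def log_posterior(params, log_likelihood):
    # Bayes' rule in log space: posterior density is proportional to
    # likelihood times prior. The prior keeps the mechanistic parameters
    # near credible values, so the additive bias process cannot absorb
    # the data arbitrarily.
    return log_likelihood(params) + log_prior(params)

# With an uninformative (flat) likelihood, the prior alone decides:
flat = lambda p: 0.0
assert log_posterior((2.0, 1.0), flat) > log_posterior((4.0, 3.0), flat)
```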
**Separate the description of the error-generating processes from the numerical algorithm used to estimate parameter values.** Effective learning from data, i.e. calibrating our models or inferring parameters, requires both a good description of the mechanisms and error-generating processes and efficient numerical techniques. The first concerns the mathematical formulation of the so-called likelihood function, which describes how likely it is to observe the measured data given the model hypothesis and a set of parameters (see Reichert, 2012); the second is the algorithmic method for finding likely parameter values for this likelihood function. Mixing both leads to confusion: for example, reporting only “Bayesian data analysis”, “Ensemble Kalman Filter” or “Markov chain Monte Carlo” as the calibration method does not permit readers to evaluate whether the assumptions on the error-generating processes are reasonable.
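A sketch of this separation: the factory below spells out its error-model assumptions (additive i.i.d. Gaussian observation noise, chosen here purely for brevity) and returns a plain function that any sampler or optimizer can consume:

```python
import math

def make_log_likelihood(model, xs, ys, sigma):
    """The error-model assumptions live here, explicitly: additive i.i.d.
    Gaussian observation noise with standard deviation `sigma`. Reporting
    this formulation, not just the sampler's name, is what lets readers
    judge whether the assumptions are reasonable."""
    def log_likelihood(params):
        ll = 0.0
        for x, y in zip(xs, ys):
            r = y - model(x, params)
            ll += -0.5 * math.log(2 * math.pi * sigma**2) \
                  - 0.5 * (r / sigma) ** 2
        return ll
    return log_likelihood

# Any numerical technique (MCMC, filtering, optimization, ...) can now
# consume the same, clearly stated likelihood:
def linear(x, params):
    a, b = params
    return a * x + b

ll = make_log_likelihood(linear, xs=[0.0, 1.0, 2.0], ys=[1.0, 3.0, 5.0],
                         sigma=0.5)
assert ll((2.0, 1.0)) > ll((0.0, 0.0))  # a perfect fit beats a bad one
```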
**When your model is slow, adaptive samplers and emulators can help.** In Bayesian parameter estimation, the goal is to estimate the posterior given the prior (which encodes our prior knowledge) and the assumed likelihood function (which “contains” the model and the data). Numerically, this is often done with Markov chain Monte Carlo (MCMC) techniques because they give complete freedom in specifying prior distributions. MCMC methods can be challenging when the posterior has many dimensions and is highly correlated, and when the model is slow. Scale-invariant or adaptive samplers can increase the efficiency, and emulators of either the mechanistic model or the posterior can speed up the analysis.
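For illustration, here is a minimal (non-adaptive) random-walk Metropolis sampler. Note that it only ever sees a log-posterior function, never the error model itself, which is exactly the separation argued for above:

```python
import math
import random

def metropolis(log_post, start, step, n_samples, seed=0):
    """Minimal random-walk Metropolis sampler: propose a Gaussian step,
    accept with probability min(1, posterior ratio). It needs nothing
    but a function returning the (unnormalized) log-posterior."""
    rng = random.Random(seed)
    x = start
    lp = log_post(x)
    samples = []
    for _ in range(n_samples):
        prop = x + rng.gauss(0.0, step)
        lp_prop = log_post(prop)
        if rng.random() < math.exp(min(0.0, lp_prop - lp)):
            x, lp = prop, lp_prop
        samples.append(x)
    return samples

# Smoke test on a standard normal posterior (log-density up to a constant):
draws = metropolis(lambda x: -0.5 * x * x, start=0.0, step=1.0,
                   n_samples=20000)
mean = sum(draws) / len(draws)
assert abs(mean) < 0.1  # sample mean close to the true mean of 0
```

Adaptive samplers tune `step` (or a full proposal covariance) on the fly; emulators replace the expensive model call inside `log_post` with a cheap approximation.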
**When your likelihood function is analytically intractable, Approximate Bayesian Computation can help.** With complicated error-generating processes, i.e. stochastic models, it might not be possible to directly sample from the posterior, because it takes too much time to evaluate the likelihood function (e.g. it may include high-dimensional integrals). Approximate Bayesian Computation, so-called “ABC”, can then be a strategy to obtain an approximate sample from the posterior. In contrast to several reports, ABC methods are not “likelihood-free”, because they still need an explicit formulation of the error-generating processes.
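A sketch of the simplest variant, rejection ABC, with a toy stochastic model standing in for one whose likelihood is intractable (the model, prior and tolerance are all invented for illustration):

```python
import random

def stochastic_model(theta, rng, n=100):
    """Toy stochastic model whose likelihood we pretend is intractable:
    we can only simulate from it, not evaluate its density."""
    return [rng.gauss(theta, 1.0) for _ in range(n)]

def abc_rejection(observed, prior_sample, eps, n_tries, seed=0):
    """Rejection ABC: draw theta from the prior, simulate data, and keep
    theta if the simulated summary statistic lands within eps of the
    observed one. The error-generating process still had to be written
    down explicitly (in stochastic_model), so ABC is not 'likelihood-free'
    in that sense."""
    rng = random.Random(seed)
    obs_mean = sum(observed) / len(observed)
    accepted = []
    for _ in range(n_tries):
        theta = prior_sample(rng)
        sim = stochastic_model(theta, rng)
        if abs(sum(sim) / len(sim) - obs_mean) < eps:
            accepted.append(theta)
    return accepted

data_rng = random.Random(42)
observed = stochastic_model(3.0, data_rng)  # "data" with true theta = 3
post = abc_rejection(observed, lambda r: r.uniform(0.0, 6.0),
                     eps=0.3, n_tries=5000)
assert len(post) > 50                         # a usable approximate sample
assert abs(sum(post) / len(post) - 3.0) < 0.5  # centered near the truth
```

Real applications use several summary statistics, smaller tolerances and smarter sampling schemes, but the accept-if-close logic is the same.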
**We need more research on good likelihood functions.** Last but not least, it has been reported that (rainfall-runoff) models which have been calibrated manually with due care give more reliable predictions than models calibrated by Bayesian inference with a statistically sound likelihood function that accounts for model bias… This is not a methodological problem, but rather reflects the fact that the engineer has preferences which she cannot yet formulate mathematically. A simple example: in manual calibration she instinctively values peak flows higher than base flows, but in formal parameter estimation she did not explicitly assign different uncertainties to the different flow regimes.
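One possible direction, sketched below with invented numbers: a heteroscedastic likelihood that assigns peak flows their own, smaller error standard deviation, making the engineer’s implicit preference explicit:

```python
import math

def log_likelihood_regimes(sim, obs, sigma_peak, sigma_base, threshold):
    """Heteroscedastic Gaussian likelihood: observations above `threshold`
    (peak flows) get their own error standard deviation. A small
    sigma_peak forces the calibration to reproduce peaks tightly,
    mimicking careful manual calibration."""
    ll = 0.0
    for s, o in zip(sim, obs):
        sigma = sigma_peak if o > threshold else sigma_base
        ll += -0.5 * math.log(2 * math.pi * sigma**2) \
              - 0.5 * ((o - s) / sigma) ** 2
    return ll

obs = [1.0, 1.0, 10.0, 1.0, 1.0, 10.0]         # two "peaks" above threshold 5
fits_peaks = [2.0, 2.0, 10.0, 2.0, 2.0, 10.0]  # sloppy base flow, exact peaks
fits_base = [1.0, 1.0, 9.0, 1.0, 1.0, 9.0]     # exact base flow, sloppy peaks

# With a tight peak sigma, the peak-matching simulation is preferred:
assert log_likelihood_regimes(fits_peaks, obs, 0.1, 1.0, 5.0) > \
       log_likelihood_regimes(fits_base, obs, 0.1, 1.0, 5.0)
```

Whether such regime-dependent error models actually capture the engineer’s preferences, and how to elicit the regime standard deviations, is exactly the kind of open question the research should address.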

[Deep silence.]

So.


If this was too fast, or too rough, I encourage you to take a plunge into the two short papers. You can also shoot me an email if you have specific questions. If you want to get hands-on training, the SIAM Summer School in Environmental Systems Analysis is the place to be.

#### Don’t hide (your) errors

In summary, I think we as scientists can support (cost-)effective decisions by always reporting an uncertainty estimate along with our model predictions. And we should strive to be as transparent as possible about the error-generating processes. We should also find mathematicians who listen to our needs and help us develop appropriate likelihood functions and numerical techniques. Unfortunately, there is no easy way out, and we should be very skeptical about informal approaches which are not adopted by disciplines that traditionally consider uncertainty in their data analysis. Finally, how to best use uncertain predictions in decision support (Figure 1) is the concern of decision theory, an entire field of research of its own, which has probably yet to be fully adapted for integrated catchment studies. At the end of the day, the most sensitive factors in a real-world decision problem may not be those of our likelihood functions or numerical algorithms, but how we frame the decision problems and construct useful objective hierarchies. In a similar fashion, dealing with staged decisions or even “unknown unknowns” requires very different methods.

Happy holidays!

Jörg Rieckermann

joerg.rieckermann@eawag.ch