Gather knowledge, fix a problem

Since I spend a lot of time on the train, there are people I happen to meet thanks to the accident of always sitting next to somebody.

Several weeks ago, a guy started a conversation with me because he saw the label on my computer with the name of my institute… It turns out he had used some data gathered by somebody from Eawag a few years ago. He is a psychologist, and the analysis he performed concerned Zimbabwe. The people living there were introduced to what might seem to be a simple method of cleaning water. All they had to do was get special bottles, keep the water in them for 6 hours in the sunshine, and it would become drinkable. My first thought was that many scientists must have struggled to make this method possible, with probably a lot of analysis of the material and thickness of the bottles. They had to make it affordable but efficient. However, the citizens were very skeptical about the usage and effectiveness of these bottles. Perhaps they didn’t trust the person who was introducing the method to them. Perhaps there was some other reason making this method not so simple, which wasn’t taken into account. What the psychologist’s research showed is that not even 30% of the people adopted this solution…

This inspired me to think about, and be aware of, the bigger picture behind every piece of research. Every problem that needs to be fixed needs knowledge gathered more widely than within the scientific community alone. It needs engineers, scientists and sometimes even psychologists working together.

In terms of my background, I have been an exception in QUICS. I came from a scientific community that studies the atmospheres of stars, interstellar matter, neutron stars… where almost all data come from spectral lines. I thought that, thanks to that, I would have a different perspective on uncertainty problems, but I realized that my different perspective comes from something deeper. It comes from my desire to reach more general conclusions while conducting research, and my wish that the knowledge of the most intelligent and most educated people be implemented in real-world systems, not only in theoretical scientific papers.

During my enrollment in QUICS I made an appraisal of two computational tools that are commonly used in two different communities and are built for exactly the same purpose. Few, if any, researchers from one of those communities have heard of the method from the other community, even though they could have benefited from it.

I still haven’t accomplished my aim of doing research that *fixes the world’s problems*, but this is surely a step towards that goal. Unfortunately, my enrollment in QUICS was shorter than for most other fellows, and after only a year I am summarizing my impressions and leaving QUICS.

I must say it has been an experience that has left a mark on my professional as well as my personal views. I got a strong *introduction* to real science, discovered so many things and met exceptional people. I believe that the future research of the amazing group of people chosen for this project, as well as my own future research, will contribute to solving some of the global issues and to making the uncertainty smaller.

Truly grateful to have been a part of this project,

Sanda Dejanic.

Sanda in India for a QUICS outreach event

Uncertainty & its implications

While I was bidding for a product I really wanted to buy on eBay, I had several calculations going on in my mind. When you bid for a product, there is a certain minimum increment you need to add to your bid in order to outbid the current highest bidder. However, you may also have your very own personal cap on the product price, beyond which you should (need to) let it go. And then there is the time factor: the closer the auction is to ending, the greater the competition (usually). Three parameters, all their combinations, and poor me! Unfortunately, I managed to lose the auction even though I could have paid more than the winning bid, and I couldn’t figure out why. I couldn’t even blame the models. Why? Because I did not use any! Or did I? Of course I used a model; I just cannot give an expression for it. Apart from the mathematical nature of the bidding system, my own behaviour played its part, even assuming I managed to do the calculations correctly.

There could have been so many combinations of the right tweaks in my own mental eBay bidding model. But hey, my PhD is about the role of modelling uncertainty in decision-making, and here I definitely expressed the uncertainty in my model parameters, yet I still lost. Was my model wrong (whatever it was ;))? Or the tweaks I gave it? Well, I will never know until my next bid, but this forced me to think about the other side of the uncertainty-based decision-making process.

During my ongoing PhD, I have had opportunities to interact with people working in the wastewater management industry through the QUICS training network, conferences etc., where I could get insights from these practitioners on the possible implications of my research. My research concerns the role of modelling uncertainty in decision-making around water quality failures, and on the face of it, it does look very exciting: when we talk about uncertainty-based decision-making, we tend to focus on the beneficial nature of uncertainty quantification of models and the fruit it should bear when we apply it.

These comments might have a slightly pessimistic flavour, and seem ironic too, given that my PhD relies heavily on demonstrating the beneficial aspects of the aforementioned process. But just like any tool, the application of uncertainty in decision-making can have some serious downsides if it is not done ‘properly’.

“How do we define what is the ‘proper’ way?” And, more importantly “who defines this ‘proper’ way?”

There are three major players who can influence the impact of sewer systems on rivers, lakes or ponds: the government, the water utility companies and academia. Academia finds better ways (read: models) to explain the physical behaviour of the system. Water utility companies (should) apply the state of the art to make decisions or action plans. Government directly or indirectly facilitates the exchange between academia and the water utilities. In addition, water quality regulations are put in place by government bodies, such as the Environment Agency (EA) in the UK, to ensure that the water utilities within their jurisdiction adhere to their commitments to preserve the sanctity of the water bodies.

In all these activities or transactions, models serve as currency. As with any currency, there have been initiatives to standardise it. For example, the Urban Drainage Group (UDG, formerly WaPUG) of the Chartered Institution of Water and Environmental Management (CIWEM) in the UK issues guidelines and codes of practice for hydraulic and quality modelling of sewer systems. In theory, this ensures that a common modelling standard is adopted across different water utility companies. The extent to which these codes of practice are applied across the industry is a matter for further discussion, but the underlying message is that there is an existing effort to promote uniform modelling practice, on which basis the environmental regulation authorities can build a standardised regulation framework for sewer system emissions.

Since we do not live in a utopian world, these models have inherent uncertainty in their representation of reality. Research projects like the QUICS ITN strive to find better ways to quantify the uncertainty in these models and to promote their practical application. Multiple factors in an uncertainty quantification process affect the outcome of the analysis, such as the choice of uncertain model components, their uncertainty ranges, the types of probability distributions used to represent the uncertainty, and the time horizon.

Various recommendations are available from academia, in the form of research articles, to guide modellers in controlling these factors in an uncertainty analysis and in using the results to make decisions. Different recommendations or preferences for these factors can, however, lead to very different decision outcomes being selected as the ‘optimal’ solution, and given the individual choices or guidelines across the industry, all of these optimal solutions can be justified unless there is a provision to test their performance.
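
To make this concrete, here is a minimal, purely hypothetical Python sketch (every number, including the threshold, is invented): two modellers describe the same uncertain pollutant load with different but equally defensible distributions over the same nominal range, and the estimated probability of exceeding a regulatory threshold differs by roughly a factor of two.

```python
# Hypothetical illustration: the choice of probability distribution alone
# can change an estimated failure probability substantially.
import numpy as np

rng = np.random.default_rng(42)
n = 100_000

load_a = rng.uniform(10.0, 30.0, n)    # modeller A: uniform over the plausible range
load_b = rng.normal(20.0, 5.0, n)      # modeller B: normal with the same mean

threshold = 28.0                       # invented regulatory limit

print(f"P(exceedance), uniform: {np.mean(load_a > threshold):.3f}")   # ~0.10
print(f"P(exceedance), normal:  {np.mean(load_b > threshold):.3f}")   # ~0.05
```

Both modellers can defend their choice, yet a decision based on a 5% versus a 10% failure probability may look quite different.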

In a sense, the use of uncertainty analysis results in decision-making can only be regulated on an ad hoc basis unless there is an effort to standardise it, as with the codes of practice. From my limited experience, I gather that the application of uncertainty analysis is still at an early stage across the wastewater management industry. Perhaps this is the right time, and it would be a huge step forward, if the environmental regulatory authorities, with the help of academia and practitioners, could come up with a set of guidelines on the uncertainty quantification of hydrodynamic and water quality models, which could serve as a standard for communication between these three stakeholders.

Ambuj Sriwastava
ESR, University of Sheffield

 


Marie Curie PhD, advantages and challenges…

“Is this a PhD or a kind of tour in Europe?”

“Do you have time to do research as well? Or do you only travel and teach at schools and kindergartens?”

“Ah, you guys and your luxury PhD!”

….

These are typical comments and questions that we, as Marie Curie (MC) fellows, often hear from friends and colleagues. So I thought it might be relevant to write about the advantages and challenges of this experience, to give an overall idea of the situation for students who are interested in this fellowship and want to know more.


Being a Marie Curie fellow in an ITN network has numerous advantages as well as some challenges. I will try to list some of them briefly, according to my personal experience, in a sincere and honest way.

Advantages:

Among many advantages that MC fellowship has, I can mention:

  1. Reputation and Prestige

A Marie Curie fellowship is one of the most prestigious fellowships in Europe and perhaps one of the best in the world. The majority of academics know about it, and it can be considered a valuable asset in the future, whether you want to stay in academia or start working outside the academic world. (Needless to say, it is highly competitive to get selected.)

  2. International Environment

The project has various partners (universities, institutes, companies, etc.) all over Europe and even beyond. In the case of the QUICS project, there are 9 partners and 7 associate partners, located in 9 countries! It is a truly unique experience as a PhD student to be involved in a serious project in such an environment!

  3. Secondments

Each MC fellow is required to undertake so-called “secondments” at the locations of other project partners. For instance, I have 9 months of secondments, to be undertaken at TU Delft (NL), the University of Sheffield (UK), Université Laval (Canada) and RTC4Water (Luxembourg). Hence, there is a great opportunity to exchange knowledge and learn more about your topic from other project partners. This mobility will definitely nurture your other life skills as well, apart from academic life.

  4. Lovely Training Budget!

A generous budget is allocated to each fellow to spend on their training and research, as well as on transfer of knowledge. We Marie Curie fellows love it! It gives the fellow a great opportunity to attend many courses, summer schools, training events and conferences. As far as I know, this is not comparable with any other PhD grant. It is a unique opportunity to develop your discipline-related skills as well as your soft skills, and to expand your professional network!

  5. Networking

In an ITN project, it is all about networking and collaboration opportunities. You can meet experts in your field during various project meetings, while attending conferences and training events, or when you go on secondments with project partners. You may also have multiple supervisors from different universities and institutes, which is in fact another advantage in this regard.

  6. Public Outreach Events

As an MC fellow, you are required to convey general knowledge about your research to non-academic audiences as well. This normally includes outreach events for public audiences such as school students and pupils, technicians at companies and so on. Although it is really challenging to organize these activities in a tailor-made manner, they are really fun in the end! It is a skill to simplify your message so that it is easily understandable to the public.

  7. Collaboration

I think collaboration is one of the main keys to being more successful in research. Through collaboration you can expand your knowledge, learn from others, and think outside the box. In the QUICS project there are great collaboration opportunities at both the individual and the institutional level. For instance, at the moment I am collaborating with two other QUICS fellows to write a conference paper, and hopefully a journal paper in future.

  8. Soft Skills

PhD topics are normally very detailed, defined to solve specific and tiny problems in this complex world. You may be lucky enough to find another equally specific and similar research topic or job title to continue your career after graduation; however, what makes you a suitable candidate for a wider range of careers is your ‘soft skills’: communication, teamwork and collaboration, adaptability, project and time management, critical thinking and so on. Personally, I do not claim to be great at these skills yet, but I am sure the Marie Curie fellowship is helping me a lot in this regard. Most importantly, we develop our soft skills by ‘learning by doing’. Besides, there are plenty of courses during our training events and also at our universities and institutes.

Challenges:

  1. Distraction!

During the first year of my PhD, I had on average almost one work-related trip each month. This is really distracting when it comes to research. Add to this all the travel planning and the bureaucratic procedures (especially if you are working at LIST :D). On the one hand, these trips are good for your skills development and for breaking the monotony of the working environment; on the other hand, they can easily distract you from your current task, and you can totally forget what you were doing before!

  2. Project Management and Time Management

As an MC fellow, you are connected to multiple locations and entities, each of which brings different responsibilities. In my case, they are:

  • QUICS as the main project, which involves various tasks, e.g. public and academic dissemination, deliverables for work packages, completing the secondments, attending training events, giving various presentations, and even writing on the QUICS blog! 🙂
  • LIST as my host institution, which has its own requirements, including minimum working hours, filling in weekly Clarity forms, attending group and department meetings, submitting travel requests and travel expense claims, dealing with the HR or IT departments, extending the working contract and so forth.
  • TU Delft as my registered university, which also has its own requirements, including a separate registration process, 45 credits to be passed, writing another detailed research proposal for the Go/No-Go evaluation, yearly evaluations afterwards, a minimum number of publications to be eligible for graduation and so on.

To be honest, I sometimes realize I am spending a considerable part of my time, or even a whole day, only on bureaucratic tasks. Dealing with all of the above-mentioned responsibilities requires proper project and time management skills, which the MC fellow needs to develop over time.

  3. Managing Secondments

First of all, you need to define your objectives and the “optimum time” to go on a secondment. Then you need to plan and organize it:

  • Find new accommodation, which is normally very difficult for short stays.
  • Apply for a visa (if you need one) and plan your trips.
  • Adapt to the new work environment.
  • Keep up, in parallel, with your responsibilities to your host institute.
  • Write a secondment report after finishing.

  4. Multiple Supervisors

Having two, three or even more supervisors is another challenge. It is clear that having more than one supervisor is beneficial in terms of sharing knowledge, experience and new ideas. But sometimes it can be a challenge too: receiving feedback from all of them can take a considerable amount of time; ideas can sometimes be contradictory; and you need to keep in touch with everyone to avoid miscommunication.

  5. Uncertainty in Visa Applications!

I really “dislike” this part, and almost everyone in the QUICS project knows why…

Imagine having to wait about 6-7 months to get a visa to start your PhD in Luxembourg. At least this way, you come to understand the meaning of “uncertainty” very well 😀

I do not want to go into political discussions here, but just a hint to those whose nationalities are treated more strictly for entry visas: “Apply very well in advance”.

Based on my experience of living in several countries and spending “n” hours in embassies, there is no rule about granting visas. The uncertainty bound is too wide. Here are some more examples for your information:

  • I missed the UDM 2015 conference in Canada because I did not receive the visa on time, although I had applied more than one month in advance. The visa was approved one week after the conference! 😦
  • Sometimes it can be surprising too: I received my UK visa 6 days after applying for it. 🙂
  • If there is one single country I can enter without a visa, it is Turkey. So I was planning an academic outreach event at Middle East Technical University (METU) in Ankara, and everything was going as planned, until, some days before the trip, I was informed that there had been a terrorist attack in Ankara and I had to cancel everything. And guess what: the bomb went off exactly in front of the hotel I had reserved! 😮 This is one of those moments where you are not sure whether you were unlucky or lucky!

These were totally personal experiences, but I hope I have conveyed the main message.

Summary:

All in all, a Marie Curie PhD is a unique one. Although there are some challenges along the way, it will definitely help you to develop your skills as a researcher as well as a project manager. Go for it if you get the chance! 😉


The uncertainty and risks related to parental leave and a research career

After several very interesting and statistical QUICS blogs on uncertainty, Bayesian statistics etc., this is a blog about a different kind of risk and uncertainty.

Although I’m busy trying to get used to coordinating the QUICS project, statistics and uncertainty analysis again, my mind is currently still very often on something else (or someone else, to be more precise, see Fig. 1), as I have recently returned to work from maternity leave.

Fig. 1. QUICS outreach to the very young.

And being totally honest, getting back to research after 6 months of full-time baby care is not as easy as I had hoped. Still, I very much feel I am one of the lucky ones, for several reasons. Firstly, having passed my probation as a lecturer and then being promoted to senior lecturer a couple of years ago means I now have a permanent contract, and the maternity leave benefits are good. Of course, I worked hard to become a lecturer, write journal papers, apply for funding and all the rest of it, but to all the contract researchers out there still on short-term postdoc contracts: please be aware it is also very much a case of being in the right place at the right time when vacancies open up, of being able to take up opportunities in very different locations and countries, and of a bit of luck in getting that one point lower or higher score for your research proposals. Secondly, my husband, also a senior lecturer at Sheffield University, is very involved and has taken up the recently introduced shared parental leave option, meaning we each have 6 months of ‘shared parental leave’, and he is currently at home looking after our son. And last but not least, the QUICS project is full of helpful people, e.g. people willing to meet online when travel is difficult, to temporarily take over someone’s tasks, and to take the researchers’ code of practice seriously.

But the decision of whether or not to have children whilst trying to build a research career is not such an easy one. We personally waited until we had finished our PhDs, completed several postdoc positions, written papers, obtained lecturer positions, applied for funding and got some projects funded and started up… etc. etc. etc. Somehow it never seems to be quite the right time, but one thing is not uncertain at all: for women, the biological clock is ticking. So we eventually decided to go for it, and although I’m getting closer to 40 than to 30, thankfully our son arrived healthy and happy.

Unfortunately, it’s often more uncertain and risky for academic researchers, especially those on fixed research contracts. Sadly, for postdoc and PhD researchers it can cause serious difficulties to take maternity leave or shared parental leave, and the news makes dismal reading: “Academia for women: short maternity leave, few part-time roles and lower pay”; “Should academics lose out financially for taking maternity leave?”.

Both in the UK and in other countries, the maternity benefits you can get really depend on how your project is funded, as well as on how your department deals with short contracts and with what happens if, for example, a baby is due in the last few months of a 1-year contract. It also really depends on your supervisors, close colleagues and Head of Department. Apart from the research time that can be lost to maternity leave when you are unlucky with your source of funding, there is a whole list of other things that can take your time away from research… Think, for example, of time lost due to morning sickness, miscarriage (more common than many people realise), backache, iron deficiency, breastfeeding once back at work, and various other embarrassing tricks your pregnant body plays on you which no one told you about earlier… (In case you wonder: yes, I experienced all of the above.) Many women I know also experienced these and various other pregnancy-related inconveniences. (I have yet to meet one woman who came out completely without issues! Talking to most women I know who recently had babies, it is not just the growing belly visible to all; there is also an equally large growing ‘mental’ baby that slowly, over 9 months, manages to take over most of your brain, only to be gradually released again once the physical baby arrives.)

Understandably, many women (and also their partners) do not necessarily wish their colleagues to know about these kinds of issues, and thus struggle in silence. And hence this remains a self-perpetuating circle… It then seems to you like every other woman goes through her pregnancy without any trouble, making you feel bad for feeling bad, and feeling guilty because you think you should be happy, as for some people the silent heartache is that they are not able to get pregnant at all, and so on…

Again, I feel lucky in this respect; I have been working with the same colleagues for some time and felt I could talk to them about this, and they provided both moral support and support with my research. Also, once I was pregnant, my husband immediately said he wanted his ‘6 months off work’. I thought quite a few men in the UK would jump at this opportunity to share parental leave and spend some time at home bonding with their new baby. To our surprise, we were the first people in the faculty to go for this new option, and although HR sorted us out fine in the end, we were the first to try out the new application forms and to get confused by them. However, just as we had sorted out our applications for shared parental leave, news headlines appeared saying that since this option was introduced in the UK, only 1 in 100 men had taken it up!
Why are only 1 in 100 men taking up shared parental leave?
Why have so few men taken up shared parental leave? Perhaps it’s because mothers won’t hand over the baby.
It is not clear why so few men have taken up shared parental leave to date; reasons mentioned include income inequality between men and women (why is this still so often the case in 2017!?), as well as women not wanting to give up their share of the leave. Interestingly, a lot of people are now indeed asking me whether my husband is coping with the baby, to which I can happily say that both men are doing just fine (Fig. 2). But yes, I honestly admit, I do miss my son, and find it very hard to leave him in the mornings!

Fig. 2. Men are also perfectly capable of looking after babies.

So what can we all do to reduce the risk and uncertainty related to maternity / paternity / adoption leave and a research career? Firstly, keep checking with research funding providers what the provisions for maternity / paternity / adoption leave are, and be upfront about them to prospective researchers. If the provision is less than ideal, keep raising it with HR, Heads of Department, the funding body etc. Secondly, talk a bit more openly about these issues. And finally, make it clear to researchers that having babies is OK and not an ‘inconvenience’, as we all need a new generation to carry on our research in 20-30 years’ time! Hopefully this blog may help some researchers currently thinking about the risks and uncertainty related to becoming parents.

Alma Schellart


Overcoming writer’s block: “Read more” – my father said. On how to adjust expectations in light of new knowledge.

I was two weeks overdue with this blog post and struggling to find inspiration for it when I routinely called my dad to chat about this and that. Not very proud of it, I mentioned my struggle, and in return I was told:

“You should read more”, and he asked me to call him back in a bit. When I called back, he told me the following: “When problems usually look simple to you, you might have a problem understanding why others have a problem. But we might have a chance to understand it if we thoroughly compare our ways of thinking.” He said he still remembers a student asking him how something could have a 1/7 chance of occurring when the case involved 5 people. My dad told me that at first he did not understand the student’s question or the problem, until he heard that the student intuitively allowed only answers of 1/5, 2/5, and so on. But not 1/7…

He then sent me Leonard Mlodinow’s book “The Drunkard’s Walk: How Randomness Rules Our Lives” and suggested that, since QUICS is largely about calculating probabilities, I might find inspiration for my blog in it.

The above example with the student may appear trivial, but it is not. In his book, Leonard Mlodinow gives another example of how our intuition can complicate our lives. Of course, you have already been forewarned by the student’s example, but before you read further, imagine a situation where someone you know gets their health check results and (according to these results) has the virus of an incurable disease that causes death within the next 10 years. You also have the information that the accuracy of the detection method is 999/1000. Now close your eyes and imagine the above again. What is that person’s chance of surviving the next 10 years? Will you start supporting that person as if they were going to die? Or perhaps there is no need?

Now imagine a community of 10,000 people in a place where this disease does not occur. If everyone undergoes the health check, the 1/1000 chance that the detection method gives a wrong result means that about 10 people will receive a positive result. But the chance that any of these people is sick is exactly ZERO!

That’s why it is so important to understand the additional information about the accuracy of the detection method: finding the virus is a correct result in 999 out of 1000 cases, but only if the person IS sick. If you run the check on a healthy person, a detection of the virus is an INCORRECT result in EVERY case. So if, as in the case above, a person you know gets a result saying they have the virus, always ask first what risk group they belong to.

Now quoting Leonard Mlodinow: “To not account for this is a common mistake in the medical profession. For instance, in studies in Germany and the United States, researchers asked physicians to estimate the probability that an asymptomatic woman between the ages of 40 and 50 who has a positive mammogram actually has breast cancer if 7 percent of mammograms show cancer when there is none. In addition, the doctors were told that the actual incidence was about 0.8 percent and that the false-negative rate about 10 percent. Putting that all together, one can use Bayes’s methods to determine that a positive mammogram is due to cancer in only about 9 percent of the cases. In the German group, however, one-third of the physicians concluded that the probability was about 90 percent, and the median estimate was 70 percent. In the American group, 95 out of 100 physicians estimated the probability to be around 75 percent.”
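
For anyone who wants to verify the quoted figures, here is a short Python snippet applying Bayes’ rule to exactly the numbers above (only the variable names are mine):

```python
# Bayes' rule with the numbers quoted from Mlodinow.
prevalence = 0.008       # actual incidence: 0.8%
sensitivity = 0.90       # false-negative rate of 10% -> 90% of true cases detected
false_positive = 0.07    # 7% of mammograms show cancer when there is none

p_positive = sensitivity * prevalence + false_positive * (1 - prevalence)
p_cancer = sensitivity * prevalence / p_positive
print(f"P(cancer | positive mammogram) = {p_cancer:.1%}")   # about 9%
```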

Many of us apply Bayes’ law in our research. But how much do we think of it in our day-to-day life?

Kasia

PS. Apparently, my dad asked a number of people mathematical questions similar to the ones described above, all of them with a higher education degree, some in maths, but only one (!) replied that the answer is not obvious and that more information is needed to calculate the conditional (Bayes’) probability. That person was my sister!

I don’t remember being asked that question, but she often happens to be ahead of me on various tracks:


PPS. Now I hope that the above post won’t be, as Leonard Mlodinow describes it, the kind of thing “that sometimes lands me on the do-not-invite list for my neighbors’ parties”.


There is nothing as practical as a good assessment of uncertainty

Today I would like to give you my two cents on uncertainty quantification of environmental models, meaning the assessment of various model uncertainties. This is more or less an attempt to summarize what I’ve learned in the past 10 years or so.

I would like to walk you through a didactical example at a really, really high level and then draw some conclusions regarding the role of uncertainties in environmental decision support. To make things even simpler for me, I will refer to two of my favorite papers as often as I can (no, not my own papers, don’t worry):

Reckhow, K.H. (1994) Importance of scientific uncertainty in decision making. Environmental Management, 18(2), 161-166. [PDF]

Reichert, P. (2012) Conceptual and practical aspects of quantifying uncertainty in environmental modelling and decision support. In: Proceedings of the Sixth Biennial Meeting of the International Environmental Modelling and Software Society (iEMSs), 2012 International Congress on Environmental Modelling and Software, Leipzig, Germany. [PDF]

Off we go, fasten your seatbelts.

How to best operate a complex system?

As an example adapted from Reckhow (1994), imagine your task is to optimally operate a wastewater treatment plant (WWTP), which produces “pollution” (Figure 2). Figure 1 shows how the costs are related to the “pollution” level: lower pollution levels require more costly treatment, but if you exceed the permissible pollution level ‘2’ you have to pay a (large) fine.

Figure 1. Cost as a function of pollution level for scenarios A and B.

Let’s assume two scenarios: in A (red line), treatment is comparably cheap, but the WWTP is shut down if the pollution exceeds ‘2’, which is represented as an infinite cost (vertical segment of the red line). In B, treatment (below ‘2’) is comparably more expensive, but still cheaper than getting fined (above ‘2’). The question is: how would you operate the WWTP in each scenario if your only goal is to minimize the costs?

If you knew the pollution level without uncertainty (blue arrow), you would obviously operate the WWTP exactly at pollution level ‘2’.

Uncertainty comes at a price

Unfortunately, in the real world, the observed pollution from your WWTP is not exact (Figure 2), but contains uncertainty due to incomplete knowledge, for example from measurement or sampling errors, as well as due to natural variability, for example from different weather conditions.

Therefore, your decision regarding operational cost depends on two different things: 1) how you value the pollution, the so-called decision “attribute”, as shown in Figure 1, and 2) the uncertainty of your predicted performance regarding this attribute. These two are fundamentally independent. As a result, for the so-called decision “alternative” A, you operate the WWTP more conservatively and invest more in reducing pollution, because you want to avoid the brutal shutdown when you exceed pollution level ‘2’. For alternative B, you would accept exactly that probability of fines which minimizes the overall cost.

For me, Ken Reckhow’s example nicely illustrates four major points:

  1. Having an accurate estimate of your prediction uncertainty is relevant for practice, not only scientifically interesting, because the spread of the bell curve in Figure 2, i.e. the scientific uncertainty, ultimately determines the cost of operation.
  2. Without an estimate of prediction uncertainty, we are not able to operate our systems optimally, simply because it is difficult to justify whether we are investing too much or too little.
  3. Reducing the scientific uncertainty is valuable. This can be seen in Figure 2, where narrowing the curves avoids excessive cost due to both over-treatment (left branch) and fines (right branch).
  4. It is important to separate the prediction of pollution from the quantification of your preferences for different pollution levels. As a scientist and environmental engineer, my task is to come up with reliable predictions of pollution levels for the different alternatives A and B. The valuation concerns constructing the cost function and, with multiple decision attributes, quantifying the relative preferences for each attribute. Although practical engineers do this implicitly for their clients all the time, this is not our traditional core discipline and could probably be performed better by stakeholders and environmental economists.
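
To see how this plays out numerically, here is a minimal Python sketch of scenario B, with invented cost numbers (a linearly decreasing treatment cost plus a fixed fine above the limit): the operating point that minimises the expected cost sits clearly below the limit.

```python
# Minimal sketch of Reckhow's example, scenario B. All numbers are invented.
import numpy as np

rng = np.random.default_rng(1)
sigma = 0.3      # predictive uncertainty (spread of the bell curve in Figure 2)
fine = 10.0      # fixed fine when pollution exceeds the permissible level
limit = 2.0

# Common random numbers so that all setpoints are compared fairly.
eps = rng.normal(0.0, sigma, 200_000)

def expected_cost(setpoint):
    pollution = setpoint + eps            # realised pollution scatters around the setpoint
    treatment = 5.0 - pollution           # treating to lower pollution costs more
    penalty = np.where(pollution > limit, fine, 0.0)
    return np.mean(treatment + penalty)

setpoints = np.linspace(1.0, 2.2, 61)
costs = [expected_cost(s) for s in setpoints]
best = setpoints[int(np.argmin(costs))]
print(f"Setpoint minimising expected cost: {best:.2f} (limit is {limit})")
```

With sigma close to zero, the same calculation pushes the optimum to the limit itself, which is exactly points 1-3 above: reducing the scientific uncertainty lets you operate closer to the limit, at lower cost.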

So far, so good. But, as we are happy to delegate the valuation to others, for us engineers the grand challenge still remains:

How can we construct reliable prediction intervals for environmental models?

To answer this, I’d like to summarize the main suggestions of Reichert (2012), beefed up with my personal ideas, in the following statements:

  1. Scientific knowledge is currently best formulated as inter-subjective probabilities, using the mathematical framework of probability theory. Counter-arguments mainly concern the fact that humans are uncertain about their own beliefs and that different people may quantify their beliefs differently. Unfortunately, alternative theories such as fuzzy logic, possibility theory, interval analysis, etc. do not solve this problem either, and they lack a similarly good mathematical foundation as a representation of uncertainty. While not perfect, we think that probability theory is currently the best game in town.
  2. More “informal” approaches to expressing uncertainty should be avoided. These informal approaches, some of which have been developed by hydrologists, are either based on arbitrary modifications of the likelihood function or on the construction of uncertainty intervals from the coverage frequency of observations, such as runoff. Several formal arguments have been made for why they should be avoided. The one that speaks loudest to me is that these approaches have not been adopted by the disciplines where learning from data is key: statistics, physics, mathematics, information theory, medical sciences or machine learning.
  3. We need (computer) models to predict the environmental outcome of different alternatives. Environmental systems are so complex that we usually cannot guess “what-will-happen-if”. Models often give much better predictions than guessing, as they are built on mathematical descriptions of the different physical phenomena. Nevertheless, by definition, any model is still a very simplified description of reality. The challenge is to find a compromise between a too simple model, which delivers uncertain predictions because it does not include all important mechanisms, and a too complex one, which has many parameters that are not known exactly and thus cause large prediction uncertainties.
  4. We should consider error-generating processes realistically and where they occur. With purely deterministic models, without random influences, we cannot construct prediction intervals when single best estimates are used as model parameters. Commonly, randomness has only been considered as additive i.i.d. (independent and identically distributed) Gaussian errors. This amounts to assuming that the error-generating processes only concern observation noise, for example in flow data. Unfortunately, our experience shows that this is not realistic.
    One step further is to capture the analyst’s incomplete knowledge of credible model parameter values as probability distributions, which leads to Bayesian inference. A logical next step is to also formulate model structure deficits explicitly as additive stochastic processes, representing the “model bias”. While it does not provide much information on the origin of uncertainties, this at least is a practically feasible “crutch” to deal with structurally deficient models and other “known unknowns”. More desirable is probably a so-called “total uncertainty analysis”, which takes into account uncertainty in i) the model inputs, ii) model structure deficits, iii) incomplete knowledge of the model parameters and iv) output observation errors.
    In this regard, as written above, we should start to consider randomness where it actually occurs. Often, this will lead us to stochastic models, possibly with time-dependent parameters and, in rainfall-runoff modelling, with stochastic components for the “true” catchment-average rainfall. For me as an engineer, the concept of time-dependent parameters is compelling, because it might very well be that many of “our” parameters depend on external drivers. Examples could be the imperviousness, the stock of particulate matter, or the groundwater infiltration into sewers, which depend on the time of year, soil moisture, air temperature or other external drivers. Introducing this flexibility would make it possible to actually learn about those dependencies from data, which ultimately leads to a better understanding of our systems. In a similar fashion, it is rather crude to assume that rainfall data, often observed by point rain gauges, are error-free and representative of a catchment of several hectares or more. Multipliers, or even stochastic error processes, might lead to more reliable predictions and avoid over-confidence. In summary, if the variability in your model predictions is caused by rain gauge errors, or by seasonal variability of your groundwater levels, you had better have a good description of it. If not, you incorrectly map the variability onto other parameters (from both your mechanistic model and your error model), consequently infer the wrong things from your data, and have to accept large prediction uncertainties. And, ultimately, you build too expensive systems (point 1).
  5. Always consider model structure deficits. This is because we fundamentally need to simplify our models, as mentioned above. We can detect such a mismatch by residual analysis, which often shows autocorrelation that cannot be explained by observation errors alone. To generate reliable predictions, it is important to consider this bias explicitly, for example using the statistical bias description technique (see Section 3.2 in Reichert, 2012).
  6. Employing a statistical bias description practically requires Bayesian inference. Modelling the system response with a mechanistic model plus an additive stochastic process for the bias creates an identifiability problem between these two components, because the data can be explained by either one of them. This identifiability problem can be resolved by adopting Bayesian inference, where we constrain credible parameter values by assigning a “prior” parameter distribution based on our knowledge or belief.
  7. Separate the description of the error-generating processes from the numerical algorithm used to estimate parameter values. Effective learning from data, i.e. calibrating our models or inferring parameters, requires both a good description of the mechanisms and error-generating processes and efficient numerical techniques. The first concerns the mathematical formulation of the so-called likelihood function, which describes how likely it is to observe the measured data given the model hypothesis and a set of parameters (see Reichert, 2012); the second is the algorithmic method for finding likely parameter values under that likelihood function. Mixing the two leads to confusion: reporting only “Bayesian data analysis”, “Ensemble Kalman Filter” or “Markov chain Monte Carlo” as the calibration method does not permit evaluation of whether the assumptions about the error-generating processes are reasonable.
  8. When your model is slow, adaptive samplers and emulators can help. In Bayesian parameter estimation, the goal is to estimate the posterior given the prior (which encodes our prior knowledge) and the assumed likelihood function (which “contains” the model and the data). Numerically, this is often done with Markov chain Monte Carlo (MCMC) techniques, because they give complete freedom in specifying prior distributions. MCMC methods can be challenging when the posterior has many dimensions and is highly correlated, and when the model is slow. Scale-invariant or adaptive samplers can increase the efficiency, and emulators of either the mechanistic model or the posterior can speed up the analysis. (A bare-bones sampler is sketched after this list.)
  9. When your likelihood function is analytically intractable, Approximate Bayesian Computation can help. With complicated error-generating processes, i.e. stochastic models, it might not be possible to sample directly from the posterior, because it takes too much time to evaluate the likelihood function (e.g. it may include high-dimensional integrals). Approximate Bayesian Computation, so-called “ABC”, can then be a strategy to get an approximate sample from the posterior (see the second sketch after this list). Contrary to several reports, ABC methods are not “likelihood-free”, because they still need an explicit formulation of the error-generating processes.
  10. We need more research on good likelihood functions. Last, but not least, it has been reported that (rainfall-runoff) models calibrated manually with due care give more reliable predictions than models calibrated by Bayesian inference with a statistically sound likelihood function that accounts for model bias… This is not a methodological problem, but rather reflects the fact that the engineer has preferences which she cannot yet formulate mathematically. A simple example: in manual calibration she instinctively values peak flows more highly than base flows, whereas in the parameter estimation she did not explicitly assign different uncertainties to the different flow regimes.
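
To make points 4-8 a little less abstract, below is a self-contained toy sketch in Python (my own invention for this blog, not code from Reichert or QUICS): synthetic data are generated from a one-parameter model with an explicit AR(1) error process, the matching likelihood is written down, and a bare-bones Metropolis sampler explores the posterior. Real applications need informative priors, convergence diagnostics and much more care.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Synthetic "truth": a linear rainfall-runoff-like response -------------
t = 100
true_k = 0.8
rain = rng.gamma(2.0, 1.0, t)
err = np.zeros(t)
for i in range(1, t):                       # AR(1) "bias" error process
    err[i] = 0.7 * err[i - 1] + rng.normal(0.0, 0.3)
obs = true_k * rain + err

def model(k):
    return k * rain

# --- Log-likelihood with an explicit AR(1) error model ---------------------
def log_lik(k, sigma, phi):
    if sigma <= 0.0 or not -1.0 < phi < 1.0:
        return -np.inf
    res = obs - model(k)
    innov = res[1:] - phi * res[:-1]        # whitened innovations
    n = innov.size
    return -0.5 * n * np.log(2 * np.pi * sigma**2) - 0.5 * np.sum(innov**2) / sigma**2

def log_post(theta):
    k, sigma, phi = theta
    if not 0.0 < k < 5.0:                   # flat prior over a plausible box
        return -np.inf
    return log_lik(k, sigma, phi)

# --- Bare-bones Metropolis sampler ------------------------------------------
theta = np.array([1.0, 1.0, 0.0])           # initial guess
lp = log_post(theta)
chain = []
for _ in range(20_000):
    prop = theta + rng.normal(0.0, [0.02, 0.02, 0.05])
    lp_prop = log_post(prop)
    if np.log(rng.uniform()) < lp_prop - lp:  # Metropolis accept/reject
        theta, lp = prop, lp_prop
    chain.append(theta.copy())

posterior = np.array(chain[5_000:])          # discard burn-in
print("Posterior means (k, sigma, phi):", posterior.mean(axis=0).round(2))
```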
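And a similarly minimal rejection-ABC sketch for point 9, again entirely synthetic: parameters are drawn from the prior, data are simulated from the stochastic forward model, and draws are kept whenever their summary statistics land close to the observed ones. Note that the error-generating process is still written down explicitly, in the simulation step.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic observations from a stochastic forward model with unknown k.
true_k = 0.8
rain = rng.gamma(2.0, 1.0, 100)
obs = true_k * rain + rng.normal(0.0, 0.3, rain.size)
obs_summary = np.array([obs.mean(), obs.std()])

accepted = []
for _ in range(50_000):
    k = rng.uniform(0.0, 5.0)                         # draw from the prior
    sim = k * rain + rng.normal(0.0, 0.3, rain.size)  # simulate; no likelihood evaluated
    sim_summary = np.array([sim.mean(), sim.std()])
    if np.linalg.norm(sim_summary - obs_summary) < 0.2:  # acceptance tolerance
        accepted.append(k)

print(f"ABC posterior mean for k: {np.mean(accepted):.2f} ({len(accepted)} draws accepted)")
```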

[Deep silence.]

So.


If this was too fast or too rough, I encourage you to take a plunge into the two short papers. You can also shoot me an email if you have specific questions. If you want hands-on training, the SIAM Summer School in Environmental Systems Analysis is the place to be.

Don’t hide (your) errors

In summary, I think we as scientists can support (cost-)effective decisions by always reporting an uncertainty estimate with our model predictions. And we should strive to be as transparent as possible about the error-generating processes. We should also find mathematicians who listen to our needs and help us develop appropriate likelihood functions and numerical techniques. Unfortunately, there is no easy way out, and we should be very skeptical of informal approaches that have not been adopted by the disciplines which traditionally consider uncertainty in their data analysis. Finally, how best to use uncertain predictions in decision support (Figure 1) is the concern of decision theory, an entire field of research in its own right, which has probably yet to be fully adapted for integrated catchment studies. At the end of the day, the sensitive influence factors in a real-world decision problem may not be those of our likelihood functions or numerical algorithms, but how we frame the decision problems and construct useful objective hierarchies. In a similar fashion, dealing with staged decisions or even “unknown unknowns” requires very different methods.

Happy holidays!

Jörg Rieckermann
joerg.rieckermann@eawag.ch


How extreme was Storm Angus?

On 21st November 2016, heavy rainfall battered the UK. This was Storm Angus, which brought severe flooding to different parts of the UK, in particular the South West of England (SWE). According to the newspapers, nearly 70 flood warnings were in place, a river burst its banks in Devon, heavy rain flooded the railway between Exeter and Bristol, several locations were flooded in and around Bristol (Backwell, Whitchurch, Fishponds), cars were reported under water in South Bristol, and flooding was reported in the Chew Valley. So, how extreme was this rainfall event, and why did it cause so much damage in the SWE?
Figure 1a shows the average November rainfall for the last 25 years [1]. Figure 1b shows the daily rainfall amounts recorded by the Met Office raingauge network on 21st Nov 2016 [2]; the data have been interpolated to show the spatial distribution of precipitation across the UK. The actual rainfall depths recorded by the raingauges are also shown in the same figure. It is interesting to see that the rainfall depths recorded around Bristol (42.4 mm) and at other places in the SWE on 21st Nov 2016 were equivalent to more than 50% of the expected average rainfall for November. A rainfall sensor installed at the University of Bristol recorded 47.8 mm of rain during the same period. This sensor is a disdrometer, which measures the raindrop size distribution, from which rainfall intensities can be derived at 1-minute temporal resolution. The largest rainfall depth was recorded in Devon, with 63.2 mm of rain on the same day.

Figure 1c shows the daily rainfall depths recorded by the Met Office weather radar network at 1 km resolution on the same day [3]. There are similar patterns in the distribution of precipitation between the radar rainfall and the raingauge measurements, but the radar clearly shows regions with very heavy localised rainfall (not observed by the raingauge network) that caused flash flooding in different places across the SWE.

Figure 1. Average November rainfall (a); Raingauge rainfall (b) and Radar rainfall (c)

Figure 2 shows the depth-duration-frequency (DDF) rainfall curves (solid coloured lines) for storms with different return periods (T) for south Bristol, using the DDF model proposed in the Flood Estimation Handbook [4]. The figure also shows estimates of the return period of the storm for south Bristol using raingauge, disdrometer and radar rainfall for different storm durations. Note that the storm started in the evening of 20th Nov and finished in the afternoon of 22nd Nov, which is why some durations are longer than 24 h. There are two important points to make about Figure 2. The first is that the estimated return period of the storm is sensitive to the rainfall measurement used in the analysis, with maximum return periods of 3, 5 and 5 years when using raingauge, disdrometer and radar observations respectively. It is fair to say that the measurements were not all taken at exactly the same location and that the rainfall sensors have different error characteristics, but the results highlight the uncertainty in the estimation of the storm return period when there is such high spatial variability of precipitation.

The second point is that the estimated return period changes depending on the selected duration of the storm. For instance, with the disdrometer measurements, a duration of about 10 h gives a storm with a return period of less than 2 years, but a duration of about 20 h gives a 5-year return period storm. This highlights the uncertainty in the estimation of the storm return period when using a given storm duration. Urban sewer systems are often designed to cope with storms with return periods of 1-2 years, and in many cases return periods of 5 years are adopted in areas vulnerable to flooding; more recent guidelines suggest designing systems for storms with return periods of up to 30 years in order to prevent surface flooding [5]. The results from the radar and disdrometer measurements indicate that the storm had a return period of 5 years for south Bristol. However, given the large spatial variability of the storm and the severe localised flooding that occurred in some locations in Bristol, it is very likely that the return period of this storm was actually higher. It is also fair to say that the initial catchment conditions might have increased the risk of surface flooding, with factors such as catchment wetness and drainage blockages (e.g. due to autumn dead leaves) having important effects. (For readers unfamiliar with return periods, a toy calculation is sketched after Figure 2.)

Figure 2. Rainfall Depth-Duration-Frequency curves
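
For readers who have not met return periods before, the sketch below shows the basic mechanics with a Gumbel distribution fitted by the method of moments to an invented annual-maximum series. This is not the FEH DDF model used for Figure 2, and with invented data the resulting number will of course not match the estimates above.

```python
# Toy return-period estimate from an (invented) annual-maximum rainfall series.
import numpy as np

amax = np.array([28.1, 35.4, 31.2, 42.0, 25.6, 38.9, 30.5,
                 44.7, 27.3, 33.8, 40.1, 29.9, 36.6, 32.4])  # mm/day, invented

# Method-of-moments Gumbel parameters.
beta = np.sqrt(6.0) * amax.std(ddof=1) / np.pi    # scale
mu = amax.mean() - 0.5772 * beta                  # location (Euler-Mascheroni constant)

def return_period(depth_mm):
    """T = 1 / (1 - F(x)) for the fitted Gumbel CDF."""
    cdf = np.exp(-np.exp(-(depth_mm - mu) / beta))
    return 1.0 / (1.0 - cdf)

print(f"Return period of a 42.4 mm/day depth: {return_period(42.4):.1f} years")
```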

We live in a world with environmental uncertainty, and the EU QUICS project will tackle some of the important issues related to the quantification of uncertainty in integrated catchment studies.


References
[1] Keller et al. (2015) CEH-GEAR: 1 km resolution daily and monthly areal rainfall estimates for the UK for hydrological and other applications. Earth Syst. Sci. Data, 7, 143-155. http://dx.doi.org/10.5194/essd-7-143-2015
[2] Met Office (2006) MIDAS UK Hourly Rainfall Data. NCAS British Atmospheric Data Centre, 2016. http://catalogue.ceda.ac.uk/uuid/bbd6916225e7475514e17fdbf11141c1
[3] Met Office (2003) 1 km Resolution UK Composite Rainfall Data from the Met Office Nimrod System. NCAS British Atmospheric Data Centre, 2016. http://catalogue.ceda.ac.uk/uuid/27dd6ffba67f667a18c62de5c3456350
[4] Faulkner, D. (1999) Flood Estimation Handbook, Vol. 2. Institute of Hydrology. ISBN 0948540907.
[5] Butler, D. and Davies, J.W. (2011) Urban Drainage. Spon Press.

Miguel Angel Rico-Ramirez, University of Bristol.
