Repeating Stability Tests: Improving Stability Evaluation Confidence

Posted on 1/29/2014

In last week’s post we explored the idea of planning and preparing for unstable days, but in this week’s post I would like to take a step back. Unstable days with whumpfs and shooting cracks are likely the easiest days on which to manage safety concerns. There is no doubt that the snowpack is unstable, and acting on that is easy.

But what happens when it is not so clear? The question is deceptively broad and deep, so today I would like to address only part of the answer. I would like to discuss, at a high level, the results of the Compression Test (CT), Extended Column Test (ECT), and the Propagation Saw Test (PST).

Broadly speaking, when you perform these tests one of two things can happen: you get a negative result indicating that you found no evidence of instability, or you get a positive result indicating evidence of instability. For example, a CT test in which the column does not fail at all (CTN) would be a negative result. An ECT in which the failure propagated across the block would be a positive result (ECTP).

What I would like to look at today is the rate of false positives and false negatives. That is to say, how likely is it that I perform a CT test and do not find evidence of instability when in fact it is unstable? Or, how likely is it that I perform a PST and find evidence of instability, when in fact there is no instability?

To do this, let’s take a look at the four simple outcomes a test can have.

-The slope is unstable, and the test reports it is unstable (true unstable)

-The slope is unstable, and the test reports no evidence of instability (false stable)

-The slope is stable, and the test reports it is unstable (false unstable)

-The slope is stable, and the test reports no evidence of instability (true stable)
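
In the false positive / false negative language used above, these four outcomes form a simple two-by-two table. Below is a minimal Python sketch of that table; the function name `classify_outcome` and its arguments are my own illustrative choices, not part of any standard avalanche tool.

```python
def classify_outcome(slope_unstable: bool, test_positive: bool) -> str:
    """Name the outcome of one stability test.

    slope_unstable: the true (usually unknown) state of the slope.
    test_positive:  True if the test showed evidence of instability.
    """
    if slope_unstable and test_positive:
        return "true positive"    # unstable slope, test caught it
    if slope_unstable and not test_positive:
        return "false negative"   # unstable slope, test missed it
    if not slope_unstable and test_positive:
        return "false positive"   # stable slope, test raised an alarm
    return "true negative"        # stable slope, test found nothing


# Print the full two-by-two table.
for unstable in (True, False):
    for positive in (True, False):
        print(unstable, positive, classify_outcome(unstable, positive))
```

The dangerous case is the second one: an unstable slope paired with a test that shows nothing.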

Additionally, I want to answer the question: how useful is it to repeat a test? Specifically, how many times should I repeat my test so that, if the slope is unstable, I can be very confident that I will recognize it? I focus on being confident in recognizing instability because being wrong on that point carries big consequences. If we mistakenly think a truly stable slope is unstable, at worst we just come back another day. Thus, all of the attention will be focused on false stable results: an unstable slope whose tests show no evidence of instability.

First, what does very confident mean? Being out in the backcountry carries inherent risks, so at some level we have to accept a certain level of risk. Are we willing to take the chance that 1 out of 10 times we are wrong? Clearly that’s too high a risk to take. So what should we use as a basis of comparison?

The easiest basis of comparison is to look at risks that we are already willing to accept. For example, every day most of us decide that the risk of dying in a car accident is worth the benefit we gain. It turns out that the probability of being involved in a fatal car accident is about 1 in 20,000. So let’s find how many times we must repeat our tests so that the risk of a wrong result is no higher than the risk of dying in a car accident, a risk we already accept.

For the moment, let’s look at the case where the slope is unstable but the test reports no evidence of instability. Using the results of a research paper from ISSW 2006 by Karl Birkeland and Doug Chabot, the probability of this occurring for the CT test is about 10% (

So, what we are really interested in is the case where, under unstable conditions, we perform a series of CT tests and none of them finds any evidence of instability. Assuming the tests are independent of one another, the probability of this is 10% multiplied by itself once for each test. The result is best shown with a plot:
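
The numbers behind that curve are easy to reproduce. Here is a small Python sketch, assuming each test is independent of the others and taking the rounded 10% miss rate at face value; `prob_all_tests_miss` and `CT_MISS_RATE` are names I made up for illustration.

```python
def prob_all_tests_miss(miss_rate: float, n_tests: int) -> float:
    """Probability that every one of n_tests independent tests
    misses an instability that is really there."""
    return miss_rate ** n_tests


# Assumption: the rounded 10% false stable rate quoted above for the CT.
CT_MISS_RATE = 0.10

for n in range(1, 7):
    p = prob_all_tests_miss(CT_MISS_RATE, n)
    print(f"{n} test(s): {p:.6%} chance that all of them miss")
```

Each additional test cuts the chance of a complete miss by a factor of ten, which is why the curve looks like a straight line on a logarithmic scale.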

On either of these plots we are interested in where the two lines intersect. That is the point at which the probability of a false stable has fallen to the probability of being involved in a fatal car accident. It can be hard to tell from the first plot, but putting the data on a logarithmic scale reveals the intersection point. Somewhere between 3 and 4 tests, closer to 4, we reach the level of risk that we already implicitly accept every day.

There are a few things to remark on here. First, 4 tests is really not all that many, considering that it is relatively easy to do 3 CT tests and an ECT behind them (a row of 3 CT tests is ~90 cm wide, with one 90 cm ECT behind it). Second, this makes the critical assumption that, over the whole day, this is the only data you use to make a skiing decision.

This doesn’t include prior observations from other ski days or observations made throughout the day. If we include those as “tests”, then all of a sudden our rate of error falls precipitously! Lastly, by skiing even at a ski resort we already accept more risk than the fairly conservative benchmark of driving a car. In fact, the rate of injury for skiing is about 0.2% per day for any one skier at a ski resort. That is 1 in 500, or about 40 times higher than our accepted risk level of 1 in 20,000!

The big takeaway here is that for the small expense of performing multiple tests and being observant throughout the day and season, we can gain a great deal of confidence and accuracy in our stability and risk assessment. Isn’t that extra effort worth it?

Pedro Rodriguez