You are listening to the Edge Case Research Self Driving Car Safety Series. In this episode, Phil Koopman continues down the metrics track with coverage driven metrics. Phil will provide an overview of industry approaches and focus on metrics that measure the edge cases, the rare situations that self driving cars will encounter in the real world.
Now over to Phil.
This is Phil Koopman from Edge Case Research with a series on self driving car safety. This time I’ll be talking about how to use coverage-based metrics in support of a safety claim for self driving cars. It takes way too many road miles to be able to establish whether a self driving car is safe by brute force. Billions of miles of on-road testing are just not going to happen. Sometimes people say, “Well, that’s okay. We can do those billion miles in simulation.” While simulation surely can be helpful, there are two potential issues with this. The first is that simulation has to be shown to predict outcomes on real roads. That’s a topic for a different day, but the simple version is you have to make sure that what the simulator says actually predicts what will happen on the road.
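The claim that brute-force mileage is impractical can be illustrated with a back-of-the-envelope calculation. The sketch below is not from the talk. It assumes failures arrive as a Poisson process (independent events at a constant rate per mile) and asks how many failure-free miles are needed before the data supports a given failure-rate bound at a given confidence. The function name and the target rate are hypothetical, chosen only for illustration.

```python
import math


def failure_free_miles_needed(max_failures_per_mile: float, confidence: float) -> float:
    """Miles to drive with zero failures before claiming, at the given
    confidence, that the true failure rate is below max_failures_per_mile.

    Assumption: failures follow a Poisson process, so the chance of seeing
    zero failures in n miles at rate r is exp(-r * n).
    """
    if max_failures_per_mile <= 0.0:
        raise ValueError("max_failures_per_mile must be positive")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    # Solve exp(-r * n) = 1 - confidence for n.
    return -math.log(1.0 - confidence) / max_failures_per_mile


# Hypothetical target: no more than one serious failure per 100 million miles.
target_rate = 1.0 / 100_000_000
miles = failure_free_miles_needed(target_rate, confidence=0.95)
print(f"Failure-free miles needed: {miles:,.0f}")
```

Under these assumptions the answer is roughly three times the target interval between failures. Any failure observed along the way, or any software change that resets the count, pushes the number higher still.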
The second problem, which is what I’d like to talk about this time, is that you need to know what to feed the simulation. Consider, if you hypothetically drove a billion miles on the real road, you’re actually doing two things at the same time. The first thing is you’re testing to see how often the system fails. But the second thing, a little more subtle, is you’re exposing the self driving car test platform to a billion miles of situations and weird things. That means the safety claim you’d be making on that hypothetical exercise, because you can’t really drive the billion miles, is that your car is safe because it did a billion miles safely. But that testing tangles two things together. One is whether the system performs and the other is what the system has been exposed to. If you do a billion miles of simulation, then sure, you’re checking whether the system does the right thing over a billion miles. But what you might be missing is that billion miles of weird stuff that happens in the real world.
Think about it. Simulating going around the same block a billion times with the same weather and the same objects doesn’t really prove very much at all. So, you really need a billion miles worth of exposure to the real world in representative conditions that span everything you would actually see if you were driving on the road. In other words, the edge cases are what matter.
To make this more concrete, there’s a story about a self driving car test platform that went to Australia. The first time they encountered kangaroos there was a big problem because their distance estimation assumed that an animal’s feet were on the ground and that’s not how kangaroos work. Even if they had simulated a billion miles, if they didn’t have kangaroos in their simulator, they would have never seen that problem coming. But it’s not just kangaroos. There are lots of things that happen every day but not necessarily inside of the self driving car test platform and that’s the issue.
A commonly discussed approach to get out of the “let’s do a billion miles” game is to identify and take care of the edge cases one at a time. This is the approach favored by the community that uses the Safety of the Intended Functionality (SOTIF) approach, for example as described in the standard ISO/PAS 21448. The idea is to go out, find edge cases, figure out how to mitigate any risk presented by them and continue until you’ve found enough of the edge cases that you think it’s okay to deploy. The good part of this approach is that it changes the metrics conversation from lots and lots of miles to instead talking about what percentage of the edge cases you’ve covered. If you think of a notional zoo of all the possible edge cases, well, once you’ve covered them all, then you should be good to go.
This works up to a point. The problem is you don’t actually know what all the edge cases are. You don’t know which edge cases happen only once in a while that you didn’t see during testing. This coverage approach works great for things where 90% or 99% is fine. Think about it. If there’s a driver in charge of a car and you’re designing a system that helps the driver recover after the driver’s made a mistake, and you only do that 90% of the time, just picking a number, that’s still a win. Nine times out of 10 you help the driver. As long as you’re not causing an accident on the 10th time, it’s all good. But for a self driving car, you’re not helping a driver. You’re actually in charge of getting everything done so 90% isn’t nearly good enough. You need 99 point lots of nines. Then you have a problem that if you’re missing even a few things from the edge case zoo that will happen in the real world, you could have a loss event when you hit one of them.
That means this approach is great when you know the edge cases, but it has a problem with unknown unknowns, things you didn’t even know you didn’t know because you didn’t see them during testing. As an aside, it’s important to realize there are actually two flavors of edge cases. Most of the discussions happen around scenario planning. Things like geometry: is it an unprotected left turn, or what if somebody is turning in front of you, what if there’s a pedestrian at a crosswalk? Those sorts of planning type things are one class of edge cases, but there’s a completely different class of edge case, which is object classification. What’s that thing that’s yellow and blobby? I don’t know what that is. Is that a person or is that a tarp that’s gotten loose? I don’t know. Being able to handle the edge cases in geometry is important. That’s one thing. Being able to handle the perception edge cases is also important, but it’s quite different.
If you’re doing coverage based metrics, then your metrics need to account for both the planning and the perception edge cases, possibly with two separate metrics. Okay, so the SOTIF coverage approach can certainly help, but it has a limit: you don’t know all the edge cases. Why is that? Well, the explanation is the 90/10 rule. The 90/10 rule in this case is that 90% of the times you have a problem, it’s caused by only the 10% of edge cases that are very common and happen every day. When you get out to the stuff that happens very rarely, once every 10 million miles say, well, that’s 90% of the edge cases, but you only see them 10% of the time because they happen so rarely. The issue is there’s an essentially infinite number of edge cases, each of which happens very rarely, but in aggregate, they happen often enough to be a problem. This is due to the heavy tail nature of edge cases and just weird things in the world. The practical implication is you can look as hard as you want for as long as you want and you’ll never find all the edge cases. They may be arriving so often that you can’t guarantee an appropriate level of safety even though you fixed every single one you found, because you would just need to look too long.
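The heavy tail argument above can be sketched numerically. The talk does not name a specific distribution, so the example below assumes, purely for illustration, that edge case types follow a Zipf-like law in which the type at rank k occurs with probability proportional to 1/k. All function names and parameter values are hypothetical. The sketch computes two things: how much of the total encounter probability sits in the most common 10% of types, and the expected chance that the next edge case encountered is of a type never seen before.

```python
def zipf_probabilities(num_types: int, exponent: float = 1.0) -> list[float]:
    """Occurrence probability of each edge case type, most common first.

    Assumption: a Zipf-like law, chosen only to illustrate a heavy tail.
    """
    weights = [1.0 / (rank ** exponent) for rank in range(1, num_types + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def head_share(probabilities: list[float], fraction: float) -> float:
    """Share of all encounters caused by the most common `fraction` of types."""
    head_count = int(len(probabilities) * fraction)
    return sum(probabilities[:head_count])


def expected_unseen_mass(probabilities: list[float], num_encounters: int) -> float:
    """Expected chance that the next encounter is a never-before-seen type.

    A type with probability p is still unseen after n independent
    encounters with probability (1 - p) ** n.
    """
    return sum(p * (1.0 - p) ** num_encounters for p in probabilities)


probs = zipf_probabilities(num_types=100_000)
print(f"Share of encounters from the top 10% of types: {head_share(probs, 0.10):.2f}")

encounter_counts = [1_000, 10_000, 100_000, 1_000_000]
unseen = [expected_unseen_mass(probs, n) for n in encounter_counts]
for n, mass in zip(encounter_counts, unseen):
    print(f"After {n:>9,} encounters, chance the next one is new: {mass:.3f}")
```

In this toy model the chance of a surprise keeps shrinking as encounters accumulate, but slowly, which is the practical meaning of a heavy tail. The real distribution of road edge cases is unknown, so the printed numbers say nothing about any actual vehicle.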
Going back to closing the loop with simulation, what this means is if you want to simulate a billion miles worth of operation to prove you’re a billion miles worth of safe, you need a billion miles worth of actual real world data to know that you’ve seen enough of the rare edge cases that statistically it probably works out. We’re back to the problem that getting a billion miles of data on the exact same sensor suite you’re going to deploy is not such a simple thing. What might be able to help is ways to sift through data and extract the edge cases so you can put them in a simulation.
The takeaway from all this is that doing simulation and analysis to make sure you’ve covered all the edge cases you know about is crucial to being able to build a self driving car, but it’s not quite enough. What you want is a metric that gives you the coverage of the perception edge cases and a metric that gives you the coverage of the scenario and planning edge cases. When you’ve covered everything you know about, that’s great, but it’s not the only thing you need to think about when deciding if you’re safe enough to deploy.
If you have those coverage metrics, one way you can measure progress is by looking at how often surprises happen. How often do you discover a new edge case for perception? How often do you discover a new edge case for planning? When you get to the point that the edge cases are arriving very infrequently, or maybe you’ve gone a million miles and haven’t seen one, that means there’s probably not a lot of utility in accumulating more miles and accumulating more data because you’re getting diminishing returns. The important thing is that this does not mean you’ve got them all. It means you’ve covered all the ones you know about and it’s becoming too expensive to discover new edge cases. When you hit that point, you need another plan to assure safety beyond just coverage of edge cases. Summing up, metrics that have to do with perception and planning edge cases are an important piece of self driving car safety, but you need to do something beyond that to handle the unknown unknowns.
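The surprise-rate idea can be sketched as a small calculation over a discovery log. Everything in this example is hypothetical: the `Surprise` record, the function names, and the mileage figures are invented for illustration and are not from the talk or from any real test program. It tracks perception and planning surprises separately, as the talk suggests, and reports both the discovery rate over a mileage window and the miles driven since the last surprise.

```python
from typing import NamedTuple


class Surprise(NamedTuple):
    """One newly discovered edge case (hypothetical record layout)."""
    odometer_miles: float  # fleet mileage when the new edge case was found
    category: str          # "perception" or "planning"


def surprises_per_million_miles(log, category, window_start, window_end):
    """New edge cases of one category per million miles in [start, end)."""
    if window_end <= window_start:
        raise ValueError("window_end must be greater than window_start")
    count = sum(
        1 for s in log
        if s.category == category and window_start <= s.odometer_miles < window_end
    )
    return count / ((window_end - window_start) / 1_000_000)


def miles_since_last_surprise(log, category, current_odometer):
    """Miles driven since the most recent new edge case of one category."""
    seen = [
        s.odometer_miles for s in log
        if s.category == category and s.odometer_miles <= current_odometer
    ]
    return current_odometer - max(seen) if seen else current_odometer


# Made-up discovery log for a fleet that has driven 3 million miles.
log = [
    Surprise(12_000, "perception"),
    Surprise(40_000, "planning"),
    Surprise(95_000, "perception"),
    Surprise(400_000, "perception"),
    Surprise(1_300_000, "planning"),
]
current_odometer = 3_000_000

for category in ("perception", "planning"):
    early = surprises_per_million_miles(log, category, 0, 1_000_000)
    late = surprises_per_million_miles(log, category, 2_000_000, 3_000_000)
    gap = miles_since_last_surprise(log, category, current_odometer)
    print(f"{category}: {early:.1f}/M miles early, {late:.1f}/M miles late, "
          f"{gap:,.0f} miles since last surprise")
```

A falling rate and a long gap indicate diminishing returns from more mileage. As the talk stresses, they do not show that every edge case has been found.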
You have just heard Phil Koopman address coverage driven metrics. The key takeaway: edge cases matter. Metrics that account for both planning and perception edge cases are crucial when building a self driving car. To learn more about our approach to measuring and mitigating edge cases, please visit our website at www.ecr.ai. From there you can connect with our safety experts via email and on social media. We thank you for listening, and we look forward to working with you on delivering the promise of autonomy.