Naïve Bayes theorem: what do a road trip and the Suez Canal have in common?
"Oh dear, the notorious intersection is coming up!"
I was thinking of all the things I could be doing instead of driving to my in-laws, and my wife's remark brought me back to the highway. The dreaded intersection was famous for all the wrong reasons. Every accident would bring in a news crew before the ambulance would arrive. Then we would have road safety experts from industry and academia on radio, print and TV pointing out issues with the road surface, the signal coordination, the font sizes on signposts and more. Within a week, it would be business as usual, waiting for the next event.
"It's all right. We still have five hours to go from there. A few minutes won't matter so much."
"Really? Did you forget the last time we were stuck here for four hours? I kept insisting you take the detour, but you never listen to my instinct. I should get you the How to Lie with Statistics book for your birthday."
Now, this was a hit below the belt. I knew the data on accidents and corresponding delays on every major highway. After all, compiling these statistics was my job! I knew ID:2375 very well. It had a delay history that was published but ignored. There was a 90 percent chance of zipping through it. Delays of one, two, three and four hours had probabilities of six, one, one and two percent respectively. So the only time we got stuck here for four hours was a one-in-50 scenario. Given that we have visited my in-laws every quarter over the last 15 years, I'm surprised how reliable, though annoying, the statistics are.
"The expected delay is only 0.19 hours. If you compute the delays weighted by the probability, that is what you should get. Remember: people lie, statistics don't."
"Are you promising we'll arrive in exactly five hours, 11 minutes and 24 seconds?"
"Sorry! That's not how it works. The intersection may be fine, and we'll arrive in five hours. But that is 90% likely."
Perhaps I spoke too soon. We could see traffic piling up, and two police cars passed us.
"Looks like we'll be leaving within 12 minutes, right?"
"That's too naïve. Since we know an accident has happened, the expected delay is now 1.9 hours."
"Wow! That's a big jump."
"Well, that's where conditional probabilities come into play. Since we know an accident has occurred, we are confined to the ten percent zone. The no-delay scenario is ruled out. Now, a delay of one, two, three and four hours is likely to happen with respective 60, ten, ten and 20 percent probability, which gives a new expectation".
"I say we take the detour. At least we are sure to reach there in seven hours."
"Statistically, we are likely to reach there in 6.9 hours if we stay put".
We spent the next hour exploring the joys of the Naïve Bayes theorem. My wife could now appreciate how the probability of a one-hour delay went up from six to 60 percent once we could rule out the no-delay scenario, which had accounted for 90 percent of the probability.
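The jump from six to 60 percent can be reproduced with a few lines of Python. This is only a minimal sketch of the conditioning step, using the delay figures quoted in the story; the names `delay_pmf`, `condition_on` and `expected_value` are made up for illustration.

```python
from fractions import Fraction

# Published delay distribution for the intersection (hours -> probability),
# using the figures quoted in the story.
delay_pmf = {
    0: Fraction(90, 100),
    1: Fraction(6, 100),
    2: Fraction(1, 100),
    3: Fraction(1, 100),
    4: Fraction(2, 100),
}


def condition_on(pmf, event):
    """Keep the outcomes where event(outcome) is true and renormalise to sum to 1."""
    kept = {d: p for d, p in pmf.items() if event(d)}
    total = sum(kept.values())
    return {d: p / total for d, p in kept.items()}


def expected_value(pmf):
    """Probability-weighted average of the outcomes."""
    return sum(d * p for d, p in pmf.items())


# Seeing the accident rules out the zero-delay outcome.
given_accident = condition_on(delay_pmf, lambda d: d >= 1)

print({d: float(p) for d, p in given_accident.items()})
print(float(expected_value(delay_pmf)), float(expected_value(given_accident)))
```

Exact fractions are used so that the renormalised probabilities come out as clean values (3/5, 1/10, 1/10, 1/5) rather than floating-point approximations.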
"OK, your time is up. It's time to take the detour."
"Not so fast! Since an hour has passed, we are now looking at a new set of conditional probabilities. A delay of two, three and four hours is likely to happen with respective 25, 25 and 50 percent probability, giving a new expectation of 3.25 hours, or 2.25 hours more from now. Which means we'll arrive in 7.25 hours as compared to…"
"I win! The detour should take seven hours from now. You should have trusted my instinct an hour ago."
As I begrudgingly steered to the left and took the detour, I wondered if all that discussion about decision-making using the Naïve Bayes model was a waste of time.
"It is just a matter of how the delays are distributed. For uniform distribution, it might have made sense to take the detour right away. A Poisson distribution could have kept us waiting until the end. Even if I didn't win, I'm glad statistics did."
"Dad, could you please step on it? I'm running out of juice."
Ah! The sulky teenager spoke at last, even if it took a draining battery for him to join the family conversation. And then it was time for him to utter something profound.
"When the ships were stuck outside the Suez Canal, did they use the same logic to decide whether to wait or reroute around Africa?"
That got me thinking. Did they? Could they? Should they? I already knew what statistics project I was going to work on while at my in-laws'.
Param Iyer is a PhD student at the University of Auckland Business School, Department of Information Systems and Operations Management.