False Positive Primer

We hear a lot of talk about False Positive and False Negative these days.

Along with True Positive and True Negative, they constitute important principles in statistics.

In general, this is what the terms mean in the context of a test for X:

  • True = Test is right
  • False = Test is wrong
  • Positive = Test finds X
  • Negative = Test does not find X

While the generic definition is quite clear, the interpretation of these terms can be quite confusing in individual situations.

Ergo this primer where I take three different scenarios and explain what these terms mean under each of them. The scenarios are:

  1. Credit Card Fraud Detection & Prevention
  2. Covid-19 Test
  3. Cry Wolf Syndrome

#1. CREDIT CARD FRAUD DETECTION & PREVENTION

A Credit Card Issuer Bank uses a Fraud Detection & Prevention (FD&P) software to test credit card transactions for Genuine or Fraudulent use and, accordingly, approve or decline them respectively. Genuine is also called Authorized and references the use of the credit card by the genuine holder of the credit card; Fraudulent is also called Unauthorized and comes into picture when a fraudster steals the plastic credit card or otherwise gets hold of the credit card details (e.g. from dark web sites) and goes to town with it.

See my article “Controlling Credit Card Fraud Through Predictive Analytics” for details of this software. Suffice to say in the context of this blog post that the algorithms used by a typical FD&P software are stochastic i.e. non-deterministic in nature.

There are four outcomes of an FD&P software’s action on a credit card transaction:

Real Owner uses Credit Card. Transaction is Approved. Test does not find Fraud, ergo Negative. Payor is Real Owner, ergo test is correct, ergo True. True Negative. Enables genuine transaction to go through. Cardholder is happy. Good outcome.

Real Owner uses Credit Card. Transaction is Declined. Test finds Fraud, ergo Positive. Payor is Real Owner, ergo test is wrong, ergo False. False Positive. Impacts Revenue and CX adversely. Cardholder is annoyed. Bad outcome.

Fraudster uses Credit Card. Transaction is Approved. Test does not find Fraud, ergo Negative. Payor is not Real Owner, so test is wrong, ergo False. False Negative. Cardholder is outraged when he notices the fraudulent charge on his credit card statement. Bad outcome.

Fraudster uses Credit Card. Transaction is Declined. Test finds Fraud, ergo Positive. Payor is not Real Owner, so test is right, ergo True. True Positive. Cardholder can be blissfully ignorant of the attempted fraud.

Let’s summarize the above outcomes in the following table:

True Negative Real Owner uses Credit Card. Test does not find Fraud. Transaction is Approved. Good Outcome
False Positive Real Owner uses Credit Card. Test finds Fraud. Transaction is Declined. Bad Outcome.
False Negative Fraudster uses Credit Card. Test does not find Fraud. Transaction is Approved. Bad Outcome
True Positive Fraudster uses Credit Card. Test finds Fraud. Transaction is Declined. Good Outcome

Implication:

In an ideal world, an FD&P software will approve all genuine transactions and decline all fraudulent transactions.

But, in a real world driven by cost, time and ROI considerations, that’s impossible. Every real world FD&P software will exhibit some degree of False Positives and False Negatives.

At best, we can expect an FD&P software to minimize False Positives and False Negatives.

#2. COVID-19 TEST

The four outcomes of a Covid-19 test are as follows:

  • True Positive: Patient has Pandemic and Test finds Pandemic Marker.
  • False Negative: Patient has Pandemic but Test does not find Pandemic Marker.
  • False Positive: Patient does not have Pandemic but Test finds Pandemic Marker.
  • True Negative: Patient does not have Pandemic and Test does not find Pandemic Marker.

The mark of a reliable test is that it exhibits high True Positive and True Negative rates and low False Positive and False Negative rates.

While False Positive and False Negative rates should ideally be zero, in current tests they tend to be around 10% and 30% respectively.

Implication 1: Hospital Administration

Non-Pandemic hospital hates False Negative since it means patient has to be retained and treated by it, which creates risk of spread of infection to other patients who are in the hospital for reasons unrelated to Pandemic. Patient won’t like False Positive since it means he will be shifted to Pandemic hospital where he risks getting infected from other patients who are there because they have Pandemic infection.

Implication 2: Test Claims

Pune test company MyLab claims 0% False Positive and 0% False Negative. While that’s unprecedented even for longstanding, well established tests, it’s not impossible. It’s important to examine what exactly MyLab tests. It could be testing for some X, where X is not the pandemic marker itself but is something very close to the pandemic marker. In that case, 0% False Positive and 0% False Negative are possible, but for X, not pandemic itself.

Implication 3: Testing Moral Hazard

This WhatsApp Forward insinuates that a testing lab is strongly driven to declare positive results:

My take:

The commercial incentive for a testing lab to declare Covid-19 Positive is obvious. But that doesn’t mean it will – or can.

The operative phrase in the above post is “All were Negative”. That’s not the case. A coronavirus test sample is not Positive or Negative by itself. It’s the Testing Lab that determines that. It does so based on guidelines that are fairly broad and via tests that use probabilistic – not determininistic – techniques.

Accordingly, the result is subject to fairly high percentage of False Positive and False Negative rates. A fraud testing lab can be driven by profit motive and may declare Positive when the sample is eventually found to be Negative (after more sophisticated tests are conducted). But a genuine testing lab can equally well report a False Positive without any shady intent because that’s the nature of the non-deterministic testing beast.

I see a parallel between this and stock recommendations: Typically for every 10 brokerage recommendations for a scrip, eight will be BUY and two will be SELL. That’s because a BUY recommendation has a much larger market than a SELL recommendation. If you’re wondering why, a BUY tip will address 100% of the market i.e. both people who don’t own that scrip (and can buy it) and people who do own that scrip (and can buy more of it) whereas a SELL tip will be applicable only to 5% of the market that owns the scrip (thence being in a position to sell it).

Implication 4: Engineering Versus Medicine

This is one of the most fundamental differences between engineering discipline, which deals with inanimate objects, and medical discipline, which deals with living beings.

If a ball is dropped from a height, it will fall to ground. Period. It does not change depending on whether the ball is observed by Newton or Einstein or any Tom Dick & Harry. The result is deterministic.

That’s not the case with living beings. Opinions and test results can, and do, differ from one person to another. Said in the language of Big Data / Statistics, science of inanimate objects is driven more by Causation whereas the science of living beings is driven more by Correlation.

#3. “CRY WOLF” SYNDROME

Google presents an interesting illustration of false positives etc. by using the famous Aesop’s fable about the Boy Who Cried Wolf.


I keep looking up this post from time to time to make sense out of False Positives etc. in different scenarios.

I hope you find it as useful as I do.