Since @VicVox72 mentioned Bayesian statistics: you are kind of describing the difference between the sensitivity/specificity of a test and the positive/negative predictive value of a test result.
I understand the utility of sensitivity and specificity in a medical context, but I find they make the math confusing. Which is to say, I would stop here:
P(c19 | test positive)
= P(test positive | c19) P(c19) / P(test positive)
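For concreteness, here is a quick Python sketch of that calculation. The sensitivity, specificity and prevalence figures below are made up purely for illustration; they do not describe any real test.

```python
# Illustrative numbers only -- not from any real SARS-CoV-2 test.
sensitivity = 0.90   # P(test positive | c19)
specificity = 0.95   # P(test negative | no c19)
prevalence = 0.02    # P(c19), the prior

# Law of total probability: P(test positive)
p_positive = sensitivity * prevalence + (1 - specificity) * (1 - prevalence)

# Bayes' rule: P(c19 | test positive), i.e. the positive predictive value
ppv = sensitivity * prevalence / p_positive
print(f"P(c19 | test positive) = {ppv:.3f}")  # ~0.269 with these numbers
```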
@rmrf has, wittingly or unwittingly, transcribed the examples in Wikipedia. So yes! The algebra checks out. While it is interesting that you can derive the positive predictive value using Bayes' rule, I think it confuses what is going on.
The utility of Bayes' rule is that you can update the 'belief' in a proposition based on new knowledge. The canonical form is:
P(A|B) = P(B|A)P(A) / P(B)
The value on the left-hand side of the equation, P(A|B), is called the posterior. It represents the 'belief' we have in the hypothesis 'A' after accounting for the evidence 'B'. The 'belief' P(A), before knowing anything about the evidence 'B', is known as the prior. The important part is to recognise that the posterior is a function of the evidence and the prior. The cool thing about Bayes' rule is that it is recursive: the posterior can act as the new prior for the next round of observations (there is a small sketch of this further down). To bring this beautifully full circle:
this episode shows that the scientific method and peer review process are, in the long run, self-correcting and effective.
You can see the analogy between the scientific method and Bayesian statistics: update your belief when new evidence is available.
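To make the recursion concrete, here is a minimal sketch of repeated updating, reusing the made-up test numbers from the sketch above and treating repeat tests as independent (which, for real tests, they rarely are):

```python
# Yesterday's posterior is today's prior. Illustrative numbers only.
sensitivity, specificity = 0.90, 0.95

def update(prior, positive):
    """One Bayesian update of P(c19) for a single test result."""
    if positive:
        p_evidence = sensitivity * prior + (1 - specificity) * (1 - prior)
        return sensitivity * prior / p_evidence
    p_evidence = (1 - sensitivity) * prior + specificity * (1 - prior)
    return (1 - sensitivity) * prior / p_evidence

belief = 0.02  # initial prior: background prevalence
for result in [True, True, True]:  # three positive tests in a row
    belief = update(belief, result)
    print(f"posterior after this test: {belief:.3f}")
# Each posterior becomes the prior for the next observation:
# roughly 0.269 -> 0.869 -> 0.992 with these numbers.
```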
Moving back to the SARS-CoV-2 example: substitute 'A' with 'c19' and 'B' with 'test positive' in the canonical formula above and it is identical to what @rmrf wrote. Despite the more confusing syntax, the same simple principle applies. Bayes' rule allows you to calculate the probability of a person having SARS-CoV-2 given the prior knowledge (background prevalence, contact) AND a positive test result (the new evidence).
A small weakness of Bayesian methods is that they rely on assumptions. For example, how do you reasonably determine the prior? I don't view this criticism as a major flaw in the method; modelling is littered with assumptions. To me, the more interesting 'weakness' is computational complexity, in particular the value P(B), which is known as the model evidence. It can only be evaluated analytically (fast) for a limited set of probability distributions. In many complex models, the value has to be approximated numerically (slow). Whether this is really a disadvantage depends on your application and whether or not you can avoid the issue with more intelligent model construction.
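To illustrate the model-evidence point, here is a toy sketch using a beta-binomial model, one of the lucky cases where P(B) has a closed form, next to a brute-force Monte Carlo approximation of the same integral. The prior and data are invented for illustration:

```python
# Toy model evidence: analytic (fast) vs Monte Carlo (slow).
from math import comb

import numpy as np
from scipy.special import betaln

a, b = 2.0, 2.0   # Beta(a, b) prior on the success probability
n, k = 10, 7      # invented data: 7 successes in 10 trials

# Analytic: the integral of likelihood * prior has a closed form here.
log_evidence = np.log(comb(n, k)) + betaln(k + a, n - k + b) - betaln(a, b)
print("analytic:   ", np.exp(log_evidence))

# Monte Carlo: average the likelihood over draws from the prior.
rng = np.random.default_rng(0)
theta = rng.beta(a, b, size=200_000)
likelihood = comb(n, k) * theta**k * (1 - theta)**(n - k)
print("monte carlo:", likelihood.mean())
```

For most models of practical interest there is no closed form, and you are stuck with the much more expensive sampling route (e.g. MCMC) or with variational approximations.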
Finally, people are naturally bad at statistics. Even when trained, they can be crappy at it. It is difficult and it can be counterintuitive. To illustrate the point, riddle me this:
Suppose you're on a game show, and you're given the choice of three doors: Behind one door is a car; behind the others, goats. You pick a door, say No. 1, and the host, who knows what's behind the doors, opens another door, say No. 3, which has a goat. He then says to you, "Do you want to pick door No. 2?" Is it to your advantage to switch your choice?
It is a famous statistics puzzle: the Monty Hall Problem. Before clicking on the Wikipedia link, calculate the odds of winning the car if you switch.
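And if you would rather check your answer empirically afterwards, here is a quick Monte Carlo sketch of the game. Run it only after committing to an answer:

```python
# Simulate many Monty Hall games and compare switching vs staying.
import random

def play(switch: bool) -> bool:
    doors = [0, 1, 2]
    car = random.choice(doors)
    pick = random.choice(doors)
    # The host opens a goat door that is not your pick.
    opened = random.choice([d for d in doors if d != pick and d != car])
    if switch:
        pick = next(d for d in doors if d != pick and d != opened)
    return pick == car

trials = 100_000
for strategy in (True, False):
    wins = sum(play(strategy) for _ in range(trials))
    label = "switch" if strategy else "stay"
    print(f"{label}: {wins / trials:.3f}")
```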