Writing in Wired News, Bruce Schneier gives a good explanation of why low base-rate events (like terrorist chatter on the Internet) are so hard to identify through the government’s efforts to eavesdrop on just about everything. The problem of reducing false positives to a manageable level while not increasing false negatives (missing the real bad guys) becomes insurmountable when almost all conversations and emails that mention terrorist targets or offer other supposed clues to terrorist intent are in fact harmless. This problem is common in many areas: testing everyone to find something that occurs at a low frequency tends to produce far more false positives than true hits. Excerpts:
Security is always a trade-off, and for a system to be worthwhile, the advantages have to be greater than the disadvantages. A national security data-mining program is going to find some percentage of real attacks and some percentage of false alarms. If the benefits of finding and stopping those attacks outweigh the cost — in money, liberties, etc. — then the system is a good one. If not, you’d be better off spending that capital elsewhere….
All data-mining systems fail in two different ways: false positives and false negatives. A false positive is when the system identifies a terrorist plot that really isn’t one. A false negative is when the system misses an actual terrorist plot. Depending on how you “tune” your detection algorithms, you can err on one side or the other: you can increase the number of false positives to ensure you are less likely to miss an actual terrorist plot, or you can reduce the number of false positives at the expense of missing terrorist plots….
Data mining is like searching for a needle in a haystack. There are 900 million credit cards in circulation in the United States. According to the FTC September 2003 Identity Theft Survey Report, about 1 percent (10 million) cards are stolen and fraudulently used each year. When it comes to terrorism, however, trillions of connections exist between people and events — things that the data-mining system will have to “look at” — and very few plots. This rarity makes even accurate identification systems useless.
Let’s look at some numbers. We’ll be optimistic — we’ll assume the system has a one in 100 false-positive rate (99 percent accurate), and a one in 1,000 false-negative rate (99.9 percent accurate). Assume 1 trillion possible indicators to sift through: that’s about 10 events — e-mails, phone calls, purchases, web destinations, whatever — per person in the United States per day. Also assume that 10 of them are actually terrorists plotting.
This unrealistically accurate system will generate 1 billion false alarms for every real terrorist plot it uncovers. Every day of every year, the police will have to investigate 27 million potential plots in order to find the one real terrorist plot per month. Raise that false-positive accuracy to an absurd 99.9999 percent and you’re still chasing 2,750 false alarms per day — but that will inevitably raise your false negatives, and you’re going to miss some of those 10 real plots.
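(An aside, not part of Schneier's text: the arithmetic in the excerpt can be checked with a short sketch. It assumes the 1 trillion indicators and 10 real plots are yearly totals, which is how the per-day figures appear to be derived. The constant and function names are illustrative only.)

```python
# Figures taken from the excerpt; treating them as yearly totals is an assumption.
INDICATORS_PER_YEAR = 1_000_000_000_000  # 1 trillion events to sift through
REAL_PLOTS_PER_YEAR = 10                 # events that are actual plots
FALSE_NEGATIVE_RATE = 1 / 1_000          # share of real plots the system misses
DAYS_PER_YEAR = 365


def yearly_alarms(false_positive_rate):
    """Return (false alarms, real plots found) per year for a given false-positive rate."""
    innocent_events = INDICATORS_PER_YEAR - REAL_PLOTS_PER_YEAR
    false_alarms = innocent_events * false_positive_rate
    plots_found = REAL_PLOTS_PER_YEAR * (1 - FALSE_NEGATIVE_RATE)
    return false_alarms, plots_found


# Compare the "optimistic" 1-in-100 rate with the "absurd" 1-in-a-million rate.
for fp_rate in (1 / 100, 1 / 1_000_000):
    false_alarms, plots_found = yearly_alarms(fp_rate)
    print(
        f"false-positive rate {fp_rate:.6%}: "
        f"{false_alarms / DAYS_PER_YEAR:,.0f} false alarms per day, "
        f"{false_alarms / plots_found:,.0f} false alarms per real plot found"
    )
```

Under these assumptions the first case works out to about 27 million false alarms per day and about 1 billion per real plot found. The second comes to roughly 2,700 per day, in line with the excerpt's rounded figure of 2,750.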
This isn’t anything new. In statistics, it’s called the “base rate fallacy,” and it applies in other domains as well. For example, even highly accurate medical tests are useless as diagnostic tools if the incidence of the disease is rare in the general population. Terrorist attacks are also rare, so any “test” is going to result in an endless stream of false alarms.
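The base-rate effect follows directly from Bayes' rule, and a minimal sketch makes it concrete. The function name and the medical-test numbers below are hypothetical, chosen only for illustration. The second call reuses the rates from the excerpt.

```python
def probability_real_given_alarm(base_rate, sensitivity, false_positive_rate):
    """Bayes' rule: the chance that a flagged case is a real one."""
    true_alarms = base_rate * sensitivity
    false_alarms = (1 - base_rate) * false_positive_rate
    return true_alarms / (true_alarms + false_alarms)


# Hypothetical medical test: disease affects 1 person in 10,000,
# the test catches 99% of real cases and wrongly flags 1% of healthy people.
medical = probability_real_given_alarm(
    base_rate=1 / 10_000, sensitivity=0.99, false_positive_rate=0.01
)

# Rates from the excerpt: 10 plots among 1 trillion events,
# 99.9% of real plots caught, 1% of harmless events flagged.
plots = probability_real_given_alarm(
    base_rate=10 / 1_000_000_000_000, sensitivity=0.999, false_positive_rate=0.01
)

print(f"medical test: {medical:.2%} of positives are real")
print(f"plot detection: about 1 real plot per {1 / plots:,.0f} alarms")
```

With these inputs, about 1 percent of positive medical results are real cases, even though the test is 99 percent accurate in both directions. For the plot-detection rates, the share of real alarms is about one in a billion.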