Crowds can beat smart people, but crowds of smart people do best of all

Saturday, January 28th, 2023

Last January, Scott Alexander — along with amateur statisticians Sam Marks and Eric Neyman — solicited predictions from 508 people:

Contest participants assigned percentage chances to 71 yes-or-no questions, like “Will Russia invade Ukraine?” or “Will the Dow end the year above 35000?”

[…]

Are some people really “superforecasters” who do better than everyone else? Is there a “wisdom of crowds”? Does the Efficient Markets Hypothesis mean that prediction markets should beat individuals? Armed with 508 people’s predictions, can we do math to them until we know more about the future (probabilistically, of course) than any ordinary mortal?

After 2022 ended, Sam and Eric used a technique called log-loss scoring to grade everyone’s probability estimates. Lower scores are better. The details are hard to explain, but for our contest, guessing 50% for everything would give a score of 40.21, and complete omniscience would give a perfect score of 0.
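For readers who want the gist of log-loss scoring, here is a minimal sketch. It assumes natural logarithms and a plain sum across questions; the post does not spell out the exact scaling Sam and Eric used, so the numbers it produces are not meant to reproduce the contest's 40.21. The function name `log_loss` and the four toy questions are made up for illustration.

```python
import math

def log_loss(probabilities, outcomes):
    """Sum of -ln(probability assigned to what actually happened).

    Assumption: natural log, summed over questions. Lower is better.
    A forecast of exactly 0% on something that happens would raise
    ValueError here, since its penalty is infinite.
    """
    total = 0.0
    for p, happened in zip(probabilities, outcomes):
        p_actual = p if happened else 1.0 - p
        total += -math.log(p_actual)
    return total

# Hypothetical toy data: four yes-or-no questions and how they resolved.
outcomes = [True, False, True, False]

coin_flip = [0.5] * 4                   # 50% on everything
confident_right = [0.9, 0.1, 0.9, 0.1]  # leans the right way each time
confident_wrong = [0.1, 0.9, 0.1, 0.9]  # leans the wrong way each time

for name, guesses in [("coin flip", coin_flip),
                      ("confident and right", confident_right),
                      ("confident and wrong", confident_wrong)]:
    print(f"{name}: {log_loss(guesses, outcomes):.3f}")
```

Under these assumptions, guessing 50% costs ln 2 per question, being confidently right costs very little, and being confidently wrong is punished far more heavily than a coin flip. Perfect certainty about every outcome scores exactly 0.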

[…]

As mentioned above, guessing 50% for everything corresponds to a score of 40.21. This would have put you in the eleventh percentile (yes, 11% of participants did worse than chance).

Philip Tetlock and his team have identified “superforecasters” — people who seem to do surprisingly well at prediction tasks, again and again. Some of Tetlock’s picks kindly agreed to participate in this contest and let me test them. The median superforecaster outscored 84% of other participants.

The “wisdom of crowds” hypothesis says that averaging many ordinary people’s predictions produces a “smoothed-out” prediction at least as good as experts. That proved true here. An aggregate created by averaging all 508 participants’ guesses scored at the 84th percentile, equaling superforecaster performance.
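The simple-average aggregate described above can be sketched in a few lines. The forecasts below are hypothetical, not contest data, and the scoring function assumes natural-log loss summed over questions, which may differ from the contest's exact scaling. The sketch averages each question's forecasts across participants, scores the result, and counts how many individuals it beats.

```python
import math
from statistics import mean

def log_loss(probabilities, outcomes):
    """Sum of -ln(probability given to the actual outcome). Lower is better."""
    return sum(-math.log(p if happened else 1.0 - p)
               for p, happened in zip(probabilities, outcomes))

# Hypothetical data: each row is one participant, each column one question.
forecasts = [
    [0.9, 0.2, 0.6],
    [0.4, 0.1, 0.8],
    [0.7, 0.6, 0.3],
    [0.2, 0.3, 0.9],
]
outcomes = [True, False, True]

# Average the crowd's probability question by question.
crowd = [mean(column) for column in zip(*forecasts)]

individual_scores = [log_loss(row, outcomes) for row in forecasts]
crowd_score = log_loss(crowd, outcomes)

# Share of participants with a worse (higher) score than the aggregate.
beaten = sum(score > crowd_score for score in individual_scores)
percentile = 100 * beaten / len(individual_scores)

print(f"crowd score: {crowd_score:.3f}")
print(f"mean individual score: {mean(individual_scores):.3f}")
print(f"crowd beats {percentile:.0f}% of participants")
```

One property holds regardless of the data: because -ln is convex, the averaged forecast's score can never be worse than the mean of the individual scores. Whether it also beats most individuals, as it did in the contest, depends on the forecasts themselves.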

There are fancy ways to adjust people’s predictions before aggregating them, and these have outperformed simple averaging in previous experiments. Eric tried one of these methods, and it scored at the 85th percentile, barely better than the simple average.
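The post does not say which adjustment Eric used, so the following is only an illustration of the general idea, not his method. One commonly described approach converts each probability to log-odds, averages those, and optionally "extremizes" the result by pushing it away from 50%. The function names and the `extremize` factor are assumptions made for this sketch.

```python
import math

def to_log_odds(p):
    """Convert a probability strictly between 0 and 1 to log-odds."""
    return math.log(p / (1.0 - p))

def from_log_odds(x):
    """Convert log-odds back to a probability."""
    return 1.0 / (1.0 + math.exp(-x))

def log_odds_pool(probabilities, extremize=1.0):
    """Average forecasts in log-odds space, then scale away from 50%.

    extremize=1.0 is a plain log-odds average; values above 1.0 make the
    pooled forecast more confident. The right factor, if any, is an
    empirical question and is purely hypothetical here.
    """
    mean_log_odds = sum(to_log_odds(p) for p in probabilities) / len(probabilities)
    return from_log_odds(extremize * mean_log_odds)

# Hypothetical forecasts for a single question.
guesses = [0.6, 0.7, 0.8]

plain = log_odds_pool(guesses)
pushed = log_odds_pool(guesses, extremize=1.5)
print(f"log-odds average: {plain:.3f}, extremized: {pushed:.3f}")
```

The intuition behind extremizing is that if several people independently lean the same way, the pooled evidence is stronger than any one of them reports. As the contest result suggests, the gain over a simple average can be small.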

Crowds can beat smart people, but crowds of smart people do best of all. The aggregate of the 12 participating superforecasters scored at the 97th percentile.

Prediction markets did extraordinarily well during this competition, scoring at the 99.5th percentile, i.e. they beat 506 of the 508 participants, plus all other forms of aggregation. But this is an unfair comparison: our participants were only allowed to spend five minutes max researching each question, but we couldn’t control prediction market participants; they spent however long they wanted. That means prediction markets’ victory doesn’t necessarily mean they’re better than other aggregation methods; it might just mean that people who can do lots of research beat people who do less research. Next year’s contest will have some participants who do more research, and hopefully provide a fairer test.

The single best forecaster of our 508 participants got a score of 25.68. That doesn’t necessarily mean he’s smarter than aggregates and prediction markets. There were 508 entries, i.e. 508 lottery tickets to outperform the markets by coincidence. Most likely he won by a combination of skill and luck. Still, this is an outstanding performance, and must have taken extraordinary skill, regardless of how much luck was involved.
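The "lottery tickets" point can be illustrated with a small simulation. Everything here except the counts of 508 participants and 71 questions is an assumption: the simulated forecasters are equally skilled by construction, each seeing the true chance of every event plus independent random noise, and scores use natural-log loss summed over questions. It says nothing about the real winner; it only shows how far the luckiest of many identical forecasters can land from the typical one.

```python
import math
import random
from statistics import median

random.seed(0)  # fixed seed so the sketch is repeatable

N_QUESTIONS = 71      # number of questions, from the post
N_FORECASTERS = 508   # number of participants, from the post
NOISE = 0.15          # assumed spread of each forecaster's error

def log_loss(probabilities, outcomes):
    """Sum of -ln(probability given to the actual outcome). Lower is better."""
    return sum(-math.log(p if happened else 1.0 - p)
               for p, happened in zip(probabilities, outcomes))

def clamp(p, low=0.02, high=0.98):
    """Keep noisy guesses away from 0 and 1 so the score stays finite."""
    return max(low, min(high, p))

# Hypothetical world: each question has a true chance, then resolves once.
true_chances = [random.uniform(0.2, 0.8) for _ in range(N_QUESTIONS)]
outcomes = [random.random() < q for q in true_chances]

# Every forecaster has the same skill: the truth plus independent noise.
scores = []
for _ in range(N_FORECASTERS):
    guesses = [clamp(q + random.gauss(0.0, NOISE)) for q in true_chances]
    scores.append(log_loss(guesses, outcomes))

best, typical = min(scores), median(scores)
print(f"best of {N_FORECASTERS}: {best:.2f}, median: {typical:.2f}")
```

Since nobody in this toy world is more skilled than anyone else, the whole gap between the best and the median score is luck. Real contests mix that luck with genuine differences in skill, which is why a single top score is hard to interpret on its own.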

Comments

  1. Anti-Stats says:

    “Crowds”, or “consensus”, will always be bad, no matter whether it has a higher average hit rate, precisely because consensus always removes exceptional ideas.

    Example: 100 smart people would have a 79% rate of correct predictions, but if you source them individually, you would probably have 95% due to their original ideas not shared in the consensus.

    Statistics per se is bad science. It’s only useful for describing the past, and terrible when it comes to the future.

  2. Pseudo-Chrysostom says:

    To make a riff on the old Anonymous saying: none of us are as stupid as all of us.

    To estimate the effective decision-making capability of a consensus-based decision-making body, take the wisdom of its most foolish member, and divide it by the total number of members.
