Crowds can beat smart people, but crowds of smart people do best of all

Saturday, January 28th, 2023

Last January, Scott Alexander — along with amateur statisticians Sam Marks and Eric Neyman — solicited predictions from 508 people:

Contest participants assigned percentage chances to 71 yes-or-no questions, like “Will Russia invade Ukraine?” or “Will the Dow end the year above 35000?”

[…]

Are some people really “superforecasters” who do better than everyone else? Is there a “wisdom of crowds”? Does the Efficient Markets Hypothesis mean that prediction markets should beat individuals? Armed with 508 people’s predictions, can we do math to them until we know more about the future (probabilistically, of course) than any ordinary mortal?

After 2022 ended, Sam and Eric used a technique called log-loss scoring to grade everyone’s probability estimates. Lower scores are better. The details are hard to explain, but for our contest, guessing 50% for everything would give a score of 40.21, and complete omniscience would give a perfect score of 0.
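
To make the scoring concrete, here is a minimal sketch of log-loss scoring. The contest's exact formula is not given above, so the natural log, the plain sum over questions, and the clamping constant `eps` are all assumptions, and the function name `log_loss_score` is hypothetical. The sketch only shows the two properties mentioned: certainty about the right answers scores 0, and guessing 50% on everything scores a fixed amount per question.

```python
import math

def log_loss_score(probs, outcomes, eps=1e-12):
    """Sum over questions of -ln(probability given to what actually happened).

    probs    -- forecast probability of "yes" for each question (0..1)
    outcomes -- True if the question resolved "yes", else False
    eps      -- assumed floor so a confident wrong answer gives a huge
                but finite penalty instead of a math domain error
    """
    total = 0.0
    for p, happened in zip(probs, outcomes):
        # Probability the forecaster assigned to the realized outcome.
        p_actual = p if happened else 1.0 - p
        total += -math.log(max(p_actual, eps))
    return total

# Toy data (invented for illustration, not contest data).
outcomes = [True, False, True, False]

coin_flip = log_loss_score([0.5] * 4, outcomes)               # 4 * ln(2)
omniscient = log_loss_score([1.0, 0.0, 1.0, 0.0], outcomes)   # 0.0
print(coin_flip, omniscient)
```

Lower is better: a forecaster who leans the right way on each question lands between the omniscient score and the coin-flip score, while one who is confidently wrong can do worse than the coin flip.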

[…]

As mentioned above: guessing 50% corresponds to a score of 40.2. This would have put you in the eleventh percentile (yes, 11% of participants did worse than chance).

Philip Tetlock and his team have identified “superforecasters” — people who seem to do surprisingly well at prediction tasks, again and again. Some of Tetlock’s picks kindly agreed to participate in this contest and let me test them. The median superforecaster outscored 84% of other participants.

The “wisdom of crowds” hypothesis says that averaging many ordinary people’s predictions produces a “smoothed-out” prediction at least as good as experts. That proved true here. An aggregate created by averaging all 508 participants’ guesses scored at the 84th percentile, equaling superforecaster performance.
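
As a rough illustration of that aggregate, the sketch below averages several forecasters' probabilities question by question, scores the result, and ranks it against the individuals. It assumes "averaging" means the arithmetic mean of probabilities and that scoring is a summed natural-log loss; neither detail is spelled out above. The helper names and the toy numbers are hypothetical.

```python
import math

def log_loss_score(probs, outcomes, eps=1e-12):
    """Summed -ln(probability assigned to the realized outcome); lower is better."""
    return sum(
        -math.log(max(p if happened else 1.0 - p, eps))
        for p, happened in zip(probs, outcomes)
    )

def average_forecast(forecasts):
    """Arithmetic mean of the forecasters' probabilities for each question."""
    n = len(forecasts)
    return [sum(column) / n for column in zip(*forecasts)]

def percentile_rank(score, all_scores):
    """Percent of participants whose score is strictly worse (higher)."""
    worse = sum(1 for s in all_scores if s > score)
    return 100.0 * worse / len(all_scores)

# Toy data: three invented forecasters, four invented questions.
outcomes = [True, False, True, True]
forecasts = [
    [0.9, 0.4, 0.3, 0.6],
    [0.5, 0.1, 0.8, 0.2],
    [0.6, 0.3, 0.7, 0.9],
]

individual_scores = [log_loss_score(f, outcomes) for f in forecasts]
aggregate_score = log_loss_score(average_forecast(forecasts), outcomes)

print(individual_scores)
print(aggregate_score, percentile_rank(aggregate_score, individual_scores))
```

Because the log penalty is convex, the averaged forecast can never score worse than the average of the individual scores. That guarantees the crowd beats the typical participant in this sense, but not that it beats the best one.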

There are fancy ways to adjust people’s predictions before aggregating them that outperformed simple averaging in the previous experiments. Eric tried one of these methods, and it scored at the 85th percentile, barely better than the simple average.

Crowds can beat smart people, but crowds of smart people do best of all. The aggregate of the 12 participating superforecasters scored at the 97th percentile.

Prediction markets did extraordinarily well during this competition, scoring at the 99.5th percentile, ie they beat 506 of the 508 participants, plus all other forms of aggregation. But this is an unfair comparison: our participants were only allowed to spend five minutes max researching each question, but we couldn’t control prediction market participants; they spent however long they wanted. That means prediction markets’ victory doesn’t necessarily mean they’re better than other aggregation methods; it might just mean that people who can do lots of research beat people who do less research. Next year’s contest will have some participants who do more research, and hopefully provide a fairer test.

The single best forecaster of our 508 participants got a score of 25.68. That doesn’t necessarily mean he’s smarter than aggregates and prediction markets. There were 508 entries, ie 508 lottery tickets to outperform the markets by coincidence. Most likely he won by a combination of skill and luck. Still, this is an outstanding performance, and must have taken extraordinary skill, regardless of how much luck was involved.

Posted in Economics, Policy | 2 Comments »

Comments

Anti-Stats says:

January 29, 2023 at 6:02 am

“Crowds”, or “consensus”, will always be bad, no matter if it has higher average rates, precisely because consensus will always remove exceptional ideas.

Example: 100 smart people would have a 79% rate of correct predictions, but if you source them individually, you would probably have 95% due to their original ideas not shared in the consensus.

Statistics per se is bad science. It’s only useful to describe the past, but terrible where the future is concerned.

Pseudo-Chrysostom says:

February 1, 2023 at 4:21 am

To make a riff on the old Anonymous saying: none of us are as stupid as all of us.

To estimate the effective decision making capability of a consensus-based decision making body, take the wisdom of its most foolish member, and divide it by the total number of members.
