Crowds can beat smart people, but crowds of smart people do best of all

Saturday, January 28th, 2023

Last January, Scott Alexander — along with amateur statisticians Sam Marks and Eric Neyman — solicited predictions from 508 people:

Contest participants assigned percentage chances to 71 yes-or-no questions, like “Will Russia invade Ukraine?” or “Will the Dow end the year above 35000?”

[…]

Are some people really “superforecasters” who do better than everyone else? Is there a “wisdom of crowds”? Does the Efficient Markets Hypothesis mean that prediction markets should beat individuals? Armed with 508 people’s predictions, can we do math to them until we know more about the future (probabilistically, of course) than any ordinary mortal?

After 2022 ended, Sam and Eric used a technique called log-loss scoring to grade everyone’s probability estimates. Lower scores are better. The details are hard to explain, but for our contest, guessing 50% for everything would give a score of 40.21, and complete omniscience would give a perfect score of 0.
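For readers who want the gist of log-loss scoring, here is a minimal sketch. It assumes the common convention of summing the negative natural log of the probability each forecaster assigned to what actually happened. The contest's exact scaling is not spelled out in the excerpt, so the function name and the toy numbers below are illustrative only.

```python
import math

def log_loss(probs, outcomes):
    """Sum of -ln(probability assigned to the outcome that occurred).

    probs    -- forecast probabilities that each question resolves "yes"
    outcomes -- True/False for how each question actually resolved
    Lower is better; 0 means every outcome was predicted with certainty.
    Note: putting probability 0 on something that then happens is an
    infinite penalty (math.log raises ValueError here).
    """
    total = 0.0
    for p, happened in zip(probs, outcomes):
        p_actual = p if happened else 1.0 - p
        total += -math.log(p_actual)
    return total

# Toy data, not contest data.
outcomes = [True, False, True, False]
coin_flip = [0.5, 0.5, 0.5, 0.5]
confident_right = [0.9, 0.1, 0.9, 0.1]
confident_wrong = [0.1, 0.9, 0.1, 0.9]
omniscient = [1.0, 0.0, 1.0, 0.0]

for name, probs in [("coin flip", coin_flip),
                    ("confident and right", confident_right),
                    ("confident and wrong", confident_wrong),
                    ("omniscient", omniscient)]:
    print(f"{name}: {log_loss(probs, outcomes):.3f}")
```

Under this convention a 50% guess costs ln 2 (about 0.693) per question, confident correct guesses cost less, and confident wrong guesses cost far more, which is why a forecaster can do worse than chance.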

[…]

As mentioned above: guessing 50% corresponds to a score of 40.2. This would have put you in the eleventh percentile (yes, 11% of participants did worse than chance).

Philip Tetlock and his team have identified “superforecasters” — people who seem to do surprisingly well at prediction tasks, again and again. Some of Tetlock’s picks kindly agreed to participate in this contest and let me test them. The median superforecaster outscored 84% of other participants.

The “wisdom of crowds” hypothesis says that averaging many ordinary people’s predictions produces a “smoothed-out” prediction at least as good as experts. That proved true here. An aggregate created by averaging all 508 participants’ guesses scored at the 84th percentile, equaling superforecaster performance.
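The simple-average aggregate is easy to sketch. The example below uses three made-up forecasters and the same assumed log-loss convention (sum of negative natural logs); none of the numbers come from the contest.

```python
import math

def log_loss(probs, outcomes):
    """Sum of -ln(probability assigned to what actually happened)."""
    return sum(-math.log(p if happened else 1.0 - p)
               for p, happened in zip(probs, outcomes))

def average_forecast(forecasts):
    """Question-by-question mean of several forecasters' probabilities."""
    n = len(forecasts)
    return [sum(column) / n for column in zip(*forecasts)]

# Hypothetical forecasts: one row per forecaster, one column per question.
forecasts = [
    [0.9, 0.2, 0.6],
    [0.6, 0.4, 0.3],
    [0.3, 0.1, 0.8],
]
outcomes = [True, False, True]

crowd = average_forecast(forecasts)
individual_scores = [log_loss(f, outcomes) for f in forecasts]
crowd_score = log_loss(crowd, outcomes)
mean_individual = sum(individual_scores) / len(individual_scores)

print("individual scores:", [round(s, 3) for s in individual_scores])
print("crowd score:", round(crowd_score, 3))
print("mean individual score:", round(mean_individual, 3))
```

Because the log-loss penalty is convex in the forecast probability, the averaged forecast can never score worse than the mean of the individual scores. It carries no such guarantee against the best individual, which matches the pattern in the results above: the aggregate lands well up the table without topping it.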

There are fancy ways to adjust people’s predictions before aggregating them that outperformed simple averaging in the previous experiments. Eric tried one of these methods, and it scored at the 85th percentile, barely better than the simple average.

Crowds can beat smart people, but crowds of smart people do best of all. The aggregate of the 12 participating superforecasters scored at the 97th percentile.

Prediction markets did extraordinarily well during this competition, scoring at the 99.5th percentile — ie they beat 506 of the 508 participants, plus all other forms of aggregation. But this is an unfair comparison: our participants were only allowed to spend five minutes max researching each question, but we couldn’t control prediction market participants; they spent however long they wanted. That means prediction markets’ victory doesn’t necessarily mean they’re better than other aggregation methods — it might just mean that people who can do lots of research beat people who do less research. Next year’s contest will have some participants who do more research, and hopefully provide a fairer test.

The single best forecaster of our 508 participants got a score of 25.68. That doesn’t necessarily mean he’s smarter than aggregates and prediction markets. There were 508 entries, ie 508 lottery tickets to outperform the markets by coincidence. Most likely he won by a combination of skill and luck. Still, this is an outstanding performance, and must have taken extraordinary skill, regardless of how much luck was involved.
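The “lottery tickets” point can be illustrated with a toy simulation. Everything in it is an assumption made for illustration: 508 simulated forecasters of identical skill, 71 questions with invented true probabilities, Gaussian noise on each forecast, and the same assumed log-loss convention. It is not a model of the actual contest.

```python
import math
import random
import statistics

def log_loss(probs, outcomes):
    """Sum of -ln(probability assigned to what actually happened)."""
    return sum(-math.log(p if happened else 1.0 - p)
               for p, happened in zip(probs, outcomes))

rng = random.Random(0)  # fixed seed so the run is repeatable
N_FORECASTERS, N_QUESTIONS = 508, 71

# Hypothetical "true" chances and one random realisation of the year.
true_probs = [rng.uniform(0.1, 0.9) for _ in range(N_QUESTIONS)]
outcomes = [rng.random() < p for p in true_probs]

def noisy_forecast(noise=0.15):
    """Every forecaster sees the truth through the same amount of noise."""
    return [min(0.95, max(0.05, p + rng.gauss(0.0, noise)))
            for p in true_probs]

scores = [log_loss(noisy_forecast(), outcomes) for _ in range(N_FORECASTERS)]
best = min(scores)
median = statistics.median(scores)

print(f"best score:   {best:.2f}")
print(f"median score: {median:.2f}")
```

Even though every simulated forecaster is equally skilled by construction, the best of 508 scores noticeably better than the median, purely because their noise happened to fall the right way. A real winner's margin therefore mixes skill with this kind of luck.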

Posted in Economics, Policy | 2 Comments »

Comments

Anti-Stats says:

January 29, 2023 at 6:02 am

“Crowds”, or, “Consensus” will always be bad. No matter if it has higher average rates precisely because consensus will always remove exceptional ideas.

Example: 100 smart people would have a 79% rate of correct predictions, but if you source them individually, you would probably have 95% due to their original ideas not shared in the consensus.

Statistics per se is bad science. It’s only useful to describe the past, but terrible for when the future is regarded.

Pseudo-Chrysostom says:

February 1, 2023 at 4:21 am

To make a riff on the old Anonymous saying: none of us are as stupid as all of us.

To estimate the effective decision making capability of a consensus-based decision making body, take the wisdom of its most foolish member, and divide it by the total number of members.

