Why Wine Ratings Are Badly Flawed

Wednesday, November 25th, 2009

Acting on an informant’s tip, in June 1973, French tax inspectors barged into the offices of the 155-year-old Cruse et Fils Frères wine shippers. Eighteen men were eventually prosecuted by the French government, accused, among other things, of passing off humble wines from the Languedoc region as the noble and five-times-as-costly wine of Bordeaux. During the trial it came out that the Bordeaux wine merchants regularly defrauded foreigners. One vat of wine considered extremely inferior, for example, was labeled “Salable as Beaujolais to Americans.”

In this climate, lawyer-turned-wine-critic Robert M. Parker Jr. created his 100-point scale — which went on to become very, very influential:

According to a 2001 study of Bordeaux wines, a one-point bump in Robert Parker’s wine ratings equates, on average, to a 7% increase in price, and the price difference can be much greater at the high end.

But these wine ratings are flawed, according to two studies published in the Journal of Wine Economics:

In his first study, each year, for four years, Mr. Hodgson served actual panels of California State Fair Wine Competition judges — some 70 judges each year — about 100 wines over a two-day period. He employed the same blind tasting process as the actual competition. In Mr. Hodgson’s study, however, every wine was presented to each judge three different times, each time drawn from the same bottle.

The results astonished Mr. Hodgson. The judges’ wine ratings typically varied by ±4 points on a standard ratings scale running from 80 to 100. A wine rated 91 on one tasting would often be rated an 87 or 95 on the next. Some of the judges did much worse, and only about one in 10 regularly rated the same wine within a range of ±2 points.
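To make the consistency measurement concrete, here is a minimal sketch of how per-judge spread could be computed on made-up data. The judge and wine names, the Gaussian scores, and the max-minus-min spread metric are all illustrative assumptions, not Mr. Hodgson’s actual data or his exact methodology:

```python
import random
from collections import defaultdict

# Hypothetical illustration: each judge scores the same wine three times
# (as in the triplicate blind tastings described above), and we look at
# the spread between their highest and lowest scores. None of these
# figures are from Hodgson's study.
random.seed(1)
judges = [f"judge_{i}" for i in range(10)]
wines = [f"wine_{i}" for i in range(5)]

spreads = defaultdict(list)
for judge in judges:
    for wine in wines:
        # Simulated scores on the 80-100 scale with some tasting-to-tasting noise.
        scores = [random.gauss(90, 2.5) for _ in range(3)]
        spreads[judge].append(max(scores) - min(scores))

for judge, s in spreads.items():
    print(f"{judge}: mean spread {sum(s) / len(s):.1f} points")
```

A judge whose repeat scores of the same bottle routinely land several points apart is, by this kind of measure, not reliably distinguishing an 87 from a 91.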

Mr. Hodgson also found that the judges whose ratings were most consistent in any given year landed in the middle of the pack in other years, suggesting that their consistent performance that year had simply been due to chance.

Mr. Hodgson said he wrote up his findings each year and asked the board for permission to publish the results; each year, they said no. Finally, the board relented — according to Mr. Hodgson, on a close vote — and the study appeared in January in the Journal of Wine Economics.

“I’m happy we did the study,” said Mr. Pucilowski, the chief judge of the California State Fair wine competition, “though I’m not exactly happy with the results. We have the best judges, but maybe we humans are not as good as we say we are.”

This September, Mr. Hodgson dropped his other bombshell. This time, from a private newsletter called The California Grapevine, he obtained the complete records of wine competitions, listing not only which wines won medals, but which did not. Mr. Hodgson told me that when he started playing with the data he “noticed that the probability that a wine which won a gold medal in one competition would win nothing in others was high.” The medals seemed to be spread around at random, with each wine having about a 9% chance of winning a gold medal in any given competition.
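A quick back-of-the-envelope check shows why that probability would indeed be high if medals were random. The 9% per-competition figure comes from the passage above; the choice of four other competitions is an illustrative assumption:

```python
# If each entry has an independent 9% shot at gold, a wine that took
# gold once will often win nothing elsewhere. The "4 other competitions"
# is an illustrative assumption, not a figure from the study.
p_gold = 0.09
other_competitions = 4

p_no_gold_elsewhere = (1 - p_gold) ** other_competitions
print(f"P(no gold in {other_competitions} other competitions) = {p_no_gold_elsewhere:.2f}")
# ~0.69, i.e. roughly two-thirds of gold-medal wines would come away
# empty-handed from four more entries, purely by chance.
```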

To test that idea, Mr. Hodgson restricted his attention to wines entering a certain number of competitions, say five. Then he made a bar graph of the number of wines winning 0, 1, 2, etc. gold medals in those competitions. The graph was nearly identical to the one you’d get if you simply made five flips of a coin weighted to land on heads with a probability of 9%. The distribution of medals, he wrote, “mirrors what might be expected should a gold medal be awarded by chance alone.”
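For reference, the “chance alone” benchmark he compared against is just the binomial distribution. This sketch computes what five independent entries, each with the 9% gold-medal probability quoted above, would predict; it reproduces the shape of the coin-flip comparison, not Mr. Hodgson’s actual competition data:

```python
from math import comb

# What chance alone predicts for wines entered in five competitions,
# assuming each entry is an independent 9% shot at gold.
n, p = 5, 0.09

for k in range(n + 1):
    prob = comb(n, k) * p**k * (1 - p) ** (n - k)
    print(f"{k} golds: {prob:.3f}")

# Roughly: 0 golds ~0.624, 1 gold ~0.309, 2 golds ~0.061,
# 3 or more golds ~0.006 combined.
```

If the observed bar graph of medal counts matches these proportions, the medals carry about as much information as weighted coin flips.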
