More data usually beats better algorithms

Friday, April 4th, 2008

Anand Rajaraman teaches a data mining class at Stanford, and he has found that more data usually beats better algorithms:

Team A came up with a very sophisticated algorithm using the Netflix data. Team B used a very simple algorithm, but they added in additional data beyond the Netflix set: information about movie genres from the Internet Movie Database (IMDB). Guess which team did better?

Team B got much better results, close to the best results on the Netflix leaderboard!
[...]
Another fine illustration of this principle comes from Google. Most people think Google’s success is due to their brilliant algorithms, especially PageRank. In reality, the two big innovations that Larry and Sergey introduced, that really took search to the next level in 1998, were:

The recognition that hyperlinks were an important measure of popularity — a link to a webpage counts as a vote for it.

The use of anchortext (the text of hyperlinks) in the web index, giving it a weight close to the page title.

First generation search engines had used only the text of the web pages themselves. The addition of these two additional data sets — hyperlinks and anchortext — took Google’s search to the next level. The PageRank algorithm itself is a minor detail — any halfway decent algorithm that exploited this additional data would have produced roughly comparable results.
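To make the point about data concrete, here is a minimal sketch of how even a crude method can exploit hyperlinks and anchortext. This is not PageRank and not Google's actual method. The link list, the page names, and the functions `inlink_votes`, `anchor_index`, and `search` are all hypothetical, invented for illustration.

```python
from collections import Counter

# Hypothetical crawl data: (source page, target page, anchor text).
links = [
    ("a.example", "c.example", "cheap flights"),
    ("b.example", "c.example", "flight deals"),
    ("a.example", "e.example", "flights"),
    ("c.example", "d.example", "weather"),
]

def inlink_votes(links):
    """Count each inbound link as one vote for the target page."""
    return Counter(target for _, target, _ in links)

def anchor_index(links):
    """Index each target page under the words other pages use to link to it."""
    index = {}
    for _, target, anchor in links:
        for word in anchor.lower().split():
            index.setdefault(word, set()).add(target)
    return index

def search(word, links):
    """Return pages whose inbound anchor text contains the word, most-linked first."""
    votes = inlink_votes(links)
    hits = anchor_index(links).get(word.lower(), set())
    # Sort by descending vote count, then by name for a stable order.
    return sorted(hits, key=lambda page: (-votes[page], page))

votes = inlink_votes(links)
results = search("flights", links)
print(votes["c.example"])
print(results)
```

Nothing here looks at the text of the pages themselves. Both the matching and the ordering come entirely from the two added data sets.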

The same principle also holds true for another area of great success for Google: the AdWords keyword auction model. Overture had previously proved that the model of having advertisers bid for keywords could work. Overture ranked advertisers for a given keyword based purely on their bids. Google added some additional data: the clickthrough rate (CTR) on each advertiser’s ad. Thus, to a first approximation, Google ranks advertisers by the product of their bid and their CTR (this was true in the first version of AdWords; they now use more considerations). This simple change made Google’s ad marketplace much more efficient than Overture’s. Notice that the algorithm itself is quite simple; it is the addition of the new data that made the difference.
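The two ranking rules described above can be sketched in a few lines. The advertisers, bids, and clickthrough rates below are made-up numbers, and the `Ad` class and function names are hypothetical. This illustrates only the first-approximation rule (bid times CTR), not how AdWords works today.

```python
from dataclasses import dataclass

@dataclass
class Ad:
    advertiser: str
    bid: float  # bid per click, in dollars (made-up value)
    ctr: float  # estimated clickthrough rate, between 0 and 1 (made-up value)

def rank_by_bid(ads):
    """Overture-style ranking as described above: highest bid first."""
    return sorted(ads, key=lambda ad: ad.bid, reverse=True)

def rank_by_bid_times_ctr(ads):
    """First-approximation AdWords ranking: highest bid * CTR first."""
    return sorted(ads, key=lambda ad: ad.bid * ad.ctr, reverse=True)

ads = [
    Ad("A", bid=2.00, ctr=0.01),  # expected revenue per impression: 0.02
    Ad("B", bid=1.00, ctr=0.05),  # expected revenue per impression: 0.05
    Ad("C", bid=1.50, ctr=0.02),  # expected revenue per impression: 0.03
]

bid_order = [ad.advertiser for ad in rank_by_bid(ads)]
value_order = [ad.advertiser for ad in rank_by_bid_times_ctr(ads)]
print(bid_order)    # ['A', 'C', 'B']
print(value_order)  # ['B', 'C', 'A']
```

The sorting code is identical in both functions. Only the key changes, and the one extra column of data (CTR) reverses the order, putting the ad that users actually click on at the top.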

Posted in Technology


Isegoria
