Chapter One
Who's Doing Your Thinking for You?
Recommendations make life a lot easier. Want to know what movie to rent? The traditional way was to ask a friend or to see whether reviewers gave it a thumbs-up. Nowadays people are looking for Internet guidance drawn from the behavior of the masses. Some of these "preference engines" are simple lists of what's most popular. The New York Times lists the "most emailed articles." iTunes lists the top downloaded songs. Del.icio.us lists the most popular Internet bookmarks. These simple filters often let surfers zero in on the greatest hits.
Some recommendation software goes a step further and tries to tell you what people like you enjoyed. Amazon.com tells you that people who bought The Da Vinci Code also bought Holy Blood, Holy Grail. Netflix gives you recommendations that are contingent on the movies you yourself have rated in the past. This is truly "collaborative filtering," because your ratings of movies help Netflix make better recommendations to others, and their ratings help Netflix make better recommendations to you. The Internet is a perfect vehicle for this service because it's really cheap for an Internet retailer to keep track of customer behavior and to automatically aggregate, analyze, and display this information for subsequent customers.
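The "people who bought X also bought Y" idea can be illustrated with plain co-occurrence counting: for each item, tally how often every other item shows up in the same shopping basket, then recommend the most frequent companions. This is a minimal sketch under that assumption, not Amazon's or Netflix's actual algorithm, which the text does not describe. The function names and the purchase history are hypothetical, and "Book C" is a placeholder title.

```python
from collections import Counter, defaultdict
from itertools import permutations


def build_co_purchase_counts(baskets):
    """For every item, count how many baskets also contained each other item."""
    co_counts = defaultdict(Counter)
    for basket in baskets:
        # set() so that buying two copies of one title in a basket counts once
        for item, other in permutations(set(basket), 2):
            co_counts[item][other] += 1
    return co_counts


def also_bought(co_counts, item, k=3):
    """Up to k items most often bought together with `item`, most frequent first."""
    return [other for other, _ in co_counts[item].most_common(k)]


# Hypothetical purchase history; "Book C" is a placeholder title.
baskets = [
    ["The Da Vinci Code", "Holy Blood, Holy Grail"],
    ["The Da Vinci Code", "Holy Blood, Holy Grail", "Book C"],
    ["The Da Vinci Code", "Book C"],
    ["The Da Vinci Code", "Holy Blood, Holy Grail"],
]

co_counts = build_co_purchase_counts(baskets)
print(also_bought(co_counts, "The Da Vinci Code"))  # ['Holy Blood, Holy Grail', 'Book C']
```

Counting is cheap and needs no knowledge of what the items are. For the same reason, it cannot tell a sensible pairing from an embarrassing one.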
Of course, these algorithms aren't perfect. A bachelor buying a one-time gift for a baby could, for example, trigger the program into recommending more baby products in the future. Wal-Mart had to apologize when people who searched for Martin Luther King: I Have a Dream were told they might also appreciate a Planet of the Apes DVD collection. Amazon.com similarly offended some customers who searched for "abortion" and were asked "Did you mean adoption?" The adoption question was generated automatically simply because many past customers who searched for abortion had also searched for adoption. Still, on net, collaborative filters have been a huge boon for both consumers and retailers. At Netflix, nearly two-thirds of the rented films are recommended by the site. And recommended films are rated half a star higher (on Netflix's five-star rating system) than films that people rent outside the recommendation system.
While lists of most-emailed articles and best-sellers tend to concentrate usage, the great thing about the more personally tailored recommendations is that they diversify usage. Netflix can recommend different movies to different people. As a result, more than 90 percent of the titles in its 50,000-movie catalog are rented at least monthly. Collaborative filters let sellers access what Chris Anderson calls the "long tail" of the preference distribution. The Netflix recommendations let its customers put themselves in rarefied market niches that used to be hard to find. The same thing is happening with music. At Pandora.com, users can type in a song or an artist that they like and almost instantaneously the website starts streaming song after song in the same genre.
Do you like Cyndi Lauper and Smash Mouth? Voilà, Pandora creates a Lauper/Smash Mouth radio station just for you that plays these artists plus others that sound like them. As each song is playing, you have the option of teaching the software more about what you like by clicking "I really like this song" or "Don't play this type of song again." It's amazing how well this site works for both me and my kids. It not only plays music that each of us enjoys, but it also finds music that we like by groups we've never heard of. For example, because I told Pandora that I like Bruce Springsteen, it created a radio station that started playing the Boss and other well-known artists, but after a few songs it had me grooving to "Now" by Keaton Simons (and because of on-hand quick links, it's easy to buy the song or album on iTunes or Amazon). This is the long tail in action because there's no way a nerd like me would have come across this guy on my own. A similar preference system lets Rhapsody.com play more than 90 percent of its catalog of a million songs every month.
MSNBC.com has recently added its own "recommended stories" feature. It uses a cookie to keep track of the sixteen articles you've most recently read and uses automated text analysis to predict what new stories you'll want to read. It's surprising how accurate a sixteen-story history can be in kick-starting your morning reading. It's also a bit embarrassing: in my case American Idol articles are automatically recommended. Still, Chicago law professor Cass Sunstein worries that there's a social cost to exploiting the long tail. The more successful these personalized filters are, the more we as a citizenry are deprived of a common experience. Nicholas Negroponte, MIT professor and guru of media technology, sees in these "personalized news" features the emergence of the "Daily Me"--news publications that expose citizens only to information that fits with their narrowly preconceived preferences.
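MSNBC.com's feature, as described above, has two parts: a fixed-size memory of the last sixteen articles read, and "automated text analysis" that scores new stories against that memory. The text does not say which analysis is used. This sketch assumes a simple bag-of-words model scored by cosine similarity, one common choice among many. The class and function names are hypothetical.

```python
import math
import re
from collections import Counter, deque

HISTORY_SIZE = 16  # the text says the cookie tracks the sixteen most recent articles


def word_counts(text):
    """Bag-of-words representation: lowercase word -> number of occurrences."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(a, b):
    """Cosine similarity of two word-count vectors (0.0 = no words in common)."""
    dot = sum(count * b[word] for word, count in a.items() if word in b)
    norm = math.sqrt(sum(c * c for c in a.values())) * math.sqrt(sum(c * c for c in b.values()))
    return dot / norm if norm else 0.0


class RecentReadsRecommender:
    """Hypothetical recommender that remembers only the last HISTORY_SIZE articles read."""

    def __init__(self):
        # A deque with maxlen silently discards the oldest entry once it is full.
        self.history = deque(maxlen=HISTORY_SIZE)

    def record_read(self, article_text):
        self.history.append(word_counts(article_text))

    def rank(self, candidates):
        """Sort candidate stories by similarity to the combined reading history."""
        profile = Counter()
        for counts in self.history:
            profile.update(counts)
        return sorted(candidates, key=lambda text: cosine(profile, word_counts(text)), reverse=True)


rec = RecentReadsRecommender()
rec.record_read("American Idol judges pick the final three singers")
ranked = rec.rank(["Senate passes budget bill", "American Idol singer voted off"])
print(ranked[0])  # American Idol singer voted off
```

The short, fixed-length history keeps the recommendations tied to recent reading. The same short history also explains how a single guilty-pleasure story can keep resurfacing.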
Of course, self-filtering of the news has been with us for a long time. Vice President Cheney only watches Fox News. Ralph Nader reads Mother Jones. The difference is that now technology is creating listener censorship that is diabolically more powerful. Websites like Excite.com and Zatso.net started to allow users to produce "the newspaper of me" and "a personalized newscast." The goal is to create a place "where you decide what's the news."
Google News allows you to personalize your newsgroups. Email alerts and RSS feeds now allow you to select "This Is the News I Want." If we want, we can now be relieved of the hassle of even glancing at those pesky news articles about social issues that we'd rather ignore. All of these collaborative filters are examples of what James Surowiecki called "The Wisdom of Crowds." In some contexts, collective predictions are more accurate than the best estimate that any member of the group could achieve. For example, imagine that you offer a $100 prize to a college class for the student with the best estimate of the number of pennies in a jar. The wisdom of the group can be found simply by calculating their average estimate. It's been shown repeatedly that this average estimate is very likely to be closer to the truth than any of the individual estimates.
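The penny-jar example can be checked with a little arithmetic. For squared error there is an exact identity: the error of the class average equals the average student's error minus the spread (variance) of the guesses. The average therefore never does worse than the typical guesser, and it does better the more the guesses disagree. The jar size and the guesses below are hypothetical.

```python
import math


def squared_error(estimate, truth):
    return (estimate - truth) ** 2


def crowd_report(estimates, truth):
    """Compare the error of the average guess with the errors of the individual guesses."""
    n = len(estimates)
    crowd = sum(estimates) / n  # the "wisdom of the group": a simple average
    crowd_err = squared_error(crowd, truth)
    avg_individual_err = sum(squared_error(e, truth) for e in estimates) / n
    # Diversity: how spread out the guesses are around their own average.
    diversity = sum((e - crowd) ** 2 for e in estimates) / n
    return crowd, crowd_err, avg_individual_err, diversity


TRUE_PENNIES = 500  # hypothetical jar
estimates = [300, 420, 480, 560, 600, 700]  # hypothetical student guesses

crowd, crowd_err, avg_individual_err, diversity = crowd_report(estimates, TRUE_PENNIES)
print(crowd, crowd_err)  # 510.0 100.0
# The identity holds for any list of guesses: crowd error = average individual error - diversity.
print(math.isclose(crowd_err, avg_individual_err - diversity))  # True
```

Here the average (510) misses by 10 pennies, while the closest student (480) misses by 20. Beating every individual in this way is common. Unlike beating the average guesser, however, it is not guaranteed by the identity.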
Some people guess too high, and others too low--but collectively the high and low estimates tend to cancel out. Groups can often make better predictions than individuals. On the TV show Who Wants to Be a Millionaire, "asking the audience" produces the right answer more than 90 percent of the time (while phoning an individual friend produces the right answer less than two-thirds of the time). Collaborative filtering is a kind of tailored audience polling. People who are like you can make pretty accurate guesses about what types of music or movies you'll like. Preference databases are powerful ways to improve personal decision making.

eHarmony Sings a New Tune

There is a new wave of prediction that utilizes the wisdom of crowds in a way that goes beyond conscious preferences. The rise of eHarmony is the discovery of a new wisdom of crowds through Super Crunching.
Unlike traditional dating services that solicit and match people based on their conscious and articulated preferences, eHarmony tries to find out what kind of person you are and then matches you with others who the data say are most compatible. eHarmony looks at a large database of information to see what types of personalities actually are happy together as couples. Neil Clark Warren, eHarmony's founder and driving force, studied more than 5,000 married people in the late 1990s. Warren patented a predictive statistical model of compatibility based on twenty-nine different variables related to a person's emotional temperament, social style, cognitive mode, and relationship skills. eHarmony's approach relies on the mother of Super Crunching techniques--the regression. A regression is a statistical procedure that takes raw historical data and estimates how various causal factors influence a single variable of interest. In eHarmony's case the variable of interest is how compatible a couple is likely to be. And the causal factors are twenty-nine emotional, social, and cognitive attributes of each person in the couple.
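A regression, as defined above, turns historical data into one weight per factor, and those weights combine into a single prediction. This is a minimal sketch of the most common version, ordinary least squares, which chooses the weights that minimize the sum of squared prediction errors. It assumes two made-up factors instead of eHarmony's twenty-nine, whose actual model the text does not spell out. The toy data are generated from a known formula so you can check that the fit recovers it.

```python
def solve(A, b):
    """Solve the linear system A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]  # augmented matrix [A | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back-substitution
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def fit_ols(X, y):
    """Ordinary least squares: coefficients (intercept first) minimizing the sum of squared errors."""
    rows = [[1.0] + list(x) for x in X]  # prepend a constant column for the intercept
    k = len(rows[0])
    # Normal equations: (X^T X) beta = X^T y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


def predict(coefs, x):
    """Predicted score: intercept plus each factor times its estimated weight."""
    return coefs[0] + sum(c * xi for c, xi in zip(coefs[1:], x))


# Hypothetical data: two made-up couple attributes per row, and a compatibility score
# generated from a known formula (2 + 1.5*factor_1 - 0.8*factor_2) so the fit can be checked.
X = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 5), (6, 2)]
y = [2.0 + 1.5 * f1 - 0.8 * f2 for f1, f2 in X]

coefs = fit_ols(X, y)
print([round(c, 6) for c in coefs])  # [2.0, 1.5, -0.8]
print(round(predict(coefs, (3, 2)), 6))  # 4.9
```

With real, noisy data the recovered weights would only approximate the underlying relationship. The mechanics are the same whether there are two factors or twenty-nine.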
The regression technique was developed more than 100 years ago by Francis Galton, a cousin of Charles Darwin. Galton estimated the first regression line way back in 1877. Remember Orley Ashenfelter's simple equation to predict the quality of wine? That equation came from a regression. Galton's very first regression was also agricultural. He estimated a formula to predict the size of sweet pea seeds based on the size of their parent seeds. Galton found that the offspring of large seeds tended to be larger than the offspring of average or small seeds, but they weren't quite as large as their large parents. Galton calculated a different regression equation and found a similar tendency for the heights of sons and fathers. The sons of tall fathers were taller than average but not quite as tall as their fathers.
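Galton's sons-and-fathers finding can be written as a regression line. This sketch uses his estimate that each inch a father stands above average adds about two-thirds of an inch to his son's predicted height. The symbols are illustrative, and for simplicity the sketch assumes that fathers and sons share the same average height $\bar{h}$.

$$\hat{h}_{\text{son}} = \bar{h} + \tfrac{2}{3}\left(h_{\text{father}} - \bar{h}\right)$$

Take a father 3 inches taller than average, so $h_{\text{father}} - \bar{h} = 3$. Then $\hat{h}_{\text{son}} - \bar{h} = \tfrac{2}{3} \times 3 = 2$, and the son is predicted to be 2 inches taller than average, one inch closer to the average than his father. Because the slope is below one, predictions for unusually tall or short fathers are always pulled partway back toward the mean. That pull is what gave "regression" its name.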
In terms of the regression equation, this means that the formula predicting a son's height will multiply the father's height by some factor less than one. In fact, Galton estimated that every additional inch that a father was above average only contributed two-thirds of an inch to the son's predicted height. He found the pattern again when he calculated the regression equation estimating the relationship between the IQ of parents and children. The children of smart parents were smarter than the average person but not as smart as their folks. Th.