Does Building a "Smart Google" Mean We Have to Make the Same Dumb Mistakes Over Again?

Techdirt took the post title I wanted to use ("If You Liked This Post, Perhaps You'd Like To Look At The History Of Failed Recommendation Systems," clever bastards), so would Slide's algorithm try to connect me to Mike Masnick?

Fortune takes a look at some of the newer businesses out there building themselves around recommendation engines (such as Pandora, the personalized internet radio station). The company with the broadest ambitions in this space is Slide (they've recently pulled down some major financing), founded by ex-PayPal bigshot Max Levchin.

Suppose, for example, there's a user named YankeeDave who sees a Treo 750 scroll by in his Slide Show. He gives it a thumbs-up and forwards it to his buddy (we'll call him Smooth-P). Slide learns from this that both YankeeDave and Smooth-P have an interest in a smartphone and begins delivering competing prices. If YankeeDave buys the item, Slide displays headlines on Treo tips or photos of a leather case. If Smooth-P gives a thumbs-down, Slide gains another valuable piece of data. (Maybe Smooth-P is a BlackBerry guy.) Slide has also established a relationship between YankeeDave and Smooth-P and can begin comparing their ratings, traffic patterns, clicks and networks.

Based on all that information, Slide gains an understanding of people who share a taste for Treos, TAG Heuer watches and BMWs. Next, those users might see a Dyson vacuum, a pair of Forzieri wingtips or a single woman with a six-figure income living within a ten-mile radius. In fact, that's where Levchin thinks the first real opportunity lies – hooking up users with like-minded people.

Fascinating how we're constantly repeating ourselves when it comes to this specific field. Let me tell you quickly about Opencola.

Opencola was a startup Joey and I worked at (actually, it was thanks to Joey that I got the job with OC), built entirely around the idea that collaborative filtering could help you sort through all the crap on the internet. Basically, after watching users and figuring out what they did (and didn't) like of the content they saw on the internet, Opencola's software would find other like-minded users and start filtering the internet for you based on their preferences. Our pithiest pitch for our vision was that we wanted to "relevance-switch" the internet (I credit our co-founder Cory Doctorow for that little bit of genius wordsmithing). Just to add to the excitement, it was open source and built on a peer-to-peer networking architecture. You can imagine what it must have been like to take that pitch on the road during 2000 (and pre-crash 2001).

Step 1: Vision.

Step 2: PowerPoint.

Step 3: Sit back and open up the checks.

So, Joey and I (and every other member of the Opencola diaspora) know recommendations. Here are a few things (I think) I learned:

  1. The broader your topic space, the more difficult it is to determine relevance. It's a hard bias to shake, but we all instinctively believe that our similarity with someone in one domain indicates similarity in another. For example, a shared love of experimental jazz doesn't mean that someone else with an extensive Ornette Coleman collection shares your political views. Or your taste in films. Or your style in clothing. It's the basic problem of trying to build a recommendation engine for something as broad as the internet, or even something like eBay or Amazon.com. The only way around it is to have multiple, segmented, and deep individual profiles filled with preferences.
  2. Deep individual profiles are computationally difficult to compare. Assuming you can narrow down the topic space to something where similarity does correlate to relevance, you still have to handle the problem of comparing very deep profiles. One of the reasons the Amazon.com recommendations seem so basic is that they go for breadth, rather than depth. In other words, your purchasing history isn't carefully being compared with the purchasing history of other individuals as much as Amazon.com's simply seeing what most other people bought when they also bought the item you're currently looking at. The profile is one data point deep, but hopes the wisdom of the crowds steers customers in the right direction.
  3. Relevance and preference signals are difficult to collect. Ultimately the whole recommendation enterprise rests on the collection of signals from users. They have to indicate what they like, what they buy, etc. In some cases, the signals are explicit and obvious (if I buy something, from a stock to a CD, it's a pretty good bet that I'm sending a positive signal), but in most cases, there aren't many explicit signals to go on. So what can recommendation engines do to overcome this? Making users signal their preferences explicitly is one way to go: rating their music from zero to five stars, digging news stories, etc., are all examples of how some systems cope with the issue. Unfortunately, making people do stuff they probably wouldn't otherwise do is often a dead end. It's effort, and people hate effort. That Pandora works for its current base of users probably has a lot to do with the fact that it attracts the kind of user who's predisposed to contribute to the overall effort by rating the songs they hear. To get beyond the most basic kind of recommendation algorithms, you need to pick up the subtle judgments of the individuals in the system, and that's hard to do without either being invasive (to the point of violating privacy) or forcing people to make their implicit judgments artificially explicit.
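To make point 2 concrete, here's a rough sketch of the "people who bought this also bought" approach. It's a toy under my own assumptions, not Amazon.com's actual system: the baskets are made up, and the function names (build_co_purchase_counts, also_bought) are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import permutations


def build_co_purchase_counts(baskets):
    """Count how often each pair of items shows up in the same basket."""
    counts = defaultdict(Counter)
    for basket in baskets:
        # set() so a duplicate item in one basket is only counted once
        for item, other in permutations(set(basket), 2):
            counts[item][other] += 1
    return counts


def also_bought(counts, item, n=3):
    """Return up to n items most often bought alongside `item`."""
    return [other for other, _ in counts.get(item, Counter()).most_common(n)]


# Made-up purchase histories, purely for illustration.
baskets = [
    ["treo", "leather_case"],
    ["treo", "leather_case", "headset"],
    ["treo", "headset"],
    ["blackberry", "headset"],
    ["treo", "leather_case"],
]

counts = build_co_purchase_counts(baskets)
print(also_bought(counts, "treo"))
```

Notice that nobody's individual profile is ever compared with anybody else's. The "profile" is just the one item you're looking at, which is why this is cheap to compute and also why the results feel so basic.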

Before Opencola went through the first of many VC-imposed mini-implosions (resulting in the loss of our founders and many of the original mad scientist staff, and ultimately in the ignominious garage sale of the resulting technology to Open Text), we thought we had figured out one way to make our vision real. First, we reduced the topic space: rather than relevance-switch the whole damn internet, why not simply help gamers find new games that would appeal to them? That way the topic space is bounded to games and what gamers think about them, and we weren't going to try to tell you what toothpaste to buy based on your gaming preferences.

By building on a peer-to-peer network architecture, we also went some way towards solving the problem of comparing deep profiles. We elected to distribute the work to the peers themselves by having each peer constantly evaluate their similarity (based on that deep individual profile) to others, rather than trying to perform all this magic in some ginormous data center.
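Here's a minimal sketch of what "each peer evaluates its own similarity to others" could look like. To be clear, this is my illustration and not Opencola's code: I'm assuming a profile is a simple game-to-score mapping and using cosine similarity as a stand-in measure, and the Peer class and its method names are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two sparse profiles (dicts of game -> score)."""
    shared = set(a) & set(b)
    dot = sum(a[k] * b[k] for k in shared)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class Peer:
    def __init__(self, name, profile):
        self.name = name
        self.profile = profile  # game -> preference score (assumed format)

    def nearest_peers(self, others, n=2):
        """Rank the peers this node knows about; the work is done locally."""
        scored = [(cosine_similarity(self.profile, o.profile), o.name) for o in others]
        scored.sort(reverse=True)
        # Drop peers with nothing in common.
        return [name for score, name in scored[:n] if score > 0]


# Made-up peers and scores.
me = Peer("me", {"quake": 5, "civ": 1})
others = [
    Peer("a", {"quake": 4, "civ": 1}),
    Peer("b", {"sims": 5}),
    Peer("c", {"civ": 5, "quake": 1}),
]
print(me.nearest_peers(others))
```

Each peer only ever compares its own profile against the handful of peers it can see, so no single machine has to hold or compare everyone's deep profile at once.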

Finally, game software is one of those rare things where you can actually collect a lot of subtle judgments through explicit signals. Names get searched on. Demos get downloaded and installed. Demo software gets played. Perhaps the user plays it once, but maybe they play the demo several times. They trash it, or they buy the full version. Do they play it longer than the average player for that game? Do they download mods? Make their own mods? In other words, you can collect tons of meaningful data without asking people to step out of their gaming routine.
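As a rough sketch, those behaviors could be rolled up into a single preference score per game. Everything specific here is an assumption on my part: the signal names, the weights, and the cap on repeated events are invented for illustration and would need tuning against real data.

```python
from collections import Counter

# Hypothetical weights: bigger commitments count for more, trashing counts against.
SIGNAL_WEIGHTS = {
    "searched": 0.5,
    "downloaded_demo": 1.0,
    "played_demo": 1.0,
    "downloaded_mod": 2.0,
    "made_mod": 4.0,
    "bought_full": 5.0,
    "trashed": -3.0,
}

# Cap repeats so one obsessive weekend doesn't swamp everything else.
MAX_COUNTED = 3


def preference_score(events):
    """Turn a list of observed events for one game into a single score."""
    counts = Counter(events)
    return sum(
        SIGNAL_WEIGHTS.get(signal, 0.0) * min(count, MAX_COUNTED)
        for signal, count in counts.items()
    )


# Made-up event logs for two games.
loved_it = ["searched", "downloaded_demo"] + ["played_demo"] * 4 + ["bought_full"]
hated_it = ["downloaded_demo", "played_demo", "trashed"]

print(preference_score(loved_it), preference_score(hated_it))
```

None of these events requires the player to stop and rate anything. The score falls out of what they were going to do anyway.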

It was a nifty idea, and to this day I wonder why nobody has put all of these pieces together yet (but I assume it'll happen someday).

The hoops Opencola had to jump through all point to the difficulties of building a recommendation-based business, and automating the subtle process of tastemaking. The ultimate prize of really relevant suggestions, however, means we'll see many more millions thrown at this problem for years to come.
