
Monday, February 15, 2010

In god we trust, all others we filter with Genieo


I had an email conversation today with Matt Marshall. Matt was curious about our "Coca-Cola formula", which is totally understandable. As an experienced person in the online market, Matt has heard promises of real personalization technology so many times that I can only respect his doubts about how Genieo really works.
I am afraid I cannot give the full recipe for our “secret sauce”, but you can download the application, test it yourself, and see if it works for you.
I will try to explain how we do our filtering in a few sentences.

First of all, if you doubt whether Genieo really does the job, you are right to ask! Filtering similar documents without any explicit user input is indeed a difficult task. We have put a lot of effort into it and are not done yet, but I believe we have come a long way on this path.

But we approach the problem from a different angle, and we have managed to avoid some of the pitfalls others have stumbled into.
We have carefully studied what other people have done in this field, their successes and failures, and learned from them.
Here are the main points:
• We do not use explicit user feedback. Instead we use implicit feedback from the user's reading patterns, which helps us estimate whether he likes an article or not.
• We will not bring the user a new article based on a single page he reads. We look for enough information to estimate where his area of interest lies (if he reads about the Yankees we will focus on the Yankees; if he focuses on Derek Jeter we will follow up on that). The more you read, the more accurate the system becomes.
• We try to distinguish between long-term and short-term areas of interest and provide each article in the right context.
• We make sure to provide our users with relevant data from sources they trust, so they are not flooded with articles.
• We do sometimes make mistakes, but we try to learn from them and improve ;)
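
A minimal illustrative sketch of the points above, not Genieo's actual implementation. It assumes that implicit feedback can be approximated from reading time and scroll depth, that a topic needs accumulated evidence from more than one page before it counts, and that an interest is long-term when the reading is spread over weeks. The names `implicit_score` and `InterestProfile` and every constant are hypothetical.

```python
from collections import defaultdict

# All constants are illustrative assumptions, not values from Genieo.
MIN_EVIDENCE = 2.0          # total score needed before a topic counts
LONG_TERM_SPAN_DAYS = 14    # reading spread over this many days => long-term
WINDOW_DAYS = 60            # older reading events are ignored


def implicit_score(seconds_on_page, scrolled_fraction):
    """Estimate interest in one page (0.0 to 1.0) from reading behaviour only."""
    time_part = min(max(seconds_on_page, 0.0) / 120.0, 1.0)  # capped at two minutes
    scroll_part = min(max(scrolled_fraction, 0.0), 1.0)
    return 0.5 * time_part + 0.5 * scroll_part


class InterestProfile:
    """Per-user record of reading events, grouped by topic."""

    def __init__(self):
        self.events = defaultdict(list)  # topic -> [(day, score), ...]

    def record(self, topic, day, seconds_on_page, scrolled_fraction):
        score = implicit_score(seconds_on_page, scrolled_fraction)
        self.events[topic].append((day, score))

    def interests(self, today):
        """Return {topic: 'long-term' | 'short-term'} for topics with enough evidence."""
        result = {}
        for topic, events in self.events.items():
            recent = [(d, s) for d, s in events if 0 <= today - d <= WINDOW_DAYS]
            if sum(s for _, s in recent) < MIN_EVIDENCE:
                continue  # one page scores at most 1.0, so it is never enough alone
            days = [d for d, _ in recent]
            span = max(days) - min(days)
            result[topic] = "long-term" if span >= LONG_TERM_SPAN_DAYS else "short-term"
        return result
```

With these assumed thresholds, a single fully read page about the Yankees produces no interest at all, while several pages read over a month would mark the topic as long-term.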

On the algorithms side: 
We use semantic analysis and a simple set of clustering algorithms enhanced with heuristics designed for the task. We weight each page according to its text, title, first paragraph and URL.
We feed this analysis into our “topic extraction engine”, and we later use those topics as a filter for new articles. We do not use a complex statistical model or a comprehensive topic ontology; instead, we maintain our ontology at the user level.
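
To make the pipeline concrete, here is a rough sketch under stated assumptions, not the actual Genieo engine. It replaces semantic analysis and clustering with plain weighted term counts, which is a much simpler technique, and it assumes that title, first paragraph and URL count for more than body text. The field weights, the stopword list, and the function names `extract_topics`, `build_user_topics` and `matches_user` are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical weights: the post names these four fields but gives no numbers.
FIELD_WEIGHTS = {"title": 3.0, "first_paragraph": 2.0, "url": 1.5, "text": 1.0}
STOPWORDS = {"the", "and", "for", "that", "with", "http", "https", "www", "com"}


def tokenize(s):
    """Lowercase words of three or more letters, minus stopwords."""
    return [w for w in re.findall(r"[a-z]+", s.lower())
            if len(w) > 2 and w not in STOPWORDS]


def extract_topics(page, top_n=3):
    """Score each term by the weight of the fields it appears in; keep the top terms."""
    scores = Counter()
    for field, weight in FIELD_WEIGHTS.items():
        for term in tokenize(page.get(field, "")):
            scores[term] += weight
    return [term for term, _ in scores.most_common(top_n)]


def build_user_topics(pages, top_n=3):
    """Per-user topic table: how many read pages each topic appeared in."""
    user_topics = Counter()
    for page in pages:
        user_topics.update(extract_topics(page, top_n))
    return user_topics


def matches_user(article, user_topics, min_support=2):
    """Pass a new article only if it shares a topic seen on at least min_support pages."""
    supported = {t for t, n in user_topics.items() if n >= min_support}
    return bool(supported & set(extract_topics(article)))
```

Each page is a plain dict with any of the four field names as keys. Keeping the topic table per user, rather than in one global ontology, mirrors the user-level approach described above.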
Again, I am afraid I cannot give the full recipe for our “secret sauce”, but you can download the application, test it yourself, and see if you like it.

Thanks to Dotan Emanuel, our VP R&D, for addressing this question.
