Monday, June 8, 2009

Item-Based And User-Based Analysis For Recommendation Engines

There are two main approaches to building recommendation systems, based on whether the system searches for related items or related users.

In item-based analysis, when a user likes a particular item, items related to that item are recommended.

As shown in this figure:

If items A and C are highly similar and a user likes item A, then item C is recommended to the user.

There are two approaches to finding similar items. The first is content-based analysis, which uses the term vector associated with each item's content. The second is collaborative filtering, which uses user actions such as rating, bookmarking, and so forth to find similar items.
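Both approaches come down to comparing item vectors; only the source of the vector differs. The following is a minimal sketch, assuming cosine similarity as the measure (the text above does not name one). The item names, terms, users, and weights are made-up illustration data, and `cosine` and `most_similar` are hypothetical helper names.

```python
import math

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v[k] for k, w in u.items() if k in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

# Content-based: each item is a term vector (term -> weight). Made-up values.
term_vectors = {
    "A": {"python": 1.0, "recommender": 2.0},
    "B": {"cooking": 3.0},
    "C": {"python": 2.0, "recommender": 1.0},
}

# Collaborative: each item is a vector of user ratings (user -> rating). Made-up values.
rating_vectors = {
    "A": {"u1": 5, "u2": 4},
    "B": {"u3": 5},
    "C": {"u1": 4, "u2": 5, "u3": 1},
}

def most_similar(item, vectors):
    """Return the other item whose vector is closest to the given item's vector."""
    scored = [
        (cosine(vectors[item], vec), other)
        for other, vec in vectors.items()
        if other != item
    ]
    return max(scored)[1]

print(most_similar("A", term_vectors))    # content-based neighbour of A
print(most_similar("A", rating_vectors))  # collaborative neighbour of A
```

With this toy data, both views pick item C as the nearest neighbour of item A, which is the situation described above: a user who likes A would be recommended C.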

In user-based analysis, users similar to the given user are determined first, as shown in the figure below:

If a user likes item A, then the same item can be recommended to other users who are similar to that user.

Similar users can be found using profile-based information about the user, for example by clustering users on attributes such as age, gender, geographic location, net worth, and so on.

Alternatively, you can find similar users with a collaborative approach, by analyzing the users’ actions.
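The collaborative variant of user-based analysis can be sketched as follows. This is only an illustration under stated assumptions: similarity between users is taken to be the Jaccard overlap of the sets of items they liked (one of several possible measures), the user names and likes are made-up data, and `jaccard` and `recommend_for` are hypothetical helper names.

```python
def jaccard(a, b):
    """Overlap between two sets of liked items, in the range 0.0 to 1.0."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

# User actions: which items each user has liked. Made-up data.
likes = {
    "alice": {"A", "B"},
    "bob": {"A", "B", "C"},
    "carol": {"D"},
}

def recommend_for(user, likes, k=1):
    """Recommend items liked by the k most similar users that `user` lacks."""
    neighbours = sorted(
        (other for other in likes if other != user),
        key=lambda other: jaccard(likes[user], likes[other]),
        reverse=True,
    )[:k]
    recommended = set()
    for other in neighbours:
        recommended |= likes[other] - likes[user]
    return recommended

print(recommend_for("alice", likes))
```

Here bob is the user most similar to alice, so the one item bob likes that alice has not yet seen (item C) is recommended to her.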

Here are some tips that may help you decide which approach is most suitable for your application:

■ If your item list doesn’t change much, it’s useful to create an item-to-item correlation table using item-based analysis. This table can then be used in the recommendation engine.

■ If your item list changes frequently, as with news-related items, it may be useful to find related users for recommendations.

■ If the recommended item is a user, there’s no option but to find related users.

■ The dimensionality of the item and user space can be helpful in deciding which approach may be easier to implement. For example, if you have millions of users and an order of magnitude fewer items, it may be easier to do item-based analysis. Whenever users are considered, you’ll deal with sparse matrices. For example, a typical user may have bought only a handful of items from the thousands or millions of items that are available in an application.

■ If there are only a small number of users, it may be worthwhile to bootstrap your application using item-based analysis. Furthermore, there’s no reason (other than perhaps time to implement and performance) why these two approaches can’t be combined.

■ It’s been shown empirically that item-based algorithms are computationally faster than user-based algorithms and provide comparable or better results.
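The item-to-item correlation table from the first tip can be precomputed offline and then used as a simple lookup at recommendation time. The sketch below assumes sparse purchase data stored as one set of items per user and uses a cosine-style co-occurrence score; the data, the scoring choice, and the names `build_item_table` and `related_items` are illustrative assumptions, not the book's implementation.

```python
import math
from collections import defaultdict
from itertools import combinations

# Sparse user data: each user has bought only a few of the available items.
purchases = {
    "u1": {"A", "C"},
    "u2": {"A", "B", "C"},
    "u3": {"B"},
    "u4": {"A", "C"},
}

def build_item_table(purchases):
    """Precompute a similarity score for every pair of items bought together."""
    item_count = defaultdict(int)
    pair_count = defaultdict(int)
    for items in purchases.values():
        for item in items:
            item_count[item] += 1
        for a, b in combinations(sorted(items), 2):
            pair_count[(a, b)] += 1
    table = defaultdict(dict)
    for (a, b), n in pair_count.items():
        # Co-occurrence count normalised by the popularity of both items.
        sim = n / math.sqrt(item_count[a] * item_count[b])
        table[a][b] = sim
        table[b][a] = sim
    return table

def related_items(item, table, n=1):
    """Look up the n items most strongly related to `item`."""
    ranked = sorted(table.get(item, {}).items(), key=lambda kv: kv[1], reverse=True)
    return [other for other, _ in ranked[:n]]

table = build_item_table(purchases)
print(related_items("A", table))
```

Because the table depends only on the items, it needs rebuilding only when the item list or the purchase history changes substantially, which is why this design suits applications whose item list is fairly stable.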

Satnam Alag, “Collective Intelligence In Action”, Manning Publications Co., first edition, 2009.
