Conclusion
- We need to judge the quality of recommendations as users see them: as recommendation lists
    - We need to create a variety of metrics that act on recommendation lists (e.g., intra-list similarity, as introduced in [[Improving recommendation lists through topic diversification]] and sketched below)
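    - A minimal sketch of an averaged intra-list similarity metric, assuming cosine similarity over per-item feature vectors; the pairwise similarity function and the helper name `intra_list_similarity` are choices of this sketch, not prescribed by the paper. Higher values mean a less diverse list:

```python
import itertools
import numpy as np

def intra_list_similarity(item_vectors):
    """Average pairwise similarity of a recommendation list.

    `item_vectors` holds one feature vector (e.g., genre or topic
    weights) per recommended item. Higher values indicate a more
    homogeneous, less diverse list.
    """
    def cosine(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

    pairs = list(itertools.combinations(item_vectors, 2))
    return sum(cosine(a, b) for a, b in pairs) / len(pairs)

# Example: two near-identical items and one different item.
items = [np.array([1, 0, 1, 0]), np.array([1, 0, 1, 0]), np.array([0, 1, 0, 1])]
print(intra_list_similarity(items))  # 0.333...; the duplicate pair pushes the score up
```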
- We need to understand the differences between recommender algorithms and measure them in ways beyond their ratability.
- Users return to recommenders over time, growing from new users into experienced ones. If we understand their purpose and intent, we can generate better recommendations.
- Propose new user-centric directions for evaluating recommender systems
- Accuracy metrics reward a travel recommender for recommending places a user has already visited instead of rewarding it for finding new places for the user to visit
- The paper reviews three aspects:
    - Similarity
        - Once a user rated one Star Trek movie, she would only receive recommendations for more Star Trek movies
        - This problem is more noticeable for "cold-start" users.
        - This problem could convince a user to abandon the recommender.
        - Accuracy metrics cannot detect this problem.
        - One approach to solving it is [[Improving recommendation lists through topic diversification]]: measure list diversity with the intra-list similarity metric, then raise diversity with a topic diversification method, which improved user satisfaction (see the re-ranking sketch below).
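        - A simplified greedy sketch of diversification re-ranking, assuming relevance scores already normalized to [0, 1] and an item-to-item similarity function; the linear blend below is a hypothetical stand-in for the paper's rank-merge scheme, not its exact procedure:

```python
def diversify(candidates, relevance, similarity, k=10, theta=0.5):
    """Greedily re-rank `candidates` (best-first by predicted relevance).

    `relevance`:  dict item -> normalized score in [0, 1].
    `similarity`: function (item_a, item_b) -> similarity in [0, 1].
    `theta`:      diversification factor; 0 keeps the accuracy ranking,
                  1 ranks purely by dissimilarity to already-picked items.
    """
    selected = [candidates[0]]           # most relevant item seeds the list
    remaining = list(candidates[1:])
    while remaining and len(selected) < k:
        def blended(item):
            # Average dissimilarity to everything already in the list.
            dissim = sum(1 - similarity(item, s) for s in selected) / len(selected)
            return (1 - theta) * relevance[item] + theta * dissim
        best = max(remaining, key=blended)
        selected.append(best)
        remaining.remove(best)
    return selected

# Hypothetical example: a crude genre-overlap similarity.
rel = {"st2": 0.9, "st3": 0.85, "alien": 0.8}
sim = lambda a, b: 1.0 if a.startswith("st") and b.startswith("st") else 0.2
print(diversify(["st2", "st3", "alien"], rel, sim, k=2))
# ['st2', 'alien'] -- the dissimilar item outranks the second Star Trek film
```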
 
    - Serendipity
        - Serendipity: unexpected and useful, with usefulness judged by the user ([[Recommender System - Serendipity]]; supported by [[Beyond accuracy - evaluating recommender systems by coverage and serendipity]])
        - A serendipity metric may be difficult to create without feedback from users
 
        - Recommending the highest-ratability items has good accuracy but is not useful for users
            - e.g., recommending items the user already owns or has consumed; those recommendations were rarely acted on by users
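        - A sketch of one way to operationalize this, following the unexpected-and-useful recipe from [[Beyond accuracy - evaluating recommender systems by coverage and serendipity]]; the popularity baseline and the per-user usefulness judgments here are assumptions, and the judgments are exactly the hard-to-obtain user feedback noted above:

```python
def serendipity(recommended, obvious_baseline, useful_judgments):
    """Fraction of recommendations that are both unexpected and useful.

    Items count as *unexpected* if an obvious baseline (e.g., a popularity
    recommender) would not have produced them, and as *useful* only if the
    user judged them so (`useful_judgments` stands in for that feedback).
    """
    unexpected = set(recommended) - set(obvious_baseline)
    hits = {item for item in unexpected if useful_judgments.get(item, False)}
    return len(hits) / len(recommended)

# Hypothetical example
recs = ["star_trek_6", "arrival", "primer"]
baseline = ["star_trek_6"]                     # what a popularity model suggests
feedback = {"arrival": True, "primer": False}  # per-user usefulness judgments
print(serendipity(recs, baseline, feedback))   # 1/3
```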
    - User needs and expectations
        - New users have different needs from experienced users
            - Highly ratable items help establish trust
        - The choice of algorithm used for new users dramatically affects the user experience
            - [[Getting to know you - learning new user preference in recommender systems]] measured the effectiveness of the signup process by user effort and accuracy, and suggests that popularity and item-item personalized algorithms can perform well for new users (sketch below)
        - Differences in language and cultural background influenced user satisfaction
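        - A hypothetical sketch of the algorithm-choice point: fall back to popular (highly ratable) items for brand-new users, and switch to item-item scoring once enough ratings exist. The threshold and scoring formula are illustrative assumptions, not the procedure from [[Getting to know you - learning new user preference in recommender systems]]:

```python
from collections import defaultdict

def recommend_for_new_user(user_ratings, item_sims, popularity, k=5):
    """Pick a recommendation strategy based on how much the user has rated.

    `user_ratings`: dict item -> rating for this user.
    `item_sims`:    dict item -> {other_item: similarity} (item-item model).
    `popularity`:   dict item -> global popularity score.
    """
    MIN_RATINGS = 3  # hypothetical switch-over threshold
    if len(user_ratings) < MIN_RATINGS:
        # Too little signal: popular, highly ratable items help establish trust.
        ranked = sorted(popularity, key=popularity.get, reverse=True)
    else:
        # Item-item scoring: weight each neighbor by the user's own ratings.
        scores = defaultdict(float)
        for rated_item, rating in user_ratings.items():
            for other, sim in item_sims.get(rated_item, {}).items():
                if other not in user_ratings:  # never re-recommend rated items
                    scores[other] += sim * rating
        ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k]
```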
Reference
S. M. McNee, J. Riedl, and J. A. Konstan, “Being accurate is not enough: how accuracy metrics have hurt recommender systems,” in CHI ’06 Extended Abstracts on Human Factors in Computing Systems, Montréal Québec Canada, Apr. 2006, pp. 1097–1101. doi: 10.1145/1125451.1125659.
#recommender-system #recommender-system/quality #recommender-system/accuracy #recommender-system/serendipity