Being accurate is not enough - how accuracy metrics have hurt recommender systems

2022/01/22


Conclusion

  • We need to judge the quality of recommendations as users see them: as recommendation lists
    • We need to create a variety of metrics that act on recommendation lists (e.g., intra-list similarity, as introduced in Improving recommendation lists through topic diversification)
      • Accuracy is not enough
      • Introduces the intra-list similarity metric to measure diversity, which improves user satisfaction
      • Proposes a topic diversification method to increase diversity
  • We need to understand the differences between recommender algorithms and measure them in ways beyond their ratability.
  • Users return to recommenders over a period of time, growing from new users into experienced users. If we understand their purpose and intent, we can generate better recommendations
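
A minimal sketch of a list-level metric, to make the intra-list similarity idea concrete. Assumptions that are not from the paper: items are described by hypothetical genre vectors, similarity is cosine similarity, and the score is averaged over unordered pairs (the original definition may normalize differently). Higher values mean a less diverse list.

```python
from itertools import combinations
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def intra_list_similarity(rec_list, features, sim=cosine):
    """Average pairwise similarity of the items in one recommendation list.

    Averaging over unordered pairs is an assumption made for this sketch.
    """
    pairs = list(combinations(rec_list, 2))
    if not pairs:
        return 0.0  # a list with fewer than two items has no pairs to compare
    return sum(sim(features[a], features[b]) for a, b in pairs) / len(pairs)


# Hypothetical genre vectors: [sci-fi, drama, comedy]
features = {
    "star_trek_1": [1, 0, 0],
    "star_trek_2": [1, 0, 0],
    "star_trek_3": [1, 0, 0],
    "drama_a": [0, 1, 0],
    "comedy_a": [0, 0, 1],
}

narrow = ["star_trek_1", "star_trek_2", "star_trek_3"]
mixed = ["star_trek_1", "drama_a", "comedy_a"]

narrow_ils = intra_list_similarity(narrow, features)  # 1.0
mixed_ils = intra_list_similarity(mixed, features)    # 0.0
```

An accuracy metric scores each item on its own, so it cannot tell these two lists apart on diversity; a list-level score like this one can.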

  • Propose new user-centric directions for evaluating recommender systems
  • Problem with accuracy-based evaluation: we reward a travel recommender for recommending places a user has already visited instead of rewarding it for finding new places for the user to visit
  • The paper reviews three aspects
    • Similarity
      • Example: once a user rated one Star Trek movie, she would only receive recommendations for more Star Trek movies
      • This problem is more noticeable for "cold-start" users.
      • This problem could convince a user to leave the recommender
      • Accuracy metrics cannot see this problem.
      • One approach to solving it is Improving recommendation lists through topic diversification, which introduces the intra-list similarity metric and a topic diversification method to increase diversity
    • Serendipity (see Recommender System - Serendipity)
      • Serendipity: unexpected and useful (usefulness judged by the user)
      • Supports: [[Beyond accuracy - evaluating recommender systems by coverage and serendipity]]
      • Recommending the items with the highest ratability scores well on accuracy but is often not useful to users
        • Example: recommending items the user already owns or has consumed. Users rarely acted on those recommendations
      • A serendipity metric may be difficult to create without feedback from users
    • User needs and expectations
      • New users have different needs from experienced users
        • Highly ratable items help new users establish trust in the recommender
        • The choice of the algorithm used for new users dramatically affects the user experience
          • Getting to know you - learning new user preference in recommender systems
            • Experimental design
              • Goal: measure the effectiveness of the signup process
              • Metrics:
                • User effort - how hard was it to sign up
                • Accuracy - How well can ...
            • Suggests that popularity-based and item-item personalized strategies can perform well for new users
        • Differences in language and cultural background influenced user satisfaction
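
A rough sketch of how a serendipity score could be computed, following the "unexpected and useful" definition noted above. Everything here is an assumption for illustration: "expected" items are taken to be whatever a hypothetical popularity baseline would recommend plus what the user has already visited, and "useful" items come from explicit user feedback. The item names reuse the travel example and are made up.

```python
def serendipity(recommended, expected, useful):
    """Fraction of recommended items that are both unexpected and useful.

    recommended: ordered list of recommended item ids
    expected:    set of items the user would find obvious (assumption:
                 popularity-baseline output plus already-consumed items)
    useful:      set of items the user judged useful (needs user feedback)
    """
    if not recommended:
        return 0.0
    unexpected = [item for item in recommended if item not in expected]
    hits = [item for item in unexpected if item in useful]
    return len(hits) / len(recommended)


# Hypothetical data for one user of a travel recommender
popular_baseline = {"paris", "london", "rome"}  # output of a primitive recommender
already_visited = {"paris"}
expected = popular_baseline | already_visited

recommended = ["paris", "london", "ljubljana", "porto"]
useful = {"london", "porto"}  # collected from user feedback

score = serendipity(recommended, expected, useful)  # 0.25
```

Only "porto" counts here: "london" is useful but expected, and "ljubljana" is unexpected but not judged useful. The `useful` set is the part that cannot be derived from rating data alone, which is why such a metric is hard to build without asking users.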

Reference

S. M. McNee, J. Riedl, and J. A. Konstan, “Being accurate is not enough: how accuracy metrics have hurt recommender systems,” in CHI ’06 Extended Abstracts on Human Factors in Computing Systems, Montréal, Québec, Canada, Apr. 2006, pp. 1097–1101. doi: 10.1145/1125451.1125659.

#recommender-system #recommender-system/quality #recommender-system/accuracy #recommender-system/serendipity