A comparative analysis of offline and online evaluations and discussion of research paper recommender system evaluation

2022/01/18

Offline evaluations contradicted online evaluations.
CTR and MAP never contradicted each other.
- It is still possible that MAP computed over users would differ.
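Whether two metrics "contradict" each other comes down to whether they rank the compared algorithms in the same order, and, once results are aggregated per user, on how much weight each user gets. The following is a minimal Python sketch under stated assumptions: the algorithms "A" and "B", the users, documents, and click counts are hypothetical toy values, not data from the paper; AP is normalised by the number of relevant items (one common definition); and CTR is computed either pooled over all impressions or first per user and then averaged.

```python
from statistics import mean

# Toy, hypothetical data for two hypothetical algorithms "A" and "B".
# Nothing here comes from the paper; it only shows the mechanics.

def average_precision(ranked, relevant):
    """AP for one user: sum of precision@k at each rank k holding a relevant item."""
    hits, total = 0, 0.0
    for k, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            total += hits / k
    # Normalise by the number of relevant items; a user with none scores 0.
    return total / len(relevant) if relevant else 0.0

def mean_average_precision(rankings, ground_truth):
    """MAP: AP averaged over users, each user weighted equally."""
    return mean(average_precision(rankings[u], ground_truth[u]) for u in rankings)

def ctr_pooled(log):
    """CTR over all impressions: total clicks / total recommendations shown."""
    return sum(c for _, _, c in log) / sum(s for _, s, _ in log)

def ctr_per_user(log):
    """CTR computed per user first, then averaged (each user weighted equally)."""
    return mean(c / s for _, s, c in log)

def ranking(scores):
    """Algorithm names ordered from best to worst score."""
    return sorted(scores, key=scores.get, reverse=True)

# Offline: held-out relevant documents per user and each algorithm's top-3 list.
ground_truth = {"u1": {"d1", "d4"}, "u2": {"d2"}}
offline = {
    "A": {"u1": ["d1", "d3", "d4"], "u2": ["d5", "d2", "d6"]},
    "B": {"u1": ["d3", "d1", "d5"], "u2": ["d2", "d7", "d8"]},
}
# Online: (user, recommendations shown, clicks) per algorithm.
online = {
    "A": [("u1", 100, 10), ("u2", 10, 0)],
    "B": [("u1", 100, 5), ("u2", 10, 1)],
}

map_scores = {alg: mean_average_precision(r, ground_truth) for alg, r in offline.items()}
pooled = {alg: ctr_pooled(log) for alg, log in online.items()}
per_user = {alg: ctr_per_user(log) for alg, log in online.items()}

print("MAP:", ranking(map_scores))
print("CTR (pooled):", ranking(pooled))
print("CTR (per user):", ranking(per_user))
```

With this toy data, MAP and pooled CTR both rank A first, while per-user CTR ranks B first: user u2 sees only 10 recommendations yet counts as much as u1 in the per-user mean.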

Research Questions

Why do offline evaluations only sometimes accurately predict performance in real-world systems?
1. Human factors
  1. Users wait too long to receive recommendations
  2. The presentation of recommendations is unappealing
  3. Recommendations are labeled suboptimally, e.g., as commercial
  4. Older users tend to be more satisfied with recommendations than younger users
  5. Unregistered users are more concerned about privacy
2. Imperfection of offline datasets
  1. Datasets contain only a fraction of all relevant documents
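The effect of incomplete datasets can be made concrete with precision@k. The Python sketch below assumes a hypothetical user whose full set of relevant documents is known (in practice it is not) and a dataset that records only one of them; the algorithm names, document IDs, and cutoff k = 3 are illustrative assumptions. Any recommended document the dataset does not list is scored as a miss, even if the user would have found it relevant.

```python
def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k recommendations that appear in `relevant`."""
    return sum(1 for item in ranked[:k] if item in relevant) / k

# Hypothetical example: the offline dataset knows only part of what is relevant.
truly_relevant = {"d1", "d2", "d3", "d4"}  # unknowable in practice; assumed here
known_relevant = {"d1"}                    # what the offline dataset records

recs = {
    "A": ["d1", "d9", "d8"],  # hits the one document the dataset knows about
    "B": ["d2", "d3", "d4"],  # hits only documents the dataset is missing
}

offline_p = {alg: precision_at_k(r, known_relevant, 3) for alg, r in recs.items()}
true_p = {alg: precision_at_k(r, truly_relevant, 3) for alg, r in recs.items()}

for alg in recs:
    print(f"{alg}: offline P@3 = {offline_p[alg]:.2f}, true P@3 = {true_p[alg]:.2f}")
```

Here A scores higher offline (0.33 vs 0.00), yet B recommends only truly relevant documents (1.00 vs 0.33), so the offline ranking of the two algorithms is reversed.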
Is it possible to identify the situations in which offline evaluations have predictive power?
Is it problematic that offline evaluations do not (always) have predictive power?

Reference

J. Beel, M. Genzmehr, S. Langer, A. Nürnberger, and B. Gipp, “A comparative analysis of offline and online evaluations and discussion of research paper recommender system evaluation,” in Proceedings of the International Workshop on Reproducibility and Replication in Recommender Systems Evaluation - RepSys ’13, Hong Kong, China, 2013, pp. 7–14. doi: 10.1145/2532508.2532511.

#recommender-system

lukkiddd. 2022, powered by Jekyll Garden
