Research Article
Predicting Bestsellers and Key Drivers Using Explainable Machine Learning
Sapyeong Publishing
Sungkyunkwan University
Sungkyunkwan University
Published: January 2025 · Vol. 54, No. 1 · pp. 81-108
DOI: https://doi.org/10.17287/kmr.2025.54.1.81
Abstract
Research on predicting sales volumes or identifying bestsellers in online bookstores often relies on post-publication data, such as sales records and customer reviews, which poses a cold start problem for new books. This study addresses this issue by developing a machine learning model based solely on metadata from newly released literary books. Among the tested models, LightGBM exhibits the best predictive performance. Using feature importance analysis and SHAP (SHapley Additive exPlanations), we identify key factors influencing bestseller prediction, including author frequency, publisher frequency, category frequency, price, and publication month. Our findings provide a solution to the cold start problem and offer actionable insights that online bookstores can use to anticipate a book's success potential and refine their marketing strategies.
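The abstract lists author, publisher, and category *frequency* among the inputs that are available before a book goes on sale. The following minimal Python sketch shows one plausible way to build such features. It assumes that "frequency" means the number of titles sharing that value in the training catalogue; the paper's exact definition may differ. The field names, the example records, and the helpers `fit_frequency_maps` and `encode` are hypothetical illustrations, not the authors' pipeline. The resulting numeric vectors could then be passed to a classifier such as LightGBM.

```python
from collections import Counter
from typing import Any, Dict, List, Mapping

# Hypothetical metadata fields; the paper's exact schema is an assumption here.
CATEGORICAL_FIELDS = ("author", "publisher", "category")


def fit_frequency_maps(train_books: List[Mapping[str, Any]]) -> Dict[str, Counter]:
    """Count how often each author/publisher/category appears in the training catalogue."""
    return {
        field: Counter(book[field] for book in train_books)
        for field in CATEGORICAL_FIELDS
    }


def encode(book: Mapping[str, Any], freq_maps: Dict[str, Counter]) -> List[float]:
    """Turn one book's pre-publication metadata into a numeric feature vector.

    Counter returns 0 for unseen keys, so a value absent from training
    (e.g. a debut author) maps to 0 and a brand-new title can still be scored
    without any sales or review data.
    """
    features = [float(freq_maps[field][book[field]]) for field in CATEGORICAL_FIELDS]
    features.append(float(book["price"]))   # list price, used as-is
    features.append(float(book["month"]))   # publication month, 1-12
    return features


# Illustrative records only (not data from the study).
train = [
    {"author": "A", "publisher": "P1", "category": "novel", "price": 15000, "month": 3},
    {"author": "A", "publisher": "P2", "category": "novel", "price": 14000, "month": 7},
    {"author": "B", "publisher": "P1", "category": "poetry", "price": 12000, "month": 7},
]
maps = fit_frequency_maps(train)

new_book = {"author": "C", "publisher": "P1", "category": "novel", "price": 16000, "month": 11}
print(encode(new_book, maps))  # [0.0, 2.0, 2.0, 16000.0, 11.0]
```

Fitting the counts on training data only, and mapping unseen values to zero, keeps the features free of post-publication information. This is the property a metadata-only, cold-start setting depends on.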
