Abstract
Every year, thousands of people sustain consumer-product-related injuries. Research indicates that online customer reviews can be processed to autonomously identify product safety issues. Early identification of safety issues can lead to earlier recalls, and thus fewer injuries and deaths. A dataset of product reviews from Amazon.com was compiled, along with \emph{SaferProducts.gov} complaints and recall descriptions from the Consumer Product Safety Commission (CPSC) and the European Commission Rapid Alert System. A system was built to clean the collected text and to extract relevant features. Dimensionality reduction was performed by computing feature relevance with a Random Forest and discarding features with low information gain. Various classifiers were analyzed, including Logistic Regression, SVMs, Na\"{i}ve Bayes, Random Forests, and an Ensemble classifier. Experimentation with various feature and classifier combinations resulted in a logistic regression model that achieves 70.2\% precision among the top 50 reviews it surfaces. This classifier outperforms all benchmarks set by related literature and consumer product safety professionals.
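
For reference, the headline metric can be written out explicitly. The following is a sketch that assumes the standard definition of precision at $k$; the symbols $k$, $r_i$, and $\mathrm{rel}(\cdot)$ are notation introduced here for illustration and are not taken from the system itself. Let $r_1, \dots, r_k$ be the $k$ reviews the classifier scores highest, and let $\mathrm{rel}(r_i) = 1$ if review $r_i$ is labeled as describing a safety issue and $0$ otherwise:
\[
  \mathrm{P@}k \;=\; \frac{1}{k} \sum_{i=1}^{k} \mathrm{rel}(r_i).
\]
Under this reading, the reported figure corresponds to $k = 50$, that is, the fraction of the 50 highest-ranked reviews that are labeled as safety issues.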