Digital Book Genre Classification Based on Text Summaries Using the TF-IDF Method and Support Vector Machine
Abstract
The rapid growth of digital book collections makes it difficult to group books by genre effectively and consistently. This study classifies book genres from text summaries using a text mining approach. The dataset is the Book Genre Dataset, which contains 4,657 book documents across ten genre categories. The research process comprised text preprocessing, feature extraction with Term Frequency-Inverse Document Frequency (TF-IDF), and construction of a classification model with a Support Vector Machine (SVM). The data were divided into a training set (80%) and a test set (20%) using stratified sampling. The model achieved an accuracy of 68.78%, a precision of 69.26%, a recall of 68.78%, and an F1-score of 68.52%. The Fantasy genre performed best, with an F1-score of 0.77, while the Romance genre performed worst, which is attributed to imbalance in the data distribution and to similarities between its text characteristics and those of other genres. These findings indicate that book summaries contain sufficient information to support automatic genre identification. The results could be applied in the development of digital libraries and content-based book recommendation systems.
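
As an illustration of two of the steps named above, the following is a minimal sketch of TF-IDF weighting and a stratified 80/20 split, written with the Python standard library only. It is not the study's implementation: the function names `tfidf` and `stratified_split`, the whitespace tokenizer, the smoothed IDF variant, the random seed, and the toy summaries and labels are all assumptions made for this example. The SVM training step is omitted.

```python
import math
import random
from collections import Counter, defaultdict


def tfidf(docs):
    """Return one {term: weight} dict per document (raw tf * smoothed idf)."""
    # Assumption: lowercase + whitespace tokenization stands in for preprocessing.
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({
            # Assumption: smoothed idf = ln((1 + N) / (1 + df)) + 1.
            term: count * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in tf.items()
        })
    return vectors


def stratified_split(labels, test_ratio=0.2, seed=42):
    """Return (train_idx, test_idx), keeping each label's share in both sets."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)
    train_idx, test_idx = [], []
    for indices in by_label.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_ratio)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Hypothetical toy data, not taken from the Book Genre Dataset.
summaries = [
    "wizard finds dragon",
    "detective finds clue",
    "dragon guards castle",
]
weights = tfidf(summaries)

labels = ["fantasy"] * 10 + ["crime"] * 5
train_idx, test_idx = stratified_split(labels)
```

In this toy example, a term that occurs in only one summary ("wizard") receives a higher weight than a term shared by two summaries ("dragon"), and the split keeps the 2:1 ratio of the two labels in both the training and test indices.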


