Improve the Accuracy of Support Vector Machine Using Chi Square Statistic and Term Frequency Inverse Document Frequency on Movie Review Sentiment Analysis


Ukhti Ikhsani Larasati, ILKOM UNNES and Much Aziz Muslim, ILKOM UNNES and Riza Arifudin, ILKOM UNNES and Alamsyah, ILKOM UNNES (2019) Improve the Accuracy of Support Vector Machine Using Chi Square Statistic and Term Frequency Inverse Document Frequency on Movie Review Sentiment Analysis. Scientific Journal of Informatics, 6 (1). pp. 138-149. ISSN 2407-7658

Abstract

Text data can be processed with text mining techniques, and processing large volumes of text requires a machine that can explore opinions, including positive and negative ones. Sentiment analysis is a process that applies text mining methods to determine whether the content of a text dataset is positive or negative. The support vector machine (SVM) is one of the classification algorithms that can be used for sentiment analysis; however, it performs less well on large datasets. In addition, one constraint of the text mining process is the number of attributes used: many attributes degrade the performance of the classifier and lead to low accuracy. The purpose of this research is to increase the accuracy of the support vector machine by applying feature selection and feature weighting. Feature selection removes a large number of irrelevant attributes; in this study, the features are selected by keeping the top K = 500 values. The selected attributes are then weighted to calculate the weight of each one. The feature selection method is the chi square statistic, and feature weighting uses Term Frequency Inverse Document Frequency (TFIDF). Experiments in Matlab R2017b with 10-fold cross validation show that integrating the support vector machine with the chi square statistic and TFIDF increases accuracy by 11.5 percentage points: the support vector machine without the chi square statistic and TFIDF achieved an accuracy of 68.7%, while the support vector machine with the chi square statistic and TFIDF achieved an accuracy of 80.2%.
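The pipeline summarised in the abstract (chi square selection of the top K terms, TFIDF weighting of the surviving terms, then an SVM) can be sketched in plain Python. This is not the authors' Matlab R2017b implementation. The function names are hypothetical, and several choices are assumptions: the chi square score is computed from a 2x2 table of document presence per term, TFIDF uses raw term count times log(N / df), and the classifier is a bias-free linear SVM trained with Pegasos-style subgradient steps. The toy corpus keeps K = 3 instead of K = 500 and skips 10-fold cross validation, since four documents cannot fill ten folds.

```python
import math
from collections import Counter


def chi_square_scores(docs, labels, positive="pos"):
    """Chi square score of every term, from a 2x2 document-presence table.

    a: positive docs containing the term    b: other docs containing it
    c: positive docs without the term       d: other docs without it
    With two classes the score is the same whichever class is 'positive'.
    """
    doc_sets = [set(doc) for doc in docs]
    n = len(docs)
    vocab = set().union(*doc_sets)
    scores = {}
    for term in vocab:
        a = b = c = d = 0
        for terms, label in zip(doc_sets, labels):
            present, in_class = term in terms, label == positive
            if present and in_class:
                a += 1
            elif present:
                b += 1
            elif in_class:
                c += 1
            else:
                d += 1
        denom = (a + c) * (b + d) * (a + b) * (c + d)
        # A term found in every document (or in none) has a zero denominator: score 0.
        scores[term] = n * (a * d - c * b) ** 2 / denom if denom else 0.0
    return scores


def select_top_k(scores, k):
    """Keep the k highest-scoring terms (ties broken alphabetically)."""
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]


def tfidf_vectors(docs, features):
    """Weight only the selected terms: raw term count times log(N / df)."""
    n = len(docs)
    df = {t: sum(1 for doc in docs if t in doc) for t in features}
    idf = {t: math.log(n / df[t]) if df[t] else 0.0 for t in features}
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append([counts[t] * idf[t] for t in features])
    return vectors


def train_linear_svm(vectors, labels, positive="pos", lam=0.1, epochs=50):
    """Bias-free linear SVM trained with Pegasos-style subgradient steps."""
    ys = [1 if label == positive else -1 for label in labels]
    w = [0.0] * len(vectors[0])
    t = 0
    for _ in range(epochs):
        for x, y in zip(vectors, ys):  # fixed order instead of random sampling
            t += 1
            eta = 1.0 / (lam * t)
            margin = y * sum(wi * xi for wi, xi in zip(w, x))
            w = [(1 - eta * lam) * wi for wi in w]  # regularisation shrink
            if margin < 1:  # hinge loss active: step towards y * x
                w = [wi + eta * y * xi for wi, xi in zip(w, x)]
    return w


def predict(w, vectors, positive="pos", negative="neg"):
    """Label each vector by the sign of its score w . x."""
    return [positive if sum(wi * xi for wi, xi in zip(w, x)) >= 0 else negative
            for x in vectors]


docs = [
    "a great and moving film".split(),
    "great acting and a great story".split(),
    "a dull and boring film".split(),
    "boring plot and dull acting".split(),
]
labels = ["pos", "pos", "neg", "neg"]
features = select_top_k(chi_square_scores(docs, labels), k=3)
print(features)  # ['boring', 'dull', 'great']
vectors = tfidf_vectors(docs, features)
w = train_linear_svm(vectors, labels)
print(predict(w, vectors))  # ['pos', 'pos', 'neg', 'neg']
```

Selecting terms before weighting means the TFIDF vectors have only K columns, which is the reduction in attribute count that the abstract credits for the accuracy gain. Terms such as "and", which occur in every review, get a chi square score of 0 and are discarded.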

Item Type: Article
Uncontrolled Keywords: SVM, Chi square statistic, TFIDF, Sentiment Analysis, Text Classification
Subjects: T Technology > Information and Computer
T Technology > Computer Engineering
Fakultas: Fakultas Matematika dan Ilmu Pengetahuan Alam > Ilmu Komputer, S1
Depositing User: mahargjo hapsoro adi
Date Deposited: 05 Oct 2019 15:02
Last Modified: 05 Oct 2019 15:02
URI: http://lib.unnes.ac.id/id/eprint/33065