Create List of Stopwords and Typing Error by TF-IDF Weight Value

EasyChair Preprint 1410

4 pages. Date: August 24, 2019

Abstract

These days, the growth of social networking services (SNS) generates huge amounts of text data. To analyse such data, it is essential to remove meaningless words, namely stopwords and typing errors. For English, stopword dictionaries have developed rapidly. However, there is little research in Korea on the Korean language. In this research, we suggest a way to filter out stopwords and typing errors by word importance, using the TF-IDF algorithm. First, calculate TF-IDF values from the collected data. Second, decide on a criterion that separates the values into two groups, and transform the result into an n×2 matrix. Third, calculate the cumulative frequency of the TF-IDF weights. In this way, a new cumulative frequency is obtained without stopwords and typing errors. Furthermore, this method can be used for both Korean and English without creating a stopword dictionary.
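
The three steps in the abstract can be sketched roughly as follows. This is only one possible reading, not the authors' implementation: the abstract does not state the TF-IDF variant, the separating criterion, or how the n×2 matrix is filled, so the formula tf × log(N/df), the single `threshold` value, and the function names `tf_idf`, `split_by_threshold` and `kept_frequency` are all assumptions made for illustration. The toy English documents are likewise invented.

```python
import math
from collections import Counter, defaultdict


def tf_idf(docs):
    """Step 1: return {(doc_index, term): weight} with weight = tf * log(N / df).

    The exact TF-IDF variant is an assumption; the abstract does not specify one.
    """
    n_docs = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    weights = {}
    for i, doc in enumerate(docs):
        for term, count in Counter(doc).items():
            tf = count / len(doc)
            weights[(i, term)] = tf * math.log(n_docs / df[term])
    return weights


def split_by_threshold(weights, threshold):
    """Step 2: build an n x 2 table, one row per term.

    Column 0 counts occurrences whose weight is below the threshold,
    column 1 counts occurrences at or above it. A single fixed threshold
    is a hypothetical criterion chosen for this sketch.
    """
    table = defaultdict(lambda: [0, 0])
    for (_, term), weight in weights.items():
        column = 1 if weight >= threshold else 0
        table[term][column] += 1
    return dict(table)


def kept_frequency(table):
    """Step 3: frequency accumulated over documents, using only the kept column.

    Terms that never reach the threshold drop out of the result.
    """
    return {term: high for term, (_low, high) in table.items() if high > 0}


# Toy corpus (invented): "the" appears in every document, so its weight is 0.
docs = [
    ["the", "cat", "sat"],
    ["the", "dog", "ran"],
    ["the", "cat", "ran"],
]
weights = tf_idf(docs)
table = split_by_threshold(weights, threshold=0.1)
kept = kept_frequency(table)
print(kept)
```

In this toy run the word "the" receives a weight of zero in every document and is filtered out without any stopword dictionary. A single lower threshold does not catch rare misspellings, which tend to receive high TF-IDF weights, so the sketch illustrates only the stopword side of the idea.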

Keyphrases: Preprocessing, Stopwords, TF-IDF, text mining

BibTeX entry
BibTeX does not have the right entry for preprints. This is a hack for producing the correct reference:
@booklet{EasyChair:1410,
  author    = {Woo-Seok Choi and Ki-Cheol Yoo and Sang-Hyun Choi},
  title     = {Create List of Stopwords and Typing Error by TF-IDF Weight Value},
  howpublished = {EasyChair Preprint 1410},
  year      = {EasyChair, 2019}}