Research on Keyword Extraction Algorithm Based on Improved TF-IDF
Jia Qiang1, Feng Xiwei1, Wang Zhifeng1, Zhu Rui1, Qin Hang2
1. School of Computer and Communicating Engineering, Liaoning Shihua University, Fushun, Liaoning 113001, China; 2. Teacher Continuing Education School of Wanghua District, Fushun City of Liaoning Province, Fushun, Liaoning 113001, China
Abstract: In text feature word extraction, TF-IDF is the most common method for calculating feature weights. Building on the traditional TF-IDF extraction algorithm, this paper proposes a new keyword extraction algorithm based on word length. Chinese phrase segmentation is used to identify long words and ordinary words in the text; the proposed TF-IDF-WL method then recomputes the weights of words of different lengths, and the keywords are ranked by weight. Experimental results show that the new feature word extraction algorithm reflects the lexical length of the feature words more accurately. Compared with the traditional TF-IDF algorithm, it greatly improves accuracy and recall.
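The pipeline summarized in the abstract (compute TF-IDF weights, adjust them by word length, rank by weight) can be illustrated with a minimal Python sketch. The TF-IDF part follows the standard definition. The abstract does not state the TF-IDF-WL formula, so `length_factor`, its `long_threshold` and `boost` parameters, and `top_keywords` are hypothetical placeholders chosen only for illustration, and the input is assumed to be already segmented into tokens.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Standard TF-IDF. `docs` is a list of token lists; returns one dict per document."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


def length_factor(term, long_threshold=4, boost=1.5):
    """HYPOTHETICAL length weight, not the paper's TF-IDF-WL formula.

    Terms with at least `long_threshold` characters count as "long words"
    and are multiplied by `boost`; ordinary words are left unchanged.
    """
    return boost if len(term) >= long_threshold else 1.0


def top_keywords(docs, doc_index, k=3):
    """Rank the terms of one document by length-adjusted TF-IDF weight."""
    scores = tf_idf(docs)[doc_index]
    reweighted = {t: w * length_factor(t) for t, w in scores.items()}
    # Sort by descending weight; break ties alphabetically for determinism.
    return sorted(reweighted, key=lambda t: (-reweighted[t], t))[:k]
```

In this sketch a long word can outrank a more frequent short word once the length factor is applied; how strongly length should count, and where the boundary between long and ordinary words lies, is exactly what the paper's own weighting would determine.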