LNSE 2014 Vol.2(3): 262-267 ISSN: 2301-3559
DOI: 10.7763/LNSE.2014.V2.134

Stemming and Lemmatization: A Comparison of Retrieval Performances

Vimala Balakrishnan and Ethel Lloyd-Yemoh
Abstract—This study compares document retrieval precision under two language modeling techniques, stemming and lemmatization. Stemming reduces all words with the same stem to a common form, whereas lemmatization removes inflectional endings and returns the base or dictionary form of a word. Both techniques were also compared against a baseline ranking algorithm (i.e., one with no language processing). A search engine was developed and the algorithms were tested on a test collection. Both mean average precision and histograms indicate that stemming and lemmatization outperform the baseline algorithm. Between the two language modeling techniques, lemmatization produced better precision than stemming; however, the differences are insignificant. Overall, the findings suggest that language modeling techniques improve document retrieval, with lemmatization producing the best result.

Index Terms—Document retrieval, language models, lemmatization, stemming.
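
As a rough illustration of the concepts named in the abstract, the following minimal Python sketch contrasts a suffix-stripping stemmer with a dictionary-based lemmatizer and shows how mean average precision can be computed over ranked results. It is not the authors' search engine or their algorithms: the function names, the tiny suffix list, the miniature lemma dictionary, and the document identifiers are all assumptions made for illustration.

```python
# Toy illustration only; not the authors' implementation.

# Assumed, deliberately tiny suffix list (a real stemmer has many more rules).
SUFFIXES = ("ing", "ed", "es", "s")


def toy_stem(word: str) -> str:
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


# Assumed miniature lemma dictionary; a real lemmatizer relies on a full lexicon.
LEMMAS = {"studies": "study", "studying": "study", "ran": "run"}


def toy_lemmatize(word: str) -> str:
    """Return the dictionary form if known, otherwise the word unchanged."""
    return LEMMAS.get(word, word)


def average_precision(ranked_ids, relevant_ids) -> float:
    """Average of precision values at each rank where a relevant document appears."""
    relevant = set(relevant_ids)
    hits = 0
    total = 0.0
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


def mean_average_precision(runs) -> float:
    """runs: list of (ranked_ids, relevant_ids) pairs, one pair per query."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


if __name__ == "__main__":
    for w in ("studies", "studying", "ran"):
        print(w, toy_stem(w), toy_lemmatize(w))
    runs = [(["d1", "d2", "d3", "d4"], {"d1", "d3"}), (["d2", "d1"], {"d1"})]
    print(round(mean_average_precision(runs), 4))
```

In this sketch the stemmer maps "studies" to the non-word "studi" and leaves the irregular form "ran" untouched, while the lemmatizer returns the dictionary forms "study" and "run". That difference in how query and document terms are normalized is what a retrieval comparison of the two techniques measures.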

The authors are with the Faculty of Computer Science and Information Systems, University of Malaya, Kuala Lumpur, Malaysia (e-mail: vimala.balakrishnan@um.edu.my, ethel_lloyd@siswa.um.edu.my).


Cite: Vimala Balakrishnan and Ethel Lloyd-Yemoh, "Stemming and Lemmatization: A Comparison of Retrieval Performances," Lecture Notes on Software Engineering, vol. 2, no. 3, pp. 262-267, 2014.

Copyright © 2008-2015. Lecture Notes on Software Engineering. All rights reserved.
E-mail: lnse@ejournal.net