General Information
    • ISSN: 2301-3559
    • Frequency: Quarterly
    • DOI: 10.18178/LNSE
    • Editor-in-Chief: Prof. Jemal Antidze
    • Executive Editor: Ms. Nina Lee
    • Abstracting/Indexing: EI (INSPEC, IET), Electronic Journals Library, Ulrich's Periodicals Directory, International Computer Science Digital Library (ICSDL), ProQuest, and Google Scholar.
    • E-mail: lnse@ejournal.net
Editor-in-chief
Prof. Jemal Antidze
I. Vekua Scientific Institute of Applied Mathematics
Tbilisi State University, Georgia
I'm happy to take on the position of editor in chief of LNSE. We encourage authors to submit papers concerning any branch of Software Engineering.

LNSE 2014 Vol.2(4): 353-359 ISSN: 2301-3559
DOI: 10.7763/LNSE.2014.V2.149

The GDense Algorithm for Clustering Data Streams with High Quality

Ye-In Chang, Chia-En Li, and Shu-Yi Lin
Abstract—A data stream is a sequence of dynamic, continuous, unbounded, real-time data items that arrives at a very high data rate and can be read only once. In data mining, clustering is a useful technique for discovering interesting data in the underlying data objects. The clustering problem can be defined formally as follows: given n data points in a d-dimensional metric space, partition the data points into k clusters such that the data points within a cluster are more similar to each other than to data points in different clusters. In the data stream environment, the difficulties of clustering include storage overhead, low clustering quality, and low updating efficiency. Therefore, in this paper, we present GDense, a new high-quality clustering algorithm for data streams. The GDense algorithm achieves high quality by using two kinds of partition, cells and quadcells, and two kinds of threshold, δ and (1/4)δ. In our simulation results, under every condition tested (including the number of data points, the number of cells, the size of the sliding window, and the threshold of a dense cell), the clustering purity of the GDense algorithm is always higher than that of the CDS-Tree algorithm.

Index Terms—Clustering, data mining, data stream, density-based, grid-based.
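
The abstract names two partitions (cells and quadcells) and two thresholds (δ and (1/4)δ) but does not say how GDense combines them. The sketch below is therefore only an illustration of grid-based density counting, not the authors' algorithm. It assumes 2-D points, square cells of side `cell_size`, quadcells formed by splitting each cell into four equal squares, and a region counted as dense when its point count reaches the threshold. The names `grid_counts`, `dense_regions`, `cell_size`, and `delta` are hypothetical.

```python
from collections import Counter
from typing import Iterable, Set, Tuple

Point = Tuple[float, float]
Index = Tuple[int, int]


def grid_counts(points: Iterable[Point], side: float) -> Counter:
    """Count how many points fall in each square grid region of the given side."""
    counts: Counter = Counter()
    for x, y in points:
        counts[(int(x // side), int(y // side))] += 1
    return counts


def dense_regions(
    points: Iterable[Point], cell_size: float, delta: float
) -> Tuple[Set[Index], Set[Index]]:
    """Return (dense cells, dense quadcells).

    Assumption for illustration: a cell is dense when it holds at least
    delta points, and a quadcell (a quarter of a cell) is dense when it
    holds at least delta / 4 points.
    """
    pts = list(points)
    cells = grid_counts(pts, cell_size)
    quadcells = grid_counts(pts, cell_size / 2)
    dense_cells = {idx for idx, n in cells.items() if n >= delta}
    dense_quadcells = {idx for idx, n in quadcells.items() if n >= delta / 4}
    return dense_cells, dense_quadcells


if __name__ == "__main__":
    sample = [(0.1, 0.1), (0.2, 0.2), (0.3, 0.3), (0.4, 0.4), (1.6, 1.6)]
    print(dense_regions(sample, cell_size=1.0, delta=4))
```

In the sample data, the isolated point at (1.6, 1.6) sits in a cell that falls below δ, yet its quadcell still meets the finer (1/4)δ threshold. This shows how a second, finer partition can register density that the coarse grid misses.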

Ye-In Chang, Chia-En Li, and Shu-Yi Lin are with the Computer Science and Engineering Department, National Sun Yat-Sen University, 70 Lienhai Rd., Kaohsiung 80424, Taiwan, R.O.C. (e-mail: changyi@cse.nsysu.edu.tw, lice@db.cse.nsysu.edu.tw, suyiinformation@gmail.com).

Cite: Ye-In Chang, Chia-En Li, and Shu-Yi Lin, "The GDense Algorithm for Clustering Data Streams with High Quality," Lecture Notes on Software Engineering, vol. 2, no. 4, pp. 353-359, 2014.

Copyright © 2008-2015. Lecture Notes on Software Engineering. All rights reserved.