• Aug 25, 2016 News: Vol. 4, No. 3 has been published online. 15 peer-reviewed articles from 3 specific areas are published in this issue.
  • May 03, 2016 News: Vol. 3, No. 3 has been indexed by EI (Inspec).
  • May 03, 2016 News: Vol. 3, No. 2 has been indexed by EI (Inspec).
General Information
    • ISSN: 2301-3559
    • Frequency: Quarterly
    • DOI: 10.18178/LNSE
    • Editor-in-Chief: Prof. Jemal Antidze
    • Executive Editor: Ms. Nina Lee
    • Abstracting/Indexing: EI (INSPEC, IET), DOAJ, Electronic Journals Library, Engineering & Technology Digital Library, Ulrich's Periodicals Directory, International Computer Science Digital Library (ICSDL), ProQuest and Google Scholar.
    • E-mail: lnse@ejournal.net
Editor-in-chief
Prof. Jemal Antidze
I. Vekua Scientific Institute of Applied Mathematics
Tbilisi State University, Georgia
I'm happy to take on the position of editor-in-chief of LNSE. We encourage authors to submit papers on any branch of software engineering.

LNSE 2014 Vol.2(4): 353-359 ISSN: 2301-3559
DOI: 10.7763/LNSE.2014.V2.149

The GDense Algorithm for Clustering Data Streams with High Quality

Ye-In Chang, Chia-En Li, and Shu-Yi Lin
Abstract—A data stream is a sequence of dynamic, continuous, unbounded, real-time data items that arrive at a very high rate and can be read only once. In data mining, clustering is one of the most useful techniques for discovering interesting patterns in the underlying data objects. The clustering problem can be defined formally as follows: given n data points in a d-dimensional metric space, partition them into k clusters such that points within a cluster are more similar to each other than to points in different clusters. In a data stream environment, the difficulties of clustering include storage overhead, low clustering quality, and low update efficiency. Therefore, in this paper, we present GDense, a new high-quality clustering algorithm for data streams. GDense achieves high quality by using two kinds of partitions, cells and quadcells, and two kinds of thresholds, δ and (1/4)δ. Our simulation results show that under every condition tested (varying the number of data points, the number of cells, the size of the sliding window, and the dense-cell threshold), the clustering purity of GDense is always higher than that of the CDS-Tree algorithm.

Index Terms—Clustering, data mining, data stream, density-based, grid-based.
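The abstract describes GDense only at a high level: cells, quadcells, the thresholds δ and (1/4)δ, a sliding window, and clustering purity as the quality measure. The Python below is a minimal sketch of a generic two-level grid-density clusterer built under our own assumptions; it is not the authors' GDense algorithm. It assumes 2-D points in [0, 1)², splits each cell into four quadcells, treats a quadcell as dense if its parent cell holds at least δ points or the quadcell alone holds at least δ/4, and merges edge-adjacent dense quadcells into clusters. The names `GRID`, `DELTA`, `cell_of`, `quadcell_of`, `grid_density_clusters`, `cluster_window`, and `purity` are hypothetical. Purity follows the standard definition: for each predicted cluster, count its most common true class, sum these counts, and divide by the number of points N.

```python
from collections import Counter, defaultdict, deque

# Hypothetical defaults (assumptions, not values from the paper):
# points lie in [0, 1)^2, the space is split into GRID x GRID cells,
# and DELTA is the dense-cell threshold delta.
GRID = 4
DELTA = 8.0


def cell_of(p, grid=GRID):
    """Return the (row, col) index of the grid cell containing point p."""
    x, y = p
    return (min(int(x * grid), grid - 1), min(int(y * grid), grid - 1))


def quadcell_of(p, grid=GRID):
    """Return the index of the quadcell (one quarter of a cell) containing p.

    Quadcells form a 2*grid x 2*grid grid, so quadcell (qi, qj) lies in
    cell (qi // 2, qj // 2).
    """
    x, y = p
    q = 2 * grid
    return (min(int(x * q), q - 1), min(int(y * q), q - 1))


def grid_density_clusters(points, grid=GRID, delta=DELTA):
    """Label each point with a cluster id, or -1 if it lies in a sparse area."""
    cell_counts = Counter(cell_of(p, grid) for p in points)
    quad_counts = Counter(quadcell_of(p, grid) for p in points)

    # A quadcell is dense if its whole cell reaches delta points,
    # or if the quadcell alone reaches the finer threshold delta / 4.
    dense = set()
    for qi in range(2 * grid):
        for qj in range(2 * grid):
            parent = (qi // 2, qj // 2)
            if cell_counts[parent] >= delta or quad_counts[(qi, qj)] >= delta / 4:
                dense.add((qi, qj))

    # Merge edge-adjacent dense quadcells into clusters with a BFS.
    comp = {}
    next_id = 0
    for start in sorted(dense):
        if start in comp:
            continue
        comp[start] = next_id
        queue = deque([start])
        while queue:
            i, j = queue.popleft()
            for nb in ((i + 1, j), (i - 1, j), (i, j + 1), (i, j - 1)):
                if nb in dense and nb not in comp:
                    comp[nb] = next_id
                    queue.append(nb)
        next_id += 1

    return [comp.get(quadcell_of(p, grid), -1) for p in points]


def cluster_window(stream, window, grid=GRID, delta=DELTA):
    """Cluster only the most recent `window` points of a stream."""
    recent = deque(stream, maxlen=window)  # older points fall out automatically
    return grid_density_clusters(list(recent), grid, delta)


def purity(pred, truth):
    """Purity = (1/N) * sum over predicted clusters of the majority-class count.

    Noise (-1) is treated as one more predicted cluster in this sketch.
    """
    groups = defaultdict(Counter)
    for c, t in zip(pred, truth):
        groups[c][t] += 1
    return sum(cnt.most_common(1)[0][1] for cnt in groups.values()) / len(pred)


if __name__ == "__main__":
    a = [(0.05 + 0.01 * i, 0.05 + 0.01 * i) for i in range(10)]
    b = [(0.80 + 0.01 * i, 0.80 + 0.01 * i) for i in range(10)]
    pts = a + b + [(0.5, 0.1)]  # two tight groups plus one isolated point
    labels = grid_density_clusters(pts)
    print(labels)
    print(purity(labels, ["a"] * 10 + ["b"] * 10 + ["noise"]))
```

In this example, the ten points near (0.1, 0.1) should form cluster 0, the ten near (0.85, 0.85) should form cluster 1, and the isolated point should be labeled -1, which gives a purity of 1.0. A real stream clusterer would update the counts incrementally as points enter and leave the window. This sketch recomputes them from scratch only to stay short.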

Ye-In Chang, Chia-En Li, and Shu-Yi Lin are with the Computer Science and Engineering Department, National Sun Yat-Sen University, 70 Lienhai Rd., Kaohsiung 80424, Taiwan, R.O.C. (e-mail: changyi@cse.nsysu.edu.tw, lice@db.cse.nsysu.edu.tw, suyiinformation@gmail.com).


Cite: Ye-In Chang, Chia-En Li, and Shu-Yi Lin, "The GDense Algorithm for Clustering Data Streams with High Quality," Lecture Notes on Software Engineering, vol. 2, no. 4, pp. 353-359, 2014.

Copyright © 2008-2015. Lecture Notes on Software Engineering. All rights reserved.