cluster analysis
cluster analysis [′kləs·tər ə′nal·ə·səs]
cluster analysis: a technique used to identify groups of objects or people that can be shown to be relatively distinct within a data set. The characteristics of those people within each cluster can then be explored. In market research, for example, cluster analysis has been used to identify groups of people for whom different marketing approaches would be appropriate.
There is a rich variety of clustering methods available. A common method is hierarchical clustering, which can work either from ‘bottom up’ or from ‘top down’. In ‘agglomerative hierarchical clustering’ (i.e. bottom up), the process begins with as many ‘clusters’ as cases. Using a mathematical criterion such as the standardized Euclidean distance, the closest objects or people are successively joined together into clusters. In ‘divisive hierarchical clustering’ (i.e. top down), the process starts with one single cluster containing all cases, which is then broken down into smaller clusters.
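The bottom-up process can be sketched in a few lines of Python. This is only an illustrative sketch, not a reference implementation: the function names `standardize` and `agglomerate`, the toy data, and the choice of joining the two clusters whose nearest members are closest (often called single linkage) are all assumptions made for the example.

```python
import math
from statistics import mean, pstdev

def standardize(rows):
    """Rescale each variable (column) to zero mean and unit standard deviation."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # guard against constant columns
    return [[(x - m) / s for x, m, s in zip(row, mus, sds)] for row in rows]

def agglomerate(rows, n_clusters):
    """Bottom-up clustering; merge criterion (nearest members) is an assumption."""
    points = standardize(rows)
    clusters = [[i] for i in range(len(points))]  # start: one cluster per case
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Distance between clusters = distance of their closest members.
                d = min(math.dist(points[i], points[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]  # join the closest pair
        del clusters[b]
    return [sorted(c) for c in clusters]

# Hypothetical data: six cases measured on two variables with different scales.
data = [[1.0, 10.0], [1.2, 11.0], [0.9, 9.5],
        [8.0, 80.0], [8.3, 82.0], [7.9, 79.0]]
result = agglomerate(data, 2)
print(result)  # [[0, 1, 2], [3, 4, 5]]
```

Standardizing first keeps the second variable, which has the larger numeric range, from dominating the distance.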
There are many practical problems involved in the use of cluster analysis. The selection of variables to be included in the analysis, the choice of distance measure and the criteria for combining cases into clusters are all crucial. Because the selected clustering method can itself impose a certain amount of structure on the data, it is possible for spurious clusters to be obtained. In general, several different methods should be used and their results compared. (See Anderberg, 1973, and Everitt, 1974, for full discussions of methods.)
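A small, hedged illustration of how the criterion for combining cases can shape the result: the sketch below clusters the same five cases twice, once measuring the distance between clusters by their nearest members and once by their farthest members (commonly called single and complete linkage). The function name `cluster` and the data are hypothetical, chosen only to make the difference visible.

```python
import math

def cluster(points, n_clusters, linkage):
    """Agglomerative clustering; `linkage` is min (nearest) or max (farthest)."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        # Pick the pair of clusters with the smallest linkage distance.
        _, a, b = min(
            (linkage(math.dist(points[i], points[j])
                     for i in clusters[a] for j in clusters[b]), a, b)
            for a in range(len(clusters))
            for b in range(a + 1, len(clusters))
        )
        clusters[a] += clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters]

# Hypothetical cases on one variable, with gradually widening gaps.
points = [(0.0,), (1.0,), (2.2,), (3.6,), (5.2,)]
single = cluster(points, 2, min)
complete = cluster(points, 2, max)
print(single)    # [[0, 1, 2, 3], [4]]
print(complete)  # [[0, 1], [2, 3, 4]]
```

The data contain no obvious grouping, yet each criterion returns a confident two-cluster answer, and the two answers disagree. Running more than one method and comparing the partitions is one way to spot structure that a method has imposed.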