Fundamental Clustering Problems Suite
The package provides over sixty state-of-the-art clustering algorithms for unsupervised machine learning, published in [Thrun/Stier, 2021].
Description
The Fundamental Clustering Problems Suite (FCPS) brings together over sixty state-of-the-art clustering algorithms available in the R language. An important advantage is that the input and output of the clustering algorithms are simplified and consistent, which lets users run a cluster analysis swiftly. By combining mirrored-density plots (MD plots) with statistical testing, FCPS provides a tool to investigate the cluster tendency quickly, prior to the cluster analysis itself [Thrun, 2020]. Common clustering challenges can be generated with arbitrary sample size [Thrun/Ultsch, 2020a]. Additionally, FCPS bundles 26 indicators for estimating the number of clusters and provides an appropriate implementation of the clustering accuracy for more than two clusters. A subset of the methods was used in a published benchmarking of algorithms (see References).
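With more than two clusters, accuracy cannot be computed by comparing labels directly, because a clustering may name its groups differently from the reference classification. The following base R sketch illustrates the general idea of scoring a clustering under the best one-to-one relabelling. The helper names `all_permutations` and `permutation_accuracy` are hypothetical and not part of FCPS, and the brute-force search is an assumption of this sketch rather than a description of how the package computes accuracy.

```r
# Hypothetical helpers, not part of FCPS: accuracy under the best
# one-to-one relabelling, found by brute force over all permutations.
all_permutations <- function(v) {
  if (length(v) <= 1) return(list(v))
  out <- list()
  for (i in seq_along(v)) {
    for (rest in all_permutations(v[-i])) {
      out[[length(out) + 1]] <- c(v[i], rest)
    }
  }
  out
}

permutation_accuracy <- function(cls, truth) {
  stopifnot(length(cls) == length(truth))
  found <- sort(unique(cls))
  classes <- sort(unique(truth))
  # Assumption of this sketch: as many clusters as reference classes.
  stopifnot(length(found) == length(classes))
  best <- 0
  for (perm in all_permutations(classes)) {
    # Map the k-th cluster label to the k-th class of this permutation.
    relabelled <- perm[match(cls, found)]
    best <- max(best, mean(relabelled == truth))
  }
  best
}

truth <- c(1, 1, 1, 2, 2, 2, 3, 3, 3)
cls   <- c(2, 2, 2, 3, 3, 1, 1, 1, 1)  # same groups under other names, one point misplaced
permutation_accuracy(cls, truth)       # 8 of 9 points agree under the best relabelling
```

The brute-force search grows factorially with the number of clusters, so it is only practical for a handful of clusters.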
Installation
Installation using CRAN
Install automatically with all dependencies via
    install.packages("FCPS", dependencies = TRUE)

    # Optionally, for the automatic installation
    # of all suggested packages:
    Suggested = c("kernlab", "cclust", "dbscan", "kohonen", "MCL",
                  "ADPclust", "cluster", "DatabionicSwarm", "orclus",
                  "subspace", "flexclust", "ABCanalysis", "apcluster",
                  "pracma", "EMCluster", "pdfCluster", "parallelDist",
                  "plotly", "ProjectionBasedClustering", "GeneralizedUmatrix",
                  "mstknnclust", "densityClust", "parallel", "energy",
                  "R.utils", "tclust", "Spectrum", "genie", "protoclust",
                  "fastcluster", "clusterability", "signal", "reshape2",
                  "PPCI", "clustrd", "smacof", "rgl", "prclust", "dendextend",
                  "moments", "prabclus", "VarSelLCM", "sparcl", "mixtools",
                  "HDclassif", "clustvarsel", "knitr", "rmarkdown")

    # Install only the suggested packages that are not yet available.
    for (i in seq_along(Suggested)) {
      if (!requireNamespace(Suggested[i], quietly = TRUE)) {
        message(paste("Installing the package", Suggested[i]))
        install.packages(Suggested[i], dependencies = TRUE)
      }
    }
Installation using GitHub
Please note that dependencies have to be installed manually.
remotes::install_github("Mthrun/FCPS")
Installation using RStudio
Please note that dependencies have to be installed manually.
Tools -> Install Packages -> Repository (CRAN) -> FCPS
Tutorial Examples
The tutorial with several examples can be found in the vignette on CRAN:
https://cran.r-project.org/web/packages/FCPS/vignettes/FCPS.html
Manual
The full manual for users or developers is available here: https://cran.r-project.org/web/packages/FCPS/FCPS.pdf
Use Cases
Cluster Analysis of High-dimensional Data
The package FCPS provides clear and consistent access to state-of-the-art clustering algorithms:

    library(FCPS)
    data("Leukemia")
    Data = Leukemia$DistanceMatrix      # precomputed distance matrix
    Classification = Leukemia$Cls       # prior classification
    ClusterNo = 6

    CA = ADPclustering(Data, ClusterNo)
    Cls = ClusterRenameDescendingSize(CA$Cls)
    ClusterPlotMDS(Data, Cls, main = 'Leukemia', Plotter3D = 'plotly')
    ClusterAccuracy(Cls, Classification)
    # [1] 0.9963899
Generating Typical Challenges for Clustering Algorithms
Several clustering challenges can be generated with an arbitrary sample size:

    set.seed(600)
    library(FCPS)
    DataList = ClusterChallenge("Chainlink", SampleSize = 750, PlotIt = TRUE)
    Data = DataList$Chainlink
    Cls = DataList$Cls

    ClusterCount(Cls)
    # $CountPerCluster
    # [1] 377 373
    #
    # $NumberOfClusters
    # [1] 2
    #
    # $ClusterPercentages
    # [1] 50.26667 49.73333
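The counts and percentages reported above can be checked by hand. The sketch below uses only base R on a stand-in label vector with the same cluster sizes (377 and 373 points); it is an illustration of the arithmetic, not the implementation of `ClusterCount`.

```r
# Stand-in labels with the cluster sizes reported above (assumption:
# only the sizes matter here, not the order of the points).
Cls <- c(rep(1, 377), rep(2, 373))

counts <- as.vector(table(Cls))            # points per cluster
n_clusters <- length(unique(Cls))          # number of distinct labels
percentages <- 100 * counts / length(Cls)  # share of each cluster in percent

counts                   # 377 373
n_clusters               # 2
round(percentages, 5)    # 50.26667 49.73333
```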
Cluster-Tendency
For many applications, it is crucial to decide if a dataset possesses cluster structures:
    library(FCPS)
    set.seed(600)
    DataList = ClusterChallenge("Chainlink", SampleSize = 750)
    Data = DataList$Chainlink
    Cls = DataList$Cls

    library(ggplot2)
    ClusterabilityMDplot(Data) + theme_bw()
Estimation of Number of Clusters
The “FCPS” package provides up to 26 indicators to determine the number of clusters:
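    library(FCPS)
    set.seed(135)
    DataList = ClusterChallenge("Chainlink", SampleSize = 900)
    Data = DataList$Chainlink
    Cls = DataList$Cls

    # Single-linkage tree; the third list element is the dendrogram object.
    Tree = HierarchicalClustering(Data, 0, "SingleL")[[3]]
    ClusterDendrogram(Tree, 4, main = "Single Linkage")

    # One column of cluster labels per candidate number of clusters:
    # column i-1 holds the labels obtained by cutting the tree into i clusters.
    MaximumNumber = 7
    clsm <- matrix(data = 0, nrow = nrow(Data), ncol = MaximumNumber)
    for (i in 2:(MaximumNumber + 1)) {
      clsm[, i - 1] <- cutree(Tree, i)
    }

    out = ClusterNoEstimation(Data, ClsMatrix = clsm,
                              MaxClusterNo = MaximumNumber, PlotIt = TRUE)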
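The label matrix built in the loop above has one row per data point and one column per candidate number of clusters. As a minimal sketch of the same construction without FCPS, the block below uses base R's `hclust` and `cutree` on synthetic data; the data, the object names, and the choice of candidate values are assumptions made for illustration only.

```r
set.seed(1)
# Synthetic data: two well separated Gaussian blobs (illustration only).
X <- rbind(matrix(rnorm(100, mean = 0), ncol = 2),
           matrix(rnorm(100, mean = 6), ncol = 2))

tree <- hclust(dist(X), method = "single")

# cutree accepts a vector of k and returns one column of labels per value.
ks <- 2:5
label_matrix <- cutree(tree, k = ks)

dim(label_matrix)  # 100 rows (points), 4 columns (k = 2, 3, 4, 5)
```

Passing a vector of `k` to `cutree` replaces the explicit loop, at the cost of columns that are named by `k` rather than indexed from 1.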
Additional information
| Authors website | http://www.deepbionics.org/ |
|---|---|
| License | GPL-3 |
| Dependencies | R (>= 3.5.0) |
| Bug reports | https://github.com/Mthrun/FCPS/issues |
References
- [Thrun/Stier, 2021] Thrun, M. C., & Stier, Q.: Fundamental Clustering Algorithms Suite, SoftwareX, Vol. 13(C), pp. 100642, DOI 10.1016/j.softx.2020.100642, 2021.
- [Thrun, 2020] Thrun, M. C.: Improving the Sensitivity of Statistical Testing for Clusterability with Mirrored-Density Plot, in Archambault, D., Nabney, I. & Peltonen, J. (eds.), Machine Learning Methods in Visualisation for Big Data, DOI 10.2312/mlvis.20201102, The Eurographics Association, Norrköping, Sweden, May, 2020.
- [Thrun/Ultsch, 2020a] Thrun, M. C., & Ultsch, A.: Clustering Benchmark Datasets Exploiting the Fundamental Clustering Problems, Data in Brief, Vol. 30(C), pp. 105501, DOI 10.1016/j.dib.2020.105501, 2020.
- [Thrun/Ultsch, 2021] Thrun, M. C., & Ultsch, A.: Swarm Intelligence for Self-Organized Clustering, Artificial Intelligence, Vol. 290, pp. 103237, DOI 10.1016/j.artint.2020.103237, 2021.
- [Thrun/Ultsch, 2020b] Thrun, M. C., & Ultsch, A.: Using Projection based Clustering to Find Distance and Density based Clusters in High-Dimensional Data, Journal of Classification, DOI 10.1007/s00357-020-09373-2, Springer, 2020.



