Mulguisin Clustering Algorithm I. Comparison of Clustering Algorithms for Study of Cosmic Structure Finding
- What is the cluster? → We need Cluster finding algorithm
- MulGuiSin algorithm
- We propose a new cluster finding algorithm, MulGuiSin(MGS)
- Schematic idea of MGS
- 1. MGS with different height hide into the water tank
- 2. We drain water from the water tank
- 3. The highest MGS first appears
- 4. As draining consecutively, the next highest MGS appears
- 4. Keep draining until there is no remained MGS
- → The MGS trace galaxies top-bottom approach
- MGS algorithm
- 1. FInd galaxy with the highest density
- 2. The galaxy becomes a new cluster, "MGS"
- 3. FInd the next galaxy with second highest density and estimate the distance between "MGS" and next galaxy
- 4. If the distance is less than our criteria, the second galaxy is merged with "MGS"
- 5. This process continues until there is no galaxy
- Features of MGS algorithm
- 1. MGS can find structure from galaxy data in detail
- 2. Provide topological information
- Number of nodes
- Number of branches
- Number of children
- Link length
- Average node generation
- ...
- What is the difference between MGS and other algorithm?
- Traditional algorithm : FoF (Friends of Friends)
- Graph based algorithm : MST (Minimum spanning tree)
- Machine learning (ML) algorithm : DBSCAN (Density-based spatial clustering of applications with noise)
- Performance test
- We test performance of MGS :
- Test : How many cluster are returned
- 1. 2 controlled data
- a. simple gaussian model
- b. power-law model & noise
- 2. Comparing with other algorithms - FoF, MST, DBSCAN
- Controlled data 1 : simple gaussian model
- Check the number of clusters with changing linking-length
- Controlled data 2 : power-law model
- 50 halos ; 7041 galaxies
- MGS find well number of cluster when linking length > 4
- MST find less number of cluster than MGS
- FoF and DBSCAN is similar with MST
- Several cluster is very close to each other, other algorithms find some cluster to one giant cluster
- Controlled data 2 : power-law model & noise
- Add the noise into the controlled data 2
- We check number of clusters changing with linking-length
- The minimum member of cluster is 50
- MGS find well number of cluster when linking length > 4
- MST, FoF and DBSCAN find less number of cluster than MGS
- Several cluster are very close to each other, other algorithms find some cluster to one giant cluster
- For linking length > 12, the number of cluster in larger than 50 for MGS
- Lots of noise become cluster, while the other algorithm make one giant cluster
- In the high noise, the more noise become new clusters
- In contrast, the other algorithms make one giant cluster
- This shows that the MGS is very different algorithm comparing with other 3 algorithms
- In the beginning, we introduce that MGS find cluster in detail
- The other algorithms find cluster as one giant cluster when linking length is large
- In contrast, the MGS always trace structure of cluster. They do not make one giant cluster
- It is very different feature comparing with other algorithms
- Conclusion
- We propose a new cluster finding algorithm MGS
- The MGS shows good performance in the controlled data comparing with other algorithms
- MGS can provide lots of topological information, it can give us other perspective of research for large-scale structure