| Title: | Iterative Pruning Population Admixture Inference Framework |
|---|---|
| Description: | A data clustering package based on admixture ratios (Q matrix) of population structure. The framework is based on an iterative pruning procedure that clusters data by splitting a given population into subclusters until a stopping criterion is met, in the same way as the ipPCA, iNJclust, and IPCAPS frameworks. The package also provides a function that builds a phylogenetic tree: a neighbor-joining tree constructed from a similarity matrix between clusters. Given multiple Q matrices with varying numbers of ancestors (K), the framework defines the similarity value between clusters i and j as the minimum number K* at which the majority of members of the two clusters fall into different clusters. This K* reflects the minimum number of ancestors needed to split clusters i and j into different clusters when K* clusters are assigned based on the maximum admixture ratio of each individual. The publication for this package is Chainarong Amornbunchornvej, Pongsakorn Wangkumhang, and Sissades Tongsima (2020) <doi:10.1101/2020.03.21.001206>. |
| Authors: | Chainarong Amornbunchornvej [aut, cre] (ORCID: <https://orcid.org/0000-0003-3131-0370>) |
| Maintainer: | Chainarong Amornbunchornvej <[email protected]> |
| License: | GPL-3 |
| Version: | 0.1.2 |
| Built: | 2026-06-02 08:34:41 UTC |
| Source: | https://github.com/darkeyes/ipadmixture |
biclustFunc is a binary clustering function using hierarchical clustering.
biclustFunc(Qmat, admixRatioThs = 0.5, method = "average")
Qmat |
is a Q matrix that contains admixture ratios of all individuals where the |
admixRatioThs |
is a threshold to determine that if a cluster has |
method |
is a method parameter of |
This function returns binary clustering results.
heteroFlag |
is a flag that represents a status whether a given cluster is heterogeneous (having sub-clusters). It is TRUE if |
clusterInx |
is a vector of clustering assignment where |
meanDiffAdmixRatio |
is a vector of magnitude-differences of admixture ratios. It is calculated by splitting a given cluster into two sub-clusters and then taking the absolute value of the difference between the mean admixture ratios of the sub-clusters. |
Qmat1 |
is a Q matrix of sub-cluster #1 after splitting a given cluster into two sub-clusters that contains admixture ratios of all individuals where the |
Qmat2 |
is a Q matrix of sub-cluster #2 after splitting a given cluster into two sub-clusters that contains admixture ratios of all individuals where the |
maxDiffAdmixRatio |
is the maximum of the magnitude-differences of admixture ratios for a given cluster before splitting into two sub-clusters. |
# Running biclustFunc on Q matrix of 27 human population dataset where K = 12
obj <- biclustFunc(Qmat = ipADMIXTURE::human27pop_Qmat[[11]], admixRatioThs = 0.15)
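The return values described above can be illustrated with a minimal base-R sketch. This is not the package's implementation: the helper name `binary_split_sketch`, the use of Euclidean distance between rows, and the rule that a cluster is heterogeneous when the largest difference reaches `admixRatioThs` are all assumptions made for illustration.

```r
# Hypothetical helper (NOT the package's biclustFunc): split one cluster in two
# and measure how far apart the two halves are in admixture space.
binary_split_sketch <- function(Qmat, admixRatioThs = 0.5, method = "average") {
  # Assumption: Euclidean distance between rows (individuals) of the Q matrix.
  hc <- stats::hclust(stats::dist(Qmat), method = method)
  clusterInx <- stats::cutree(hc, k = 2)  # 1 or 2 for each individual
  Qmat1 <- Qmat[clusterInx == 1, , drop = FALSE]
  Qmat2 <- Qmat[clusterInx == 2, , drop = FALSE]
  # Absolute difference between the mean admixture ratios of the two halves,
  # one value per ancestor (column).
  meanDiffAdmixRatio <- abs(colMeans(Qmat1) - colMeans(Qmat2))
  maxDiffAdmixRatio <- max(meanDiffAdmixRatio)
  list(
    # Assumption: heterogeneous when the largest difference reaches the threshold.
    heteroFlag = maxDiffAdmixRatio >= admixRatioThs,
    clusterInx = clusterInx,
    meanDiffAdmixRatio = meanDiffAdmixRatio,
    maxDiffAdmixRatio = maxDiffAdmixRatio,
    Qmat1 = Qmat1,
    Qmat2 = Qmat2
  )
}

# Toy Q matrix: 3 individuals dominated by ancestor 1, 3 by ancestor 2.
Q <- rbind(
  c(0.90, 0.10), c(0.85, 0.15), c(0.95, 0.05),
  c(0.10, 0.90), c(0.20, 0.80), c(0.15, 0.85)
)
res <- binary_split_sketch(Q, admixRatioThs = 0.15)
res$heteroFlag
res$meanDiffAdmixRatio
```

On this toy input the two halves differ by 0.75 in each ancestor column, so the sketch flags the cluster as heterogeneous at a threshold of 0.15.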
getPhyloTree is a function that reports a phylogenetic tree of clusters based on admixture analysis.
The phylogenetic tree is a neighbor-joining tree constructed from a similarity matrix between clusters.
Given multiple Q matrices with varying numbers of ancestors (K), the framework defines the similarity value between clusters i and j as the minimum number K at which the majority of members of the two clusters fall into different ancestor groups.
This K reflects the minimum number of ancestors needed to split clusters i and j into different clusters when K clusters are assigned based on the maximum admixture ratio of each individual.
getPhyloTree(QmatList, indexClsVec)
QmatList |
is list of Q matrix where |
indexClsVec |
is a vector of clustering assignment where |
This function returns an object of nj tree as well as a matrix minDiffAncestorClsMat that is used as a similarity matrix.
tree |
is an object of nj tree calculated by ape::nj() function on a dissimilarity version of |
minDiffAncestorClsMat |
is a minimum-ancestor-number matrix in the group level where |
minDiffAncestorMat |
is a minimum-ancestor-number matrix in the individual level where |
# Running ipADMIXTURE on Q matrices (K=2-12) of 27 human population dataset.
h27pop_obj <- ipADMIXTURE(Qmat = ipADMIXTURE::human27pop_Qmat[[11]], admixRatioThs = 0.15)
out <- ipADMIXTURE::getPhyloTree(ipADMIXTURE::human27pop_Qmat, h27pop_obj$indexClsVec)
plot(out$tree)
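The minimum-K similarity between two clusters, as described for getPhyloTree, might be computed along the following lines. This is a base-R sketch under stated assumptions, not the package's code: the helper `min_separating_K` is hypothetical, "majority" is interpreted as the most common maximum-ratio ancestor within a cluster, and `QmatList` is assumed to be ordered by increasing K.

```r
# Hypothetical helper (NOT part of ipADMIXTURE): smallest K at which the
# most common ancestor of cluster a differs from that of cluster b.
min_separating_K <- function(QmatList, indexClsVec, a, b) {
  for (Q in QmatList) {  # assumed sorted by increasing number of ancestors
    # Ancestor with the maximum admixture ratio for each individual.
    top <- max.col(Q, ties.method = "first")
    majA <- as.integer(names(which.max(table(top[indexClsVec == a]))))
    majB <- as.integer(names(which.max(table(top[indexClsVec == b]))))
    if (majA != majB) return(ncol(Q))
  }
  NA_integer_  # never separated within the supplied range of K
}

# Toy data: 6 individuals in 3 clusters, Q matrices for K = 2 and K = 3.
Q2 <- rbind(c(.9, .1), c(.8, .2), c(.7, .3), c(.6, .4), c(.1, .9), c(.2, .8))
Q3 <- rbind(c(.8, .1, .1), c(.7, .2, .1), c(.2, .1, .7),
            c(.3, .1, .6), c(.1, .8, .1), c(.1, .7, .2))
cls <- c(1, 1, 2, 2, 3, 3)

k13 <- min_separating_K(list(Q2, Q3), cls, 1, 3)  # separated already at K = 2
k12 <- min_separating_K(list(Q2, Q3), cls, 1, 2)  # separated only at K = 3
```

In this toy case clusters 1 and 2 need one more ancestor than clusters 1 and 3 before they separate, which is the kind of ordering the similarity matrix is meant to capture.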
Labels of 27 human populations
human27pop_labels
Labels of 27 human populations:
It is a vector of labels of 544 individuals. There are 27 populations.
...
A dataset containing admixture ratios of 544 individuals from 27 human populations where the number of ancestors ranges from 2 to 12. This dataset is the result of running the ADMIXTURE software (Zhou, H., et al. (2011). A quasi-Newton acceleration for high-dimensional optimization algorithms. Statistics and Computing, 21(2), 261-273) on the 27-human-population dataset published by Xing, J., Watkins, W. S. et al. (2009). Fine-scaled human genetic structure revealed by SNP microarrays. Genome Research, 19(5), 815-825.
human27pop_Qmat
A list of Q matrices of 544 individuals from 27 human populations. The number of ancestors in the list ranges from 2 to 12.
It is a list of Q matrices that contain admixture ratios of 544 individuals from the 27-population human dataset.
human27pop_Qmat[[k]][i,j] is the admixture ratio of the jth ancestor for the ith individual in the (k+1)-ancestor Q matrix.
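Because the list starts at two ancestors, the list index is offset by one from K. The sketch below shows the convention on a stand-in list with the same layout. `toy_Qmat` is synthetic random data invented for illustration, not the packaged dataset.

```r
# Stand-in for human27pop_Qmat: a list of random Q matrices for K = 2..12.
set.seed(1)
n <- 5  # number of toy individuals
toy_Qmat <- lapply(2:12, function(K) {
  m <- matrix(runif(n * K), nrow = n)
  m / rowSums(m)  # normalise so each individual's ratios sum to 1
})

K <- 12
Q12 <- toy_Qmat[[K - 1]]  # element k holds the (k+1)-ancestor matrix
ncol(Q12)                 # number of ancestors in this matrix
Q12[1, 3]                 # ratio of the 3rd ancestor for the 1st individual
```

This matches the package examples, which use `human27pop_Qmat[[11]]` to select the K = 12 matrix.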
...
A data clustering package based on admixture ratios (Q matrix) of population structure.
The framework is based on an iterative pruning procedure that clusters data by splitting a given population into subclusters until a stopping criterion is met, in the same way as the ipPCA, iNJclust, and IPCAPS frameworks. The package also provides a function that builds a phylogenetic tree: a neighbor-joining tree constructed from a similarity matrix between clusters. Given multiple Q matrices with varying numbers of ancestors (K), the framework defines the similarity value between clusters i and j as the minimum number K at which the majority of members of the two clusters fall into different clusters. This K reflects the minimum number of ancestors needed to split clusters i and j into different clusters when K clusters are assigned based on the maximum admixture ratio of each individual.
ipADMIXTURE(Qmat, admixRatioThs, method = "average")
Qmat |
is a Q matrix that contains admixture ratios of all individuals where the |
admixRatioThs |
is a threshold to determine that if a cluster has |
method |
is a method parameter of |
This function returns clustering results in a form of an object of ipADMIXTURE class. The object contains the following items.
indexClsVec |
is a vector of clustering assignment where |
homoClusters |
is a list of cluster objects where each object contains member indices, cluster's |
maxDiffAdmixRatioVec |
is a vector of |
Qmat |
is a Q matrix that contains admixture ratios of all individuals where the |
admixRatioThs |
is a threshold to determine that if a cluster has |
Chainarong Amornbunchornvej, [email protected]
# Running ipADMIXTURE on Q matrix of 27 human population dataset where K = 12
h27pop_obj <- ipADMIXTURE(Qmat = ipADMIXTURE::human27pop_Qmat[[11]], admixRatioThs = 0.15)
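The iterative pruning idea (keep splitting a cluster in two until it no longer looks heterogeneous) can be sketched in base R as follows. This is a simplified illustration, not the package's algorithm: the helper `iterative_prune_sketch` is hypothetical, and the stopping rule used here (stop when the largest mean admixture difference between the two halves falls below `admixRatioThs`) is an assumption. The package's actual criterion may differ.

```r
# Hypothetical sketch (NOT the package's ipADMIXTURE): repeatedly split
# clusters in two until every remaining cluster looks homogeneous.
iterative_prune_sketch <- function(Qmat, admixRatioThs, method = "average") {
  indexClsVec <- integer(nrow(Qmat))  # final cluster ID for each individual
  nextId <- 0L
  queue <- list(seq_len(nrow(Qmat)))  # clusters still to be examined
  while (length(queue) > 0) {
    members <- queue[[1]]
    queue <- queue[-1]
    split <- NULL
    if (length(members) >= 2) {
      sub <- Qmat[members, , drop = FALSE]
      halves <- stats::cutree(stats::hclust(stats::dist(sub), method = method), k = 2)
      gap <- max(abs(colMeans(sub[halves == 1, , drop = FALSE]) -
                     colMeans(sub[halves == 2, , drop = FALSE])))
      # Assumed stopping rule: only keep the split if the gap reaches the threshold.
      if (gap >= admixRatioThs) split <- halves
    }
    if (is.null(split)) {
      nextId <- nextId + 1L           # homogeneous: becomes a final cluster
      indexClsVec[members] <- nextId
    } else {
      queue <- c(queue, list(members[split == 1]), list(members[split == 2]))
    }
  }
  indexClsVec
}

# Toy Q matrix with three clear groups of three individuals each.
Q <- rbind(
  c(0.90, 0.05, 0.05), c(0.88, 0.07, 0.05), c(0.92, 0.04, 0.04),
  c(0.05, 0.90, 0.05), c(0.06, 0.88, 0.06), c(0.04, 0.92, 0.04),
  c(0.05, 0.05, 0.90), c(0.07, 0.05, 0.88), c(0.04, 0.04, 0.92)
)
cls <- iterative_prune_sketch(Q, admixRatioThs = 0.15)
table(cls)
```

The sketch processes clusters from a queue rather than recursing, so the order in which cluster IDs are assigned depends on the split order and carries no meaning.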
plotAdmixClusters is a function that plots admixture ratios, where the x axis represents individuals with cluster labels and the y axis represents admixture ratios.
plotAdmixClusters(obj)
obj |
is an object of ipADMIXTURE class. |
h27pop_obj <- ipADMIXTURE(Qmat = ipADMIXTURE::human27pop_Qmat[[11]], admixRatioThs = 0.15)
ipADMIXTURE::plotAdmixClusters(h27pop_obj)
plotClusterLeaves is a function that plots clusters as a treemap. Subsquares represent clusters. Each subsquare contains the cluster label (ID), the number of members (N), and the maximum magnitude-difference of admixture ratios (md). The size of each subsquare represents the cluster's share of members relative to the other clusters. The color represents the md value of the cluster.
plotClusterLeaves(obj)
obj |
is an object of ipADMIXTURE class. |
h27pop_obj <- ipADMIXTURE(Qmat = ipADMIXTURE::human27pop_Qmat[[11]], admixRatioThs = 0.15)
ipADMIXTURE::plotClusterLeaves(h27pop_obj)
printClustersFromLabels is a function that reports clustering results in text mode.
printClustersFromLabels(obj, labels)
obj |
is an object of ipADMIXTURE class. |
labels |
is a vector of labels of all individuals. |
h27pop_obj <- ipADMIXTURE(Qmat = ipADMIXTURE::human27pop_Qmat[[11]], admixRatioThs = 0.15)
ipADMIXTURE::printClustersFromLabels(h27pop_obj, ipADMIXTURE::human27pop_labels)
A dataset containing admixture ratios of 1200 individuals from 20 simulated populations where the number of ancestors ranges from 2 to 18. This dataset is the result of running the LEA library (Frichot, E., & François, O. (2015). LEA: An R package for landscape and ecological association studies. Methods in Ecology and Evolution, 6(8), 925-929) on the 20-simulation-population dataset published by Limpiti, T., et al. (2014). iNJclust: iterative neighbor-joining tree clustering framework for inferring population structure. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 11(5), 903-914.
UD1_Qmat
A list of Q matrices of 1200 individuals from 20 populations. The number of ancestors in the Q matrices ranges from 2 to 18.
It is a list of Q matrices that contain admixture ratios of 1200 individuals from the 20-population dataset.
UD1_Qmat[[k]][i,j] is the admixture ratio of the jth ancestor for the ith individual in the (k+1)-ancestor Q matrix.
...
Labels of 20 simulation populations
UD1labels
Labels of 20 populations:
It is a vector of labels of 1200 individuals. There are 20 populations.
...