In the first step, we generate a simple dataset in which C1 and C2 are dominated by C3, C3 is dominated by C4, and C4 is dominated by C5. There is no dominant-distribution relation between C1 and C2.
# Simulation section
nInv<-100
initMean=10
stepMean=20
std=8
simData1<-list() # container for the Values and Group vectors
simData1$Values<-rnorm(nInv,mean=initMean,sd=std)
simData1$Group<-rep(c("C1"),times=nInv)
simData1$Values<-c(simData1$Values,rnorm(nInv,mean=initMean,sd=std) )
simData1$Group<-c(simData1$Group,rep(c("C2"),times=nInv))
simData1$Values<-c(simData1$Values,rnorm(nInv,mean=initMean+2*stepMean,sd=std) )
simData1$Group<-c(simData1$Group,rep(c("C3"),times=nInv) )
simData1$Values<-c(simData1$Values,rnorm(nInv,mean=initMean+3*stepMean,sd=std) )
simData1$Group<-c(simData1$Group, rep(c("C4"),times=nInv) )
simData1$Values<-c(simData1$Values,rnorm(nInv,mean=initMean+4*stepMean,sd=std) )
simData1$Group<-c(simData1$Group, rep(c("C5"),times=nInv) )
The framework is used to analyze the data below.
library(EDOIF)
## Loading required package: boot
# parameter setting
bootT=1000 # Number of times of sampling with replacement
alpha=0.05 # significance level
#======= input
Values=simData1$Values
Group=simData1$Group
#=============
A1<-EDOIF(Values,Group,bootT = bootT, alpha=alpha )
We print the result of our framework below.
print(A1)
## EDOIF (Empirical Distribution Ordering Inference Framework)
## =======================================================
## Alpha = 0.050000, Number of bootstrap resamples = 1000, CI type = perc
## Using Mann-Whitney test to report whether A ≺ B
## A dominant-distribution network density:0.900000
## Distribution: C1
## Mean:10.390372 95CI:[ 8.566748,12.156405]
## Distribution: C2
## Mean:10.561377 95CI:[ 9.016444,12.057158]
## Distribution: C3
## Mean:48.775060 95CI:[ 47.284140,50.350988]
## Distribution: C4
## Mean:70.211303 95CI:[ 68.734203,71.754436]
## Distribution: C5
## Mean:89.758513 95CI:[ 88.130031,91.277694]
## =======================================================
## Mean difference of C2 (n=100) minus C1 (n=100): C1 ⊀ C2
## :p-val 0.4548
## Mean Diff:0.171005 95CI:[ -2.292575,2.582627]
##
## Mean difference of C3 (n=100) minus C1 (n=100): C1 ≺ C3
## :p-val 0.0000
## Mean Diff:38.384688 95CI:[ 35.964144,40.613780]
##
## Mean difference of C4 (n=100) minus C1 (n=100): C1 ≺ C4
## :p-val 0.0000
## Mean Diff:59.820931 95CI:[ 57.642098,62.198030]
##
## Mean difference of C5 (n=100) minus C1 (n=100): C1 ≺ C5
## :p-val 0.0000
## Mean Diff:79.368142 95CI:[ 77.005736,81.829558]
##
## Mean difference of C3 (n=100) minus C2 (n=100): C2 ≺ C3
## :p-val 0.0000
## Mean Diff:38.213683 95CI:[ 36.000088,40.254205]
##
## Mean difference of C4 (n=100) minus C2 (n=100): C2 ≺ C4
## :p-val 0.0000
## Mean Diff:59.649926 95CI:[ 57.293507,61.787610]
##
## Mean difference of C5 (n=100) minus C2 (n=100): C2 ≺ C5
## :p-val 0.0000
## Mean Diff:79.197137 95CI:[ 77.147087,81.414555]
##
## Mean difference of C4 (n=100) minus C3 (n=100): C3 ≺ C4
## :p-val 0.0000
## Mean Diff:21.436243 95CI:[ 19.378245,23.564621]
##
## Mean difference of C5 (n=100) minus C3 (n=100): C3 ≺ C5
## :p-val 0.0000
## Mean Diff:40.983454 95CI:[ 38.764582,43.173473]
##
## Mean difference of C5 (n=100) minus C4 (n=100): C4 ≺ C5
## :p-val 0.0000
## Mean Diff:19.547211 95CI:[ 17.478842,21.703990]
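The output above reports percentile ("perc") bootstrap confidence intervals for each mean difference. As a rough illustration of that idea, the following base-R sketch computes a percentile bootstrap interval for the difference of two group means. The helper `bootMeanDiff` and the simulated vectors `x` and `y` are assumptions made for this example; this is not EDOIF's internal implementation, which relies on the boot package.

```r
# A minimal sketch of a percentile bootstrap CI for mean(y) - mean(x).
# bootMeanDiff is a hypothetical helper, not part of EDOIF.
set.seed(1)
x <- rnorm(100, mean = 10, sd = 8)  # stands in for a group such as C1
y <- rnorm(100, mean = 50, sd = 8)  # stands in for a group such as C3

bootMeanDiff <- function(x, y, bootT = 1000, alpha = 0.05) {
  # Resample each group with replacement and record the difference of means.
  diffs <- replicate(
    bootT,
    mean(sample(y, replace = TRUE)) - mean(sample(x, replace = TRUE))
  )
  # The percentile interval takes the empirical alpha/2 and 1 - alpha/2 quantiles.
  ci <- quantile(diffs, probs = c(alpha / 2, 1 - alpha / 2), names = FALSE)
  list(meanDiff = mean(y) - mean(x), lower = ci[1], upper = ci[2])
}

res <- bootMeanDiff(x, y)
print(res)
```

An interval that excludes zero, as in most of the pairs above, indicates that the two means differ at the chosen level; an interval that contains zero, as for C2 minus C1, does not.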
The first plot shows the mean-difference confidence intervals.
The second plot shows the mean confidence intervals.
The third plot shows the dominant-distribution network.
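The network density of 0.9 printed in the output above appears consistent with a simple count: nine of the ten unordered pairs of groups were reported as A ≺ B. The sketch below reproduces that arithmetic. The definition used here (number of dominance edges divided by the number of pairs) is an assumption that matches the printed value; the source does not state the formula.

```r
# Reproduce the reported density, assuming density = edges / number of pairs.
groups <- c("C1", "C2", "C3", "C4", "C5")
pairs <- t(combn(groups, 2))  # the 10 unordered pairs, one per row

# Verdicts copied from the printed output (TRUE means the first is dominated).
dominated <- c(FALSE, TRUE, TRUE, TRUE,  # C1 vs C2, C3, C4, C5
               TRUE, TRUE, TRUE,         # C2 vs C3, C4, C5
               TRUE, TRUE,               # C3 vs C4, C5
               TRUE)                     # C4 vs C5

netDensity <- sum(dominated) / nrow(pairs)
print(netDensity)  # 0.9
```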
Next, we generate a more complicated dataset of mixture distributions. C1, C2, C3, and C4 are dominated by C5. There is no dominant-distribution relation among C1, C2, C3, and C4.
library(EDOIF)
# parameter setting
bootT=1000
alpha=0.05
nInv<-1200
start_time <- Sys.time()
#======= input
simData3<-SimNonNormalDist(nInv=nInv,noisePer=0.01)
Values=simData3$Values
Group=simData3$Group
#=============
A3<-EDOIF(Values,Group, bootT=bootT, alpha=alpha, methodType ="perc")
A3
## EDOIF (Empirical Distribution Ordering Inference Framework)
## =======================================================
## Alpha = 0.050000, Number of bootstrap resamples = 1000, CI type = perc
## Using Mann-Whitney test to report whether A ≺ B
## A dominant-distribution network density:0.600000
## Distribution: C4
## Mean:80.478760 95CI:[ 75.004497,85.388057]
## Distribution: C2
## Mean:80.536030 95CI:[ 77.542143,83.300424]
## Distribution: C1
## Mean:82.375342 95CI:[ 80.384265,84.281310]
## Distribution: C3
## Mean:83.275574 95CI:[ 81.821495,84.734183]
## Distribution: C5
## Mean:140.263415 95CI:[ 136.659268,144.460782]
## =======================================================
## Mean difference of C2 (n=1200) minus C4 (n=1200): C4 ⊀ C2
## :p-val 0.3794
## Mean Diff:0.057270 95CI:[ -5.672094,6.455249]
##
## Mean difference of C1 (n=1200) minus C4 (n=1200): C4 ⊀ C1
## :p-val 0.7169
## Mean Diff:1.896582 95CI:[ -3.317803,7.411728]
##
## Mean difference of C3 (n=1200) minus C4 (n=1200): C4 ≺ C3
## :p-val 0.0465
## Mean Diff:2.796814 95CI:[ -2.250009,8.417247]
##
## Mean difference of C5 (n=1200) minus C4 (n=1200): C4 ≺ C5
## :p-val 0.0000
## Mean Diff:59.784655 95CI:[ 53.784514,66.029768]
##
## Mean difference of C1 (n=1200) minus C2 (n=1200): C2 ⊀ C1
## :p-val 0.8116
## Mean Diff:1.839312 95CI:[ -1.862882,5.300996]
##
## Mean difference of C3 (n=1200) minus C2 (n=1200): C2 ⊀ C3
## :p-val 0.0843
## Mean Diff:2.739544 95CI:[ -0.421932,5.995455]
##
## Mean difference of C5 (n=1200) minus C2 (n=1200): C2 ≺ C5
## :p-val 0.0000
## Mean Diff:59.727386 95CI:[ 55.160493,64.723166]
##
## Mean difference of C3 (n=1200) minus C1 (n=1200): C1 ≺ C3
## :p-val 0.0109
## Mean Diff:0.900232 95CI:[ -1.660579,3.420197]
##
## Mean difference of C5 (n=1200) minus C1 (n=1200): C1 ≺ C5
## :p-val 0.0000
## Mean Diff:57.888074 95CI:[ 53.867889,62.440454]
##
## Mean difference of C5 (n=1200) minus C3 (n=1200): C3 ≺ C5
## :p-val 0.0000
## Mean Diff:56.987842 95CI:[ 52.569633,61.048346]
## Time difference of 3.007787 secs
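The output states that a Mann-Whitney test is used to report whether A ≺ B. A one-sided version of that test is available in base R as `wilcox.test`. The sketch below shows how such a decision could be made for two samples; the vectors `a` and `b` are simulated for illustration, and the exact call and any multiple-comparison handling inside EDOIF are not shown in the source, so this should be read as an assumption rather than the package's procedure.

```r
# A minimal sketch of a one-sided Mann-Whitney (Wilcoxon rank-sum) decision.
set.seed(2)
a <- rnorm(100, mean = 10, sd = 8)
b <- rnorm(100, mean = 30, sd = 8)
alpha <- 0.05

# alternative = "less" tests whether the first sample is shifted below the second.
pAB <- wilcox.test(a, b, alternative = "less")$p.value
pBA <- wilcox.test(b, a, alternative = "less")$p.value

cat("a dominated by b:", pAB < alpha, "\n")
cat("b dominated by a:", pBA < alpha, "\n")
```

Because the test is one-sided, a large p-value such as the 0.8116 reported for C1 minus C2 only means that there is no evidence for that direction of dominance.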
Generating distributions in which \(A\) dominates \(B\) under different degrees of uniform noise
library(ggplot2)
nInv<-1000
simData3<-SimNonNormalDist(nInv=nInv,noisePer=0.01)
#plot(density(simData3$V3))
dat <- data.frame(dens = c(simData3$V3, simData3$V5),
                  lines = rep(c("B", "A"), each = nInv))
# Plot the two densities.
p1 <- ggplot(dat, aes(x = dens, fill = lines)) +
  geom_density(alpha = 0.5) +
  xlim(-400, 400) + ylim(0, 0.07) +
  ylab("Density [0,1]") + xlab("Values") +
  theme(axis.text.x = element_text(face = "bold", size = 12))
theme_update(text = element_text(face="bold", size=12) )
p1$labels$fill<-"Categories"
plot(p1)
## Warning: Removed 2 rows containing non-finite outside the scale range
## (`stat_density()`).
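To give a feel for what "different degrees of uniform noise" could mean, the following sketch replaces a fraction `noisePer` of a sample with uniformly distributed values. The helper `addUniformNoise`, the noise range of -400 to 400 (borrowed from the plot limits above), and the normal base sample are all assumptions made for illustration; this is not the implementation of `SimNonNormalDist`.

```r
# A hypothetical illustration of mixing uniform noise into a sample.
# addUniformNoise is NOT part of EDOIF; the range [lo, hi] is an assumption.
set.seed(3)
addUniformNoise <- function(x, noisePer, lo = -400, hi = 400) {
  nNoise <- round(length(x) * noisePer)        # number of values to replace
  idx <- sample(length(x), nNoise)             # positions chosen at random
  x[idx] <- runif(nNoise, min = lo, max = hi)  # overwrite with uniform noise
  x
}

clean <- rnorm(1000, mean = 80, sd = 16)
for (noisePer in c(0.01, 0.05, 0.20)) {
  noisy <- addUniformNoise(clean, noisePer)
  cat(sprintf("noisePer=%.2f sd=%.2f\n", noisePer, sd(noisy)))
}
```

Larger noise fractions widen the tails of the sample, which is the kind of contamination a rank-based test and bootstrap intervals are meant to tolerate.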