In the first step, we generate a simple dataset. where C1 and C2 are dominated by C3, C3 is dominated by C4, and is C4 dominated by C5. There is no dominant-distribution relation between C1 and C2.
# Simulation section
nInv<-100
initMean=10
stepMean=20
std=8
simData1<-c()
simData1$Values<-rnorm(nInv,mean=initMean,sd=std)
simData1$Group<-rep(c("C1"),times=nInv)
simData1$Values<-c(simData1$Values,rnorm(nInv,mean=initMean,sd=std) )
simData1$Group<-c(simData1$Group,rep(c("C2"),times=nInv))
simData1$Values<-c(simData1$Values,rnorm(nInv,mean=initMean+2*stepMean,sd=std) )
simData1$Group<-c(simData1$Group,rep(c("C3"),times=nInv) )
simData1$Values<-c(simData1$Values,rnorm(nInv,mean=initMean+3*stepMean,sd=std) )
simData1$Group<-c(simData1$Group, rep(c("C4"),times=nInv) )
simData1$Values<-c(simData1$Values,rnorm(nInv,mean=initMean+4*stepMean,sd=std) )
simData1$Group<-c(simData1$Group, rep(c("C5"),times=nInv) )
The framework is used to analyze the data below.
## Loading required package: boot
# parameter setting
bootT=1000 # Number of times of sampling with replacement
alpha=0.05 # significance significance level
#======= input
Values=simData1$Values
Group=simData1$Group
#=============
A1<-EDOIF(Values,Group,bootT = bootT, alpha=alpha )
We print the result of our framework below.
## EDOIF (Empirical Distribution Ordering Inference Framework)
## =======================================================
## Alpha = 0.050000, Number of bootstrap resamples = 1000, CI type = perc
## Using Mann-Whitney test to report whether A ≺ B
## A dominant-distribution network density:0.900000
## Distribution: C2
## Mean:9.908279 95CI:[ 8.233366,11.511307]
## Distribution: C1
## Mean:10.316297 95CI:[ 8.585959,12.143803]
## Distribution: C3
## Mean:48.253921 95CI:[ 46.594943,49.852309]
## Distribution: C4
## Mean:71.292256 95CI:[ 69.917655,72.529361]
## Distribution: C5
## Mean:90.798027 95CI:[ 89.235872,92.461153]
## =======================================================
## Mean difference of C1 (n=100) minus C2 (n=100): C2 ⊀ C1
## :p-val 0.4040
## Mean Diff:0.408018 95CI:[ -2.038711,2.951707]
##
## Mean difference of C3 (n=100) minus C2 (n=100): C2 ≺ C3
## :p-val 0.0000
## Mean Diff:38.345642 95CI:[ 35.949213,40.661435]
##
## Mean difference of C4 (n=100) minus C2 (n=100): C2 ≺ C4
## :p-val 0.0000
## Mean Diff:61.383977 95CI:[ 59.209313,63.620465]
##
## Mean difference of C5 (n=100) minus C2 (n=100): C2 ≺ C5
## :p-val 0.0000
## Mean Diff:80.889749 95CI:[ 78.524552,83.308901]
##
## Mean difference of C3 (n=100) minus C1 (n=100): C1 ≺ C3
## :p-val 0.0000
## Mean Diff:37.937624 95CI:[ 35.705174,40.331144]
##
## Mean difference of C4 (n=100) minus C1 (n=100): C1 ≺ C4
## :p-val 0.0000
## Mean Diff:60.975959 95CI:[ 58.777147,63.036500]
##
## Mean difference of C5 (n=100) minus C1 (n=100): C1 ≺ C5
## :p-val 0.0000
## Mean Diff:80.481730 95CI:[ 78.111806,82.756794]
##
## Mean difference of C4 (n=100) minus C3 (n=100): C3 ≺ C4
## :p-val 0.0000
## Mean Diff:23.038335 95CI:[ 20.935462,25.142604]
##
## Mean difference of C5 (n=100) minus C3 (n=100): C3 ≺ C5
## :p-val 0.0000
## Mean Diff:42.544106 95CI:[ 40.239515,44.844809]
##
## Mean difference of C5 (n=100) minus C4 (n=100): C4 ≺ C5
## :p-val 0.0000
## Mean Diff:19.505772 95CI:[ 17.390359,21.637649]
The first plot is the plot of mean-difference confidence intervals
The second plot is the plot of mean confidence intervals
The third plot is a dominant-distribution network.
We generate more complicated dataset of mixture distributions. C1, C2, C3, and C4 are dominated by C5. There is no dominant-distribution relation among C1, C2, C3, and C4.
library(EDOIF)
# parameter setting
bootT=1000
alpha=0.05
nInv<-1200
start_time <- Sys.time()
#======= input
simData3<-SimNonNormalDist(nInv=nInv,noisePer=0.01)
Values=simData3$Values
Group=simData3$Group
#=============
A3<-EDOIF(Values,Group, bootT=bootT, alpha=alpha, methodType ="perc")
A3
## EDOIF (Empirical Distribution Ordering Inference Framework)
## =======================================================
## Alpha = 0.050000, Number of bootstrap resamples = 1000, CI type = perc
## Using Mann-Whitney test to report whether A ≺ B
## A dominant-distribution network density:0.400000
## Distribution: C3
## Mean:80.745709 95CI:[ 73.558131,86.228265]
## Distribution: C1
## Mean:81.221504 95CI:[ 79.109833,83.145510]
## Distribution: C2
## Mean:82.726136 95CI:[ 80.973854,84.414111]
## Distribution: C4
## Mean:84.074850 95CI:[ 80.000069,89.452124]
## Distribution: C5
## Mean:139.373236 95CI:[ 136.120341,142.142562]
## =======================================================
## Mean difference of C1 (n=1200) minus C3 (n=1200): C3 ⊀ C1
## :p-val 0.4237
## Mean Diff:0.475795 95CI:[ -5.397095,8.138872]
##
## Mean difference of C2 (n=1200) minus C3 (n=1200): C3 ⊀ C2
## :p-val 0.4210
## Mean Diff:1.980427 95CI:[ -3.670707,9.750068]
##
## Mean difference of C4 (n=1200) minus C3 (n=1200): C3 ⊀ C4
## :p-val 0.8186
## Mean Diff:3.329142 95CI:[ -3.965649,12.651677]
##
## Mean difference of C5 (n=1200) minus C3 (n=1200): C3 ≺ C5
## :p-val 0.0000
## Mean Diff:58.627527 95CI:[ 52.286294,67.269674]
##
## Mean difference of C2 (n=1200) minus C1 (n=1200): C1 ⊀ C2
## :p-val 0.4919
## Mean Diff:1.504632 95CI:[ -1.319430,4.352950]
##
## Mean difference of C4 (n=1200) minus C1 (n=1200): C1 ⊀ C4
## :p-val 0.8608
## Mean Diff:2.853347 95CI:[ -1.587534,8.926942]
##
## Mean difference of C5 (n=1200) minus C1 (n=1200): C1 ≺ C5
## :p-val 0.0000
## Mean Diff:58.151732 95CI:[ 54.060740,61.967736]
##
## Mean difference of C4 (n=1200) minus C2 (n=1200): C2 ⊀ C4
## :p-val 0.8705
## Mean Diff:1.348715 95CI:[ -2.891923,6.957193]
##
## Mean difference of C5 (n=1200) minus C2 (n=1200): C2 ≺ C5
## :p-val 0.0000
## Mean Diff:56.647100 95CI:[ 53.014285,60.303282]
##
## Mean difference of C5 (n=1200) minus C4 (n=1200): C4 ≺ C5
## :p-val 0.0000
## Mean Diff:55.298385 95CI:[ 49.058590,60.547991]
## Time difference of 3.202724 secs
Generating A dominates B with different degrees of uniform noise
library(ggplot2)
nInv<-1000
simData3<-SimNonNormalDist(nInv=nInv,noisePer=0.01)
#plot(density(simData3$V3))
dat <- data.frame(dens = c(simData3$V3, simData3$V5)
, lines = rep(c("B", "A"), each = nInv))
#Plot.
p1<-ggplot(dat, aes(x = dens, fill = lines)) + geom_density(alpha = 0.5) +xlim(-400, 400)+ ylim(0, 0.07) + ylab("Density [0,1]") +xlab("Values") + theme( axis.text.x = element_text(face="bold",
size=12) )
theme_update(text = element_text(face="bold", size=12) )
p1$labels$fill<-"Categories"
plot(p1)
## Warning: Removed 1 row containing non-finite outside the scale range
## (`stat_density()`).