MDL Multiresolution Linear Regression Framework

In this work, we provide a framework for analyzing multiresolution partitions (e.g. country, province, subdistrict), where each individual data point belongs to exactly one partition in each layer (e.g. individual i belongs to subdistrict A, province P, and country Q).

We assume that a partition in a higher layer subsumes partitions in the lower layers (e.g. a nation at the 1st layer subsumes all provinces at the 2nd layer).

We are given N individuals, each with a pair of real values (x, y) generated from an independent variable X and a dependent variable Y. Each individual i belongs to exactly one partition per layer.

Our goal is to find the partitions at the highest possible layer such that all individuals within each partition share the same linear model Y = f(X), where f is a linear function.
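To make the notion of "sharing the same linear model" concrete, the following is a minimal base-R sketch on toy data. It does not use MRReg or its MDL criterion; the data, the names `subdistrict`, `pooled`, and `perSub`, and the line y = 2x + 1 are assumptions made only for illustration. When two lower-layer partitions follow the same line, the model fitted on their union and the models fitted on each partition separately have nearly the same coefficients.

```r
set.seed(1)

# Toy data (hypothetical): two subdistricts, A and B, inside one province.
n <- 100
x <- rnorm(2 * n)
subdistrict <- rep(c("A", "B"), each = n)

# Both subdistricts follow the same line y = 2x + 1 plus small noise.
y <- 2 * x + 1 + rnorm(2 * n, sd = 0.1)

# One model for the whole province (the higher-layer partition).
pooled <- lm(y ~ x)

# One model per subdistrict (the lower-layer partitions).
perSub <- lapply(split(data.frame(x, y), subdistrict),
                 function(d) lm(y ~ x, data = d))

coef(pooled)          # intercept and slope of the pooled model
sapply(perSub, coef)  # one column of coefficients per subdistrict
```

Here the province-level model describes both subdistricts about as well as the separate models do, so the province would be the natural partition to report. The framework formalizes this comparison with information-based ratios rather than by inspecting coefficients.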

Explanation: FindMaxHomoOptimalPartitions(DataT,gamma)

INPUT: DataT$X[i,j] is the value of the jth independent variable of the ith individual.
INPUT: DataT$Y[i] is the value of the dependent variable of the ith individual.
INPUT: DataT$clsLayer[i,k] is the cluster label of the ith individual in the kth cluster layer.
OUTPUT: out$Copt[p,1] = k means that the pth member of the maximal homogeneous partition is a cluster at the kth layer, and out$Copt[p,2] is the name of that cluster in the kth layer.
OUTPUT: out$Copt[p,3] is the “Model Information Reduction Ratio” of the pth member of the maximal homogeneous partition: a positive value means the linear model is better than the null model.
OUTPUT: out$Copt[p,4] is η(C)_cv of the pth member of the maximal homogeneous partition. The greater out$Copt[p,4], the more homogeneous the cluster.
OUTPUT: out$models[[k]][[j]] is the linear regression model of the jth cluster in the kth layer.
OUTPUT: out$models[[k]][[j]]$clustInfoRecRatio is the “Cluster Information Reduction Ratio” between the jth cluster in the kth layer and its child clusters in the (k+1)th layer: a positive value means the current cluster is better than its children. Hence, we should keep this cluster as a member of the maximal homogeneous partition instead of its children.
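The input layout described above can be sketched in base R as a list with three components. This is an illustration under stated assumptions, not the package's own constructor: the name `toyData`, the integer cluster labels, and the sizes are hypothetical, and the object returned by SimpleSimulation may carry additional fields or use a different label type. The sketch also checks the nesting assumption, namely that every cluster in layer k+1 lies inside exactly one cluster of layer k.

```r
set.seed(42)

N <- 8                                          # number of individuals
X <- matrix(rnorm(N * 2), nrow = N, ncol = 2)   # X[i, j]: jth variable of individual i
Y <- 3 * X[, 1] + rnorm(N, sd = 0.1)            # Y[i]: dependent variable of individual i

# clsLayer[i, k]: cluster label of individual i in layer k (layer 1 is the coarsest).
# Integer labels are an assumption made for this sketch.
clsLayer <- cbind(
  rep(1, N),            # layer 1: a single cluster containing everyone
  rep(1:2, each = 4),   # layer 2: two clusters
  rep(1:4, each = 2)    # layer 3: four clusters
)

toyData <- list(X = X, Y = Y, clsLayer = clsLayer)

# Nesting check: each cluster of layer k+1 must map to exactly one cluster of layer k.
isNested <- all(sapply(seq_len(ncol(clsLayer) - 1), function(k) {
  all(tapply(clsLayer[, k], clsLayer[, k + 1],
             function(parent) length(unique(parent)) == 1))
}))
isNested
```

Running a check of this kind before calling FindMaxHomoOptimalPartitions can catch label matrices in which a lower-layer cluster straddles two higher-layer clusters, which would violate the subsumption assumption.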

library(MRReg)

## Loading required package: caret

## Loading required package: ggplot2

## Loading required package: lattice

# Generate simulation data type 4 by having 100 individuals per homogeneous partition.
DataT<-SimpleSimulation(100,type=4)

gamma <- 0.05 # Gamma parameter

out<-FindMaxHomoOptimalPartitions(DataT,gamma)

## Warning in summary.lm(submodels[[inx2]]): essentially perfect fit: summary may
## be unreliable

# Plotting optimal homogeneous tree

The red nodes are homogeneous partitions. All children of a homogeneous partition node share the same linear model.

plotOptimalClustersTree(out)

# Printing optimal homogeneous partitions

Selected features: 1 is reserved for the intercept, and d is a selected feature if Y[i] ~ X[i,d-1] in the linear model. Note that the clustInfoRecRatio values are always NA for last-layer partitions.

PrintOptimalClustersResult(out, selFeature = TRUE)

## [1] "========== List of Optimal Clusters =========="
## [1] "Layer2,ClS-C1:clustInfoRecRatio=0.08,modelInfoRecRatio=0.73, eta(C)cv=1.00"
## [1] "Selected features"
## [1] 2
## [1] "Layer3,ClS-C11:clustInfoRecRatio=0.10,modelInfoRecRatio=0.70, eta(C)cv=1.00"
## [1] "Selected features"
## [1] 2
## [1] "Layer3,ClS-C12:clustInfoRecRatio=0.10,modelInfoRecRatio=0.70, eta(C)cv=1.00"
## [1] "Selected features"
## [1] 3
## [1] "Layer3,ClS-C13:clustInfoRecRatio=0.09,modelInfoRecRatio=0.52, eta(C)cv=1.00"
## [1] "Selected features"
## [1] 4
## [1] "Layer3,ClS-C14:clustInfoRecRatio=0.10,modelInfoRecRatio=0.61, eta(C)cv=1.00"
## [1] "Selected features"
## [1] 5
## [1] "Layer4,ClS-C21:clustInfoRecRatio=NA,modelInfoRecRatio=0.65, eta(C)cv=1.00"
## [1] "Selected features"
## [1] 2
## [1] "Layer4,ClS-C22:clustInfoRecRatio=NA,modelInfoRecRatio=0.43, eta(C)cv=1.00"
## [1] "Selected features"
## [1] 3
## [1] "Layer4,ClS-C23:clustInfoRecRatio=NA,modelInfoRecRatio=0.61, eta(C)cv=1.00"
## [1] "Selected features"
## [1] 4
## [1] "Layer4,ClS-C24:clustInfoRecRatio=NA,modelInfoRecRatio=0.51, eta(C)cv=1.00"
## [1] "Selected features"
## [1] 5
## [1] "Layer4,ClS-C25:clustInfoRecRatio=NA,modelInfoRecRatio=0.61, eta(C)cv=1.00"
## [1] "Selected features"
## [1] 6
## [1] "Layer4,ClS-C26:clustInfoRecRatio=NA,modelInfoRecRatio=0.67, eta(C)cv=1.00"
## [1] "Selected features"
## [1] 7
## [1] "Layer4,ClS-C27:clustInfoRecRatio=NA,modelInfoRecRatio=0.65, eta(C)cv=1.00"
## [1] "Selected features"
## [1] 8
## [1] "Layer4,ClS-C28:clustInfoRecRatio=NA,modelInfoRecRatio=0.49, eta(C)cv=1.00"
## [1] "Selected features"
## [1] 9
## [1] "min eta(C)cv:1.000000"
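The selected-feature indexing used in the printed result (1 for the intercept, d for column d-1 of X) can be sketched with a small helper. The function `selFeatureToColumn` is hypothetical and not part of MRReg; it only restates the convention described above.

```r
# Hypothetical helper: map a selected-feature index d to a column of DataT$X.
# Index 1 is the intercept, so it has no column in X and maps to NA.
selFeatureToColumn <- function(d) {
  ifelse(d == 1, NA_integer_, as.integer(d - 1))
}

# Index 2 (reported for cluster C11 above) refers to the first column of X,
# and index 9 (reported for cluster C28) refers to the eighth column.
cols <- selFeatureToColumn(c(1, 2, 9))
cols
```

Under this convention, the selected feature 2 printed for cluster C11 corresponds to DataT$X[, 1].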
