Title: | MDL Multiresolution Linear Regression Framework |
---|---|
Description: | We provide a framework for analyzing multiresolution partitions (e.g. country, provinces, subdistricts) in which each individual data point belongs to exactly one partition in each layer (e.g. individual i belongs to subdistrict A, province P, and country Q). We assume that a partition in a higher layer subsumes lower-layer partitions (e.g. a nation at the 1st layer subsumes all provinces at the 2nd layer). We are given N individuals, each with a pair of real values (x,y) generated from an independent variable X and a dependent variable Y. Each individual i belongs to one partition per layer. Our goal is to find the partitions, at the highest possible layer, in which all individuals share the same linear model Y=f(X), where f is a linear function. The framework uses the Minimum Description Length (MDL) principle to infer solutions. This package is described in Chainarong Amornbunchornvej, Navaporn Surasvadi, Anon Plangprasopchok, and Suttipong Thajchayapong (2021) <doi:10.1145/3424670>. |
Authors: | Chainarong Amornbunchornvej [aut, cre] |
Maintainer: | Chainarong Amornbunchornvej <[email protected]> |
License: | MIT + file LICENSE |
Version: | 0.1.5 |
Built: | 2025-02-04 03:09:14 UTC |
Source: | https://github.com/darkeyes/mrreg |
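The nesting assumption in the description can be illustrated with a small base R sketch. The matrix layout (one row per individual, one column per layer) and the helper name `is_nested` are assumptions made for this illustration and are not part of the package.

```r
# Toy nested partition: 8 individuals, 3 layers (made-up cluster IDs).
clsLayer <- cbind(
  layer1 = rep(1, 8),               # everyone in one top-level partition
  layer2 = rep(c(1, 2), each = 4),  # two partitions inside it
  layer3 = rep(1:4, each = 2)       # four partitions, two per parent
)

# Hypothetical helper: a layer is nested in the one above it when every
# cluster ID at the lower layer maps to exactly one ID at the higher layer.
is_nested <- function(clsLayer) {
  for (j in seq_len(ncol(clsLayer) - 1)) {
    parents_per_child <- tapply(clsLayer[, j], clsLayer[, j + 1],
                                function(p) length(unique(p)))
    if (any(parents_per_child != 1)) return(FALSE)
  }
  TRUE
}

is_nested(clsLayer)
```

Moving a single individual to a different layer-2 partition while leaving its layer-3 ID unchanged breaks the nesting, and the check then fails.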
FindMaxHomoOptimalPartitions is the main function for inferring optimal homogeneous clusters from a multiresolution dataset DataT.
FindMaxHomoOptimalPartitions( DataT, gamma = 0.05, insigThs = 1e-08, alpha = 0.05, minInvs = 99, polyDegree = 1, expFlag = FALSE, messageFlag = FALSE )
DataT |
contains a multiresolution dataset s.t.
|
gamma |
is a threshold to ... |
insigThs |
is the threshold on the magnitude of a feature coefficient above which the feature is designated as a selected feature. |
alpha |
is the significance level used to decide whether a feature coefficient is large enough for the feature to be designated as a selected feature. |
minInvs |
is the minimum number of individuals a cluster must have to be considered when inferring eta(C)cv; otherwise, eta(C)cv=0. |
polyDegree |
is the degree of the polynomial function used to fit the data.
If it is greater than 1, the polynomial formula is used in |
expFlag |
is an exponential flag that controls the formula used for data fitting.
If it is true, then the exp() formula is used in |
messageFlag |
is a flag. If it is true, the function prints messages about the progress of the computation. |
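The role of minInvs can be sketched as follows. This assumes that eta(C)cv is a cross-validated R-squared-style score (suggested by the name of the minR2cv return value, but not confirmed by this page). The helper `cv_r2_sketch` and its 10-fold scheme are hypothetical and are not the package's implementation.

```r
# Hypothetical helper: cross-validated squared correlation for one cluster.
# Assumption: clusters with fewer than minInvs individuals get a score of 0.
cv_r2_sketch <- function(x, y, minInvs = 99, k = 10) {
  n <- length(y)
  if (n < minInvs) return(0)  # too few individuals: score fixed at 0
  folds <- sample(rep(seq_len(k), length.out = n))  # random fold labels
  pred <- numeric(n)
  for (f in seq_len(k)) {
    test <- folds == f
    fit <- lm(y ~ x, data = data.frame(x = x[!test], y = y[!test]))
    pred[test] <- predict(fit, newdata = data.frame(x = x[test]))
  }
  cor(pred, y)^2  # squared correlation of held-out predictions and targets
}

set.seed(1)
x <- rnorm(200)
y <- 2 * x + rnorm(200, sd = 0.1)
cv_r2_sketch(x[1:50], y[1:50])  # fewer than 99 individuals
cv_r2_sketch(x, y)              # 200 individuals
```

The guard means that small clusters never qualify on the strength of a noisy score; they are scored 0 regardless of fit.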
This function returns Copt, models, nNodes, invOptCls, and minR2cv.
Copt |
|
clustInfoRecRatio |
|
models |
|
invOptCls |
|
minR2cv |
is the value of eta(C)cv from the cluster that has the lowest eta(C)cv. |
DataT |
is an updated |
# Running FindMaxHomoOptimalPartitions using simulation data
DataT <- SimpleSimulation(100, type = 1)
obj <- FindMaxHomoOptimalPartitions(DataT, gamma = 0.05)
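The idea of testing whether individuals "share the same linear model" can be sketched with a generic model-selection comparison. The sketch below uses BIC as a familiar stand-in for a two-part description length. It is not the package's MDL encoding, and the helper `prefer_split` is hypothetical.

```r
# Hypothetical helper (not part of the package): does fitting one model per
# child partition give a lower BIC-style cost than one pooled model?
prefer_split <- function(x, y, child) {
  n <- length(y)
  pooled_bic <- BIC(lm(y ~ x))
  fits <- lapply(split(data.frame(x = x, y = y), child),
                 function(d) lm(y ~ x, data = d))
  loglik <- sum(sapply(fits, function(f) as.numeric(logLik(f))))
  n_par <- sum(sapply(fits, function(f) attr(logLik(f), "df")))
  split_bic <- -2 * loglik + n_par * log(n)  # penalize the extra parameters
  split_bic < pooled_bic
}

set.seed(1)
x <- rnorm(200)
child <- rep(1:2, each = 100)
y_same <- 2 * x + rnorm(200, sd = 0.5)                          # one shared model
y_diff <- ifelse(child == 1, 2, -2) * x + rnorm(200, sd = 0.5)  # two models
prefer_split(x, y_same, child)
prefer_split(x, y_diff, child)
```

When the children follow the same model, the extra parameters of the split fit are rarely worth their penalty, so the parent is kept as one homogeneous partition. When the slopes differ, the split is favored.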
linearModelTraining is a support function for training linear models for partitions in all layers.
linearModelTraining( DataT, insigThs = 1e-08, alpha = 0.05, messageFlag = FALSE, polyDegree = 1, expFlag = FALSE )
DataT |
contains a multiresolution dataset s.t.
|
insigThs |
is the threshold on the magnitude of a feature coefficient above which the feature is designated as a selected feature. |
alpha |
is the significance level used to decide whether a feature coefficient is large enough for the feature to be designated as a selected feature. |
messageFlag |
is a flag. If it is true, the function prints messages about the progress of the computation. |
polyDegree |
is the degree of the polynomial function used to fit the data.
If it is greater than 1, the polynomial formula is used in |
expFlag |
is an exponential flag that controls the formula used for data fitting.
If it is true, then the exp() formula is used in |
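The exact formulas selected by polyDegree and expFlag are not shown on this page. As a rough illustration only, the sketch below fits a polynomial and an exponential model with base R `lm()`. The data, coefficients, and formula choices are assumptions, not the package's internals.

```r
set.seed(42)
x <- runif(200, 0, 2)

# Made-up data: a degree-2 polynomial and an exponential relationship
y_poly <- 1 + 0.5 * x - 2 * x^2 + rnorm(200, sd = 0.05)
y_exp <- 3 * exp(x) + rnorm(200, sd = 0.05)

polyDegree <- 2
# raw = TRUE keeps the coefficients on the scale of x, x^2, ...
fit_poly <- lm(y_poly ~ poly(x, degree = polyDegree, raw = TRUE))
# exp(x) enters as a transformed regressor; the model stays linear in its
# coefficients
fit_exp <- lm(y_exp ~ exp(x))

coef(fit_poly)
coef(fit_exp)
```

Both fits remain ordinary least-squares problems, because the nonlinearity lives in the regressors rather than in the coefficients.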
This function returns models and DataT.
models |
|
DataT |
is a |
# Running linearModelTraining using simulation data
DataT <- SimpleSimulation(100, type = 1)
obj <- linearModelTraining(DataT)
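Training "linear models for partitions in all layers" can be pictured as one regression per cluster per layer. The sketch below does this with base R on a toy dataset. The helper `fit_all_partitions` and the data layout are hypothetical, and the sketch omits the feature selection controlled by insigThs and alpha.

```r
# Hypothetical helper: fit one lm() per cluster in every layer.
# X: numeric matrix (individuals x features), Y: numeric vector,
# clsLayer: matrix of cluster IDs (individuals x layers).
fit_all_partitions <- function(X, Y, clsLayer) {
  models <- vector("list", ncol(clsLayer))
  for (j in seq_len(ncol(clsLayer))) {
    models[[j]] <- list()
    for (k in sort(unique(clsLayer[, j]))) {
      idx <- clsLayer[, j] == k
      d <- as.data.frame(X[idx, , drop = FALSE])  # columns V1, V2, ...
      d$y <- Y[idx]
      models[[j]][[as.character(k)]] <- lm(y ~ ., data = d)
    }
  }
  models
}

# Toy data: one layer-1 partition, two layer-2 partitions with different slopes
set.seed(7)
n <- 100
X <- matrix(rnorm(2 * n), ncol = 1)
clsLayer <- cbind(rep(1, 2 * n), rep(1:2, each = n))
Y <- ifelse(clsLayer[, 2] == 1, 2, -3) * X[, 1] + rnorm(2 * n, sd = 0.1)

models <- fit_all_partitions(X, Y, clsLayer)
coef(models[[2]][["1"]])  # model of layer-2 cluster 1
coef(models[[2]][["2"]])  # model of layer-2 cluster 2
```

The layer-1 model pools both clusters and therefore fits neither slope well, which is the situation in which a lower layer describes the data better.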
plotOptimalClustersTree is a support function for plotting the hierarchical tree of optimal clusters from the FindMaxHomoOptimalPartitions function.
The red node(s) are the optimal homogeneous clusters while the gray nodes are non-optimal clusters.
plotOptimalClustersTree(resObj)
resObj |
is an object list, which is the output of the FindMaxHomoOptimalPartitions function. |
No return value, called for plotting the hierarchical tree of optimal clusters.
# Running FindMaxHomoOptimalPartitions using simulation data
DataT <- SimpleSimulation(100, type = 1)
obj <- FindMaxHomoOptimalPartitions(DataT, gamma = 0.05)
# Plotting the result
plotOptimalClustersTree(obj)
PrintOptimalClustersResult is a support function for printing the optimal clusters from the FindMaxHomoOptimalPartitions function.
PrintOptimalClustersResult(resObj, selFeature = FALSE)
resObj |
is an object list, which is the output of the FindMaxHomoOptimalPartitions function. |
selFeature |
is a flag. If it is true, then the function shows the selected feature(s) of each optimal cluster. |
No return value, called for printing optimal clusters.
# Running FindMaxHomoOptimalPartitions using simulation data
DataT <- SimpleSimulation(100, type = 1)
obj <- FindMaxHomoOptimalPartitions(DataT, gamma = 0.05)
# Printing the result
PrintOptimalClustersResult(obj)
SimpleSimulation is a support function for generating multiresolution datasets.
All simulation types have three layers, except type 4, which has four layers; types 5 and 6 are described as similar to type 4.
The type-1 simulation places all individuals in the same homogeneous partition in the first layer.
The type-2 simulation has four homogeneous partitions in the second layer. Each partition has its own model.
The type-3 simulation has eight homogeneous partitions in the third layer. Each partition has its own model.
The type-4 simulation has one homogeneous partition in the second layer, four homogeneous partitions in the third layer, and eight homogeneous partitions in the fourth layer. Each partition has its own model.
The type-5 simulation is similar to the type-4 simulation, but Y=h(X) is an exponential function.
The type-6 simulation is similar to the type-4 simulation, but Y=h(X) is a polynomial function whose degree is set by the degree parameter.
SimpleSimulation(indvN = 10000, type = 1, degree = 2)
indvN |
is the number of individuals per homogeneous partition. |
type |
is the type of simulation dataset. Six types (1 to 6) are described above. |
degree |
is the degree parameter of the polynomial function for the type-6 simulation. |
The function returns a multiresolution dataset.
DataT$X[i , d] |
is a value of feature |
DataT$Y[i] |
is value of target variable of individual |
clsLayer[i , j] |
is a cluster ID of individual |
DataT$TrueFeature[i] |
is equal to |
# Running SimpleSimulation to generate a dataset.
DataT <- SimpleSimulation(100, type = 1)
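To make the layer structure concrete, the following base R sketch builds a dataset shaped like the type-2 description: one partition in the first layer, four homogeneous partitions in the second layer with their own models, and eight partitions in the third layer. The function `simulate_type2_sketch`, the slopes, and the noise level are invented for this illustration. It does not reproduce the package's generator.

```r
# Hypothetical generator (not SimpleSimulation itself).
simulate_type2_sketch <- function(indvN = 100) {
  slopes <- c(-2, -1, 1, 2)          # one made-up slope per layer-2 partition
  layer2 <- rep(1:4, each = indvN)   # indvN individuals per partition
  n <- length(layer2)
  # Split each layer-2 partition into two layer-3 children (IDs 1 to 8)
  layer3 <- 2 * (layer2 - 1) + rep(rep(1:2, length.out = indvN), times = 4)
  X <- matrix(rnorm(n), ncol = 1)
  Y <- slopes[layer2] * X[, 1] + rnorm(n, sd = 0.1)
  list(X = X, Y = Y, clsLayer = cbind(rep(1, n), layer2, layer3))
}

set.seed(3)
toy <- simulate_type2_sketch(50)
dim(toy$X)
table(toy$clsLayer[, 2])
```

Because both layer-3 children of a layer-2 partition share the parent's slope, the second layer is the highest layer at which the partitions are homogeneous.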