Title: | Variable-Lag Time Series Causality Inference Framework |
---|---|
Description: | A framework to infer causality on a pair of real-valued time series based on variable-lag Granger causality and transfer entropy. Typically, Granger causality and transfer entropy assume a fixed and constant time delay between cause and effect. For a non-stationary time series, however, this assumption does not hold. For example, consider two time series of the velocities of person A and person B, where B follows A. At some point, B stops to tie his shoes and then runs to catch up with A. The fixed-lag assumption does not hold in this case. We propose a framework that allows variable lags between cause and effect in Granger causality and transfer entropy so that they can handle variable-lag non-stationary time series. Please see Chainarong Amornbunchornvej, Elena Zheleva, and Tanya Berger-Wolf (2021) <doi:10.1145/3441452> when referring to this package in publications. |
Authors: | Chainarong Amornbunchornvej [aut, cre] |
Maintainer: | Chainarong Amornbunchornvej <[email protected]> |
License: | GPL-3 |
Version: | 0.1.5 |
Built: | 2025-01-30 03:21:55 UTC |
Source: | https://github.com/darkeyes/vltimeseriescausality |
checkMultipleSimulationVLtimeseries is a support function that can compare two adjacency matrices: groundtruth and inferred matrices. It re
checkMultipleSimulationVLtimeseries(trueAdjMat, adjMat)
trueAdjMat | a groundtruth matrix. |
adjMat | an inferred matrix. |
This function returns a list of precision prec, recall rec, and F1 score F1 of inferred vs. groundtruth matrices.
## Generate simulation data
#G<-matrix(FALSE,10,10) # groundtruth
#G[1,c(4,7,8,10)]<-TRUE
#G[2,c(5,7,9,10)]<-TRUE
#G[3,c(6,8,9,10)]<-TRUE
#TS <- MultipleSimulationVLtimeseries()
#out<-multipleVLGrangerFunc(TS)
#checkMultipleSimulationVLtimeseries(trueAdjMat=G,adjMat=out$adjMat)
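To make the returned quantities concrete, the following base-R sketch computes precision, recall, and F1 from two logical adjacency matrices. The helper name adjPRF1 is hypothetical and this is not the package's implementation; in particular, whether the package counts diagonal entries is an assumption left open here.

```r
# Hypothetical helper (not part of the package): compare a ground-truth
# adjacency matrix with an inferred one, treating each TRUE entry as an edge.
adjPRF1 <- function(trueAdjMat, adjMat) {
  tp <- sum(trueAdjMat & adjMat)    # edges present in both matrices
  fp <- sum(!trueAdjMat & adjMat)   # inferred edges that are not in the ground truth
  fn <- sum(trueAdjMat & !adjMat)   # ground-truth edges that were missed
  prec <- if (tp + fp > 0) tp / (tp + fp) else 0
  rec  <- if (tp + fn > 0) tp / (tp + fn) else 0
  F1   <- if (prec + rec > 0) 2 * prec * rec / (prec + rec) else 0
  list(prec = prec, rec = rec, F1 = F1)
}

# Toy 3-by-3 case: one correct edge, one spurious edge, one missed edge.
G <- matrix(FALSE, 3, 3); G[1, 2] <- TRUE; G[1, 3] <- TRUE
A <- matrix(FALSE, 3, 3); A[1, 2] <- TRUE; A[2, 3] <- TRUE
res <- adjPRF1(G, A)
```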
followingRelation is a function that infers whether Y follows X.
followingRelation(Y, X, timeLagWindow, lagWindow = 0.2)
Y | is a numerical time series of a follower |
X | is a numerical time series of a leader |
timeLagWindow | is a maximum possible time delay in terms of time steps. |
lagWindow | is a maximum possible time delay in terms of a percentage of length(X). If |
This function returns a list of following relation variables below.
follVal | is a following-relation value s.t. if |
nX | is a time series that is rearranged from |
optDelay | is the optimal time delay inferred by cross-correlation of |
optCor | is the optimal correlation of |
optIndexVec | is a time series of optimal warping-path from DTW that is corrected by cross correlation. It is approximately that |
VLval | is a percentage of elements in |
ccfout | is an output object of |
# Generate simulation data
TS <- SimpleSimulationVLtimeseries()
# Run the function
out <- followingRelation(Y = TS$Y, X = TS$X)
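The return values above mention an optimal delay and correlation inferred by cross-correlation. As a rough illustration of that idea only, the base-R sketch below recovers a known fixed lag with stats::ccf. The simulated data and the reuse of the names optDelay and optCor are assumptions for illustration; the package's own procedure may differ.

```r
set.seed(1)
n <- 200
lag <- 5

# Leader X is white noise; follower Y repeats X five steps later plus small noise.
X <- rnorm(n)
Y <- c(rep(0, lag), X[1:(n - lag)]) + rnorm(n, sd = 0.05)

# ccf(Y, X) at lag k estimates the correlation between Y[t + k] and X[t],
# so a follower delayed by 5 steps should peak at k = 5.
cc <- stats::ccf(Y, X, lag.max = 20, plot = FALSE)
lags <- as.vector(cc$lag)
cors <- as.vector(cc$acf)

best <- which.max(cors)
optDelay <- lags[best]   # lag with the highest cross-correlation
optCor <- cors[best]     # the correlation at that lag
```

A single peak like this only describes a fixed delay; it does not capture delays that change over time.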
GrangerFunc is a Granger Causality function. It tests whether X Granger-causes Y.
GrangerFunc( Y, X, maxLag = 1, alpha = 0.05, autoLagflag = TRUE, gamma = 0.5, family = gaussian )
Y | is a numerical time series of effect |
X | is a numerical time series of cause |
maxLag | is a maximum possible time delay. The default is 1. |
alpha | is a significance level of F-test to determine whether |
autoLagflag | is a flag for enabling the automatic lag inference function. The default is true. If it is true, maxLag is set automatically using cross-correlation. Otherwise, the function uses the supplied maxLag value to infer Granger causality. |
gamma | is a parameter to determine whether |
family | is the family parameter for the Generalized Linear Models function (glm). The default is |
This function returns whether X Granger-causes Y.
ftest | F-statistic of Granger causality. |
p.val | A p-value from F-test. |
BIC_H0 | Bayesian Information Criterion (BIC) derived from |
BIC_H1 | Bayesian Information Criterion (BIC) derived from |
XgCsY | The flag is true if |
XgCsY_ftest | The flag is true if |
XgCsY_BIC | The flag is true if |
maxLag | A maximum possible time delay. |
H0 | glm object of |
H1 | glm object of |
BICDiffRatio | Bayesian Information Criterion difference ratio: |
# Generate simulation data
TS <- SimpleSimulationVLtimeseries()
# Run the function
out <- GrangerFunc(Y = TS$Y, X = TS$X)
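The return values refer to an F-test and to BIC values for two models, H0 and H1. The base-R sketch below shows one conventional way to set up such a comparison with a single lag: H0 predicts Y from its own past, and H1 adds the past of X. It uses lm (a Gaussian linear fit) on simulated data. The variable names mirror the return values for readability only, and the package's exact model construction, its gamma threshold, and its BIC difference ratio are not reproduced here.

```r
set.seed(42)
n <- 200

# X drives Y with a one-step delay.
X <- rnorm(n)
Y <- c(0, 0.8 * X[1:(n - 1)]) + rnorm(n, sd = 0.1)

# Lagged design: predict Y[t] from Y[t - 1] (H0) and additionally X[t - 1] (H1).
d <- data.frame(y = Y[2:n], ylag = Y[1:(n - 1)], xlag = X[1:(n - 1)])
H0 <- lm(y ~ ylag, data = d)
H1 <- lm(y ~ ylag + xlag, data = d)

# F-test for whether adding the lagged X terms improves the fit.
ftab <- anova(H0, H1)
ftest <- ftab[2, "F"]
p.val <- ftab[2, "Pr(>F)"]

# BIC comparison: a lower BIC for H1 favors including the past of X.
BIC_H0 <- BIC(H0)
BIC_H1 <- BIC(H1)

XgCsY_ftest <- p.val <= 0.05    # assumed decision rule at alpha = 0.05
XgCsY_BIC <- BIC_H1 < BIC_H0    # assumed simple rule, without a gamma threshold
```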
MultipleSimulationVLtimeseries is a support function for generating a set of time series TS[,1],...,TS[,10].
TS[,1], TS[,2], and TS[,3] are cause time series X that are generated independently.
The remaining time series are effect time series Y, each caused by some of TS[,1], TS[,2], and TS[,3].
TS[,1] causes TS[,4],TS[,7],TS[,8], and TS[,10].
TS[,2] causes TS[,5],TS[,7],TS[,9], and TS[,10].
TS[,3] causes TS[,6],TS[,8],TS[,9], and TS[,10].
MultipleSimulationVLtimeseries( n = 200, lag = 5, YstFixInx = 110, YfnFixInx = 170, XpointFixInx = 100, arimaFlag = TRUE, seedVal = -1 )
n | is length of time series. |
lag | is a time lag between |
YstFixInx | is the starting point of variable lag part. |
YfnFixInx | is the end point of variable lag part. |
XpointFixInx | is a point in X s.t. |
arimaFlag | is ARMA model flag. If it is true, then |
seedVal | is a seed parameter for generating random noise. |
This function returns a list of time series TS.
# Generate simulation data
TS <- MultipleSimulationVLtimeseries()
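The causal structure described above can be written down as a ground-truth adjacency matrix, where G[i,j] is TRUE if TS[,i] causes TS[,j]. The base-R sketch below encodes that structure and counts the causes of each series. The names causes and nCauses are illustrative only.

```r
# Effects of each cause, as listed in the description of the simulated data.
causes <- list(
  c(4, 7, 8, 10),   # TS[,1] causes these series
  c(5, 7, 9, 10),   # TS[,2] causes these series
  c(6, 8, 9, 10)    # TS[,3] causes these series
)

# Ground-truth adjacency matrix: row = cause, column = effect.
G <- matrix(FALSE, 10, 10)
for (i in seq_along(causes)) {
  G[i, causes[[i]]] <- TRUE
}

# Number of causes per series: 0 for the three cause series,
# 1 for TS[,4] to TS[,6], 2 for TS[,7] to TS[,9], and 3 for TS[,10].
nCauses <- colSums(G)
```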
multipleVLGrangerFunc is a function that infers Variable-lag Granger Causality for all pairs of m time series TS[,1],...,TS[,m].
multipleVLGrangerFunc( TS, maxLag, alpha = 0.05, gamma = 0.3, autoLagflag = TRUE, causalFlag = 0, VLflag = TRUE, family = gaussian )
TS | is a numerical time series of effect where |
maxLag | is a maximum possible time delay. The default is 0.2*length(Y). |
alpha | is a significance level of F-test to determine whether |
gamma | is a parameter to determine whether |
autoLagflag | is a flag for enabling the automatic lag inference function. The default is true. If it is true, maxLag is set automatically using cross-correlation. Otherwise, the function uses the supplied maxLag value to infer Granger causality. |
causalFlag | is a choice of criterion for inferring causality: |
VLflag | is a flag of Granger causality choice: either |
family | is the family parameter for the Generalized Linear Models function (glm). The default is |
This function returns a list containing an adjacency matrix of causality, where adjMat[i,j] is true if TS[,i] causes TS[,j].
## Generate simulation data
#TS <- MultipleSimulationVLtimeseries()
## Run the function
#out<-multipleVLGrangerFunc(TS)
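To illustrate how a pairwise test can populate an adjacency matrix with the adjMat[i,j] convention described above, the base-R sketch below loops over all ordered pairs of columns and applies a simple fixed-lag Granger F-test. The helper pairGranger is hypothetical and uses a single lag; it stands in for the variable-lag test and is not the package's procedure.

```r
set.seed(7)
n <- 300

# Three series: x1 drives x2 with a one-step delay; x3 is independent noise.
x1 <- rnorm(n)
x2 <- c(0, 0.9 * x1[1:(n - 1)]) + rnorm(n, sd = 0.1)
x3 <- rnorm(n)
TS <- cbind(x1, x2, x3)

# Hypothetical fixed-lag stand-in for a pairwise causality test:
# TRUE if the lag-1 values of X significantly improve the prediction of Y.
pairGranger <- function(Y, X, alpha = 0.05) {
  len <- length(Y)
  d <- data.frame(y = Y[-1], ylag = Y[-len], xlag = X[-len])
  H0 <- lm(y ~ ylag, data = d)
  H1 <- lm(y ~ ylag + xlag, data = d)
  anova(H0, H1)[2, "Pr(>F)"] <= alpha
}

# adjMat[i, j] is TRUE if column i is inferred to cause column j.
m <- ncol(TS)
adjMat <- matrix(FALSE, m, m)
for (i in 1:m) {
  for (j in 1:m) {
    if (i != j) {
      adjMat[i, j] <- pairGranger(Y = TS[, j], X = TS[, i])
    }
  }
}
```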
multipleVLTransferEntropy is a function that infers Variable-lag Transfer Entropy for all pairs of m time series TS[,1],...,TS[,m].
multipleVLTransferEntropy( TS, maxLag, nboot = 0, lx = 1, ly = 1, VLflag = TRUE, autoLagflag = TRUE, alpha = 0.05 )
TS | is a numerical time series of effect where |
maxLag | is a maximum possible time delay. The default is 0.2*length(Y). |
nboot | is the number of bootstrap replications for the RTransferEntropy::transfer_entropy() function. |
lx, ly | are lag parameters of RTransferEntropy::transfer_entropy(). |
VLflag | is a flag of Granger causality choice: either |
autoLagflag | is a flag for enabling the automatic lag inference function. The default is true. If it is true, maxLag is set automatically using cross-correlation. Otherwise, the function uses the supplied maxLag value to infer Granger causality. |
alpha | is a significance-level threshold for TE bootstrapping by Dimpfl and Peter (2013). |
This function returns a list containing an adjacency matrix of causality, where adjMat[i,j] is true if TS[,i] causes TS[,j].
## Generate simulation data
#out1<-SimpleSimulationVLtimeseries()
#TS<-cbind(out1$X,out1$Y)
## Run the function
#out2<-multipleVLTransferEntropy(TS,maxLag=1)
plotTimeSeries is a function for visualizing time series.
plotTimeSeries(X, Y, strTitle = "Time Series Plot", TSnames)
X | is a 1st numerical time series |
Y | is a 2nd numerical time series. If it is not supplied, the function plots only |
strTitle | is a string of the plot title |
TSnames | is a list of legend of |
This function returns an object of ggplot class.
# Generate simulation data
TS <- SimpleSimulationVLtimeseries()
# Run the function
plotTimeSeries(Y = TS$Y, X = TS$X)
SimpleSimulationVLtimeseries is a support function for generating time series X,Y where X VL-Granger-causes Y.
SimpleSimulationVLtimeseries( n = 200, lag = 5, YstFixInx = 110, YfnFixInx = 170, XpointFixInx = 100, arimaFlag = TRUE, seedVal = -1, expflag = FALSE, causalFlag = TRUE )
n | is length of time series. |
lag | is a time lag between |
YstFixInx | is the starting point of variable lag part. |
YfnFixInx | is the end point of variable lag part. |
XpointFixInx | is a point in X s.t. |
arimaFlag | is ARMA model flag. If it is true, then |
seedVal | is a seed parameter for generating random noise. If it is not -1, then the random seed for rnorm is set with |
expflag | is the flag to set the relation between |
causalFlag | is a flag. If it is true, then |
This function returns a list of time series X,Y where X VL-Granger-causes Y.
# Generate simulation data
TS <- SimpleSimulationVLtimeseries()
TSNANNearestNeighborPropagation is a function that fills NA values in a time series X with the nearest real values in the past (or in the future if the first position of the time series is NA).
TSNANNearestNeighborPropagation(X)
X | is a T-by-D matrix numerical time series |
This function returns the variable below.
Xout | is a T-by-D matrix numerical time series in which all NA values have been filled with the nearest real values. |
# Load example data
z <- 1:20
z[2:5] <- NA
z <- TSNANNearestNeighborPropagation(z)
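The filling rule described above (carry the last observed value forward, and use the first observed value for leading NAs) can be sketched in base R as follows. The helper fillNearest is hypothetical and handles a single vector; it illustrates the rule and is not the package's implementation.

```r
# Hypothetical helper: fill NA values with the most recent non-NA value,
# and fill leading NAs with the first non-NA value that follows them.
fillNearest <- function(x) {
  observed <- which(!is.na(x))
  if (length(observed) == 0) return(x)   # nothing to propagate

  # For each position, the index of the latest non-NA value at or before it
  # (0 where no earlier value exists).
  last <- cummax(ifelse(is.na(x), 0L, seq_along(x)))

  # Leading NAs have no past value, so they take the first observed value.
  last[last == 0] <- observed[1]
  x[last]
}

z <- 1:20
z[2:5] <- NA
zFilled <- fillNearest(z)              # positions 2 to 5 take the value at position 1

lead <- fillNearest(c(NA, NA, 3, NA, 5))   # leading NAs take the value 3
```

For a T-by-D matrix, the same helper could be applied column by column, for example with apply(M, 2, fillNearest).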
VLGrangerFunc is a Variable-lag Granger Causality function. It tests whether X VL-Granger-causes Y.
VLGrangerFunc( Y, X, alpha = 0.05, maxLag, gamma = 0.5, autoLagflag = TRUE, family = gaussian )
Y | is a numerical time series of effect |
X | is a numerical time series of cause |
alpha | is a significance level of F-test to determine whether |
maxLag | is a maximum possible time delay. The default is 0.2*length(Y). |
gamma | is a parameter to determine whether |
autoLagflag | is a flag for enabling the automatic lag inference function. The default is true. If it is true, maxLag is set automatically using cross-correlation. Otherwise, the function uses the supplied maxLag value to infer Granger causality. |
family | is the family parameter for the Generalized Linear Models function (glm). The default is |
This function returns whether X VL-Granger-causes Y.
ftest | F-statistic of Granger causality. |
p.val | A p-value from F-test. |
BIC_H0 | Bayesian Information Criterion (BIC) derived from |
BIC_H1 | Bayesian Information Criterion (BIC) derived from |
XgCsY | The flag is true if |
XgCsY_ftest | The flag is true if |
XgCsY_BIC | The flag is true if |
maxLag | A maximum possible time delay. |
H0 | glm object of |
H1 | glm object of |
follOut | is a list of variables from function |
BICDiffRatio | Bayesian Information Criterion difference ratio: |
# Generate simulation data
TS <- SimpleSimulationVLtimeseries()
# Run the function
out <- VLGrangerFunc(Y = TS$Y, X = TS$X)
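The followingRelation return values mention an optimal warping path from dynamic time warping (DTW). As a rough illustration of how a warping path exposes a per-step delay, the base-R sketch below implements textbook DTW and reads the delay off the path. The function dtwPath and the simulated data are assumptions for illustration; the package's alignment, including its cross-correlation correction, is not reproduced.

```r
# Textbook DTW between a follower series y and a leader series x.
# Returns the warping path as a two-column matrix (follower index, leader index).
dtwPath <- function(y, x) {
  n <- length(y); m <- length(x)
  D <- matrix(Inf, n + 1, m + 1)   # cumulative cost with an Inf border
  D[1, 1] <- 0
  for (i in 1:n) {
    for (j in 1:m) {
      cost <- abs(y[i] - x[j])
      D[i + 1, j + 1] <- cost + min(D[i, j], D[i, j + 1], D[i + 1, j])
    }
  }
  # Backtrack from the end of both series to the start.
  i <- n; j <- m
  path <- matrix(c(i, j), ncol = 2)
  while (i > 1 || j > 1) {
    k <- which.min(c(D[i, j], D[i, j + 1], D[i + 1, j]))
    if (k == 1) { i <- i - 1; j <- j - 1 }   # diagonal step
    else if (k == 2) { i <- i - 1 }          # step back in the follower only
    else { j <- j - 1 }                      # step back in the leader only
    path <- rbind(c(i, j), path)
  }
  path
}

set.seed(3)
n <- 100
lag <- 5
X <- rnorm(n)
Y <- c(rep(X[1], lag), X[1:(n - lag)])   # Y repeats X five steps later

path <- dtwPath(Y, X)
delay <- path[, 1] - path[, 2]           # follower index minus leader index
typicalDelay <- median(delay)
```

Unlike a single cross-correlation peak, the delay vector can change along the path, which is what allows a variable lag to be represented.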
VLTransferEntropy is a Variable-lag Transfer Entropy function. It tests whether X VL-Transfer-Entropy-causes Y.
VLTransferEntropy( Y, X, maxLag, nboot = 0, lx = 1, ly = 1, VLflag = TRUE, autoLagflag = TRUE, alpha = 0.05 )
Y | is a numerical time series of effect |
X | is a numerical time series of cause |
maxLag | is a maximum possible time delay. The default is 0.2*length(Y). |
nboot | is the number of bootstrap replications for the RTransferEntropy::transfer_entropy() function. |
lx, ly | are lag parameters of RTransferEntropy::transfer_entropy(). |
VLflag | is a flag of Transfer Entropy choice: either |
autoLagflag | is a flag for enabling the automatic lag inference function. The default is true. If it is true, maxLag is set automatically using cross-correlation. Otherwise, the function uses the supplied maxLag value to infer Granger causality. |
alpha | is a significance-level threshold for TE bootstrapping by Dimpfl and Peter (2013). |
This function returns whether X (VL-)Transfer-Entropy-causes Y.
TEratio | is a Transfer Entropy ratio. If it is greater than one, then |
res | is an object of output from RTransferEntropy::transfer_entropy() |
follOut | is a list of variables from function |
XgCsY_trns | The flag is true if |
pval | It is a p-value for TE bootstrapping by Dimpfl and Peter (2013). |
# Generate simulation data
TS <- SimpleSimulationVLtimeseries()
# Run the function
out <- VLTransferEntropy(Y = TS$Y, X = TS$X)
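For readers unfamiliar with transfer entropy, the base-R sketch below computes a plug-in estimate on binarized series with one lag on each side and compares both directions. The function teEstimate, the median-based binarization, and the direction comparison are assumptions for illustration; this does not call RTransferEntropy::transfer_entropy() and does not reproduce how the package defines TEratio.

```r
# Plug-in transfer entropy (in nats) from src to dst for discrete series,
# using one lag of each: TE = sum p(f, p, s) * log( p(f | p, s) / p(f | p) ),
# where f = dst[t + 1], p = dst[t], s = src[t].
teEstimate <- function(src, dst) {
  len <- length(dst)
  f <- dst[2:len]; p <- dst[1:(len - 1)]; s <- src[1:(len - 1)]
  joint <- table(f, p, s) / (len - 1)    # p(f, p, s)
  p_ps <- apply(joint, c(2, 3), sum)     # p(p, s)
  p_fp <- apply(joint, c(1, 2), sum)     # p(f, p)
  p_p <- apply(joint, 2, sum)            # p(p)
  total <- 0
  for (i in seq_len(dim(joint)[1])) {
    for (j in seq_len(dim(joint)[2])) {
      for (k in seq_len(dim(joint)[3])) {
        pj <- joint[i, j, k]
        if (pj > 0) {
          total <- total + pj * log((pj / p_ps[j, k]) / (p_fp[i, j] / p_p[j]))
        }
      }
    }
  }
  total
}

set.seed(11)
n <- 500
X <- rnorm(n)
Y <- c(0, X[1:(n - 1)]) + rnorm(n, sd = 0.1)   # Y follows X by one step

# Binarize each series around its median before estimating probabilities.
xs <- as.integer(X > median(X))
ys <- as.integer(Y > median(Y))

te_xy <- teEstimate(src = xs, dst = ys)   # information flow from X to Y
te_yx <- teEstimate(src = ys, dst = xs)   # information flow from Y to X
```

In this simulated setting the flow from X to Y should dominate the reverse direction, since X is generated independently of the past of Y.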