Title: | Association Measurement Through Cross Rank Increments |
---|---|
Description: | Computes robust association measures that do not presuppose linearity. The xi correlation (xicor) is based on cross correlation between ranked increments. The reference for the methods implemented here is Chatterjee, Sourav (2020) <arXiv:1909.10140>. This package includes the Galton peas example. |
Authors: | Susan Holmes [aut,cre], Sourav Chatterjee [aut] |
Maintainer: | Susan Holmes <[email protected]> |
License: | Apache License (>= 2) |
Version: | 0.4.1 |
Built: | 2024-11-12 03:10:06 UTC |
Source: | https://github.com/spholmes/xicor |
Inverse function to wholebinary: returns the number from its binary expansion.
backdec(rmat, sgn)
rmat |
is a matrix with two rows: the first row is the binary expansion of the integer part and the second row is the binary expansion of the fractional part. |
sgn |
is the sign |
It may be necessary to make a new version of this using special functions for large integers.
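As a rough illustration of what the inverse involves, the sketch below rebuilds a number from a two-row bit matrix. It is not the package's backdec: the name backdec_sketch is hypothetical, and it assumes the integer-part bits are stored least significant bit first (the order returned by base R's intToBits) and the fractional-part bits in order of decreasing weight (1/2, 1/4, ...).

```r
# Hypothetical sketch, not the xicor implementation.
# Assumed layout of rmat (2 x nc):
#   row 1: integer-part bits, least significant first (2^0, 2^1, ...)
#   row 2: fractional-part bits, most significant first (2^-1, 2^-2, ...)
backdec_sketch <- function(rmat, sgn = 1) {
  nc <- ncol(rmat)
  int_part <- sum(rmat[1, ] * 2^(0:(nc - 1)))
  frac_part <- sum(rmat[2, ] * 2^(-(1:nc)))
  sgn * (int_part + frac_part)
}

# 5.75 = 101 (binary) + 0.11 (binary), using 4 bits per row
rmat <- rbind(c(1, 0, 1, 0),
              c(1, 1, 0, 0))
backdec_sketch(rmat, sgn = -1)  # -5.75
```

Because every value here is a sum of powers of two, the reconstruction is exact in floating point as long as nc stays within double precision.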
Auxiliary function that takes a vector and produces a single number through a Borel isomorphism using the wholebinary and backdec functions.
borelmerge(xvec)
xvec |
is a vector of real numbers |
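Produces a single real number by converting each element into its binary expansion (wholebinary) and combining the expansions back into one number (backdec).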
This function computes the xi coefficient between two vectors xvec and yvec.
calculateXI(xvec, yvec, simple = TRUE)
xvec |
Vector of numeric values in the first coordinate. |
yvec |
Vector of numeric values in the second coordinate. |
simple |
Whether to return only the coefficient (TRUE) or also keep auxiliary information to pass on (FALSE). |
If simple = TRUE, the function returns the value of the xi coefficient. If simple = FALSE, the function returns a list containing:
The xi coefficient
The rearranged ranks of yvec
mean(gr*(1-gr))
Auxiliary function with no checks for NA, etc.
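For readers who want to see the formula behind the coefficient, the base-R sketch below follows the ties-aware definition in Chatterjee (2020): sort the pairs by xvec, let r_i be the number of j with y_j <= y_(i) and l_i the number of j with y_j >= y_(i), then xi = 1 - n * sum|r_(i+1) - r_i| / (2 * sum l_i (n - l_i)). The name xi_sketch is hypothetical. The paper breaks ties in xvec at random; this sketch breaks them by order of appearance instead, so results may differ from calculateXI when xvec has ties.

```r
# Hypothetical sketch of the xi coefficient (Chatterjee 2020), ties-aware form.
# Ties in x are broken by order of appearance here (order() is stable),
# whereas the paper breaks them at random.
xi_sketch <- function(x, y) {
  n <- length(x)
  ys <- y[order(x)]                                 # y rearranged by increasing x
  r <- as.numeric(rank(ys, ties.method = "max"))    # r_i = #{j : y_j <= y_(i)}
  l <- as.numeric(rank(-ys, ties.method = "max"))   # l_i = #{j : y_j >= y_(i)}
  1 - n * sum(abs(diff(r))) / (2 * sum(l * (n - l)))
}

x <- 1:10
xi_sketch(x, x)    # 8/11, i.e. (n - 2) / (n + 1) for a perfect monotone relation
xi_sketch(x, x^2)  # also 8/11: xi only depends on ranks
```

The denominator is zero when all values of y are equal, so the sketch returns NaN in that case; like calculateXI, it performs no checks for NA.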
Sourav Chatterjee, Susan Holmes
Chatterjee, S. (2020) A New Coefficient Of Correlation, <arXiv:1909.10140>.
xicor
# Compute the coefficient in each direction (xi is not symmetric)
library("xicor")
library("psychTools")
data(peas)
calculateXI(peas$parent, peas$child)
calculateXI(peas$child, peas$parent)
Take the fractional part and make its binary expansion. Auxiliary function used in expanding real numbers.
fracbinary(x)
x |
is a number between 0 and 1 |
Binary expansion of length 31 of the decimal input
This implementation uses the built-in function intToBits.
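For intuition about what a 31-bit fractional expansion contains, the sketch below computes one by repeated doubling. This is a swapped-in technique: the package's fracbinary uses intToBits instead, and fracbinary_sketch is a hypothetical name. Bits are returned most significant first (weights 1/2, 1/4, ...), and the sign is ignored on the assumption that it is handled separately, as in wholebinary.

```r
# Hypothetical sketch: binary expansion of the fractional part by repeated doubling.
fracbinary_sketch <- function(x, nc = 31) {
  x <- abs(x)
  x <- x - floor(x)              # keep only the fractional part
  bits <- integer(nc)
  for (k in seq_len(nc)) {
    x <- 2 * x
    bits[k] <- as.integer(x >= 1)  # next binary digit
    x <- x - bits[k]
  }
  bits
}

fracbinary_sketch(0.625)[1:4]  # 1 0 1 0  (0.625 = 0.101 in binary)
```

Doubling and subtracting 1 are exact operations on doubles, so the digits produced are exactly those of the stored floating-point value.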
This function computes the unidimensional graph prediction coefficient between two vectors xvec and yvec.
FRpredcor(xvec, yvec, tiemethod = "average")
xvec |
Vector of numeric values in the first coordinate. |
yvec |
Vector of numeric values in the second coordinate. |
tiemethod |
Choice of treatment for ties; the default is "average". |
The function returns the value of the FR standardized coefficient.
Auxiliary function with no checks for NA, etc.
Sourav Chatterjee, Susan Holmes
Chatterjee, S. and Holmes, S. (2020) Practical observations and applications of the robust prediction coefficient.
xicor FRpredcorhalf
# Compute the coefficient and compare it to the xi coefficient
library("xicor")
simulCompare <- function(n = 20, B = 1000) {
  diffs <- rep(0, B)
  xvec <- 1:n
  for (i in 1:B) {
    yvec <- runif(n)
    diffs[i] <- FRpredcor(xvec, yvec) - xicor(xvec, yvec)
  }
  return(diffs)
}
simulcompare1K <- simulCompare()
summary(simulcompare1K)
This function computes the unidimensional ranked half graph prediction coefficient between two vectors xvec and yvec.
FRpredcorhalf(xvec, yvec, tiemethod = "average")
xvec |
Vector of numeric values in the first coordinate. |
yvec |
Vector of numeric values in the second coordinate. |
tiemethod |
Choice of treatment for ties; the default is "average". |
The function returns the value of the FR standardized coefficient.
Auxiliary function with no checks for NA, etc.
Sourav Chatterjee, Susan Holmes
Chatterjee, S. and Holmes, S. (2020) Practical observations and applications of the robust prediction coefficient.
xicor FRpredcor
# Compute the coefficient and compare it to the xi coefficient
library("xicor")
simulCompare <- function(n = 20, B = 1000) {
  diffsim <- rep(0, B)
  xvec <- 1:n
  for (i in 1:B) {
    yvec <- sample(n, n)
    diffsim[i] <- FRpredcorhalf(xvec, yvec) - xicor(xvec, yvec)
  }
  return(diffsim)
}
compare1K <- simulCompare()
summary(compare1K)
This function computes the generalized xi coefficient between two matrices xmat and ymat. For the time being, the size of the matrices is limited: xmat and ymat can have at most 31 columns. Wider matrices can be brought down to 31 columns with a dimension reduction technique, in which case the first 31 components are used. The function encodes the data using a binary expansion and then calls xicor on the resulting vectors, so some of the arguments relevant to xicor, such as pvalue, can be specified.
genxicor(xmat, ymat)
xmat |
Matrix of numeric values in the first argument. |
ymat |
Matrix of numeric values in the second argument. |
The function returns the value of the generalized xi coefficient. Because the option pvalue = TRUE is chosen by default, the result is a list:
The value of the xi coefficient.
The standard deviation.
The test p-value.
This version does not take a seed as an argument; if reproducibility is an issue, set a seed before calling the function.
The option pvalue = TRUE (return the p-value for rejecting independence) is set by default.
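The dimension-reduction step for matrices wider than 31 columns is not specified above. As one possible choice (an assumption, not necessarily what genxicor does), principal components from base R's prcomp can be used to keep at most 31 columns; reduce_to_31 is a hypothetical helper.

```r
# Hypothetical helper: keep at most 31 columns, using principal components
# (stats::prcomp) when the matrix is wider than 31. This is one possible
# dimension-reduction choice, assumed here for illustration.
reduce_to_31 <- function(mat, maxcol = 31) {
  mat <- as.matrix(mat)
  if (ncol(mat) <= maxcol) return(mat)
  pcs <- prcomp(mat, center = TRUE, scale. = FALSE)$x
  k <- min(maxcol, ncol(pcs))          # fewer components exist if nrow is small
  pcs[, seq_len(k), drop = FALSE]      # first k principal components
}

set.seed(1)
wide <- matrix(rnorm(100 * 40), nrow = 100)
dim(reduce_to_31(wide))  # 100 31
```

Whether to scale the columns first (scale. = TRUE) depends on whether they share units; the sketch leaves them unscaled.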
Sourav Chatterjee, Susan Holmes
Chatterjee, S. (2022) <arXiv:2211.04702>
library("xicor")
example_joint_calc <- function(n, x = runif(n), y = runif(n), ep = runif(n)) {
  u <- (x + y + ep) %% 1
  v <- ((x + y)/2 + ep) %% 1
  w <- (4*x/3 + 2*y/3 + ep) %% 1
  z <- (2*x/3 + y/3 + ep) %% 1
  q <- cbind(u, v, w, z)
  p <- cbind(x, y)
  c1 <- genxicor(u, p)
  c2 <- genxicor(v, p)
  c3 <- genxicor(w, p)
  c4 <- genxicor(z, p)
  c5 <- genxicor(q, p)
  return(list(marg1 = c1$xi, marg2 = c2$xi, marg3 = c3$xi, marg4 = c4$xi,
              joint = c5$xi,
              p1 = c1$pval, p2 = c2$pval, p3 = c3$pval, p4 = c4$pval,
              p5 = c5$pval))
}
result1 <- example_joint_calc(n = 10)
If the argument x is a real number, its fractional part is dropped.
numbinary(x)
x |
is a real or integer number |
The output is a binary vector of length 31.
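A minimal base-R sketch of the same idea, assuming (as the length of 31 suggests) that the bits come from intToBits, which returns the 32 bits of an integer least significant first. The name numbinary_sketch is hypothetical, and taking the absolute value (leaving the sign to be handled elsewhere) is an assumption.

```r
# Hypothetical sketch: 31-bit binary expansion of the integer part of x.
numbinary_sketch <- function(x) {
  n <- as.integer(trunc(abs(x)))   # drop the fractional part (and the sign)
  as.integer(intToBits(n))[1:31]   # bits for 2^0, 2^1, ..., 2^30
}

numbinary_sketch(6.9)[1:4]  # 0 1 1 0  (6 = 110 in binary, read right to left)
```

Values of 2^31 or more do not fit in an R integer (as.integer returns NA with a warning), which is the large-integer limitation mentioned for backdec.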
Take a matrix of two numbers given in their binary expansions, one in each of the two rows, and return the interleaving of the two numbers.
weave(rmat, sgn)
rmat |
a matrix with 2m rows corresponding to the expansions of the m numbers to be interleaved. |
sgn |
is the sign vector associated with the numbers to be woven |
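Interleaving is what turns several expansions into a single one. The sketch below shows the digit-alternation step for any number of equal-length bit vectors; weave_sketch is a hypothetical name, and it does not reproduce how weave lays out the rows of rmat or treats signs.

```r
# Hypothetical sketch: interleave the digits of equal-length bit vectors.
weave_sketch <- function(...) {
  bits <- list(...)
  stopifnot(length(unique(lengths(bits))) == 1)  # all expansions same length
  # Stack the vectors as rows, then read the matrix column by column:
  # first digit of each vector, then second digit of each, and so on.
  as.vector(do.call(rbind, bits))
}

weave_sketch(c(1, 1, 0), c(0, 1, 1))  # 1 0 1 1 0 1
```

Reading the stacked matrix in column-major order is what makes the map reversible: digit k of vector j sits at position (k - 1) * m + j of the result.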
Auxiliary function used for generating the expansion of a number: the binary expansion of length nc of the integer part is the first row of the matrix, and the binary expansion of length nc of the fractional part is the second row. The sign is appended to the final list object that the function returns.
wholebinary(x, nc = 31)
x |
is a decimal number |
nc |
is the length of the binary expansion and defines the number of columns of the output matrix |
This function generates a list with a binary matrix rmat with two rows and the sign sgn in a separate entry of the list.
This function computes the xi coefficient between two vectors x and y or, if x is a matrix, the coefficients between all pairs of its columns. When a single coefficient is computed, it can be used to test independence, either with a Monte Carlo permutation test or with an asymptotic approximation.
xicor( x, y = NULL, pvalue = FALSE, ties = TRUE, method = "asymptotic", nperm = 1000, factor = FALSE )
x |
Vector of numeric values in the first coordinate. |
y |
Vector of numeric values in the second coordinate. |
pvalue |
Whether to return the p-value for rejecting independence; if TRUE, the function also returns the standard deviation of xi. |
ties |
Whether ties need to be handled. If ties = TRUE, the algorithm assumes that the data have ties and uses the more elaborate theory for calculating the s.d. and P-value; otherwise, it uses the simpler theory. There is no harm in setting ties = TRUE even if there are no ties. |
method |
If method = "asymptotic" the function returns P-values computed by the asymptotic theory. If method = "permutation", a permutation test with nperm permutations is employed to estimate the P-value. Usually, there is no need for the permutation test. The asymptotic theory is good enough. |
nperm |
In the case of a permutation test, the number of permutations used to estimate the P-value. |
factor |
Whether to transform integers into factors, the default is to leave them alone. |
If pvalue = FALSE, the function returns the value of the xi coefficient (a matrix of coefficients if the input is a matrix). If pvalue = TRUE, the function returns a list:
The value of the xi coefficient.
The standard deviation.
The test p-value.
The peas dataset is no longer available in psych; we now use psychTools.
This version does not take a seed as an argument; if reproducibility is an issue, set a seed before calling the function.
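To make the asymptotic test concrete: without ties, Chatterjee (2020) shows that under independence sqrt(n) * xi is asymptotically normal with mean 0 and variance 2/5, and large values of xi are evidence of dependence. The sketch below turns an observed xi into a one-sided p-value on that basis. The name xi_pvalue_sketch is hypothetical; it covers only the no-ties case, whereas ties = TRUE in xicor uses the more elaborate variance estimate.

```r
# Hypothetical sketch: asymptotic test of independence for xi, no-ties case.
# Under independence, sqrt(n) * xi is approximately N(0, 2/5) (Chatterjee 2020).
xi_pvalue_sketch <- function(xi, n) {
  sd_xi <- sqrt(2 / (5 * n))                      # standard deviation of xi
  pval <- pnorm(xi / sd_xi, lower.tail = FALSE)   # one-sided: reject for large xi
  list(xi = xi, sd = sd_xi, pval = pval)
}

xi_pvalue_sketch(0, n = 100)$pval    # 0.5
xi_pvalue_sketch(0.3, n = 100)$pval  # well below 0.001 (z is about 4.74)
```

The returned list mirrors the three components described above (coefficient, standard deviation, p-value).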
Sourav Chatterjee, Susan Holmes
Chatterjee, S. (2020) <arXiv:1909.10140>.
dcov
library("xicor")
library("psychTools")
data(peas)
# Visualize the peas data
library(ggplot2)
ggplot(peas, aes(parent, child)) +
  geom_count() +
  scale_radius(range = c(0, 5)) +
  xlim(c(13.5, 24)) + ylim(c(13.5, 24)) +
  coord_fixed() +
  theme(legend.position = "bottom")
# Compute one of the coefficients, with its p-value
xicor(peas$parent, peas$child, pvalue = TRUE)
xicor(peas$child, peas$parent)
# Compute all the coefficients
xicor(peas)