viladomatCorrelation
Calculates Pearson's correlation between two spatial datasets, X and Y. The analytical p-value has a high false-positive rate for spatially autocorrelated patterns, so this function instead computes an empirical p-value from an empirical null distribution. The null distribution is generated by permuting dataset X: its values are randomly shuffled and then smoothed so that each permutation retains the original degree of autocorrelation of X.
Usage
viladomatCorrelation(
data,
delta,
maxDistPrctile,
nPermutations,
nThreads = 1,
BPPARAM = NULL
)
Arguments
- data
matrix: An N x 4 matrix with the values of X in the first column, the values of Y in the second column, the x-coordinates in the third column, and the y-coordinates in the fourth column.
- delta
numeric vector: For a given point in dataset X, the proportion of neighbors that should fall within the smoothing kernel. This can be a single value or a vector of candidate values. Passed to `locfit::lp` as `nn`.
- maxDistPrctile
numeric: Percentile of the pairwise distances between pixels to use as the maximum distance when calculating variograms. At greater distances the variogram is less precise because fewer pairs of points are separated by that distance. Because the goal is to minimize the difference between the variogram of X and those of its permutations, the variogram is restricted to the more robust range below this percentile.
- nPermutations
integer: Number of permutations used to build the empirical null distribution. This number determines the precision of the p-value. For example, if `nPermutations <- 100`, the smallest p-value is 0.01.
- nThreads
integer: Number of threads for parallelization. Default is 1. When `BPPARAM` is `NULL`, this argument sets the parallel back-end to `BiocParallel::MulticoreParam(workers = nThreads)`. We recommend setting it to the number of available cores (`parallel::detectCores(logical = FALSE)`). If `BPPARAM` is not `NULL`, it overrides `nThreads`.
- BPPARAM
BiocParallelParam: Optional argument for parallelization, provided so that advanced users of `BiocParallel` have more flexibility in setting up the parallel back-end. Default is `NULL`. If provided, it is assumed to be an instance of `BiocParallelParam`.
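The role of `maxDistPrctile` can be illustrated with base R alone. The sketch below is not the package's internal code; the simulated coordinates and the variable names `coords`, `pairDist`, `maxDist`, and `fractionKept` are assumptions made for illustration. It shows how a percentile of the pairwise distances could be turned into a distance cutoff.

```r
# Hypothetical illustration (not the package's internal code): derive a
# variogram distance cutoff from a percentile of all pairwise distances.
set.seed(1)
coords <- cbind(x = runif(50), y = runif(50))  # 50 simulated 2-D locations

# All pairwise Euclidean distances between locations (50 * 49 / 2 pairs)
pairDist <- as.numeric(dist(coords))

# Cutoff at the 25th percentile, mirroring maxDistPrctile = 0.25
maxDistPrctile <- 0.25
maxDist <- quantile(pairDist, probs = maxDistPrctile, names = FALSE)

# Fraction of point pairs at or below the cutoff
fractionKept <- mean(pairDist <= maxDist)
```

Only the shortest quarter of the pairwise distances falls under the cutoff, which is the range where each distance bin contains the most pairs.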
Value
The output is returned as a list with the following elements:
- deltaStarMedian
numeric: The median, across permutations, of the deltas that minimize the residual sum of squares.
- deltaStar
numeric vector of length `nPermutations`: The delta that minimizes the residual sum of squares for each permutation.
- pValueGlobal
numeric: Empirical p-value for the Pearson's correlation of X and Y.
- nullCorGlobal
B x 1 matrix, where B is `nPermutations`: The correlation coefficients between each permutation of X and Y. These make up the null distribution used to calculate the empirical p-value.
- permutations
N x B matrix, where B is `nPermutations`: Each column holds the values of one permutation of X.
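The relationship between a null distribution of correlations and an empirical p-value can be sketched in base R. This is a simplified illustration under stated assumptions, not the function's implementation: `empiricalP` is a hypothetical helper, the two-sided formula is an assumption, and the null below comes from plain shuffling without the smoothing step described above.

```r
# Hypothetical helper (not part of the package): two-sided empirical p-value
# as the share of null correlations at least as extreme as the observed one.
empiricalP <- function(observed, nullCor) {
  mean(abs(nullCor) >= abs(observed))
}

set.seed(42)
n <- 200
x <- rnorm(n)
y <- 0.3 * x + rnorm(n)
observed <- cor(x, y)

# Naive null: shuffle x only. The documented method additionally smooths each
# shuffled copy to restore spatial autocorrelation; that step is omitted here.
B <- 100
nullCor <- replicate(B, cor(sample(x), y))

pValue <- empiricalP(observed, nullCor)
# With B = 100, pValue can only be a multiple of 1 / B
```

Because the p-value is a proportion over B permutations, increasing B refines the grid of values it can take.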
References
adapted from: Viladomat, Júlia et al. “Assessing the significance of global and local correlations under spatial autocorrelation: a nonparametric approach.” Biometrics vol. 70,2 (2014): 409-18. doi:10.1111/biom.12139
Examples
data(quakes)
# remove duplicated positions
quakes_data <- quakes[!duplicated(cbind(quakes$lat, quakes$long)), ]
# columns: X values, Y values, then the two coordinate columns
data <- cbind(quakes_data$depth,
              quakes_data$mag,
              quakes_data$lat,
              quakes_data$long)
# sequence of candidate deltas
delta <- seq(0.1, 0.9, 0.1)
# maximum distance for the variogram, set at the 25th percentile of
# the distribution of pairwise distances
maxDistPrctile <- 0.25
# number of permutations (kept small so the example runs quickly)
nPermutations <- 3
resultsPermuteX <- viladomatCorrelation(data, delta, maxDistPrctile, nPermutations)
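Before calling the function, it may help to confirm that the input matches the N x 4 layout described under Arguments. The checks below are a suggested sanity test using base R only. They are not something the package is known to require, and they rebuild the example matrix so the block runs on its own.

```r
# Rebuild the example input from the built-in quakes dataset
data(quakes)
quakes_data <- quakes[!duplicated(cbind(quakes$lat, quakes$long)), ]
data <- cbind(quakes_data$depth,
              quakes_data$mag,
              quakes_data$lat,
              quakes_data$long)

# Suggested checks (an assumption, not enforced by the package as far as
# this page states): four columns, no missing values, unique coordinates
stopifnot(
  is.matrix(data),
  ncol(data) == 4,
  !anyNA(data),
  anyDuplicated(data[, 3:4]) == 0
)
```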