
Calculates Pearson's correlation between two spatial datasets, X and Y. The analytical p-value has a high false positive rate for autocorrelated spatial patterns, so this function instead calculates an empirical p-value from an empirical null distribution. The null distribution is generated by permuting dataset X: its values are randomly shuffled and then smoothed to restore the original degree of autocorrelation of X.
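
The shuffle-then-smooth idea can be illustrated with a minimal base R sketch. It is not this package's implementation: the toy data, the helper names `smoothNN` and `permuteAndSmooth`, the tricube nearest-neighbour smoother (a stand-in for the `locfit` local regression), and the simple mean/variance rescaling are all assumptions made for illustration.

```r
set.seed(1)

# Hypothetical toy data: 200 points on the unit square with a smooth pattern
n <- 200
coords <- cbind(runif(n), runif(n))
x <- sin(2 * pi * coords[, 1]) + cos(2 * pi * coords[, 2]) + rnorm(n, sd = 0.2)

# Nearest-neighbour kernel smoother (stand-in for the local regression step).
# delta is the fraction of points that fall inside each smoothing window.
smoothNN <- function(values, coords, delta) {
  d <- as.matrix(dist(coords))
  k <- max(2, ceiling(delta * nrow(coords)))
  vapply(seq_len(nrow(coords)), function(i) {
    h <- sort(d[i, ])[k]                # distance to the k-th nearest point
    w <- pmax(0, 1 - (d[i, ] / h)^3)^3  # tricube weights, zero beyond h
    sum(w * values) / sum(w)
  }, numeric(1))
}

permuteAndSmooth <- function(values, coords, delta) {
  shuffled <- sample(values)                     # destroys spatial structure
  smoothed <- smoothNN(shuffled, coords, delta)  # reintroduces autocorrelation
  # Simplification: rescale to the mean and standard deviation of the original
  mean(values) + sd(values) * (smoothed - mean(smoothed)) / sd(smoothed)
}

xPerm <- permuteAndSmooth(x, coords, delta = 0.3)
```

Repeating `permuteAndSmooth` many times and correlating each result with Y would give a null distribution in the spirit of the description above.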

Usage

viladomatCorrelation(
  data,
  delta,
  maxDistPrctile,
  nPermutations,
  nThreads = 1,
  BPPARAM = NULL
)

Arguments

data

matrix: An N x 4 matrix with the values of X in the first column, the values of Y in the second column, the x-coordinates in the third column, and the y-coordinates in the fourth column.

delta

numeric vector: Given a point in dataset X, the proportion of neighbors (a value between 0 and 1) that should fall within the smoothing kernel. This can be a vector of several values or a single numeric. Passed to `locfit::lp` as `nn`.

maxDistPrctile

numeric: Percentile of the pairwise distances between pixels to use as the maximum distance when calculating variograms, given as a fraction (for example, 0.25 for the 25th percentile). At greater distances the variogram is less precise because fewer pairs of points are separated by that distance. Since the goal is to minimize the difference between the variogram of X and those of its permutations, the variograms are restricted to the range of distances where they are more robust.
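
As a rough illustration of what such a cutoff means, the sketch below computes the distance at a given percentile of all pairwise distances using base R only. The coordinates are hypothetical, and the function may compute its cutoff differently internally.

```r
set.seed(1)

# Hypothetical coordinates for 50 points
coords <- cbind(runif(50), runif(50))
maxDistPrctile <- 0.25

# All pairwise Euclidean distances (one value per pair of points)
pairDist <- as.numeric(dist(coords))

# Distance below which the chosen fraction of pairs fall
maxDist <- quantile(pairDist, probs = maxDistPrctile, names = FALSE)

# Pairs that would be kept when the variogram is restricted to this range
keep <- pairDist <= maxDist
mean(keep)
```

Only the pairs flagged in `keep` would contribute to a variogram truncated at `maxDist`.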

nPermutations

integer: Number of permutations used to build the empirical null distribution. This number determines the precision of the p-value. For example, if nPermutations = 100, the smallest possible p-value is 0.01.
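
The relationship between the number of permutations and p-value precision can be checked with a few lines of base R. The null correlations and the observed correlation below are made up, and the two-sided proportion is an assumed way of summarizing them, not necessarily the formula this function uses.

```r
# Resolution of the empirical p-value for several permutation counts
nPermutations <- c(100, 1000, 10000)
smallestP <- 1 / nPermutations
data.frame(nPermutations, smallestP)

# Hypothetical null distribution of 100 correlations and an observed value
set.seed(1)
nullCor <- runif(100, -0.5, 0.5)
observedCor <- 0.45

# Assumed two-sided summary: share of null correlations at least as extreme.
# With 100 permutations this is always a multiple of 0.01.
propExtreme <- mean(abs(nullCor) >= abs(observedCor))
```

In practice this means a threshold such as 0.001 can only be resolved with at least 1000 permutations.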

nThreads

integer: Number of threads for parallelization. Default = 1. If BPPARAM is NULL, the parallel back-end is set to BiocParallel::MulticoreParam(workers = nThreads). We recommend setting this argument to the number of available cores (parallel::detectCores(logical = FALSE)). If BPPARAM is not NULL, it overrides nThreads.
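
A possible way to prepare these parallelization settings is sketched below. The fallback to a single thread and the `requireNamespace` guard are choices made for this sketch; the commented call reuses the objects defined in the Examples section.

```r
# Number of physical cores; detectCores() can return NA on some platforms
nThreads <- parallel::detectCores(logical = FALSE)
if (is.na(nThreads)) nThreads <- 1L

# Optionally build the back-end explicitly (only if BiocParallel is installed)
BPPARAM <- NULL
if (requireNamespace("BiocParallel", quietly = TRUE)) {
  BPPARAM <- BiocParallel::MulticoreParam(workers = nThreads)
}

# Either pass nThreads alone, or pass BPPARAM, which takes precedence:
# viladomatCorrelation(data, delta, maxDistPrctile, nPermutations,
#                      nThreads = nThreads)
# viladomatCorrelation(data, delta, maxDistPrctile, nPermutations,
#                      BPPARAM = BPPARAM)
```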

BPPARAM

BiocParallelParam: Optional argument for parallelization, provided for advanced BiocParallel users who want more flexibility in setting up the parallel back-end. Default is NULL. If provided, it is assumed to be an instance of BiocParallelParam.

Value

The output is returned as a list.

  • deltaStarMedian: numeric, the median of the deltas that minimize the residual sum of squares across each permutation

  • deltaStar: numeric vector of length `nPermutations`, the delta that minimizes the residual sum of squares for each permutation

  • pValueGlobal: numeric, empirical p-value for the Pearson's correlation of X and Y

  • nullCorGlobal: a B x 1 matrix, where B is `nPermutations`. It holds the correlation coefficients between each permutation of X and Y, which make up the null distribution used to calculate the empirical p-value

  • permutations: an N x B matrix, where B is `nPermutations`. Each column holds the values of one permutation of X

References

Adapted from: Viladomat, Júlia, et al. "Assessing the significance of global and local correlations under spatial autocorrelation: a nonparametric approach." Biometrics 70(2) (2014): 409-418. doi:10.1111/biom.12139

Examples


data(quakes)

# remove duplicated positions
quakes_data <- quakes[!duplicated(cbind(quakes$lat, quakes$long)),]

data <- cbind(quakes_data$depth,
              quakes_data$mag,
              quakes_data$lat,
              quakes_data$long)

# sequence of deltas
delta <- seq(0.1,0.9,0.1)

# maximum distance for the variogram, set at the 25th percentile of
# the distribution of pairwise distances:
maxDistPrctile <- 0.25

# number of permutations
nPermutations <- 3

resultsPermuteX <- viladomatCorrelation(data, delta, maxDistPrctile, nPermutations)