spatialCorrelationGeneExpIterPermutations • STcompare

Calculates Pearson's correlation between assays from two SpatialExperiment datasets using an iterative permutation strategy. All genes are first evaluated with a small number of permutations. Only genes that remain potentially significant are then rerun with progressively larger permutation counts, which refines their empirical p-values while accounting for the original degree of autocorrelation.
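
The screening idea can be illustrated with a small base-R sketch. It is not the STcompare implementation: `empiricalP` is a hypothetical helper that shuffles values at random instead of generating variogram-matched permutations, the p-value is assumed to be the proportion of null correlations at least as extreme as the observed one, and the `alpha / B` screening rule is applied to a single p-value rather than to both pValuePermuteX and pValuePermuteY.

```r
# Toy sketch of iterative permutation screening (not the STcompare internals).
set.seed(1)

nObs <- 200
nGenes <- 5
# Hypothetical data: gene1 is truly correlated between x and y, the rest are not.
x <- matrix(rnorm(nObs * nGenes), nrow = nGenes,
            dimnames = list(paste0("gene", seq_len(nGenes)), NULL))
y <- matrix(rnorm(nObs * nGenes), nrow = nGenes, dimnames = dimnames(x))
y["gene1", ] <- x["gene1", ] + rnorm(nObs, sd = 0.3)

# Hypothetical helper: two-sided empirical p-value from B plain shuffles of xv.
# Plain shuffling ignores spatial autocorrelation; it only stands in for the
# autocorrelation-preserving permutations used by the real function.
empiricalP <- function(xv, yv, B) {
  observed <- abs(cor(xv, yv))
  nullCors <- replicate(B, abs(cor(sample(xv), yv)))
  mean(nullCors >= observed)
}

alpha <- 0.05
nPermutations <- c(100, 1000)
active <- rownames(x)  # genes still being refined
pValue <- setNames(rep(NA_real_, nGenes), rownames(x))
lastRound <- setNames(rep(NA_real_, nGenes), rownames(x))

for (k in seq_along(nPermutations)) {
  B <- nPermutations[k]
  for (g in active) {
    pValue[g] <- empiricalP(x[g, ], y[g, ], B)
    lastRound[g] <- B
  }
  # Carry forward only genes whose p-value is below alpha / B.
  active <- active[pValue[active] < alpha / B]
  if (length(active) == 0) break
}

# Each row reports the last permutation round in which the gene was evaluated.
print(data.frame(pValue, lastRound))
```

Most of the compute is spent on the few genes that survive the cheap first round, which is the motivation for running the rounds from the smallest permutation count to the largest.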

Usage

spatialCorrelationGeneExpIterPermutations(
  input,
  alpha = 0.05,
  nPermutations = c(100, 1000),
  deltaX = NULL,
  deltaY = NULL,
  maxDistPrctile = 0.25,
  returnPermutations = FALSE,
  assayName = NULL,
  nThreads = 1,
  BPPARAM = NULL,
  verbose = TRUE,
  seed = 0,
  adjustMethod = "BH"
)

Arguments

input: list: List of two SpatialExperiment objects with matched spatial locations. The first element corresponds to the first SpatialExperiment (`X`), and the second to the second SpatialExperiment (`Y`). The spatialCoords of the two SpatialExperiment objects should be on the same coordinate framework, and observations at the same coordinate location in both datasets should have the same row names. If the SpatialExperiment objects do not have shared locations, use `SEraster::rasterizeGeneExpression()` to generate SpatialExperiment objects with shared pixel locations. See the assayName parameter if the SpatialExperiment objects have more than one assay.
alpha: numeric: significance threshold used to decide which genes should be rerun at the next permutation level. After each iteration, a gene is carried forward only when both pValuePermuteX and pValuePermuteY are less than alpha / nPermutations[k]. Default is 0.05.
nPermutations: numeric vector: numbers of permutations to use across iterative rounds. The vector is applied from smallest to largest. All genes are first tested with nPermutations[1]; genes passing the screening rule defined by alpha are rerun with nPermutations[2], and so on. Default is c(100, 1000).
deltaX: list: List of single numerics or list of numeric vectors to use for delta, the parameter controlling the degree of smoothing in permutations of X. The length of the list should equal the number of rows in the SpatialExperiment. Delta is a proportion, k / N, where N is the total number of observations (columns) in X and k is the number of neighbors in the permutation of X that should fall within the radius smoothed by the Gaussian kernel to reproduce the amount of autocorrelation present in the original X. If a single delta is not known, a sequence of deltas can be supplied; the best delta is the one that minimizes the sum of squared residuals between the variogram of the permutation generated with that delta and the variogram of the target. Default is NULL. If no value is supplied for deltaX, seq(0.1, 0.9, 0.1), the sequence from 0.1 to 0.9 in steps of 0.1, is used to find the best delta for each row (gene) in X.
deltaY: list: List of single numerics or list of numeric vectors to use for delta, the parameter controlling the degree of smoothing in permutations of Y. deltaY is like deltaX but for permuting data in Y instead of X. Default is NULL. If no value is supplied for deltaY, seq(0.1,0.9,0.1), the sequence of every 0.1 from 0.1 to 0.9, will be used to find the best delta for permutations for each row (gene) in Y.
maxDistPrctile: numeric: percentile of distances between pixels to use as the maximum distance when calculating variograms. Default is 0.25. At greater distances the variogram is less precise because fewer pairs of points are separated by that distance. Since the goal is to minimize the difference between the variogram of X and those of its permutations, the variogram is restricted to the more robust range below this percentile.
returnPermutations: logical: indicate whether the dataframe returned as output will have columns with the values of the permutations used to calculate the null correlations and empirical p-values. Default is FALSE.
assayName: character or integer: A character string or integer index specifying the assay in the SpatialExperiment to use. Default is NULL. If no value is supplied for assayName, the first assay is used.
nThreads: integer: Number of threads for parallelization. Default is 1. When the BPPARAM argument is NULL, this argument sets the parallel execution back-end to BiocParallel::MulticoreParam(workers = nThreads). We recommend setting it to the number of cores available (parallel::detectCores(logical = FALSE)). If the BPPARAM argument is not NULL, it overrides nThreads.
BPPARAM: BiocParallelParam: Optional additional argument for parallelization. This argument is provided for advanced users of BiocParallel for further flexibility for setting up parallel-execution back-end. Default is NULL. If provided, this is assumed to be an instance of BiocParallelParam.
verbose: logical: indicate whether to print the current permutation level together with the row number and name to show progress as the function iterates through genes. Default is TRUE.
seed: integer: Seed for the random number generator used to generate noise in the variogram matching step. Ensures reproducibility of empirical p-values regardless of parallelization back-end. Default is 0.
adjustMethod: character: multiple-testing correction method passed to stats::p.adjust() for the final pValuePermuteX and pValuePermuteY columns separately. Must be one of p.adjust.methods. Default is "BH".
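
As a rough illustration of how some of these arguments could be prepared, the base-R sketch below builds a per-gene delta list, checks an adjustment method, and computes a distance cutoff. `nGenes`, `rawP`, and `coords` are hypothetical stand-ins, and reading maxDistPrctile = 0.25 as a quantile probability over pairwise distances is an assumption about the argument's meaning, not a description of the package internals.

```r
# Base-R sketch of preparing arguments; nGenes and coords are hypothetical.
nGenes <- 3

# One candidate sequence per gene (row). This mirrors the documented default
# that is used when deltaX or deltaY is NULL.
deltaX <- rep(list(seq(0.1, 0.9, 0.1)), nGenes)
# A known delta can be given as a single numeric for a particular gene.
deltaX[[2]] <- 0.3
stopifnot(length(deltaX) == nGenes)

# adjustMethod must be one of the methods known to stats::p.adjust().
adjustMethod <- "BH"
stopifnot(adjustMethod %in% stats::p.adjust.methods)

# Effect of the adjustment on a vector of raw p-values (made-up values).
rawP <- c(0.001, 0.01, 0.04)
adjP <- stats::p.adjust(rawP, method = adjustMethod)

# Assumed meaning of maxDistPrctile = 0.25: the 25th percentile of all
# pairwise distances between pixel locations.
coords <- expand.grid(x = 1:10, y = 1:10)  # hypothetical 10 x 10 pixel grid
maxDist <- stats::quantile(as.numeric(stats::dist(coords)), probs = 0.25)
```

Supplying a single numeric for a gene skips the search over candidate deltas for that gene, which may be worthwhile when a suitable delta is already known from an earlier run.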

Value

The output is returned as a data.frame. The rownames are the rownames of the SpatialExperiments, and each row reflects the last permutation round in which that gene was evaluated. The columns and their contents are as follows:

correlationCoef: Pearson's correlation coefficient.
pValueNaive: the analytical p-value, naively assuming independent observations.
pValuePermuteX: multiple-testing-adjusted p-value from an empirical null generated by permutations of observations in X.
pValuePermuteY: multiple-testing-adjusted p-value from an empirical null generated by permutations of observations in Y.
deltaStarMedianX: the median delta star (the delta which minimizes the difference between the variogram of the permutation and the variogram of observations) across permutations of X.
deltaStarMedianY: the median delta star across permutations of Y.
deltaStarX: list of delta star for all permutations of X.
deltaStarY: list of delta star for all permutations of Y.
nullCorrelationsX: correlation coefficients for Y and all permutations of X.
nullCorrelationsY: correlation coefficients for X and all permutations of Y.
permutationsX: (optional) an N x B matrix, where N is the length of X and B is the final nPermutations used for that gene. Each column holds the values of one permutation of X.
permutationsY: (optional) an N x B matrix, where N is the length of Y and B is the final nPermutations used for that gene. Each column holds the values of one permutation of Y.
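
One way to work with the returned data.frame is sketched below. The object is a hypothetical stand-in that only reuses the documented column names; the values are made up for illustration and are not output of the function.

```r
# Hypothetical result with the documented column names; values are made up.
corr <- data.frame(
  correlationCoef = c(0.82, 0.10, -0.65),
  pValueNaive     = c(1e-20, 0.03, 1e-9),
  pValuePermuteX  = c(0.002, 0.40, 0.01),
  pValuePermuteY  = c(0.004, 0.55, 0.20),
  row.names       = c("geneA", "geneB", "geneC")
)

alpha <- 0.05
# Keep genes that are significant under both permutation nulls.
sigBoth <- corr[corr$pValuePermuteX < alpha & corr$pValuePermuteY < alpha, ]
# Genes the naive test calls significant but the permutation tests do not.
naiveOnly <- setdiff(rownames(corr)[corr$pValueNaive < alpha],
                     rownames(sigBoth))
```

Requiring both permutation p-values to pass is a conservative choice made for this sketch; comparing `sigBoth` with `naiveOnly` shows which calls depend on assuming independent observations.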

Examples

data(speKidney)
if (FALSE) { # \dontrun{
rastKidney <- SEraster::rasterizeGeneExpression(
  speKidney,
  assay_name = "counts",
  resolution = 0.2,
  fun = "mean",
  BPPARAM = BiocParallel::MulticoreParam(),
  square = FALSE
)

rastGexpListAB <- list(A = rastKidney$A, B = rastKidney$B)

corr <- spatialCorrelationGeneExpIterPermutations(
  rastGexpListAB,
  nPermutations = c(100, 1000),
  nThreads = 5
)

# Print the resulting data.frame of correlations and p-values
corr
} # }