spatialCorrelationGeneExp
Function to calculate Pearson's correlation between assays from two SpatialExperiment datasets. The analytical p-value has a high false positive rate for autocorrelated spatial patterns, so the function instead calculates empirical p-values from empirical null distributions. These null distributions are generated by permuting the datasets and then smoothing the permutations to maintain the original degree of autocorrelation.
Usage
spatialCorrelationGeneExp(
input,
nPermutations = 100,
deltaX = NULL,
deltaY = NULL,
maxDistPrctile = 0.25,
returnPermutations = FALSE,
assayName = NULL,
nThreads = 1,
BPPARAM = NULL,
verbose = TRUE
)
Arguments
- input
list: List of two SpatialExperiment objects with matched spatial locations. The first element corresponds to the first SpatialExperiment (`X`), and the second to the second SpatialExperiment (`Y`). The SpatialCoords of the two SpatialExperiment objects should be in the same coordinate framework, and observations at the same coordinate location in both datasets should have the same row names. If the SpatialExperiment objects do not have shared locations, use `SEraster::rasterizeGeneExpression()` to generate SpatialExperiment objects with shared pixel locations. See the `assayName` parameter if the SpatialExperiment objects have more than one assay.
- nPermutations
integer or double: Number of permutations to generate to build the empirical null distribution. This number determines the precision of the p-value. Default is `100`, such that the smallest nonzero p-value is 0.01.
- deltaX
list: List of single numerics or list of numeric vectors to use for delta, the parameter controlling the degree of smoothing in permutations of X. The length of the list should be the same as the number of rows in the SpatialExperiment. Delta is a proportion calculated by dividing k neighbors by N total observations (columns) in X, where k is the number of neighbors in the permutation of X that should be within the radius smoothed by the Gaussian kernel to achieve the amount of autocorrelation present in the original X. If a single delta is not known, a sequence of deltas can be supplied, and the best delta is chosen as the one that minimizes the sum of squared residuals between the variogram of the permutation generated from that delta and the variogram of the target. Default is `NULL`. If no value is supplied for `deltaX`, `seq(0.1, 0.9, 0.1)`, the sequence of every 0.1 from 0.1 to 0.9, is used to find the best delta for each row (gene) in X.
- deltaY
list: List of single numerics or list of numeric vectors to use for delta, the parameter controlling the degree of smoothing in permutations of Y. `deltaY` is like `deltaX` but for permuting data in Y instead of X. Default is `NULL`. If no value is supplied for `deltaY`, `seq(0.1, 0.9, 0.1)`, the sequence of every 0.1 from 0.1 to 0.9, is used to find the best delta for permutations of each row (gene) in Y.
- maxDistPrctile
numeric: Percentile of distances between pixels to use as the maximum distance when calculating variograms. Default is `0.25`. At greater distances the variogram is less precise because fewer pairs of points are separated by that distance. Since the goal is to minimize the difference between the variogram of X and those of its permutations, the variogram is restricted to the more robust range of distances below this percentile.
- returnPermutations
logical: Indicates whether the data.frame returned as output has a column with the values of the permutations used to calculate the null correlations and the empirical p-value. Default is `FALSE`.
- assayName
character or integer: A character string or numeric specifying the assay in the SpatialExperiment to use. Default is `NULL`. If no value is supplied for `assayName`, the first assay is used.
- nThreads
integer: Number of threads for parallelization. Default is `1`. Supplying this argument when the `BPPARAM` argument is `NULL` sets the parallel execution back-end to `BiocParallel::MulticoreParam(workers = nThreads)`. We recommend setting this argument to the number of cores available (`parallel::detectCores(logical = FALSE)`). If the `BPPARAM` argument is not `NULL`, it overrides the `nThreads` argument.
- BPPARAM
BiocParallelParam: Optional additional argument for parallelization. This argument is provided for advanced users of `BiocParallel` who need further flexibility in setting up the parallel execution back-end. Default is `NULL`. If provided, this is assumed to be an instance of `BiocParallelParam`.
- verbose
logical: Indicates whether to print the row number and name to show progress as the function iterates through the rows of the SpatialExperiments to calculate a correlation coefficient and empirical p-value for each row.
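The shape of `deltaX` and the meaning of `maxDistPrctile` can be sketched in base R. This is only an illustration under stated assumptions: the gene and pixel counts, the toy grid, and the variable names are hypothetical, and the use of `quantile()` on pairwise distances is a guess at how a percentile-based cutoff could be computed, not the package's confirmed implementation.

```r
# Hypothetical sizes for illustration: 3 genes (rows), 500 pixels (columns).
nGenes <- 3
nPixels <- 500

# Delta is a proportion: k neighbors divided by N total observations.
kNeighbors <- 50
deltaSingle <- kNeighbors / nPixels

# One list entry per gene: a single known delta for the first gene,
# and candidate sequences to search over for the other two.
deltaX <- list(
  deltaSingle,
  seq(0.1, 0.5, 0.1),
  seq(0.1, 0.9, 0.1)  # the sequence documented as the default
)

# Assumed illustration of a percentile-based distance cutoff
# (analogous to maxDistPrctile = 0.25) on a toy 10 x 10 pixel grid.
coords <- expand.grid(x = 1:10, y = 1:10)
pairDists <- as.numeric(dist(coords))
maxDist <- unname(quantile(pairDists, probs = 0.25))
```

A list built this way has one element per row of the SpatialExperiment, which is the length requirement stated for `deltaX` and `deltaY`.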
Value
The output is returned as a data.frame. The rownames are the
rownames of the SpatialExperiments. The names of the columns and their
contents are as follows:
- correlationCoef
Pearson's correlation coefficient.
- pValueNaive
the analytical p-value, naively assuming independent observations
- pValuePermuteX
the p-value when creating an empirical null from permutations of observations in X
- pValuePermuteY
the p-value when creating an empirical null from permutations of observations in Y
- deltaStarMedianX
the median delta star (the delta which minimizes the difference between the variogram of the permutation and the variogram of observations) across permutations of X
- deltaStarMedianY
the median delta star across permutations of Y
- deltaStarX
list of delta star for all permutations of X
- deltaStarY
list of delta star for all permutations of Y
- nullCorrelationsX
correlation coefficients for Y and all permutations of X
- nullCorrelationsY
correlation coefficients for X and all permutations of Y
- permutationsX
(optional; returned when `returnPermutations = TRUE`) an N x B matrix, where N is the length of X and B is `nPermutations`. Each column holds the resulting values of one permutation of X
- permutationsY
(optional; returned when `returnPermutations = TRUE`) an N x B matrix, where N is the length of Y and B is `nPermutations`. Each column holds the resulting values of one permutation of Y
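To make the relationship between the null correlations and the permutation p-values concrete, here is a minimal base R sketch. It assumes a two-sided convention in which the p-value is the fraction of null correlations at least as extreme as the observed one. The helper `empiricalPValue()` and the toy values are hypothetical, and the package's exact formula may differ.

```r
# Hypothetical helper, not part of the package: two-sided empirical p-value
# as the fraction of null correlations at least as extreme as the observed one.
empiricalPValue <- function(observed, nullCorrelations) {
  mean(abs(nullCorrelations) >= abs(observed))
}

# Toy null distribution with four permuted correlations.
nullCorrelations <- c(-0.10, 0.20, 0.05, -0.30)

pStrong <- empiricalPValue(0.90, nullCorrelations)  # no null value is as extreme
pWeak <- empiricalPValue(0.20, nullCorrelations)    # two of four are as extreme
```

Under this assumed convention, a p-value of 0 means that none of the permuted correlations was as extreme as the observed one, and the attainable p-values are multiples of 1 / `nPermutations`.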
Examples
data(speKidney)
##### Rasterize to get pixels at matched spatial locations #####
rastKidney <- SEraster::rasterizeGeneExpression(speKidney,
    assay_name = 'counts', resolution = 0.2, fun = "mean",
    BPPARAM = BiocParallel::MulticoreParam(), square = FALSE)
##### Use STcompare to calculate Pearson's correlation coefficient #####
rastGexpListAB <- list(A = rastKidney$A, B = rastKidney$B)
rastGexpListAC <- list(A = rastKidney$A, C = rastKidney$C)
negCorrelation <- spatialCorrelationGeneExp(rastGexpListAB, nThreads = 5)
#> 1: Gene
posCorrelation <- spatialCorrelationGeneExp(rastGexpListAC, nThreads = 5)
#> 1: Gene
negCorrelation
#> correlationCoef pValueNaive pValuePermuteX pValuePermuteY
#> Gene -0.9472813 5.652003e-136 0 0
#> deltaStarMedianX deltaStarMedianY deltaStarX deltaStarY
#> Gene 0.2 0.2 0.2, 0.3.... 0.2, 0.3....
#> nullCorrelationsX nullCorrelationsY
#> Gene -0.04977.... -0.34991....
posCorrelation
#> correlationCoef pValueNaive pValuePermuteX pValuePermuteY
#> Gene 0.9431531 1.409195e-133 0 0
#> deltaStarMedianX deltaStarMedianY deltaStarX deltaStarY
#> Gene 0.3 0.3 0.4, 0.5.... 0.3, 0.2....
#> nullCorrelationsX nullCorrelationsY
#> Gene 0.036071.... 0.129676....
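As background for the `pValueNaive` column, the inflated false positive rate of the analytical p-value under autocorrelation can be illustrated without the package. The sketch below is a simplified stand-in that uses 1D series smoothed with a moving average rather than 2D spatial data with a Gaussian kernel. The function names, window width, and simulation sizes are arbitrary choices for illustration.

```r
set.seed(1)

# Moving-average smoother that induces autocorrelation (window width is arbitrary).
# Edge positions without a full window are dropped.
smoothSeries <- function(x, width = 25) {
  s <- stats::filter(x, rep(1 / width, width), sides = 2)
  as.numeric(s[!is.na(s)])
}

# Fraction of simulations in which two independent series
# have an analytical Pearson p-value below 0.05.
falsePositiveRate <- function(nSim, n, smooth) {
  pValues <- replicate(nSim, {
    x <- rnorm(n)
    y <- rnorm(n)
    if (smooth) {
      x <- smoothSeries(x)
      y <- smoothSeries(y)
    }
    cor.test(x, y)$p.value
  })
  mean(pValues < 0.05)
}

fprIndependent <- falsePositiveRate(nSim = 200, n = 200, smooth = FALSE)
fprAutocorrelated <- falsePositiveRate(nSim = 200, n = 200, smooth = TRUE)
```

Both pairs of series are generated independently, so any "significant" correlation is a false positive. Comparing `fprIndependent` with `fprAutocorrelated` shows how much more often the analytical test rejects once the series are smoothed.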