Skip to contents

SEraster is a rasterization preprocessing framework that aggregates cellular information into spatial pixels to reduce resource requirements for spatial omics data analysis.

Overview

SEraster reduces the number of spatial points in spatial omics datasets for downstream analysis through a process of rasterization where single cells’ gene expression or cell-type labels are aggregated into equally sized pixels based on a user-defined resolution. Here, we refer to a particular resolution of rasterization by the side length of the pixel such that finer resolution indicates smaller pixel size and coarser resolution indicates larger pixel size.

Installation

To install SEraster, we recommend using remotes:

require(remotes)
remotes::install_github('JEFworks-Lab/SEraster')

Tutorials

Input data format

In the examples below, we assume the input data is provided as a SpatialExperiment Bioconductor object. Please refer to the SpatialExperiment package and the merfish_mousePOA dataset in the package to see how you would format your data into a SpatialExperiment object.

Example

A short example workflow is shown below.

Load packages

Load example dataset

data("merfish_mousePOA")

# check the dimension of the genes-by-cells matrix at single-cell resolution
dim(merfish_mousePOA)
[1]  155 6509
# check the number of cell-types
length(unique(colData(merfish_mousePOA)$celltype))
[1]  16

This MERFISH mouse preoptic area dataset contains 6,509 cells and 16 cell-types.

# plot at single-cell resolution
df <- data.frame(spatialCoords(merfish_mousePOA), celltype = colData(merfish_mousePOA)$celltype)
ggplot(df, aes(x = x, y = y, col = celltype)) +
  geom_point(size = 1) +
  labs(x = "x (μm)",
       y = "y (μm)",
       col = "Cell-types") +
  theme_bw() +
  theme(panel.grid = element_blank())

Getting started

Rasterize gene expression

rastGexp <- SEraster::rasterizeGeneExpression(merfish_mousePOA, assay_name="volnorm", resolution = 50)

# check the dimension of the genes-by-cells matrix after rasterizing gene expression
dim(rastGexp)
[1]  155 1301

As you can see, SEraster aggregated 6,509 single cells into 1,301 pixels.

# plot total rasterized gene expression
SEraster::plotRaster(rastGexp, name = "Total rasterized gene expression")

# plot specific gene
SEraster::plotRaster(rastGexp, feature_name = "Esr1", name = "Esr1")

Rasterize cell-type labels

rastCt <- SEraster::rasterizeCellType(merfish_mousePOA, col_name = "celltype", resolution = 50)

# check the dimension of the cell-types-by-cells matrix after rasterizing cell-type labels
dim(rastGexp)
[1]  16 1301
# plot total cell counts
SEraster::plotRaster(rastCt, name = "cell counts", option = "inferno")

# plot specific cell-type
SEraster::plotRaster(rastCt, feature_name = "Inhibitory", name = "Inhibitory neuron counts", option = "inferno")

Sample Downstream Analysis

SEraster returns rasteriezd gene-expression and cell-type information as SpatialExperiment objects that can be integrated with other existing downstream analysis tools. We demonstrate below spatial variable gene (SVG) and cell-type cooccurrence analyses as examples of such potential downstream analysis.

Spatial variable gene (SVG) analysis

Here, we use a previously developed tool called nnSVG. Please refer to nnSVG for more details about the package. We can directly input rasterized gene expression SpatialExperiment object from SEraster into nnSVG.

library(nnSVG)
# run nnSVG
set.seed(0)
rastGexp <- nnSVG(rastGexp, assay_name = "pixelval")
# number of significant SVGs
table(rowData(rastGexp)$padj <= 0.05)
## 
## FALSE  TRUE
##    17   138
# plot rasterized gene expression of top-ranked SVG
top_svg <- which(rowData(rastGexp)$rank == 1)
top_svg_name <- rownames(rowData(rastGexp))[top_svg]
SEraster::plotRaster(rastGexp, feature_name = top_svg_name, name = top_svg_name)

We can also perform cell-type specific SVG analysis by subsetting the dataset prior to applying SEraster.

# subset data
ct_interest <- "Excitatory"
spe_sub <- merfish_mousePOA[,merfish_mousePOA$celltype == ct_interest]

# run SEraster
rastGexp_sub <- SEraster::rasterizeGeneExpression(spe_sub, assay_name="volnorm", resolution = 50)

# run nnSVG
set.seed(0)
rastGexp_sub <- nnSVG(rastGexp_sub, assay_name = "pixelval")
# number of significant SVGs
table(rowData(rastGexp_sub)$padj <= 0.05)
## 
## FALSE  TRUE 
##    45   110
# plot rasterized gene expression of top-ranked SVG
top_svg <- which(rowData(rastGexp_sub)$rank == 1)
top_svg_name <- rownames(rowData(rastGexp_sub))[top_svg]
SEraster::plotRaster(rastGexp_sub, feature_name = top_svg_name, name = top_svg_name)

Cell-type cooccurrence analysis

Rasterized cell-type labels can be used to analyze pair-wise cell-type cooccurrence. To do so, we binarize the rasterized cell-type labels using a relative enrichment metric and a previously developed tool called CooccurrenceAffinity. Please refer to our paper for more details about the methodology and CooccurrenceAffinity for more details about the package.

# extract cell-type labels
ct_labels <- as.factor(colData(merfish_mousePOA)$celltype)

# compute relative enrichment (RE) metric
mat <- assay(rastCt, "pixelval")
mat_re <- do.call(rbind, lapply(rownames(rastCt), function(ct_label) {
    mat[ct_label,] / (sum(mat[ct_label,]) / sum(mat) * colSums(mat))
}))
rownames(mat_re) <- rownames(mat)

# binarize
mat_bin <- ifelse(mat_re >= 1, 1, 0)

# add RE and binarized layers to SpatialExperiment object
assays(rastCt) <- list(pixelval = assay(rastCt, "pixelval"), re = mat_re, bin = mat_bin)
ct_interest <- "Ependymal"

# plot pixel value for a cell-type of interest
plotRaster(rastCt, assay_name = "pixelval", feature_name = ct_interest, name = "cell-type counts", option = "inferno")

# plot RE value for a cell-type of interest
plotRaster(rastCt, assay_name = "re", feature_name = ct_interest, name = "RE", option = "inferno")

# plot binarized value for a cell-type of interest
plotRaster(rastCt, assay_name = "bin", feature_name = ct_interest, factor_levels = c(0,1), name = "binarized", option = "inferno")

# run CooccurrenceAffinity
ct_coocc <- CooccurrenceAffinity::affinity(data = mat_bin, row.or.col = "row", squarematrix = c("all"))

# plot maximum likelihood estimates of affinity metric (alpha MLE)
CooccurrenceAffinity::plotgg(data = ct_coocc, variable = "alpha_mle", legendlimit = "datarange")