Formatting a SpatialExperiment Object for SEraster

For this tutorial, we will format a preprocessed MERFISH dataset of the mouse preoptic area (POA) into a SpatialExperiment object so that it can be rasterized with SEraster. The output of this tutorial is the same as the merfish_mousePOA dataset in the package.

In the original work, Moffitt J. and Bambah-Mukku D. et al. (2018), “Molecular, spatial, and functional single-cell profiling of the hypothalamic preoptic region”, Science, the authors collected spatial transcriptomics datasets of mouse POA regions across sexes, behavioral conditions, and bregma sections. The full dataset can be downloaded from Dryad. We use only the bregma -0.29 slice from a naive female animal (Animal ID = 1, Animal Sex = “Female”, Behavior = “Naive”, Bregma = “-0.29”).

Load libraries

library(SpatialExperiment)
#> Loading required package: SingleCellExperiment
#> Loading required package: SummarizedExperiment
#> Loading required package: MatrixGenerics
#> Loading required package: matrixStats
#> 
#> Attaching package: 'MatrixGenerics'
#> The following objects are masked from 'package:matrixStats':
#> 
#>     colAlls, colAnyNAs, colAnys, colAvgsPerRowSet, colCollapse,
#>     colCounts, colCummaxs, colCummins, colCumprods, colCumsums,
#>     colDiffs, colIQRDiffs, colIQRs, colLogSumExps, colMadDiffs,
#>     colMads, colMaxs, colMeans2, colMedians, colMins, colOrderStats,
#>     colProds, colQuantiles, colRanges, colRanks, colSdDiffs, colSds,
#>     colSums2, colTabulates, colVarDiffs, colVars, colWeightedMads,
#>     colWeightedMeans, colWeightedMedians, colWeightedSds,
#>     colWeightedVars, rowAlls, rowAnyNAs, rowAnys, rowAvgsPerColSet,
#>     rowCollapse, rowCounts, rowCummaxs, rowCummins, rowCumprods,
#>     rowCumsums, rowDiffs, rowIQRDiffs, rowIQRs, rowLogSumExps,
#>     rowMadDiffs, rowMads, rowMaxs, rowMeans2, rowMedians, rowMins,
#>     rowOrderStats, rowProds, rowQuantiles, rowRanges, rowRanks,
#>     rowSdDiffs, rowSds, rowSums2, rowTabulates, rowVarDiffs, rowVars,
#>     rowWeightedMads, rowWeightedMeans, rowWeightedMedians,
#>     rowWeightedSds, rowWeightedVars
#> Loading required package: GenomicRanges
#> Loading required package: stats4
#> Loading required package: BiocGenerics
#> 
#> Attaching package: 'BiocGenerics'
#> The following objects are masked from 'package:stats':
#> 
#>     IQR, mad, sd, var, xtabs
#> The following objects are masked from 'package:base':
#> 
#>     anyDuplicated, aperm, append, as.data.frame, basename, cbind,
#>     colnames, dirname, do.call, duplicated, eval, evalq, Filter, Find,
#>     get, grep, grepl, intersect, is.unsorted, lapply, Map, mapply,
#>     match, mget, order, paste, pmax, pmax.int, pmin, pmin.int,
#>     Position, rank, rbind, Reduce, rownames, sapply, setdiff, table,
#>     tapply, union, unique, unsplit, which.max, which.min
#> Loading required package: S4Vectors
#> 
#> Attaching package: 'S4Vectors'
#> The following object is masked from 'package:utils':
#> 
#>     findMatches
#> The following objects are masked from 'package:base':
#> 
#>     expand.grid, I, unname
#> Loading required package: IRanges
#> Loading required package: GenomeInfoDb
#> Loading required package: Biobase
#> Welcome to Bioconductor
#> 
#>     Vignettes contain introductory material; view with
#>     'browseVignettes()'. To cite Bioconductor, see
#>     'citation("Biobase")', and for packages 'citation("pkgname")'.
#> 
#> Attaching package: 'Biobase'
#> The following object is masked from 'package:MatrixGenerics':
#> 
#>     rowMedians
#> The following objects are masked from 'package:matrixStats':
#> 
#>     anyMissing, rowMedians
library(Matrix)
#> 
#> Attaching package: 'Matrix'
#> The following object is masked from 'package:S4Vectors':
#> 
#>     expand
library(ggplot2)

Load the subsetted dataset

data_sub <- readRDS(file = "merfish_mousePOA_raw.RDS")
dim(data_sub)
#> [1] 6509  170

Process dataset

The subsetted dataset has 6509 cells (one row per cell). For rasterizing gene expression with SEraster, the input SpatialExperiment needs a genes-by-cells matrix (either dense or sparse) in the assay slot and a spatial coordinates matrix in the spatialCoords slot. For rasterizing cell-type labels with SEraster, in addition to the spatial coordinates matrix in the spatialCoords slot, the input SpatialExperiment also needs a data frame with a column of cell-type labels in the colData slot. Thus, here, we extract the genes-by-cells matrix, the spatial coordinates matrix, and the data frame with cell-type labels.

## genes-by-cells matrix
# extract the genes-by-cells matrix as a sparse matrix (dgCMatrix)
mat <- as(t(data_sub[,10:ncol(data_sub)]), "CsparseMatrix")

# remove blank genes used for quality control
blanks <- rownames(mat)[grepl("Blank", rownames(mat))]
mat <- mat[setdiff(rownames(mat),blanks),]

## spatial coordinates matrix
# extract the spatial coordinates
pos <- data_sub[,c("Centroid_X", "Centroid_Y")]
colnames(pos) <- c("x","y")

# make x,y coordinates positive
pos[,1] <- pos[,1] - min(pos[,1])
pos[,2] <- pos[,2] - min(pos[,2])

## cell-type labels
# extract the data frame with cell-type labels
meta <- data_sub[,c("Bregma", "Cell_class", "Neuron_cluster_ID")]
colnames(meta) <- c("bregma", "celltype", "neurontype")

## standardize cell IDs for the extracted objects
colnames(mat) <- rownames(pos) <- rownames(meta) <- data_sub$Cell_ID
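
To make the reshaping steps above easier to follow, here is a minimal sketch on a made-up table. The data frame `toy`, its column names, and its values are hypothetical and only mimic the layout assumed above (metadata columns first, then one column per gene); they are not taken from the MERFISH dataset.

```r
library(Matrix)

# Hypothetical toy table: one row per cell, metadata columns first,
# then one column per gene (names and values are illustrative only)
toy <- data.frame(
  Cell_ID    = c("c1", "c2", "c3"),
  Centroid_X = c(-5, 0, 10),
  Centroid_Y = c(2, -3, 4),
  GeneA      = c(0, 1.5, 0),
  GeneB      = c(2, 0, 0),
  Blank_1    = c(0, 0, 0.1)
)

# transpose the gene columns so that rows are genes and columns are cells
toy_mat <- as(t(as.matrix(toy[, 4:ncol(toy)])), "CsparseMatrix")
colnames(toy_mat) <- toy$Cell_ID

# drop control features whose names contain "Blank"
toy_mat <- toy_mat[!grepl("Blank", rownames(toy_mat)), , drop = FALSE]

# shift the coordinates so that the minimum of each axis is 0
toy_pos <- as.matrix(toy[, c("Centroid_X", "Centroid_Y")])
toy_pos <- sweep(toy_pos, 2, apply(toy_pos, 2, min))
dimnames(toy_pos) <- list(toy$Cell_ID, c("x", "y"))
```

After these steps, `toy_mat` has genes as rows and cells as columns, and `toy_pos` has one row per cell with non-negative coordinates, which is the same orientation as `mat` and `pos` above.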

Remove genes and cells with NaN values.

## filter genes with NaN values
bad_genes <- names(which(rowSums(is.nan(mat)) > 0)) 
mat <- mat[setdiff(rownames(mat),bad_genes),]

## filter cells with NaN values
bad_cells <- names(which(colSums(is.nan(mat)) > 0))
mat <- mat[,setdiff(colnames(mat),bad_cells)]
pos <- pos[setdiff(rownames(pos),bad_cells),]
meta <- meta[setdiff(rownames(meta),bad_cells),]
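
Because the matrix, the coordinates, and the metadata are filtered separately, it can be worth confirming that they still describe the same cells in the same order. The sketch below illustrates one way to do this on made-up objects (`toy_mat`, `toy_pos`, `toy_meta`, and `keep_cells` are hypothetical names): a single vector of retained cell IDs is applied to all three objects, followed by a `stopifnot()` check.

```r
# Toy genes-by-cells matrix with one NaN entry (values are made up)
toy_mat <- matrix(c(1, 2, NaN, 4, 5, 6), nrow = 2,
                  dimnames = list(c("g1", "g2"), c("c1", "c2", "c3")))
toy_pos <- data.frame(x = c(0, 1, 2), y = c(0, 1, 2),
                      row.names = c("c1", "c2", "c3"))
toy_meta <- data.frame(celltype = c("A", "B", "A"),
                       row.names = c("c1", "c2", "c3"))

# IDs of cells (columns) without any NaN value
keep_cells <- colnames(toy_mat)[colSums(is.nan(toy_mat)) == 0]

# apply the same ID vector to all three objects
toy_mat  <- toy_mat[, keep_cells, drop = FALSE]
toy_pos  <- toy_pos[keep_cells, , drop = FALSE]
toy_meta <- toy_meta[keep_cells, , drop = FALSE]

# the cell IDs should match exactly, including their order
stopifnot(
  identical(colnames(toy_mat), rownames(toy_pos)),
  identical(colnames(toy_mat), rownames(toy_meta))
)
```

The same `stopifnot()` call could be run on `mat`, `pos`, and `meta` before building the SpatialExperiment object.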

Plot total gene expression at single-cell resolution for verification.

df_plt <- data.frame(pos, total_gexp = colSums(mat))

ggplot(df_plt, aes(x = x, y = y, color = total_gexp)) +
  coord_fixed() +
  geom_point(size = 1.5, stroke = 0) +
  scale_color_viridis_c(name = "total gene expression") +
  theme_bw() +
  theme(panel.grid = element_blank(),
        axis.title = element_blank(),
        axis.text = element_blank(),
        axis.ticks = element_blank())

Plot cell-type labels at single-cell resolution for verification.

df_plt <- data.frame(pos, celltype = meta$celltype)

ggplot(df_plt, aes(x = x, y = y, color = celltype)) +
  coord_fixed() +
  geom_point(size = 1.5, stroke = 0) +
  theme_bw() +
  theme(panel.grid = element_blank(),
        axis.title = element_blank(),
        axis.text = element_blank(),
        axis.ticks = element_blank())

Format SpatialExperiment object

Format the genes-by-cells matrix, the spatial coordinates matrix, and the data frame with cell-type labels into a SpatialExperiment object. Here, the assay is named “volnorm” because the loaded gene expression was already normalized by cell volume and scaled by 1000.

spe <- SpatialExperiment::SpatialExperiment(
  assays = list(volnorm = mat),
  spatialCoords = as.matrix(pos),
  colData = meta
)
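
As a quick check of the resulting structure, the sketch below builds a small SpatialExperiment from made-up inputs (`toy_mat`, `toy_pos`, and `toy_meta` are hypothetical) and inspects the slots described earlier. It assumes only the standard accessors `assayNames()`, `spatialCoords()`, and `colData()`.

```r
library(SpatialExperiment)

# Made-up inputs with matching cell IDs (values are illustrative only)
toy_mat <- matrix(1:6, nrow = 2,
                  dimnames = list(c("g1", "g2"), c("c1", "c2", "c3")))
toy_pos <- matrix(c(0, 1, 2, 0, 1, 2), ncol = 2,
                  dimnames = list(c("c1", "c2", "c3"), c("x", "y")))
toy_meta <- data.frame(celltype = c("A", "B", "A"),
                       row.names = c("c1", "c2", "c3"))

toy_spe <- SpatialExperiment(
  assays = list(volnorm = toy_mat),
  spatialCoords = toy_pos,
  colData = toy_meta
)

dim(toy_spe)                      # genes x cells
assayNames(toy_spe)               # name given to the assay
head(spatialCoords(toy_spe))      # x,y coordinates, one row per cell
table(colData(toy_spe)$celltype)  # cell-type labels stored in colData
```

Running the same accessors on `spe` gives a quick confirmation that the assay name, coordinates, and cell-type column are in place.
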

Session Info

sessionInfo()
#> R version 4.4.1 (2024-06-14)
#> Platform: aarch64-apple-darwin20
#> Running under: macOS Sonoma 14.5
#> 
#> Matrix products: default
#> BLAS:   /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRblas.0.dylib 
#> LAPACK: /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.12.0
#> 
#> locale:
#> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
#> 
#> time zone: America/New_York
#> tzcode source: internal
#> 
#> attached base packages:
#> [1] stats4    stats     graphics  grDevices utils     datasets  methods  
#> [8] base     
#> 
#> other attached packages:
#>  [1] ggplot2_3.5.1               Matrix_1.7-0               
#>  [3] SpatialExperiment_1.12.0    SingleCellExperiment_1.24.0
#>  [5] SummarizedExperiment_1.32.0 Biobase_2.64.0             
#>  [7] GenomicRanges_1.54.1        GenomeInfoDb_1.40.1        
#>  [9] IRanges_2.38.1              S4Vectors_0.42.1           
#> [11] BiocGenerics_0.50.0         MatrixGenerics_1.14.0      
#> [13] matrixStats_1.4.0          
#> 
#> loaded via a namespace (and not attached):
#>  [1] gtable_0.3.5            rjson_0.2.22            xfun_0.47              
#>  [4] bslib_0.8.0             htmlwidgets_1.6.4       lattice_0.22-6         
#>  [7] vctrs_0.6.5             tools_4.4.1             generics_0.1.3         
#> [10] tibble_3.2.1            fansi_1.0.6             highr_0.11             
#> [13] pkgconfig_2.0.3         desc_1.4.3              lifecycle_1.0.4        
#> [16] GenomeInfoDbData_1.2.12 farver_2.1.2            compiler_4.4.1         
#> [19] textshaping_0.4.0       munsell_0.5.1           htmltools_0.5.8.1      
#> [22] sass_0.4.9              yaml_2.3.10             pkgdown_2.1.1          
#> [25] pillar_1.9.0            crayon_1.5.3            jquerylib_0.1.4        
#> [28] DelayedArray_0.28.0     cachem_1.1.0            magick_2.8.4           
#> [31] abind_1.4-5             tidyselect_1.2.1        digest_0.6.37          
#> [34] dplyr_1.1.4             labeling_0.4.3          fastmap_1.2.0          
#> [37] grid_4.4.1              colorspace_2.1-1        cli_3.6.3              
#> [40] SparseArray_1.2.4       magrittr_2.0.3          S4Arrays_1.2.1         
#> [43] utf8_1.2.4              withr_3.0.1             UCSC.utils_1.0.0       
#> [46] scales_1.3.0            rmarkdown_2.28          XVector_0.44.0         
#> [49] httr_1.4.7              ragg_1.3.3              evaluate_0.24.0        
#> [52] knitr_1.48              viridisLite_0.4.2       rlang_1.1.4            
#> [55] Rcpp_1.0.13             glue_1.7.0              rstudioapi_0.16.0      
#> [58] jsonlite_1.8.8          R6_2.5.1                systemfonts_1.1.0      
#> [61] fs_1.6.4                zlibbioc_1.50.0