Formatting a SpatialExperiment Object for SEraster
Source:vignettes/formatting-SpatialExperiment-for-SEraster.Rmd
formatting-SpatialExperiment-for-SEraster.Rmd
Formatting a SpatialExperiment Object for SEraster
For this tutorial, we will format a preprocessed MERFISH dataset of
the mouse preotic area (POA) into a SpatialExperiment
so
that it can be rasterized with SEraster
. The output of this
tutorial is the same as the merfish_mousePOA
dataset in the
package.
In the original work, Moffitt J. and Bambah-Mukku D. et al. (2018), “Molecular, spatial, and functional single-cell profiling of the hypothalamic preoptic region”, Science Advances, authors collected spatial transcriptomics datasets of mouse POA regions for various sexes, behavioral conditions, and bregma sections. The full dataset can be downloaded from Dryad. We only use the dataset for a bregma -0.29 slice from a female naive animal (Animal ID = 1, Animal Sex = “Female”, Behavior = “Naive”, Bregma = “-0.29”).
Load libraries
library(SpatialExperiment)
#> Loading required package: SingleCellExperiment
#> Loading required package: SummarizedExperiment
#> Loading required package: MatrixGenerics
#> Loading required package: matrixStats
#>
#> Attaching package: 'MatrixGenerics'
#> The following objects are masked from 'package:matrixStats':
#>
#> colAlls, colAnyNAs, colAnys, colAvgsPerRowSet, colCollapse,
#> colCounts, colCummaxs, colCummins, colCumprods, colCumsums,
#> colDiffs, colIQRDiffs, colIQRs, colLogSumExps, colMadDiffs,
#> colMads, colMaxs, colMeans2, colMedians, colMins, colOrderStats,
#> colProds, colQuantiles, colRanges, colRanks, colSdDiffs, colSds,
#> colSums2, colTabulates, colVarDiffs, colVars, colWeightedMads,
#> colWeightedMeans, colWeightedMedians, colWeightedSds,
#> colWeightedVars, rowAlls, rowAnyNAs, rowAnys, rowAvgsPerColSet,
#> rowCollapse, rowCounts, rowCummaxs, rowCummins, rowCumprods,
#> rowCumsums, rowDiffs, rowIQRDiffs, rowIQRs, rowLogSumExps,
#> rowMadDiffs, rowMads, rowMaxs, rowMeans2, rowMedians, rowMins,
#> rowOrderStats, rowProds, rowQuantiles, rowRanges, rowRanks,
#> rowSdDiffs, rowSds, rowSums2, rowTabulates, rowVarDiffs, rowVars,
#> rowWeightedMads, rowWeightedMeans, rowWeightedMedians,
#> rowWeightedSds, rowWeightedVars
#> Loading required package: GenomicRanges
#> Loading required package: stats4
#> Loading required package: BiocGenerics
#>
#> Attaching package: 'BiocGenerics'
#> The following objects are masked from 'package:stats':
#>
#> IQR, mad, sd, var, xtabs
#> The following objects are masked from 'package:base':
#>
#> anyDuplicated, aperm, append, as.data.frame, basename, cbind,
#> colnames, dirname, do.call, duplicated, eval, evalq, Filter, Find,
#> get, grep, grepl, intersect, is.unsorted, lapply, Map, mapply,
#> match, mget, order, paste, pmax, pmax.int, pmin, pmin.int,
#> Position, rank, rbind, Reduce, rownames, sapply, setdiff, table,
#> tapply, union, unique, unsplit, which.max, which.min
#> Loading required package: S4Vectors
#>
#> Attaching package: 'S4Vectors'
#> The following object is masked from 'package:utils':
#>
#> findMatches
#> The following objects are masked from 'package:base':
#>
#> expand.grid, I, unname
#> Loading required package: IRanges
#> Loading required package: GenomeInfoDb
#> Loading required package: Biobase
#> Welcome to Bioconductor
#>
#> Vignettes contain introductory material; view with
#> 'browseVignettes()'. To cite Bioconductor, see
#> 'citation("Biobase")', and for packages 'citation("pkgname")'.
#>
#> Attaching package: 'Biobase'
#> The following object is masked from 'package:MatrixGenerics':
#>
#> rowMedians
#> The following objects are masked from 'package:matrixStats':
#>
#> anyMissing, rowMedians
library(Matrix)
#>
#> Attaching package: 'Matrix'
#> The following object is masked from 'package:S4Vectors':
#>
#> expand
library(ggplot2)
Process dataset
The subsetted has cells. For rasterizing gene expression with
SEraster
, the input SpatialExperiment
needs to
have a genes-by-cells matrix (either dense or sparse matrix) in the
assay
slot and a spatial coordinates matrix in the
spatialCoords
slot. For rasterizing cell-type labels with
SEraster
, in addition to the spatial coordinates matrix in
the spatialCoords
splot, the input
SpatialExperiment
also need to have a data frame with a
column for cell-type labels in the colData
slot. Thus,
here, we extract genes-by-cells matrix, spatial coordinates matrix, and
data frame with cell-type labels.
## genes-by-cells matrix
# extract the genes-by-cells matrix as a sparse matrix (dgCMatrix)
mat <- as(t(data_sub[,10:ncol(data_sub)]), "CsparseMatrix")
# remove blank genes used for quality control
blanks <- rownames(mat)[grepl("Blank", rownames(mat))]
mat <- mat[setdiff(rownames(mat),blanks),]
## spatial coordinates matrix
# extract the spatial coordinates
pos <- data_sub[,c("Centroid_X", "Centroid_Y")]
colnames(pos) <- c("x","y")
# make x,y coordinates positive
pos[,1] <- pos[,1] - min(pos[,1])
pos[,2] <- pos[,2] - min(pos[,2])
## cell-type labels
# extract the data frame with cell-type labels
meta <- data_sub[,c("Bregma", "Cell_class", "Neuron_cluster_ID")]
colnames(meta) <- c("bregma", "celltype", "neurontype")
## standardize cell IDs for the extracted objects
colnames(mat) <- rownames(pos) <- rownames(meta) <- data_sub$Cell_ID
Remove genes and cells with NaN values.
## filter genes with NaN values
bad_genes <- names(which(rowSums(is.nan(mat)) > 0))
mat <- mat[setdiff(rownames(mat),bad_genes),]
## filter cells with NaN values
bad_cells <- names(which(colSums(is.nan(mat)) > 0))
mat <- mat[,setdiff(colnames(mat),bad_cells)]
pos <- pos[setdiff(rownames(pos),bad_cells),]
meta <- meta[setdiff(rownames(pos),bad_cells),]
Plot total gene expression at single-cell resolution for verification.
df_plt <- data.frame(pos, total_gexp = colSums(mat))
ggplot(df_plt, aes(x = x, y = y, color = total_gexp)) +
coord_fixed() +
geom_point(size = 1.5, stroke = 0) +
scale_color_viridis_c(name = "total gene expression") +
theme_bw() +
theme(panel.grid = element_blank(),
axis.title = element_blank(),
axis.text = element_blank(),
axis.ticks = element_blank())
Plot cell-type labels at single-cell resolution for verification.
df_plt <- data.frame(pos, celltype = meta$celltype)
ggplot(df_plt, aes(x = x, y = y, color = celltype)) +
coord_fixed() +
geom_point(size = 1.5, stroke = 0) +
theme_bw() +
theme(panel.grid = element_blank(),
axis.title = element_blank(),
axis.text = element_blank(),
axis.ticks = element_blank())
Format SpatialExperiment object
Format genes-by-cells matrix, spatial coordinates matrix, and data
frame with cell-type labels into a SpatialExperiment
object. Here, the genes-by-cells matrix is named as “volnorm” because
the loaded gene expression was already normalized by cell volume and
scaled by 1000.
spe <- SpatialExperiment::SpatialExperiment(
assays = list(volnorm = mat),
spatialCoords = as.matrix(pos),
colData = meta
)
Session Info
sessionInfo()
#> R version 4.4.1 (2024-06-14)
#> Platform: aarch64-apple-darwin20
#> Running under: macOS Sonoma 14.5
#>
#> Matrix products: default
#> BLAS: /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRblas.0.dylib
#> LAPACK: /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRlapack.dylib; LAPACK version 3.12.0
#>
#> locale:
#> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
#>
#> time zone: America/New_York
#> tzcode source: internal
#>
#> attached base packages:
#> [1] stats4 stats graphics grDevices utils datasets methods
#> [8] base
#>
#> other attached packages:
#> [1] ggplot2_3.5.1 Matrix_1.7-0
#> [3] SpatialExperiment_1.12.0 SingleCellExperiment_1.24.0
#> [5] SummarizedExperiment_1.32.0 Biobase_2.64.0
#> [7] GenomicRanges_1.54.1 GenomeInfoDb_1.40.1
#> [9] IRanges_2.38.1 S4Vectors_0.42.1
#> [11] BiocGenerics_0.50.0 MatrixGenerics_1.14.0
#> [13] matrixStats_1.4.0
#>
#> loaded via a namespace (and not attached):
#> [1] gtable_0.3.5 rjson_0.2.22 xfun_0.47
#> [4] bslib_0.8.0 htmlwidgets_1.6.4 lattice_0.22-6
#> [7] vctrs_0.6.5 tools_4.4.1 generics_0.1.3
#> [10] tibble_3.2.1 fansi_1.0.6 highr_0.11
#> [13] pkgconfig_2.0.3 desc_1.4.3 lifecycle_1.0.4
#> [16] GenomeInfoDbData_1.2.12 farver_2.1.2 compiler_4.4.1
#> [19] textshaping_0.4.0 munsell_0.5.1 htmltools_0.5.8.1
#> [22] sass_0.4.9 yaml_2.3.10 pkgdown_2.1.1
#> [25] pillar_1.9.0 crayon_1.5.3 jquerylib_0.1.4
#> [28] DelayedArray_0.28.0 cachem_1.1.0 magick_2.8.4
#> [31] abind_1.4-5 tidyselect_1.2.1 digest_0.6.37
#> [34] dplyr_1.1.4 labeling_0.4.3 fastmap_1.2.0
#> [37] grid_4.4.1 colorspace_2.1-1 cli_3.6.3
#> [40] SparseArray_1.2.4 magrittr_2.0.3 S4Arrays_1.2.1
#> [43] utf8_1.2.4 withr_3.0.1 UCSC.utils_1.0.0
#> [46] scales_1.3.0 rmarkdown_2.28 XVector_0.44.0
#> [49] httr_1.4.7 ragg_1.3.3 evaluate_0.24.0
#> [52] knitr_1.48 viridisLite_0.4.2 rlang_1.1.4
#> [55] Rcpp_1.0.13 glue_1.7.0 rstudioapi_0.16.0
#> [58] jsonlite_1.8.8 R6_2.5.1 systemfonts_1.1.0
#> [61] fs_1.6.4 zlibbioc_1.50.0