Complementing single-cell clustering analysis with MERINGUE spatial analysis
Introduction
We recently published a paper in the journal Genome Research on a new bioinformatics tool MERINGUE for characterizing spatial gene expression heterogeneity in spatially resolved single-cell transcriptomics data with nonuniform cellular densities. For more details about MERINGUE, please check out the paper or the code repository with tutorials on Github. In this blog post, I apply MERINGUE to new MERFISH data of the mouse cortex from Vizgen.
library(MERINGUE)
Background
The mamalian brain is made up of functionally distinct regions, comprising many transcriptionally distinct cell-types. The Allen Brain Atlas provides a cool tool for browsing these brain regions in mouse.
Characterizing how these transcriptionally distinct cell-types are spatially organized can help us understand how this spatial organization relates to function. This spatially resolved transcriptomics data with MERFISH profiles the expression of 483 genes in 3 full coronal slices of the mouse brain across 3 biological replicates.
Get the data
Here, we will focus on slice #2 replicate #1.
## read in data
cd <- read.csv('datasets_mouse_brain_map_BrainReceptorShowcase_Slice2_Replicate1_cell_by_gene_S2R1.csv.gz', row.names = 1)
annot <- read.csv('datasets_mouse_brain_map_BrainReceptorShowcase_Slice2_Replicate1_cell_metadata_S2R1.csv.gz', row.names = 1)
pos <- annot[, c('center_x', 'center_y')]
pos[,2] <- -pos[,2] ## flip Y coordinates
## visualize
par(mfrow=c(1,1))
MERINGUE::plotEmbedding(pos, unclassified.cell.color='black', cex=0.1)
Each point is a cell. So based on density alone, we can already see some interesting spatial organization. Let’s use clustering analysis to identify transcriptionally distinct cell-types.
Single-cell Clustering Analysis
We are interested in identifying genes that exhibit significant spatial heterogeneity. However, given that transcriptionally distinct cell-types are spatially organized in the brain, a lot of the significant spatial heterogeneity we identify will be driven by this underlying spatial organization of cell-types. So let’s first identify cell-types and see how they’re spatially organized and then look for additional aspects of spatial heterogeneity within cell-types.
First, let’s remove blank control barcodes from the MERFISH dataset as well as poor genes and cells with few reads as needed.
## remove blanks
good.genes <- colnames(cd)[!grepl('Blank', colnames(cd))]
counts <- t(cd[, good.genes])
## filter
counts <- MERINGUE::cleanCounts(counts, verbose = FALSE)
There are many ways to identify cell-types via different types of single-cell clustering analysis. Here, we will just do a quick and dirty CPM normalization, followed by PCA dimensionality reduction, and graph based clustering. We can visualize the results using UMAP as well as on our original spatial coordinates.
## CPM normalize, log transform
mat <- MERINGUE::normalizeCounts(counts, log = TRUE)
dim(mat)
## pca, take top 30 PCs
## note to self: slow, need to find faster pca
#pcs <- irlba::prcomp_irlba(mat, n=30)
## update: RSpectra svds is much faster so switch to that
pca <- RSpectra::svds(A = t(mat), k=30,
opts = list(center = TRUE, scale = FALSE, maxitr = 2000, tol = 1e-10))
foo <- as.matrix(t(mat) %*% pca$v[,1:30])
rownames(foo) <- colnames(mat)
## graphic based clustering on PC loadings
## subsample down to 10% of cells and infer rest to speed up
com <- MUDAN::getApproxComMembership(foo, k=10,
nsubsample = nrow(foo)*0.1,
method=igraph::cluster_louvain)
table(com)
## UMAP
emb <- uwot::umap(foo)
rownames(emb) <- colnames(mat)
dim(emb)
## plot
par(mfrow=c(1,1))
MERINGUE::plotEmbedding(emb, groups=com,
show.legend=TRUE, legend.x = "topleft",
mark.clusters = TRUE,
cex=0.1, main='UMAP')
MERINGUE::plotEmbedding(pos, groups=com,
cex=0.2, main='Spatial')
So it looks like we can identify 20 transcriptionally distinct clusters or cell-types here. And they have seem to have pretty distinct spatial organizations. Neat!
Let’s focus on the cells of cluster 12 for now. Based on its spatial organization, perhaps this cluster corresponds to pyramidal neurons.
ct = 12 ## pyrimidal layer
cells <- names(com)[com == ct]
par(mfrow=c(1,1))
MERINGUE::plotEmbedding(emb, groups=com[cells],
show.legend = TRUE, legend.x = "topleft",
mark.clusters = TRUE,
cex = 0.1, main = 'UMAP cluster 12 only')
MERINGUE::plotEmbedding(pos, groups=com[cells],
cex = 0.2, main = 'Spatial cluster 12 only')
posh <- pos[cells,]
dim(posh)
math <- mat[, cells]
We can plot a few marker genes to confirm as well.
dg <- MERINGUE::getDifferentialGenes(mat, com)
head(dg[[12]][order(dg[[12]]$Z, decreasing=TRUE),], n=20)
## manually pick a few
## https://www.proteinatlas.org/ENSG00000168539-CHRM1
g <- 'Chrm1'
## https://www.proteinatlas.org/ENSG00000273079-GRIN2B
g <- 'Grin2b'
gexp <- mat[g,]
gexp <- scale(gexp)[,1]
gexp[gexp > 1.5] <- 1.5
gexp[gexp < -1.5] <- -1.5
par(mfrow=c(1,1))
MERINGUE::plotEmbedding(emb, col=gexp,
cex = 0.1, main = g)
MERINGUE::plotEmbedding(pos, col=gexp,
cex = 0.2, main = g)
Spatial Analysis
Focusing on cluster 12 cells, we can use MERINGUE to identify significantly spatially variable genes driven by more than 10% of cells.
## Get neighbor-relationships
#W <- MERINGUE::getSpatialNeighbors(posh[1:50,], filterDist = 50)
#MERINGUE::plotNetwork(posh[1:50,], W)
W <- MERINGUE::getSpatialNeighbors(posh, filterDist = 50)
## Get spatial patterns
I <- MERINGUE::getSpatialPatterns(math, W)
## Filter for patterns driven by > 10% of cells
results.filter <- MERINGUE::filterSpatialPatterns(mat = math,
I = I,
w = W,
adjustPv = TRUE,
alpha = 0.05,
minPercentCells = 0.10,
verbose = FALSE,
details=TRUE)
We identify 39 such genes. We can further see how these genes comprise spatially distinct patterns.
# Identify primary patterns
scc <- MERINGUE::spatialCrossCorMatrix(mat = as.matrix(math[rownames(results.filter),]),
weight = W)
par(mfrow=c(1,3), mar=rep(2,4))
ggroup <- MERINGUE::groupSigSpatialPatterns(pos = pos,
mat = as.matrix(math[rownames(results.filter),]),
scc = scc,
power = 1,
hclustMethod = 'ward.D',
deepSplit = 1,
zlim=c(-1.5,1.5))
We can also look at specific genes within each pattern to appreciate their spatial expression distribution.
par(mfrow=c(3,3), mar=rep(2,4))
# Look at specific genes in each groups
lapply(levels(ggroup$groups), function(gr) {
gs <- names(ggroup$groups)[ggroup$groups == gr]
## select few driven by most cells
gs <- gs[order(results.filter[gs,]$minPercentCells, decreasing=TRUE)][1:3]
lapply(gs, function(g) {
## z-score expression within cluster 12 cells
gexp <- math[g,]
gexp <- scale(gexp)[,1]
gexp[gexp > 1.5] <- 1.5
gexp[gexp < -1.5] <- -1.5
## set as NAs for rest
gexpall <- rep(NA, nrow(pos))
names(gexpall) <- rownames(pos)
gexpall[names(gexp)] <- gexp
MERINGUE::plotEmbedding(pos, col=gexpall, main=g, cex=0.05)
})
})
Visually, these patterns appear to match the organization of the CA3, CA2, CA1 regions, which we know to be comprised of pyrimidal cells that differ in theirs of their neural inputs and outputs. Cool!
Try it out for yourself!
- Try out the analysis for a different cell-type.
- Are identified spatial patterns bilaterally symmetric? Can we quantify this?
- Do certain genes exhibit spatial heterogeneity in multiple cell-types? What are these genes? Are they enriched for certain functional classifications? Are their spatial patterns consistents across cell-types?
- Older
- Newer
Recent Posts
- Using AI to find heterogeneous scientific speakers on 04 November 2024
- The many ways to calculate Moran's I for identifying spatially variable genes in spatial transcriptomics data on 29 August 2024
- Characterizing spatial heterogeneity using spatial bootstrapping with SEraster on 23 July 2024
- I use R to (try to) figure out which hospital I should go to for shoppable medical services by comparing costs through analyzing Hospital Price Transparency data on 22 April 2024
- Cross modality image alignment at single cell resolution with STalign on 11 April 2024
Related Posts
- Using AI to find heterogeneous scientific speakers
- The many ways to calculate Moran's I for identifying spatially variable genes in spatial transcriptomics data
- Characterizing spatial heterogeneity using spatial bootstrapping with SEraster
- I use R to (try to) figure out which hospital I should go to for shoppable medical services by comparing costs through analyzing Hospital Price Transparency data