Analyzing Connectivity of Spleen CODEX dataset using CRAWDAD on K-means Clusters


Andrew Y
I am a junior undergraduate student majoring in biomedical engineering and computer engineering at Johns Hopkins University. In my free time, I enjoy reading, doing crosswords, and playing guitar. I look forward to learning and getting to know you all.

Analyzing Connectivity of Spleen CODEX dataset using CRAWDAD on K-means Clusters

What data types are you visualizing?

For plot A, I am visualizing the spatial data of the x and y positions for each cell, and categorical data of the cluster the cell belongs to. I am using the geometric primitive of points to represent each cell. To encode the spatial x position, I am using the visual channel of position along the x axis. To encode the spatial y position, I am using the visual channel of position along the y axis. To encode the categorical cluster the cell belongs to, I am using the visual channel of hue.

For plot B, I am visualizing the quantitative data of the X1 and X2 tSNE embedding values, and categorical data of the cluster the cell belongs to. I am using the geometric primitive of points to represent each cell. To encode the X1 embedding value, I am using the visual channel of position along the x axis. To encode the X2 value, I am using the visual channel of position along the y axis. To encode the categorical cluster the cell belongs to, I am using the visual channel of hue.

For plot C, I am visualizing the categorical data of the neighbor cluster using the y axis position. I am visualizing the categorical data of the reference cluster using the x axis position. I am visualizing the quantitative data of the z-score using the visual channel of color hue. I am visualizing the quantitative data of the scale using the visual channel of area.

For plots D and E, I am visualizing the quantitative data of the z-score using the visual channel of y axis position. I am visualizing the quantitative data of the scale using the visual channel of x axis position.

The Gestalt principle of similarity and proximity because D and E are both line plots and are adjacent, and plots A and B which have the same color scheme are adjacent.

Please share the code you used to reproduce this data visualization.

data <-
    read.csv("genomic-data-visualization-2024/data/codex_spleen_subset.csv.gz",
             row.names = 1)

data[1:10, 1:10]
pos <- data[, 1:2]
area <- data[, 3]
pexp <- data[4:ncol(data)]

pexpnorm <- log10(pexp/area * mean(area)+1)

library(ggplot2)
ggplot(data.frame(pos, area)) + geom_point(aes(x=x, y=y, col=area))+
    scale_color_gradient(high = "darkred", low = "gray")

library(Rtsne)
set.seed(42)
emb <- Rtsne(pexpnorm, perplexity=15)$Y

set.seed(42)
tw <- sapply(1:15, function(i) {
    print(i)
    kmeans(pexpnorm, centers=i, iter.max = 50)$tot.withinss
})
plot(tw, type='o')

set.seed(42)
com <- as.factor(kmeans(pexpnorm, centers=7)$cluster)

p1 <- ggplot(data.frame(pos, pexpnorm, com)) +
    geom_point(aes(x = x, y = y, col = com), size = 1)

p2 <- ggplot(data.frame(emb, pexpnorm, com)) +
    geom_point(aes(x = X1, y = X2, col = com), size = 1)

## https://github.com/JEFworks-Lab/CRAWDAD/blob/main/docs/3_spleen.md
library(crawdad)
crawdad_df <- data.frame(x = pos[,1], y = pos[,2], com)

ncores <- 8
set.seed(42)
## convert to sp::SpatialPointsDataFrame
seq <- crawdad:::toSF(pos = crawdad_df[,c("x", "y")],
                        celltypes = crawdad_df$com)
set.seed(42)
## generate background
shuffle.list <- crawdad::makeShuffledCells(seq,
                                           scales = seq(100, 1000, by=100),
                                           perms = 3,
                                           ncores = ncores,
                                           seed = 1,
                                           verbose = TRUE)
set.seed(42)
## find trends, passing background as parameter
results <- crawdad::findTrends(seq,
                               dist = 50,
                               shuffle.list = shuffle.list,
                               ncores = ncores,
                               verbose = TRUE, 
                               returnMeans = FALSE) # for error bars

set.seed(42)
## convert results to data.frame
dat <- crawdad::meltResultsList(results, withPerms = T)

## multiple-test correction
ntests <- length(unique(dat$reference)) * length(unique(dat$reference))
psig <- 0.05/ntests # bonferroni correction
zsig <- round(qnorm(psig/2, lower.tail = F), 2)

p3 <- vizColocDotplot(dat, reorder = FALSE, zsig.thresh = zsig, zscore.limit = zsig*2, dot.sizes = c(6, 20)) +
    theme(legend.position='right',
          axis.text.x = element_text(angle = 45, h = 0))
p3

library(tidyverse)
dat_filter <- dat %>% 
    filter(reference == '2') %>% 
    filter(neighbor == '6')
p4 <- vizTrends(dat_filter, lines = T, withPerms = T, sig.thresh = zsig)


dat_filter <- dat %>% 
    filter(reference == '1') %>% 
    filter(neighbor == '2')
p5 <- vizTrends(dat_filter, lines = T, withPerms = T, sig.thresh = zsig)

library(patchwork)
p1 + p2 + p3 + p4 + p5 + plot_annotation(tag_levels = 'A') + plot_layout(nrow = 2, ncol = 3)