HW5
Tissue Structure Identified: White Pulp
Analysis of the CODEX spatial proteomics dataset (codex_spleen2.csv.gz) indicates that the dominant structure in this tissue section is white pulp. The white pulp is the organized lymphoid compartment of the spleen, surrounding central arterioles and consisting primarily of (1) B cell follicles, which support humoral immunity, and (2) the periarteriolar lymphoid sheath (PALS), a T cell–rich zone mediating cell-mediated immune responses (Mebius & Kraal, 2005).
Clustering results, marker enrichment, and spatial organization consistently support this identification.
Cell Types Identified
Cell Type 1: B Cells (Cluster 1)
Cluster 1 showed strong upregulation of CD21 (log2FC = 3.02, padj ≈ 0), CD20 (log2FC = 1.21, padj ≈ 0), and HLA-DR (log2FC = 0.97, padj ≈ 0).
CD20 is a canonical B cell marker (Tedder & Engel, 1994). CD21 supports B cell activation and germinal center formation within splenic follicles (Carroll & Isenman, 2012). HLA-DR enables antigen presentation, a defining feature of B cell function.
Spatially, Cluster 1 formed compact, rounded aggregates consistent with B cell follicles in white pulp. The transcriptional signature and spatial architecture align clearly with follicular B cells.
Cell Type 2: T Cells (Cluster 4)
Cluster 4 was characterized by elevated CD45RO (log2FC = 1.93, padj ≈ 0), CD3e (log2FC = 1.35, padj ≈ 0), and CD4 (log2FC = 0.96, padj ≈ 0).
CD3e is a definitive pan-T cell marker (Clevers et al., 1988). CD4 identifies helper T cells, and CD45RO marks activated or memory T cells.
These cells localized adjacent to and partially surrounding the B cell clusters — a pattern consistent with the PALS region of white pulp, where T cell zones flank B cell follicles (Mebius & Kraal, 2005).
Supporting Evidence: Red Pulp Contrast
Cluster 5 expressed CD163 (log2FC = 0.57, padj = 3.9e-131) and CD68 (log2FC = 0.32, padj = 7.1e-71), markers of tissue-resident macrophages.
CD163 identifies red pulp macrophages responsible for hemoglobin scavenging and erythrocyte clearance (Kristiansen et al., 2001; Nagelkerke et al., 2018). These cells localized outside the lymphoid aggregates, forming a surrounding compartment consistent with red pulp.
The spatial separation between macrophage-rich regions and B/T cell clusters reinforces the identification of white pulp as the central structure in this dataset.
Spatial Delineation by CD8 and Lyve1
Expression patterns of CD8 and Lyve1 further delineated splenic compartments:
High CD8 and low Lyve1 marked organized T cell zones within white pulp.
High Lyve1 marked the sinusoidal network characteristic of red pulp.
This reciprocal spatial pattern clearly distinguishes immune cell clusters from the surrounding vascular filtration compartment.
Analysis Pipeline
The dataset contained 28 protein markers measured across segmented spleen cells. After normalization and log10 transformation, I performed PCA (10 PCs capturing ~80% variance), followed by tSNE (perplexity = 30) for visualization. K-means clustering (K = 6, selected using the elbow method) identified distinct cell populations. Differential expression was assessed using Wilcoxon rank-sum tests with Benjamini–Hochberg correction.
Cluster assignments, spatial localization, and marker expression patterns together support the conclusion that the analyzed region represents splenic white pulp intersparsed with red pulp cells.
References
-
Mebius, R. E., & Kraal, G. (2005). Structure and function of the spleen. Nature Reviews Immunology, 5(8), 606–616. https://doi.org/10.1038/nri1669
-
Tedder, T. F., & Engel, P. (1994). CD20: a regulator of cell-cycle progression of B lymphocytes. Immunology Today, 15(9), 450–454. https://doi.org/10.1016/0167-5699(94)90276-3
-
Carroll, M. C., & Isenman, D. E. (2012). Regulation of humoral immunity by complement. Immunity, 37(2), 199–207. https://doi.org/10.1016/j.immuni.2012.08.002
-
Clevers, H., Alarcon, B., Wileman, T., & Terhorst, C. (1988). The T cell receptor/CD3 complex: a dynamic protein ensemble. Annual Review of Immunology, 6, 629–662.
-
Kristiansen, M., Graversen, J. H., Jacobsen, C., et al. (2001). Identification of the haemoglobin scavenger receptor. Nature, 409(6817), 198–201. https://doi.org/10.1038/35051594
-
Nagelkerke SQ, Bruggeman CW, den Haan JMM, Mul EPJ, van den Berg TK, van Bruggen R, Kuijpers TW. Red pulp macrophages in the human spleen are a distinct cell population with a unique expression of Fc-γ receptors. Blood Adv. 2018 Apr 24;2(8):941-953. doi: 10.1182/bloodadvances.2017015008. PMID: 29692344; PMCID: PMC5916003.
-
https://www.assaygenie.com/cd-markers-list?srsltid=AfmBOortyzhJgx1eAET28vzn9Nvh5wMj-0gwtKL4xaBxv_uwDUmbGc4V
-
https://www.labome.com/method/T-Cell-Markers-and-B-Cell-Markers.html
-
https://www.biorxiv.org/content/10.1101/2021.10.20.465151v2.full.pdf
Code
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
setwd("C:/Users/John-Paul/Downloads")
data <- read.csv(gzfile("codex_spleen2.csv.gz"))
head(data)
dim(data)
pos <- data[, 2:3]
# Plot x and y positions of the cells
plot(pos)
pexp <- data[, 5:ncol(data)]
rownames(pos) <- rownames(pexp) <- data[, 1]
area <- data[, 4]
names(area) <- data[, 1]
dim(pexp)
colnames(pexp)
head(area)
# Normalize
totpexp <- rowSums(pexp)
mat <- log10(pexp/totpexp * 1e6 + 1)
# Distribution
hist(totpexp, main = "Total Protein Expression", xlab = "Total Expression", breaks = 50)
hist(area, main = "Cell Area", xlab = "Area", breaks = 50)
hist(log10(totpexp + 1), main = "Log10 Total Expression", xlab = "Log10(Total + 1)", breaks = 50)
# Dimensionality reduction PCA & tSNE
pcs <- prcomp(mat, center=TRUE, scale=FALSE)
# Scree plot for PCA
variance_explained <- pcs$sdev^2 / sum(pcs$sdev^2) * 100
plot(1:30, variance_explained[1:30], type = "b", pch = 19,
xlab = "Principal Component",
ylab = "% Variance Explained",
main = "PCA Scree Plot")
# Cumulative variance plot
cumvar <- cumsum(variance_explained)
plot(1:30, cumvar[1:30], type = "b", pch = 19,
xlab = "Number of PCs",
ylab = "Cumulative % Variance Explained",
main = "Cumulative Variance Explained")
abline(h = 80, col = "red", lty = 2) # 80% threshold line
abline(h = 90, col = "blue", lty = 2) # 90% threshold line
# Find how many PCs to reach 80% and 90%
which(cumvar >= 80)[1] # PCs needed for 80% variance = 7
which(cumvar >= 90)[1] # PCs needed for 90% variance = 12
library(ggplot2)
library(patchwork)
library(reshape2)
# tSNE
toppcs <- pcs$x[, 1:10]
tsne <- Rtsne::Rtsne(toppcs, dims=2, perplexity = 30)
emb <- tsne$Y
rownames(emb) <- rownames(mat)
colnames(emb) <- c('tSNE1', 'tSNE2')
df_tsne <- data.frame(emb, pos)
p_tsne <- ggplot(df_tsne, aes(x = tSNE1, y = tSNE2)) +
geom_point(size = 0.3, alpha = 0.5) +
labs(title = "tSNE of CODEX spleen") +
theme_minimal()
p_tsne
# Elbow method to determine optimal number of clusters
wss <- numeric()
for (k in 1:15) {
set.seed(123)
wss[k] <- kmeans(pcs$x[, 1:10], centers = k, nstart = 10, iter.max = 50)$tot.withinss
}
plot(1:15, wss, type = "b", pch = 19,
xlab = "Number of Clusters K",
ylab = "Total WSS",
main = "Elbow Method (on PCs)")
K <- 6
set.seed(123) # Choose the number of clusters (try different values and use the elbow method)
km <- kmeans(pcs$x[, 1:10], centers = 6)
cluster <- as.factor(km$cluster)
table(cluster)
df_tsne$cluster <- cluster
df_tsne <- data.frame(pos, pcs$x[, 1:10], cluster, emb)
# Cluster visualization in tSNE and physical space
p1 <- ggplot(df_tsne, aes(x = PC1, y = PC2, color = cluster)) +
geom_point(size = 0.5, alpha = 0.5) +
labs(title = "Clusters in PCA Space") +
theme_minimal() + guides(color = guide_legend(override.aes = list(size = 3, alpha = 1)))
p2 <- ggplot(df_tsne, aes(x = tSNE1, y = tSNE2, color = cluster)) +
geom_point(size = 0.5, alpha = 0.5) +
labs(title = "Clusters in PCA space") +
theme_minimal() +
guides(color = guide_legend(override.aes = list(size = 3, alpha = 1)))
p3 <- ggplot(df_tsne, aes(x = x, y = y, color = cluster)) +
geom_point(size = 0.5, alpha = 0.5) +
labs(title = "Clusters in Physical Space") +
theme_minimal() +guides(color = guide_legend(override.aes = list(size = 3, alpha = 1)))
# Find Differentially expressed genes
# DE genes per cluster (Wilcoxon test)
de_results <- list()
for (cl in levels(cluster)) {
in_cl <- cluster == cl
pvals <- apply(mat, 2, function(g) wilcox.test(g[in_cl], g[!in_cl])$p.value)
fc <- colMeans(mat[in_cl, ]) - colMeans(mat[!in_cl, ])
res <- data.frame(gene = names(pvals), p.adj = p.adjust(pvals, "BH"), log2fc = fc)
de_results[[cl]] <- res[order(res$p.adj), ]
}
# Top 5 upregulated DE genes per cluster
for (cl in names(de_results)) {
cat("\n--- Cluster", cl, "top upregulated markers ---\n")
print(head(de_results[[cl]][de_results[[cl]]$log2fc > 0, ], 5))
}
##########################
# Top 3 markers per cluster and compute mean expression
top_markers_list <- lapply(de_results, function(tmp) {
tmp <- tmp[tmp$log2fc > 0, ]
tmp <- tmp[order(tmp$p.adj, -tmp$log2fc), ]
head(tmp$gene, 3)
})
top_markers <- unique(unlist(top_markers_list))
mean_exp <- sapply(top_markers, function(g) tapply(mat[, g], cluster, mean))
mean_exp <- as.matrix(mean_exp)
rownames(mean_exp) <- paste0("C", seq_len(nrow(mean_exp)))
colnames(mean_exp) <- top_markers
# Scale for heatmap
mean_exp_scaled <- scale(mean_exp)
# Melt for ggplot2
df_heat <- reshape2::melt(mean_exp_scaled)
colnames(df_heat) <- c("Cluster", "Protein", "Expression")
# Plot heatmap
p4 <- ggplot(df_heat, aes(x = Protein, y = Cluster, fill = Expression)) +
geom_tile() +
scale_fill_gradient2(low = "blue", mid = "white", high = "red") +
labs(title = "Heatmap of Top Markers per Cluster") +
theme_minimal() +
theme(axis.text.x = element_text(angle = 45, hjust = 1, size = 8))
p4
########################################
# Visualizations to plot specific proteins and clusters in space
#######################################
plot_protein_space <- function(protein_name) {
ggplot(
data.frame(
x = pos[, 1],
y = pos[, 2],
expr = as.numeric(mat[, protein_name])
),
aes(x = x, y = y, color = expr)
) +
geom_point(size = 1.3, alpha = 0.5) +
scale_color_gradient(low = "lightgrey", high = "blue") +
labs(title = paste0(protein_name, " Cluster 5 (Red pulp) - Macrophage marker"), color = "Expr") +
theme_minimal() +
theme(legend.position = "right")
}
p5 <- plot_protein_space("CD21")
p6 <- plot_protein_space("CD3e")
p7 <- plot_protein_space("CD163")
highlight_cluster <- function(cl_num) {
df_tmp <- df_tsne
cl_label <- paste0("Cluster ", cl_num)
df_tmp$highlight <- ifelse(cluster == cl_num, cl_label, "Other")
color_vals <- c("Other" = "lightgrey")
color_vals[cl_label] <- "blue"
ggplot(df_tmp, aes(x = x, y = y, color = highlight)) +
geom_point(size = 0.2, alpha = 0.5) +
scale_color_manual(values = color_vals) +
labs(title = paste("Cluster", cl_num, "in space (Red Pulp)"), color = NULL) +
theme_minimal() +
theme(legend.position = "bottom")
}
p8 <- highlight_cluster(1)
p11 <- highlight_cluster(5)
#########kmeans clustering
library(dplyr)
top_markers <- top_markers[top_markers %in% colnames(mat)]
dot_data <- expand.grid(Cluster = paste0("C", 1:K), Protein = top_markers, stringsAsFactors = FALSE) %>%
rowwise() %>%
mutate(
MeanExpr = mean(mat[cluster == as.numeric(gsub("C", "", Cluster)), Protein]),
PctExpr = mean(mat[cluster == as.numeric(gsub("C", "", Cluster)), Protein] > 0) * 100
) %>%
ungroup()
p9 <- ggplot(dot_data, aes(x = Protein, y = Cluster, size = PctExpr, color = MeanExpr)) +
geom_point() +
scale_color_gradient(low = "lightgrey", high = "darkred") +
scale_size_continuous(range = c(1, 8)) +
labs(title = "Dot Plot: Key Markers by Cluster", size = "% Expressing", color = "Mean Expr") +
theme_minimal() +
theme(axis.text.x = element_text(angle = 45, hjust = 1))
p9
df_overlap <- data.frame(
x = pos[, 1],
y = pos[, 2],
CD8 = as.numeric(mat[, "CD8"]),
Lyve1 = as.numeric(mat[, "Lyve1"])
)
# Use the 75th percentile as a threshold for "high" expression
cd8_high <- df_overlap$CD8 >= quantile(df_overlap$CD8, 0.75)
lyve1_high <- df_overlap$Lyve1 >= quantile(df_overlap$Lyve1, 0.75)
df_overlap$region <- ifelse(cd8_high & !lyve1_high, "CD8 High",
ifelse(lyve1_high & !cd8_high, "Lyve1 High",
ifelse(cd8_high & lyve1_high, "Both High", "Low/Other")))
p10 <- ggplot(df_overlap, aes(x = x, y = y, color = region)) +
geom_point(size = 0.9, alpha = 0.7) +
scale_color_manual(values = c(
"CD8 High" = "red",
"Lyve1 High" = "blue",
"Both High" = "purple",
"Low/Other" = "grey80"
)) +
labs(title = "Overlap of CD8 and Lyve1 Expression", color = "Region") +
theme_minimal()
library(patchwork)
# Add margin to each plot (example: 5mm on all sides)
plots <- list(p1, p2, p3, p4, p9, p5, p8, p6, p7, p10, p11)
plots <- lapply(plots, function(p) p + theme(plot.margin = margin(5, 5, 5, 5, "mm")))
# Assign back to variables
p1 <- plots[[1]]; p2 <- plots[[2]]; p3 <- plots[[3]]
p4 <- plots[[4]]; p9 <- plots[[5]]
p5 <- plots[[6]]; p8 <- plots[[7]]; p6 <- plots[[8]]
p7 <- plots[[9]]; p11 <- plots[[11]]; p10 <- plots[[10]]
layout <- (p1 | p2 | p3) /
(p4 | p9) /
(p5 | p8 | p6) /
(p7 | p11 | p10) +
plot_layout(heights = c(1, 1.5, 1, 1.5), guides = "collect")
layout <- layout + plot_annotation(
title = "Primary white pulp cells and secondary red pulp cells in Spleen",
theme = theme(plot.title = element_text(size = 14, face = "bold"))
)
ggsave("hw5_jakinba1.png", layout, width = 28, height = 16, units = "in", dpi = 320)
( For the first part of my code, I used the same codes from HW4, and then for the latter part, I used AI prompts to optimize the overlap plot, heatmap, dot plot, and general and multi-panel plots.)