Cell types characterization

Hi this is a Sky Li.

Overview

Here, we make a visualization consists of five panels:

tSNE plot, where each point represents a cell. Cells are color-coded by cluster assignment, with Cluster 7 distinctly highlighted, showing how it separates from other clusters in reduced-dimensional space. The cells are clustered by k-means clustering.
Spatial Distribution of Cluster 7, which maps the cells in physical spatial coordinates
Volcano Plot of Differentially Expressed genes in Cluster 7, showing log-fold change on the x-axis and statistical significance (-log10 adjusted p-value) on the y-axis. The strongest DE genes include ITM2C, MZB1, SEC11C, SLAMF7, and TENT5C.
tSNE of ITM2C Expression. The color gradient represents the intensity of ITM2C expression, highlighting that Cluster 7 is highly enriched for this gene.
Spatial Map of ITM2C Expression, suggesting cluster 7 cell types is more enriched and free flowing across the tissue.

Cluster 7 characterization

Cluster 7 is highly enriched for ITM2C (logFC=1.64, adjusted p-val=0). ITM2C has been associated with plasma cells, memory B cells, and neuronal differentiation. Together with other highly expressed genes including MZB1, SEC11C, SLAMF7 and TENT5C, it suggests that cluster 7 corresponds to plasma cells, which are terminally differentiated B cells responsible for antibody production. Plasma cells usually localized in bone marrow or lymphoid tissues, which correspond to the diffused spatial expression pattern [1].

[1] Pilcher, W., Thomas, B. E., Bhasin, S. S., Jayasinghe, R. G., Yao, L., Gonzalez-Kozlova, E., … & Bhasin, M. (2023). Cross center single-cell RNA sequencing study of the immune microenvironment in rapid progressing multiple myeloma. npj Genomic Medicine, 8(1), 3.

Code (paste your code in between the ``` symbols)

library(ggplot2)
library(ggpubr)
library(dplyr)
library(cluster)
library(scales)
library(Rtsne)

file <- "/Users/sky2333/Downloads/genomic-data-visualization-2025/data/pikachu.csv"
data <- read.csv(file)

gene_expr <- data[, 7:ncol(data)]
sample_names <- rownames(gene_expr)
gene_names <- colnames(gene_expr)

gene_expr <- t(apply(log1p(gene_expr / rowSums(gene_expr) * 1e6), 1, scale))
rownames(gene_expr) <- sample_names
colnames(gene_expr) <- gene_names

#pca_result <- prcomp(gene_expr, center = TRUE, scale. = TRUE)
pca_result <- Rtsne(gene_expr, perplexity = 30, theta = 0.5, dims = 2, pca = TRUE, verbose = TRUE)

data$PC1 <- pca_result$Y[, 1]
data$PC2 <- pca_result$Y[, 2]

k <- 8
kmean_result <- kmeans(gene_expr, centers = k, nstart = 25)

data$Cluster <- as.factor(kmean_result$cluster)

pca_plot <- ggplot(data, aes(x = PC1, y = PC2, color = Cluster)) +
    geom_point(alpha = 0.5, size = 0.5) +
    scale_color_manual(values = hue_pal()(k)) +
    labs(title = "tSNE of Gene Expression with K-Means Clusters", x = "PC1", y = "PC2", color = "Cluster") +
    theme_minimal()

spatial_plot <- ggplot(data, aes(x = aligned_x, y = aligned_y, color = Cluster)) +
    geom_point(alpha = 0.5, size = 0.5) +
    scale_color_manual(values = hue_pal()(k)) + 
    labs(title = "Spatial Map Colored by Gene Expression Clusters", x = "X Coordinate", y = "Y Coordinate", color = "Cluster") +
    theme_minimal()


cluster_interest <- "7"
genes_pvals <- sapply(gene_names, function(gene) {
    wilcox.test(
        gene_expr[data$Cluster == cluster_interest, gene], 
        gene_expr[data$Cluster != cluster_interest, gene]
    )$p.value
})

genes_pvals_adj <- p.adjust(genes_pvals, method = "fdr")

logFC <- apply(gene_expr, 2, function(gene) {
    mean(gene[data$Cluster == cluster_interest]) - mean(gene[data$Cluster != cluster_interest])
})

de_genes <- data.frame(Gene = gene_names, p_value = genes_pvals, adj_p_value = genes_pvals_adj, logFC = logFC)

volcano_plot <- ggplot(de_genes, aes(x = logFC, y = -log10(adj_p_value))) +
    geom_point(alpha = 0.5) +
    geom_vline(xintercept = c(-1, 1), linetype = "dashed", color = "red") +
    geom_hline(yintercept = -log10(0.05), linetype = "dashed", color = "blue") +
    theme_minimal() +
    labs(title = "Volcano Plot of DE Genes", x = "Log FC", y = "-log Adjusted p-val")


top_genes <- arrange(filter(de_genes, adj_p_value < 0.05 & abs(logFC) > 1), adj_p_value)
selected_gene <- top_genes$Gene[1]

pca_gene_plot <- ggplot(data, aes(x = PC1, y = PC2, color = gene_expr[, selected_gene])) +
    geom_point(alpha = 0.5, size = 0.5) +
    scale_color_gradient(low = "blue", high = "red") +
    labs(title = paste("tSNE of expression of", selected_gene), x = "PC1", y = "PC2", color = "Expression") +
    theme_minimal()
    
spatial_gene_plot <- ggplot(data, aes(x = aligned_x, y = aligned_y, color = gene_expr[, selected_gene])) +
    geom_point(alpha = 0.5, size = 0.5) +
    scale_color_gradient(low = "blue", high = "red") +
    labs(title = paste("Spatial Map of expression of", selected_gene), x = "X Coordinate", y = "Y Coordinate", color = "Expression") +
    theme_minimal()

final_plot <- ggarrange(
    pca_plot, spatial_plot, volcano_plot, pca_gene_plot, spatial_gene_plot,
    ncol = 3, nrow = 2,
    heights = c(6, 6, 6.2)
)

ggsave("plot.jpeg", final_plot, width = 20, height = 10, dpi = 300)

10 Feb 2025

« Spatial Transcriptomics Reveals a Distinct Epithelial Cell Population Defined by ELF3 Expression: A Multi-Dimensional Analysis of the Cluster in Interest Homework 3: Differentially Expressed Genes analysis »