HWEC1: Exploring Differences Between Linear and Non-linear Dimensionality Reduction

Hi, I am Alexandra, I am a MS student in BME. I love to learn more about genetics and spatial techologies

HWEC1: Exploring Differences Between Linear and Non-linear Dimensionality Reduction

1. Figure Description.

Figure State 1: Eevee’s cell spots in PCA space, with x axis for PC1 andy y axis for PC2. Figure State 2: Eevee’s cell spots in t-SNE space, with x axis for X1 and y axis for X2.

2. Differences between linear and nonlinear dimensionality reduction

In Figure State 1, I performed PCA for dimensionality reduction on Eevee’s spatial omics spots and the top 2000 genes. In the 2D PC1 vs. PC2 visualization, the cell spots are closely positioned, forming a large, cohesive cluster. In Figure State 2, the X1 Vs. X2 in t-SNE visualization also shows that some cell spots cluster with similar ones. However, clusters 6 and 7 are distributed across several disconnected groups. Also, there is greater dispersion among cell spots within individual clusters in the PCA plot.

3. Code

library(gganimate)
library(ggplot2)

file <- "~/Downloads/eevee.csv.gz"
data <- read.csv(file)
data[1:5,1:10]

pos <- data[, 3:4]
rownames(pos) <- data$cell_id
gexp <- data[, 5:ncol(data)]
rownames(gexp) <- data$barcode
head(gexp)
head(pos)

gexp[1:5, 1:10]
dim(gexp)

# limiting to top 1000 most highly expressed genes
topgenes <- names(sort(colSums(gexp), decreasing=TRUE)[1:2000])
gexpsub <- gexp[,topgenes]
gexpsub[1:5,1:5]
dim(gexpsub)

# normalization 
norm <- gexpsub/rowSums(gexpsub) *10000
loggexp <- log10(norm+1)
dim(loggexp)


# pick k = 7
com <- kmeans(loggexp, centers=7)
clusters <- com$cluster
clusters <- as.factor(clusters) 
names(clusters) <- rownames(loggexp)
head(clusters)


#PCA
pcs <- prcomp(loggexp)
df1 <- data.frame(pcs$x[,1:2],clusters)
colnames(df1) <- c('x', 'y','Clusters')
ggplot(df1, aes(x=x, y=y, col= Clusters)) + geom_point()


#t-SNE
emb <- Rtsne::Rtsne(loggexp)
df2 <- data.frame(emb$Y, clusters)
colnames(df2) <- c('x', 'y','Clusters')
ggplot(df2, aes(x=x, y=y, col= Clusters)) + geom_point()

# combine two df
df <- rbind(
  cbind(df1, order = "PCA Space"),
  cbind(df2, order = "t-SNE Space")
)
dim(df1)
dim(df2)
head(df)
dim(df)
p <- ggplot(df, aes(x=x, y=y, col=Clusters)) + geom_point()

# make animation
anim <- p + transition_states(order) + view_follow() + ease_aes('linear') + labs(title = '{closest_state}') 
animate(anim, height=400, width=500) 

25 Feb 2025

HW EC1

« Interrogating Cell Type with CODEX Spleen Dataset Uncovering spleen tissue type »