Analysis of most expressed gene (POSTN): PCA, t-SNE, and UMAP Plots

I'm a third year PhD candidate in the Wirtz Lab. My research focuses on generating 3D reconstructions of anatomical tissues with single-cell resolution to study disease progression.

Analysis of most expressed gene (POSTN): PCA, t-SNE, and UMAP Plots

What data types are you visualizing?

I am visualizing quantitative data of the most expressed gene POSTN for each cell, and spatial data regarding the x,y positions for each cell in the embedded 2D space computed by PCA, t-SNE, and UMAP plots that group cells based in gene expression similarities.

What data encodings (geometric primitives and visual channels) are you using to visualize these data types?

I am using the geometric primitive of points to represent each cell across the different dimensionality reduction approaches. To encode the spatial positions of each cell in the resulting 2D space, I am using the visual channel of positions. To encode expression count of the POSTN gene, I am using the visual channel of color from black to red colors.

What about the data are you trying to make salient through this data visualization?

I am showing how the expression of the top gene POSTN can be represented across different dimensionality reduction plots (PCA, t-sne, and UMAP).

What Gestalt principles or knowledge about perceptiveness of visual encodings are you using to accomplish this?

I am using the Gestalt principle of continuity to visualize clusters of cells with similar gene expression profile. I am using the Gestalt principle of similarity to identify POSTN gene expression among the clusters of cells.

Code

#### author #### 
# Andre Forjaz
# jhed: aperei13

#### load packages #### 
library(Rtsne)
library(ggplot2)
library(patchwork)
library(gridExtra)
library(umap)

#### Input ####
outpth <-'~/genomic-data-visualization-2024/homework/hw3/'
data <- read.csv('~/genomic-data-visualization-2024/data/pikachu.csv.gz', row.names=1)

# data preview
data[1:5,1:8]

# coordinates
coords <- data[,2:3]

# gene expression
gexp <- data[, 6:ncol(data)]
dim(gexp)
colnames(gexp[,1:8])

#### Select top 1 gene ####
topgene <- names(sort(apply(gexp, 2, var), decreasing=TRUE)[1]) 
gexpfilter <- gexp[,topgene]
colnames(gexpfilter)
dim(gexpfilter)

#### Run pca ####
pca_res  <- prcomp(gexp)

# Make PCA plot
df_top1 <- data.frame(PC1 = pca_res$x[,1],
                       PC2 = pca_res$x[,2],
                       Gene = gexpfilter)

p1 <- ggplot(df_top1) + 
      geom_point(aes(PC1, PC2, color=Gene)) +
      scale_color_gradient(low = "black", high = "red",
                           guide = guide_colorbar(barwidth = 0.7))+
      labs(color = topgene, size = 0.5, alpha = 0.6) +
      ggtitle("PCA")+
      theme_classic()

p1
#### Run tsne ####
tsne_res <- Rtsne(gexp, perplexity = 30, dims = 2)

df_tsne <- data.frame(tSNE1 = tsne_res$Y[, 1],
                                tSNE2 = tsne_res$Y[, 2],
                                Gene = gexpfilter)


p2 <-ggplot(df_tsne) +
     geom_point(aes(tSNE1, tSNE2, color = Gene)) +
     scale_color_gradient(low = "black", high = "red",
                          guide = guide_colorbar(barwidth = 0.7))+
     labs(color = topgene, size = 0.5, alpha = 0.6,) +
     ggtitle("t-SNE") + 
     theme_classic()

p2

#### Run UMAP ####
umap_res <- umap(gexp, n_neighbors = 15, n_components = 2)

df_umap <- data.frame(UMAP1  = umap_res$layout[, 1],
                      UMAP2  = umap_res$layout[, 2],
                      Gene = gexpfilter)


p3 <-ggplot(df_umap) +
     geom_point(aes(UMAP1 , UMAP2, color = Gene)) +
     scale_color_gradient(low = "black", high = "red",
                       guide = guide_colorbar(barwidth = 0.7))+
     labs(color = topgene, size = 0.5, alpha = 0.6,) +
     ggtitle("UMAP") + 
     theme_classic()

p3

#### Plot results ####

p1 + p2 + p3+
  plot_annotation(tag_levels = 'a') +
  plot_layout(ncol = 3)

combined_plot <-p1 + p2 + p3+
                plot_annotation(tag_levels = 'a') +
                plot_layout(ncol = 3)

#### Save results ####
ggsave(paste0(outpth, "hw3_aperei13.png"), 
       plot = combined_plot,
       width = 12,
       height = 4,
       units = "in") 

####  References #### 
# 1- https://rpubs.com/crazyhottommy/pca-in-action
# 2- https://bookdown.org/sjcockell/ismb-tutorial-2023/practical-session-2.html

05 Feb 2024

« PCA with Non-normalized and Normalized Data Visualizing Changes in Clustering from t-SNE Transformation Across Key Principal Component Pairs: Exploring Extremes and Intermediates »