Spatial Distribution of Gene Expression-CD3E


Nidhi Soley
I am a first-year BME Ph.D. student at Johns Hopkins University. My research interest lies at the intersection of ML, digital health, & precision medicine. In my free time, I enjoy painting.

Spatial Distribution of Gene Expression-CD3E

What data types are you visualizing?

I am visualizing quantitative data, gene expression levels of the CD3E, which is of the total gene expression at each spot, and spatial data of aligned x and y coordinates according to the barcode.

What data encoding are you using to visualize these data types?

I am using the geometric primitive of points to show each aligned position with both x and y coordinates. To encode CD3E gene expression level, I am using the visual channel of color saturation and position (x and y axis). Specifically, lower gene expression levels are displayed in more purple color and higher gene expression levels are displayed in more yellow-green color. To encode the total gene expression at each spot, I am using the visual channel of size and position (x and y axis). Higher total gene expression are associated with larger sizes of points.

What type of data visualization is this? What about the data are you trying to make salient through this data visualization?

I aim to highlight the variations in CD3E gene expression. My data visualization seeks to make salient the spatial distribution and relative expression levels of the CD3E gene within a region, and the total transcriptional activity at each spot. The visualization provides viewers an understanding of the relationship between CD3E expression levels and the overall transcriptional activity associated with certain spatial information.

What Gestalt principles have you applied towards achieving this goal if any?

Gestalt principles applied is similarity. Points with similar colors or size (gene expression) are perceived as related groups.

Please share the code you used to reproduce this data visualization.

#inference from the data
# Barcode: Unique identifier for each spot or cell.
# Aligned_x, Aligned_y: Coordinates or positions of the spots in a spatial context, which might represent the location of each cell in a tissue section.
# Genes: Each gene is represented by a column in the matrix.
# SAMD11, NOC2L, KLHL17, ... : These are gene names.


library(ggplot2)
library(viridis)
data<- read.csv('/Users/nidhisoley/Desktop/GDataViz/eevee.csv.gz', row.names=1)
data$total_expression <- rowSums(data[, -(1:3)])
data$CD3E_log10 <- log10(data$CD3E + 1) 
# plot the data
ggplot(data) + 
  geom_point(aes(x = aligned_x, y = aligned_y, 
                 col = CD3E_log10, 
                 size = total_expression), 
             alpha = 0.5) +
  scale_color_viridis_c(option = "D", end = 0.9, 
                        name = "Log10(CD3E Expression)") +
  scale_size_continuous(name = "Total Gene Expression", 
                        range = c(1, 6)) +
  labs(
    title = "Spatial Distribution of Gene Expression",
    x = "Aligned X", 
    y = "Aligned Y"
  ) +
  theme_minimal()+
  theme(
    panel.grid.major = element_blank(),
    panel.grid.minor = element_blank())