Welcome

Welcome to the Course Website for EN.580.428 Genomic Data Visualization!

As the primary mode through which analysts and audience members alike consume data, data visualization remains an important hypothesis generating and analytical technique in data-driven research to facilitate new discoveries. However, if done poorly, data visualization can also mislead, bias, and slow down progress. This hands-on course will cover the principles of perception and cognition relevant for data visualization and apply these principles to genomic data, including large-scale single-cell and spatially-resolved omics datasets, using the R statistical programming language. Students will be expected to complete class readings, create weekly data visualizations as homework assignments, and make a major class presentation.

Course Information

Course Staff: Prof. Jean Fan and Caleb Hallinan
Lectures: 8:00am-9:50am Monday, Wednesday, and Friday. See Canvas for location details.
Office Hours: 10:00am-10:50am Monday, Wednesday, and by request. See Canvas for location details.

Course Details
☞ see Course tab


All Visualizations

Identification of White Pulp Tissue Structures within CODEX Dataset

1. Create a data visualization and write a description to convince me that your interpretation is correct.

CODEX dataset analysis

We conducted normalization, standardization, dimensionality reduction, k-means clustering, and differential expression analysis to reveal two distinct cell clusters within the CODEX data. tSNE plots and marker expression heatmaps is plotted...

Interpreting CODEX data

Describe your figure briefly so we know what you are depicting (you no longer need to use precise data visualization terms as you have been doing). The data was normalized...

Exploring differences between linear and non-linear dimensionality reduction methods

Description The visualization compares three different dimensionality reduction techniques—PCA (Principal Component Analysis), t-SNE (t-Distributed Stochastic Neighbor Embedding), and UMAP (Uniform Manifold Approximation and Projection)—to visualize high-dimensional gene expression data from...

Question 4: Exploring the Effect of Varying Principal Components for Non-Linear Dimensionality Reduction

For this project, I created an animation using gganimate to visualize the effect of varying the number of principal components before applying t-SNE. I used the Pikachu dataset. First, I...

EC1: Comparing PCA and tSNE clustering methods with gganimate

What’s vizualized? A gif visualizing the cluster derived with kmeans in reduced dimensional space using linear vs non-linear methods (PCA and tSNE), as well as in the original physical space....

hw 4 DEG analysis

Description of analysis The modification from the HW3 using eevee data set is that I changed the selection of cluster based on the overall cluster visualization in physical space. Method-wise,...

HW4: Exploring Cell Type with Differentially upregulated CD4

1. Figure Description. Figure A: Total within-cluster sum of squares using different value of k. Figure B: Cluster 2 is highlighted in red in PCA space, while the remaining three...

Identifying B cell markers in imaging dataset

To begin analyzing the imaging dataset, I decided to normalize by cells’ areas, rather than use count-based normalization. Afterwards, I clustered my normalized gene expression data using k-means and determined...

Analyzing PDGFRB Gene Expression in Pikachu Dataset

Visualization Summary In this visualization, I analyzed a cluster within the Pikachu dataset responsible for cell growth, and likely cancer. This was a major change from the Eevee sequencing dataset...

Validating Sequencing-based 10x Visium Identification of T Cell Population with Imaging-Based Spatial Transcriptomics

From last week’s results and selected cluster, I identified the genes LTB, CD247, and IL7R , all of which suggest a T cell population (or similar immune cell population comprising...