Apr 10, 2018
Exploring the impact of tSNE parameters interactively
I downloaded a count matrix of ~2000 single cells from a PBMC dataset (Donor A) from 10X Genomics. I used R and the MUDAN package I'm developing to do some of the heavy lifting: normalization, PCA dimensionality reduction, and subsetting to get the data into a form that wouldn't kill your browser (important later). I also used a graph-based community detection method to annotate clusters, so we can later see how different tSNE parameters segregate them.
```r
library(MUDAN)
data(pbmcA)

## filter out poor genes and cells
cd <- cleanCounts(pbmcA, min.reads = 10, min.detected = 10, verbose = FALSE)
## CPM normalization
mat <- normalizeCounts(cd, verbose = FALSE)
## variance normalize, identify overdispersed genes
matnorm.info <- normalizeVariance(mat, details = TRUE, verbose = FALSE)
## log transform
matnorm <- log10(matnorm.info$mat + 1)
## 100 PCs on overdispersed genes
pcs <- getPcs(matnorm[matnorm.info$ods, ], nGenes = length(matnorm.info$ods),
              nPcs = 100, verbose = FALSE)
## graph-based community detection; over-cluster with small k
com <- getComMembership(pcs, k = 10, method = igraph::cluster_louvain, verbose = FALSE)
## write out a subset of the data (first 500 cells) so the browser can handle it
sub <- 1:500
m <- pcs[sub, ]
df <- data.frame(group = com[rownames(m)], m)
write.csv(df, file = 'pbmcA.txt', quote = TRUE)
```
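One of the key tSNE parameters is perplexity, which roughly sets how many neighbors each cell "pays attention to" when tSNE builds its high-dimensional similarities. Below is a minimal base-R sketch of that step, written for illustration and not taken from any particular tSNE implementation: for each point, a Gaussian bandwidth sigma is tuned by binary search until the conditional neighbor distribution P(j | i) has the target perplexity (2 raised to its entropy in bits). The toy data and the helpers `condProbs`, `perplexityOf`, and `findSigma` are hypothetical.

```r
## Hypothetical helpers illustrating how perplexity sets each point's neighborhood.

## Conditional probabilities P(j | i) for one row of squared distances (self excluded).
condProbs <- function(d2, sigma) {
  p <- exp(-(d2 - min(d2)) / (2 * sigma^2))  # subtracting min(d2) avoids underflow
  p / sum(p)
}

## Perplexity of a probability vector: 2^(Shannon entropy in bits).
perplexityOf <- function(p) {
  p <- p[p > 0]
  2^(-sum(p * log2(p)))
}

## Binary search (on a log scale) for the sigma that hits the target perplexity.
## Perplexity increases with sigma, so too-high perplexity means sigma is too large.
findSigma <- function(d2, target, tol = 1e-5, maxIter = 200) {
  lo <- 1e-10; hi <- 1e10
  for (iter in seq_len(maxIter)) {
    sigma <- sqrt(lo * hi)  # geometric midpoint
    perp <- perplexityOf(condProbs(d2, sigma))
    if (abs(perp - target) < tol) break
    if (perp > target) hi <- sigma else lo <- sigma
  }
  sigma
}

## Toy data: two well-separated groups of 50 points in 10 dimensions.
set.seed(1)
X <- rbind(matrix(rnorm(500), ncol = 10),
           matrix(rnorm(500, mean = 5), ncol = 10))
D2 <- as.matrix(dist(X))^2
d2 <- D2[1, -1]  # squared distances from point 1 to the other 99 points

for (perp in c(5, 30, 50)) {
  sigma <- findSigma(d2, perp)
  p <- condProbs(d2, sigma)
  ## p[1:49] corresponds to points 2-50, i.e. point 1's own group
  cat(sprintf("perplexity %g: sigma = %.3f, mass on own group = %.3f\n",
              perp, sigma, sum(p[1:49])))
}
```

Larger perplexities force larger bandwidths, so each point spreads its neighbor probability over more points. This is one reason the same data can look more or less fragmented at different perplexity settings.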
Try it out!
Using your own data
You can either look through the source code of this blog post or check out http://jef.works/tsne-online/.
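If you don't use MUDAN, here is a rough base-R sketch that writes a file with the same layout as `pbmcA.txt` from the code at the top of this post (row names, a `group` column, then numeric PC columns). It is an assumption that this layout is what the tool expects, and the steps are stand-ins rather than the pipeline used here: `prcomp` replaces MUDAN's PCA, `kmeans` replaces graph-based community detection, and the simulated `counts` matrix and the file name `myData.txt` are hypothetical placeholders for your own data.

```r
## Hypothetical stand-in for your data: a genes x cells matrix of raw counts.
set.seed(0)
counts <- matrix(rpois(200 * 600, lambda = 2), nrow = 200,
                 dimnames = list(paste0("gene", 1:200), paste0("cell", 1:600)))

## CPM normalization followed by a log transform
cpm <- t(t(counts) / colSums(counts)) * 1e6
logmat <- log10(cpm + 1)

## PCA with cells as rows; keep the first 30 PCs
pca <- prcomp(t(logmat), center = TRUE, scale. = FALSE, rank. = 30)
pcs <- pca$x

## k-means as a simple stand-in for graph-based community detection
com <- kmeans(pcs, centers = 8, nstart = 10)$cluster

## Same layout as pbmcA.txt: row names, a 'group' column, then the PCs
sub <- 1:500
df <- data.frame(group = com[sub], pcs[sub, ])
write.csv(df, file = "myData.txt", quote = TRUE)
```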
tSNE is a visualization tool. We must be aware of how its parameters affect our visualizations and avoid over-interpreting clusters that appear coherent in a tSNE embedding, since they may not reflect coherent or stable subpopulations in the higher-dimensional space.
(thanks to Fritz Lekschas for sharing)