Apr 10, 2018
Exploring the impact of tSNE parameters interactively
I downloaded a count matrix of ~2000 single cells from a PBMC dataset (Donor A) from 10X Genomics. I used R and MUDAN, a package I'm developing, to do some of the heavy lifting: normalization, PCA dimensionality reduction, and subsetting the data into a form that won't crash your browser (important later). I also used a graph-based community detection method to annotate clusters, so we can later see how different tSNE parameters separate them.
library(MUDAN)
data(pbmcA)
## filter out poor genes and cells
cd <- cleanCounts(pbmcA, min.reads = 10, min.detected = 10, verbose=FALSE)
## CPM normalization
mat <- normalizeCounts(cd, verbose=FALSE)
## variance normalize, identify overdispersed genes
matnorm.info <- normalizeVariance(mat, details=TRUE, verbose=FALSE)
## log transform
matnorm <- log10(matnorm.info$mat+1)
## PCA on overdispersed genes (nPcs=100)
pcs <- getPcs(matnorm[matnorm.info$ods,], nGenes=length(matnorm.info$ods), nPcs=100, verbose=FALSE)
## graph-based community detection; over-cluster with small k
com <- getComMembership(pcs, k=10, method=igraph::cluster_louvain, verbose=FALSE)
## write out the first 500 cells (CSV content despite the .txt extension)
sub <- 1:500
m <- pcs[sub,]
df <- data.frame(group=com[rownames(m)], m)
write.csv(df, file='pbmcA.txt', quote=TRUE)
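Why subset at all? Here is a rough calculation. It assumes an exact tSNE implementation that keeps a dense n x n matrix of pairwise affinities in 8-byte doubles. That is an assumption about the in-browser tool, not something stated above. Under it, memory grows quadratically with the number of cells. `affinityMB` is a hypothetical helper used only for this arithmetic.

```r
## Assumption: exact tSNE storing a dense n x n affinity matrix of 8-byte doubles
affinityMB <- function(n, bytesPerEntry = 8) {
  n^2 * bytesPerEntry / 1e6   # number of entries x bytes per entry, in megabytes
}

## roughly the full dataset (~2000 cells) versus the 500-cell subset written out above
cells <- c(full = 2000, subset = 500)
mem <- sapply(cells, affinityMB)
print(mem)   # full: 32 MB, subset: 2 MB
```

A 4x smaller input needs 16x less memory for this matrix. Each exact gradient step also visits all n^2 pairs, so runtime in the browser shrinks by a similar factor.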
Try it out!
Using your own data
You can either look through the source code of this blog post, or check out: http://jef.works/tsne-online/.
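To prepare your own data, one option is to mirror the layout written out above:

- one row per cell, with cell names as row names;
- a `group` column;
- one column per PC.

`writeTsneInput` is a hypothetical base-R helper, not part of MUDAN. Whether the online tool accepts exactly this layout is an assumption, so check it against the tool's page.

```r
## Hypothetical helper (not part of MUDAN): write a cells x PCs matrix plus a
## per-cell group label in the same layout as pbmcA.txt
writeTsneInput <- function(pcs, groups, file, maxCells = 500) {
  stopifnot(is.matrix(pcs), !is.null(rownames(pcs)), !is.null(names(groups)))
  ## keep only the first maxCells cells so the file stays browser-sized
  keep <- rownames(pcs)[seq_len(min(maxCells, nrow(pcs)))]
  df <- data.frame(group = groups[keep], pcs[keep, , drop = FALSE])
  write.csv(df, file = file, quote = TRUE)
  invisible(df)
}

## toy example: 20 cells x 3 PCs in two made-up groups
set.seed(1)
toy <- matrix(rnorm(20 * 3), nrow = 20,
              dimnames = list(paste0("cell", 1:20), paste0("PC", 1:3)))
toyGroups <- setNames(rep(c("A", "B"), each = 10), rownames(toy))
out <- tempfile(fileext = ".txt")
writeTsneInput(toy, toyGroups, out, maxCells = 15)

## read it back to check the layout: row names, then group, PC1, PC2, PC3
back <- read.csv(out, row.names = 1)
```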
tSNE is a visualization tool. We must be aware of how its parameters affect our visualizations. We should also avoid over-interpreting clusters that look coherent in a tSNE embedding but may not reflect truly coherent or stable subpopulations in the higher-dimensional space.
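To make one parameter concrete: in the standard tSNE formulation, perplexity sets each cell's effective neighbourhood size. For every cell i, tSNE searches for a Gaussian precision beta_i so that the conditional distribution p_{j|i} over the other cells has perplexity exp(H(P_i)) equal to the requested value.

This is a minimal base-R sketch of that binary search on toy random data. `calibrateRow` is a hypothetical helper, not code from MUDAN or from the online tool.

```r
## Sketch of the standard tSNE perplexity calibration for one cell
calibrateRow <- function(d2, perplexity, tol = 1e-5, maxIter = 100) {
  ## d2: squared distances from cell i to every other cell (self excluded)
  targetH <- log(perplexity)            # target entropy in nats
  beta <- 1; lo <- 0; hi <- Inf
  for (iter in seq_len(maxIter)) {
    used <- beta
    w <- exp(-(d2 - min(d2)) * beta)    # shifting by min(d2) cancels in p, avoids underflow
    p <- w / sum(w)
    H <- -sum(p[p > 0] * log(p[p > 0]))
    if (abs(H - targetH) < tol) break
    if (H > targetH) {                  # too flat: sharpen the kernel
      lo <- beta
      beta <- if (is.infinite(hi)) beta * 2 else (beta + hi) / 2
    } else {                            # too peaked: widen the kernel
      hi <- beta
      beta <- (beta + lo) / 2
    }
  }
  list(p = p, beta = used, perplexity = exp(H))
}

## toy data: 200 cells x 10 PCs; calibrate cell 1 at several perplexities
set.seed(0)
x <- matrix(rnorm(200 * 10), ncol = 10)
d2 <- as.matrix(dist(x))^2
row1 <- d2[1, -1]
for (perp in c(5, 30, 50)) {
  fit <- calibrateRow(row1, perp)
  cat(sprintf("perplexity %g -> beta %.4f, achieved perplexity %.2f\n",
              perp, fit$beta, fit$perplexity))
}
```

A larger perplexity yields a smaller beta, meaning a wider kernel, so each cell's affinities spread over more neighbours. This is one reason the same data can break into different-looking clusters at different perplexities.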
(thanks to Fritz Lekschas for sharing)