Apr 10, 2018
Exploring the impact of tSNE parameters interactively
I downloaded ~2000 single cell count matrix from a PBMC dataset (Donor A) from 10X Genomics. I used R and the MUDAN package I'm developing to perform some of the heavy lifting: normalization, PCA dimensionality reduction, and subsetting to get the data into a form that wouldn't kill your browser (important later). I also used a graph-based community detection method to annotation clusters so we can see how different tSNE parameters segregate them later.
library(MUDAN) data(pbmcA) ## filter out poor genes and cells cd <- cleanCounts(pbmcA, min.reads = 10, min.detected = 10, verbose=FALSE) ## CPM normalization mat <- normalizeCounts(cd, verbose=FALSE) ## variance normalize, identify overdispersed genes matnorm.info <- normalizeVariance(mat, details=TRUE, verbose=FALSE) ## log transform matnorm <- log10(matnorm.info$mat+1) ## 30 PCs on overdispersed genes pcs <- getPcs(matnorm[matnorm.info$ods,], nGenes=length(matnorm.info$ods), nPcs=100, verbose=FALSE) ## graph-based community detection; over cluster with small k com <- getComMembership(pcs, k=10, method=igraph::cluster_louvain, verbose=FALSE) ## write out subset of data sub <- 1:500 m <- pcs[sub,] df <- data.frame(group=com[rownames(m)], m) write.csv(df, file='pbmcA.txt', quote=TRUE)
Try it out!
Using your own data
You can either look through the source code of this blog post, or check out: http://jef.works/tsne-online/.
tSNE is a visualization tool. We must be aware of the impact of parameters on our visualizations and not over-interpret clusters that appear coherent in our tSNE embeddings that may not be reflective of actually coherent or stable subpopulations in higher-dimensional space.
(thanks to Fritz Lekschas for sharing)
- Visually Summarizing Differential Gene Expression on 25 April 2018
- Interactive Honeybadger Laf Profiles on 15 April 2018
- Interactive Tsne on 10 April 2018
- Setting Up New Server on 10 March 2018
- Stability Testing on 28 February 2018
- Phd Program Interview And Application Tips And Advice on 26 February 2018
- Biomedical Analogies on 16 February 2018
- Interactive Visualizations With Highcharter on 10 February 2018
- Fun With Word2vec on 06 February 2018
- Simpsons Paradox on 05 January 2018