Apr 10, 2018
Exploring the impact of tSNE parameters interactively
I downloaded a count matrix of ~2000 single cells from a PBMC dataset (Donor A) from 10X Genomics. I used R and the MUDAN package I'm developing to do some of the heavy lifting: normalization, PCA dimensionality reduction, and subsetting to get the data into a form that wouldn't kill your browser (important later). I also used a graph-based community detection method to annotate clusters so we can later see how different tSNE parameters separate them.
library(MUDAN)
data(pbmcA)
## filter out poor genes and cells
cd <- cleanCounts(pbmcA, min.reads = 10, min.detected = 10, verbose = FALSE)
## CPM normalization
mat <- normalizeCounts(cd, verbose = FALSE)
## variance normalize, identify overdispersed genes
matnorm.info <- normalizeVariance(mat, details = TRUE, verbose = FALSE)
## log transform
matnorm <- log10(matnorm.info$mat + 1)
## PCA on overdispersed genes (nPcs sets the number of components kept)
pcs <- getPcs(matnorm[matnorm.info$ods, ], nGenes = length(matnorm.info$ods), nPcs = 100, verbose = FALSE)
## graph-based community detection; over-cluster with small k
com <- getComMembership(pcs, k = 10, method = igraph::cluster_louvain, verbose = FALSE)
## write out a subset of the data (first 500 cells) so the browser can handle it
sub <- 1:500
m <- pcs[sub, ]
df <- data.frame(group = com[rownames(m)], m)
write.csv(df, file = 'pbmcA.txt', quote = TRUE)
Try it out!
Using your own data
You can either look through the source code of this blog post or check out http://jef.works/tsne-online/.
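If you want to prepare your own data, one option is to export it in the same layout as the `pbmcA.txt` file written above: row names, a `group` column, then numeric columns. This is a minimal sketch that assumes the web tool accepts that layout; `exportForTsne` is a hypothetical helper (not part of MUDAN), and the toy matrix is simulated purely for illustration.

```r
## Hypothetical helper: write a cells-by-features matrix in the same layout as
## pbmcA.txt (row names, a 'group' column, then numeric columns).
## Assumption: the browser tool accepts this layout.
exportForTsne <- function(mat, groups, file, maxCells = 500) {
  stopifnot(is.matrix(mat), length(groups) == nrow(mat))
  ## random subset (kept in original order) so the file stays browser-sized
  keep <- sort(sample(nrow(mat), min(maxCells, nrow(mat))))
  df <- data.frame(group = groups[keep], mat[keep, , drop = FALSE])
  write.csv(df, file = file, quote = TRUE)
  invisible(df)
}

## Usage with simulated data: two made-up groups of 300 cells, 10 features
set.seed(0)
toy <- rbind(matrix(rnorm(300 * 10, mean = 0), ncol = 10),
             matrix(rnorm(300 * 10, mean = 3), ncol = 10))
rownames(toy) <- paste0("cell", seq_len(nrow(toy)))
colnames(toy) <- paste0("PC", 1:10)
labels <- rep(c("A", "B"), each = 300)
out <- tempfile(fileext = ".txt")
exportForTsne(toy, labels, out)
## read the file back to check its structure
back <- read.csv(out, row.names = 1)
```

Unlike `sub <- 1:500`, the helper samples cells at random, so input sorted by group does not end up with some groups missing from the exported subset.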
tSNE is a visualization tool. We must be aware of how its parameters shape our visualizations and avoid over-interpreting clusters that look coherent in a tSNE embedding but may not correspond to truly coherent or stable subpopulations in the higher-dimensional space.
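One of the most influential tSNE parameters is perplexity, which roughly sets how many neighbors each cell attends to. In standard tSNE, each cell gets its own Gaussian bandwidth, found by binary search so that the entropy of its neighbor distribution matches log(perplexity). The sketch below reproduces that calibration step in base R on simulated data; the function and helper names and tolerances are illustrative assumptions, and this is not the code used by the interactive demo.

```r
## Squared distances -> conditional neighbor probabilities for one cell,
## using precision beta = 1 / (2 * sigma^2); shifting by min(d2) avoids underflow
condDist <- function(d2, beta) {
  w <- exp(-(d2 - min(d2)) * beta)
  w / sum(w)
}
## Shannon entropy in nats
entropyNats <- function(p) -sum(p[p > 0] * log(p[p > 0]))

## Binary search on beta so that exp(entropy) matches the target perplexity
perplexityToBeta <- function(d2, perplexity, tol = 1e-5, maxIter = 100) {
  target <- log(perplexity)
  beta <- 1; lo <- -Inf; hi <- Inf
  for (iter in seq_len(maxIter)) {
    H <- entropyNats(condDist(d2, beta))
    if (abs(H - target) < tol) break
    if (H > target) {  # distribution too flat: increase precision
      lo <- beta
      beta <- if (is.finite(hi)) (beta + hi) / 2 else beta * 2
    } else {           # distribution too peaked: decrease precision
      hi <- beta
      beta <- if (is.finite(lo)) (beta + lo) / 2 else beta / 2
    }
  }
  p <- condDist(d2, beta)
  list(beta = beta, p = p, perplexity = exp(entropyNats(p)))
}

## Usage on simulated data: 200 cells in 5 dimensions
set.seed(1)
X <- matrix(rnorm(200 * 5), ncol = 5)
d2 <- as.matrix(dist(X))^2
for (perp in c(5, 30, 50)) {
  ## distances from cell 1 to every other cell (self excluded)
  res <- perplexityToBeta(d2[1, -1], perp)
  cat(sprintf("perplexity %g: effective %.2f, sigma %.3f\n",
              perp, res$perplexity, sqrt(1 / (2 * res$beta))))
}
```

Higher perplexity yields a wider bandwidth, so each cell's affinities spread over more neighbors. Because the entropy can never exceed log(n - 1), perplexity settings approaching the number of cells (500 in the exported file) cannot be matched.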
(thanks to Fritz Lekschas for sharing)