Interactive Visualizations With Highcharter

Let’s start this blog post with a story. A few years ago, I went to a small meeting focused on single cell computational analyses hosted at Cold Spring Harbor. I was giving a talk on my work and had the chance to participate in a group forum on the future of the field, in particular the computational challenges and opportunities, afterwards. A bioinformatician, Timothy Hu, asked what I thought was a spectacular and fundamental question: why is it so hard to reproduce figures from publications? Especially when, in theory, all the code and data for generating the figure could (and should) be available? I talked to Tim afterwards and we envisioned a future where all figures in electronic journals would be interactive, with all relevant data and code used to generate the figure actually being embedded in the figure itself. We differed on the implementation details of such a vision - Tim wanted a system with lots of server-side operations on the cloud while I believed a system with primarily client-side operations with Javascript will be more sustainable. But back then, the vision seemed like just a dream. How would I even implement such a thing? Where do I even begin?

Now, a few years later and being much wiser (or at least more skilled programming-wise), it’s time to revisit and reassess the feasibility of this vision by attempting to implement it: Can I create a blog post with all interactive figures that have all data and code used to create the figures embedded in the blog post itself?

For demonstrative purposes, let’s analyze a PBMC dataset from 10X Genomics (pbmcA). The dataset is available as part of the MUDAN R package I’m developing.

##devtools::install_github('JEFworks/MUDAN) ## install
library(MUDAN)
data(pbmcA)

We will use highcharter, an excellent R package by Joshua Kunst for creating highcharts.js plots in R, to create plots that we can then export as javascript and embed in this blog post.

##install.packages(highcharter)
library(highcharter)

We can start with some simple filtering and QC visualizations.

## Filter out poor quality cells and genes
cd <- cleanCounts(pbmcA, min.lib.size=1000, max.lib.size=5000, min.read=10, min.detected=5)

## Look at some QC
hchart(log10(colSums(cd>0)+1)) %>%
  hc_title(text = "log10(nUMI+1)") %>%
  hc_legend(enabled = FALSE)
hchart(log10(rowSums(cd)+1)) %>%
  hc_title(text = "log10(nCells+1)") %>%
  hc_legend(enabled = FALSE)

Ok. Our data looks quite reasonable and normally distributed. Now, let’s do our standard single cell analysis and get to a tSNE embedding with detected groups that we want to visualize.

## Normalize to CPMs
mat <- normalizeCounts(cd)
## Variance normalize and log transform
matnorm.info <- normalizeVariance(mat, details=TRUE)
## Take overdispersed genes only
matnorm <- log10(matnorm.info$mat[matnorm.info$ods,]+1)
## Get top 30 PCs on all overdispersed genes
pcs <- getPcs(matnorm, nGenes=nrow(matnorm), nPcs=30)
## Graph-based community detection
com <- getComMembership(pcs, k=30, method=igraph::cluster_infomap)
## tSNE embedding
emb <- Rtsne::Rtsne(t(pcs), is_distance=FALSE, perplexity=30, verbose=TRUE)$Y
rownames(emb) <- colnames(matnorm)

## Plot embedding with infomap group labels
df <- data.frame(x=emb[,1], y=emb[,2], group=com, name=rownames(emb))
hchart(df, "scatter", hcaes(x, y, group=group, name=name)) %>%
  hc_tooltip(headerFormat = "", pointFormat = "<b>group</b>: {series.name} <br> <b>cell</b>: {point.name}")

Hey, look some clusters! (play with the interactive plot by toggling the visibility of clusters)

We can perform differential expression analysis among these clusters and visualize significantly differentially expressed genes as a heatmap.

## Differential expression analysis
dg <- getDifferentialGenes(t(cd), com)
dg.sig <- lapply(dg, function(x) {
  ## get only significantly upregulated
  x <- x[x$Z>3,]
  x <- x[x$highest,]
  rownames(x)
})
## average within groups
mat.summary <- do.call(cbind, lapply(levels(com), function(ct) {
  cells <- which(com==ct)
  rowMeans(matnorm[, cells])
}))
colnames(mat.summary) <- levels(com)
dim(mat.summary)
## cluster groups
hc <- hclust(dist(t(mat.summary)))
dg.sig.genes <- na.omit(intersect(unlist(dg.sig[hc$labels[hc$order]]), rownames(matnorm)))
m <- mat.summary[dg.sig.genes,hc$labels[hc$order]]
## scale for better visualization
m <- t(scale(t(m)))
## reshape for highcharter
mm <- reshape2::melt(m)
colnames(mm) <- c('gene', 'group', 'value')

## Visualize as a heatmap
hchart(m, "heatmap", hcaes(x = gene, y = group, value = value)) %>% 
  hc_colorAxis(stops = color_stops(10, rev(RColorBrewer::brewer.pal(10, "RdBu")))) %>% 
  hc_title(text = "Differentiall Expressd Genes") %>% 
  hc_legend(layout = "vertical", verticalAlign = "top", align = "right", valueDecimals = 0) %>%
  hc_plotOptions(series = list(dataLabels = list(enabled = FALSE)))

Heatmaps are always a bit hard to deal with because the rownames are just so dense. Now with an interactive plot, you can just hover to see the gene name and group assignment!

We can then dig into the expression of one of these genes and visualize as either a box-plot across our identified groups or in the tSNE embedding.

## Boxplot of one diff gene
gexp <- log10(mat['CD3E', ]+1)
hcboxplot(x = gexp, var = com,  outliers = FALSE) %>%
  hc_chart(type = "column") %>%
  hc_title(text = "CD3E Expression") 

## Plot expression in tSNE
colors <- gexp
colors <- colors/max(colors)
gradientPalette <- colorRampPalette(c("gray80", "red"), space = "Lab")(1024)
cols <- gradientPalette[colors * (length(gradientPalette) - 1) + 1]
names(cols) <- names(gexp)
df <- data.frame(x=emb[,1], y=emb[,2], value=gexp, color=cols, name=rownames(emb))
hchart(df, "scatter", hcaes(x, y, color=color, value=value, name=name)) %>%
  hc_tooltip(headerFormat = "", pointFormat = "<b>cell</b>: {point.name} <br> <b>CD3E exp</b>: {point.value}")