  1. Interactive HoneyBADGER LAF Profiles

    Interactive visualization of allelic expression patterns in single-cell RNA-seq data

    I recently developed an R package called HoneyBADGER that infers copy number alteration and loss-of-heterozygosity events in single cells based on persistent allelic imbalance in single-cell RNA-sequencing data. Intuitively, if a cell has a copy number alteration such as a deletion of a particular chromosomal region, then, within this region, we should only observe expression from the non-deleted allele. In contrast, in a neutral diploid...


  2. Interactive tSNE

    Exploring the impact of tSNE parameters interactively

    t-SNE is a popular dimensionality reduction method for, among many other things, visualizing transcriptional subpopulations from single-cell RNA-seq data. However, the appropriateness of the parameters used is often unclear, and poorly chosen parameters may result in misleading embeddings. In particular, the choice of how many features (either genes or PCs), how many effective neighbors (perplexity), and which distance metric to use all affect the resulting 2D tSNE embedding. Building on previously developed JavaScript-based implementations...


  3. Setting Up New Server

    Setting up on a new server

    I recently started my post-doctoral fellowship in a new lab. And this means getting set up on a new server! Here’s a step-by-step reminder to myself of what to do when I inevitably need to do this again.

    1. Avoid typing out long logins by editing your ~/.ssh/config:

        Host od
          User jefworks
          Hostname odyssey.rc.fas.harvard.edu
          ForwardX11 yes

    Now instead of doing

        ssh jefworks@odyssey.rc.fas.harvard.edu

    I can just do

        ssh od

    2....


  4. Stability Testing

    Stability testing: How do you know whether your single-cell clusters are ‘real’?

    In single-cell RNA-seq analysis, we are often looking to identify transcriptional subpopulations that may be interpreted as distinct cell-types and subtypes or cell-states. For each cell, we measure its expression of thousands of genes, which we can use as features to cluster these cells into transcriptionally similar clusters. But one question keeps popping up in my mind: How do you know whether your...


  5. PhD Program Interview and Application Tips and Advice

    PhD program application and interview tips and advice (with sample interview questions)

    I recently graduated with my PhD in Bioinformatics and Integrative Genomics from Harvard University and am reflecting on my time as a grad student. I dug up some old notes from when I was still in the application and interview stage and thought they might be useful for students currently going through the application process. A little background about my application journey: I...


  6. Biomedical Analogies

    Can you solve these biomedical analogies? More fun with word2vec!

    See my previous post on getting started with word2vec to run these examples yourself. Can you solve these biomedical analogies? Or can a machine learning / artificial intelligence algorithm solve them better than you? Put yourself to the test! I put myself to the test and this is what I got.

    1. Receptor-ligand relationships

    Recall that IGF1R is the receptor for IGF1. IGF1 is...


  7. Interactive Visualizations With Highcharter

    Let’s start this blog post with a story. A few years ago, I went to a small meeting focused on single cell computational analyses hosted at Cold Spring Harbor. I was giving a talk on my work and had the chance to participate in a group forum on the future of the field, in particular the computational challenges and opportunities, afterwards. A bioinformatician, Timothy Hu, asked what I thought was a spectacular and fundamental question:...


  8. Fun With Word2vec

    Fun with Word2vec: Exploring the application of deep learning to biomedical literature

    I’m currently learning more about immunology so I can apply it to my analyses of the tumor microenvironment. However, there’s quite a lot of background literature to catch up on! I must have spent a whole day just trying to figure out all of these CD markers. What expresses CD31415926 again? Has this cell-type been characterized before? If so, what are some other...


  9. Simpson’s Paradox

    Illustrating the impact of Simpson’s Paradox and the need for single-cell measurements

    Simpson’s paradox is a phenomenon in probability and statistics in which a trend appears in several different groups of data but disappears or reverses when these groups are combined. It is often cited as a reason for single-cell measurements. Here, we illustrate Simpson’s paradox with an explicit hypothetical example relevant to single-cell RNA-seq. Consider a tumor sample pre- and post-...


  10. Barcode Doublets

    Barcode doublets in single cell RNA-seq data

    When we perform single cell RNA-seq, we expect that the derived gene expression quantifications come from a single cell. Publications have shown that this is not always the case, as two cells may be physically captured together, resulting in a physical ‘doublet’ or ‘multiplet’ capture that is estimated to occur at a rate of 1.6%. Thus the derived gene expression quantifications would be a reflection of two cells,...


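
The doublet arithmetic in the last teaser can be sketched in a few lines of Python. This is only an illustration under assumptions that are not in the post: the 1.6% figure is treated as an independent per-capture probability, a doublet's observed counts are modeled as the plain sum of the two cells' counts, and the function names and toy gene counts are hypothetical.

```python
from typing import Dict

# Rate quoted in the teaser; treating it as an independent
# per-capture probability is an assumption made for this sketch.
DOUBLET_RATE = 0.016


def expected_doublets(n_captures: int, rate: float = DOUBLET_RATE) -> float:
    """Expected number of doublet captures among n_captures."""
    return n_captures * rate


def prob_at_least_one_doublet(n_captures: int, rate: float = DOUBLET_RATE) -> float:
    """Probability that at least one of n_captures is a doublet."""
    return 1.0 - (1.0 - rate) ** n_captures


def merge_counts(cell_a: Dict[str, int], cell_b: Dict[str, int]) -> Dict[str, int]:
    """Model a doublet's counts as the per-gene sum of two cells' counts."""
    genes = sorted(set(cell_a) | set(cell_b))
    return {gene: cell_a.get(gene, 0) + cell_b.get(gene, 0) for gene in genes}


if __name__ == "__main__":
    # Toy counts, invented for illustration only.
    t_cell = {"CD3E": 5}
    b_cell = {"CD19": 3, "CD3E": 1}
    print(expected_doublets(1000))
    print(prob_at_least_one_doublet(100))
    print(merge_counts(t_cell, b_cell))
```

Under these assumptions, a merged profile can show markers of both cell types at once, which is one way a doublet might be mistaken for a cell in an intermediate state.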