1. Biomedical Analogies

    More fun with word2vec! See my previous post on getting started with word2vec to run these examples yourself. Can you solve these biomedical analogies? Or can a machine learning / artificial intelligence algorithm solve them better than you? Put yourself to the test! I put myself to the test and this is what I got. 1. Receptor-ligand relationships: Recall that IGF1R is the receptor for IGF1. IGF1 is...


    Continue reading …
  2. Interactive Visualizations With Highcharter

    Let’s start this blog post with a story. A few years ago, I went to a small meeting focused on single cell computational analyses hosted at Cold Spring Harbor. I gave a talk on my work and afterwards had the chance to participate in a group forum on the future of the field, in particular its computational challenges and opportunities. A bioinformatician, Timothy Hu, asked what I thought was a spectacular and fundamental question:...


    Continue reading …
  3. Fun With Word2vec

    Fun with Word2vec: Exploring the application of deep learning on biomedical literature. I’m currently learning more about immunology so I can apply it to my analyses of the tumor microenvironment. However, there’s quite a lot of background literature to catch up on! I must have spent a whole day just trying to figure out all of these CD markers. What expresses CD31415926 again? Has this cell type been characterized before? If so, what are some other...


    Continue reading …
  4. Simpson’s Paradox

    Illustrating the impact of Simpson’s paradox and the need for single cell measurements. Simpson’s paradox is a phenomenon in probability and statistics in which a trend appears in several different groups of data but disappears or reverses when these groups are combined. It is often cited as a reason for single cell measurements. Here, we illustrate Simpson’s paradox with an explicit hypothetical example relevant to single cell RNA-seq. Consider a tumor sample pre- and post-...


    Continue reading …
  5. Barcode Doublets

    Barcode doublets in single cell RNA-seq data. When we perform single cell RNA-seq, we expect that the derived gene expression quantifications come from a single cell. Publications have shown that this is not always the case, as two cells may be physically captured together, resulting in a physical ‘doublet’ or ‘multiplet’ capture that is estimated to occur at a rate of 1.6%. Thus the derived gene expression quantifications would be a reflection of two cells,...


    Continue reading …
  6. Animated SVG Butterfly

    Something for fun: An animated butterfly made using pure SVG, CSS, and JavaScript. (My first CodePen post!) See the Pen Animated SVG Butterfly by Jean Fan (@JEFworks) on CodePen.


    Continue reading …
  7. NIH F-series Grant Tips and Example

    NIH F-series Grant Tips and Example. I successfully applied for the NIH F31 grant in 2015. I recently terminated it to accept a new NIH F99/K00 grant. Here are some of the things I’ve learned and some advice for writing an outstanding NIH grant as a PhD student. I’ve included my NIH F31 grant and full summary statements as an example. Enjoy and good luck! Introduction: The NIH F-series of grants is appropriate for trainees,...


    Continue reading …
  8. NSF GRFP Application Tips and Example

    Note: This is a post from my old website, which I have since shut down after moving my website to GitHub. The original post is from 2013, yet it still seems reasonably relevant today. In this version, I have added a few updates (annotated as UPDATE) based on what I know now as I am nearing the end of my graduate career. Hopefully it will still be a useful resource to young aspiring scientists looking...


    Continue reading …
  9. Connected Barplot For Series Data Visualization

    Connected Barplot for Series Data Visualization in R. Ideal for visualizing proportion changes over time.


    Continue reading …
  10. Graph Based Community Detection For Clustering Analysis

    Graph-based community detection for clustering analysis in R. Introduction: In single cell analyses, we are often trying to identify groups of transcriptionally similar cells, which we may interpret as distinct cell types or cell states. We may also be interested in identifying groups of transcriptionally coordinated genes, which we may interpret as functional regulatory modules or pathways. In either case, we are looking at some high dimensional data and trying to identify clusters. We can...


    Continue reading …