1. NSF GRFP Application Tips And Example

    Note: This is a post from my old website, which I’ve since shut down after moving my website to GitHub. The original post is from 2013, yet it still seems reasonably relevant today. In this version, I have added a few updates (annotated as UPDATE) based on what I know now as I am nearing the end of my graduate career. Hopefully it will still be a useful resource to young aspiring scientists looking...


    Continue reading …
  2. Connected Barplot For Series Data Visualization

    Connected Barplot for Series Data Visualization in R. Ideal for visualizing proportion changes over time.


    Continue reading …
  3. Graph Based Community Detection For Clustering Analysis

    Graph-based community detection for clustering analysis in R. Introduction: In single cell analyses, we are often trying to identify groups of transcriptionally similar cells, which we may interpret as distinct cell types or cell states. We may also be interested in identifying groups of transcriptionally coordinated genes, which we may interpret as functional regulatory modules or pathways. In either case, we are looking at some high dimensional data and trying to identify clusters. We can...


    Continue reading …
  4. Machine Learning Feature Selection For Diffexp Analysis

    Feature Selection for Differential Expression Analysis and Marker Selection. Introduction: In transcriptomics analysis, we are often interested in identifying differentially expressed genes. For example, if we have two conditions or cell types, we may be interested in what genes are significantly upregulated in condition A vs. B or cell type X vs. Y and vice versa. Differential expression analysis is often performed with Wilcoxon tests, t-tests, or other similar tests for differences in distribution. However,...


    Continue reading …
  5. 5 Useful Bash Aliases And Functions

    5 Useful Bash Aliases and Functions For Lazy Bioinformaticians. Continuing on our theme of making bioinformatics more sexy with BuzzFeed-esque blog post titles, here are 5 useful bash aliases and functions so you can remember fewer non-intuitive options, type fewer keys for the same output, and overall be more productive and efficient in your bioinformatics analysis :D i.e. have more time to look at dank memes. I’ll try to keep to aliases and functions that...


    Continue reading …
  6. A Practical Introduction To Finite Mixture Models

    A practical introduction to finite mixture modeling with flexmix in R. Introduction: Finite mixture models are very useful when applied to data where observations originate from various groups and the group affiliations are not known. For example, in single cell RNA-seq data, transcripts in each cell can be modeled as a mixture of two probabilistic processes: 1) a negative binomial process for when a transcript is amplified and detected at a level correlating with its...


    Continue reading …
  7. 5 Must Dos For Efficient Bioinformatics

    5 must-dos for efficient bioinformatics. My colleague Kamil was joking about how we need to make bioinformatics sexier and more click-bait-y with those ridiculous BuzzFeed-esque headlines like ‘N ways to X your Y’ and ‘M best Ws that will K your J’. So here are my 5 must-dos for efficient bioinformatics that I’ve tried to get all my students to adopt. Get a text editor that can send commands to the terminal: The biggest efficiency...


    Continue reading …
  8. Custom Sequencing Library Bioinformatics

    Bioinformatics for custom sequencing library constructions. Sequencing has become so streamlined that we often just use standard library prep kits, made for particular sequencers, followed by proprietary bioinformatics software for demultiplexing and quantification. But if we want to design a custom library structure, perhaps for use in multiplexed droplet-based approaches, we will need to come up with our own bioinformatics pipelines. In this tutorial, I will take you through a recent experience I had analyzing reads...


    Continue reading …
  9. Multiclass Differential Expression Analysis

    Multi-class / Multi-group Differential Expression Analysis. Introduction: In transcriptomics analysis, we are often interested in identifying differentially expressed genes. For example, if we have two conditions or cell types, we may be interested in what genes are significantly upregulated in condition A vs. B or cell type X vs. Y and vice versa. However, what happens when you have multiple conditions or many cell types? In this tutorial, I will use simulated data to demonstrate...


    Continue reading …
  10. How To Be A Research Parasite

    How to be a “research parasite”: a guide to analyzing public sequencing data from GEO. In this tutorial, I will take you through my workflow for obtaining public sequencing data available on NCBI GEO. Let’s say, for example, I am interested in analyzing the single cell RNA-seq data found in this paper: “Single-Cell Analysis Reveals a Close Relationship between Differentiating Dopamine and Subthalamic Nucleus Neuronal Lineages”. In the paper, the authors note that “The accession...


    Continue reading …