My long-term research interest is to develop a comprehensive understanding of the key genetic, epigenetic, and other regulatory mechanisms driving cellular identity and heterogeneity, particularly in the context of cancer, and of how this heterogeneity shapes tumor progression, therapeutic resistance, and ultimately clinical impact. To understand this heterogeneity, novel statistical methods and user-friendly computational software must be developed to enable biologists and other researchers to harness the power of big data and other products of future technological advancements.


Statistical methods and software for analyses of single cell data

While heterogeneity within cellular systems has long been recognized, only recently have technological advances enabled measurements at the single cell level. Applying traditional bulk analysis methods to single cells has met with varying degrees of success due to the high levels of technical and biological stochasticity and noise inherent in single cell measurements. Therefore, novel statistical methods are needed to identify and characterize heterogeneity in single cells. In the Kharchenko lab, I have focused on developing methods for analyzing single cell data, including differential expression analysis methods that take into account sources of technical noise inherent to single cell RNA-seq data, clustering methods to identify pathways and gene sets that exhibit coordinated variability, and methods for spatial placement of cell subpopulations based on expression signatures. This work has led to the development of various statistical methods available as software for the scientific community.

A more complete understanding of chronic lymphocytic leukemia

Advancements in high-throughput sequencing technologies have uncovered tremendous genetic, epigenetic, and transcriptional heterogeneity in chronic lymphocytic leukemia (CLL), but the impact of this heterogeneity on clinical course is not well understood. I have established a close collaboration with the Wu lab at the Dana-Farber Cancer Institute, where I have focused on developing and applying bioinformatics methods for (1) assessing variability of single cell gene expression, (2) calling mutations from single cell qt-qPCR data, and (3) performing differential expression and gene set enrichment tests on both bulk and single cell data from RNA-sequencing and targeted qt-qPCR. Our collaboration has led to many scientific findings that contribute to a more complete understanding of CLL.

Improving the representation of women in STEM

Women remain vastly underrepresented in STEM fields, particularly in the computational sciences. Improving the representation of women in STEM is pertinent to workplace diversity, gender equality, and American innovation. This issue can be broadly addressed by increasing the inflow of women into STEM fields and decreasing their outflow from them. Outreach efforts by those in STEM are needed to help foster a positive STEM identity in girls. Antiquated workplace expectations and policies must also be updated to meet the needs of modern-day working individuals.