Publications (ORCiD | Google Scholar)

We are committed to open access. If you find a manuscript that we have co-authored that is not available here, please contact us directly and we will be happy to send you an electronic copy.

* Denotes equal contribution
^ Denotes corresponding author
JEFworks lab members are underlined


scatterbar - an R package for visualizing proportional data across spatially resolved coordinates

Dee Velazquez, Jean Fan^

Abstract: Displaying proportional data across many spatially resolved coordinates is a challenging but important data visualization task, particularly for spatially resolved transcriptomics data. Scatter pie plots are one type of commonly used data visualization for such data but present perceptual challenges that may lead to difficulties in interpretation. Increasing the visual saliency of such data visualizations can help viewers more accurately identify proportional trends and compare proportional differences across spatial locations. We developed scatterbar, an open-source R package that extends ggplot2, to visualize proportional data across many spatially resolved coordinates using scatter stacked bar plots. We apply scatterbar to visualize deconvolved cell-type proportions from a spatial transcriptomics dataset of the mouse olfactory bulb to demonstrate how scatter stacked bar plots can enhance the distinguishability of proportional distributions compared to scatter pie plots.

Paper: bioRxiv


SEraster - a rasterization preprocessing framework for scalable spatial omics data analysis

Gohta Aihara, Kalen Clifton, Mayling Chen, Zhuoyan Li, Lyla Atta, Brendan F. Miller, Rahul Satija, John W Hickey, Jean Fan^

Abstract: Spatial omics data demand computational analysis but many analysis tools have computational resource requirements that increase with the number of cells analyzed. This presents scalability challenges as researchers use spatial omics technologies to profile millions of cells. To enhance the scalability of spatial omics data analysis, we developed a rasterization preprocessing framework called SEraster that aggregates cellular information into spatial pixels. We apply SEraster to both real and simulated spatial omics data prior to spatial variable gene expression analysis to demonstrate that such preprocessing can reduce resource requirements while maintaining high performance. We further integrate SEraster with existing analysis tools to characterize cell-type spatial cooccurrence. Finally, we apply SEraster to enable analysis of a mouse pup spatial omics dataset with over a million cells to identify tissue-level and cell-type-specific spatially variable genes as well as cooccurring cell-types that recapitulate expected organ structures.

Paper: Bioinformatics. June 20, 2024. doi.org/10.1093/bioinformatics/btae412 | Pubmed


Gene count normalization in single-cell imaging-based spatially resolved transcriptomics

Lyla Atta, Kalen Clifton, Manjari Anant, Gohta Aihara, and Jean Fan^

Abstract: Recent advances in imaging-based spatially resolved transcriptomics technologies now enable high-throughput profiling of targeted genes and their locations in fixed tissues. Normalization of gene expression data is often needed to account for technical factors that may confound underlying biological signals. Here, we investigate the potential impact of different gene count normalization methods with different targeted gene panels in the analysis and interpretation of im-SRT data. Using different simulated gene panels that overrepresent genes expressed in specific tissue anatomical regions or cell types, we find that normalization methods that use scaling factors derived from gene counts differentially impact normalized gene expression magnitudes in a region- or cell type-specific manner. We show that these normalization-induced effects may reduce the reliability of downstream differential gene expression and fold change analysis, introducing false positive and false negative results when compared to results obtained from gene panels that are more representative of the gene expression of the tissue's component cell types. These effects are not observed without normalization or when scaling factors are not derived from gene counts, such as with cell volume normalization. Overall, we caution that the choice of normalization method and gene panel may impact the biological interpretation of the im-SRT data.

Paper: Genome Biology. June 12, 2024. doi.org/10.1186/s13059-024-03303-w | Pubmed


Single-cell morphology encodes functional subtypes of senescence in aging human dermal fibroblasts

Pratik Kamat, Nico Macaluso, Chanhong Min, Yukang Li, Anshika Agrawal, Aaron Winston, Lauren Pan, Bartholomew Starich, Teasia Stewart, Pei-Hsun Wu, Jean Fan, Jeremy Walston, Jude M Phillip

Abstract: Cellular senescence is an established driver of aging, exhibiting context-dependent phenotypes across multiple biological length-scales. Despite its mechanistic importance, profiling senescence within cell populations is challenging. This is in part due to the limitations of current biomarkers to robustly identify senescent cells across biological settings, and the heterogeneous, non-binary phenotypes exhibited by senescent cells. Using a panel of primary dermal fibroblasts, we combined live single-cell imaging, machine learning, multiple senescence induction conditions, and multiple protein-based senescence biomarkers to show the emergence of functional subtypes of senescence. Leveraging single-cell morphologies, we defined eleven distinct morphology clusters, with the abundance of cells in each cluster being dependent on the mode of senescence induction, the time post-induction, and the age of the donor. Of these eleven clusters, we identified three bona-fide senescence subtypes (C7, C10, C11), with C10 showing the strongest age-dependence across a cohort of fifty aging individuals. To determine the functional significance of these senescence subtypes, we profiled their responses to senotherapies, specifically focusing on Dasatinib + Quercetin (D+Q). Results indicated subtype-dependent responses, with senescent cells in C7 being most responsive to D+Q. Altogether, we provide a robust single-cell framework to identify and classify functional senescence subtypes with applications for next-generation senotherapy screens, and the potential to explain heterogeneous senescence phenotypes across biological settings based on the presence and abundance of distinct senescence subtypes.

Paper: bioRxiv


STalign - alignment of spatial transcriptomics data using diffeomorphic metric mapping

Kalen Clifton*, Manjari Anant*, Gohta Aihara, Lyla Atta, Osagie K Aimiuwu, Justus M Kebschull, Michael I Miller, Daniel Tward^, Jean Fan^

Abstract: Spatial transcriptomics (ST) technologies enable high throughput gene expression characterization within thin tissue sections. However, comparing spatial observations across sections, samples, and technologies remains challenging. To address this challenge, we developed STalign to align ST datasets in a manner that accounts for partially matched tissue sections and other local non-linear distortions using diffeomorphic metric mapping. We apply STalign to align ST datasets within and across technologies as well as to align ST datasets to a 3D common coordinate framework. We show that STalign achieves high gene expression and cell-type correspondence across matched spatial locations that is significantly improved over manual and landmark-based affine alignments. Applying STalign to align ST datasets of the mouse brain to the 3D common coordinate framework from the Allen Brain Atlas, we highlight how STalign can enable the interrogation of compositional heterogeneity across anatomical structures.

Paper: Nature Communications. December 8, 2023. doi.org/10.1038/s41467-023-43915-7 | Pubmed


Soluble PD-L1 reprograms blood monocytes to prevent cerebral edema and facilitate recovery after ischemic stroke

Jennifer E Kim, Ryan P Lee, Eli Yazigi, Lyla Atta, James Feghali, Ayush Pant, Aanchal Jain, Idan Levitan, Eileen Kim, Kisha Patel, Nivedha Kannapadi, Pavan Shah, Adnan Bibic, Zhipeng Hou, Justin M Caplan, L Fernando Gonzalez, Judy Huang, Risheng Xu, Jean Fan, Betty Tyler, Henry Brem, Vassiliki A Boussiotis, Lauren Jantzie, Shenandoah Robinson, Raymond C Koehler, Michael Lim, Rafael J Tamargo, Christopher M Jackson

Abstract: Acute cerebral ischemia triggers a profound inflammatory response. While macrophages polarized to an M2-like phenotype clear debris and facilitate tissue repair, aberrant or prolonged macrophage activation is counterproductive to recovery. The inhibitory immune checkpoint Programmed Cell Death Protein 1 (PD-1) is upregulated on macrophage precursors (monocytes) in the blood after acute cerebrovascular injury. To investigate the therapeutic potential of PD-1 activation, we immunophenotyped circulating monocytes from patients and found that PD-1 expression was upregulated in the acute period after stroke. Murine studies using a temporary middle cerebral artery (MCA) occlusion (MCAO) model showed that intraperitoneal administration of soluble Programmed Death Ligand-1 (sPD-L1) significantly decreased brain edema and improved overall survival. Mice receiving sPD-L1 also had higher performance scores short-term, and more closely resembled sham animals on assessments of long-term functional recovery. These clinical and radiographic benefits were abrogated in global and myeloid-specific PD-1 knockout animals, confirming PD-1+ monocytes as the therapeutic target of sPD-L1. Single-cell RNA sequencing revealed that treatment skewed monocyte maturation to a non-classical Ly6Clo, CD43hi, PD-L1+ phenotype. These data support peripheral activation of PD-1 on inflammatory monocytes as a therapeutic strategy to treat neuroinflammation after acute ischemic stroke.

Paper: Brain, Behavior, and Immunity. December 7, 2023. doi.org/10.1016/j.bbi.2023.12.007 | Pubmed


Single cell and spatial transcriptomics analysis of kidney double negative T lymphocytes in normal and ischemic mouse kidneys

Sepideh Gharaie, Kyungho Lee, Kathleen Noller, Emily K. Lo, Brendan Miller, Hyun Jun Jung, Andrea M. Newman-Rivera, Johanna T. Kurzhagen, Nirmish Singla, Paul A. Welling, Jean Fan, Patrick Cahan, Sanjeev Noel, Hamid Rabb

Abstract: T cells are important in the pathogenesis of acute kidney injury (AKI), and TCR+CD4-CD8- (double negative-DN) are T cells that have regulatory properties. However, there is limited information on DN T cells compared to traditional CD4+ and CD8+ cells. To elucidate the molecular signature and spatial dynamics of DN T cells during AKI, we performed single-cell RNA sequencing (scRNA-seq) on sorted murine DN, CD4+, and CD8+ cells combined with spatial transcriptomic profiling of normal and post AKI mouse kidneys. scRNA-seq revealed distinct transcriptional profiles for DN, CD4+, and CD8+ T cells of mouse kidneys with enrichment of Kcnq5, Klrb1c, Fcer1g, and Klre1 expression in DN T cells compared to CD4+ and CD8+ T cells in normal kidney tissue. We validated the expression of these four genes in mouse kidney DN, CD4+ and CD8+ T cells using RT-PCR and Kcnq5, Klrb1, and Fcer1g genes with the NIH human kidney precision medicine project (KPMP). Spatial transcriptomics in normal and ischemic mouse kidney tissue showed a localized cluster of T cells in the outer medulla expressing DN T cell genes including Fcer1g. These results provide a template for future studies in DN T as well as CD4+ and CD8+ cells in normal and diseased kidneys.

Paper: Scientific Reports. Nov 23, 2023. doi.org/10.1038/s41598-023-48213-2 | Pubmed


Characterizing cell-type spatial relationships across length scales in spatially resolved omics data

Rafael dos Santos Peixoto, Brendan F. Miller, Maigan A. Brusko, Lyla Atta, Manjari Anant, Mark A. Atkinson, Todd M. Brusko, Clive H. Wasserfall, Jean Fan^

Abstract: Spatially resolved omics technologies provide molecular profiling of cells while preserving their organization within tissues, allowing for the evaluation of cell-type spatial relationships. We developed CRAWDAD to quantify cell-type spatial relationships across length scales. We highlight the utility of such multi-scale characterization on simulated data, recapitulate expected cell-type spatial relationships in tissues such as the mouse brain and embryo, and delineate functionally relevant spatial-defined cell-type subsets in the human spleen.

Paper: bioRxiv


Advances and prospects for the Human BioMolecular Atlas Program (HuBMAP)

HuBMAP Consortium (including Brendan Miller, Lyla Atta, Rafael dos Santos Peixoto, and Jean Fan)

Abstract: The Human BioMolecular Atlas Program (HuBMAP) aims to create a multi-scale spatial atlas of the healthy human body at single-cell resolution by applying advanced technologies and disseminating resources to the community. As the HuBMAP moves past its first phase, creating ontologies, protocols and pipelines, this Perspective introduces the production phase - the generation of reference spatial maps of functional tissue units across many organs from diverse populations and the creation of mapping tools and infrastructure to advance biomedical research.

Paper: Nature Cell Biology. July 19, 2023. doi.org/10.1038/s41556-023-01194-w | Pubmed


Why it's worth making computational methods easy to use

Jean Fan

Abstract: Jean Fan and her team launched a digital campaign using YouTube, GitHub and blogs to make a computational-biology tool accessible to all.

Paper: Nature. April 27, 2023. doi.org/10.1038/d41586-023-01440-z | Pubmed


Cross-modality mapping using image varifolds to align tissue-scale atlases to molecular-scale measures with application to 2D brain sections

Kaitlin M. Stouffer, Alain Trouvé, Laurent Younes, Michael Kunst, Lydia Ng, Hongkui Zeng, Manjari Anant, Jean Fan, Yongsoo Kim, Xiaoyin Chen, Mara Rue, Michael I. Miller

Abstract: This paper explicates a solution to building correspondences between molecular-scale transcriptomics and tissue-scale atlases. This problem arises in atlas construction and cross-specimen/technology alignment where specimens per emerging technology remain sparse and conventional image representations cannot efficiently model the high dimensions from subcellular detection of thousands of genes. We address these challenges by representing spatial transcriptomics data as generalized functions encoding position and high-dimensional feature (gene, cell type) identity. We map onto low-dimensional atlas ontologies by modeling regions as homogeneous random fields with unknown transcriptomic feature distribution. We solve simultaneously for the minimizing geodesic diffeomorphism of coordinates through LDDMM and for these latent feature densities. We map tissue-scale mouse brain atlases to gene-based and cell-based transcriptomics data from MERFISH and BARseq technologies and to histopathology and cross-species atlases to illustrate integration of diverse molecular and cellular datasets into a single coordinate system as a means of comparison and further atlas construction.

Paper: Nature Communications. April 25, 2024. doi.org/10.1038/s41467-024-47883-4 | Pubmed


Reference-free cell type deconvolution of pixel-resolution spatially resolved transcriptomics data

Brendan F Miller, Feiyang Huang, Lyla Atta, Arpan Sahoo, Jean Fan^

Abstract: Recent technological advancements have enabled spatially resolved transcriptomic profiling but at multi-cellular pixel resolution, thereby hindering the identification of cell-type spatial co-localization patterns. We developed STdeconvolve as an unsupervised approach to deconvolve underlying cell-types comprising such multi-cellular pixel resolution spatially resolved transcriptomics datasets. We show that STdeconvolve effectively recovers the putative transcriptomic profiles of cell-types and their proportional representation within spatially resolved pixels without reliance on external single-cell transcriptomics references.

Paper: Nature Communications. April 29, 2022. doi.org/10.1038/s41467-022-30033-z | Pubmed | PDF


Single cell analysis reveals immune dysfunction from the earliest stages of CLL that can be reversed by ibrutinib

Noelia Purroy Zuriguel, Yuzhou Evelyn Tong, Camilla K Lemvigh, Nicoletta Cieri, Shuqiang Li, Erin M Parry, Wandi Zhang, Laura Z Rassenti, Thomas J Kipps, Susan L Slager, Neil E Kay, Connie Lesnick, Tait D Shanafelt, Paolo Ghia, Lydia Scarfo, Kenneth J Livak, Peter V Kharchenko, Donna Neuberg, Lars Ronn Olsen, Jean Fan, Satyen H Gohil, Catherine J Wu^


Paper: Blood. January 12, 2022. doi.org/10.1182/blood.2021013926 | Pubmed | PDF


VeloViz - RNA-velocity informed embeddings for visualizing cellular trajectories

Lyla Atta, Arpan Sahoo, Jean Fan^

Abstract: Single cell transcriptomic technologies enable genome-wide gene expression measurements in individual cells but can only provide a static snapshot of cell states. RNA velocity analysis can infer cell state changes from single cell transcriptomics data. To interpret these cell state changes as part of underlying cellular trajectories, current approaches rely on visualization with principal components, t-distributed stochastic neighbor embedding, and other 2D embeddings derived from the observed single cell transcriptional states. However, these 2D embeddings can yield different representations of the underlying cellular trajectories, hindering the interpretation of cell state changes. We developed VeloViz to create RNA-velocity-informed 2D and 3D embeddings from single cell transcriptomics data. Using both real and simulated data, we demonstrate that VeloViz embeddings are able to consistently capture underlying cellular trajectories across diverse trajectory topologies, even when intermediate cell states may be missing. By taking into consideration the predicted future transcriptional states from RNA velocity analysis, VeloViz can help visualize a more reliable representation of underlying cellular trajectories.

Paper: Bioinformatics. September 28, 2021. doi.org/10.1093/bioinformatics/btab653 | Pubmed | PDF


Computational challenges and opportunities in spatially resolved transcriptomic data analysis

Lyla Atta, Jean Fan^

Abstract: Spatially resolved transcriptomic data demand new computational analysis methods to derive biological insights. Here, we comment on these associated computational challenges as well as highlight the opportunities for standardized benchmarking metrics and data-sharing infrastructure in spurring innovation moving forward.

Paper: Nature Communications. September 6, 2021. doi.org/10.1038/s41467-021-25557-9 | Pubmed | PDF


Rewiring of human neurodevelopmental gene regulatory programs by human accelerated regions

Kelly M. Girskis*, Andrew B. Stergachis*, Ellen M. DeGennaro*, Ryan N. Doan, Xuyu Qian, Matthew B. Johnson, Peter P. Wang, Gabrielle M. Sejourne, M. Aurel Nagy, Elizabeth A. Pollina, André M. M. Sousa, Taehwan Shin, Connor J. Kenny, Julia L. Scotellaro, Brian M. Debo, Dilenny M. Gonzalez, Lariza M. Rento, Rebecca C. Yeh, Janet H. T. Song, Marc Beaudin, Jean Fan, Peter V. Kharchenko, Nenad Sestan, Michael E. Greenberg, Christopher A. Walsh^

Abstract: Human accelerated regions (HARs) are the fastest-evolving regions of the human genome, and many are hypothesized to function as regulatory elements that drive human-specific gene regulatory programs. We interrogate the in vitro enhancer activity and in vivo epigenetic landscape of more than 3,100 HARs during human neurodevelopment, demonstrating that many HARs appear to act as neurodevelopmental enhancers and that sequence divergence at HARs has largely augmented their neuronal enhancer activity. Furthermore, we demonstrate PPP1R17 to be a putative HAR-regulated gene that has undergone remarkable rewiring of its cell type and developmental expression patterns between non-primates and primates and between non-human primates and humans. Finally, we show that PPP1R17 slows neural progenitor cell cycle progression, paralleling the cell cycle length increase seen predominantly in primate and especially human neurodevelopment. Our findings establish HARs as key components in rewiring human-specific neurodevelopmental gene regulatory programs and provide an integrated resource to study enhancer activity of specific HARs.

Paper: Neuron. September 2, 2021. doi.org/10.1016/j.neuron.2021.08.005 | Pubmed | PDF


Multi Scale Diffeomorphic Metric Mapping of Spatial Transcriptomics Datasets

Michael I. Miller, Jean Fan, Daniel J. Tward

Abstract: Spatially resolved transcriptomic imaging is a family of promising new technologies that can produce a series of images that quantify gene expression at every pixel. These technologies, such as multiplex error-robust fluorescence in situ hybridization (MERFISH) which is the focus of this work, produce data that is inherently multi scale. They describe molecules at nanometer resolution, cell types at micron resolution, and tissue types at millimeter resolution. To harness the potential of these techniques, new mathematical and computational tools are required to quantify similarities and differences between images across experimental conditions. In this work we demonstrate the application of multi scale diffeomorphic metric mapping to MERFISH images. This recently developed framework uses varifold measures on reproducing kernel Hilbert spaces to describe shape and signal across spatial scales, and computes distances between samples in a Riemannian setting. Using experimental data from serial sections of the mouse preoptic hypothalamus, we use this technique to compute optimal nonrigid alignments between neighboring sections. This approach will ultimately be extended to 3D reconstruction and alignment to common coordinates of a brain atlas.

Paper: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops


Interactions between cancer cells and immune cells drive transitions to mesenchymal-like states in glioblastoma

Toshiro Hara*, Rony Chanoch-Myers*, Nathan D. Mathewson, Chad Myskiw, Lyla Atta, Lillian Bussema, Stephen W. Eichhorn, Alissa C. Greenwald, Gabriela S. Kinker, Christopher Rodman, L. Nicolas Gonzalez Castro, Hiroaki Wakimoto, Orit Rozenblatt-Rosen, Xiaowei Zhuang, Jean Fan, Tony Hunter, Inder M. Verma, Kai W. Wucherpfennig, Aviv Regev, Mario L. Suvà^, Itay Tirosh^

Abstract: The mesenchymal subtype of glioblastoma is thought to be determined by both cancer cell-intrinsic alterations and extrinsic cellular interactions, but remains poorly understood. Here, we dissect glioblastoma-to-microenvironment interactions by single-cell RNA sequencing analysis of human tumors and model systems, combined with functional experiments. We demonstrate that macrophages induce a transition of glioblastoma cells into mesenchymal-like (MES-like) states. This effect is mediated, both in vitro and in vivo, by macrophage-derived oncostatin M (OSM) that interacts with its receptors (OSMR or LIFR) in complex with GP130 on glioblastoma cells and activates STAT3. We show that MES-like glioblastoma states are also associated with increased expression of a mesenchymal program in macrophages and with increased cytotoxicity of T cells, highlighting extensive alterations of the immune microenvironment with potential therapeutic implications.

Paper: Cancer Cell. June 3, 2021. doi:10.1016/j.ccell.2021.05.002 | Pubmed | PDF


Characterizing spatial gene expression heterogeneity in spatially resolved single-cell transcriptomics data with nonuniform cellular densities

Brendan F Miller, Dhananjay Bambah-Mukku, Catherine Dulac, Xiaowei Zhuang, Jean Fan^

Abstract: Recent technological advances have enabled spatially resolved measurements of expression profiles for hundreds to thousands of genes in fixed tissues at single-cell resolution. However, scalable computational analysis methods able to take into consideration the inherent 3D spatial organization of cell types and nonuniform cellular densities within tissues are still lacking. To address this, we developed MERINGUE, a computational framework based on spatial auto-correlation and cross-correlation analysis to identify genes with spatially heterogeneous expression patterns, infer putative cell-cell communication, and perform spatially informed cell clustering in 2D and 3D in a density-agnostic manner using spatially resolved transcriptomics data. We applied MERINGUE to a variety of spatially resolved transcriptomics datasets including multiplexed error-robust fluorescence in situ hybridization (MERFISH), spatial transcriptomics, Slide-Seq, and aligned in situ hybridization (ISH) data. We anticipate that such statistical analysis of spatially resolved transcriptomics data will facilitate our understanding of the interplay between cell state and spatial organization in tissue development and disease.

Paper: Genome Research. Advance May 25, 2021, doi:10.1101/gr.271288.120 | Pubmed | PDF


Spatial organization of the transcriptome in individual neurons

Guiping Wang, Cheen-Euong Ang, Jean Fan, Andrew Wang, Jeffrey R. Moffitt, Xiaowei Zhuang^

Abstract: Neurons are highly polarized cells with complex neurite morphology. Spatial organization and local translation of RNAs in dendrites and axons play an important role in many neuronal functions. Here we performed super-resolution spatial profiling of RNAs inside individual neurons at the genome scale using multiplexed error-robust fluorescence in situ hybridization (MERFISH), and mapped the spatial organization of up to ~4,200 RNA species (genes) across multiple length scales, ranging from sub-micrometer to millimeters. Our data generated a quantitative intra-neuronal atlas of RNAs with distinct transcriptome compositions in somata, dendrites, and axons, and revealed diverse sub-dendritic distribution patterns of RNAs. Moreover, our spatial analysis identified distinct groups of genes exhibiting specific spatial clustering of transcripts at the sub-micrometer scale that were dependent on protein synthesis and differentially dependent on synaptic activity. Overall, these data provide a rich resource for characterizing the subcellular organization of the transcriptome in neurons with high spatial resolution.

Paper: bioRxiv


Single-cell transcriptomics in cancer - computational challenges and opportunities

Jean Fan^, Kamil Slowikowski, Fan Zhang

Abstract: Intratumor heterogeneity is a common characteristic across diverse cancer types and presents challenges to current standards of treatment. Advancements in high-throughput sequencing and imaging technologies provide opportunities to identify and characterize these aspects of heterogeneity. Notably, transcriptomic profiling at a single-cell resolution enables quantitative measurements of the molecular activity that underlies the phenotypic diversity of cells within a tumor. Such high-dimensional data require computational analysis to extract relevant biological insights about the cell types and states that drive cancer development, pathogenesis, and clinical outcomes. In this review, we highlight emerging themes in the computational analysis of single-cell transcriptomics data and their applications to cancer research. We focus on downstream analytical challenges relevant to cancer research, including how to computationally perform unified analysis across many patients and disease states, distinguish neoplastic from nonneoplastic cells, infer communication with the tumor microenvironment, and delineate tumoral and microenvironmental evolution with trajectory and RNA velocity analysis. We include discussions of challenges and opportunities for future computational methodological advancements necessary to realize the translational potential of single-cell transcriptomic profiling in cancer.

Paper: Nature Experimental and Molecular Medicine. Sept 15, 2020. doi.org/10.1038/s12276-020-0422-0 | Pubmed | PDF


Fast, sensitive and accurate integration of single-cell data with Harmony

Ilya Korsunsky, Nghia Millard, Jean Fan, Kamil Slowikowski, Fan Zhang, Kevin Wei, Yuriy Baglaenko, Michael Brenner, Po-ru Loh, and Soumya Raychaudhuri^

Abstract: The emerging diversity of single-cell RNA-seq datasets allows for the full transcriptional characterization of cell types across a wide variety of biological and clinical conditions. However, it is challenging to analyze them together, particularly when datasets are assayed with different technologies, because biological and technical differences are interspersed. We present Harmony (https://github.com/immunogenomics/harmony), an algorithm that projects cells into a shared embedding in which cells group by cell type rather than dataset-specific conditions. Harmony simultaneously accounts for multiple experimental and biological factors. In six analyses, we demonstrate the superior performance of Harmony to previously published algorithms while requiring fewer computational resources. Harmony enables the integration of ~10^6 cells on a personal computer. We apply Harmony to peripheral blood mononuclear cells from datasets with large experimental differences, five studies of pancreatic islet cells, mouse embryogenesis datasets and the integration of scRNA-seq with spatial transcriptomics data.

Paper: Nature Methods. November 18, 2019. doi:10.1038/s41592-019-0619-0 | Pubmed | PDF


Spatial transcriptome profiling by MERFISH reveals subcellular RNA compartmentalization and cell cycle-dependent gene expression

Chenglong Xia*, Jean Fan*, George Emanuel*, Junjie Hao, and Xiaowei Zhuang^

Abstract: The spatial organization of RNAs within cells and spatial patterning of cells within tissues play crucial roles in many biological processes. Here, we demonstrate that multiplexed error-robust FISH (MERFISH) can achieve near-genome-wide, spatially resolved RNA profiling of individual cells with high accuracy and high detection efficiency. Using this approach, we identified RNA species enriched in different subcellular compartments, observed transcriptionally distinct cell states corresponding to different cell-cycle phases, and revealed spatial patterning of transcriptionally distinct cells. Spatially resolved transcriptome quantification within cells further enabled RNA velocity and pseudotime analysis, which revealed numerous genes with cell cycle-dependent expression. We anticipate that spatially resolved transcriptome analysis will advance our understanding of the interplay between gene regulation and spatial context in biological systems.

Paper: PNAS. Sept 9, 2019, doi:10.1073/pnas.1912459116 | Pubmed | PDF


A Murine Model of Chronic Lymphocytic Leukemia Based on B Cell-Restricted Expression of Sf3b1 Mutation and Atm Deletion

Shanye Yin, Rutendo G. Gambe, Jing Sun, Aina Zurita Martinez, Zachary J. Cartun, Fara Faye D. Regis, Youzhong Wan, Jean Fan, Angela N. Brooks, Sarah E. M. Herman, Elisa ten Hacken, Amaro Taylor-Weiner, Laura Z. Rassenti, Emanuela M. Ghia, Thomas J. Kipps, Esther A. Obeng, Carrie L. Cibulskis, Donna Neuberg, Dean R. Campagna, Mark D. Fleming, Benjamin L. Ebert, Adrian Wiestner, Ignaty Leshchiner, James A. DeCaprio, Gad Getz, Robin Reed, Ruben D. Carrasco, Catherine J. Wu^, Lili Wang^

Abstract: SF3B1 is recurrently mutated in chronic lymphocytic leukemia (CLL), but its role in the pathogenesis of CLL remains elusive. Here, we show that conditional expression of Sf3b1-K700E mutation in mouse B cells disrupts pre-mRNA splicing, alters cell development, and induces a state of cellular senescence. Combination with Atm deletion leads to the overcoming of cellular senescence and the development of CLL-like disease in elderly mice. These CLL-like cells show genome instability and dysregulation of multiple CLL-associated cellular processes, including deregulated B cell receptor signaling, which we also identified in human CLL cases. Notably, human CLLs harboring SF3B1 mutations exhibit altered response to BTK inhibition. Our murine model of CLL thus provides insights into human CLL disease mechanisms and treatment.

Paper: Cancer Cell. Feb 11, 2019. doi:10.1016/j.ccell.2018.12.013 | Pubmed | PDF


RNA velocity of single cells

Gioele La Manno, Ruslan Soldatov, Amit Zeisel, Emelie Braun, Hannah Hochgerner, Viktor Petukhov, Katja Lidschreiber, Maria E Kastriti, Peter Lönnerberg, Alessandro Furlan, Jean Fan, Lars E Borm, Zehua Liu, David van Bruggen, Jimin Guo, Xiaoling He, Roger Barker, Erik Sundström, Gonçalo Castelo-Branco, Patrick Cramer, Igor Adameyko, Sten Linnarsson^, Peter V Kharchenko^

Abstract: RNA abundance is a powerful indicator of the state of individual cells. Single-cell RNA sequencing can reveal RNA abundance with high quantitative accuracy, sensitivity and throughput. However, this approach captures only a static snapshot at a point in time, posing a challenge for the analysis of time-resolved phenomena such as embryogenesis or tissue regeneration. Here we show that RNA velocity, the time derivative of the gene expression state, can be directly estimated by distinguishing between unspliced and spliced mRNAs in common single-cell RNA sequencing protocols. RNA velocity is a high-dimensional vector that predicts the future state of individual cells on a timescale of hours. We validate its accuracy in the neural crest lineage, demonstrate its use on multiple published datasets and technical platforms, reveal the branching lineage tree of the developing mouse hippocampus, and examine the kinetics of transcription in human embryonic brain. We expect RNA velocity to greatly aid the analysis of developmental lineages and cellular dynamics, particularly in humans.

Paper: Nature. August 8, 2018. doi:10.1038/s41586-018-0414-6 | Pubmed | PDF

Relevant code:

Linking transcriptional and genetic tumor heterogeneity through allele analysis of single-cell RNA-seq data

Jean Fan*, Hae-Ock Lee*, Soohyun Lee, Da-eun Ryu, Semin Lee, Catherine Xue, Seok Jin Kim, Kihyun Kim, Nikolas Barkas, Peter J Park, Woong-Yang Park^, Peter V Kharchenko^

Abstract: Characterization of intratumoral heterogeneity is critical to cancer therapy, as presence of phenotypically diverse cell populations commonly fuels relapse and resistance to treatment. Although genetic variation is a well-studied source of intratumoral heterogeneity, the functional impact of most genetic alterations remains unclear. Even less understood is the relative importance of other factors influencing heterogeneity, such as epigenetic state or tumor microenvironment. To investigate the relationship between genetic and transcriptional heterogeneity in a context of cancer progression, we devised a computational approach called HoneyBADGER to identify copy number variation and loss-of-heterozygosity in individual cells from single-cell RNA-sequencing data. By integrating allele and normalized expression information, HoneyBADGER is able to identify and infer the presence of subclone-specific alterations in individual cells and reconstruct underlying subclonal architecture. Examining several tumor types, we show that HoneyBADGER is effective at identifying deletions, amplifications, and copy-neutral loss-of-heterozygosity events, and is capable of robustly identifying subclonal focal alterations as small as 10 megabases. We further apply HoneyBADGER to analyze single cells from a progressive multiple myeloma patient to identify major genetic subclones that exhibit distinct transcriptional signatures relevant to cancer progression. Surprisingly, other prominent transcriptional subpopulations within these tumors did not line up with the genetic subclonal structure, and were likely driven by alternative, non-clonal mechanisms. These results highlight the need for integrative analysis to understand the molecular and phenotypic heterogeneity in cancer.

Paper: Genome Research. June 13, 2018. doi:10.1101/gr.228080.117 | Pubmed | PDF

Relevant code:

Integrative single-cell analysis of transcriptional and epigenetic states in the human adult brain

Blue B Lake*, Song Chen*, Brandon C Sos*, Jean Fan*, Gwendolyn E Kaeser, Yun C Yung, Thu E Duong, Derek Gao, Jerold Chun^, Peter V Kharchenko^, Kun Zhang^

Abstract: Detailed characterization of the cell types in the human brain requires scalable experimental approaches to examine multiple aspects of the molecular state of individual cells, as well as computational integration of the data to produce unified cell-state annotations. Here we report improved high-throughput methods for single-nucleus droplet-based sequencing (snDrop-seq) and single-cell transposome hypersensitive site sequencing (scTHS-seq). We used each method to acquire nuclear transcriptomic and DNA accessibility maps for >60,000 single cells from human adult visual cortex, frontal cortex, and cerebellum. Integration of these data revealed regulatory elements and transcription factors that underlie cell-type distinctions, providing a basis for the study of complex processes in the brain, such as genetic programs that coordinate adult remyelination. We also mapped disease-associated risk variants to specific cellular populations, which provided insights into normal and pathogenic cellular processes in the human brain. This integrative multi-omics approach permits more detailed single-cell interrogation of complex organs and tissues.

Paper: Nature Biotechnology. December 11, 2017. doi:10.1038/nbt.4038 | Pubmed | PDF

Relevant code:

Integrated single-cell genetic and transcriptional analysis suggests novel drivers of chronic lymphocytic leukemia

Lili Wang*, Jean Fan*, Joshua M. Francis, George Georghiou, Sarah Hergert, Shuqiang Li, Rutendo Gambe, Chensheng W. Zhou, Chunxiao Yang, Sheng Xiao, Paola Dal Cin, Michaela Bowden, Dylan Kotliar, Sachet A. Shukla, Jennifer R. Brown, Donna Neuberg, Dario R. Alessi, Cheng-Zhong Zhang^, Peter V. Kharchenko, Kenneth J. Livak, Catherine J. Wu^

Abstract: Intra-tumoral genetic heterogeneity has been characterized across cancers by genome sequencing of bulk tumors, including chronic lymphocytic leukemia (CLL). In order to more accurately identify subclones, define phylogenetic relationships, and probe genotype–phenotype relationships, we developed methods for targeted mutation detection in DNA and RNA isolated from thousands of single cells from five CLL samples. By clearly resolving phylogenic relationships, we uncovered mutated LCP1 and WNK1 as novel CLL drivers, supported by functional evidence demonstrating their impact on CLL pathways. Integrative analysis of somatic mutations with transcriptional states prompts the idea that convergent evolution generates phenotypically similar cells in distinct genetic branches, thus creating a cohesive expression profile in each CLL sample despite the presence of genetic heterogeneity. Our study highlights the potential for single-cell RNA-based targeted analysis to sensitively determine transcriptional and mutational profiles of individual cancer cells, leading to increased understanding of driving events in malignancy.

Paper: Genome Research. May 22, 2017. doi:10.1101/gr.217331.116 | Pubmed | PDF


Transcriptomic Characterization of SF3B1 Mutation Reveals Its Pleiotropic Effects in Chronic Lymphocytic Leukemia

Lili Wang*, Angela N. Brooks*, Jean Fan*, Youzhong Wan*, Rutendo Gambe, Shuqiang Li, Sarah Hergert, Shanye Yin, Samuel S. Freeman, Joshua Z. Levin, Lin Fan, Michael Seiler, Silvia Buonamici, Peter G. Smith, Kevin F. Chau, Carrie L. Cibulskis, Wandi Zhang, Laura Z. Rassenti, Emanuela M. Ghia, Thomas J. Kipps, Stacey Fernandes, Donald B. Bloch, Dylan Kotliar, Dan A. Landau, Sachet A. Shukla, Jon C. Aster, Robin Reed, David S. DeLuca, Jennifer R. Brown, Donna Neuberg, Gad Getz, Kenneth J. Livak, Matthew M. Meyerson, Peter V. Kharchenko, Catherine J. Wu^

Abstract: Mutations in SF3B1, which encodes a spliceosome component, are associated with poor outcome in chronic lymphocytic leukemia (CLL), but how these contribute to CLL progression remains poorly understood. We undertook a transcriptomic characterization of primary human CLL cells to identify transcripts and pathways affected by SF3B1 mutation. Splicing alterations, identified in the analysis of bulk cells, were confirmed in single SF3B1-mutated CLL cells and also found in cell lines ectopically expressing mutant SF3B1. SF3B1 mutation was found to dysregulate multiple cellular functions including DNA damage response, telomere maintenance, and Notch signaling (mediated through KLF8 upregulation, increased TERC and TERT expression, or altered splicing of DVL2 transcript, respectively). SF3B1 mutation leads to diverse changes in CLL-related pathways.

Paper: Cancer Cell. November 3, 2016. doi:10.1016/j.ccell.2016.10.005 | Pubmed | PDF


Cell-Type-Specific Alternative Splicing Governs Cell Fate in the Developing Cerebral Cortex

Xiaochang Zhang^, Ming Hui Chen, Xuebing Wu, Andrew Kodani, Jean Fan, Ryan Doan, Manabu Ozawa, Jacqueline Ma, Nobuaki Yoshida, Jeremy F. Reiter, Douglas L. Black, Peter V. Kharchenko, Phillip A. Sharp, Christopher A. Walsh^

Abstract: Alternative splicing is prevalent in the mammalian brain. To interrogate the functional role of alternative splicing in neural development, we analyzed purified neural progenitor cells (NPCs) and neurons from developing cerebral cortices, revealing hundreds of differentially spliced exons that preferentially alter key protein domains—especially in cytoskeletal proteins—and can harbor disease-causing mutations. We show that Ptbp1 and Rbfox proteins antagonistically govern the NPC-to-neuron transition by regulating neuron-specific exons. Whereas Ptbp1 maintains apical progenitors partly through suppressing a poison exon of Flna in NPCs, Rbfox proteins promote neuronal differentiation by switching Ninein from a centrosomal splice form in NPCs to a non-centrosomal isoform in neurons. We further uncover an intronic human mutation within a PTBP1-binding site that disrupts normal skipping of the FLNA poison exon in NPCs and causes a brain-specific malformation. Our study indicates that dynamic control of alternative splicing governs cell fate in cerebral cortical development.

Paper: Cell. August 25, 2016. doi:10.1016/j.cell.2016.07.025 | Pubmed | PDF

Relevant code:

Clonal evolution in patients with chronic lymphocytic leukaemia developing resistance to BTK inhibition

Jan A. Burger*^, Dan A. Landau*, Amaro Taylor-Weiner*, Ivana Bozic*, Huidan Zhang*, Kristopher Sarosiek, Lili Wang, Chip Stewart, Jean Fan, Julia Hoellenriegel, Mariela Sivina, Adrian M. Dubuc, Cameron Fraser, Yulong Han, Shuqiang Li, Kenneth J. Livak, Lihua Zou, Youzhong Wan, Sergej Konoplev, Carrie Sougnez, Jennifer R. Brown, Lynne V. Abruzzo, Scott L. Carter, Michael J. Keating, Matthew S. Davids, William G. Wierda, Kristian Cibulskis, Thorsten Zenz, Lillian Werner, Paola Dal Cin, Peter Kharchenko, Donna Neuberg, Hagop Kantarjian, Eric Lander, Stacey Gabriel, Susan O’Brien, Anthony Letai, David A. Weitz, Martin A. Nowak, Gad Getz, Catherine J. Wu^

Abstract: Resistance to the Bruton’s tyrosine kinase (BTK) inhibitor ibrutinib has been attributed solely to mutations in BTK and related pathway molecules. Using whole-exome and deep-targeted sequencing, we dissect evolution of ibrutinib resistance in serial samples from five chronic lymphocytic leukaemia patients. In two patients, we detect BTK-C481S mutation or multiple PLCG2 mutations. The other three patients exhibit an expansion of clones harbouring del(8p) with additional driver mutations (EP300, MLL2 and EIF2A), with one patient developing trans-differentiation into CD19-negative histiocytic sarcoma. Using droplet-microfluidic technology and growth kinetic analyses, we demonstrate the presence of ibrutinib-resistant subclones and estimate subclone size before treatment initiation. Haploinsufficiency of TRAIL-R, a consequence of del(8p), results in TRAIL insensitivity, which may contribute to ibrutinib resistance. These findings demonstrate that the ibrutinib therapy favours selection and expansion of rare subclones already present before ibrutinib treatment, and provide insight into the heterogeneity of genetic changes associated with ibrutinib resistance.

Paper: Nature Communications. May 20, 2016. doi:10.1038/ncomms11589 | Pubmed | PDF

Relevant code:

Characterizing transcriptional heterogeneity through pathway and gene set overdispersion analysis

Jean Fan, Neeraj Salathia, Rui Liu, Gwendolyn E Kaeser, Yun C Yung, Joseph L Herman, Fiona Kaper, Jian-Bing Fan, Kun Zhang, Jerold Chun, Peter V Kharchenko^

Abstract: The transcriptional state of a cell reflects a variety of biological factors, from cell-type-specific features to transient processes such as the cell cycle, all of which may be of interest. However, identifying such aspects from noisy single-cell RNA-seq data remains challenging. We developed pathway and gene set overdispersion analysis (PAGODA) to resolve multiple, potentially overlapping aspects of transcriptional heterogeneity by testing gene sets for coordinated variability among measured cells.

Paper: Nature Methods. Jan 18, 2016. doi:10.1038/nmeth.3734 | Pubmed | PDF

Relevant code:

Locally Disordered Methylation Forms the Basis of Intratumor Methylome Variation in Chronic Lymphocytic Leukemia

Dan A. Landau*, Kendell Clement*, Michael J. Ziller, Patrick Boyle, Jean Fan, Hongcang Gu, Kristen Stevenson, Carrie Sougnez, Lili Wang, Shuqiang Li, Dylan Kotliar, Wandi Zhang, Mahmoud Ghandi, Levi Garraway, Stacey M. Fernandes, Kenneth J. Livak, Stacey Gabriel, Andreas Gnirke, Eric S. Lander, Jennifer R. Brown, Donna Neuberg, Peter V. Kharchenko, Nir Hacohen, Gad Getz, Alexander Meissner, Catherine J. Wu^

Abstract: Intratumoral heterogeneity plays a critical role in tumor evolution. To define the contribution of DNA methylation to heterogeneity within tumors, we performed genome-scale bisulfite sequencing of 104 primary chronic lymphocytic leukemias (CLLs). Compared with 26 normal B cell samples, CLLs consistently displayed higher intrasample variability of DNA methylation patterns across the genome, which appears to arise from stochastically disordered methylation in malignant cells. Transcriptome analysis of bulk and single CLL cells revealed that methylation disorder was linked to low-level expression. Disordered methylation was further associated with adverse clinical outcome. We therefore propose that disordered methylation plays a similar role to that of genetic instability, enhancing the ability of cancer cells to search for superior evolutionary trajectories.

Paper: Cancer Cell. Dec 8, 2014. doi:10.1016/j.ccell.2014.10.012 | Pubmed | PDF