Transcriptomics: A method to obtain global gene regulatory mechanisms

Paclitaxel is a high-value anticancer drug, but its production in cell-based systems is limited by low and variable yields. Enhancing the product yield requires genetic modification and process optimization, which in turn require knowledge of the global gene regulatory mechanisms associated with enhanced paclitaxel production.

In this project, I designed RNA-seq experiments and built a scalable transcriptomic analysis pipeline to uncover global gene expression and regulatory patterns associated with enhanced paclitaxel production in Taxus suspension cultures. The analysis revealed coordinated regulatory pathways that interacted positively with the paclitaxel biosynthetic network, highlighting candidate genes that can serve as targets for rational genetic engineering.

RNA-seq DESeq2 Pathway/GO Network analysis

Problem statement

Taxus species natively produce paclitaxel as part of their defense response to biotic stress in natural environments. Traditional production methods that rely on direct extraction of paclitaxel from tree bark require sacrificing the entire tree, placing the species at risk of endangerment. With the advancement of plant cell culture (PCC) technology, Taxus PCC has emerged as a sustainable alternative for producing the drug in vitro. However, commercial production using this method is limited by low and inconsistent yield due to the inherently heterogeneous nature of the cultures and the unoptimized cellular system. To make this production platform commercially feasible, targeted cellular engineering through genetic modification is essential.

Solution overview

RNA-seq is a cutting-edge technology that captures dynamic cellular responses under controlled experimental conditions, making it possible to uncover global gene regulatory mechanisms associated with specific phenotypic states. It can efficiently capture both the effects of different treatment conditions and the intrinsic phenotypic variability across cell populations. In this project, we prepared RNA-seq libraries from multiple cell culture conditions, grouped by distinct phenotypes and stimulated with production-relevant conditions, to obtain paired-end sequencing data. From these data, we developed a scalable transcriptomic analysis workflow to build a gene regulatory network associated with enhanced paclitaxel production.

My role

I led the computational part of this project, which included read quality control and processing, alignment of the processed reads to the reference genome, preparation of differential gene expression data sets, transcript annotation, and biological interpretation through data analysis and visualization. I built a reproducible read processing pipeline using tools such as FastQC, TrimGalore, HISAT2, and DESeq2. In addition, I developed a customized RNA-seq data analysis pipeline in R, starting from the preparation of the differential gene expression data sets and ending with the construction of a pathway-gene network, with results visualized using ggplot and Cytoscape.

Technical approach

My work in this project involved two main components: first, the preparation of differential gene expression (DGE) data sets, and second, the development of an RNA-seq analysis pipeline.

DGE data set preparation

After obtaining the raw read sequences, I assessed read quality with FastQC and trimmed low-quality bases (Q<30) and adapter sequences with TrimGalore, retaining approximately 72% of the reads as high-quality reads across eight samples (available in NCBI under accession number PRJNA839234). Following this, I aligned the processed reads to the Taxus genome available in NCBI using HISAT2, with an average of 84% of the reads aligning to the genome. I estimated transcript abundance from the aligned reads using featureCounts and calculated differential gene expression using the DESeq2 package. Finally, I annotated the transcripts using BLAST against the UniProt database and assigned pathways using the KAAS server to obtain fully annotated DGE data sets.
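As an illustration of what a DGE data set encodes, the following minimal Python sketch labels genes as upregulated, downregulated, or not significant from a DESeq2-style results table. The column meanings (log2FoldChange, padj) follow DESeq2's standard output, but the cutoffs (padj < 0.05, |log2 fold change| >= 1), the gene IDs, and the values are assumptions chosen for illustration, not the thresholds or data used in this project.

```python
from typing import Optional

# Illustrative cutoffs (assumptions, not taken from the project).
PADJ_CUTOFF = 0.05
LFC_CUTOFF = 1.0


def classify_gene(log2_fold_change: float, padj: Optional[float]) -> str:
    """Label one gene as 'up', 'down', or 'ns' (not significant)."""
    # DESeq2 can report NA for padj (for example after independent filtering);
    # None stands in for NA here and is treated as not significant.
    if padj is None or padj >= PADJ_CUTOFF:
        return "ns"
    if log2_fold_change >= LFC_CUTOFF:
        return "up"
    if log2_fold_change <= -LFC_CUTOFF:
        return "down"
    return "ns"


# Hypothetical rows: (gene_id, log2FoldChange, padj).
results = [
    ("gene_A", 2.3, 0.001),
    ("gene_B", -1.8, 0.02),
    ("gene_C", 0.4, 0.0005),
    ("gene_D", 3.1, None),
]

labels = {gene: classify_gene(lfc, padj) for gene, lfc, padj in results}
print(labels)
```

Requiring both a significance cutoff and an effect-size cutoff is a common convention for volcano-plot style summaries; either threshold can be relaxed when the goal is exploratory.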

RNA-seq analysis pipeline

First, I engineered features in the annotated DGE data sets by assigning each individual pathway to a pathway group and each gene ontology (GO) term to a broader group, making the data sets easier to interpret. Then, I explored differential gene expression through multiple visualization techniques, including Venn and volcano plots, to obtain the overall gene expression profile across samples. After selecting the essential gene set that showed upregulation or downregulation in samples that produced more paclitaxel than the control, I conducted pathway enrichment analysis on that gene set to identify the key active pathways in those samples. Finally, I built a pathway-gene regulatory network highlighting the regulatory pathways and genes that might be contributing to enhanced paclitaxel production.
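The enrichment and network steps can be sketched as follows. The text does not state which enrichment test was used; this Python sketch assumes a one-sided hypergeometric test, which is one common choice, and then links each enriched pathway to its selected member genes as a simple edge list of the kind that could be loaded into Cytoscape. The pathway names, gene IDs, background size, and the 0.05 cutoff are all hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(n_background: int, n_pathway: int,
                           n_selected: int, n_overlap: int) -> float:
    """P(X >= n_overlap) when n_selected genes are drawn without replacement
    from n_background genes, of which n_pathway belong to the pathway."""
    max_overlap = min(n_pathway, n_selected)
    total = comb(n_background, n_selected)
    tail = sum(
        comb(n_pathway, k) * comb(n_background - n_pathway, n_selected - k)
        for k in range(n_overlap, max_overlap + 1)
    )
    return tail / total


# Hypothetical annotation: pathway -> member genes.
pathways = {
    "pathway_X": {"g1", "g2", "g3", "g4"},
    "pathway_Y": {"g5", "g6"},
}
background = {f"g{i}" for i in range(1, 21)}   # 20 annotated genes
selected = {"g1", "g2", "g3", "g9"}            # differentially expressed set

# Build pathway-gene edges for pathways that pass the (assumed) cutoff.
edges = []
for name, members in pathways.items():
    overlap = members & selected
    p_value = hypergeom_enrichment_p(
        len(background), len(members), len(selected), len(overlap)
    )
    if p_value < 0.05:
        edges.extend((name, gene) for gene in sorted(overlap))

print(edges)
```

With many pathways tested at once, the raw p-values would normally be adjusted for multiple testing (for example with the Benjamini-Hochberg procedure) before applying a cutoff; that step is omitted here for brevity.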

Key outcomes

Discovered four regulatory pathways co-expressed with the paclitaxel biosynthetic pathway and associated with enhanced paclitaxel production
Identified more than 30 candidate genes responsible for drug export across the cell membrane, suggesting a potential mechanism for increased extracellular accumulation and overall yield
Proposed potential genetic engineering strategies to improve this production platform, including overexpression of two coordinated pathways through engineering transcription factors identified from this analysis, and enhancement of drug export capacity through transporter gene overexpression

Impact and relevance

This study investigates the gene regulatory mechanisms involved in higher paclitaxel production in the Taxus PCC platform by examining DGE data. By identifying candidate genes and regulatory elements, this work lays the groundwork for improving paclitaxel yield through targeted interventions. Future studies could integrate multi-omics approaches, such as proteomics and metabolomics, to further unravel the complex regulatory networks governing paclitaxel biosynthesis. Additionally, functional validation of the putative paclitaxel-specific transporter genes could open new avenues for enhancing extracellular paclitaxel yield in Taxus PCC for industrial-scale production.

Tools & skills

RNA-seq data analysis
Transcriptomics
Differential gene expression analysis
Gene regulatory and pathway analysis
Network modeling and visualization
Reproducible data workflows
R programming
Python scripting
DESeq2
HISAT2
ggplot
Cytoscape