GSH_Automation is a bioinformatics automation pipeline that simplifies
downloading, preprocessing, and preparing genomic datasets from sources
such as Ensembl, COSMIC, miRGene, Orthologs, EnhancerAtlas, and UCSC.
This repository provides a choice-based menu system (main.py) to
enable users to selectively execute tasks, ensuring modularity and
control over each dataset.
- Automated Data Downloads: Fetch genomic and annotation datasets from trusted sources (Ensembl, COSMIC, miRGene, EnhancerAtlas, UCSC, etc.).
- Choice-Based Menu System: Interactive CLI for step-by-step control over data downloads and processing.
- Environment Setup: Easy setup via Conda with required dependencies listed.
- Modular Scripts: Each dataset and preprocessing step is isolated in its own module under the scripts/ folder.
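To make the "one module per dataset" idea concrete, here is a minimal sketch of what a download module could look like. It is not the repository's actual code: the `DATASET_URLS` dictionary stands in for whatever config/setting.py really contains, the URL is a placeholder, and `target_path` and `download` are hypothetical helper names.

```python
"""Hypothetical sketch of a download module in the style of scripts/ensembl.py."""
import shutil
import urllib.request
from pathlib import Path

# Assumption: config/setting.py holds dataset URLs in some form.
# This dict is a stand-in, and the URL is a placeholder, not a real dataset.
DATASET_URLS = {
    "example": "https://example.org/data/example.gtf.gz",
}


def target_path(url: str, out_dir: Path) -> Path:
    """Return the local path for a URL, using the last URL segment as file name."""
    name = url.rstrip("/").rsplit("/", 1)[-1]
    return out_dir / name


def download(url: str, out_dir: Path, overwrite: bool = False) -> Path:
    """Download url into out_dir, skipping files that already exist."""
    out_dir.mkdir(parents=True, exist_ok=True)
    dest = target_path(url, out_dir)
    if dest.exists() and not overwrite:
        return dest
    # Write to a temporary name first so an interrupted download
    # never leaves a truncated file under the final name.
    tmp = dest.with_name(dest.name + ".part")
    with urllib.request.urlopen(url) as response, open(tmp, "wb") as fh:
        shutil.copyfileobj(response, fh)
    tmp.replace(dest)
    return dest
```

Under these assumptions, each dataset module would only need to call `download(DATASET_URLS["example"], Path("data"))` and then apply its own preprocessing, which keeps the modules independent of one another.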
GSH_Automation/
│
├── main.py # Entry point for running the pipeline (choice-based menu system)
├── config/
│ └── setting.py # URL links
├── scripts/ # Contains all modular scripts for downloading/processing data
│ ├── ensembl.py # Downloads Ensembl dataset
│ ├── cosmic.py # Downloads COSMIC dataset
│ ├── mirgene.py # Downloads miRGene dataset
│ ├── orthologs.py # Downloads Orthologs data
│ ├── enhanceratlas.py # Downloads EnhancerAtlas data
│ ├── liftover.py # Runs UCSC Liftover for genome coordinate mapping
│ ├── cosmic_env.py # Handles COSMIC environment-specific downloads
│ ├── rna_files.py # Downloads lncRNA and tRNA files
│ ├── gaps_ftp.py # Downloads UCSC Gaps FTP data
│ └── wget.py # Downloads additional chromosome info via WGET
│
└── README.md # Documentation (this file)

The main.py script provides an interactive choice-based menu system.
This design lets users run specific tasks independently instead of
executing the entire pipeline at once.
==== Choose from the Menu ====
1. Download Ensembl data
2. Download miRGene data
3. Download Orthologs data
4. Download COSMIC data
5. Download EnhancerAtlas (retain dr.bed)
6. Run Liftover on dr.bed file
7. Download lncRNA and tRNA files
8. USCS_Gaps_FTP
9. WGET
10. Exit
- Enter the corresponding number to run a specific module.
- Example: typing 1 runs the Ensembl download script (scripts/ensembl.py).
- You can perform multiple tasks in sequence and exit the pipeline at any time by choosing option 10.
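The menu behaviour described above can be sketched as a small dispatch table. This is an illustration only, not the contents of main.py: the functions `run_ensembl` and `run_mirgene` are placeholders for calls into scripts/, only two of the ten options are wired up, and `handle_choice` and `main_loop` are hypothetical names.

```python
from typing import Callable, Dict, Optional, Tuple


def run_ensembl() -> str:
    # Placeholder: the real work would live in scripts/ensembl.py.
    return "ensembl"


def run_mirgene() -> str:
    # Placeholder: the real work would live in scripts/mirgene.py.
    return "mirgene"


# Maps the number typed by the user to a label and the task to run.
MENU: Dict[str, Tuple[str, Callable[[], str]]] = {
    "1": ("Download Ensembl data", run_ensembl),
    "2": ("Download miRGene data", run_mirgene),
}
EXIT_CHOICE = "10"


def handle_choice(choice: str) -> Optional[str]:
    """Run the task for one menu choice; return None for an unknown choice."""
    entry = MENU.get(choice.strip())
    if entry is None:
        return None
    _, action = entry
    return action()


def main_loop(read=input, write=print) -> None:
    """Show the menu repeatedly until the user picks the exit option."""
    while True:
        write("==== Choose from the Menu ====")
        for key, (label, _) in MENU.items():
            write(f"{key}. {label}")
        write(f"{EXIT_CHOICE}. Exit")
        choice = read("Enter choice: ").strip()
        if choice == EXIT_CHOICE:
            break
        if handle_choice(choice) is None:
            write("Invalid choice, please try again.")
```

Calling `main_loop()` starts the interactive prompt. Passing `read` and `write` as parameters is a design choice of this sketch so the loop can be exercised without a terminal; adding a new dataset only means adding one entry to `MENU`.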
git clone https://github.com/TheOfficialBug/GSH_Automation.git
cd GSH_Automation

We recommend using Conda to manage dependencies.
conda create -n gsh_env python=3.9 -y
conda activate gsh_env

Install required Python packages:
pip install -r requirements.txt

If requirements.txt is not available, install packages as needed
(e.g., requests, biopython).
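When installing packages by hand, it can help to check which ones are already importable in the active environment. The sketch below uses only the standard library. The package list is an assumption based on the two examples named above (biopython is imported as `Bio`), and `missing_packages` is a hypothetical helper, not part of the repository.

```python
import importlib.util

# Assumption: pip name -> import name for the example packages mentioned above.
# The pipeline's real dependency list may differ.
EXAMPLE_PACKAGES = {"requests": "requests", "biopython": "Bio"}


def missing_packages(packages):
    """Return the pip names whose top-level import name cannot be found."""
    return [
        pip_name
        for pip_name, import_name in packages.items()
        if importlib.util.find_spec(import_name) is None
    ]


missing = missing_packages(EXAMPLE_PACKAGES)
if missing:
    print("Install with: pip install " + " ".join(missing))
else:
    print("All example packages are importable.")
```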
Run the main script to start the choice-based menu system:
python main.py

Follow the on-screen prompts to download and process datasets as required.