Welcome to the GDC Documentation!
This project provides tools and pipelines for working with the Genomic Data Commons (GDC). It is designed to run efficiently on high-performance computing (HPC) clusters such as the UMN MSI Agate cluster.
Contents:
Getting Started
Recommended Learning Pathway:
Installation - Set up the software environment
Usage - Learn how to run the pipeline
Tutorial: Assembling 1000 Genomes Reference Data - Download reference data
Tutorial: Quality Control Pipeline in Practice - Run quality control
Tutorial: Ancestry Classification in Practice - Classify ancestry
Tutorial: Heritability Estimation with Multi-Ancestry Simulation - Estimate heritability (optional)
Quick Setup (MSI/UMN HPC):
module use /path/to/GDCGenomicsQC/envs
module load gdcgenomicsqc
conda activate snakemake
cd GDCGenomicsQC
snakemake --version
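After running the steps above, it can help to confirm that the expected tools are actually on your PATH before launching the pipeline. The following is a minimal sketch, not part of the project itself: the helper name check_tool is hypothetical, and it only checks the two commands named in the Quick Setup (conda and snakemake).

```shell
#!/bin/sh
# Hypothetical helper (not shipped with the project): report whether a
# command is available on PATH. Returns 0 if found, 1 otherwise.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "ok: $1"
        return 0
    else
        echo "missing: $1" >&2
        return 1
    fi
}

# Check only the tools named in the Quick Setup steps.
# A missing tool prints a hint instead of aborting the script.
for tool in conda snakemake; do
    check_tool "$tool" || echo "hint: repeat the Quick Setup steps for $tool" >&2
done
```

If either tool is reported as missing, the module load or conda activate step most likely did not take effect in the current shell session.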
See also: Usage, for detailed instructions on running the pipeline using either module load or a local Snakemake installation.