Welcome to the GDC Documentation!

This project provides tools and pipelines for interacting with the Genomic Data Commons (GDC). It is designed to run efficiently on high-performance computing (HPC) clusters such as the UMN MSI Agate cluster.

Contents:

Getting Started

Recommended Learning Pathway:

1. Installation - Set up the software environment
2. Usage - Learn how to run the pipeline
3. Tutorial: Assembling 1000 Genomes Reference Data - Download the reference data
4. Tutorial: Quality Control Pipeline in Practice - Run quality control
5. Tutorial: Ancestry Classification in Practice - Classify ancestry
6. Tutorial: Heritability Estimation with Multi-Ancestry Simulation - Estimate heritability (optional)

Quick Setup (MSI/UMN HPC):

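module use /path/to/GDCGenomicsQC/envs   # add the project's module files to the module search path (replace /path/to with the actual location)
module load gdcgenomicsqc                # load the project module
conda activate snakemake                 # activate the conda environment named "snakemake"
cd GDCGenomicsQC                         # change into the project directory
snakemake --version                      # print the Snakemake version to confirm the setup works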
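
If one of the commands above fails, it can help to check which tools are actually visible in the current shell. The following is a minimal sketch, not part of GDCGenomicsQC: `check_tool` is a hypothetical helper name, and the tool list (`conda`, `snakemake`) is an assumption based on the quick setup commands.

```shell
# Hypothetical helper (not part of GDCGenomicsQC): report whether a command
# is available in the current shell after the environment has been loaded.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found: $1"
    else
        echo "missing: $1"
        return 1
    fi
}

# Check the tools the quick setup relies on; adjust the list as needed.
# "|| true" keeps the loop going even when a tool is missing.
for tool in conda snakemake; do
    check_tool "$tool" || true
done
```

A "missing" line for `snakemake` most likely means the conda environment was not activated in this shell; the Installation page covers the environment setup.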

See also: Usage, for detailed instructions on running the pipeline either through module load or with a local Snakemake installation.
