Welcome to the GDC Documentation!


This project provides tools and pipelines for interacting with the Genomic Data Commons (GDC). It is designed to run efficiently on high-performance computing (HPC) clusters such as the UMN MSI Agate cluster.

Getting Started

Recommended Learning Pathway:

  1. Installation - Set up the software environment

  2. Usage - Learn how to run the pipeline

  3. Tutorial: Assembling 1000 Genomes Reference Data - Download reference data

  4. Tutorial: Quality Control Pipeline in Practice - Run quality control

  5. Tutorial: Ancestry Classification in Practice - Classify ancestry

  6. Tutorial: Heritability Estimation with Multi-Ancestry Simulation - Estimate heritability (optional)

Quick Setup (MSI/UMN HPC):

module use /path/to/GDCGenomicsQC/envs
module load gdcgenomicsqc
conda activate snakemake
cd GDCGenomicsQC
snakemake --version

See also: Usage for detailed instructions on running the pipeline, using either module load or a local Snakemake installation.
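Before running the pipeline, it can help to confirm that the Quick Setup commands actually put the expected tools on your PATH. The following is a minimal sketch, not part of the project itself: check_tool is a hypothetical helper name introduced here for illustration, and the list of tools checked (conda, snakemake) is taken from the Quick Setup commands above.

```shell
# Hypothetical helper (not provided by GDCGenomicsQC): report whether a
# command is available on the current PATH.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found: $1"
        return 0
    fi
    echo "missing: $1" >&2
    return 1
}

# Check the tools that the Quick Setup steps are expected to provide.
for tool in conda snakemake; do
    check_tool "$tool" || echo "Re-run the Quick Setup steps before using $tool."
done
```

If either tool is reported as missing, repeating the module and conda steps in a fresh shell is a reasonable first thing to try.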
