MendelImpute.jl
Fast genotype imputation, phasing, and admixture estimation!
MendelImpute.jl is the fastest and most memory-efficient software for phasing and genotype imputation, as of 2020. It is also capable of local and global ancestry estimation.
Given a target genotype file (phased or unphased, possibly with missing data) and a reference haplotype file (phased, with no missing data), our software imputes every SNP in the reference file into the target file, outputting phased or unphased genotypes. As with many other imputation programs, every SNP typed in the target must also be present in the reference panel.
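The requirement that typed SNPs appear in the reference panel can be checked before imputation. The following is a minimal sketch in plain Julia, not part of the MendelImpute API: the SNP identifiers are made up, and matching SNPs by ID is an assumption made for illustration (a real pipeline would read the identifiers from the target and reference files and might match on position and alleles instead).

```julia
# Hypothetical SNP identifiers; in practice these would be read from
# the target and reference files.
target_snps = ["rs101", "rs205", "rs999"]
reference_snps = ["rs101", "rs150", "rs205", "rs310"]

# SNPs typed in the target but absent from the reference panel.
missing_from_ref = setdiff(target_snps, reference_snps)

if isempty(missing_from_ref)
    println("All typed SNPs are present in the reference panel.")
else
    println("Typed SNPs missing from the reference panel: ",
            join(missing_from_ref, ", "))
end
```

In this toy example one typed SNP is not in the reference list, so it would need to be removed from the target (or added to the panel) before imputation.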
Package Features
- Built-in support for imputing genotypes stored in VCF files (.vcf, .vcf.gz) or PLINK files.
- Out-of-the-box multithreaded (shared memory) parallelism.
- Admixture estimation, with code examples to make pretty plots!
- Ultra-compressed file format for phased genotypes.
- Imputation on dosage data.
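Regarding the multithreading feature listed above: shared-memory parallelism in Julia generally depends on how many threads the Julia session was started with. The sketch below uses only Julia's standard `Base.Threads` module to report that number. That MendelImpute picks up these threads automatically is an assumption here, so consult the "Run MendelImpute in parallel" section of the manual for specifics.

```julia
using Base.Threads

# Number of threads available to this Julia session.
n = nthreads()
println("Julia is running with $n thread(s).")

if n == 1
    # The thread count is fixed at startup and cannot be raised from
    # inside a running session.
    println("Restart with `julia --threads 4` or set JULIA_NUM_THREADS=4 to use more threads.")
end
```

The thread count `4` is only an example value; choose one that suits the number of cores on your machine.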
Installation
Download and install Julia. Within Julia, copy and paste the following:
using Pkg
Pkg.add(PackageSpec(url="https://github.com/OpenMendel/SnpArrays.jl.git"))
Pkg.add(PackageSpec(url="https://github.com/OpenMendel/VCFTools.jl.git"))
Pkg.add(PackageSpec(url="https://github.com/OpenMendel/MendelImpute.jl.git"))
This package supports Julia v1.5+.
Manual Outline
- Preparing Target Data
- Preparing Reference Haplotype Panel
- Detailed Example
- Step 1: generating realistic reference and target data
- Step 2: generating .jlso compressed reference panel
- Step 3: Run imputation and phasing
- Step 3.5: (only for simulated data) check imputation accuracy
- Post-imputation: per-SNP Imputation Quality Score
- Post-imputation: per-sample Imputation Quality score
- Performance gotchas
- Gotcha 1: Run MendelImpute in parallel
- Gotcha 2: max_d too high (or too low)
- Gotcha 3: Do you have enough memory (RAM)?
- Estimating ancestry
- Ultra-compressed format
- Run MendelImpute as script
- API