Convert genotype/haplotype/dosage to numeric arrays
Most often we need to convert genetic data to numeric arrays for statistical analysis.
Example VCF file
We need an example VCF file for demonstration. You can download it manually (877 KB) from the URL used in the command below and put the file in your current working directory. Or, within Julia,
isfile("test.08Jun17.d8b.vcf.gz") || download("http://faculty.washington.edu/browning/beagle/test.08Jun17.d8b.vcf.gz",
joinpath(pwd(), "test.08Jun17.d8b.vcf.gz"))
stat("test.08Jun17.d8b.vcf.gz")
StatStruct(mode=0o100644, size=876514)
The first 35 lines of this VCF file, that is, the header followed by the first 5 markers, are
using VCFTools
fh = openvcf("test.08Jun17.d8b.vcf.gz", "r")
for l in 1:35
println(readline(fh))
end
close(fh)
##fileformat=VCFv4.1
##INFO=<ID=LDAF,Number=1,Type=Float,Description="MLE Allele Frequency Accounting for LD">
##INFO=<ID=AVGPOST,Number=1,Type=Float,Description="Average posterior probability from MaCH/Thunder">
##INFO=<ID=RSQ,Number=1,Type=Float,Description="Genotype imputation quality from MaCH/Thunder">
##INFO=<ID=ERATE,Number=1,Type=Float,Description="Per-marker Mutation rate from MaCH/Thunder">
##INFO=<ID=THETA,Number=1,Type=Float,Description="Per-marker Transition rate from MaCH/Thunder">
##INFO=<ID=CIEND,Number=2,Type=Integer,Description="Confidence interval around END for imprecise variants">
##INFO=<ID=CIPOS,Number=2,Type=Integer,Description="Confidence interval around POS for imprecise variants">
##INFO=<ID=END,Number=1,Type=Integer,Description="End position of the variant described in this record">
##INFO=<ID=HOMLEN,Number=.,Type=Integer,Description="Length of base pair identical micro-homology at event breakpoints">
##INFO=<ID=HOMSEQ,Number=.,Type=String,Description="Sequence of base pair identical micro-homology at event breakpoints">
##INFO=<ID=SVLEN,Number=1,Type=Integer,Description="Difference in length between REF and ALT alleles">
##INFO=<ID=SVTYPE,Number=1,Type=String,Description="Type of structural variant">
##INFO=<ID=AC,Number=.,Type=Integer,Description="Alternate Allele Count">
##INFO=<ID=AN,Number=1,Type=Integer,Description="Total Allele Count">
##ALT=<ID=DEL,Description="Deletion">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=DS,Number=1,Type=Float,Description="Genotype dosage from MaCH/Thunder">
##FORMAT=<ID=GL,Number=.,Type=Float,Description="Genotype Likelihoods">
##INFO=<ID=AA,Number=1,Type=String,Description="Ancestral Allele, ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/pilot_data/technical/reference/ancestral_alignments/README">
##INFO=<ID=AF,Number=1,Type=Float,Description="Global Allele Frequency based on AC/AN">
##INFO=<ID=AMR_AF,Number=1,Type=Float,Description="Allele Frequency for samples from AMR based on AC/AN">
##INFO=<ID=ASN_AF,Number=1,Type=Float,Description="Allele Frequency for samples from ASN based on AC/AN">
##INFO=<ID=AFR_AF,Number=1,Type=Float,Description="Allele Frequency for samples from AFR based on AC/AN">
##INFO=<ID=EUR_AF,Number=1,Type=Float,Description="Allele Frequency for samples from EUR based on AC/AN">
##INFO=<ID=VT,Number=1,Type=String,Description="indicates what type of variant the line represents">
##INFO=<ID=SNPSOURCE,Number=.,Type=String,Description="indicates if a snp was called when analysing the low coverage or exome alignment data">
##reference=GRCh37
##reference=GRCh37
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT HG00096 HG00097 HG00099 HG00100 HG00101 HG00102 HG00103 HG00104 HG00106 HG00108 HG00109 HG00110 HG00111 HG00112 HG00113 HG00114 HG00116 HG00117 HG00118 HG00119 HG00120 HG00121 HG00122 HG00123 HG00124 HG00125 HG00126 HG00127 HG00128 HG00129 HG00130 HG00131 HG00133 HG00134 HG00135 HG00136 HG00137 HG00138 HG00139 HG00140 HG00141 HG00142 HG00143 HG00146 HG00148 HG00149 HG00150 HG00151 HG00152 HG00154 HG00155 HG00156 HG00158 HG00159 HG00160 HG00171 HG00173 HG00174 HG00176 HG00177 HG00178 HG00179 HG00180 HG00182 HG00183 HG00185 HG00186 HG00187 HG00188 HG00189 HG00190 HG00231 HG00232 HG00233 HG00234 HG00235 HG00236 HG00237 HG00238 HG00239 HG00240 HG00242 HG00243 HG00244 HG00245 HG00246 HG00247 HG00249 HG00250 HG00251 HG00252 HG00253 HG00254 HG00255 HG00256 HG00257 HG00258 HG00259 HG00260 HG00261 HG00262 HG00263 HG00264 HG00265 HG00266 HG00267 HG00268 HG00269 HG00270 HG00271 HG00272 HG00273 HG00274 HG00275 HG00276 HG00277 HG00278 HG00280 HG00281 HG00282 HG00284 HG00285 HG00306 HG00309 HG00310 HG00311 HG00312 HG00313 HG00315 HG00318 HG00319 HG00320 HG00321 HG00323 HG00324 HG00325 HG00326 HG00327 HG00328 HG00329 HG00330 HG00331 HG00332 HG00334 HG00335 HG00336 HG00337 HG00338 HG00339 HG00341 HG00342 HG00343 HG00344 HG00345 HG00346 HG00349 HG00350 HG00351 HG00353 HG00355 HG00356 HG00357 HG00358 HG00359 HG00360 HG00361 HG00362 HG00364 HG00366 HG00367 HG00369 HG00372 HG00373 HG00375 HG00376 HG00377 HG00378 HG00381 HG00382 HG00383 HG00384 HG00403 HG00404 HG00406 HG00407 HG00418 HG00419 HG00421 HG00422 HG00427 HG00428
22 20000086 rs138720731 T C 100 PASS AC=7;RSQ=0.8454;AVGPOST=0.9983;AA=T;AN=2184;LDAF=0.0040;THETA=0.0001;VT=SNP;SNPSOURCE=LOWCOV;ERATE=0.0003;AF=0.0032;AFR_AF=0.01 GT:DS:GL 0/0:0.000:-0.03,-1.19,-5.00 0/0:0.000:-0.04,-1.05,-5.00 0/0:0.000:-0.07,-0.85,-5.00 0/0:0.000:-0.03,-1.18,-5.00 0/0:0.000:-0.06,-0.87,-5.00 0/0:0.000:-0.03,-1.14,-5.00 0/0:0.000:-0.06,-0.90,-5.00 0/0:0.000:-0.23,-0.45,-1.28 0/0:0.000:-0.03,-1.20,-5.00 0/0:0.000:-0.48,-0.48,-0.48 0/0:0.000:-0.11,-0.65,-4.40 0/0:0.000:-0.48,-0.48,-0.48 0/0:0.000:-0.06,-0.91,-5.00 0/0:0.000:-0.18,-0.47,-2.54 0/0:0.000:-0.48,-0.48,-0.48 0/0:0.000:-0.06,-0.90,-5.00 0/0:0.000:-0.01,-1.74,-5.00 0/0:0.000:-0.00,-3.66,-5.00 0/0:0.000:-0.48,-0.48,-0.48 0/0:0.000:-0.00,-2.53,-5.00 0/0:0.000:-0.09,-0.73,-5.00 0/0:0.000:-0.48,-0.48,-0.48 0/0:0.000:-0.48,-0.48,-0.48 0/0:0.000:-0.00,-3.11,-5.00 0/0:0.000:-0.06,-0.89,-5.00 0/0:0.000:-0.09,-0.71,-4.10 0/0:0.000:-0.11,-0.65,-4.40 0/0:0.000:-0.18,-0.47,-2.34 0/0:0.000:-0.22,-0.45,-1.32 0/0:0.000:-0.02,-1.29,-5.00 0/0:0.000:-0.03,-1.15,-5.00 0/0:0.000:-0.02,-1.45,-5.00 0/0:0.000:-0.00,-3.34,-5.00 0/0:0.000:-0.12,-0.61,-3.19 0/0:0.000:-0.11,-0.67,-4.40 0/0:0.000:-0.05,-0.99,-5.00 0/0:0.000:-0.18,-0.48,-2.15 0/0:0.000:-0.01,-1.47,-5.00 0/0:0.000:-0.10,-0.67,-3.62 0/0:0.000:-0.03,-1.14,-5.00 0/0:0.000:-0.09,-0.73,-4.40 0/0:0.000:-0.07,-0.84,-4.40 0/0:0.000:-0.18,-0.48,-2.46 0/0:0.000:-0.0292813,-1.18575,-5 0/0:0.000:-0.48,-0.48,-0.48 0/0:0.000:-0.01,-1.67,-5.00 0/0:0.000:-0.18,-0.47,-2.40 0/0:0.000:-0.03,-1.25,-5.00 0/0:0.000:-0.11,-0.66,-3.44 0/0:0.000:-0.09,-0.73,-4.70 0/0:0.000:-0.0418663,-1.03687,-4.39794 0/0:0.000:-0.08,-0.79,-3.14 0/0:0.000:-0.00,-2.30,-5.00 0/0:0.000:-0.00,-2.54,-5.00 0/0:0.000:-0.03,-1.21,-5.00 0/0:0.000:-0.06,-0.86,-5.00 0/0:0.000:-0.09,-0.71,-4.70 0/0:0.000:-0.01,-1.49,-5.00 0/0:0.000:-0.01,-1.88,-5.00 0/0:0.000:-0.09,-0.71,-4.70 0/0:0.000:-0.03,-1.19,-5.00 0/0:0.000:-0.10,-0.67,-4.40 0/0:0.000:-0.01,-1.51,-5.00 0/0:0.000:-0.02,-1.40,-5.00 
0/0:0.000:-0.03,-1.17,-5.00 0/0:0.000:-0.05,-0.93,-5.00 0/0:0.000:-0.01,-1.48,-5.00 0/0:0.000:-0.00,-2.02,-5.00 0/0:0.000:-0.03,-1.18,-5.00 0/0:0.000:-0.02,-1.46,-5.00 0/0:0.000:-0.03,-1.17,-5.00 0/0:0.050:-0.18,-0.47,-2.73 0/0:0.000:-0.17,-0.49,-2.97 0/0:0.000:-0.10,-0.68,-4.40 0/0:0.000:-0.05,-0.99,-5.00 0/0:0.000:-0.12,-0.62,-3.38 0/0:0.000:-0.00,-2.06,-5.00 0/0:0.000:-0.16,-0.51,-2.66 0/0:0.000:-0.11,-0.64,-4.22 0/0:0.000:-0.03,-1.22,-5.00 0/0:0.000:-0.01,-1.64,-5.00 0/0:0.000:-0.00,-2.85,-5.00 0/0:0.000:-0.02,-1.38,-5.00 0/0:0.000:-0.05,-0.94,-5.00 0/0:0.000:-0.0311436,-1.15989,-5 0/0:0.000:-0.36,-0.42,-0.73 0/0:0.000:-0.01,-1.88,-5.00 0/0:0.000:-0.05,-0.92,-5.00 0/0:0.000:-0.03,-1.16,-5.00 0/0:0.000:-0.04,-1.04,-5.00 0/0:0.000:-0.13,-0.59,-5.00 0/0:0.000:-0.02,-1.36,-5.00 0/0:0.000:-0.16,-0.51,-2.36 0/0:0.000:-0.02,-1.31,-5.00 0/0:0.000:-0.03,-1.19,-5.00 0/0:0.000:-0.03,-1.17,-5.00 0/0:0.000:-0.00,-4.40,-5.00 0/0:0.000:-0.03,-1.16,-5.00 0/0:0.000:-0.09,-0.73,-3.70 0/0:0.000:-0.19,-0.47,-1.77 0/0:0.000:-0.00,-3.32,-5.00 0/0:0.000:-0.17,-0.51,-2.00 0/0:0.000:-0.00,-2.17,-5.00 0/0:0.000:-0.00,-2.91,-5.00 0/0:0.000:-0.10,-0.71,-4.10 0/0:0.000:-0.03,-1.12,-5.00 0/0:0.000:-0.48,-0.48,-0.48 0/0:0.000:-0.01,-1.67,-5.00 0/0:0.000:-0.00,-2.09,-5.00 0/0:0.000:-0.04,-1.09,-5.00 0/0:0.000:-0.01,-1.48,-5.00 0/0:0.000:-0.02,-1.41,-5.00 0/0:0.000:-0.10,-0.69,-3.80 0/0:0.000:-0.01,-1.54,-5.00 0/0:0.000:-0.03,-1.16,-5.00 0/0:0.000:-0.09,-0.73,-4.70 0/0:0.000:-0.09,-0.74,-4.70 0/0:0.000:-0.06,-0.92,-5.00 0/0:0.000:-0.05,-0.97,-5.00 0/0:0.000:-0.08,-0.78,-5.00 0/0:0.000:-0.06,-0.92,-5.00 0/0:0.000:-0.10,-0.67,-4.40 0/0:0.000:-0.01,-1.71,-5.00 0/0:0.000:-0.03,-1.20,-5.00 0/0:0.000:-0.02,-1.26,-5.00 0/0:0.000:-0.04,-1.10,-5.00 0/0:0.000:-0.02,-1.27,-5.00 0/0:0.000:-0.48,-0.48,-0.48 0/0:0.000:-0.01,-1.47,-5.00 0/0:0.000:-0.00,-2.00,-5.00 0/0:0.000:-0.10,-0.67,-4.22 0/0:0.050:-0.18,-0.47,-2.34 0/0:0.000:-0.05,-1.00,-5.00 0/0:0.000:-0.11,-0.65,-3.85 0/0:0.000:-0.10,-0.68,-4.70 
0/0:0.000:-0.48,-0.48,-0.48 0/0:0.000:-0.08,-0.76,-5.00 0/0:0.000:-0.19,-0.47,-2.14 0/0:0.000:-0.00,-1.99,-5.00 0/0:0.000:-0.18,-0.47,-2.46 0/0:0.000:-0.09,-0.74,-4.40 0/0:0.450:-0.05,-0.94,-5.00 0/0:0.000:-0.48,-0.48,-0.48 0/0:0.000:-0.10,-0.69,-4.70 0/0:0.000:-0.01,-1.50,-5.00 0/0:0.000:-0.18,-0.47,-2.34 0/0:0.000:-0.03,-1.17,-5.00 0/0:0.000:-0.06,-0.88,-5.00 0/0:0.000:-0.02,-1.41,-5.00 0/0:0.000:-0.06,-0.88,-5.00 0/0:0.000:-0.18,-0.47,-1.95 0/0:0.000:-0.19,-0.46,-2.17 0/0:0.000:-0.03,-1.13,-5.00 0/0:0.000:-0.03,-1.18,-5.00 0/0:0.000:-0.18,-0.48,-2.23 0/0:0.000:-0.23,-0.45,-1.31 0/0:0.000:-0.11,-0.64,-3.92 0/0:0.000:-0.03,-1.17,-5.00 0/0:0.000:-0.03,-1.22,-5.00 0/0:0.000:-0.11,-0.66,-4.22 0/0:0.000:-0.12,-0.61,-2.38 0/0:0.000:-0.03,-1.22,-5.00 0/0:0.000:-0.40,-0.45,-0.60 0/0:0.000:-0.00,-2.98,-5.00 0/0:0.000:-0.13,-0.59,-2.09 0/0:0.000:-0.02,-1.37,-5.00 0/0:0.000:-0.477139,-0.477113,-0.477113 0/0:0.000:-0.04,-1.10,-5.00 0/0:0.000:-0.03,-1.23,-5.00 0/0:0.000:-0.01,-1.51,-5.00 0/0:0.000:-0.01,-1.67,-5.00 0/0:0.000:-0.08,-0.75,-4.40 0/0:0.000:-0.03,-1.23,-5.00 0/0:0.000:-0.10,-0.69,-4.40 0/0:0.000:-0.12,-0.63,-3.92 0/0:0.000:-0.01,-1.74,-5.00 0/0:0.000:-0.48,-0.48,-0.48 0/0:0.000:-0.19,-0.46,-2.60 0/0:0.000:-0.19,-0.46,-2.62 0/0:0.000:-0.11,-0.65,-4.70 0/0:0.000:-0.11,-0.66,-4.70 0/0:0.050:-0.18,-0.49,-2.04 0/0:0.050:-0.10,-0.67,-4.40 0/0:0.000:-0.01,-1.62,-5.00 0/0:0.000:-0.48,-0.48,-0.48 0/0:0.000:-0.23,-0.41,-1.51 0/0:0.000:-0.18,-0.48,-2.18 0/0:0.000:-0.03,-1.17,-5.00 0/0:0.000:-0.10,-0.68,-4.10 0/0:0.000:-0.03,-1.24,-5.00 0/0:0.000:-0.18,-0.48,-2.14
22 20000146 rs73387790 G A 100 PASS LDAF=0.0169;RSQ=0.9482;THETA=0.0004;AA=G;AN=2184;AVGPOST=0.9972;VT=SNP;SNPSOURCE=LOWCOV;AC=36;ERATE=0.0003;AF=0.02;AFR_AF=0.07;EUR_AF=0.0013 GT:DS:GL 0/0:0.000:-0.00,-2.68,-5.00 0/0:0.000:-0.07,-0.82,-5.00 0/0:0.000:-0.13,-0.60,-3.05 0/0:0.000:-0.03,-1.24,-5.00 0/0:0.000:-0.18,-0.47,-3.08 0/0:0.000:-0.06,-0.90,-5.00 0/0:0.000:-0.01,-1.48,-5.00 0/0:0.000:-0.48,-0.48,-0.48 0/0:0.000:-0.16,-0.51,-2.63 0/0:0.000:-0.01,-1.76,-5.00 0/0:0.000:-0.10,-0.67,-5.00 0/0:0.000:-0.48,-0.48,-0.48 0/0:0.000:-0.11,-0.66,-4.40 0/0:0.000:-0.10,-0.69,-4.70 0/0:0.000:-0.48,-0.48,-0.48 0/0:0.000:-0.18,-0.47,-2.30 0/0:0.000:-0.00,-2.80,-5.00 0/0:0.000:-0.00,-2.02,-5.00 0/0:0.000:-0.48,-0.48,-0.48 0/0:0.000:-0.01,-1.49,-5.00 0/0:0.000:-0.01,-1.76,-5.00 0/0:0.000:-0.48,-0.48,-0.48 0/0:0.000:-0.48,-0.48,-0.48 0/0:0.000:-0.02,-1.40,-5.00 0/0:0.000:-0.00,-2.09,-5.00 0/0:0.000:-0.10,-0.70,-4.10 0/0:0.000:-0.22,-0.46,-1.27 0/0:0.000:-0.18,-0.48,-2.39 0/0:0.000:-0.48,-0.48,-0.48 0/0:0.000:-0.19,-0.47,-2.27 0/0:0.000:-0.07,-0.85,-5.00 0/0:0.000:-0.00,-2.53,-5.00 0/0:0.000:-0.00,-2.83,-5.00 0/0:0.000:-0.22,-0.46,-1.24 0/0:0.000:-0.19,-0.46,-2.27 0/0:0.000:-0.10,-0.68,-4.40 0/0:0.000:-0.09,-0.73,-4.22 0/0:0.000:-0.03,-1.19,-5.00 0/0:0.000:-0.15,-0.55,-2.64 0/0:0.000:-0.05,-0.97,-5.00 0/0:0.000:-0.08,-0.76,-4.70 0/0:0.000:-0.01,-1.49,-5.00 0/0:0.000:-0.06,-0.86,-5.00 0/0:0.000:-0.029681,-1.18006,-5 0/0:0.000:-0.48,-0.48,-0.48 0/0:0.000:-0.18,-0.48,-2.33 0/0:0.000:-0.10,-0.70,-4.70 0/0:0.000:-0.00,-2.04,-5.00 0/0:0.000:-0.03,-1.24,-5.00 0/0:0.000:-0.10,-0.69,-4.40 0/0:0.000:-0.0843997,-0.755377,-3.00877 0/0:0.000:-0.21,-0.47,-1.33 0/0:0.000:-0.00,-3.22,-5.00 0/0:0.000:-0.00,-3.70,-5.00 0/0:0.000:-0.05,-0.96,-5.00 0/0:0.000:-0.03,-1.21,-5.00 0/0:0.000:-0.01,-1.47,-5.00 0/0:0.000:-0.01,-1.86,-5.00 0/0:0.000:-0.05,-0.97,-5.00 0/0:0.000:-0.06,-0.90,-5.00 0/0:0.000:-0.01,-1.51,-5.00 0/0:0.000:-0.03,-1.25,-5.00 0/0:0.000:-0.01,-1.92,-5.00 0/0:0.000:-0.00,-2.58,-5.00 
0/0:0.000:-0.06,-0.89,-5.00 0/0:0.000:-0.05,-1.00,-5.00 0/0:0.000:-0.05,-0.95,-5.00 0/0:0.000:-0.01,-1.72,-5.00 0/0:0.000:-0.00,-2.25,-5.00 0/0:0.000:-0.02,-1.43,-5.00 0/0:0.000:-0.18,-0.48,-2.03 0/0:0.000:-0.18,-0.47,-2.72 0/0:0.000:-0.09,-0.72,-4.70 0/0:0.000:-0.18,-0.47,-2.42 0/0:0.000:-0.19,-0.46,-2.20 0/0:0.000:-0.24,-0.44,-1.19 0/0:0.000:-0.01,-1.75,-5.00 0/0:0.000:-0.01,-1.48,-5.00 0/0:0.000:-0.19,-0.46,-2.27 0/0:0.000:-0.05,-0.99,-5.00 0/0:0.000:-0.06,-0.87,-5.00 0/0:0.000:-0.00,-3.40,-5.00 0/0:0.000:-0.10,-0.68,-4.70 0/0:0.000:-0.01,-1.72,-5.00 0/0:0.000:-0.0148341,-1.47392,-5 0/0:0.000:-0.05,-0.95,-5.00 0/0:0.000:-0.03,-1.16,-5.00 0/0:0.000:-0.04,-1.03,-5.00 0/0:0.000:-0.02,-1.46,-5.00 0/0:0.000:-0.09,-0.73,-4.40 0/0:0.000:-0.02,-1.46,-5.00 0/0:0.000:-0.02,-1.26,-5.00 0/0:0.000:-0.03,-1.25,-5.00 0/0:0.000:-0.19,-0.45,-2.12 0/0:0.000:-0.01,-1.50,-5.00 0/0:0.000:-0.05,-0.96,-5.00 0/0:0.000:-0.00,-4.70,-5.00 0/0:0.000:-0.13,-0.58,-3.06 0/0:0.000:-0.48,-0.48,-0.48 0/0:0.000:-0.17,-0.51,-2.09 0/0:0.000:-0.00,-2.29,-5.00 0/0:0.000:-0.38,-0.43,-0.67 0/0:0.000:-0.00,-2.81,-5.00 0/0:0.000:-0.00,-3.25,-5.00 0/0:0.000:-0.22,-0.46,-1.26 0/0:0.000:-0.03,-1.17,-5.00 0/0:0.000:-0.03,-1.21,-5.00 0/0:0.000:-0.01,-1.50,-5.00 0/0:0.000:-0.01,-1.54,-5.00 0/0:0.000:-0.48,-0.48,-0.48 0/0:0.000:-0.00,-2.04,-5.00 0/0:0.000:-0.05,-0.93,-5.00 0/0:0.000:-0.10,-0.68,-3.70 0/0:0.000:-0.09,-0.72,-5.00 0/0:0.000:-0.01,-1.77,-5.00 0/0:0.000:-0.05,-0.93,-5.00 0/0:0.000:-0.01,-1.51,-5.00 0/0:0.000:-0.16,-0.51,-2.74 0/0:0.000:-0.10,-0.69,-4.40 0/0:0.000:-0.18,-0.48,-2.26 0/0:0.000:-0.18,-0.48,-2.37 0/0:0.000:-0.18,-0.48,-2.27 0/0:0.000:-0.00,-2.58,-5.00 0/0:0.000:-0.05,-0.93,-5.00 0/0:0.000:-0.03,-1.19,-5.00 0/0:0.000:-0.00,-2.34,-5.00 0/0:0.000:-0.03,-1.23,-5.00 0/0:0.000:-0.10,-0.69,-4.40 0/0:0.000:-0.08,-0.78,-4.70 0/0:0.000:-0.05,-0.95,-5.00 0/0:0.000:-0.03,-1.22,-5.00 0/0:0.000:-0.18,-0.48,-2.36 0/0:0.000:-0.01,-1.53,-5.00 0/0:0.000:-0.18,-0.48,-2.25 0/0:0.000:-0.10,-0.68,-4.70 
0/0:0.000:-0.09,-0.73,-5.00 0/0:0.000:-0.02,-1.41,-5.00 0/0:0.000:-0.05,-0.93,-5.00 0/0:0.000:-0.03,-1.22,-5.00 0/0:0.000:-0.18,-0.47,-2.41 0/0:0.000:-0.09,-0.73,-4.40 0/0:0.000:-0.00,-2.00,-5.00 0/0:0.000:-0.19,-0.46,-2.19 0/0:0.000:-0.03,-1.20,-5.00 0/0:0.000:-0.05,-0.98,-5.00 0/0:0.000:-0.18,-0.47,-2.31 0/0:0.000:-0.09,-0.73,-4.10 0/0:0.000:-0.48,-0.48,-0.48 0/0:0.000:-0.10,-0.67,-3.66 0/0:0.000:-0.12,-0.63,-4.70 0/0:0.000:-0.16,-0.51,-2.28 0/0:0.000:-0.02,-1.46,-5.00 0/0:0.000:-0.01,-1.75,-5.00 0/0:0.000:-0.05,-0.95,-5.00 0/0:0.000:-0.10,-0.68,-4.22 0/0:0.000:-0.05,-0.95,-5.00 0/0:0.000:-0.12,-0.62,-3.08 0/0:0.000:-0.19,-0.45,-2.25 0/0:0.000:-0.01,-1.77,-5.00 0/0:0.000:-0.13,-0.60,-2.15 0/0:0.000:-0.48,-0.48,-0.48 0/0:0.000:-0.06,-0.92,-5.00 0/0:0.000:-0.04,-1.05,-5.00 0/0:0.000:-0.19,-0.46,-2.30 0/0:0.000:-0.19,-0.46,-2.28 0/0:0.000:-0.01,-1.48,-5.00 0/0:0.000:-0.190265,-0.457324,-2.2321 0/0:0.000:-0.23,-0.46,-1.21 0/0:0.000:-0.01,-1.50,-5.00 0/0:0.000:-0.00,-2.44,-5.00 0/0:0.000:-0.01,-1.73,-5.00 0/0:0.000:-0.01,-1.53,-5.00 0/0:0.000:-0.05,-0.96,-5.00 0/0:0.000:-0.05,-0.94,-5.00 0/0:0.000:-0.19,-0.46,-2.62 0/0:0.000:-0.00,-2.25,-5.00 0/0:0.000:-0.19,-0.46,-2.23 0/0:0.000:-0.11,-0.65,-4.70 0/0:0.000:-0.18,-0.46,-2.68 0/0:0.000:-0.11,-0.66,-4.40 0/0:0.000:-0.19,-0.46,-2.43 0/0:0.000:-0.18,-0.49,-2.00 0/0:0.000:-0.06,-0.92,-5.00 0/0:0.000:-0.20,-0.45,-2.05 0/0:0.000:-0.48,-0.48,-0.48 0/0:0.000:-0.03,-1.18,-5.00 0/0:0.000:-0.10,-0.68,-4.00 0/0:0.000:-0.03,-1.20,-5.00 0/0:0.000:-0.10,-0.69,-4.22 0/0:0.000:-0.11,-0.65,-4.40 0/0:0.000:-0.11,-0.67,-3.85
22 20000199 rs183293480 A C 100 PASS LDAF=0.0009;THETA=0.0004;AN=2184;AVGPOST=0.9990;VT=SNP;AA=A;RSQ=0.6274;SNPSOURCE=LOWCOV;AC=1;ERATE=0.0003;AF=0.0005;EUR_AF=0.0013 GT:DS:GL 0/0:0.000:-0.00,-2.04,-5.00 0/0:0.000:-0.07,-0.82,-3.47 0/0:0.000:-0.07,-0.83,-5.00 0/0:0.000:-0.03,-1.12,-5.00 0/0:0.000:-0.11,-0.64,-4.10 0/0:0.000:-0.12,-0.62,-3.85 0/0:0.000:-0.01,-1.47,-5.00 0/0:0.000:-0.01,-1.54,-5.00 0/0:0.000:-0.10,-0.70,-4.70 0/0:0.000:-0.03,-1.18,-5.00 0/0:0.000:-0.16,-0.50,-3.30 0/0:0.000:-0.48,-0.48,-0.48 0/0:0.000:-0.03,-1.20,-5.00 0/0:0.000:-0.10,-0.70,-5.00 0/0:0.000:-0.19,-0.46,-2.46 0/0:0.000:-0.16,-0.51,-2.67 0/0:0.000:-0.00,-2.57,-5.00 0/0:0.000:-0.00,-2.85,-5.00 0/0:0.000:-0.48,-0.48,-0.48 0/0:0.000:-0.00,-2.55,-5.00 0/0:0.000:-0.00,-2.02,-5.00 0/0:0.050:-0.48,-0.48,-0.48 0/0:0.000:-0.23,-0.46,-1.24 0/0:0.000:-0.02,-1.45,-5.00 0/0:0.000:-0.10,-0.68,-4.40 0/0:0.000:-0.06,-0.88,-5.00 0/0:0.000:-0.13,-0.61,-2.23 0/0:0.000:-0.05,-0.94,-5.00 0/0:0.000:-0.26,-0.43,-1.11 0/0:0.000:-0.04,-1.01,-5.00 0/0:0.000:-0.12,-0.62,-3.28 0/0:0.000:-0.00,-2.27,-5.00 0/0:0.000:-0.00,-2.42,-5.00 0/0:0.000:-0.04,-1.08,-5.00 0/0:0.000:-0.06,-0.88,-5.00 0/0:0.000:-0.04,-1.03,-5.00 0/0:0.000:-0.10,-0.70,-4.10 0/0:0.000:-0.18,-0.48,-2.22 0/0:0.000:-0.02,-1.41,-5.00 0/0:0.000:-0.03,-1.19,-5.00 0/0:0.000:-0.14,-0.57,-2.12 0/0:0.000:-0.10,-0.69,-4.70 0/0:0.000:-0.12,-0.62,-2.28 0/0:0.000:-0.0149779,-1.4698,-5 0/0:0.000:-0.10,-0.70,-4.70 0/0:0.000:-0.06,-0.89,-5.00 0/0:0.000:-0.39,-0.23,-2.25 0/0:0.000:-0.00,-3.40,-5.00 0/0:0.000:-0.07,-0.86,-4.40 0/0:0.000:-0.18,-0.48,-2.35 0/0:0.000:-0.00967891,-1.65679,-5 0/0:0.000:-0.22,-0.46,-1.24 0/0:0.000:-0.00,-2.27,-5.00 0/0:0.000:-0.00,-3.12,-5.00 0/0:0.000:-0.01,-1.73,-5.00 0/0:0.000:-0.02,-1.39,-5.00 0/0:0.000:-0.10,-0.70,-4.70 0/0:0.000:-0.03,-1.23,-5.00 0/0:0.000:-0.09,-0.72,-4.10 0/0:0.000:-0.10,-0.71,-4.70 0/0:0.000:-0.02,-1.45,-5.00 0/0:0.000:-0.04,-1.11,-5.00 0/0:0.000:-0.03,-1.14,-5.00 0/0:0.000:-0.00,-2.42,-5.00 
0/0:0.000:-0.05,-0.94,-5.00 0/0:0.000:-0.05,-0.95,-5.00 0/0:0.000:-0.11,-0.65,-3.85 0/0:0.000:-0.05,-0.95,-5.00 0/0:0.000:-0.01,-1.72,-5.00 0/0:0.000:-0.00,-2.08,-5.00 0/0:0.000:-0.03,-1.20,-5.00 0/0:0.000:-0.05,-0.94,-5.00 0/0:0.000:-0.16,-0.52,-2.11 0/0:0.000:-0.10,-0.69,-4.70 0/0:0.000:-0.44,-0.46,-0.54 0/0:0.000:-0.07,-0.83,-5.00 0/0:0.000:-0.16,-0.51,-2.42 0/0:0.000:-0.18,-0.48,-2.30 0/0:0.000:-0.19,-0.46,-2.27 0/0:0.000:-0.06,-0.89,-5.00 0/0:0.000:-0.06,-0.88,-5.00 0/0:0.000:-0.00,-3.15,-5.00 0/0:0.000:-0.12,-0.61,-4.10 0/0:0.000:-0.06,-0.91,-5.00 0/0:0.000:-0.00935928,-1.67121,-5 0/0:0.000:-0.18,-0.48,-2.13 0/0:0.000:-0.07,-0.85,-5.00 0/0:0.000:-0.03,-1.19,-5.00 0/0:0.000:-0.03,-1.15,-5.00 0/0:0.000:-0.09,-0.71,-4.70 0/0:0.000:-0.19,-0.46,-2.52 0/0:0.000:-0.03,-1.20,-5.00 0/0:0.000:-0.09,-0.73,-4.10 0/0:0.000:-0.11,-0.64,-4.40 0/0:0.000:-0.03,-1.21,-5.00 0/0:0.000:-0.19,-0.47,-1.83 0/0:0.000:-0.00,-2.76,-5.00 0/0:0.000:-0.09,-0.73,-3.92 0/0:0.000:-0.15,-0.55,-2.43 0/0:0.000:-0.18,-0.48,-1.89 0/0:0.000:-0.00,-3.18,-5.00 0/0:0.000:-0.48,-0.48,-0.48 0/0:0.000:-0.02,-1.43,-5.00 0/0:0.000:-0.22,-0.40,-5.00 0/0:0.000:-0.07,-0.84,-4.70 0/0:0.000:-0.03,-1.22,-5.00 0/0:0.000:-0.10,-0.69,-3.85 0/0:0.000:-0.01,-1.84,-5.00 0/0:0.000:-0.00,-3.07,-5.00 0/0:0.000:-0.10,-0.69,-3.85 0/0:0.000:-0.06,-0.91,-5.00 0/0:0.000:-0.10,-0.68,-4.70 0/0:0.000:-0.10,-0.69,-3.80 0/0:0.000:-0.48,-0.18,-4.10 0/0:0.000:-0.02,-1.47,-5.00 0/0:0.000:-0.05,-0.94,-5.00 0/0:0.000:-0.03,-1.22,-5.00 0/0:0.000:-0.18,-0.48,-2.34 0/0:0.000:-0.05,-0.95,-5.00 0/0:0.000:-0.48,-0.48,-0.48 0/0:0.000:-0.18,-0.48,-2.37 0/0:0.000:-0.10,-0.70,-4.40 0/0:0.000:-0.00,-2.44,-5.00 0/0:0.000:-0.01,-1.78,-5.00 0/0:0.000:-0.00,-2.06,-5.00 0/0:0.000:-0.00,-2.28,-5.00 0/0:0.000:-0.06,-0.92,-5.00 0/0:0.000:-0.03,-1.24,-5.00 0/0:0.000:-0.01,-1.51,-5.00 0/0:0.000:-0.05,-0.95,-5.00 0/0:0.000:-0.02,-1.46,-5.00 0/0:0.000:-0.05,-0.94,-5.00 0/0:0.000:-0.11,-0.64,-4.70 0/0:0.000:-0.19,-0.46,-2.54 0/0:0.000:-0.02,-1.38,-5.00 
0/0:0.000:-0.48,-0.48,-0.48 0/0:0.000:-0.06,-0.92,-5.00 0/0:0.000:-0.48,-0.48,-0.48 0/0:0.000:-0.02,-1.26,-5.00 0/0:0.000:-0.18,-0.47,-2.47 0/0:0.000:-0.16,-0.51,-2.43 0/0:0.000:-0.01,-1.73,-5.00 0/0:0.000:-0.19,-0.46,-2.79 0/0:0.000:-0.06,-0.91,-5.00 0/0:0.000:-0.03,-1.20,-5.00 0/0:0.000:-0.18,-0.48,-2.35 0/0:0.000:-0.03,-1.21,-5.00 0/0:0.000:-0.05,-0.93,-5.00 0/0:0.000:-0.03,-1.18,-5.00 0/0:0.000:-0.19,-0.46,-2.72 0/0:0.000:-0.02,-1.45,-5.00 0/0:0.000:-0.16,-0.51,-2.71 0/0:0.000:-0.01,-1.80,-5.00 0/0:0.000:-0.10,-0.69,-3.66 0/0:0.000:-0.10,-0.69,-4.22 0/0:0.000:-0.23,-0.44,-1.26 0/0:0.000:-0.19,-0.46,-2.12 0/0:0.000:-0.04,-1.07,-5.00 0/0:0.000:-0.03,-1.23,-5.00 0/0:0.000:-0.48,-0.48,-0.48 0/0:0.000:-0.19,-0.46,-2.20 0/0:0.000:-0.05,-0.95,-5.00 0/0:0.000:-0.11,-0.65,-3.44 0/0:0.000:-0.05,-1.00,-5.00 0/0:0.000:-0.03,-1.17,-5.00 0/0:0.000:-0.01,-1.77,-5.00 0/0:0.000:-0.189687,-0.458221,-2.2426 0/0:0.000:-0.01,-1.79,-5.00 0/0:0.050:-0.24,-0.42,-1.39 0/0:0.000:-0.19,-0.45,-4.70 0/0:0.000:-0.00,-2.15,-5.00 0/0:0.000:-0.05,-0.97,-5.00 0/0:0.000:-0.18,-0.48,-2.23 0/0:0.000:-0.00,-2.11,-5.00 0/0:0.000:-0.11,-0.66,-4.70 0/0:0.000:-0.01,-1.59,-5.00 0/0:0.000:-0.48,-0.48,-0.48 0/0:0.000:-0.18,-0.46,-2.63 0/0:0.000:-0.11,-0.66,-5.00 0/0:0.000:-0.48,-0.48,-0.48 0/0:0.000:-0.19,-0.46,-2.43 0/0:0.000:-0.19,-0.47,-1.84 0/0:0.000:-0.48,-0.48,-0.48 0/0:0.000:-0.19,-0.46,-2.21 0/0:0.000:-0.48,-0.48,-0.48 0/0:0.000:-0.01,-1.82,-5.00 0/0:0.000:-0.05,-0.99,-5.00 0/0:0.000:-0.05,-0.94,-5.00 0/0:0.000:-0.03,-1.22,-5.00 0/0:0.000:-0.00,-2.26,-5.00 0/0:0.000:-0.10,-0.69,-4.00
22 20000291 rs185807825 G T 100 PASS ERATE=0.0005;AVGPOST=0.9983;AA=G;AN=2184;LDAF=0.0015;VT=SNP;SNPSOURCE=LOWCOV;RSQ=0.5564;AC=2;THETA=0.0003;AF=0.0009;ASN_AF=0.0035 GT:DS:GL 0/0:0.000:-0.00,-2.06,-5.00 0/0:0.000:-0.07,-0.83,-5.00 0/0:0.000:-0.02,-1.27,-5.00 0/0:0.000:-0.01,-1.77,-5.00 0/0:0.000:-0.19,-0.45,-2.14 0/0:0.000:-0.11,-0.66,-5.00 0/0:0.000:-0.03,-1.17,-5.00 0/0:0.000:-0.02,-1.33,-5.00 0/0:0.000:-0.18,-0.48,-2.43 0/0:0.000:-0.05,-0.93,-5.00 0/0:0.000:-0.18,-0.46,-2.76 0/0:0.000:-0.48,-0.48,-0.48 0/0:0.000:-0.03,-1.15,-5.00 0/0:0.000:-0.07,-0.83,-4.40 0/0:0.000:-0.02,-1.33,-5.00 0/0:0.000:-0.10,-0.68,-4.40 0/0:0.000:-0.00,-2.28,-5.00 0/0:0.000:-0.01,-1.83,-5.00 0/0:0.000:-0.13,-0.61,-2.24 0/0:0.000:-0.00,-3.70,-5.00 0/0:0.000:-0.00,-2.21,-5.00 0/0:0.000:-0.48,-0.48,-0.48 0/0:0.000:-0.22,-0.46,-1.25 0/0:0.000:-0.00,-2.81,-5.00 0/0:0.000:-0.02,-1.44,-5.00 0/0:0.000:-0.06,-0.92,-5.00 0/0:0.000:-0.08,-0.79,-3.21 0/0:0.000:-0.01,-1.72,-5.00 0/0:0.000:-0.21,-0.46,-1.42 0/0:0.000:-0.08,-0.79,-4.40 0/0:0.000:-0.04,-1.06,-5.00 0/0:0.000:-0.02,-1.42,-5.00 0/0:0.000:-0.00,-3.85,-5.00 0/0:0.000:-0.07,-0.83,-5.00 0/0:0.000:-0.01,-1.80,-5.00 0/0:0.000:-0.11,-0.66,-4.40 0/0:0.000:-0.01,-1.48,-5.00 0/0:0.000:-0.01,-1.47,-5.00 0/0:0.000:-0.05,-0.94,-5.00 0/0:0.000:-0.19,-0.46,-2.54 0/0:0.000:-0.03,-1.22,-5.00 0/0:0.000:-0.05,-1.00,-5.00 0/0:0.000:-0.13,-0.60,-2.19 0/0:0.000:-0.0021071,-2.31515,-5 0/0:0.000:-0.10,-0.68,-4.40 0/0:0.000:-0.10,-0.68,-4.22 0/0:0.000:-0.05,-0.93,-5.00 0/0:0.000:-0.00,-3.38,-5.00 0/0:0.000:-0.06,-0.90,-5.00 0/0:0.000:-0.20,-0.45,-1.99 0/0:0.000:-0.0193877,-1.35992,-5 0/0:0.000:-0.02,-1.28,-5.00 0/0:0.000:-0.00,-2.46,-5.00 0/0:0.000:-0.00,-2.75,-5.00 0/0:0.000:-0.18,-0.47,-2.36 0/0:0.000:-0.10,-0.68,-4.00 0/0:0.000:-0.02,-1.34,-5.00 0/0:0.000:-0.00,-3.27,-5.00 0/0:0.250:-0.03,-1.19,-5.00 0/0:0.000:-0.07,-0.84,-5.00 0/0:0.000:-0.01,-1.52,-5.00 0/0:0.000:-0.11,-0.65,-3.40 0/0:0.000:-0.10,-0.70,-4.70 0/0:0.000:-0.03,-1.19,-5.00 
0/0:0.000:-0.03,-1.23,-5.00 0/0:0.000:-0.06,-0.92,-5.00 0/0:0.000:-0.00,-1.99,-5.00 0/0:0.000:-0.01,-1.61,-5.00 0/0:0.000:-0.01,-1.48,-5.00 0/0:0.000:-0.10,-0.69,-4.22 0/0:0.000:-0.01,-1.49,-5.00 0/0:0.000:-0.01,-1.59,-5.00 0/0:0.000:-0.10,-0.69,-3.36 0/0:0.000:-0.01,-1.48,-5.00 0/0:0.000:-0.08,-0.77,-4.10 0/0:0.000:-0.05,-1.00,-5.00 0/0:0.000:-0.10,-0.67,-3.80 0/0:0.000:-0.03,-1.19,-5.00 0/0:0.000:-0.06,-0.86,-5.00 0/0:0.000:-0.06,-0.92,-5.00 0/0:0.000:-0.11,-0.66,-4.70 0/0:0.000:-0.00,-2.68,-5.00 0/0:0.000:-0.05,-0.93,-5.00 0/0:0.000:-0.01,-1.84,-5.00 0/0:0.000:-0.0293556,-1.18469,-5 0/0:0.000:-0.48,-0.48,-0.48 0/0:0.000:-0.19,-0.46,-2.37 0/0:0.000:-0.10,-0.68,-4.10 0/0:0.000:-0.05,-0.97,-5.00 0/0:0.000:-0.03,-1.20,-5.00 0/0:0.000:-0.06,-0.91,-5.00 0/0:0.000:-0.18,-0.48,-2.06 0/0:0.000:-0.06,-0.87,-5.00 0/0:0.000:-0.03,-1.13,-5.00 0/0:0.000:-0.03,-1.24,-5.00 0/0:0.000:-0.05,-0.94,-5.00 0/0:0.000:-0.00,-3.85,-5.00 0/0:0.000:-0.48,-0.48,-0.48 0/0:0.000:-0.07,-0.83,-4.40 0/0:0.000:-0.48,-0.48,-0.48 0/0:0.000:-0.00,-2.82,-5.00 0/0:0.000:-0.08,-0.76,-3.80 0/0:0.000:-0.00,-3.47,-5.00 0/0:0.000:-0.03,-1.18,-5.00 0/0:0.000:-0.11,-0.65,-3.59 0/0:0.000:-0.03,-1.20,-5.00 0/0:0.000:-0.02,-1.43,-5.00 0/0:0.000:-0.00,-2.73,-5.00 0/0:0.000:-0.06,-0.86,-4.22 0/0:0.000:-0.14,-0.57,-2.84 0/0:0.000:-0.01,-1.50,-5.00 0/0:0.000:-0.02,-1.45,-5.00 0/0:0.000:-0.01,-1.76,-5.00 0/0:0.000:-0.02,-1.46,-5.00 0/0:0.000:-0.01,-1.79,-5.00 0/0:0.000:-0.06,-0.90,-5.00 0/0:0.000:-0.01,-1.87,-5.00 0/0:0.000:-0.01,-1.50,-5.00 0/0:0.000:-0.03,-1.21,-5.00 0/0:0.000:-0.08,-0.79,-5.00 0/0:0.000:-0.06,-0.91,-5.00 0/0:0.000:-0.48,-0.48,-0.48 0/0:0.000:-0.00,-2.89,-5.00 0/0:0.000:-0.02,-1.46,-5.00 0/0:0.000:-0.20,-0.44,-2.07 0/0:0.000:-0.00,-2.65,-5.00 0/0:0.000:-0.02,-1.27,-5.00 0/0:0.000:-0.03,-1.21,-5.00 0/0:0.000:-0.01,-1.79,-5.00 0/0:0.000:-0.05,-0.95,-5.00 0/0:0.000:-0.10,-0.68,-4.70 0/0:0.000:-0.35,-0.41,-0.78 0/0:0.000:-0.09,-0.75,-3.70 0/0:0.000:-0.05,-0.98,-5.00 0/0:0.000:-0.05,-0.97,-5.00 
0/0:0.000:-0.05,-0.93,-5.00 0/0:0.000:-0.08,-0.77,-5.00 0/0:0.000:-0.04,-1.03,-5.00 0/0:0.000:-0.02,-1.47,-5.00 0/0:0.000:-0.03,-1.16,-5.00 0/0:0.000:-0.01,-1.57,-5.00 0/0:0.000:-0.06,-0.89,-5.00 0/0:0.000:-0.06,-0.90,-5.00 0/0:0.000:-0.00,-2.30,-5.00 0/0:0.000:-0.05,-0.98,-5.00 0/0:0.000:-0.10,-0.69,-4.40 0/0:0.000:-0.01,-1.50,-5.00 0/0:0.000:-0.06,-0.87,-5.00 0/0:0.000:-0.16,-0.51,-2.37 0/0:0.000:-0.19,-0.46,-2.23 0/0:0.050:-0.27,-0.34,-3.22 0/0:0.000:-0.48,-0.48,-0.48 0/0:0.000:-0.08,-0.77,-4.40 0/0:0.000:-0.05,-0.94,-5.00 0/0:0.000:-0.05,-0.99,-5.00 0/0:0.000:-0.07,-0.82,-5.00 0/0:0.000:-0.12,-0.61,-2.91 0/0:0.000:-0.00,-1.98,-5.00 0/0:0.000:-0.03,-1.21,-5.00 0/0:0.000:-0.07,-0.82,-5.00 0/0:0.000:-0.48,-0.48,-0.48 0/0:0.000:-0.01,-1.49,-5.00 0/0:0.000:-0.12,-0.61,-3.15 0/0:0.000:-0.00,-2.08,-5.00 0/0:0.000:-0.02,-1.33,-5.00 0/0:0.000:-0.01,-1.74,-5.00 0/0:0.050:-0.11227,-0.642561,-4.22185 0/0:0.000:-0.22,-0.46,-1.28 0/0:0.000:-0.10,-0.70,-4.10 0/0:0.000:-0.10,-0.67,-3.62 0/0:0.000:-0.03,-1.18,-5.00 0/0:0.000:-0.03,-1.24,-5.00 0/0:0.000:-0.10,-0.70,-4.10 0/0:0.000:-0.01,-1.80,-5.00 0/0:0.000:-0.48,-0.48,-0.48 0/0:0.000:-0.02,-1.47,-5.00 0/0:0.000:-0.06,-0.88,-5.00 0/0:0.000:-0.03,-1.13,-5.00 0/0:0.000:-0.48,-0.48,-0.48 0/0:0.000:-0.18,-0.46,-2.69 0/0:0.000:-0.03,-1.13,-5.00 0/0:0.000:-0.48,-0.48,-0.48 0/0:0.000:-0.18,-0.48,-2.41 0/0:0.000:-0.18,-0.47,-2.85 0/0:0.050:-0.48,-0.48,-0.48 0/0:0.000:-0.08,-0.76,-4.70 0/0:0.000:-0.10,-0.69,-4.00 0/0:0.000:-0.11,-0.65,-3.62 0/0:0.000:-0.48,-0.48,-0.48 0/0:0.000:-0.01,-1.70,-5.00 0/0:0.000:-0.00,-2.57,-5.00
22 20000428 rs55902548 G T 100 PASS AC=323;AVGPOST=0.9983;AA=G;AN=2184;VT=SNP;RSQ=0.9949;LDAF=0.1473;SNPSOURCE=LOWCOV;ERATE=0.0003;THETA=0.0003;AF=0.15;ASN_AF=0.0017;AMR_AF=0.15;AFR_AF=0.31;EUR_AF=0.15 GT:DS:GL 1/0:1.000:-5.00,0.00,-5.00 0/0:0.000:-0.35,-0.43,-0.73 0/1:1.000:-1.81,-0.01,-2.95 0/0:0.000:-0.01,-1.79,-5.00 0/0:0.000:-0.06,-0.86,-5.00 1/0:1.000:-0.19,-0.46,-2.18 0/0:0.000:-0.10,-0.68,-5.00 0/1:1.000:-4.40,-0.03,-1.12 0/1:1.000:-5.00,-0.69,-0.10 0/0:0.000:-0.10,-0.69,-4.70 0/0:0.000:-0.48,-0.48,-0.48 0/1:1.000:-5.00,-0.01,-1.77 0/0:0.000:-0.18,-0.48,-2.57 0/0:0.000:-0.02,-1.31,-5.00 0/0:0.000:-0.11,-0.65,-4.70 0/0:0.000:-0.10,-0.68,-4.70 0/0:0.000:-0.01,-1.72,-5.00 1/0:1.000:-5.00,0.00,-5.00 1/0:1.000:-1.38,-0.02,-2.61 0/1:1.000:-5.00,-1.40,-0.02 0/0:0.000:-0.00,-2.97,-5.00 0/0:0.000:-0.19,-0.47,-2.15 0/0:0.000:-0.44,-0.46,-0.54 0/0:0.000:-0.00,-2.52,-5.00 0/0:0.000:-0.05,-0.93,-5.00 0/0:0.000:-0.01,-1.77,-5.00 0/1:0.750:-0.22,-0.46,-1.26 0/0:0.000:-0.03,-1.19,-5.00 0/0:0.000:-0.21,-0.46,-1.42 0/0:0.000:-0.19,-0.45,-2.20 0/0:0.000:-0.12,-0.63,-3.36 0/0:0.000:-0.00,-2.54,-5.00 0/0:0.000:-0.00,-2.88,-5.00 1/0:1.000:-2.10,-0.00,-4.00 1/0:1.250:-5.00,-1.62,-0.01 0/1:1.000:-3.06,-0.47,-0.18 0/1:1.000:-5.00,-0.87,-0.06 1/1:2.000:-5.00,-0.84,-0.07 0/0:0.000:-0.05,-0.95,-5.00 0/0:0.000:-0.02,-1.46,-5.00 0/0:0.000:-0.14,-0.58,-2.17 0/0:0.000:-0.01,-1.77,-5.00 0/0:0.000:-0.06,-0.91,-5.00 1/0:1.000:-5,0,-5 0/0:0.000:-0.05,-0.95,-5.00 1/0:1.000:-4.70,-0.70,-0.10 0/0:0.000:-0.09,-0.72,-4.70 0/0:0.000:-0.01,-1.54,-5.00 1/1:2.000:-5.00,-0.68,-0.10 0/0:0.000:-0.18,-0.48,-2.33 0/0:0.000:-0.00465443,-1.97224,-5 0/0:0.000:-0.48,-0.48,-0.48 1/0:1.000:-5.00,-0.93,-0.05 0/0:0.000:-0.00,-2.82,-5.00 1/1:2.000:-5.00,-1.67,-0.01 0/0:0.000:-0.05,-0.95,-5.00 1/0:1.000:-5.00,-0.87,-0.06 0/0:0.000:-0.01,-1.78,-5.00 0/0:0.000:-0.01,-1.83,-5.00 0/1:1.000:-5.00,-0.00,-3.70 0/1:0.950:-0.15,-0.53,-2.39 0/0:0.000:-0.09,-0.71,-4.70 0/0:0.000:-0.18,-0.48,-2.56 0/0:0.000:-0.00,-2.60,-5.00 
0/0:0.000:-0.01,-1.49,-5.00 0/0:0.000:-0.48,-0.48,-0.48 0/0:0.000:-0.04,-1.09,-5.00 0/0:0.000:-0.00,-2.40,-5.00 0/0:0.000:-0.06,-0.92,-5.00 0/0:0.000:-0.01,-1.54,-5.00 0/0:0.000:-0.02,-1.45,-5.00 0/0:0.000:-0.01,-1.84,-5.00 0/1:1.000:-2.39,-0.30,-0.31 0/0:0.000:-0.03,-1.20,-5.00 0/0:0.000:-0.11,-0.64,-3.44 0/0:0.000:-0.02,-1.42,-5.00 0/0:0.000:-0.01,-1.76,-5.00 0/0:0.000:-0.03,-1.17,-5.00 0/1:1.000:-5.00,-0.03,-1.18 0/0:0.000:-0.03,-1.21,-5.00 0/0:0.000:-0.10,-0.70,-5.00 0/1:1.000:-5.00,0.00,-5.00 0/0:0.000:-0.01,-1.58,-5.00 0/1:1.000:-5.00,0.00,-5.00 1/0:1.000:-5,-5.21375e-05,-3.92082 0/0:0.000:-0.05,-0.92,-5.00 0/0:0.000:-0.02,-1.45,-5.00 0/1:1.000:-5.00,-0.00,-4.22 0/0:0.000:-0.09,-0.73,-4.22 0/0:0.000:-0.00,-2.03,-5.00 0/0:0.000:-0.03,-1.20,-5.00 0/0:0.000:-0.03,-1.19,-5.00 0/0:0.000:-0.18,-0.48,-2.03 0/0:0.000:-0.01,-1.59,-5.00 0/0:0.000:-0.37,-0.42,-0.71 0/0:0.000:-0.02,-1.45,-5.00 0/0:0.000:-0.00,-3.36,-5.00 1/0:1.000:-3.85,-0.05,-0.94 1/0:1.000:-5.00,-0.84,-0.07 1/0:1.000:-0.06,-0.90,-5.00 0/0:0.000:-0.00,-2.56,-5.00 0/0:0.000:-0.19,-0.47,-1.73 0/0:0.000:-0.06,-0.90,-5.00 0/0:0.000:-0.00,-3.74,-5.00 0/0:0.000:-0.22,-0.46,-1.26 1/0:1.000:-5.00,-0.00,-3.92 0/0:0.000:-0.10,-0.69,-3.85 0/0:0.000:-0.00,-2.76,-5.00 0/0:0.000:-0.01,-1.70,-5.00 0/0:0.000:-0.02,-1.46,-5.00 0/0:0.000:-0.06,-0.92,-5.00 0/0:0.000:-0.01,-1.73,-5.00 0/0:0.000:-0.05,-1.00,-5.00 1/0:1.000:-2.32,-0.01,-1.58 0/1:0.950:-0.09,-0.73,-4.70 0/1:1.000:-2.25,-0.02,-1.52 0/0:0.000:-0.01,-1.50,-5.00 0/0:0.000:-0.10,-0.70,-4.40 0/0:0.000:-0.05,-0.99,-5.00 0/0:0.000:-0.03,-1.19,-5.00 0/0:0.000:-0.03,-1.25,-5.00 0/0:0.000:-0.10,-0.67,-4.10 0/1:1.000:-5.00,-0.00,-4.22 0/0:0.000:-0.18,-0.48,-2.02 0/0:0.000:-0.05,-0.94,-5.00 0/0:0.000:-0.00,-4.70,-5.00 0/0:0.000:-0.03,-1.23,-5.00 0/0:0.000:-0.05,-0.93,-5.00 0/0:0.000:-0.03,-1.20,-5.00 1/0:1.000:-0.37,-0.24,-2.87 0/0:0.000:-0.05,-0.94,-5.00 0/0:0.000:-0.18,-0.48,-2.36 0/0:0.000:-0.18,-0.48,-1.83 0/0:0.000:-0.18,-0.48,-2.23 0/0:0.000:-0.01,-1.78,-5.00 
0/0:0.000:-0.06,-0.90,-5.00 0/0:0.000:-0.07,-0.82,-5.00 0/0:0.000:-0.10,-0.70,-4.70 0/0:0.000:-0.01,-1.48,-5.00 0/0:0.000:-0.02,-1.46,-5.00 0/0:0.000:-0.18,-0.48,-2.11 0/0:0.000:-0.10,-0.69,-4.70 0/0:0.000:-0.18,-0.47,-2.84 0/0:0.000:-0.03,-1.21,-5.00 1/0:1.000:-4.40,-0.00,-4.70 0/1:0.800:-0.03,-1.20,-5.00 0/0:0.000:-0.05,-0.98,-5.00 0/0:0.000:-0.10,-0.68,-3.70 0/0:0.000:-0.10,-0.67,-5.00 0/0:0.000:-0.48,-0.48,-0.48 0/0:0.000:-0.10,-0.68,-3.59 0/0:0.000:-0.02,-1.26,-5.00 0/0:0.000:-0.48,-0.48,-0.48 0/0:0.000:-0.01,-1.80,-5.00 1/0:1.000:-5.00,-0.66,-0.11 0/0:0.000:-0.12,-0.61,-2.49 1/1:2.000:-5.00,-1.30,-0.02 0/0:0.000:-0.04,-1.07,-5.00 0/0:0.000:-0.18,-0.48,-2.19 0/0:0.000:-0.07,-0.85,-5.00 1/0:1.000:-2.07,-0.04,-1.10 0/1:1.000:-0.09,-0.74,-4.70 0/1:1.000:-5.00,-0.00,-2.47 0/0:0.000:-0.02,-1.26,-5.00 1/0:1.000:-2.25,-0.01,-1.55 0/0:0.000:-0.00,-3.17,-5.00 0/0:0.000:-0.203967,-0.438136,-1.99396 0/0:0.000:-0.12,-0.62,-3.18 0/0:0.000:-0.03,-1.19,-5.00 0/0:0.000:-0.01,-1.66,-5.00 1/0:1.000:-5.00,-0.00,-3.74 1/0:1.000:-1.43,-0.02,-4.70 0/0:0.000:-0.05,-0.96,-5.00 0/0:0.000:-0.09,-0.74,-4.70 0/0:0.000:-0.48,-0.48,-0.48 1/0:1.000:-5.00,-0.18,-0.47 0/0:0.000:-0.17,-0.50,-2.70 0/1:1.000:-5.00,-0.00,-3.80 0/1:1.000:-2.22,-0.01,-1.97 0/1:1.000:-5.00,-0.67,-0.10 0/0:0.000:-0.21,-0.43,-2.01 0/0:0.000:-0.19,-0.47,-1.83 0/0:0.000:-0.10,-0.69,-4.40 0/0:0.000:-0.11,-0.64,-4.10 0/0:0.000:-0.18,-0.47,-2.80 0/0:0.000:-0.01,-1.47,-5.00 0/0:0.000:-0.22,-0.43,-1.61 0/0:0.000:-0.02,-1.47,-5.00 0/0:0.000:-0.01,-1.70,-5.00 0/0:0.000:-0.05,-0.97,-5.00 0/0:0.000:-0.02,-1.28,-5.00
We would like to convert the genotype data (GT field) into a matrix of minor allele counts.
Genetic model
There are different SNP models. The additive SNP model counts the number of minor alleles (0, 1 or 2) per genotype. The other SNP models are dominant and recessive, both defined in terms of the minor allele. VCFTools.jl always assumes that the ALT allele in the VCF file is the minor allele. Genotypes are therefore translated to real numbers according to the following table:
Genotype | VCF GT | model=:additive | model=:dominant | model=:recessive |
---|---|---|---|---|
ALT, ALT | 1/1, 1$\vert$1 | 2 | 1 | 1 |
REF, ALT | 0/1, 0$\vert$1 | 1 | 1 | 0 |
REF, REF | 0/0, 0$\vert$0 | 0 | 0 | 0 |
missing | ./., .$\vert$. | missing | missing | missing |
To properly record missing genotypes, VCFTools.jl converts the VCF GT data to a matrix A whose elements are either numbers or missing values. In Julia, this means eltype(A) <: Union{Missing, Real}, where <: means "is a subtype of".
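The table above can be sketched in plain Julia as follows. The helper encode_genotype is hypothetical and is not part of VCFTools.jl; it only illustrates how the two allele codes of a GT entry (0 = REF, 1 = ALT, missing for ".") map to a number under each model, and how a missing genotype propagates.

```julia
# Hypothetical helper (NOT a VCFTools.jl function): encode one genotype from
# its two allele codes (0 = REF, 1 = ALT, missing = "." in the GT field).
function encode_genotype(a1, a2; model::Symbol = :additive)
    (ismissing(a1) || ismissing(a2)) && return missing
    alt_count = a1 + a2                     # number of ALT alleles: 0, 1 or 2
    if model == :additive
        return alt_count
    elseif model == :dominant
        return alt_count >= 1 ? 1 : 0       # at least one ALT allele
    elseif model == :recessive
        return alt_count == 2 ? 1 : 0       # both alleles are ALT
    else
        throw(ArgumentError("unknown model: $model"))
    end
end

# The four rows of the table: ALT/ALT, REF/ALT, REF/REF, missing
genotypes = [(1, 1), (0, 1), (0, 0), (missing, missing)]
additive  = [encode_genotype(a, b; model = :additive)  for (a, b) in genotypes]
dominant  = [encode_genotype(a, b; model = :dominant)  for (a, b) in genotypes]
recessive = [encode_genotype(a, b; model = :recessive) for (a, b) in genotypes]
```

Because one entry is missing, each result vector has an element type that is a subtype of Union{Missing, Real}, which mirrors the element type of the matrices returned below.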
Convert GT data into a numeric genotype matrix
The following converts the GT data in the VCF file test.08Jun17.d8b.vcf.gz to a Matrix{Union{Missing, Int8}}. VCFTools.jl copies the 0s and 1s of the file directly into A without checking whether ALT or REF is the minor allele.
@time A = convert_gt(Int8, "test.08Jun17.d8b.vcf.gz"; model = :additive, impute = false, center = false, scale = false)
1.760590 seconds (2.06 M allocations: 118.037 MiB, 1.18% gc time, 95.14% compilation time)
191×1356 Matrix{Union{Missing, Int8}}:
0 0 0 0 1 0 0 0 0 0 0 0 2 … 0 0 0 0 1 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 1 0 0 0 0 0 0 0 2 0 0 0 0 1 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 2 0 0 0 0 1 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 1 0 0 0 0 0 0 0 2 … 0 0 0 0 1 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 2 0 0 0 0 0 0 0 0 0 1 1 0
0 0 0 0 1 0 0 0 0 0 0 0 2 0 0 0 0 1 0 0 0 0 0 0 0
0 0 0 0 1 0 0 0 0 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 2 … 0 0 1 0 1 0 0 0 0 1 1 0
0 0 0 0 1 0 0 0 0 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 2 0 0 0 0 0 0 0 0 0 1 1 0
⋮ ⋮ ⋮ ⋱ ⋮ ⋮ ⋮
0 0 0 0 1 0 0 0 0 0 0 0 2 0 0 1 0 0 0 0 0 0 1 1 0
0 0 0 0 0 0 0 0 0 0 0 0 2 … 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 2 … 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 1 0 2 … 0 0 0 0 0 0 0 0 0 0 0 0
We can also optionally impute missing genotypes according to allele frequency, center the dosages around 2MAF, and scale the dosages by sqrt(2MAF*(1-MAF)).
@time A = convert_gt(Float64, "test.08Jun17.d8b.vcf.gz"; model = :additive, impute = true, center = true, scale = true)
0.148316 seconds (523.76 k allocations: 42.592 MiB, 4.85% gc time, 47.83% compilation time)
191×1356 Matrix{Union{Missing, Float64}}:
0.0 0.0 0.0 0.0 1.41301 0.0 … 0.0 0.0 -0.390016 -0.390016 0.0
0.0 0.0 0.0 0.0 -0.586138 0.0 0.0 0.0 -0.390016 -0.390016 0.0
0.0 0.0 0.0 0.0 1.41301 0.0 0.0 0.0 -0.390016 -0.390016 0.0
0.0 0.0 0.0 0.0 -0.586138 0.0 0.0 0.0 -0.390016 -0.390016 0.0
0.0 0.0 0.0 0.0 -0.586138 0.0 0.0 0.0 -0.390016 -0.390016 0.0
0.0 0.0 0.0 0.0 1.41301 0.0 … 0.0 0.0 -0.390016 -0.390016 0.0
0.0 0.0 0.0 0.0 -0.586138 0.0 0.0 0.0 2.36899 2.36899 0.0
0.0 0.0 0.0 0.0 1.41301 0.0 0.0 0.0 -0.390016 -0.390016 0.0
0.0 0.0 0.0 0.0 1.41301 0.0 0.0 0.0 -0.390016 -0.390016 0.0
0.0 0.0 0.0 0.0 -0.586138 0.0 0.0 0.0 -0.390016 -0.390016 0.0
0.0 0.0 0.0 0.0 -0.586138 0.0 … 0.0 0.0 2.36899 2.36899 0.0
0.0 0.0 0.0 0.0 1.41301 0.0 0.0 0.0 -0.390016 -0.390016 0.0
0.0 0.0 0.0 0.0 -0.586138 0.0 0.0 0.0 2.36899 2.36899 0.0
⋮ ⋮ ⋱ ⋮
0.0 0.0 0.0 0.0 1.41301 0.0 0.0 0.0 2.36899 2.36899 0.0
0.0 0.0 0.0 0.0 -0.586138 0.0 … 0.0 0.0 -0.390016 -0.390016 0.0
0.0 0.0 0.0 0.0 -0.586138 0.0 0.0 0.0 -0.390016 -0.390016 0.0
0.0 0.0 0.0 0.0 -0.586138 0.0 0.0 0.0 -0.390016 -0.390016 0.0
0.0 0.0 0.0 0.0 -0.586138 0.0 0.0 0.0 -0.390016 -0.390016 0.0
0.0 0.0 0.0 0.0 -0.586138 0.0 0.0 0.0 -0.390016 -0.390016 0.0
0.0 0.0 0.0 0.0 -0.586138 0.0 … 0.0 0.0 -0.390016 -0.390016 0.0
0.0 0.0 0.0 0.0 -0.586138 0.0 0.0 0.0 -0.390016 -0.390016 0.0
0.0 0.0 0.0 0.0 -0.586138 0.0 0.0 0.0 -0.390016 -0.390016 0.0
0.0 0.0 0.0 0.0 -0.586138 0.0 0.0 0.0 -0.390016 -0.390016 0.0
0.0 0.0 0.0 0.0 -0.586138 0.0 0.0 0.0 -0.390016 -0.390016 0.0
0.0 0.0 0.0 0.0 -0.586138 0.0 … 0.0 0.0 -0.390016 -0.390016 0.0
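As a rough illustration of the centering and scaling formulas, the sketch below standardizes a single toy marker by hand. The data are made up, and replacing missing values by 2MAF is an assumption made for this example; the imputation scheme used internally by VCFTools.jl may differ.

```julia
# Toy additive genotypes for one marker; `missing` marks an untyped sample.
g = Union{Missing, Float64}[0, 1, 2, missing, 0, 1, 0, 0]

# Allele frequency estimated from the observed genotypes
# (each sample carries two alleles).
observed = collect(skipmissing(g))
maf = sum(observed) / (2 * length(observed))

# Assumed imputation scheme: replace missing values by the expected count 2*maf.
g_imputed = coalesce.(g, 2maf)

# Center around 2*maf and scale by sqrt(2*maf*(1 - maf)).
g_std = (g_imputed .- 2maf) ./ sqrt(2maf * (1 - maf))
```

Under this scheme an imputed entry becomes exactly zero after centering, and the standardized vector sums to zero up to floating-point error.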
Convert GT data into haplotype panels
In certain applications such as genotype imputation, the genotypes are phased. In this case, the alleles are separated by a '|', so we can distinguish the two heterozygous genotypes (1|0 vs 0|1). Haplotypes can be imported via convert_ht:
@time H = convert_ht(Int8, "test.08Jun17.d8b.vcf.gz")
0.160939 seconds (732.86 k allocations: 52.687 MiB, 5.00% gc time, 50.10% compilation time)
382×1356 Matrix{Int8}:
0 0 0 0 1 0 0 0 0 0 0 0 1 … 0 0 0 0 1 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 1 0 0 0 0 0 0 0 1 … 0 0 0 0 1 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 1 0 0 0 0 0 0 0 1 … 0 0 0 0 1 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0
⋮ ⋮ ⋮ ⋱ ⋮ ⋮ ⋮
0 0 0 0 0 0 0 0 0 0 0 0 1 … 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 1 … 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 1 … 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0
Note that there are 382 rows, representing the two haplotypes of each of the 191 samples. The first sample occupies rows 1 and 2, the second sample occupies rows 3 and 4, and so on.
VCFTools.jl does not check whether the allele separators are '|' when you call convert_ht, so take care to run this function only on phased data.
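The row layout described above can be illustrated with a small toy panel in plain Julia. The matrix below is invented for illustration; it only shows that rows 2i-1 and 2i belong to sample i, and that adding the two haplotypes of a sample gives its additive genotype.

```julia
# Toy haplotype panel: 3 samples x 4 markers, two consecutive rows per sample.
H = Int8[0 1 0 1;
         0 0 0 1;
         1 1 0 0;
         0 1 0 0;
         0 0 1 0;
         0 0 1 1]

n = size(H, 1) ÷ 2           # number of samples

# Rows 2i-1 and 2i hold the two haplotypes of sample i.
first_hap  = H[1:2:end, :]
second_hap = H[2:2:end, :]

# Summing the two haplotypes gives the ALT allele count per genotype.
G = first_hap .+ second_hap
```

The same relation is visible in the outputs above: the first sample has a 1 and a 0 at marker 5 in H, and a 1 at marker 5 in the additive genotype matrix.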
Convert DS data into dosages
VCF files often contain dosage information, stored here in the DS field. In this case, matrix values can be any number between 0 and 2. Dosages can be imported into a numeric matrix in the same way.
@time D = convert_ds(Float64, "test.08Jun17.d8b.vcf.gz"; key="DS", impute=false, center=false, scale=false)
0.204783 seconds (1.56 M allocations: 121.663 MiB, 10.37% gc time, 31.82% compilation time)
191×1356 Matrix{Union{Missing, Float64}}:
0.0 0.0 0.0 0.0 1.0 0.0 0.0 … 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.05 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 1.0 0.0 0.0 … 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 1.0 0.0
0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 … 0.0 0.0 0.0 0.0 1.0 1.0 0.0
0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 1.0 0.0
⋮ ⋮ ⋱ ⋮ ⋮
0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 1.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 … 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.05 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.05 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.05 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 … 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 … 0.0 0.0 0.0 0.0 0.0 0.0 0.0
Extract data marker-by-marker or window-by-window
Large VCF files easily generate numeric arrays that cannot fit into computer memory. Many analyses only need to loop over markers or sets of markers. Previous functions for importing genotypes/haplotypes/dosages have equivalent functions to achieve this:
- copy_gt! loops over genotypes
- copy_ht! loops over haplotypes
- copy_ds! loops over dosages
For example, to loop over all genotype markers in the VCF file test.08Jun17.d8b.vcf.gz:
using VariantCallFormat
# initialize VCF reader
people, snps = nsamples("test.08Jun17.d8b.vcf.gz"), nrecords("test.08Jun17.d8b.vcf.gz")
reader = VCF.Reader(openvcf("test.08Jun17.d8b.vcf.gz"))
# pre-allocate vector for marker data
g = zeros(Union{Missing, Float64}, people)
for j = 1:snps
    # fill g with the next marker's (imputed, centered, scaled) genotypes
    copy_gt!(g, reader; model = :additive, impute = true, center = true, scale = true)
    # do statistical analysis
end
close(reader)
To loop over markers in windows of size 25:
# initialize VCF reader
people, snps = nsamples("test.08Jun17.d8b.vcf.gz"), nrecords("test.08Jun17.d8b.vcf.gz")
reader = VCF.Reader(openvcf("test.08Jun17.d8b.vcf.gz"))
# pre-allocate matrix for marker data
windowsize = 25
g = zeros(Union{Missing, Float64}, people, windowsize)
nwindows = ceil(Int, snps / windowsize)
for j = 1:nwindows
    # fill g with the next window of up to 25 markers
    copy_gt!(g, reader; model = :additive, impute = true, center = true, scale = true)
    # do statistical analysis
end
close(reader)
┌ Warning: Reached end of reader; columns 7-25 are set to missing values
└ @ VCFTools /Users/biona001/.julia/dev/VCFTools/src/convert.jl:72
As the warning suggests, the last window has fewer than 25 markers. The remaining columns of the matrix g are set to missing values.
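The window arithmetic behind this warning can be checked by hand. The short sketch below uses only the marker count (1356) and the window size (25) from the example; the variable names last_count and window_ranges are introduced here for illustration.

```julia
snps, windowsize = 1356, 25

# Number of windows needed to cover all markers.
nwindows = ceil(Int, snps / windowsize)

# Markers left for the last window after the full windows are filled.
last_count = snps - (nwindows - 1) * windowsize

# Marker index range covered by each window; the last range is shorter.
window_ranges = [((w - 1) * windowsize + 1):min(w * windowsize, snps) for w in 1:nwindows]
```

With these numbers there are 55 windows and the last one holds 6 markers, which is why columns 7-25 of g are reported as missing.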
Sample ID, Chromosome, SNP position, REF/ALT alleles
To extract the sample IDs without looping over the entire VCF file, use sampleID:
ids = sampleID("test.08Jun17.d8b.vcf.gz")
191-element Vector{String}:
"HG00096"
"HG00097"
"HG00099"
"HG00100"
"HG00101"
"HG00102"
"HG00103"
"HG00104"
"HG00106"
"HG00108"
"HG00109"
"HG00110"
"HG00111"
⋮
"HG00383"
"HG00384"
"HG00403"
"HG00404"
"HG00406"
"HG00407"
"HG00418"
"HG00419"
"HG00421"
"HG00422"
"HG00427"
"HG00428"
However, to extract each SNP's (record's) chromosome, position, or REF/ALT alleles, one must loop over the entire VCF file. This is done with the optional argument save_snp_info = true, which can be supplied to the functions convert_gt, convert_ht, and convert_ds.
@time X, X_sampleID, X_chr, X_pos, X_ids, X_ref, X_alt = convert_gt(Float64,
"test.08Jun17.d8b.vcf.gz", save_snp_info=true)
0.088368 seconds (536.26 k allocations: 43.550 MiB, 9.98% gc time)
(Union{Missing, Float64}[0.0 0.0 … 0.0 0.0; 0.0 0.0 … 0.0 0.0; … ; 0.0 0.0 … 0.0 0.0; 0.0 0.0 … 0.0 0.0], ["HG00096", "HG00097", "HG00099", "HG00100", "HG00101", "HG00102", "HG00103", "HG00104", "HG00106", "HG00108" … "HG00403", "HG00404", "HG00406", "HG00407", "HG00418", "HG00419", "HG00421", "HG00422", "HG00427", "HG00428"], ["22", "22", "22", "22", "22", "22", "22", "22", "22", "22" … "22", "22", "22", "22", "22", "22", "22", "22", "22", "22"], [20000086, 20000146, 20000199, 20000291, 20000428, 20000683, 20000771, 20000793, 20000810, 20000814 … 20099406, 20099579, 20099654, 20099659, 20099660, 20099674, 20099716, 20099752, 20099891, 20099941], [["rs138720731"], ["rs73387790"], ["rs183293480"], ["rs185807825"], ["rs55902548"], ["rs142720028"], ["rs114690707"], ["rs189842693"], ["rs147349046"], ["rs183154520"] … ["rs41281429"], ["rs145947632"], ["rs9605066"], ["rs142467695"], ["rs74605905"], ["rs145967409"], ["rs139838034"], ["rs73389792"], ["rs1048659"], ["rs113958995"]], ["T", "G", "A", "G", "G", "A", "A", "T", "C", "T" … "G", "CCA", "C", "C", "C", "T", "C", "G", "C", "T"], [["C"], ["A"], ["C"], ["T"], ["T"], ["G"], ["C"], ["C"], ["T"], ["C"] … ["C"], ["C"], ["T"], ["T"], ["T"], ["C"], ["G"], ["T"], ["G"], ["A"]])
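Once the SNP information is available, it can be used to subset the numeric matrix. The sketch below is a toy example in plain Julia: pos and ids copy the first five positions and IDs from the output above (with the IDs flattened to plain strings for simplicity), and X_toy is a small stand-in matrix rather than the full X.

```julia
# First five positions and IDs, copied from the output above.
pos = [20000086, 20000146, 20000199, 20000291, 20000428]
ids = ["rs138720731", "rs73387790", "rs183293480", "rs185807825", "rs55902548"]

# Stand-in for the first two rows and five columns of the genotype matrix.
X_toy = [0.0 0.0 0.0 0.0 1.0;
         0.0 0.0 0.0 0.0 0.0]

# Keep only markers whose position lies in an (arbitrarily chosen) interval.
keep = findall(p -> 20000100 <= p <= 20000300, pos)
X_sub, ids_sub = X_toy[:, keep], ids[keep]
```

The same indexing pattern applies to the full outputs, since the columns of X are in the same order as X_pos, X_ids, X_ref and X_alt.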