Convert genotype/haplotype/dosage to numeric arrays

Most often we need to convert genetic data to numeric arrays for statistical analysis.

Example VCF file

We need an example VCF file for demonstration. You can download it manually (877KB) from the URL used in the command below and put the file in your current working directory. Or, within Julia,

isfile("test.08Jun17.d8b.vcf.gz") || download("http://faculty.washington.edu/browning/beagle/test.08Jun17.d8b.vcf.gz", 
    joinpath(pwd(), "test.08Jun17.d8b.vcf.gz"))
stat("test.08Jun17.d8b.vcf.gz")
StatStruct(mode=0o100644, size=876514)

The header (30 lines) and the first 5 markers in this VCF file are

using VCFTools

fh = openvcf("test.08Jun17.d8b.vcf.gz", "r")
for l in 1:35
    println(readline(fh))
end
close(fh)
##fileformat=VCFv4.1
##INFO=<ID=LDAF,Number=1,Type=Float,Description="MLE Allele Frequency Accounting for LD">
##INFO=<ID=AVGPOST,Number=1,Type=Float,Description="Average posterior probability from MaCH/Thunder">
##INFO=<ID=RSQ,Number=1,Type=Float,Description="Genotype imputation quality from MaCH/Thunder">
##INFO=<ID=ERATE,Number=1,Type=Float,Description="Per-marker Mutation rate from MaCH/Thunder">
##INFO=<ID=THETA,Number=1,Type=Float,Description="Per-marker Transition rate from MaCH/Thunder">
##INFO=<ID=CIEND,Number=2,Type=Integer,Description="Confidence interval around END for imprecise variants">
##INFO=<ID=CIPOS,Number=2,Type=Integer,Description="Confidence interval around POS for imprecise variants">
##INFO=<ID=END,Number=1,Type=Integer,Description="End position of the variant described in this record">
##INFO=<ID=HOMLEN,Number=.,Type=Integer,Description="Length of base pair identical micro-homology at event breakpoints">
##INFO=<ID=HOMSEQ,Number=.,Type=String,Description="Sequence of base pair identical micro-homology at event breakpoints">
##INFO=<ID=SVLEN,Number=1,Type=Integer,Description="Difference in length between REF and ALT alleles">
##INFO=<ID=SVTYPE,Number=1,Type=String,Description="Type of structural variant">
##INFO=<ID=AC,Number=.,Type=Integer,Description="Alternate Allele Count">
##INFO=<ID=AN,Number=1,Type=Integer,Description="Total Allele Count">
##ALT=<ID=DEL,Description="Deletion">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=DS,Number=1,Type=Float,Description="Genotype dosage from MaCH/Thunder">
##FORMAT=<ID=GL,Number=.,Type=Float,Description="Genotype Likelihoods">
##INFO=<ID=AA,Number=1,Type=String,Description="Ancestral Allele, ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/pilot_data/technical/reference/ancestral_alignments/README">
##INFO=<ID=AF,Number=1,Type=Float,Description="Global Allele Frequency based on AC/AN">
##INFO=<ID=AMR_AF,Number=1,Type=Float,Description="Allele Frequency for samples from AMR based on AC/AN">
##INFO=<ID=ASN_AF,Number=1,Type=Float,Description="Allele Frequency for samples from ASN based on AC/AN">
##INFO=<ID=AFR_AF,Number=1,Type=Float,Description="Allele Frequency for samples from AFR based on AC/AN">
##INFO=<ID=EUR_AF,Number=1,Type=Float,Description="Allele Frequency for samples from EUR based on AC/AN">
##INFO=<ID=VT,Number=1,Type=String,Description="indicates what type of variant the line represents">
##INFO=<ID=SNPSOURCE,Number=.,Type=String,Description="indicates if a snp was called when analysing the low coverage or exome alignment data">
##reference=GRCh37
##reference=GRCh37
#CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO	FORMAT	HG00096	HG00097	HG00099	HG00100	HG00101	HG00102	HG00103	HG00104	HG00106	HG00108	HG00109	HG00110	HG00111	HG00112	HG00113	HG00114	HG00116	HG00117	HG00118	HG00119	HG00120	HG00121	HG00122	HG00123	HG00124	HG00125	HG00126	HG00127	HG00128	HG00129	HG00130	HG00131	HG00133	HG00134	HG00135	HG00136	HG00137	HG00138	HG00139	HG00140	HG00141	HG00142	HG00143	HG00146	HG00148	HG00149	HG00150	HG00151	HG00152	HG00154	HG00155	HG00156	HG00158	HG00159	HG00160	HG00171	HG00173	HG00174	HG00176	HG00177	HG00178	HG00179	HG00180	HG00182	HG00183	HG00185	HG00186	HG00187	HG00188	HG00189	HG00190	HG00231	HG00232	HG00233	HG00234	HG00235	HG00236	HG00237	HG00238	HG00239	HG00240	HG00242	HG00243	HG00244	HG00245	HG00246	HG00247	HG00249	HG00250	HG00251	HG00252	HG00253	HG00254	HG00255	HG00256	HG00257	HG00258	HG00259	HG00260	HG00261	HG00262	HG00263	HG00264	HG00265	HG00266	HG00267	HG00268	HG00269	HG00270	HG00271	HG00272	HG00273	HG00274	HG00275	HG00276	HG00277	HG00278	HG00280	HG00281	HG00282	HG00284	HG00285	HG00306	HG00309	HG00310	HG00311	HG00312	HG00313	HG00315	HG00318	HG00319	HG00320	HG00321	HG00323	HG00324	HG00325	HG00326	HG00327	HG00328	HG00329	HG00330	HG00331	HG00332	HG00334	HG00335	HG00336	HG00337	HG00338	HG00339	HG00341	HG00342	HG00343	HG00344	HG00345	HG00346	HG00349	HG00350	HG00351	HG00353	HG00355	HG00356	HG00357	HG00358	HG00359	HG00360	HG00361	HG00362	HG00364	HG00366	HG00367	HG00369	HG00372	HG00373	HG00375	HG00376	HG00377	HG00378	HG00381	HG00382	HG00383	HG00384	HG00403	HG00404	HG00406	HG00407	HG00418	HG00419	HG00421	HG00422	HG00427	HG00428
22	20000086	rs138720731	T	C	100	PASS	AC=7;RSQ=0.8454;AVGPOST=0.9983;AA=T;AN=2184;LDAF=0.0040;THETA=0.0001;VT=SNP;SNPSOURCE=LOWCOV;ERATE=0.0003;AF=0.0032;AFR_AF=0.01	GT:DS:GL	0/0:0.000:-0.03,-1.19,-5.00	0/0:0.000:-0.04,-1.05,-5.00	0/0:0.000:-0.07,-0.85,-5.00	0/0:0.000:-0.03,-1.18,-5.00	0/0:0.000:-0.06,-0.87,-5.00	0/0:0.000:-0.03,-1.14,-5.00	0/0:0.000:-0.06,-0.90,-5.00	0/0:0.000:-0.23,-0.45,-1.28	0/0:0.000:-0.03,-1.20,-5.00	0/0:0.000:-0.48,-0.48,-0.48	0/0:0.000:-0.11,-0.65,-4.40	0/0:0.000:-0.48,-0.48,-0.48	0/0:0.000:-0.06,-0.91,-5.00	0/0:0.000:-0.18,-0.47,-2.54	0/0:0.000:-0.48,-0.48,-0.48	0/0:0.000:-0.06,-0.90,-5.00	0/0:0.000:-0.01,-1.74,-5.00	0/0:0.000:-0.00,-3.66,-5.00	0/0:0.000:-0.48,-0.48,-0.48	0/0:0.000:-0.00,-2.53,-5.00	0/0:0.000:-0.09,-0.73,-5.00	0/0:0.000:-0.48,-0.48,-0.48	0/0:0.000:-0.48,-0.48,-0.48	0/0:0.000:-0.00,-3.11,-5.00	0/0:0.000:-0.06,-0.89,-5.00	0/0:0.000:-0.09,-0.71,-4.10	0/0:0.000:-0.11,-0.65,-4.40	0/0:0.000:-0.18,-0.47,-2.34	0/0:0.000:-0.22,-0.45,-1.32	0/0:0.000:-0.02,-1.29,-5.00	0/0:0.000:-0.03,-1.15,-5.00	0/0:0.000:-0.02,-1.45,-5.00	0/0:0.000:-0.00,-3.34,-5.00	0/0:0.000:-0.12,-0.61,-3.19	0/0:0.000:-0.11,-0.67,-4.40	0/0:0.000:-0.05,-0.99,-5.00	0/0:0.000:-0.18,-0.48,-2.15	0/0:0.000:-0.01,-1.47,-5.00	0/0:0.000:-0.10,-0.67,-3.62	0/0:0.000:-0.03,-1.14,-5.00	0/0:0.000:-0.09,-0.73,-4.40	0/0:0.000:-0.07,-0.84,-4.40	0/0:0.000:-0.18,-0.48,-2.46	0/0:0.000:-0.0292813,-1.18575,-5	0/0:0.000:-0.48,-0.48,-0.48	0/0:0.000:-0.01,-1.67,-5.00	0/0:0.000:-0.18,-0.47,-2.40	0/0:0.000:-0.03,-1.25,-5.00	0/0:0.000:-0.11,-0.66,-3.44	0/0:0.000:-0.09,-0.73,-4.70	0/0:0.000:-0.0418663,-1.03687,-4.39794	0/0:0.000:-0.08,-0.79,-3.14	0/0:0.000:-0.00,-2.30,-5.00	0/0:0.000:-0.00,-2.54,-5.00	0/0:0.000:-0.03,-1.21,-5.00	0/0:0.000:-0.06,-0.86,-5.00	0/0:0.000:-0.09,-0.71,-4.70	0/0:0.000:-0.01,-1.49,-5.00	0/0:0.000:-0.01,-1.88,-5.00	0/0:0.000:-0.09,-0.71,-4.70	0/0:0.000:-0.03,-1.19,-5.00	0/0:0.000:-0.10,-0.67,-4.40	0/0:0.000:-0.01,-1.51,-5.00	0/0:0.000:-0.02,-1.40,-5.00	
0/0:0.000:-0.03,-1.17,-5.00	0/0:0.000:-0.05,-0.93,-5.00	0/0:0.000:-0.01,-1.48,-5.00	0/0:0.000:-0.00,-2.02,-5.00	0/0:0.000:-0.03,-1.18,-5.00	0/0:0.000:-0.02,-1.46,-5.00	0/0:0.000:-0.03,-1.17,-5.00	0/0:0.050:-0.18,-0.47,-2.73	0/0:0.000:-0.17,-0.49,-2.97	0/0:0.000:-0.10,-0.68,-4.40	0/0:0.000:-0.05,-0.99,-5.00	0/0:0.000:-0.12,-0.62,-3.38	0/0:0.000:-0.00,-2.06,-5.00	0/0:0.000:-0.16,-0.51,-2.66	0/0:0.000:-0.11,-0.64,-4.22	0/0:0.000:-0.03,-1.22,-5.00	0/0:0.000:-0.01,-1.64,-5.00	0/0:0.000:-0.00,-2.85,-5.00	0/0:0.000:-0.02,-1.38,-5.00	0/0:0.000:-0.05,-0.94,-5.00	0/0:0.000:-0.0311436,-1.15989,-5	0/0:0.000:-0.36,-0.42,-0.73	0/0:0.000:-0.01,-1.88,-5.00	0/0:0.000:-0.05,-0.92,-5.00	0/0:0.000:-0.03,-1.16,-5.00	0/0:0.000:-0.04,-1.04,-5.00	0/0:0.000:-0.13,-0.59,-5.00	0/0:0.000:-0.02,-1.36,-5.00	0/0:0.000:-0.16,-0.51,-2.36	0/0:0.000:-0.02,-1.31,-5.00	0/0:0.000:-0.03,-1.19,-5.00	0/0:0.000:-0.03,-1.17,-5.00	0/0:0.000:-0.00,-4.40,-5.00	0/0:0.000:-0.03,-1.16,-5.00	0/0:0.000:-0.09,-0.73,-3.70	0/0:0.000:-0.19,-0.47,-1.77	0/0:0.000:-0.00,-3.32,-5.00	0/0:0.000:-0.17,-0.51,-2.00	0/0:0.000:-0.00,-2.17,-5.00	0/0:0.000:-0.00,-2.91,-5.00	0/0:0.000:-0.10,-0.71,-4.10	0/0:0.000:-0.03,-1.12,-5.00	0/0:0.000:-0.48,-0.48,-0.48	0/0:0.000:-0.01,-1.67,-5.00	0/0:0.000:-0.00,-2.09,-5.00	0/0:0.000:-0.04,-1.09,-5.00	0/0:0.000:-0.01,-1.48,-5.00	0/0:0.000:-0.02,-1.41,-5.00	0/0:0.000:-0.10,-0.69,-3.80	0/0:0.000:-0.01,-1.54,-5.00	0/0:0.000:-0.03,-1.16,-5.00	0/0:0.000:-0.09,-0.73,-4.70	0/0:0.000:-0.09,-0.74,-4.70	0/0:0.000:-0.06,-0.92,-5.00	0/0:0.000:-0.05,-0.97,-5.00	0/0:0.000:-0.08,-0.78,-5.00	0/0:0.000:-0.06,-0.92,-5.00	0/0:0.000:-0.10,-0.67,-4.40	0/0:0.000:-0.01,-1.71,-5.00	0/0:0.000:-0.03,-1.20,-5.00	0/0:0.000:-0.02,-1.26,-5.00	0/0:0.000:-0.04,-1.10,-5.00	0/0:0.000:-0.02,-1.27,-5.00	0/0:0.000:-0.48,-0.48,-0.48	0/0:0.000:-0.01,-1.47,-5.00	0/0:0.000:-0.00,-2.00,-5.00	0/0:0.000:-0.10,-0.67,-4.22	0/0:0.050:-0.18,-0.47,-2.34	0/0:0.000:-0.05,-1.00,-5.00	0/0:0.000:-0.11,-0.65,-3.85	0/0:0.000:-0.10,-0.68,-4.70	
0/0:0.000:-0.48,-0.48,-0.48	0/0:0.000:-0.08,-0.76,-5.00	0/0:0.000:-0.19,-0.47,-2.14	0/0:0.000:-0.00,-1.99,-5.00	0/0:0.000:-0.18,-0.47,-2.46	0/0:0.000:-0.09,-0.74,-4.40	0/0:0.450:-0.05,-0.94,-5.00	0/0:0.000:-0.48,-0.48,-0.48	0/0:0.000:-0.10,-0.69,-4.70	0/0:0.000:-0.01,-1.50,-5.00	0/0:0.000:-0.18,-0.47,-2.34	0/0:0.000:-0.03,-1.17,-5.00	0/0:0.000:-0.06,-0.88,-5.00	0/0:0.000:-0.02,-1.41,-5.00	0/0:0.000:-0.06,-0.88,-5.00	0/0:0.000:-0.18,-0.47,-1.95	0/0:0.000:-0.19,-0.46,-2.17	0/0:0.000:-0.03,-1.13,-5.00	0/0:0.000:-0.03,-1.18,-5.00	0/0:0.000:-0.18,-0.48,-2.23	0/0:0.000:-0.23,-0.45,-1.31	0/0:0.000:-0.11,-0.64,-3.92	0/0:0.000:-0.03,-1.17,-5.00	0/0:0.000:-0.03,-1.22,-5.00	0/0:0.000:-0.11,-0.66,-4.22	0/0:0.000:-0.12,-0.61,-2.38	0/0:0.000:-0.03,-1.22,-5.00	0/0:0.000:-0.40,-0.45,-0.60	0/0:0.000:-0.00,-2.98,-5.00	0/0:0.000:-0.13,-0.59,-2.09	0/0:0.000:-0.02,-1.37,-5.00	0/0:0.000:-0.477139,-0.477113,-0.477113	0/0:0.000:-0.04,-1.10,-5.00	0/0:0.000:-0.03,-1.23,-5.00	0/0:0.000:-0.01,-1.51,-5.00	0/0:0.000:-0.01,-1.67,-5.00	0/0:0.000:-0.08,-0.75,-4.40	0/0:0.000:-0.03,-1.23,-5.00	0/0:0.000:-0.10,-0.69,-4.40	0/0:0.000:-0.12,-0.63,-3.92	0/0:0.000:-0.01,-1.74,-5.00	0/0:0.000:-0.48,-0.48,-0.48	0/0:0.000:-0.19,-0.46,-2.60	0/0:0.000:-0.19,-0.46,-2.62	0/0:0.000:-0.11,-0.65,-4.70	0/0:0.000:-0.11,-0.66,-4.70	0/0:0.050:-0.18,-0.49,-2.04	0/0:0.050:-0.10,-0.67,-4.40	0/0:0.000:-0.01,-1.62,-5.00	0/0:0.000:-0.48,-0.48,-0.48	0/0:0.000:-0.23,-0.41,-1.51	0/0:0.000:-0.18,-0.48,-2.18	0/0:0.000:-0.03,-1.17,-5.00	0/0:0.000:-0.10,-0.68,-4.10	0/0:0.000:-0.03,-1.24,-5.00	0/0:0.000:-0.18,-0.48,-2.14
22	20000146	rs73387790	G	A	100	PASS	LDAF=0.0169;RSQ=0.9482;THETA=0.0004;AA=G;AN=2184;AVGPOST=0.9972;VT=SNP;SNPSOURCE=LOWCOV;AC=36;ERATE=0.0003;AF=0.02;AFR_AF=0.07;EUR_AF=0.0013	GT:DS:GL	0/0:0.000:-0.00,-2.68,-5.00	0/0:0.000:-0.07,-0.82,-5.00	0/0:0.000:-0.13,-0.60,-3.05	0/0:0.000:-0.03,-1.24,-5.00	0/0:0.000:-0.18,-0.47,-3.08	0/0:0.000:-0.06,-0.90,-5.00	0/0:0.000:-0.01,-1.48,-5.00	0/0:0.000:-0.48,-0.48,-0.48	0/0:0.000:-0.16,-0.51,-2.63	0/0:0.000:-0.01,-1.76,-5.00	0/0:0.000:-0.10,-0.67,-5.00	0/0:0.000:-0.48,-0.48,-0.48	0/0:0.000:-0.11,-0.66,-4.40	0/0:0.000:-0.10,-0.69,-4.70	0/0:0.000:-0.48,-0.48,-0.48	0/0:0.000:-0.18,-0.47,-2.30	0/0:0.000:-0.00,-2.80,-5.00	0/0:0.000:-0.00,-2.02,-5.00	0/0:0.000:-0.48,-0.48,-0.48	0/0:0.000:-0.01,-1.49,-5.00	0/0:0.000:-0.01,-1.76,-5.00	0/0:0.000:-0.48,-0.48,-0.48	0/0:0.000:-0.48,-0.48,-0.48	0/0:0.000:-0.02,-1.40,-5.00	0/0:0.000:-0.00,-2.09,-5.00	0/0:0.000:-0.10,-0.70,-4.10	0/0:0.000:-0.22,-0.46,-1.27	0/0:0.000:-0.18,-0.48,-2.39	0/0:0.000:-0.48,-0.48,-0.48	0/0:0.000:-0.19,-0.47,-2.27	0/0:0.000:-0.07,-0.85,-5.00	0/0:0.000:-0.00,-2.53,-5.00	0/0:0.000:-0.00,-2.83,-5.00	0/0:0.000:-0.22,-0.46,-1.24	0/0:0.000:-0.19,-0.46,-2.27	0/0:0.000:-0.10,-0.68,-4.40	0/0:0.000:-0.09,-0.73,-4.22	0/0:0.000:-0.03,-1.19,-5.00	0/0:0.000:-0.15,-0.55,-2.64	0/0:0.000:-0.05,-0.97,-5.00	0/0:0.000:-0.08,-0.76,-4.70	0/0:0.000:-0.01,-1.49,-5.00	0/0:0.000:-0.06,-0.86,-5.00	0/0:0.000:-0.029681,-1.18006,-5	0/0:0.000:-0.48,-0.48,-0.48	0/0:0.000:-0.18,-0.48,-2.33	0/0:0.000:-0.10,-0.70,-4.70	0/0:0.000:-0.00,-2.04,-5.00	0/0:0.000:-0.03,-1.24,-5.00	0/0:0.000:-0.10,-0.69,-4.40	0/0:0.000:-0.0843997,-0.755377,-3.00877	0/0:0.000:-0.21,-0.47,-1.33	0/0:0.000:-0.00,-3.22,-5.00	0/0:0.000:-0.00,-3.70,-5.00	0/0:0.000:-0.05,-0.96,-5.00	0/0:0.000:-0.03,-1.21,-5.00	0/0:0.000:-0.01,-1.47,-5.00	0/0:0.000:-0.01,-1.86,-5.00	0/0:0.000:-0.05,-0.97,-5.00	0/0:0.000:-0.06,-0.90,-5.00	0/0:0.000:-0.01,-1.51,-5.00	0/0:0.000:-0.03,-1.25,-5.00	0/0:0.000:-0.01,-1.92,-5.00	0/0:0.000:-0.00,-2.58,-5.00	
0/0:0.000:-0.06,-0.89,-5.00	0/0:0.000:-0.05,-1.00,-5.00	0/0:0.000:-0.05,-0.95,-5.00	0/0:0.000:-0.01,-1.72,-5.00	0/0:0.000:-0.00,-2.25,-5.00	0/0:0.000:-0.02,-1.43,-5.00	0/0:0.000:-0.18,-0.48,-2.03	0/0:0.000:-0.18,-0.47,-2.72	0/0:0.000:-0.09,-0.72,-4.70	0/0:0.000:-0.18,-0.47,-2.42	0/0:0.000:-0.19,-0.46,-2.20	0/0:0.000:-0.24,-0.44,-1.19	0/0:0.000:-0.01,-1.75,-5.00	0/0:0.000:-0.01,-1.48,-5.00	0/0:0.000:-0.19,-0.46,-2.27	0/0:0.000:-0.05,-0.99,-5.00	0/0:0.000:-0.06,-0.87,-5.00	0/0:0.000:-0.00,-3.40,-5.00	0/0:0.000:-0.10,-0.68,-4.70	0/0:0.000:-0.01,-1.72,-5.00	0/0:0.000:-0.0148341,-1.47392,-5	0/0:0.000:-0.05,-0.95,-5.00	0/0:0.000:-0.03,-1.16,-5.00	0/0:0.000:-0.04,-1.03,-5.00	0/0:0.000:-0.02,-1.46,-5.00	0/0:0.000:-0.09,-0.73,-4.40	0/0:0.000:-0.02,-1.46,-5.00	0/0:0.000:-0.02,-1.26,-5.00	0/0:0.000:-0.03,-1.25,-5.00	0/0:0.000:-0.19,-0.45,-2.12	0/0:0.000:-0.01,-1.50,-5.00	0/0:0.000:-0.05,-0.96,-5.00	0/0:0.000:-0.00,-4.70,-5.00	0/0:0.000:-0.13,-0.58,-3.06	0/0:0.000:-0.48,-0.48,-0.48	0/0:0.000:-0.17,-0.51,-2.09	0/0:0.000:-0.00,-2.29,-5.00	0/0:0.000:-0.38,-0.43,-0.67	0/0:0.000:-0.00,-2.81,-5.00	0/0:0.000:-0.00,-3.25,-5.00	0/0:0.000:-0.22,-0.46,-1.26	0/0:0.000:-0.03,-1.17,-5.00	0/0:0.000:-0.03,-1.21,-5.00	0/0:0.000:-0.01,-1.50,-5.00	0/0:0.000:-0.01,-1.54,-5.00	0/0:0.000:-0.48,-0.48,-0.48	0/0:0.000:-0.00,-2.04,-5.00	0/0:0.000:-0.05,-0.93,-5.00	0/0:0.000:-0.10,-0.68,-3.70	0/0:0.000:-0.09,-0.72,-5.00	0/0:0.000:-0.01,-1.77,-5.00	0/0:0.000:-0.05,-0.93,-5.00	0/0:0.000:-0.01,-1.51,-5.00	0/0:0.000:-0.16,-0.51,-2.74	0/0:0.000:-0.10,-0.69,-4.40	0/0:0.000:-0.18,-0.48,-2.26	0/0:0.000:-0.18,-0.48,-2.37	0/0:0.000:-0.18,-0.48,-2.27	0/0:0.000:-0.00,-2.58,-5.00	0/0:0.000:-0.05,-0.93,-5.00	0/0:0.000:-0.03,-1.19,-5.00	0/0:0.000:-0.00,-2.34,-5.00	0/0:0.000:-0.03,-1.23,-5.00	0/0:0.000:-0.10,-0.69,-4.40	0/0:0.000:-0.08,-0.78,-4.70	0/0:0.000:-0.05,-0.95,-5.00	0/0:0.000:-0.03,-1.22,-5.00	0/0:0.000:-0.18,-0.48,-2.36	0/0:0.000:-0.01,-1.53,-5.00	0/0:0.000:-0.18,-0.48,-2.25	0/0:0.000:-0.10,-0.68,-4.70	
0/0:0.000:-0.09,-0.73,-5.00	0/0:0.000:-0.02,-1.41,-5.00	0/0:0.000:-0.05,-0.93,-5.00	0/0:0.000:-0.03,-1.22,-5.00	0/0:0.000:-0.18,-0.47,-2.41	0/0:0.000:-0.09,-0.73,-4.40	0/0:0.000:-0.00,-2.00,-5.00	0/0:0.000:-0.19,-0.46,-2.19	0/0:0.000:-0.03,-1.20,-5.00	0/0:0.000:-0.05,-0.98,-5.00	0/0:0.000:-0.18,-0.47,-2.31	0/0:0.000:-0.09,-0.73,-4.10	0/0:0.000:-0.48,-0.48,-0.48	0/0:0.000:-0.10,-0.67,-3.66	0/0:0.000:-0.12,-0.63,-4.70	0/0:0.000:-0.16,-0.51,-2.28	0/0:0.000:-0.02,-1.46,-5.00	0/0:0.000:-0.01,-1.75,-5.00	0/0:0.000:-0.05,-0.95,-5.00	0/0:0.000:-0.10,-0.68,-4.22	0/0:0.000:-0.05,-0.95,-5.00	0/0:0.000:-0.12,-0.62,-3.08	0/0:0.000:-0.19,-0.45,-2.25	0/0:0.000:-0.01,-1.77,-5.00	0/0:0.000:-0.13,-0.60,-2.15	0/0:0.000:-0.48,-0.48,-0.48	0/0:0.000:-0.06,-0.92,-5.00	0/0:0.000:-0.04,-1.05,-5.00	0/0:0.000:-0.19,-0.46,-2.30	0/0:0.000:-0.19,-0.46,-2.28	0/0:0.000:-0.01,-1.48,-5.00	0/0:0.000:-0.190265,-0.457324,-2.2321	0/0:0.000:-0.23,-0.46,-1.21	0/0:0.000:-0.01,-1.50,-5.00	0/0:0.000:-0.00,-2.44,-5.00	0/0:0.000:-0.01,-1.73,-5.00	0/0:0.000:-0.01,-1.53,-5.00	0/0:0.000:-0.05,-0.96,-5.00	0/0:0.000:-0.05,-0.94,-5.00	0/0:0.000:-0.19,-0.46,-2.62	0/0:0.000:-0.00,-2.25,-5.00	0/0:0.000:-0.19,-0.46,-2.23	0/0:0.000:-0.11,-0.65,-4.70	0/0:0.000:-0.18,-0.46,-2.68	0/0:0.000:-0.11,-0.66,-4.40	0/0:0.000:-0.19,-0.46,-2.43	0/0:0.000:-0.18,-0.49,-2.00	0/0:0.000:-0.06,-0.92,-5.00	0/0:0.000:-0.20,-0.45,-2.05	0/0:0.000:-0.48,-0.48,-0.48	0/0:0.000:-0.03,-1.18,-5.00	0/0:0.000:-0.10,-0.68,-4.00	0/0:0.000:-0.03,-1.20,-5.00	0/0:0.000:-0.10,-0.69,-4.22	0/0:0.000:-0.11,-0.65,-4.40	0/0:0.000:-0.11,-0.67,-3.85
22	20000199	rs183293480	A	C	100	PASS	LDAF=0.0009;THETA=0.0004;AN=2184;AVGPOST=0.9990;VT=SNP;AA=A;RSQ=0.6274;SNPSOURCE=LOWCOV;AC=1;ERATE=0.0003;AF=0.0005;EUR_AF=0.0013	GT:DS:GL	0/0:0.000:-0.00,-2.04,-5.00	0/0:0.000:-0.07,-0.82,-3.47	0/0:0.000:-0.07,-0.83,-5.00	0/0:0.000:-0.03,-1.12,-5.00	0/0:0.000:-0.11,-0.64,-4.10	0/0:0.000:-0.12,-0.62,-3.85	0/0:0.000:-0.01,-1.47,-5.00	0/0:0.000:-0.01,-1.54,-5.00	0/0:0.000:-0.10,-0.70,-4.70	0/0:0.000:-0.03,-1.18,-5.00	0/0:0.000:-0.16,-0.50,-3.30	0/0:0.000:-0.48,-0.48,-0.48	0/0:0.000:-0.03,-1.20,-5.00	0/0:0.000:-0.10,-0.70,-5.00	0/0:0.000:-0.19,-0.46,-2.46	0/0:0.000:-0.16,-0.51,-2.67	0/0:0.000:-0.00,-2.57,-5.00	0/0:0.000:-0.00,-2.85,-5.00	0/0:0.000:-0.48,-0.48,-0.48	0/0:0.000:-0.00,-2.55,-5.00	0/0:0.000:-0.00,-2.02,-5.00	0/0:0.050:-0.48,-0.48,-0.48	0/0:0.000:-0.23,-0.46,-1.24	0/0:0.000:-0.02,-1.45,-5.00	0/0:0.000:-0.10,-0.68,-4.40	0/0:0.000:-0.06,-0.88,-5.00	0/0:0.000:-0.13,-0.61,-2.23	0/0:0.000:-0.05,-0.94,-5.00	0/0:0.000:-0.26,-0.43,-1.11	0/0:0.000:-0.04,-1.01,-5.00	0/0:0.000:-0.12,-0.62,-3.28	0/0:0.000:-0.00,-2.27,-5.00	0/0:0.000:-0.00,-2.42,-5.00	0/0:0.000:-0.04,-1.08,-5.00	0/0:0.000:-0.06,-0.88,-5.00	0/0:0.000:-0.04,-1.03,-5.00	0/0:0.000:-0.10,-0.70,-4.10	0/0:0.000:-0.18,-0.48,-2.22	0/0:0.000:-0.02,-1.41,-5.00	0/0:0.000:-0.03,-1.19,-5.00	0/0:0.000:-0.14,-0.57,-2.12	0/0:0.000:-0.10,-0.69,-4.70	0/0:0.000:-0.12,-0.62,-2.28	0/0:0.000:-0.0149779,-1.4698,-5	0/0:0.000:-0.10,-0.70,-4.70	0/0:0.000:-0.06,-0.89,-5.00	0/0:0.000:-0.39,-0.23,-2.25	0/0:0.000:-0.00,-3.40,-5.00	0/0:0.000:-0.07,-0.86,-4.40	0/0:0.000:-0.18,-0.48,-2.35	0/0:0.000:-0.00967891,-1.65679,-5	0/0:0.000:-0.22,-0.46,-1.24	0/0:0.000:-0.00,-2.27,-5.00	0/0:0.000:-0.00,-3.12,-5.00	0/0:0.000:-0.01,-1.73,-5.00	0/0:0.000:-0.02,-1.39,-5.00	0/0:0.000:-0.10,-0.70,-4.70	0/0:0.000:-0.03,-1.23,-5.00	0/0:0.000:-0.09,-0.72,-4.10	0/0:0.000:-0.10,-0.71,-4.70	0/0:0.000:-0.02,-1.45,-5.00	0/0:0.000:-0.04,-1.11,-5.00	0/0:0.000:-0.03,-1.14,-5.00	0/0:0.000:-0.00,-2.42,-5.00	
0/0:0.000:-0.05,-0.94,-5.00	0/0:0.000:-0.05,-0.95,-5.00	0/0:0.000:-0.11,-0.65,-3.85	0/0:0.000:-0.05,-0.95,-5.00	0/0:0.000:-0.01,-1.72,-5.00	0/0:0.000:-0.00,-2.08,-5.00	0/0:0.000:-0.03,-1.20,-5.00	0/0:0.000:-0.05,-0.94,-5.00	0/0:0.000:-0.16,-0.52,-2.11	0/0:0.000:-0.10,-0.69,-4.70	0/0:0.000:-0.44,-0.46,-0.54	0/0:0.000:-0.07,-0.83,-5.00	0/0:0.000:-0.16,-0.51,-2.42	0/0:0.000:-0.18,-0.48,-2.30	0/0:0.000:-0.19,-0.46,-2.27	0/0:0.000:-0.06,-0.89,-5.00	0/0:0.000:-0.06,-0.88,-5.00	0/0:0.000:-0.00,-3.15,-5.00	0/0:0.000:-0.12,-0.61,-4.10	0/0:0.000:-0.06,-0.91,-5.00	0/0:0.000:-0.00935928,-1.67121,-5	0/0:0.000:-0.18,-0.48,-2.13	0/0:0.000:-0.07,-0.85,-5.00	0/0:0.000:-0.03,-1.19,-5.00	0/0:0.000:-0.03,-1.15,-5.00	0/0:0.000:-0.09,-0.71,-4.70	0/0:0.000:-0.19,-0.46,-2.52	0/0:0.000:-0.03,-1.20,-5.00	0/0:0.000:-0.09,-0.73,-4.10	0/0:0.000:-0.11,-0.64,-4.40	0/0:0.000:-0.03,-1.21,-5.00	0/0:0.000:-0.19,-0.47,-1.83	0/0:0.000:-0.00,-2.76,-5.00	0/0:0.000:-0.09,-0.73,-3.92	0/0:0.000:-0.15,-0.55,-2.43	0/0:0.000:-0.18,-0.48,-1.89	0/0:0.000:-0.00,-3.18,-5.00	0/0:0.000:-0.48,-0.48,-0.48	0/0:0.000:-0.02,-1.43,-5.00	0/0:0.000:-0.22,-0.40,-5.00	0/0:0.000:-0.07,-0.84,-4.70	0/0:0.000:-0.03,-1.22,-5.00	0/0:0.000:-0.10,-0.69,-3.85	0/0:0.000:-0.01,-1.84,-5.00	0/0:0.000:-0.00,-3.07,-5.00	0/0:0.000:-0.10,-0.69,-3.85	0/0:0.000:-0.06,-0.91,-5.00	0/0:0.000:-0.10,-0.68,-4.70	0/0:0.000:-0.10,-0.69,-3.80	0/0:0.000:-0.48,-0.18,-4.10	0/0:0.000:-0.02,-1.47,-5.00	0/0:0.000:-0.05,-0.94,-5.00	0/0:0.000:-0.03,-1.22,-5.00	0/0:0.000:-0.18,-0.48,-2.34	0/0:0.000:-0.05,-0.95,-5.00	0/0:0.000:-0.48,-0.48,-0.48	0/0:0.000:-0.18,-0.48,-2.37	0/0:0.000:-0.10,-0.70,-4.40	0/0:0.000:-0.00,-2.44,-5.00	0/0:0.000:-0.01,-1.78,-5.00	0/0:0.000:-0.00,-2.06,-5.00	0/0:0.000:-0.00,-2.28,-5.00	0/0:0.000:-0.06,-0.92,-5.00	0/0:0.000:-0.03,-1.24,-5.00	0/0:0.000:-0.01,-1.51,-5.00	0/0:0.000:-0.05,-0.95,-5.00	0/0:0.000:-0.02,-1.46,-5.00	0/0:0.000:-0.05,-0.94,-5.00	0/0:0.000:-0.11,-0.64,-4.70	0/0:0.000:-0.19,-0.46,-2.54	0/0:0.000:-0.02,-1.38,-5.00	
0/0:0.000:-0.48,-0.48,-0.48	0/0:0.000:-0.06,-0.92,-5.00	0/0:0.000:-0.48,-0.48,-0.48	0/0:0.000:-0.02,-1.26,-5.00	0/0:0.000:-0.18,-0.47,-2.47	0/0:0.000:-0.16,-0.51,-2.43	0/0:0.000:-0.01,-1.73,-5.00	0/0:0.000:-0.19,-0.46,-2.79	0/0:0.000:-0.06,-0.91,-5.00	0/0:0.000:-0.03,-1.20,-5.00	0/0:0.000:-0.18,-0.48,-2.35	0/0:0.000:-0.03,-1.21,-5.00	0/0:0.000:-0.05,-0.93,-5.00	0/0:0.000:-0.03,-1.18,-5.00	0/0:0.000:-0.19,-0.46,-2.72	0/0:0.000:-0.02,-1.45,-5.00	0/0:0.000:-0.16,-0.51,-2.71	0/0:0.000:-0.01,-1.80,-5.00	0/0:0.000:-0.10,-0.69,-3.66	0/0:0.000:-0.10,-0.69,-4.22	0/0:0.000:-0.23,-0.44,-1.26	0/0:0.000:-0.19,-0.46,-2.12	0/0:0.000:-0.04,-1.07,-5.00	0/0:0.000:-0.03,-1.23,-5.00	0/0:0.000:-0.48,-0.48,-0.48	0/0:0.000:-0.19,-0.46,-2.20	0/0:0.000:-0.05,-0.95,-5.00	0/0:0.000:-0.11,-0.65,-3.44	0/0:0.000:-0.05,-1.00,-5.00	0/0:0.000:-0.03,-1.17,-5.00	0/0:0.000:-0.01,-1.77,-5.00	0/0:0.000:-0.189687,-0.458221,-2.2426	0/0:0.000:-0.01,-1.79,-5.00	0/0:0.050:-0.24,-0.42,-1.39	0/0:0.000:-0.19,-0.45,-4.70	0/0:0.000:-0.00,-2.15,-5.00	0/0:0.000:-0.05,-0.97,-5.00	0/0:0.000:-0.18,-0.48,-2.23	0/0:0.000:-0.00,-2.11,-5.00	0/0:0.000:-0.11,-0.66,-4.70	0/0:0.000:-0.01,-1.59,-5.00	0/0:0.000:-0.48,-0.48,-0.48	0/0:0.000:-0.18,-0.46,-2.63	0/0:0.000:-0.11,-0.66,-5.00	0/0:0.000:-0.48,-0.48,-0.48	0/0:0.000:-0.19,-0.46,-2.43	0/0:0.000:-0.19,-0.47,-1.84	0/0:0.000:-0.48,-0.48,-0.48	0/0:0.000:-0.19,-0.46,-2.21	0/0:0.000:-0.48,-0.48,-0.48	0/0:0.000:-0.01,-1.82,-5.00	0/0:0.000:-0.05,-0.99,-5.00	0/0:0.000:-0.05,-0.94,-5.00	0/0:0.000:-0.03,-1.22,-5.00	0/0:0.000:-0.00,-2.26,-5.00	0/0:0.000:-0.10,-0.69,-4.00
22	20000291	rs185807825	G	T	100	PASS	ERATE=0.0005;AVGPOST=0.9983;AA=G;AN=2184;LDAF=0.0015;VT=SNP;SNPSOURCE=LOWCOV;RSQ=0.5564;AC=2;THETA=0.0003;AF=0.0009;ASN_AF=0.0035	GT:DS:GL	0/0:0.000:-0.00,-2.06,-5.00	0/0:0.000:-0.07,-0.83,-5.00	0/0:0.000:-0.02,-1.27,-5.00	0/0:0.000:-0.01,-1.77,-5.00	0/0:0.000:-0.19,-0.45,-2.14	0/0:0.000:-0.11,-0.66,-5.00	0/0:0.000:-0.03,-1.17,-5.00	0/0:0.000:-0.02,-1.33,-5.00	0/0:0.000:-0.18,-0.48,-2.43	0/0:0.000:-0.05,-0.93,-5.00	0/0:0.000:-0.18,-0.46,-2.76	0/0:0.000:-0.48,-0.48,-0.48	0/0:0.000:-0.03,-1.15,-5.00	0/0:0.000:-0.07,-0.83,-4.40	0/0:0.000:-0.02,-1.33,-5.00	0/0:0.000:-0.10,-0.68,-4.40	0/0:0.000:-0.00,-2.28,-5.00	0/0:0.000:-0.01,-1.83,-5.00	0/0:0.000:-0.13,-0.61,-2.24	0/0:0.000:-0.00,-3.70,-5.00	0/0:0.000:-0.00,-2.21,-5.00	0/0:0.000:-0.48,-0.48,-0.48	0/0:0.000:-0.22,-0.46,-1.25	0/0:0.000:-0.00,-2.81,-5.00	0/0:0.000:-0.02,-1.44,-5.00	0/0:0.000:-0.06,-0.92,-5.00	0/0:0.000:-0.08,-0.79,-3.21	0/0:0.000:-0.01,-1.72,-5.00	0/0:0.000:-0.21,-0.46,-1.42	0/0:0.000:-0.08,-0.79,-4.40	0/0:0.000:-0.04,-1.06,-5.00	0/0:0.000:-0.02,-1.42,-5.00	0/0:0.000:-0.00,-3.85,-5.00	0/0:0.000:-0.07,-0.83,-5.00	0/0:0.000:-0.01,-1.80,-5.00	0/0:0.000:-0.11,-0.66,-4.40	0/0:0.000:-0.01,-1.48,-5.00	0/0:0.000:-0.01,-1.47,-5.00	0/0:0.000:-0.05,-0.94,-5.00	0/0:0.000:-0.19,-0.46,-2.54	0/0:0.000:-0.03,-1.22,-5.00	0/0:0.000:-0.05,-1.00,-5.00	0/0:0.000:-0.13,-0.60,-2.19	0/0:0.000:-0.0021071,-2.31515,-5	0/0:0.000:-0.10,-0.68,-4.40	0/0:0.000:-0.10,-0.68,-4.22	0/0:0.000:-0.05,-0.93,-5.00	0/0:0.000:-0.00,-3.38,-5.00	0/0:0.000:-0.06,-0.90,-5.00	0/0:0.000:-0.20,-0.45,-1.99	0/0:0.000:-0.0193877,-1.35992,-5	0/0:0.000:-0.02,-1.28,-5.00	0/0:0.000:-0.00,-2.46,-5.00	0/0:0.000:-0.00,-2.75,-5.00	0/0:0.000:-0.18,-0.47,-2.36	0/0:0.000:-0.10,-0.68,-4.00	0/0:0.000:-0.02,-1.34,-5.00	0/0:0.000:-0.00,-3.27,-5.00	0/0:0.250:-0.03,-1.19,-5.00	0/0:0.000:-0.07,-0.84,-5.00	0/0:0.000:-0.01,-1.52,-5.00	0/0:0.000:-0.11,-0.65,-3.40	0/0:0.000:-0.10,-0.70,-4.70	0/0:0.000:-0.03,-1.19,-5.00	
0/0:0.000:-0.03,-1.23,-5.00	0/0:0.000:-0.06,-0.92,-5.00	0/0:0.000:-0.00,-1.99,-5.00	0/0:0.000:-0.01,-1.61,-5.00	0/0:0.000:-0.01,-1.48,-5.00	0/0:0.000:-0.10,-0.69,-4.22	0/0:0.000:-0.01,-1.49,-5.00	0/0:0.000:-0.01,-1.59,-5.00	0/0:0.000:-0.10,-0.69,-3.36	0/0:0.000:-0.01,-1.48,-5.00	0/0:0.000:-0.08,-0.77,-4.10	0/0:0.000:-0.05,-1.00,-5.00	0/0:0.000:-0.10,-0.67,-3.80	0/0:0.000:-0.03,-1.19,-5.00	0/0:0.000:-0.06,-0.86,-5.00	0/0:0.000:-0.06,-0.92,-5.00	0/0:0.000:-0.11,-0.66,-4.70	0/0:0.000:-0.00,-2.68,-5.00	0/0:0.000:-0.05,-0.93,-5.00	0/0:0.000:-0.01,-1.84,-5.00	0/0:0.000:-0.0293556,-1.18469,-5	0/0:0.000:-0.48,-0.48,-0.48	0/0:0.000:-0.19,-0.46,-2.37	0/0:0.000:-0.10,-0.68,-4.10	0/0:0.000:-0.05,-0.97,-5.00	0/0:0.000:-0.03,-1.20,-5.00	0/0:0.000:-0.06,-0.91,-5.00	0/0:0.000:-0.18,-0.48,-2.06	0/0:0.000:-0.06,-0.87,-5.00	0/0:0.000:-0.03,-1.13,-5.00	0/0:0.000:-0.03,-1.24,-5.00	0/0:0.000:-0.05,-0.94,-5.00	0/0:0.000:-0.00,-3.85,-5.00	0/0:0.000:-0.48,-0.48,-0.48	0/0:0.000:-0.07,-0.83,-4.40	0/0:0.000:-0.48,-0.48,-0.48	0/0:0.000:-0.00,-2.82,-5.00	0/0:0.000:-0.08,-0.76,-3.80	0/0:0.000:-0.00,-3.47,-5.00	0/0:0.000:-0.03,-1.18,-5.00	0/0:0.000:-0.11,-0.65,-3.59	0/0:0.000:-0.03,-1.20,-5.00	0/0:0.000:-0.02,-1.43,-5.00	0/0:0.000:-0.00,-2.73,-5.00	0/0:0.000:-0.06,-0.86,-4.22	0/0:0.000:-0.14,-0.57,-2.84	0/0:0.000:-0.01,-1.50,-5.00	0/0:0.000:-0.02,-1.45,-5.00	0/0:0.000:-0.01,-1.76,-5.00	0/0:0.000:-0.02,-1.46,-5.00	0/0:0.000:-0.01,-1.79,-5.00	0/0:0.000:-0.06,-0.90,-5.00	0/0:0.000:-0.01,-1.87,-5.00	0/0:0.000:-0.01,-1.50,-5.00	0/0:0.000:-0.03,-1.21,-5.00	0/0:0.000:-0.08,-0.79,-5.00	0/0:0.000:-0.06,-0.91,-5.00	0/0:0.000:-0.48,-0.48,-0.48	0/0:0.000:-0.00,-2.89,-5.00	0/0:0.000:-0.02,-1.46,-5.00	0/0:0.000:-0.20,-0.44,-2.07	0/0:0.000:-0.00,-2.65,-5.00	0/0:0.000:-0.02,-1.27,-5.00	0/0:0.000:-0.03,-1.21,-5.00	0/0:0.000:-0.01,-1.79,-5.00	0/0:0.000:-0.05,-0.95,-5.00	0/0:0.000:-0.10,-0.68,-4.70	0/0:0.000:-0.35,-0.41,-0.78	0/0:0.000:-0.09,-0.75,-3.70	0/0:0.000:-0.05,-0.98,-5.00	0/0:0.000:-0.05,-0.97,-5.00	
0/0:0.000:-0.05,-0.93,-5.00	0/0:0.000:-0.08,-0.77,-5.00	0/0:0.000:-0.04,-1.03,-5.00	0/0:0.000:-0.02,-1.47,-5.00	0/0:0.000:-0.03,-1.16,-5.00	0/0:0.000:-0.01,-1.57,-5.00	0/0:0.000:-0.06,-0.89,-5.00	0/0:0.000:-0.06,-0.90,-5.00	0/0:0.000:-0.00,-2.30,-5.00	0/0:0.000:-0.05,-0.98,-5.00	0/0:0.000:-0.10,-0.69,-4.40	0/0:0.000:-0.01,-1.50,-5.00	0/0:0.000:-0.06,-0.87,-5.00	0/0:0.000:-0.16,-0.51,-2.37	0/0:0.000:-0.19,-0.46,-2.23	0/0:0.050:-0.27,-0.34,-3.22	0/0:0.000:-0.48,-0.48,-0.48	0/0:0.000:-0.08,-0.77,-4.40	0/0:0.000:-0.05,-0.94,-5.00	0/0:0.000:-0.05,-0.99,-5.00	0/0:0.000:-0.07,-0.82,-5.00	0/0:0.000:-0.12,-0.61,-2.91	0/0:0.000:-0.00,-1.98,-5.00	0/0:0.000:-0.03,-1.21,-5.00	0/0:0.000:-0.07,-0.82,-5.00	0/0:0.000:-0.48,-0.48,-0.48	0/0:0.000:-0.01,-1.49,-5.00	0/0:0.000:-0.12,-0.61,-3.15	0/0:0.000:-0.00,-2.08,-5.00	0/0:0.000:-0.02,-1.33,-5.00	0/0:0.000:-0.01,-1.74,-5.00	0/0:0.050:-0.11227,-0.642561,-4.22185	0/0:0.000:-0.22,-0.46,-1.28	0/0:0.000:-0.10,-0.70,-4.10	0/0:0.000:-0.10,-0.67,-3.62	0/0:0.000:-0.03,-1.18,-5.00	0/0:0.000:-0.03,-1.24,-5.00	0/0:0.000:-0.10,-0.70,-4.10	0/0:0.000:-0.01,-1.80,-5.00	0/0:0.000:-0.48,-0.48,-0.48	0/0:0.000:-0.02,-1.47,-5.00	0/0:0.000:-0.06,-0.88,-5.00	0/0:0.000:-0.03,-1.13,-5.00	0/0:0.000:-0.48,-0.48,-0.48	0/0:0.000:-0.18,-0.46,-2.69	0/0:0.000:-0.03,-1.13,-5.00	0/0:0.000:-0.48,-0.48,-0.48	0/0:0.000:-0.18,-0.48,-2.41	0/0:0.000:-0.18,-0.47,-2.85	0/0:0.050:-0.48,-0.48,-0.48	0/0:0.000:-0.08,-0.76,-4.70	0/0:0.000:-0.10,-0.69,-4.00	0/0:0.000:-0.11,-0.65,-3.62	0/0:0.000:-0.48,-0.48,-0.48	0/0:0.000:-0.01,-1.70,-5.00	0/0:0.000:-0.00,-2.57,-5.00
22	20000428	rs55902548	G	T	100	PASS	AC=323;AVGPOST=0.9983;AA=G;AN=2184;VT=SNP;RSQ=0.9949;LDAF=0.1473;SNPSOURCE=LOWCOV;ERATE=0.0003;THETA=0.0003;AF=0.15;ASN_AF=0.0017;AMR_AF=0.15;AFR_AF=0.31;EUR_AF=0.15	GT:DS:GL	1/0:1.000:-5.00,0.00,-5.00	0/0:0.000:-0.35,-0.43,-0.73	0/1:1.000:-1.81,-0.01,-2.95	0/0:0.000:-0.01,-1.79,-5.00	0/0:0.000:-0.06,-0.86,-5.00	1/0:1.000:-0.19,-0.46,-2.18	0/0:0.000:-0.10,-0.68,-5.00	0/1:1.000:-4.40,-0.03,-1.12	0/1:1.000:-5.00,-0.69,-0.10	0/0:0.000:-0.10,-0.69,-4.70	0/0:0.000:-0.48,-0.48,-0.48	0/1:1.000:-5.00,-0.01,-1.77	0/0:0.000:-0.18,-0.48,-2.57	0/0:0.000:-0.02,-1.31,-5.00	0/0:0.000:-0.11,-0.65,-4.70	0/0:0.000:-0.10,-0.68,-4.70	0/0:0.000:-0.01,-1.72,-5.00	1/0:1.000:-5.00,0.00,-5.00	1/0:1.000:-1.38,-0.02,-2.61	0/1:1.000:-5.00,-1.40,-0.02	0/0:0.000:-0.00,-2.97,-5.00	0/0:0.000:-0.19,-0.47,-2.15	0/0:0.000:-0.44,-0.46,-0.54	0/0:0.000:-0.00,-2.52,-5.00	0/0:0.000:-0.05,-0.93,-5.00	0/0:0.000:-0.01,-1.77,-5.00	0/1:0.750:-0.22,-0.46,-1.26	0/0:0.000:-0.03,-1.19,-5.00	0/0:0.000:-0.21,-0.46,-1.42	0/0:0.000:-0.19,-0.45,-2.20	0/0:0.000:-0.12,-0.63,-3.36	0/0:0.000:-0.00,-2.54,-5.00	0/0:0.000:-0.00,-2.88,-5.00	1/0:1.000:-2.10,-0.00,-4.00	1/0:1.250:-5.00,-1.62,-0.01	0/1:1.000:-3.06,-0.47,-0.18	0/1:1.000:-5.00,-0.87,-0.06	1/1:2.000:-5.00,-0.84,-0.07	0/0:0.000:-0.05,-0.95,-5.00	0/0:0.000:-0.02,-1.46,-5.00	0/0:0.000:-0.14,-0.58,-2.17	0/0:0.000:-0.01,-1.77,-5.00	0/0:0.000:-0.06,-0.91,-5.00	1/0:1.000:-5,0,-5	0/0:0.000:-0.05,-0.95,-5.00	1/0:1.000:-4.70,-0.70,-0.10	0/0:0.000:-0.09,-0.72,-4.70	0/0:0.000:-0.01,-1.54,-5.00	1/1:2.000:-5.00,-0.68,-0.10	0/0:0.000:-0.18,-0.48,-2.33	0/0:0.000:-0.00465443,-1.97224,-5	0/0:0.000:-0.48,-0.48,-0.48	1/0:1.000:-5.00,-0.93,-0.05	0/0:0.000:-0.00,-2.82,-5.00	1/1:2.000:-5.00,-1.67,-0.01	0/0:0.000:-0.05,-0.95,-5.00	1/0:1.000:-5.00,-0.87,-0.06	0/0:0.000:-0.01,-1.78,-5.00	0/0:0.000:-0.01,-1.83,-5.00	0/1:1.000:-5.00,-0.00,-3.70	0/1:0.950:-0.15,-0.53,-2.39	0/0:0.000:-0.09,-0.71,-4.70	0/0:0.000:-0.18,-0.48,-2.56	0/0:0.000:-0.00,-2.60,-5.00	
0/0:0.000:-0.01,-1.49,-5.00	0/0:0.000:-0.48,-0.48,-0.48	0/0:0.000:-0.04,-1.09,-5.00	0/0:0.000:-0.00,-2.40,-5.00	0/0:0.000:-0.06,-0.92,-5.00	0/0:0.000:-0.01,-1.54,-5.00	0/0:0.000:-0.02,-1.45,-5.00	0/0:0.000:-0.01,-1.84,-5.00	0/1:1.000:-2.39,-0.30,-0.31	0/0:0.000:-0.03,-1.20,-5.00	0/0:0.000:-0.11,-0.64,-3.44	0/0:0.000:-0.02,-1.42,-5.00	0/0:0.000:-0.01,-1.76,-5.00	0/0:0.000:-0.03,-1.17,-5.00	0/1:1.000:-5.00,-0.03,-1.18	0/0:0.000:-0.03,-1.21,-5.00	0/0:0.000:-0.10,-0.70,-5.00	0/1:1.000:-5.00,0.00,-5.00	0/0:0.000:-0.01,-1.58,-5.00	0/1:1.000:-5.00,0.00,-5.00	1/0:1.000:-5,-5.21375e-05,-3.92082	0/0:0.000:-0.05,-0.92,-5.00	0/0:0.000:-0.02,-1.45,-5.00	0/1:1.000:-5.00,-0.00,-4.22	0/0:0.000:-0.09,-0.73,-4.22	0/0:0.000:-0.00,-2.03,-5.00	0/0:0.000:-0.03,-1.20,-5.00	0/0:0.000:-0.03,-1.19,-5.00	0/0:0.000:-0.18,-0.48,-2.03	0/0:0.000:-0.01,-1.59,-5.00	0/0:0.000:-0.37,-0.42,-0.71	0/0:0.000:-0.02,-1.45,-5.00	0/0:0.000:-0.00,-3.36,-5.00	1/0:1.000:-3.85,-0.05,-0.94	1/0:1.000:-5.00,-0.84,-0.07	1/0:1.000:-0.06,-0.90,-5.00	0/0:0.000:-0.00,-2.56,-5.00	0/0:0.000:-0.19,-0.47,-1.73	0/0:0.000:-0.06,-0.90,-5.00	0/0:0.000:-0.00,-3.74,-5.00	0/0:0.000:-0.22,-0.46,-1.26	1/0:1.000:-5.00,-0.00,-3.92	0/0:0.000:-0.10,-0.69,-3.85	0/0:0.000:-0.00,-2.76,-5.00	0/0:0.000:-0.01,-1.70,-5.00	0/0:0.000:-0.02,-1.46,-5.00	0/0:0.000:-0.06,-0.92,-5.00	0/0:0.000:-0.01,-1.73,-5.00	0/0:0.000:-0.05,-1.00,-5.00	1/0:1.000:-2.32,-0.01,-1.58	0/1:0.950:-0.09,-0.73,-4.70	0/1:1.000:-2.25,-0.02,-1.52	0/0:0.000:-0.01,-1.50,-5.00	0/0:0.000:-0.10,-0.70,-4.40	0/0:0.000:-0.05,-0.99,-5.00	0/0:0.000:-0.03,-1.19,-5.00	0/0:0.000:-0.03,-1.25,-5.00	0/0:0.000:-0.10,-0.67,-4.10	0/1:1.000:-5.00,-0.00,-4.22	0/0:0.000:-0.18,-0.48,-2.02	0/0:0.000:-0.05,-0.94,-5.00	0/0:0.000:-0.00,-4.70,-5.00	0/0:0.000:-0.03,-1.23,-5.00	0/0:0.000:-0.05,-0.93,-5.00	0/0:0.000:-0.03,-1.20,-5.00	1/0:1.000:-0.37,-0.24,-2.87	0/0:0.000:-0.05,-0.94,-5.00	0/0:0.000:-0.18,-0.48,-2.36	0/0:0.000:-0.18,-0.48,-1.83	0/0:0.000:-0.18,-0.48,-2.23	0/0:0.000:-0.01,-1.78,-5.00	
0/0:0.000:-0.06,-0.90,-5.00	0/0:0.000:-0.07,-0.82,-5.00	0/0:0.000:-0.10,-0.70,-4.70	0/0:0.000:-0.01,-1.48,-5.00	0/0:0.000:-0.02,-1.46,-5.00	0/0:0.000:-0.18,-0.48,-2.11	0/0:0.000:-0.10,-0.69,-4.70	0/0:0.000:-0.18,-0.47,-2.84	0/0:0.000:-0.03,-1.21,-5.00	1/0:1.000:-4.40,-0.00,-4.70	0/1:0.800:-0.03,-1.20,-5.00	0/0:0.000:-0.05,-0.98,-5.00	0/0:0.000:-0.10,-0.68,-3.70	0/0:0.000:-0.10,-0.67,-5.00	0/0:0.000:-0.48,-0.48,-0.48	0/0:0.000:-0.10,-0.68,-3.59	0/0:0.000:-0.02,-1.26,-5.00	0/0:0.000:-0.48,-0.48,-0.48	0/0:0.000:-0.01,-1.80,-5.00	1/0:1.000:-5.00,-0.66,-0.11	0/0:0.000:-0.12,-0.61,-2.49	1/1:2.000:-5.00,-1.30,-0.02	0/0:0.000:-0.04,-1.07,-5.00	0/0:0.000:-0.18,-0.48,-2.19	0/0:0.000:-0.07,-0.85,-5.00	1/0:1.000:-2.07,-0.04,-1.10	0/1:1.000:-0.09,-0.74,-4.70	0/1:1.000:-5.00,-0.00,-2.47	0/0:0.000:-0.02,-1.26,-5.00	1/0:1.000:-2.25,-0.01,-1.55	0/0:0.000:-0.00,-3.17,-5.00	0/0:0.000:-0.203967,-0.438136,-1.99396	0/0:0.000:-0.12,-0.62,-3.18	0/0:0.000:-0.03,-1.19,-5.00	0/0:0.000:-0.01,-1.66,-5.00	1/0:1.000:-5.00,-0.00,-3.74	1/0:1.000:-1.43,-0.02,-4.70	0/0:0.000:-0.05,-0.96,-5.00	0/0:0.000:-0.09,-0.74,-4.70	0/0:0.000:-0.48,-0.48,-0.48	1/0:1.000:-5.00,-0.18,-0.47	0/0:0.000:-0.17,-0.50,-2.70	0/1:1.000:-5.00,-0.00,-3.80	0/1:1.000:-2.22,-0.01,-1.97	0/1:1.000:-5.00,-0.67,-0.10	0/0:0.000:-0.21,-0.43,-2.01	0/0:0.000:-0.19,-0.47,-1.83	0/0:0.000:-0.10,-0.69,-4.40	0/0:0.000:-0.11,-0.64,-4.10	0/0:0.000:-0.18,-0.47,-2.80	0/0:0.000:-0.01,-1.47,-5.00	0/0:0.000:-0.22,-0.43,-1.61	0/0:0.000:-0.02,-1.47,-5.00	0/0:0.000:-0.01,-1.70,-5.00	0/0:0.000:-0.05,-0.97,-5.00	0/0:0.000:-0.02,-1.28,-5.00

We'd like to convert the genotype data (GT field) into a matrix of minor allele counts.

Genetic model

There are different SNP models. The additive SNP model counts the number of minor alleles (0, 1, or 2) per genotype. The other SNP models, dominant and recessive, are also defined in terms of the minor allele. VCFTools.jl always assumes the ALT allele in the VCF file is the minor allele. Genotypes are thus translated to real numbers according to

Genotype    VCF GT      model=:additive   model=:dominant   model=:recessive
ALT, ALT    1/1, 1|1    2                 1                 1
REF, ALT    0/1, 0|1    1                 1                 0
REF, REF    0/0, 0|0    0                 0                 0
missing     ./., .|.    missing           missing           missing

To record missing genotypes properly, VCFTools.jl converts VCF GT data to a matrix A whose elements are either numbers or missing values. In Julia, this means eltype(A) <: Union{Missing, Real}, where <: means "is a subtype of".
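The translation rules above can be sketched in plain Julia. The helper gt_to_numeric below is hypothetical and is not part of VCFTools.jl; it assumes diploid, biallelic GT strings and only illustrates how one GT value maps to a number or to missing under each model.

```julia
# Hypothetical helper (not part of VCFTools.jl): map one GT string to a number.
function gt_to_numeric(gt::AbstractString; model::Symbol = :additive)
    alleles = split(gt, r"[/|]")          # accept unphased '/' and phased '|'
    length(alleles) == 2 || throw(ArgumentError("expected a diploid GT, got $gt"))
    any(==("."), alleles) && return missing
    nalt = count(==("1"), alleles)        # number of ALT alleles: 0, 1 or 2
    if model == :additive
        return nalt
    elseif model == :dominant
        return Int(nalt >= 1)
    elseif model == :recessive
        return Int(nalt == 2)
    else
        throw(ArgumentError("unknown model $model"))
    end
end

# one value per row of the translation rules, under the dominant model
codes = [gt_to_numeric(gt; model = :dominant) for gt in ["1/1", "0|1", "0/0", "./."]]
```

The resulting vector mixes integers and missing, so its element type satisfies eltype(codes) <: Union{Missing, Real}.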

Convert GT data into a numeric genotype matrix

Convert GT data in VCF file test.08Jun17.d8b.vcf.gz to a Matrix{Union{Missing, Int8}}. VCFTools.jl will copy the 0s and 1s of the file directly into A without checking if ALT or REF is the minor allele.

@time A = convert_gt(Int8, "test.08Jun17.d8b.vcf.gz"; model = :additive, impute = false, center = false, scale = false)
  1.760590 seconds (2.06 M allocations: 118.037 MiB, 1.18% gc time, 95.14% compilation time)





191×1356 Matrix{Union{Missing, Int8}}:
 0  0  0  0  1  0  0  0  0  0  0  0  2  …  0  0  0  0  1  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0  0  0  0  0  2     0  0  0  0  0  0  0  0  0  0  0  0
 0  0  0  0  1  0  0  0  0  0  0  0  2     0  0  0  0  1  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0  0  0  0  0  2     0  0  0  0  1  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0  0  0  0  0  2     0  0  0  0  0  0  0  0  0  0  0  0
 0  0  0  0  1  0  0  0  0  0  0  0  2  …  0  0  0  0  1  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0  0  0  0  0  2     0  0  0  0  0  0  0  0  0  1  1  0
 0  0  0  0  1  0  0  0  0  0  0  0  2     0  0  0  0  1  0  0  0  0  0  0  0
 0  0  0  0  1  0  0  0  0  0  0  0  2     0  0  0  0  0  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0  0  0  0  0  2     0  0  0  0  0  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0  0  0  0  0  2  …  0  0  1  0  1  0  0  0  0  1  1  0
 0  0  0  0  1  0  0  0  0  0  0  0  2     0  0  0  0  0  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0  0  0  0  0  2     0  0  0  0  0  0  0  0  0  1  1  0
 ⋮              ⋮              ⋮        ⋱     ⋮              ⋮              ⋮
 0  0  0  0  1  0  0  0  0  0  0  0  2     0  0  1  0  0  0  0  0  0  1  1  0
 0  0  0  0  0  0  0  0  0  0  0  0  2  …  0  0  0  0  0  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0  0  0  0  0  2     0  0  0  0  0  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0  0  0  0  0  2     0  0  0  0  0  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0  0  0  0  0  2     0  0  0  0  0  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0  0  0  0  0  2     0  0  0  0  0  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0  0  0  0  0  2  …  0  0  0  0  0  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0  0  0  0  0  2     0  0  0  0  0  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0  0  0  0  0  2     0  0  0  0  0  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0  0  0  0  0  2     0  0  0  0  0  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0  0  0  0  0  2     0  0  0  0  0  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0  0  0  1  0  2  …  0  0  0  0  0  0  0  0  0  0  0  0
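Since the ALT allele is copied as-is, it can be worth checking its frequency before treating entries as minor allele counts; in the output above, column 13 shows 2 in every displayed row, which suggests ALT is not the minor allele there. The sketch below uses a toy matrix and a hypothetical helper alt_frequency (not a VCFTools.jl function) to estimate ALT frequencies while skipping missing entries, and recodes markers whose ALT frequency exceeds 0.5.

```julia
# Hypothetical helper (not part of VCFTools.jl): ALT allele frequency of one marker,
# computed from additive genotypes while skipping missing entries.
function alt_frequency(g::AbstractVector)
    observed = collect(skipmissing(g))
    isempty(observed) && return NaN
    return sum(observed) / (2 * length(observed))
end

# toy genotype matrix: 4 samples x 2 markers, with one missing call
A = Union{Missing, Int8}[0 2; 1 2; missing 2; 0 1]
freqs = [alt_frequency(col) for col in eachcol(A)]

# recode markers whose ALT allele is the major allele so entries count minor alleles
B = copy(A)
for j in axes(B, 2)
    if freqs[j] > 0.5
        B[:, j] .= 2 .- B[:, j]   # missing entries stay missing
    end
end
```

Whether to recode at all depends on the downstream analysis; many methods only need a consistent reference allele, not the minor one.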

We can also optionally impute missing genotypes according to allele frequency, center the dosages around 2MAF, and scale the dosages by 1/sqrt(2MAF*(1-MAF)).

@time A = convert_gt(Float64, "test.08Jun17.d8b.vcf.gz"; model = :additive, impute = true, center = true, scale = true)
  0.148316 seconds (523.76 k allocations: 42.592 MiB, 4.85% gc time, 47.83% compilation time)





191×1356 Matrix{Union{Missing, Float64}}:
 0.0  0.0  0.0  0.0   1.41301   0.0  …  0.0  0.0  -0.390016  -0.390016  0.0
 0.0  0.0  0.0  0.0  -0.586138  0.0     0.0  0.0  -0.390016  -0.390016  0.0
 0.0  0.0  0.0  0.0   1.41301   0.0     0.0  0.0  -0.390016  -0.390016  0.0
 0.0  0.0  0.0  0.0  -0.586138  0.0     0.0  0.0  -0.390016  -0.390016  0.0
 0.0  0.0  0.0  0.0  -0.586138  0.0     0.0  0.0  -0.390016  -0.390016  0.0
 0.0  0.0  0.0  0.0   1.41301   0.0  …  0.0  0.0  -0.390016  -0.390016  0.0
 0.0  0.0  0.0  0.0  -0.586138  0.0     0.0  0.0   2.36899    2.36899   0.0
 0.0  0.0  0.0  0.0   1.41301   0.0     0.0  0.0  -0.390016  -0.390016  0.0
 0.0  0.0  0.0  0.0   1.41301   0.0     0.0  0.0  -0.390016  -0.390016  0.0
 0.0  0.0  0.0  0.0  -0.586138  0.0     0.0  0.0  -0.390016  -0.390016  0.0
 0.0  0.0  0.0  0.0  -0.586138  0.0  …  0.0  0.0   2.36899    2.36899   0.0
 0.0  0.0  0.0  0.0   1.41301   0.0     0.0  0.0  -0.390016  -0.390016  0.0
 0.0  0.0  0.0  0.0  -0.586138  0.0     0.0  0.0   2.36899    2.36899   0.0
 ⋮                              ⋮    ⋱                                  ⋮
 0.0  0.0  0.0  0.0   1.41301   0.0     0.0  0.0   2.36899    2.36899   0.0
 0.0  0.0  0.0  0.0  -0.586138  0.0  …  0.0  0.0  -0.390016  -0.390016  0.0
 0.0  0.0  0.0  0.0  -0.586138  0.0     0.0  0.0  -0.390016  -0.390016  0.0
 0.0  0.0  0.0  0.0  -0.586138  0.0     0.0  0.0  -0.390016  -0.390016  0.0
 0.0  0.0  0.0  0.0  -0.586138  0.0     0.0  0.0  -0.390016  -0.390016  0.0
 0.0  0.0  0.0  0.0  -0.586138  0.0     0.0  0.0  -0.390016  -0.390016  0.0
 0.0  0.0  0.0  0.0  -0.586138  0.0  …  0.0  0.0  -0.390016  -0.390016  0.0
 0.0  0.0  0.0  0.0  -0.586138  0.0     0.0  0.0  -0.390016  -0.390016  0.0
 0.0  0.0  0.0  0.0  -0.586138  0.0     0.0  0.0  -0.390016  -0.390016  0.0
 0.0  0.0  0.0  0.0  -0.586138  0.0     0.0  0.0  -0.390016  -0.390016  0.0
 0.0  0.0  0.0  0.0  -0.586138  0.0     0.0  0.0  -0.390016  -0.390016  0.0
 0.0  0.0  0.0  0.0  -0.586138  0.0  …  0.0  0.0  -0.390016  -0.390016  0.0
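To make the three options concrete, here is a minimal sketch of the same arithmetic on a single marker. The function standardize_marker is hypothetical and is not the VCFTools.jl implementation; it assumes the allele frequency is estimated from the non-missing genotypes of that marker and that missing entries are imputed by the mean 2MAF.

```julia
# Hypothetical helper (not the VCFTools.jl implementation): impute, center and
# scale one marker's additive genotypes using its observed ALT allele frequency.
function standardize_marker(g::AbstractVector{<:Union{Missing, Real}})
    observed = collect(skipmissing(g))
    p = sum(observed) / (2 * length(observed))       # ALT allele frequency
    center = 2p
    # guard monomorphic markers, where the standard deviation would be zero
    scale = (p == 0 || p == 1) ? 1.0 : 1 / sqrt(2p * (1 - p))
    # a missing entry is imputed by the mean 2p, so it becomes 0 after centering
    return [ismissing(x) ? 0.0 : (x - center) * scale for x in g]
end

g = Union{Missing, Float64}[0, 1, 2, missing, 1, 0]
z = standardize_marker(g)
```

Under these assumptions a monomorphic marker maps to all zeros, which is consistent with the 0.0 columns in the output above.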

Convert GT data into haplotype panels

In certain applications such as genotype imputation, the genotypes are phased. In this case, the alleles are separated by '|', so we can distinguish the two heterozygous genotypes (1|0 vs 0|1). Haplotypes can be imported via convert_ht:

@time H = convert_ht(Int8, "test.08Jun17.d8b.vcf.gz")
  0.160939 seconds (732.86 k allocations: 52.687 MiB, 5.00% gc time, 50.10% compilation time)





382×1356 Matrix{Int8}:
 0  0  0  0  1  0  0  0  0  0  0  0  1  …  0  0  0  0  1  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0  0  0  0  0  1     0  0  0  0  0  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0  0  0  0  0  1     0  0  0  0  0  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0  0  0  0  0  1     0  0  0  0  0  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0  0  0  0  0  1     0  0  0  0  0  0  0  0  0  0  0  0
 0  0  0  0  1  0  0  0  0  0  0  0  1  …  0  0  0  0  1  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0  0  0  0  0  1     0  0  0  0  0  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0  0  0  0  0  1     0  0  0  0  1  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0  0  0  0  0  1     0  0  0  0  0  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0  0  0  0  0  1     0  0  0  0  0  0  0  0  0  0  0  0
 0  0  0  0  1  0  0  0  0  0  0  0  1  …  0  0  0  0  1  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0  0  0  0  0  1     0  0  0  0  0  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0  0  0  0  0  1     0  0  0  0  0  0  0  0  0  0  0  0
 ⋮              ⋮              ⋮        ⋱     ⋮              ⋮              ⋮
 0  0  0  0  0  0  0  0  0  0  0  0  1  …  0  0  0  0  0  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0  0  0  0  0  1     0  0  0  0  0  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0  0  0  0  0  1     0  0  0  0  0  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0  0  0  0  0  1     0  0  0  0  0  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0  0  0  0  0  1     0  0  0  0  0  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0  0  0  0  0  1  …  0  0  0  0  0  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0  0  0  0  0  1     0  0  0  0  0  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0  0  0  0  0  1     0  0  0  0  0  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0  0  0  0  0  1     0  0  0  0  0  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0  0  0  0  0  1     0  0  0  0  0  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0  0  0  0  0  1  …  0  0  0  0  0  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0  0  0  1  0  1     0  0  0  0  0  0  0  0  0  0  0  0

Note there are 382 rows representing the haplotypes of 191 samples. The first sample occupies rows 1 and 2, the second sample occupies rows 3 and 4, and so on.
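Given this row layout, additive genotypes can be recovered by summing each sample's two haplotype rows. The toy panel below is made up for illustration; only the row convention (rows 2i-1 and 2i belong to sample i) comes from the text above.

```julia
# toy haplotype panel: 2 samples (4 haplotype rows) x 3 markers
H = Int8[0 1 1;
         1 1 0;
         0 0 0;
         0 1 0]

# rows 2i-1 and 2i hold the two haplotypes of sample i, so summing them
# gives that sample's additive genotype (number of ALT alleles)
G = H[1:2:end, :] .+ H[2:2:end, :]
```

Here G has one row per sample, with the first sample coded 1, 2, 1 and the second 0, 1, 0.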

Note

VCFTools.jl does not check whether the allele separators are '|' when you call convert_ht. Take care to run this function only on phased data.
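A quick pre-check on a few record lines can catch unphased input before calling convert_ht. The helper record_is_phased below is hypothetical (not part of VCFTools.jl) and assumes GT is the first FORMAT sub-field, as the VCF specification requires when GT is present. The two record lines are toy data with placeholder fixed columns.

```julia
# Hypothetical helper (not part of VCFTools.jl): true if every sample in one
# VCF record line has a GT value that uses the phased separator '|'.
function record_is_phased(line::AbstractString)
    fields = split(line, '\t')
    samples = fields[10:end]                 # columns 1-9 are the fixed VCF columns
    return all(s -> occursin('|', first(split(s, ':'))), samples)
end

# toy records: placeholder fixed columns followed by two sample fields
fixed    = join(["1", "100", "rs1", "A", "G", ".", ".", "."], '\t')
unphased = fixed * "\tGT:DS\t0/0:0.000\t0/1:1.000"
phased   = fixed * "\tGT\t0|0\t1|0"
```

With this helper, record_is_phased(unphased) is false and record_is_phased(phased) is true. Haploid calls without any separator would also be reported as unphased, so the check is deliberately conservative.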

Convert DS data into dosages

VCF files often contain dosage information, in which case matrix values can be any number between 0 and 2. Dosages stored under the DS key can be imported into a numeric matrix in the same way.

@time D = convert_ds(Float64, "test.08Jun17.d8b.vcf.gz"; key="DS", impute=false, center=false, scale=false)
  0.204783 seconds (1.56 M allocations: 121.663 MiB, 10.37% gc time, 31.82% compilation time)





191×1356 Matrix{Union{Missing, Float64}}:
 0.0   0.0  0.0  0.0   1.0  0.0  0.0  …  0.0  0.0  0.0  0.0  0.0   0.0  0.0
 0.0   0.0  0.0  0.0   0.0  0.0  0.0     0.0  0.0  0.0  0.0  0.0   0.0  0.0
 0.0   0.0  0.0  0.0   1.0  0.0  0.0     0.0  0.0  0.0  0.0  0.05  0.0  0.0
 0.0   0.0  0.0  0.0   0.0  0.0  0.0     0.0  0.0  0.0  0.0  0.0   0.0  0.0
 0.0   0.0  0.0  0.0   0.0  0.0  0.0     0.0  0.0  0.0  0.0  0.0   0.0  0.0
 0.0   0.0  0.0  0.0   1.0  0.0  0.0  …  0.0  0.0  0.0  0.0  0.0   0.0  0.0
 0.0   0.0  0.0  0.0   0.0  0.0  0.0     0.0  0.0  0.0  0.0  1.0   1.0  0.0
 0.0   0.0  0.0  0.0   1.0  0.0  0.0     0.0  0.0  0.0  0.0  0.0   0.0  0.0
 0.0   0.0  0.0  0.0   1.0  0.0  0.0     0.0  0.0  0.0  0.0  0.0   0.0  0.0
 0.0   0.0  0.0  0.0   0.0  0.0  0.0     0.0  0.0  0.0  0.0  0.0   0.0  0.0
 0.0   0.0  0.0  0.0   0.0  0.0  0.0  …  0.0  0.0  0.0  0.0  1.0   1.0  0.0
 0.0   0.0  0.0  0.0   1.0  0.0  0.0     0.0  0.0  0.0  0.0  0.0   0.0  0.0
 0.0   0.0  0.0  0.0   0.0  0.0  0.0     0.0  0.0  0.0  0.0  1.0   1.0  0.0
 ⋮                          ⋮         ⋱       ⋮                         ⋮
 0.0   0.0  0.0  0.0   1.0  0.0  0.0     0.0  0.0  0.0  0.0  1.0   1.0  0.0
 0.0   0.0  0.0  0.0   0.0  0.0  0.0  …  0.0  0.0  0.0  0.0  0.0   0.0  0.0
 0.05  0.0  0.0  0.0   0.0  0.0  0.0     0.0  0.0  0.0  0.0  0.0   0.0  0.0
 0.05  0.0  0.0  0.0   0.0  0.0  0.0     0.0  0.0  0.0  0.0  0.0   0.0  0.0
 0.0   0.0  0.0  0.0   0.0  0.0  0.0     0.0  0.0  0.0  0.0  0.0   0.0  0.0
 0.0   0.0  0.0  0.05  0.0  0.0  0.0     0.0  0.0  0.0  0.0  0.0   0.0  0.0
 0.0   0.0  0.0  0.0   0.0  0.0  0.0  …  0.0  0.0  0.0  0.0  0.0   0.0  0.0
 0.0   0.0  0.0  0.0   0.0  0.0  0.0     0.0  0.0  0.0  0.0  0.0   0.0  0.0
 0.0   0.0  0.0  0.0   0.0  0.0  0.0     0.0  0.0  0.0  0.0  0.0   0.0  0.0
 0.0   0.0  0.0  0.0   0.0  0.0  0.0     0.0  0.0  0.0  0.0  0.0   0.0  0.0
 0.0   0.0  0.0  0.0   0.0  0.0  0.0     0.0  0.0  0.0  0.0  0.0   0.0  0.0
 0.0   0.0  0.0  0.0   0.0  0.0  0.0  …  0.0  0.0  0.0  0.0  0.0   0.0  0.0

Extract data marker-by-marker or window-by-window

Large VCF files easily generate numeric arrays that cannot fit into computer memory. Many analyses only need to loop over markers or sets of markers. The functions above for importing genotypes/haplotypes/dosages have in-place counterparts for this purpose:

  • copy_gt! loops over genotypes
  • copy_ht! loops over haplotypes
  • copy_ds! loops over dosages

For example, to loop over all genotype markers in the VCF file test.08Jun17.d8b.vcf.gz:

using VariantCallFormat

# initialize VCF reader
people, snps = nsamples("test.08Jun17.d8b.vcf.gz"), nrecords("test.08Jun17.d8b.vcf.gz")
reader = VCF.Reader(openvcf("test.08Jun17.d8b.vcf.gz"))
# pre-allocate vector for marker data
g = zeros(Union{Missing, Float64}, people)
for j = 1:snps
    copy_gt!(g, reader; model = :additive, impute = true, center = true, scale = true)
    # do statistical analysis on g (one marker)
end
close(reader)

To loop over markers in windows of size 25:

# initialize VCF reader
people, snps = nsamples("test.08Jun17.d8b.vcf.gz"), nrecords("test.08Jun17.d8b.vcf.gz")
reader = VCF.Reader(openvcf("test.08Jun17.d8b.vcf.gz"))
# pre-allocate matrix for marker data
windowsize = 25
g = zeros(Union{Missing, Float64}, people, windowsize)
nwindows = ceil(Int, snps / windowsize)
for j = 1:nwindows
    copy_gt!(g, reader; model = :additive, impute = true, center = true, scale = true)
    # do statistical analysis on g (one window of markers)
end
close(reader)
┌ Warning: Reached end of reader; columns 7-25 are set to missing values
└ @ VCFTools /Users/biona001/.julia/dev/VCFTools/src/convert.jl:72

As the warning indicates, the last window has fewer than 25 markers: 1356 = 54 × 25 + 6, so only 6 markers remain for window 55. The remaining columns (7-25) of the matrix g are set to missing values.
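The window bookkeeping can be worked out ahead of time, which makes it easy to ignore the padded columns in the final window. The sketch below only does the arithmetic for the 1356 markers and window size 25 used above; the names lastwindow and ranges are introduced here for illustration.

```julia
snps, windowsize = 1356, 25            # marker count and window size used above
nwindows = ceil(Int, snps / windowsize)

# number of markers actually read in the final window
lastwindow = snps - (nwindows - 1) * windowsize

# marker indices (columns of the full data set) covered by each window
ranges = [((w - 1) * windowsize + 1):min(w * windowsize, snps) for w in 1:nwindows]
```

Inside the loop, one option is to analyze only the first length(ranges[j]) columns of g, so that the columns filled with missing values in the last window are skipped.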

Sample ID, Chromosome, SNP position, REF/ALT alleles

To extract sample IDs without looping over the entire VCF file, use sampleID:

ids = sampleID("test.08Jun17.d8b.vcf.gz")
191-element Vector{String}:
 "HG00096"
 "HG00097"
 "HG00099"
 "HG00100"
 "HG00101"
 "HG00102"
 "HG00103"
 "HG00104"
 "HG00106"
 "HG00108"
 "HG00109"
 "HG00110"
 "HG00111"
 ⋮
 "HG00383"
 "HG00384"
 "HG00403"
 "HG00404"
 "HG00406"
 "HG00407"
 "HG00418"
 "HG00419"
 "HG00421"
 "HG00422"
 "HG00427"
 "HG00428"

However, to extract each SNP's (record's) chromosome, position, ID, or REF/ALT alleles, one must loop over the entire VCF file. This is done by passing the optional argument save_snp_info = true to convert_gt, convert_ht, or convert_ds.

@time X, X_sampleID, X_chr, X_pos, X_ids, X_ref, X_alt = convert_gt(Float64, 
    "test.08Jun17.d8b.vcf.gz", save_snp_info=true)
  0.088368 seconds (536.26 k allocations: 43.550 MiB, 9.98% gc time)





(Union{Missing, Float64}[0.0 0.0 … 0.0 0.0; 0.0 0.0 … 0.0 0.0; … ; 0.0 0.0 … 0.0 0.0; 0.0 0.0 … 0.0 0.0], ["HG00096", "HG00097", "HG00099", "HG00100", "HG00101", "HG00102", "HG00103", "HG00104", "HG00106", "HG00108"  …  "HG00403", "HG00404", "HG00406", "HG00407", "HG00418", "HG00419", "HG00421", "HG00422", "HG00427", "HG00428"], ["22", "22", "22", "22", "22", "22", "22", "22", "22", "22"  …  "22", "22", "22", "22", "22", "22", "22", "22", "22", "22"], [20000086, 20000146, 20000199, 20000291, 20000428, 20000683, 20000771, 20000793, 20000810, 20000814  …  20099406, 20099579, 20099654, 20099659, 20099660, 20099674, 20099716, 20099752, 20099891, 20099941], [["rs138720731"], ["rs73387790"], ["rs183293480"], ["rs185807825"], ["rs55902548"], ["rs142720028"], ["rs114690707"], ["rs189842693"], ["rs147349046"], ["rs183154520"]  …  ["rs41281429"], ["rs145947632"], ["rs9605066"], ["rs142467695"], ["rs74605905"], ["rs145967409"], ["rs139838034"], ["rs73389792"], ["rs1048659"], ["rs113958995"]], ["T", "G", "A", "G", "G", "A", "A", "T", "C", "T"  …  "G", "CCA", "C", "C", "C", "T", "C", "G", "C", "T"], [["C"], ["A"], ["C"], ["T"], ["T"], ["G"], ["C"], ["C"], ["T"], ["C"]  …  ["C"], ["C"], ["T"], ["T"], ["T"], ["C"], ["G"], ["T"], ["G"], ["A"]])
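The returned vectors line up with the columns of X, so they can be used to select markers. The sketch below uses short stand-in vectors holding the first six entries printed above; the region bounds are hypothetical. Note that the ALT alleles (like the IDs) come back as one vector per marker.

```julia
# stand-ins for the first few entries of the vectors returned above
X_pos = [20000086, 20000146, 20000199, 20000291, 20000428, 20000683]
X_ref = ["T", "G", "A", "G", "G", "A"]
X_alt = [["C"], ["A"], ["C"], ["T"], ["T"], ["G"]]

# indices of markers inside a (hypothetical) region of interest
keep = findall(p -> 20000100 <= p <= 20000500, X_pos)

# ALT alleles are stored one vector per marker; for biallelic markers
# the single entry can be pulled out with first.()
alt = first.(X_alt[keep])
labels = string.(X_ref[keep], ">", alt)
```

With the full output, X[:, keep] would then give the genotype columns for the selected markers.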