It has been about 20 months since the NevGen YDNA Haplogroup Predictor went online at www.nevgen.org. During that time, several people, mostly from universities around the world, have asked us for a desktop version. Some of them also asked for batch capability, for quick processing of hundreds of haplotypes from academic studies of different world populations. So we decided to meet their needs and release a free desktop edition of the predictor with batch processing capability. It supports the Promega and YFiler sets of 23 or 17 markers, since most academic haplotype collections use those marker sets.
The first edition contains only the haplogroup predictor, but it should soon include several other tools, some for Y-DNA and some for autosomal data (that is why it is called "NevGen Genealogy Tools"). These tools were written several years ago, some of them before the predictor, but time is needed to prepare them for the English-speaking world, with better translations and a user manual. For now the desktop program works in English; we expect a Russian version as soon as the translation is finished.
In the future, other levels might be added to the predictor's desktop edition; for now only one is available. Markers outside the 23-marker set can be entered into the available limited level, but they will be ignored.
Update of October 02, 2017: the R1a Level is now available in the Predictor. The Russian translation of the program is available too. And, most important, the "NevGen probability Calculator of Time to Most Recent Common Ancestor" is now available to the Genetic Genealogy community.
There is no installer: the user only has to unpack all the contents into the same folder and rename "NevGen.e" to "NevGen.exe". The program is then started by running "NevGen.exe". Please keep in mind that all .dat files must be kept in the same directory as NevGen.exe; otherwise the predictor will not work.
Here is the link to download NevGen Genealogy Tools v1.0.
This edition was helped by and Radimpex Civil Engineering Software.
; This row is comment, because it starts with semicolon sign (";"), and is meaningful only for batch input.
16 13 20 30 13 10 23 11 11 11 14 17 23 24 13 14 10 13 15 14 17 15 12 ; BO73 Bolivia [Native American]
19 13 19 29 14 11 22 13 12 12 15 18 23 24 12 13 10 13 16 11 13 16 12 ; ID Oviedo 269 Asturias, Spain [Spanish]
16 12 22 29 15 10 22 13 12 10 15 19 21 21 11 11 12 15 15 13 16 15 11 ; 62 Basel, Switzerland [Swiss]
Please note that on the same line as the haplotype, after an optional semicolon (";"), any comment can be placed. It does not affect the calculation, but it can be very useful to a user who is batch processing a haplotype set; without it, working with the set would be much harder. Every input row, whether it contains a haplotype or only a comment, is copied into the output. This is the output of this batch processing:
; This row is comment, because it starts with semicolon sign (";"), and is meaningful only for batch input.
1. ----------
16 13 20 30 13 10 23 11 11 11 14 17 23 24 13 14 10 13 15 14 17 15 12 ; BO73 Bolivia [Native American]
Probability = 98.80% Fitness=37.34 [1.34] Q M346>> M3> M848
Probability = 1.19% Fitness=30.83 [0.93] Q M346>> Z780
2. ----------
19 13 19 29 14 11 22 13 12 12 15 18 23 24 12 13 10 13 16 11 13 16 12 ; ID Oviedo 269 Asturias, Spain [Spanish]
Probability = 100.00% Fitness=53.30 [1.31] R1b (for 67+ markers, try level for R1b-s, 300+ subclades)
3. ----------
16 12 22 29 15 10 22 13 12 10 15 19 21 21 11 11 12 15 15 13 16 15 11 ; 62 Basel, Switzerland [Swiss]
Probability of unsupported subclade: 93.61%
Warning: Values of fitness (or relative fitness) are rather small, so results are not too confident. It is possible that its haplogroup is not supported by current version of predictor (so called "FALSE POSITIVE"), or haplotype really belongs to some supported haplogroup, but it is rare or too distant branch, which is not sufficiently represented in samples used by predictor.
Probability = 5.87% Fitness=19.63 [0.48] G2a2b1 M406> FGC5081> Z17887
Probability = 0.38% Fitness=18.54 [0.44] G2a2b1 M406> PF3293
Probability = 0.12% Fitness=18.96 [0.39] G2a2b1 M406> FGC5081> L14
Probability = 0.03% Fitness=16.64 [0.40] G2a2b1 M406> M3302
Probability = 0.00% Fitness=14.64 [0.29] G2a2 > L497>> Z725>> L43
Probability = 0.00% Fitness=13.44 [0.44] G2a2a PF3147
Probability = 0.00% Fitness=12.37 [0.38] G2a1-L293
Probability = 0.00% Fitness=11.72 [0.37] G1 M342
Probability = 0.00% Fitness=11.79 [0.25] G2a2 > L497>> Z725>> CTS4803> S2808
Probability = 0.00% Fitness=11.71 [0.26] G2a2 > L497>> Z725>> CTS4803
Probability = 0.00% Fitness=11.26 [0.29] G2a2b2a1c - Z724
Probability = 0.00% Fitness=11.70 [0.30] I2a2b-L38
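For anyone who wants to post-process the batch output, the result lines shown above follow a regular pattern and can be picked apart with a short script. The following is only a minimal sketch in Python, not part of NevGen: the function name parse_result_line is hypothetical, the pattern is inferred solely from the sample output above, and the number in square brackets is stored as bracket_value because its exact meaning is not stated here.

```python
import re

# Pattern inferred from the sample output lines above (an assumption,
# not an official format description).
RESULT_LINE = re.compile(
    r"Probability\s*=\s*([\d.]+)%\s+"
    r"Fitness\s*=\s*([\d.]+)\s+"
    r"\[([\d.]+)\]\s+"
    r"(.+)"
)


def parse_result_line(line):
    """Return a dict for a 'Probability = ...' result line, else None."""
    match = RESULT_LINE.match(line.strip())
    if match is None:
        return None  # comment, separator, haplotype or warning line
    probability, fitness, bracket_value, subclade = match.groups()
    return {
        "probability": float(probability),      # percent
        "fitness": float(fitness),
        "bracket_value": float(bracket_value),  # number in [ ]
        "subclade": subclade.strip(),
    }


row = parse_result_line(
    "Probability = 98.80% Fitness=37.34 [1.34] Q M346>> M3> M848"
)
print(row["subclade"], row["probability"])
```

Lines that are not result lines (for example the "Probability of unsupported subclade" line or the numbered separators) simply return None, so they can be skipped while looping over the file.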
The output of batch prediction is automatically saved in the file "Batch prediction.txt", in the folder where the executable is located, in Unicode TXT format. This is how it looks in the predictor's dialog box (the picture drawn is of the last haplotype in the batch):
Please be aware that in the picture on the right side of the dialog box, the entered STRs are always displayed in FTDNA order and in FTDNA format for GATA H4, regardless of the format of the entered haplotype. A description of the picture, and of many other things (such as SNP restriction), can be found here.
The "Most frequent values" button replaces the entered haplotype with the most frequent values for the first subclade selected in the listbox in the upper right corner, and redraws it in the picture field. It does not display the modal values for the selected subclade (the probable STR values of the subclade's most recent common ancestor), but the most frequent ones; the two are not always the same. It can be useful sometimes, mostly for comparing STR value statistics of different haplogroups (subbranches).
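To make the notion of "most frequent values" concrete, here is a small Python sketch that takes, marker by marker, the value that occurs most often in a set of haplotypes. It is an illustration only and not the predictor's own code: the function name most_frequent_values is hypothetical, and the four short haplotypes are made-up numbers used purely to show the calculation.

```python
from collections import Counter


def most_frequent_values(haplotypes):
    """For each marker position, return the value seen most often.

    Each haplotype is a list of integers of the same length; None
    marks an unknown value and is left out of the count.
    """
    result = []
    for column in zip(*haplotypes):
        known = [value for value in column if value is not None]
        if known:
            result.append(Counter(known).most_common(1)[0][0])
        else:
            result.append(None)  # no known value at this marker
    return result


# Made-up three-marker haplotypes, for illustration only.
samples = [
    [13, 24, 14],
    [13, 23, 14],
    [12, 24, None],
    [13, 24, 15],
]
print(most_frequent_values(samples))
```

Such a per-marker count describes the sampled haplotypes as they are, which is why it need not coincide with the reconstructed values of the most recent common ancestor.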
Keep in mind that this version ignores the fractional part of incomplete STR values (for example, 20.2 is treated as 20). If the value of an STR is unknown, use the $ sign. For example:
16 13 20 30 $ 10 23 11 11 11 14 17 23 24 $ 14 10 13 15 14 17 15 12 ; BO73 Bolivia [Native American].
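The input rules described so far (optional comment after a semicolon, $ for an unknown value, incomplete values such as 20.2 treated as 20) can be summarised in a short Python sketch. This is a hedged illustration of the rules as stated in the text, not the predictor's actual parser; the function name parse_haplotype_line is hypothetical, and the sketch handles only single-valued tokens.

```python
def parse_haplotype_line(line):
    """Split one batch input row into (values, comment).

    values is a list of integers, with None for '$' (unknown).
    A row that is only a comment yields an empty list of values.
    """
    data, _, comment = line.partition(";")
    values = []
    for token in data.split():
        if token == "$":
            values.append(None)               # unknown marker value
        else:
            values.append(int(float(token)))  # 20.2 is treated as 20
    return values, comment.strip()


values, comment = parse_haplotype_line(
    "16 13 20.2 30 $ 10 ; BO73 Bolivia [Native American]"
)
print(values)
print(comment)
```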
FTDNA order is the most general format for entering STR data into the predictor, because it allows any STR marker to have more than one value; multiple values of the same STR are separated by a '-' sign. In the 23- or 17-marker orders (Promega and YFiler), multiple values for any STR other than DYS385 can cause problems, because these formats expect exactly 23 (or 17) values in the array. For example, a multiple value for DYS19 (common in some subclades of haplogroup G) would yield a longer array, which the predictor would not recognise. In such cases, when markers other than DYS385 have multiple values, it is recommended to convert the data from the 23- or 17-marker Promega and YFiler formats into the more general FTDNA format.
Here is such a sample, with values of 15 and 16 at DYS19.
19 14 22 30 15,16 10 21 12 10 10 16 18 24 21 11 11 10 14 18 13 14 15 14 ; E13-265 Basel, Switzerland [Swiss]
Reordering the STR values and inserting $ signs for unknown marker values can easily be done in any spreadsheet program. Beware that when converting STR data from the 23- or 17-marker Promega and YFiler formats into FTDNA format, the value of GATA H4 should be decreased by 1. When the sample is reordered in that way, we get:
14 21 15-16 10 13-14 $ $ 11 14 11 30 18 $ $ $ $ 16 22 $ $ $ 13 $ 15 $ 19 18 $ $ 10 $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ 21 $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ 12 $ $ $ 10 $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ 24 $ 10 ; E13-265 Basel, Switzerland [Swiss]
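For those who prefer a script to a spreadsheet, the same reordering can be sketched in Python. Everything named here is an assumption made for illustration: the function to_ftdna is hypothetical, the 23-marker column order in SOURCE_ORDER is the one implied by the sample above, and FTDNA_FIRST_37 lists only the first 30 positions of the FTDNA order (the first 37 markers, with multi-copy markers in one position), so the result is a truncated version of the full row shown above. The sketch also assumes both DYS385 values and GATA H4 are known whole numbers.

```python
# Assumed column order of the 23-marker input (inferred from the sample).
SOURCE_ORDER = [
    "DYS576", "DYS389I", "DYS448", "DYS389II", "DYS19", "DYS391",
    "DYS481", "DYS549", "DYS533", "DYS438", "DYS437", "DYS570",
    "DYS635", "DYS390", "DYS439", "DYS392", "DYS643", "DYS393",
    "DYS458", "DYS385a", "DYS385b", "DYS456", "GATA-H4",
]

# Assumed first 30 positions of the FTDNA order (illustration only).
FTDNA_FIRST_37 = [
    "DYS393", "DYS390", "DYS19", "DYS391", "DYS385", "DYS426",
    "DYS388", "DYS439", "DYS389I", "DYS392", "DYS389II", "DYS458",
    "DYS459", "DYS455", "DYS454", "DYS447", "DYS437", "DYS448",
    "DYS449", "DYS464", "DYS460", "GATA-H4", "YCAII", "DYS456",
    "DYS607", "DYS576", "DYS570", "CDY", "DYS442", "DYS438",
]


def to_ftdna(line):
    """Reorder one 23-marker row into (truncated) FTDNA order."""
    data, _, _comment = line.partition(";")
    tokens = data.split()
    if len(tokens) != len(SOURCE_ORDER):
        raise ValueError("expected exactly 23 values")
    # Multiple values written as 15,16 become 15-16.
    values = {
        name: token.replace(",", "-")
        for name, token in zip(SOURCE_ORDER, tokens)
    }
    # DYS385 occupies a single position in FTDNA order.
    values["DYS385"] = values.pop("DYS385a") + "-" + values.pop("DYS385b")
    # GATA H4 is decreased by 1 in FTDNA format.
    values["GATA-H4"] = str(int(values["GATA-H4"]) - 1)
    # Markers missing from the 23-marker set are written as $.
    return " ".join(values.get(name, "$") for name in FTDNA_FIRST_37)


sample = ("19 14 22 30 15,16 10 21 12 10 10 16 18 24 21 11 11 "
          "10 14 18 13 14 15 14 ; E13-265 Basel, Switzerland [Swiss]")
print(to_ftdna(sample))
```

Markers of the 23-marker set that lie beyond the listed FTDNA positions (DYS481, DYS549, DYS533, DYS643 and DYS635) are dropped by this truncated sketch; extending FTDNA_FIRST_37 to the full order would place them as in the complete row above.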