Overview
Variant Storage
MongoDB Schema
_id key
The _id key must allow results to be sorted using the index. To achieve this, the key must sort lexicographically by chromosome and then by position. This key is used in both the variants and the stage collections.
The key is a concatenation of chromosome, position, reference and alternate, separated by colons:
CHR:POS:REF:ALT
Where:
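- CHR is prefixed with a space if it is a single-digit chromosome, so that it sorts before two-digit chromosomes.
- POS is left-padded with spaces to 10 characters.
- REF and ALT are replaced by the SHA1 hash of the original allele if the allele is longer than {@link Variant#SV_THRESHOLD}. An empty allele (written as "-" in the example below) is stored as an empty string.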
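The rules above can be sketched in Java as follows. This is a minimal illustration, not the storage engine's code: the class and method names are hypothetical, `SV_THRESHOLD = 50` is a placeholder for the real {@link Variant#SV_THRESHOLD}, the hash is assumed to be written as lowercase hex, and "-" is assumed to denote an empty allele as in the example table.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Locale;

public class VariantIdSketch {

    // Hypothetical placeholder; the real value comes from Variant#SV_THRESHOLD.
    static final int SV_THRESHOLD = 50;

    /** Builds a lexicographically sortable key of the form CHR:POS:REF:ALT. */
    static String buildId(String chromosome, int position, String reference, String alternate) {
        StringBuilder sb = new StringBuilder();
        // Single-digit numeric chromosomes get a leading space, so " 3" sorts before "22".
        if (chromosome.length() == 1 && Character.isDigit(chromosome.charAt(0))) {
            sb.append(' ');
        }
        sb.append(chromosome).append(':');
        // Left-pad the position with spaces to 10 characters (Locale.ROOT keeps ASCII digits).
        sb.append(String.format(Locale.ROOT, "%10d", position)).append(':');
        sb.append(encodeAllele(reference)).append(':');
        sb.append(encodeAllele(alternate));
        return sb.toString();
    }

    /** Returns the allele as-is, or its SHA-1 digest (hex, an assumption) if it exceeds the threshold. */
    static String encodeAllele(String allele) {
        // Assumption: "-" in the example table means an empty allele.
        if (allele.equals("-")) {
            return "";
        }
        if (allele.length() <= SV_THRESHOLD) {
            return allele;
        }
        try {
            byte[] digest = MessageDigest.getInstance("SHA-1")
                    .digest(allele.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 is not available", e);
        }
    }

    public static void main(String[] args) {
        // Reproduce the example table, printing spaces as dots.
        System.out.println(buildId("22", 156, "A", "T").replace(' ', '.'));      // 22:.......156:A:T
        System.out.println(buildId("3", 56789, "CACA", "-").replace(' ', '.'));  // .3:.....56789:CACA:
        System.out.println(buildId("X", 68432, "-", "GCC").replace(' ', '.'));   // X:.....68432::GCC
    }
}
```

Because a space (0x20) sorts before any digit, " 3" precedes "22", and padding the position to a fixed width makes string order match numeric order within a chromosome.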
Example:
| Variant | _id |
|---|---|
| 22 156 A T | 22:.......156:A:T |
| 3 56789 CACA - | .3:.....56789:CACA: |
| X 68432 - GCC | X:.....68432::GCC |
* Spaces have been replaced with dots for readability.
Stage collection
Variants collection
```json
{
  "_id" : "22:    123456:A:T",
  "chromosome" : "22",
  "start" : 123456,
  "end" : 123456,
  "reference" : "A",
  "alternate" : "T",
  "length" : 1,
  "type" : "SNV",
  "_at" : {
    "chunkIds" : [
      "22_123_1k",
      "22_12_10k"
    ]
  },
  "studies" : [
    {
      "sid" : 3,
      "gt" : {
        "0|1" : [54, 78, 254, 623],
        "1|1" : [84, 89, 156],
        "?/?" : [110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120]
      },
      "files" : [
        {
          "fid" : 4,
          "attrs" : {}
        }, {
          "fid" : 5,
          "attrs" : {}
        }
      ]
    }
  ],
  "stats" : [ {
    "sid" : 3,
    "cid" : 6,
    "maf" : 0.00638977624475956,
    "mgf" : 0,
    "mafAl" : "T",
    "mgfGf" : "1|1",
    "missAl" : 0,
    "missGt" : 0,
    "numGt" : {
      "0/0" : 562,
      "1|1" : 3,
      "0|1" : 4
    }
  } ],
  "annotation" : [ {
    "id" : "?",
    "ct" : [
      {
        "so" : [ 1628 ]
      }, {
        "so" : [ 1566 ]
      }
    ],
    "cr_score" : [
      {
        "sc" : 0.8619999885559082,
        "src" : "gerp"
      }, {
        "sc" : 0.004999999888241291,
        "src" : "phastCons"
      }, {
        "sc" : 0.11299999803304672,
        "src" : "phylop"
      }
    ],
    "popFq" : [
      {
        "study" : "1000GENOMES_phase_3",
        "pop" : "ALL",
        "refFq" : 0.9986000061035156,
        "altFq" : 0.0006000000284984708
      }, {
        "study" : "1000GENOMES_phase_3",
        "pop" : "EAS",
        "refFq" : 0.9970200061798096,
        "altFq" : 0.0029800001066178083
      }, {
        "study" : "1000GENOMES_phase_3",
        "pop" : "EUR",
        "refFq" : 0.998009979724884,
        "altFq" : 0
      }
    ],
    ...
  } ],
  "customAnnotation" : {
  }
}
```
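The "_at.chunkIds" values in the example above appear to combine the chromosome, the start position integer-divided by a window size, and a label for that window size (123456 / 1000 = 123 gives "22_123_1k", and 123456 / 10000 = 12 gives "22_12_10k"). The sketch below reproduces that pattern. The class name, the chunk sizes, the use of "start", and the "k" label format are all assumptions inferred from this single example, not taken from the storage engine.

```java
import java.util.ArrayList;
import java.util.List;

public class ChunkIdSketch {

    // Hypothetical window sizes, inferred from the example ids "22_123_1k" and "22_12_10k".
    static final int[] CHUNK_SIZES = {1_000, 10_000};

    /** Formats a window size as in the example, e.g. 1000 -> "1k", 10000 -> "10k". */
    static String sizeLabel(int size) {
        return (size % 1_000 == 0) ? (size / 1_000) + "k" : String.valueOf(size);
    }

    /** Returns one chunk id per window size: CHR_(start / size)_LABEL. */
    static List<String> chunkIds(String chromosome, int start) {
        List<String> ids = new ArrayList<>();
        for (int size : CHUNK_SIZES) {
            ids.add(chromosome + "_" + (start / size) + "_" + sizeLabel(size));
        }
        return ids;
    }

    public static void main(String[] args) {
        System.out.println(chunkIds("22", 123456)); // [22_123_1k, 22_12_10k]
    }
}
```

Under this reading, all variants that fall in the same genomic window share a chunk id, so an index on "_at.chunkIds" could narrow a region query to a few windows before filtering on exact positions.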