Overview

Variant Storage

MongoDB Schema
_id key

The _id key must allow results to be sorted using indexes. To make this possible, the key must sort correctly in lexicographic order. The same key is used in both the variants and stage collections.

The key is the concatenation of chromosome, position, reference and alternate, separated by colons:

CHR:POS:REF:ALT

Where:

  • CHR is prefixed with " " when the chromosome is a single digit, so that it sorts correctly against two-digit chromosomes.
  • POS is left-padded with spaces to a width of 10 characters.
  • REF and ALT are replaced by the SHA1 of the original allele if it is longer than {@link Variant#SV_THRESHOLD}.

Example:

Variant           _id
22 156 A T        22:.......156:A:T
3 56789 CACA -    .3:.....56789:CACA:
X 68432 - GCC     X:.....68432::GCC

* Spaces have been replaced with dots.
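
The key rules above can be sketched in a few lines of Python. This is only an illustration, not the actual implementation: the names `build_variant_id` and `allele_key` are hypothetical, the value 50 for `SV_THRESHOLD` is a placeholder (the real value is defined by Variant#SV_THRESHOLD), and the hex encoding of the SHA1 is an assumption. Treating "-" as an empty field follows the examples in the table.

```python
import hashlib

# Placeholder value (assumption); the real limit is Variant#SV_THRESHOLD.
SV_THRESHOLD = 50


def allele_key(allele, threshold=SV_THRESHOLD):
    """Return the allele as it would appear inside the key."""
    if allele == "-":
        # The examples above show an empty allele ("-") as an empty field.
        return ""
    if len(allele) > threshold:
        # Assumption: the SHA1 is stored as a 40-character hex string.
        return hashlib.sha1(allele.encode("ascii")).hexdigest()
    return allele


def build_variant_id(chromosome, position, reference, alternate):
    """Build a CHR:POS:REF:ALT key that sorts lexicographically."""
    chrom = chromosome
    if len(chrom) == 1 and chrom.isdigit():
        # Single-digit chromosomes get a leading space.
        chrom = " " + chrom
    # POS is right-aligned in a field of 10 characters, padded with spaces.
    return f"{chrom}:{position:>10}:{allele_key(reference)}:{allele_key(alternate)}"


ids = [
    build_variant_id("22", 156, "A", "T"),
    build_variant_id("3", 56789, "CACA", "-"),
    build_variant_id("X", 68432, "-", "GCC"),
]

# Print in sorted order, showing spaces as dots like the table above.
for variant_id in sorted(ids):
    print(variant_id.replace(" ", "."))
```

Sorting the three keys as plain strings puts chromosome 3 before chromosome 22, which would not happen without the leading space ("3" sorts after "22").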

Stage collection
Variants collection
{
  "_id" : "22:    123456:A:T",
  "chromosome" : "22",
  "start" : 123456,
  "end" : 123456,
  "reference" : "A",
  "alternate" : "T",
  "length" : 1,
  "type" : "SNV",
  "_at" : {
    "chunkIds" : [
       "22_123_1k",
       "22_12_10k"
    ]
  },

  "studies" : [
    {
      "sid" : 3,
      "gt": {
        "0|1" : [54, 78, 254, 623],
        "1|1" : [84, 89, 156],
        "?/?" : [110,111,112,113,114,115,116,117,118,119,120]
      },
      "files" : [ 
        { 
           "fid" : 4,
           "attrs" : {}
        }, {
           "fid" : 5,
           "attrs" : {}
        }
      ]
    } 
  ],
  "stats" : [ {
      "sid" : 3,
      "cid": 6,
      "maf": 0.00638977624475956,
      "mgf": 0,
      "mafAl": "T",
      "mgfGf": "1|1",
      "missAl": 0,
      "missGt": 0,
      "numGt": {
        "0/0" : 562,
        "1|1" : 3,
        "0|1" : 4
      }
  } ],

  "annotation" : [ {
     "id" : "?",
     "ct" : [
       {
         "so" : [ 1628 ]
       } , {
         "so" : [ 1566 ]
       }
     ],
     "cr_score" : [
       {
         "sc" : 0.8619999885559082,
         "src" : "gerp"
       } , {
         "sc" : 0.004999999888241291,
         "src" : "phastCons"
       } , {
         "sc" : 0.11299999803304672,
         "src" : "phylop"
       }
     ],
     "popFq" : [
       {
         "study" : "1000GENOMES_phase_3",
         "pop" : "ALL",
         "refFq" : 0.9986000061035156,
         "altFq" : 0.0006000000284984708
       } , {
         "study" : "1000GENOMES_phase_3",
         "pop" : "EAS",
         "refFq" : 0.9970200061798096,
         "altFq" : 0.0029800001066178083
       } , {
         "study" : "1000GENOMES_phase_3",
         "pop" : "EUR",
         "refFq" : 0.998009979724884,
         "altFq" : 0
       }
     ],
     ...
  } ],
  "customAnnotation" : {
  }
}
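
As a reading aid for the document above, the following Python sketch shows how two of its fields could be interpreted. Both helpers are hypothetical and rest on assumptions drawn only from the example: `chunk_ids` assumes that a chunk id is the chromosome, the start position integer-divided by the chunk size, and a size label (this reproduces "22_123_1k" and "22_12_10k" for position 123456), and `genotype_of` assumes that the numbers under "gt" are sample identifiers and that samples not listed there carry a default genotype, here guessed to be "0/0".

```python
def chunk_ids(chromosome, start):
    """Derive chunk ids in the style of the "_at.chunkIds" field."""
    # Chunk sizes and labels are inferred from the example, not from a specification.
    sizes = [(1_000, "1k"), (10_000, "10k")]
    return [f"{chromosome}_{start // size}_{label}" for size, label in sizes]


def genotype_of(study, sample, default="0/0"):
    """Look up the genotype of one sample in a "studies" entry."""
    for genotype, samples in study["gt"].items():
        if sample in samples:
            return genotype
    # Assumption: samples that are not listed have the default genotype.
    return default


# A trimmed copy of the "studies" entry from the document above.
study = {
    "sid": 3,
    "gt": {
        "0|1": [54, 78, 254, 623],
        "1|1": [84, 89, 156],
    },
}

print(chunk_ids("22", 123456))
print(genotype_of(study, 78))
print(genotype_of(study, 1))
```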
Studies collection
Files collection

Alignment Storage

MongoDB Schema