Datasets

Where can I download the datasets?

The CRyPTIC Consortium datasets are available to download on Zenodo.

They are released under a permissive CC-BY 4.0 license, which requires attribution. To cite the datasets, please cite either the whole collection via doi:10.5281/zenodo.15679730 or the exact version you used (details below).
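As a convenience, the DOIs quoted on this page can be turned into links through the standard doi.org resolver. The sketch below is illustrative only: the helper name doi_to_url is hypothetical and not part of any CRyPTIC tooling, and it only builds the URL without contacting Zenodo.

```python
# Minimal sketch: convert a "doi:..." string into a resolvable https://doi.org/ URL.
# The function name is an illustrative assumption, not part of any CRyPTIC package.
DOI_RESOLVER = "https://doi.org/"


def doi_to_url(doi: str) -> str:
    """Return the doi.org URL for a DOI, accepting an optional 'doi:' prefix."""
    doi = doi.strip()
    if doi.lower().startswith("doi:"):
        doi = doi[4:]
    return DOI_RESOLVER + doi


# DOI of the whole collection, as quoted above.
COLLECTION_DOI = "doi:10.5281/zenodo.15679730"
print(doi_to_url(COLLECTION_DOI))
```

The same helper works for the version-specific DOIs listed under each release.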

The two CRyPTIC publications that describe the data collection process and methods are

  1. The CRyPTIC Consortium
    A data compendium associating the genomes of 12,289 Mycobacterium tuberculosis isolates with quantitative resistance phenotypes to 13 antibiotics
    PLoS Biology 20(8):e3001721 doi:10.1371/journal.pbio.3001721
  2. The CRyPTIC Consortium
    Epidemiological cutoff values for a 96-well broth microdilution plate for high-throughput research antibiotic susceptibility testing of M. tuberculosis.
    Eur Resp J 60:2200239 doi:10.1183/13993003.00239-2022

Please cite either or both of these, or the most relevant CRyPTIC publication, as appropriate.

Description of datasets

Listed with the most recent and up-to-date release first.

CRyPTIC Release Three

cryptic-tables-v3.4.0, 21 May 2025

The main differences compared to Release Two are

  • all sample genetics processed using an updated bioinformatics pipeline
  • all sample genetics processed using a research instance of EIT Pathogena
  • all sample plate images read using a machine learning model (TMAS) rather than AMyGDA (i.e. TMAS supersedes AMyGDA)
  • as many as possible of the remaining CRyPTIC samples whose FASTQ files had not been deposited in the ENA (due to upload failures etc.) have now been deposited

Statistics

  • 53,897 samples with genetic information and the result of at least one drug susceptibility test

Citation: doi:10.5281/zenodo.15680920

For more information please see the RELEASE_NOTES.

CRyPTIC Release Two

cryptic-tables-v2.1.2, 23 July 2024

This release used broadly the same process as Release One but collected and processed additional genetic and plate information received after the Data Freeze in April 2020. Because not all samples had been processed and deposited in the ENA at the time of processing, we do not recommend using this dataset; it is provided here for completeness.

The main differences compared to Release One are

  • the version of Clockwork was updated to v0.12.4
  • the samples in Release One where the FASTQ files were downloaded from the ENA were not processed; most but not all of these have no matching drug susceptibility testing data
  • the table schema was simplified

Statistics

  • 36,738 samples with genetic information and the result of at least one drug susceptibility test

Citation: doi:10.5281/zenodo.15679886

For more information please see the RELEASE_NOTES.

CRyPTIC Release One

cryptic-tables-v1.1.1, 25 Jan 2021

This is the original dataset that was used in all the primary CRyPTIC publications and therefore the tables provided should be identical to those available via the ENA FTP site.

Statistics

  • 41,130 samples with genetic information and the result of at least one drug susceptibility test

Citation: doi:10.5281/zenodo.15680920

Other data

The above data tables summarise the genetic (e.g. mutation in gene) and drug susceptibility testing (e.g. MIC or SUR result) results. Other data are available, including

  • FASTQ files. These are available from the NCBI or ENA. Accession numbers are listed in the WGS_SAMPLES table from Release Three onwards.
  • Images of UKMYC plates. These are not yet readily available but are available on request.
  • Variant Call Format files. These are not yet readily available but are available on request.
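
For the FASTQ files, one way to locate the download links for a given run accession is the ENA Portal API file report. The sketch below only builds the query URL and makes no network request. It rests on several assumptions: the accession shown is a made-up placeholder, the helper name is hypothetical, the column that holds accessions in WGS_SAMPLES is not specified here, and the endpoint and field names should be checked against the current ENA documentation before use.

```python
# Minimal sketch: build an ENA Portal API "filereport" URL for one run accession.
# Assumptions: the endpoint and the "fastq_ftp" field follow the public ENA Portal
# API; the accession below is a placeholder, not a real CRyPTIC sample.
from urllib.parse import urlencode

ENA_FILEREPORT = "https://www.ebi.ac.uk/ena/portal/api/filereport"


def ena_fastq_report_url(accession: str) -> str:
    """Return a URL whose TSV response lists the FASTQ locations for a run."""
    query = urlencode(
        {
            "accession": accession,
            "result": "read_run",
            "fields": "fastq_ftp",
            "format": "tsv",
        }
    )
    return f"{ENA_FILEREPORT}?{query}"


# Placeholder accession for illustration only; take real ones from WGS_SAMPLES.
print(ena_fastq_report_url("ERR0000000"))
```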