Where can I download the datasets?
The CRyPTIC Consortium datasets are available to download on Zenodo.
They are released under a permissive CC-BY 4.0 license. To cite the datasets, please cite either the whole collection via doi:10.5281/zenodo.15679730 or the exact version you used (details below).
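The DOIs quoted on this page can be turned into clickable links through the public doi.org resolver. The following is a minimal sketch rather than an official CRyPTIC tool: the function name `doi_to_url` and the constant names are hypothetical, and the only value taken from this page is the collection DOI itself.

```python
from urllib.parse import quote

# Public DOI resolver; prepending it to a bare DOI gives a resolvable link.
DOI_RESOLVER = "https://doi.org/"

# Collection DOI as quoted on this page.
COLLECTION_DOI = "doi:10.5281/zenodo.15679730"


def doi_to_url(doi: str) -> str:
    """Hypothetical helper: strip an optional 'doi:' prefix and build a resolver URL."""
    bare = doi.strip()
    if bare.lower().startswith("doi:"):
        bare = bare[len("doi:"):]
    # Keep '/' unescaped because it separates the DOI prefix from the suffix.
    return DOI_RESOLVER + quote(bare, safe="/")


if __name__ == "__main__":
    print(doi_to_url(COLLECTION_DOI))
```

The same helper should work for the per-release DOIs listed below; it only builds the link and does not download anything.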
The two CRyPTIC publications that describe the data collection process and methods are
- The CRyPTIC Consortium
A data compendium associating the genomes of 12,289 Mycobacterium tuberculosis isolates with quantitative resistance phenotypes to 13 antibiotics
PLoS Biology 20(8):e3001721 doi:10.1371/journal.pbio.3001721
- The CRyPTIC Consortium
Epidemiological cutoff values for a 96-well broth microdilution plate for high-throughput research antibiotic susceptibility testing of M. tuberculosis.
Eur Resp J 60:2200239 doi:10.1183/13993003.00239-2022
Please cite either or both of these, or the most relevant CRyPTIC publication, as appropriate.
Description of datasets
Listed with the most recent and up-to-date release first.
CRyPTIC Release Three
cryptic-tables-v3.4.0, 21 May 2025
The main differences compared to Release Two are
- all sample genetics processed using an updated bioinformatics pipeline
- all sample genetics processed using a research instance of EIT Pathogena
- all sample plate images read using a machine learning model (TMAS) rather than AMyGDA (i.e. TMAS supersedes AMyGDA)
- many remaining CRyPTIC samples whose FASTQ files had not been deposited in the ENA, due to upload failures etc., have now been deposited
Statistics
- 53,897 samples with genetic information and the result of at least one drug susceptibility test
Citation: doi:10.5281/zenodo.15680920
For more information please see the RELEASE_NOTES.
CRyPTIC Release Two
cryptic-tables-v2.1.2, 23 July 2024
This release used broadly the same process as Release One but collected and processed additional genetic and plate information received after the Data Freeze in April 2020. Because not all samples had been processed and deposited in the ENA at the time of processing, we do not recommend using this dataset; it is provided here for completeness.
The main differences compared to Release One are
- the version of Clockwork was updated to v0.12.4
- the samples in Release One whose FASTQ files were downloaded from the ENA were not processed; most, but not all, of these have no matching drug susceptibility testing data
- the table schema was simplified
Statistics
- 36,738 samples with genetic information and the result of at least one drug susceptibility test
Citation: doi:10.5281/zenodo.15679886
For more information please see the RELEASE_NOTES.
CRyPTIC Release One
cryptic-tables-v1.1.1, 25 Jan 2021
This is the original dataset that was used in all the primary CRyPTIC publications and therefore the tables provided should be identical to those available via the ENA FTP site.
Statistics
- 41,130 samples with genetic information and the result of at least one drug susceptibility test
Citation: doi:10.5281/zenodo.15680920
Other data
The above data tables summarise the genetic (e.g. mutation in gene) and drug susceptibility testing (e.g. MIC or SUR result) results. Other data are available, including
- FASTQ files; these are available from the NCBI or ENA. Accession numbers are listed in the WGS_SAMPLES table in Release Three onwards.
- Images of UKMYC plates. These are not yet readily available but are available on request.
- Variant Call Format files. These are not yet readily available but are available on request.