With the availability of large preclinical datasets on cancer drug sensitivity and gene essentiality, computational biology models for predicting cancer sensitivity are gaining popularity. However, comparing these models is challenging: many published models and methods exist, and meaningful comparisons are difficult without reproducing each of them on one's own data.
Armed with the experience of benchmarking our own models at Turbine, we publish the Turbine Benchmark Suite. This carefully composed benchmark set focuses on models’ ability to identify biologically applicable predictions. While this benchmark set is not entirely foolproof and can potentially be overfit with sufficient attempts, we have made substantial efforts to ensure its resilience.
Our approach revolves around three key principles:
By adhering to these principles, our aim is to provide a benchmark that facilitates fair and meaningful comparisons of computational biology models in predicting cancer sensitivity. We encourage researchers to develop robust and selective predictors that transcend the limitations of bias and demonstrate their utility in real-world scenarios.
This data is not intended to be a competition set; it is designed to be a resource for your own projects. To make it easy to use, we have made the test data publicly available. At the same time, this means that anyone attempting to misuse the data could potentially overfit it with enough attempts.
You can access the train/test data from the Releases section. The splits and target metrics are provided in separate JSON files, categorized into “ko” for gene essentiality and “drug” for drug sensitivity.
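As a starting point, a split file can be loaded with a few lines of Python. The schema below (top-level `train`/`test` keys holding sample identifiers) is an assumption for illustration, as are the file name and identifier format; inspect the released "ko" and "drug" JSON files for the actual layout.

```python
import json
from pathlib import Path

# Write a toy stand-in for a released "ko" split file. The keys and the
# "cell_line:gene" identifier format are assumptions, not the real schema.
Path("ko_split_example.json").write_text(json.dumps({
    "train": ["ACH-000001:BRAF", "ACH-000001:KRAS"],
    "test": ["ACH-000002:TP53"],
}))

# Load it back and separate the partitions.
split = json.loads(Path("ko_split_example.json").read_text())
ko_train, ko_test = split["train"], split["test"]
print(len(ko_train), len(ko_test))  # sample counts per partition
```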
For gene essentiality predictions, we have created the following splits based on DepMap data (https://depmap.org/portal/, https://www.nature.com/articles/ng.3984):
In our benchmark, we have utilized only a subset of gene essentiality data (a subset of genes) in order to keep the dataset more balanced. However, we provide the EXT_GEX and EXT_AEX splits to explore performance on the genome-wide DepMap data.
Each JSON file contains the following information about the samples:
Similarly, for drug sensitivity prediction, we have created the following splits based on GDSC2 data (https://www.cancerrxgene.org/, https://www.cell.com/cell/fulltext/S0092-8674(16)30746-2):
Each JSON file contains the following information about the samples:
We have three different target metrics:
We performed this splitting exercise three times to allow gauging how much each model depends on one specific test set. The file behind the “all train/test sets” button contains all three “split variants”. The primary test set is the test set of split 0.
Evaluation scripts:
You can use our evaluation metrics and bias detector to correctly evaluate your predictions.
The downloadable zip file includes example files (example.json for targets and example.npy for predictions), precalculated cell and perturbation biases and two notebooks to run the evaluation scripts (eval_script.ipynb) and the bias detector (bias-detector.ipynb).
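The shipped notebooks implement the suite's actual metrics and should be preferred; still, a minimal sketch of the target/prediction file pairing may help. The array contents and the Pearson correlation as a stand-in metric are assumptions here, as is the `.npy` file name; the real `example.json`/`example.npy` define the authoritative format.

```python
import numpy as np

# Toy stand-ins: targets as they might come from example.json, and a matching
# 1-D prediction array saved the way example.npy is assumed to be laid out.
targets = np.array([0.1, 0.8, 0.5, 0.3])
np.save("example_preds.npy", targets + np.array([0.02, -0.01, 0.03, 0.00]))

preds = np.load("example_preds.npy")
assert preds.shape == targets.shape  # predictions must align with targets

# Pearson correlation as a placeholder metric; eval_script.ipynb computes the
# suite's real target metrics and applies the bias detector.
r = float(np.corrcoef(targets, preds)[0, 1])
print(f"pearson r = {r:.3f}")
```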
Q: I have a set of models trained separately for each drug / KO! Can I use this set?
A: Of course! Just leave the PEX and AEX splits out – the CEX results will still be valid.
Q: Should I run all split variants?
A: It makes sense to train/test all split variants once, so you can ensure you’re not overfitting any specific split. But generally, results on split 0 are fine on their own – the downloadable primary test set is actually just split 0’s test.
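A quick way to check for split-specific overfitting is to compare the same metric across the three split variants; a large spread suggests the model leans on one particular test set. The scores below are made-up placeholders for illustration.

```python
import numpy as np

# Hypothetical per-split-variant scores (e.g. the same metric evaluated on
# each variant's test set) -- substitute your model's real numbers.
scores = {"split0": 0.61, "split1": 0.58, "split2": 0.60}

vals = np.array(list(scores.values()))
spread = vals.max() - vals.min()  # small spread -> stable across variants
print(f"mean={vals.mean():.3f} spread={spread:.3f}")
```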
Q: Where are the rest of the genes?
A: We’ve only included genes for which we could generate node2vec embeddings from Omnipath data.
Planned for future versions:
The evaluation scripts and sample models are released under a CC-BY-SA 4.0 license. In a nutshell, feel free to use them in your projects – even commercial ones, as long as you don’t resell the datasets themselves.
If you publish results using or derived from the EFFECT benchmark, please cite the following article:
https://www.biorxiv.org/content/10.1101/2023.10.02.560281
The drug sensitivity dataset is based on GDSC data, so the GDSC license also applies (its terms are largely similar).
The gene dependency dataset is based on DepMap, which is published under CC-BY 4.0; don’t forget to attribute them as well!
Initial release containing two independent datasets: one for gene dependency model training and prediction (based on DepMap Achilles data), and another for benchmarking drug sensitivity capabilities (based on GDSC2 data).
An important caveat: don’t use the drug training sets to train for the gene dependency test, or the other way around! Doing so leaks data into the holdout sets and invalidates your results.
Also, if you assemble your own training sets for these benchmark targets, make sure the drugs’ targets don’t overlap with any genes in the GEX test set, and vice versa: genes in your training set shouldn’t overlap with the targets of drugs in the DEX test set.
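This overlap check is easy to automate with plain set operations. The drug→target mapping and the gene list below are made-up examples; in practice you would derive them from the GDSC2 drug annotations and the GEX/DEX test JSONs.

```python
# Hypothetical inputs: drug -> target-gene sets from your custom training set,
# and the held-out KO genes from the GEX test split.
drug_targets = {"dabrafenib": {"BRAF"}, "trametinib": {"MAP2K1", "MAP2K2"}}
gex_test_genes = {"BRAF", "KRAS"}

# Any training drug whose targets intersect the GEX test genes leaks signal
# into the holdout set and should be dropped (the symmetric check applies to
# training genes vs. DEX test drugs' targets).
leaks = {drug for drug, tgts in drug_targets.items() if tgts & gex_test_genes}
print(sorted(leaks))
```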
Statistics:
Drug dataset:
Cell lines in TRAIN (& DEX): 555
Cell lines in CEX (& AEX): 139
Drugs in TRAIN (& CEX): 117
Drugs in DEX (& AEX): 18
Total set sizes:
| | split 0 (primary) | split 1 | split 2 |
|---|---|---|---|
| TRAIN | 46,038 | 42,896 | 42,654 |
| CEX | 14,334 | 13,460 | 13,427 |
| DEX | 9,479 | 13,579 | 13,821 |
| AEX | 2,424 | 3,396 | 3,489 |
CRISPR KO dataset:
Cell lines in TRAIN (& GEX): 803
Cell lines in CEX (& AEX): 201
Genes in TRAIN (& CEX): 1036
Genes in GEX (& AEX): 258
Genes in extended GEX: 6052
Total set sizes:
| | split 0 (primary) | split 1 | split 2 |
|---|---|---|---|
| TRAIN | 665,430 | 662,953 | 657,971 |
| CEX | 208,196 | 211,308 | 217,520 |
| GEX | 207,150 | 206,364 | 204,828 |
| AEX | 51,850 | 52,620 | 54,172 |
| EXT_GEX | 4,859,048 | 4,840,880 | 4,804,580 |
| EXT_AEX | 1,216,216 | 1,234,368 | 1,270,684 |