Input data format

Input is expected as comma-separated values (CSV) files.

Full format

The full format is a single table with the columns shown below. If a field is unknown, e.g. “MODEL_MUTATIONS” for a given “MODEL”, leave it blank.

| MODEL | DRUG1 | DRUG2 | TISSUE | MODEL_MUTATIONS | SENSITIVITY_DRUG1 | SENSITIVITY_DRUG2 | SYNERGY_SCORE | DRUG1_TARGETS | DRUG2_TARGETS |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Hx147 | CHEMBL3103192 | AZ13453202 | lung | KRAS | 40.498 | 41.787 | 0.898170 | DAPK2, DAPK3, DYRK1B, DYRK4, DYRK6, FIST3, HI… | |
| NCI-H358 | AZD1208 | 33419-42-0 | lung | KRAS | 3.643 | 25.140 | 4.306900 | | |
| NCI-H322 | topotecan | AZ13453202 | lung | TP53 | 24.558 | 20.683 | 2.752007 | | |
| NCI-H1975 | 841290-80-0 | AZD6738 | lung | EGFR, PIK3CA, TP53 | 18.982 | 5.044 | -2.048408 | ADCK4, AGMX1, ALK, ANKK1, ANKRD3, ARK5, ATK, … | FRAP, FRAP1, FRAP2, MTOR, RAFT1, RAPT1 |
| NCI-H358 | 957054-30-7 | NSC169534 | lung | KRAS, TP53 | 13.152 | 24.732 | 3.957847 | CLK2, CLK4, DYRK6, FIST3, FRAP, FRAP1, FRAP2,… | |
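
As an illustration of how blank fields in the full format can be handled, here is a minimal sketch using only the Python standard library. The function name `parse_full_format`, the choice to turn blank list fields into empty lists and blank numeric fields into `None`, and the shortened DRUG1 target list are assumptions made for this example; they are not part of the software described here.

```python
import csv
import io

# Two-row excerpt in the full format. The DRUG1 target list of the first row
# is shortened for illustration; the second row leaves both target columns blank.
FULL_FORMAT_CSV = '''MODEL,DRUG1,DRUG2,TISSUE,MODEL_MUTATIONS,SENSITIVITY_DRUG1,SENSITIVITY_DRUG2,SYNERGY_SCORE,DRUG1_TARGETS,DRUG2_TARGETS
NCI-H1975,841290-80-0,AZD6738,lung,"EGFR, PIK3CA, TP53",18.982,5.044,-2.048408,"ADCK4, AGMX1, ALK","FRAP, FRAP1, FRAP2, MTOR, RAFT1, RAPT1"
NCI-H358,AZD1208,33419-42-0,lung,KRAS,3.643,25.140,4.306900,,
'''

LIST_COLUMNS = ("MODEL_MUTATIONS", "DRUG1_TARGETS", "DRUG2_TARGETS")
FLOAT_COLUMNS = ("SENSITIVITY_DRUG1", "SENSITIVITY_DRUG2", "SYNERGY_SCORE")


def parse_full_format(handle):
    """Read full-format rows; blank fields become [] (lists) or None (numbers)."""
    rows = []
    for raw in csv.DictReader(handle):
        row = dict(raw)
        for col in LIST_COLUMNS:
            value = (raw.get(col) or "").strip()
            # Split the comma-separated gene list and drop empty entries.
            row[col] = [g.strip() for g in value.split(",") if g.strip()]
        for col in FLOAT_COLUMNS:
            value = (raw.get(col) or "").strip()
            row[col] = float(value) if value else None
        rows.append(row)
    return rows


rows = parse_full_format(io.StringIO(FULL_FORMAT_CSV))
print(rows[1]["DRUG1_TARGETS"])  # [] because the field was left blank
```

Fields that contain commas, such as the gene lists, have to be quoted in the CSV file so that they stay in a single column.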

Alternative data format (simple)

Alternatively, the data can be supplied as separate tables. For example, GDSC data fits this format more naturally.

> NB: In the underlying software, MODEL, DRUG and GENE identifiers are used to match and query the tables described below. These identifiers must therefore be curated so that they are consistent across the tables.
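
One way to check identifier consistency before running an analysis is to compare the identifier sets of two tables. The sketch below assumes exact, case-sensitive string matching, which may differ from how the underlying software compares identifiers. The helper `unmatched_identifiers` and the toy rows are hypothetical.

```python
def unmatched_identifiers(reference_ids, table_ids):
    """Return the identifiers in table_ids that do not occur in reference_ids."""
    return sorted(set(table_ids) - set(reference_ids))


# Hypothetical toy rows: "5-fluorouracil" differs in case from the
# spelling used in the sensitivity table.
synergy_rows = [
    {"MODEL": "201T", "DRUG1": "5-fluorouracil", "DRUG2": "A-83-01"},
]
sensitivity_rows = [
    {"MODEL": "201T", "DRUG": "5-Fluorouracil"},
    {"MODEL": "201T", "DRUG": "A-83-01"},
]

# Collect drug identifiers from both tables and report the mismatches.
synergy_drugs = [row[col] for row in synergy_rows for col in ("DRUG1", "DRUG2")]
known_drugs = [row["DRUG"] for row in sensitivity_rows]
missing = unmatched_identifiers(known_drugs, synergy_drugs)
print(missing)  # ['5-fluorouracil']
```

The same comparison can be applied to MODEL identifiers (for example, synergy table against tissue annotation) and to GENE identifiers (mutations against drug targets).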

Known synergy pairs

The first three columns contain the model identifier and the two drug identifiers. The last column is the synergy score or, if unavailable, a binary value (1/0) indicating synergy or no synergy between the pair of drugs.

| MODEL | DRUG1 | DRUG2 | SYNERGY_SCORE |
| --- | --- | --- | --- |
| 5637 | cisplatin | sunitinib | 1 |
| 5637 | sunitinib | cisplatin | 1 |
| 8505C | bortezomib | docetaxel | 1 |
| 8505C | docetaxel | bortezomib | 1 |
| A172 | cisplatin | temozolomide | 1 |

Sensitivity

A table of sensitivity measures for each model and drug, for example LNIC50 or AUC.

| MODEL | DRUG | LNIC50 | AUC |
| --- | --- | --- | --- |
| 201T | 5-Fluorouracil | 3.738474 | 0.873656 |
| 201T | A-83-01 | 5.577933 | 0.975815 |
| 201T | ACY-1215 | 2.008393 | 0.746972 |
| 201T | AGI-6780 | 2.031296 | 0.977958 |
| 201T | AICA Ribonucleotide | 10.006271 | 0.972249 |

Model tissue annotation

We recommend performing the analysis by tissue, so a tissue annotation is required for each model.

| MODEL | TISSUE |
| --- | --- |
| 1181N1 | Central Nervous System |
| 1205Lu | Skin |
| 1273-99 | Soft Tissue |
| 1321N1 | Central Nervous System |
| 143B | Bone |

Model mutations

A list of mutated genes for each model. In the raw CSV file (truncated here), each gene list is quoted because it contains commas:

    MODEL,MUTATED GENES
    201T,"IL16, TEKT4P2, TP53, NDC1, MROH7, INTS11"
    22RV1,"SLC2A13, MUC19, SLC15A4, RAB40C, RHOT2"
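
Because the gene lists are quoted, a standard CSV parser keeps each list in a single field. A minimal sketch with Python's `csv` module follows; the function name `read_model_mutations` and the decision to return a dictionary of gene lists are assumptions made for illustration.

```python
import csv
import io

# The truncated raw CSV shown above, one record per line.
RAW_MUTATIONS_CSV = '''MODEL,MUTATED GENES
201T,"IL16, TEKT4P2, TP53, NDC1, MROH7, INTS11"
22RV1,"SLC2A13, MUC19, SLC15A4, RAB40C, RHOT2"
'''


def read_model_mutations(handle):
    """Map each model identifier to its list of mutated genes."""
    reader = csv.reader(handle)
    next(reader)  # skip the header row
    mutations = {}
    for row in reader:
        if not row:
            continue  # ignore blank lines
        model, genes = row[0], row[1]
        # The quoted field arrives as one string; split it into gene names.
        mutations[model] = [g.strip() for g in genes.split(",") if g.strip()]
    return mutations


mutations = read_model_mutations(io.StringIO(RAW_MUTATIONS_CSV))
print(mutations["22RV1"])
```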

| MODEL | GENES |
| --- | --- |
| 201T | IL16, TEKT4P2, TP53, NDC1, MROH7, INTS11, … |
| 22RV1 | SLC2A13, MUC19, SLC15A4, RAB40C, RHOT2, … |
| 23132-87 | MYH7, EMILIN1, HECW2, CCDC39, PPARGC1B, … |
| 42-MG-BA | LRRC74A, PILRB, CUX1, RB1, STAG2, TP53, … |
| 451Lu | POTEG, SPEF1, LMBRD1, BRAF, ROBO2, TP53, … |

Drug targets

A list of target genes for each drug.

| DRUG | GENE |
| --- | --- |
| (5Z)-7-Oxozeaenol | MAP3K7 |
| 5-Fluorouracil | Antimetabolite (DNA & RNA) |
| A-443654 | AKT1, AKT2, AKT3 |
| A-770041 | LCK, FYN |
| A-83-01 | TGFB1 |
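
To show how the columns of the separate tables correspond to the full format, the sketch below joins them on MODEL and DRUG identifiers. It is not the procedure used by the underlying software. The function `to_full_format`, the use of LNIC50 as the sensitivity measure, the synergy pair with its score, and the tissue value are all assumptions or hypothetical toy data.

```python
FULL_COLUMNS = [
    "MODEL", "DRUG1", "DRUG2", "TISSUE", "MODEL_MUTATIONS",
    "SENSITIVITY_DRUG1", "SENSITIVITY_DRUG2", "SYNERGY_SCORE",
    "DRUG1_TARGETS", "DRUG2_TARGETS",
]


def to_full_format(synergy, sensitivity, tissue, mutations, targets, measure="LNIC50"):
    """Join the separate tables into full-format rows; unknown fields stay blank."""
    def sens(model, drug):
        return sensitivity.get((model, drug), {}).get(measure, "")

    rows = []
    for pair in synergy:
        model, drug1, drug2 = pair["MODEL"], pair["DRUG1"], pair["DRUG2"]
        rows.append({
            "MODEL": model,
            "DRUG1": drug1,
            "DRUG2": drug2,
            "TISSUE": tissue.get(model, ""),
            "MODEL_MUTATIONS": mutations.get(model, ""),
            "SENSITIVITY_DRUG1": sens(model, drug1),
            "SENSITIVITY_DRUG2": sens(model, drug2),
            "SYNERGY_SCORE": pair["SYNERGY_SCORE"],
            "DRUG1_TARGETS": targets.get(drug1, ""),
            "DRUG2_TARGETS": targets.get(drug2, ""),
        })
    return rows


# Hypothetical synergy pair and score, used only to exercise the join.
synergy = [{"MODEL": "201T", "DRUG1": "5-Fluorouracil", "DRUG2": "A-83-01", "SYNERGY_SCORE": 1}]
sensitivity = {
    ("201T", "5-Fluorouracil"): {"LNIC50": 3.738474, "AUC": 0.873656},
    ("201T", "A-83-01"): {"LNIC50": 5.577933, "AUC": 0.975815},
}
tissue = {"201T": "Lung"}  # hypothetical annotation for this toy example
mutations = {"201T": "IL16, TEKT4P2, TP53, NDC1, MROH7, INTS11"}
targets = {"A-83-01": "TGFB1"}  # no target entry for 5-Fluorouracil here

full = to_full_format(synergy, sensitivity, tissue, mutations, targets)
print(full[0]["DRUG1_TARGETS"] == "")  # True: the unknown field is left blank
```

The join only works when identifiers are spelled identically in every table, which is why they need to be curated for consistency.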