Input data format

Input is expected as tables in comma-separated values (CSV) format.

Full format

The full format is a single table with the columns shown below. If a field is not known, e.g. “MODEL_MUTATIONS” for a given “MODEL”, leave it blank.

MODEL	DRUG1	DRUG2	TISSUE	MODEL_MUTATIONS	SENSITIVITY_DRUG1	SENSITIVITY_DRUG2	SYNERGY_SCORE	DRUG1_TARGETS	DRUG2_TARGETS
Hx147	CHEMBL3103192	AZ13453202	lung	KRAS	40.498	41.787	0.898170	DAPK2, DAPK3, DYRK1B, DYRK4, DYRK6, FIST3, HI…
NCI-H358	AZD1208	33419-42-0	lung	KRAS	3.643	25.140	4.306900
NCI-H322	topotecan	AZ13453202	lung	TP53	24.558	20.683	2.752007
NCI-H1975	841290-80-0	AZD6738	lung	EGFR, PIK3CA, TP53	18.982	5.044	-2.048408	ADCK4, AGMX1, ALK, ANKK1, ANKRD3, ARK5, ATK, …	FRAP, FRAP1, FRAP2, MTOR, RAFT1, RAPT1
NCI-H358	957054-30-7	NSC169534	lung	KRAS, TP53	13.152	24.732	3.957847	CLK2, CLK4, DYRK6, FIST3, FRAP, FRAP1, FRAP2,…
…	…	…	…	…	…	…	…	…	…
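
As a rough illustration of how a full-format file could be read, the sketch below uses Python's standard `csv` module. The function name `parse_full_format` and the choice to represent blank fields as an empty list or `None` are assumptions made for this example, not part of the underlying software. The two sample rows are adapted from the table above; the `DRUG1_TARGETS` value of the second row is left blank here only because it is truncated in the table.

```python
import csv
import io

# Two rows adapted from the table above. Blank fields stand for unknown values.
# DRUG1_TARGETS of the second row is left blank because it is truncated above.
RAW = '''MODEL,DRUG1,DRUG2,TISSUE,MODEL_MUTATIONS,SENSITIVITY_DRUG1,SENSITIVITY_DRUG2,SYNERGY_SCORE,DRUG1_TARGETS,DRUG2_TARGETS
NCI-H358,AZD1208,33419-42-0,lung,KRAS,3.643,25.140,4.306900,,
NCI-H1975,841290-80-0,AZD6738,lung,"EGFR, PIK3CA, TP53",18.982,5.044,-2.048408,,"FRAP, FRAP1, FRAP2, MTOR, RAFT1, RAPT1"
'''

LIST_COLUMNS = ("MODEL_MUTATIONS", "DRUG1_TARGETS", "DRUG2_TARGETS")
FLOAT_COLUMNS = ("SENSITIVITY_DRUG1", "SENSITIVITY_DRUG2", "SYNERGY_SCORE")


def parse_full_format(text):
    """Parse full-format CSV text into a list of dicts (hypothetical helper)."""
    rows = []
    for record in csv.DictReader(io.StringIO(text)):
        row = dict(record)
        # Gene lists are comma-separated inside one field; blank means unknown.
        for col in LIST_COLUMNS:
            value = (record.get(col) or "").strip()
            row[col] = [gene.strip() for gene in value.split(",")] if value else []
        # Numeric fields become floats; blank means unknown.
        for col in FLOAT_COLUMNS:
            value = (record.get(col) or "").strip()
            row[col] = float(value) if value else None
        rows.append(row)
    return rows


rows = parse_full_format(RAW)
```

Fields that contain commas, such as the gene lists, have to be quoted in the raw file so that they are read as a single column.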

Alternative data format (simple)

Alternatively, the data can be specified as separate tables. For example, GDSC data are more naturally represented in this format.

> NB: In the underlying software, MODEL, DRUG and GENE identifiers are used to match and query the tables described in this section. Therefore these identifiers must be curated so that they are consistent across the tables.
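
One way to check that identifiers are consistent is to compare the identifier sets of two tables before running any analysis. The following is a minimal sketch; the helper `unmatched_identifiers` and the sample identifiers are hypothetical and only illustrate the idea.

```python
def unmatched_identifiers(reference, other):
    """Return identifiers in `other` that do not appear in `reference`."""
    return sorted(set(other) - set(reference))


# Hypothetical MODEL identifiers as they might appear in two separate tables.
tissue_models = ["5637", "8505C", "A172"]
synergy_models = ["5637", "8505c", "A172"]  # "8505c" differs only in case

missing = unmatched_identifiers(tissue_models, synergy_models)
```

Here `missing` contains the identifier that has no exact match, which points to a spelling or capitalisation difference that needs curating. The same comparison can be applied to DRUG and GENE identifiers.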

Known synergy pairs

The first three columns contain the model identifier and the two drug identifiers. The last column is the synergy score or, if a score is unavailable, a binary value (1 or 0) indicating synergy or no synergy between the pair of drugs.

MODEL	DRUG1	DRUG2	SYNERGY_SCORE
5637	cisplatin	sunitinib	1
5637	sunitinib	cisplatin	1
8505C	bortezomib	docetaxel	1
8505C	docetaxel	bortezomib	1
A172	cisplatin	temozolomide	1
…	…	…	…
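
Several pairs in the excerpt above appear in both drug orders. Whether the underlying software requires both orders is not stated here. The sketch below shows one possible way to make lookups independent of drug order when reading such a table. The function names `load_synergy` and `lookup` are assumptions made for this example.

```python
import csv
import io

# Rows taken from the excerpt above.
RAW = """MODEL,DRUG1,DRUG2,SYNERGY_SCORE
5637,cisplatin,sunitinib,1
5637,sunitinib,cisplatin,1
8505C,bortezomib,docetaxel,1
"""


def load_synergy(text):
    """Map (model, unordered drug pair) to the synergy score or binary flag."""
    scores = {}
    for record in csv.DictReader(io.StringIO(text)):
        pair = frozenset((record["DRUG1"], record["DRUG2"]))
        scores[(record["MODEL"], pair)] = float(record["SYNERGY_SCORE"])
    return scores


def lookup(scores, model, drug_a, drug_b):
    """Return the score for a pair in either order, or None if not listed."""
    return scores.get((model, frozenset((drug_a, drug_b))))


scores = load_synergy(RAW)
```

With this representation the two rows for model 5637 collapse into a single entry, and `lookup(scores, "8505C", "docetaxel", "bortezomib")` finds the pair even though only the reverse order is listed.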

Sensitivity

A table of sensitivity measures for each model and drug, e.g. LNIC50 or AUC.

MODEL	DRUG	LNIC50	AUC	…
201T	5-Fluorouracil	3.738474	0.873656	…
201T	A-83-01	5.577933	0.975815	…
201T	ACY-1215	2.008393	0.746972	…
201T	AGI-6780	2.031296	0.977958	…
201T	AICA Ribonucleotide	10.006271	0.972249	…
…	…	…	…	…

Model tissue annotation

We recommend performing the analysis by tissue. We therefore require a tissue annotation for each model.

MODEL	TISSUE
1181N1	Central Nervous System
1205Lu	Skin
1273-99	Soft Tissue
1321N1	Central Nervous System
143B	Bone
…	…
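
To illustrate how the annotation supports a per-tissue analysis, the sketch below groups the models from the excerpt above by tissue. The helper `models_by_tissue` is hypothetical and is not part of the underlying software.

```python
from collections import defaultdict

# (MODEL, TISSUE) rows taken from the excerpt above.
annotation = [
    ("1181N1", "Central Nervous System"),
    ("1205Lu", "Skin"),
    ("1273-99", "Soft Tissue"),
    ("1321N1", "Central Nervous System"),
    ("143B", "Bone"),
]


def models_by_tissue(pairs):
    """Group model identifiers by their tissue annotation."""
    groups = defaultdict(list)
    for model, tissue in pairs:
        groups[tissue].append(model)
    return dict(groups)


groups = models_by_tissue(annotation)
```

Each tissue then maps to the list of models that can be analysed together.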

Model mutations

List of mutated genes for each model. Because the gene list itself contains commas, it must be enclosed in double quotes in the raw CSV file. A truncated raw file looks like this:

    MODEL,MUTATED GENES
    201T,"IL16, TEKT4P2, TP53, NDC1, MROH7, INTS11"
    22RV1,"SLC2A13, MUC19, SLC15A4, RAB40C, RHOT2"
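
The raw excerpt above quotes each gene list because the list contains commas. As a minimal sketch, Python's standard `csv` module reads such quoted fields as a single column and adds the quotes again on writing. The function name `read_mutations` is an assumption made for this example.

```python
import csv
import io

# The truncated raw file from the excerpt above.
RAW = '''MODEL,MUTATED GENES
201T,"IL16, TEKT4P2, TP53, NDC1, MROH7, INTS11"
22RV1,"SLC2A13, MUC19, SLC15A4, RAB40C, RHOT2"
'''


def read_mutations(text):
    """Return a dict mapping each model to its list of mutated genes."""
    reader = csv.reader(io.StringIO(text))
    next(reader)  # skip the header row
    mutations = {}
    for row in reader:
        if not row:  # ignore blank lines
            continue
        model, genes = row
        mutations[model] = [gene.strip() for gene in genes.split(",")]
    return mutations


mutations = read_mutations(RAW)

# Writing a row back: csv.writer quotes the gene list because it contains commas.
buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow(["22RV1", ", ".join(mutations["22RV1"])])
written = buffer.getvalue()
```

Splitting the file on commas by hand would break each gene list into several columns, so a CSV-aware reader is the safer choice.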

MODEL	GENES
201T	IL16, TEKT4P2, TP53, NDC1, MROH7, INTS11, …
22RV1	SLC2A13, MUC19, SLC15A4, RAB40C, RHOT2, …
23132-87	MYH7, EMILIN1, HECW2, CCDC39, PPARGC1B, …
42-MG-BA	LRRC74A, PILRB, CUX1, RB1, STAG2, TP53, …
451Lu	POTEG, SPEF1, LMBRD1, BRAF, ROBO2, TP53, …
…	…

Drug targets

List of target genes for each drug.

DRUG	GENE
(5Z)-7-Oxozeaenol	MAP3K7
5-Fluorouracil	Antimetabolite (DNA & RNA)
A-443654	AKT1, AKT2, AKT3
A-770041	LCK, FYN
A-83-01	TGFB1
…	…