Description of the API

Description of the main package functionality

Functions:

`MonteCarloCrossValidation`(dfTas[, n, …])	For a continuous predictee, use clf_for_CDA=LinearRegression()
`averagePredictionPairs`(se)
`downsampleRun`(ra[, dfTas, pref, mid, rep, …])	Usage: df_sub10000 = downsampleRun(range(500, 10000, 500), dfTas=dfTas_AZ_breast, pref='', mid='AZ_breast')
`filter_synergy_pairs_list`(df, dfTa[, pref])	Get pairs that overlap with sensitivity-mutations-targets data
`fit_validate_predict`(inData, inSynergy[, …])
`getCDAdrugDistance`(df_drugs_celllines[, method])	Excludes pairs of points with missing data. method: {'pearson', 'kendall', 'spearman', 'cosine'}
`getDistanceAndSensitivityData`(df[, method, …])	Sensitivity data and CDA drug distance. Usage: dfC, dfS, Z = getDistanceAndSensitivityData(df_drug_sensitivity_GDSC1)
`makeCDAformattedData`(tissue, …[, …])
`prepDfTa`(df_drug_sensitivity, …[, lower])	Drug targeting mutated genes
`prepDfTas`(dfTa, dfKS, se_tissue_annotation)	Reshape dfTa, add tissue and synergy information
`prepareFromAZforCDA`(dir, …[, fname, …])	sensitivity_measure: {'IC50', 'LNIC50', 'H'}
`prepareFromDCforCDA`(df_DC, study, tissue, …)
`prepareFromDCfull`(df_temp[, sample, …])
`sample_train_predicted`(df_G2, random_state)
`split_train_test_validate_predict`(df[, …])
`testCase`(*args, **kwargs)
`trainOneTestAnother`(df_one, df_another[, n, …])
`tryExcept`(func)

getCDAdrugDistance(df_drugs_celllines, method='pearson')

Excludes pairs of points with missing data.

method: {'pearson', 'kendall', 'spearman', 'cosine'}
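The pairwise exclusion of missing data can be illustrated with pandas, whose `DataFrame.corr` skips missing values separately for each pair of columns. This is only a sketch, not the package's implementation: the input orientation (cell lines as rows, drugs as columns), the example values, the helper names, and the conversion `1 - similarity` are all assumptions.

```python
import numpy as np
import pandas as pd

# Hypothetical sensitivity matrix: rows are cell lines, columns are drugs.
df_drugs_celllines = pd.DataFrame(
    {
        "drugA": [1.0, 2.0, 3.0, 4.0],
        "drugB": [2.0, 4.0, np.nan, 8.0],
        "drugC": [4.0, 3.0, 2.0, np.nan],
    },
    index=["line1", "line2", "line3", "line4"],
)


def cosine_similarity_pairwise(x, y):
    # pandas passes only the rows where both columns have a value.
    return float(np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y)))


def drug_distance_sketch(df, method="pearson"):
    # 'pearson', 'kendall' and 'spearman' are built into DataFrame.corr;
    # 'cosine' is supplied here as a callable (an assumption of this sketch).
    corr_method = cosine_similarity_pairwise if method == "cosine" else method
    similarity = df.corr(method=corr_method)
    # Assumed conversion from similarity to distance.
    return 1.0 - similarity


dist_pearson = drug_distance_sketch(df_drugs_celllines, method="pearson")
dist_cosine = drug_distance_sketch(df_drugs_celllines, method="cosine")
```

In this toy matrix, drugA and drugB are compared on the three cell lines where both were measured, so the missing value for line3 does not affect their distance.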

getDistanceAndSensitivityData(df, method='pearson', sensitivity_metric='LNIC50', drug='DRUG', lower=True)

Sensitivity data and CDA drug distance

Usage: dfC, dfS, Z = getDistanceAndSensitivityData(df_drug_sensitivity_GDSC1)

prepDfTa(df_drug_sensitivity, se_models_mutations, se_drug_targets, dname, lower=True)

Drug targeting mutated genes

filter_synergy_pairs_list(df, dfTa, pref='Synergy pairs')

Get pairs that overlap with sensitivity-mutations-targets data

Usage: dfKS = filter_synergy_pairs_list(df_drug_synergy_Narayan, dfTa)

prepDfTas(dfTa, dfKS, se_tissue_annotation, tissue=None)

Reshape dfTa, add tissue and synergy information

split_train_test_validate_predict(df, factor=0.5, random_state=None, stratified=False)

makeCDAformattedData(tissue, se_drug_synergy, se_tissue_annotation, df_drug_sensitivity, se_models_mutations, se_drug_targets, dataset, sensitivityMethod='cosine', sensitivity_metric='LNIC50', returnMore=False, lower=True)

prepareFromDCforCDA(df_DC, study, tissue, se_models_mutations, sample=None, random_state=None, sensitivity_measure='AUC', synergy='SYNERGY_ZIP', returnMore=False)

prepareFromDCfull(df_temp, sample=None, random_state=None, returnMore=False)

prepareFromAZforCDA(dir, se_tissue_annotation, se_models_mutations, se_drug_targets, fname='oi_combinations_synergy_scores_final.txt', sensitivity_measure='LNIC50')

sensitivity_measure: {'IC50', 'LNIC50', 'H'}

trainOneTestAnother(df_one, df_another, n=10, n_sample=0.5)

averagePredictionPairs(se)

fit_validate_predict(inData, inSynergy, extData=None, extSynergy=None, cv=None, max_iter=10000, clf=None, **kwargs)

sample_train_predicted(df_G2, random_state, sample=100)

tryExcept(func)

testCase(*args, **kwargs)

MonteCarloCrossValidation(dfTas, n=10, sample_non_synergy=False, sample_non_synergy_size=100, clf_for_CDA=LogisticRegression(), deidentify=False, factor=0.6666666666666666, stratified=False)

For a continuous predictee, use clf_for_CDA=LinearRegression()
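As a general illustration of Monte Carlo cross-validation (repeated random train/test splits, each scored separately), the following sketch uses scikit-learn's `ShuffleSplit`. It is not the package's implementation: the helper name, the synthetic data, and the reading of `n` as the number of repetitions and `factor` as the training fraction are assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import ShuffleSplit


def monte_carlo_cv_sketch(X, y, clf, n=10, factor=2 / 3, random_state=0):
    # Each repetition draws a fresh random split; `factor` is assumed to be
    # the fraction of samples used for training.
    splitter = ShuffleSplit(n_splits=n, train_size=factor, random_state=random_state)
    scores = []
    for train_idx, test_idx in splitter.split(X):
        clf.fit(X[train_idx], y[train_idx])
        # Accuracy for classifiers, R^2 for regressors.
        scores.append(clf.score(X[test_idx], y[test_idx]))
    return np.array(scores)


# Synthetic binary labels standing in for synergy calls (hypothetical data).
rng = np.random.default_rng(0)
X = rng.normal(size=(120, 3))
y = (X[:, 0] + 0.1 * rng.normal(size=120) > 0).astype(int)

scores = monte_carlo_cv_sketch(X, y, LogisticRegression(), n=10)
```

For a continuous target, the same loop would take `LinearRegression()` in place of `LogisticRegression()`, in which case `score` reports R² rather than accuracy.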

downsampleRun(ra, dfTas=None, pref='temp', mid='temp', rep=10, basedir='output/', cv=None, stratified=False)

Usage: df_sub10000 = downsampleRun(range(500, 10000, 500), dfTas=dfTas_AZ_breast, pref='', mid='AZ_breast')

Description of the auxiliary package functionality

Functions:

`centerDf`(df)	Similar to scikit-learn's StandardScaler: StandardScaler().fit(df).transform(df)
`convertProteinNamesToGenes`(se, dfgp)
`encodeNames`(df)	A.k.a. one-hot encoding.
`fetchMySQL`(s[, host, database, user, password])
`getGeneToProteinNameAssociation`([host, …])	Tables in uniProt: see the full list in the function description below.
`ro_normalized`(df)	Takes a dataframe where index has 3 levels (MODEL, DRUG1, DRUG2), and there are two columns, first with ground truth scores, second with predicted scores.

encodeNames(df)

A.k.a. one-hot encoding. ['MODEL', 'DRUG1', 'DRUG2'] should be present either in the index levels or in the columns. The idea comes from the second-best performing method in the AstraZeneca DREAM challenge for drug synergy prediction.
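One-hot encoding of the three name fields can be sketched with `pandas.get_dummies`. The helper name, the `SYNERGY` column, and the example values below are hypothetical, and the package's own output layout may differ.

```python
import pandas as pd

# Hypothetical input with the three name fields stored as columns.
df = pd.DataFrame(
    {
        "MODEL": ["m1", "m1", "m2"],
        "DRUG1": ["a", "b", "a"],
        "DRUG2": ["b", "c", "c"],
        "SYNERGY": [1, 0, 1],
    }
)


def one_hot_names_sketch(df, columns=("MODEL", "DRUG1", "DRUG2")):
    # Move the name fields out of the index if that is where they are stored.
    in_index = [c for c in columns if c in df.index.names]
    if in_index:
        df = df.reset_index(level=in_index)
    # One indicator column per distinct value of each name field.
    return pd.get_dummies(df, columns=list(columns), dtype=int)


encoded = one_hot_names_sketch(df)
encoded_from_index = one_hot_names_sketch(df.set_index(["MODEL", "DRUG1", "DRUG2"]))
```

Both calls yield the same indicator columns, whether the names start out as columns or as index levels.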

ro_normalized(df)

Takes a dataframe whose index has 3 levels (MODEL, DRUG1, DRUG2) and which has two columns: the first with ground truth scores, the second with predicted scores.

The output is the average Pearson correlation coefficient across drug pairs, with each pair weighted by the number of models in which it was measured.

See definitions in: https://www.synapse.org/#!Synapse:syn4231880/wiki/235660
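A minimal sketch of such a weighted average is given below, taking the description above literally and weighting each drug pair by its number of models. The helper name and the example data are hypothetical, and the definition at the link above should be consulted for the exact weighting used in the challenge.

```python
import numpy as np
import pandas as pd


def weighted_pair_correlation_sketch(df):
    # First column: ground truth scores; second column: predicted scores.
    truth_col, pred_col = df.columns[:2]
    correlations, weights = [], []
    for _, group in df.groupby(level=["DRUG1", "DRUG2"]):
        if len(group) < 2:
            continue  # a correlation needs at least two models
        r = group[truth_col].corr(group[pred_col])  # Pearson by default
        if np.isnan(r):
            continue  # e.g. constant scores within the pair
        correlations.append(r)
        weights.append(len(group))
    if not correlations:
        return float("nan")
    return float(np.average(correlations, weights=weights))


index = pd.MultiIndex.from_tuples(
    [("m1", "a", "b"), ("m2", "a", "b"), ("m3", "a", "b"),
     ("m1", "a", "c"), ("m2", "a", "c")],
    names=["MODEL", "DRUG1", "DRUG2"],
)
scores = pd.DataFrame(
    {"truth": [1.0, 2.0, 3.0, 1.0, 2.0], "predicted": [2.0, 4.0, 6.0, 2.0, 1.0]},
    index=index,
)
ro = weighted_pair_correlation_sketch(scores)
```

Here the pair (a, b) correlates perfectly across three models and the pair (a, c) is perfectly anti-correlated across two, so the weighted average is (3·1 + 2·(−1)) / 5 = 0.2.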

fetchMySQL(s, host='host', database='database', user='user', password='password')

getGeneToProteinNameAssociation(host='genome-mysql.cse.ucsc.edu', database='uniProt', user='genomep', password='password')

Tables in uniProt: ['accToKeyword', 'accToTaxon', 'author', 'bigFiles', 'citation', 'citationRc', 'citationRp', 'comment', 'commentType', 'commentVal', 'commonName', 'description', 'displayId', 'extDb', 'extDbRef', 'feature', 'featureClass', 'featureId', 'featureType', 'gene', 'geneLogic', 'history', 'info', 'keyword', 'organelle', 'otherAcc', 'pathogenHost', 'protein', 'proteinEvidence', 'proteinEvidenceType', 'rcType', 'rcVal', 'reference', 'referenceAuthors', 'tableDescriptions', 'tableList', 'taxon', 'varAcc', 'varProtein']

To see all columns in a table: fetchMySQL("SHOW COLUMNS FROM my_table;")

Usage: assoc = getGeneToProteinNameAssociation()

convertProteinNamesToGenes(se, dfgp)

centerDf(df)

Similar to scikit-learn's StandardScaler: StandardScaler().fit(df).transform(df)
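For reference, the `StandardScaler` call mentioned above is equivalent to subtracting each column's mean and dividing by its population standard deviation. The sketch below shows that equivalence on hypothetical data. It does not show what `centerDf` itself computes, and whether `centerDf` also scales to unit variance is not stated here.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical example data.
df = pd.DataFrame({"x": [1.0, 2.0, 3.0], "y": [10.0, 20.0, 60.0]})

# StandardScaler returns a NumPy array, so the labels are restored by hand.
scaled = pd.DataFrame(
    StandardScaler().fit(df).transform(df), index=df.index, columns=df.columns
)

# The same result computed directly: StandardScaler uses the population
# standard deviation (ddof=0), not the sample one.
manual = (df - df.mean()) / df.std(ddof=0)
```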