LOBICO

Models

LOBICO finds the logic combinations of binary input features (e.g. mutations, CNAs) that best explain the response of cancer cell lines to anticancer drugs. These logic models are described in disjunctive normal form (DNF), a standard notation in which every logic function can be expressed. The DNF has two parameters: K, the number of disjuncts, and M, the number of terms per disjunct. LOBICO is applied with eight different parameter settings, ranging from simple single-predictor models (K=1, M=1) to more complex multi-predictor models (K>1 or M>1). Tables with the logic symbols and the eight possible models are given below.

Logic Symbol Name    Logic Symbol
AND                  &
OR                   |
NOT                  ¬

Model complexity               Drug name     Drug target      Optimal logic model
K=1,M=1 (Single predictor)     PLX4720       RAF              BRAF
K=1,M=2 (2-input AND)          Paclitaxel    Microtubules     CDKN2A & TP53
K=1,M=3 (3-input AND)          Cytarabine    DNA synthesis    CDKN2A & ¬EGFR & ¬SMAD4
K=1,M=4 (4-input AND)          KIN001-102    Akt1             ¬APC & ¬BRAF & ¬EGFR & ¬KRAS
K=2,M=1 (2-input OR)           BEZ235        PI3K,MTORC       PIK3CA | PTEN
K=3,M=1 (3-input OR)           AZD6244       MEK 1/2          BRAF | KRAS | NRAS
K=4,M=1 (4-input OR)           Afatinib      EGFR, ERBB2      EGFR | ERBB2 | JAK2 | SMAD4
K=2,M=2 (2-by-2)               JQ12          HDAC             (CDKN2A & ¬SMAD4) | (¬KRAS & ¬TP53)
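
To make the DNF notation concrete, the following is a minimal Python sketch of how a logic model from the table could be represented and evaluated on one cell line. It is not LOBICO's own code: the types `Literal`, `Disjunct` and `Model`, the functions `evaluate_dnf` and `format_dnf`, and the example cell lines are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: a literal is (feature name, negated?),
# a disjunct is an AND of literals, and a model is an OR of disjuncts.
Literal = Tuple[str, bool]
Disjunct = List[Literal]
Model = List[Disjunct]


def evaluate_dnf(model: Model, features: Dict[str, bool]) -> bool:
    """Return True if at least one disjunct has all of its literals satisfied."""
    return any(
        all(features[name] != negated for name, negated in disjunct)
        for disjunct in model
    )


def format_dnf(model: Model) -> str:
    """Render a model with the symbols &, | and ¬ from the table."""
    parts = []
    for disjunct in model:
        text = " & ".join(("¬" if negated else "") + name for name, negated in disjunct)
        # Parenthesize multi-literal disjuncts only when they are OR-ed with others.
        if len(model) > 1 and len(disjunct) > 1:
            text = f"({text})"
        parts.append(text)
    return " | ".join(parts)


# The K=2, M=2 model from the table: (CDKN2A & ¬SMAD4) | (¬KRAS & ¬TP53)
two_by_two: Model = [
    [("CDKN2A", False), ("SMAD4", True)],
    [("KRAS", True), ("TP53", True)],
]

# The K=3, M=1 model from the table: BRAF | KRAS | NRAS
three_or: Model = [[("BRAF", False)], [("KRAS", False)], [("NRAS", False)]]

# Made-up cell lines (binary feature vectors), for illustration only.
line_a = {"CDKN2A": True, "SMAD4": False, "KRAS": True, "TP53": True}
line_b = {"CDKN2A": True, "SMAD4": True, "KRAS": True, "TP53": False}

hit_a = evaluate_dnf(two_by_two, line_a)  # first disjunct is satisfied
hit_b = evaluate_dnf(two_by_two, line_b)  # neither disjunct is satisfied
```

In this representation K is the length of the outer list and M the length of each inner list, so the eight settings in the table differ only in the shape of the nested list.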

Plots

We describe here the details and specifications of the LOBICO plots.

Input (cell lines) histogram, tissue & mutation heatmaps

The input plot has three parts: (i) the drug sensitivity histogram for all cell lines, (ii) the tissue type heatmap for the sensitive cell lines, and (iii) the input (e.g. mutations) heatmap for the sensitive cell lines.

[Figure: cell line drug sensitivity histogram]

The drug sensitivity histogram shows how many cell lines (x axis) have a given sensitivity (natural log of the IC50, y axis) for that particular drug. The legend gives the number of sensitive and resistant cell lines after binarization of the IC50. Hover over a bar of the histogram to see its associated values (resistant or sensitive, the number of cell lines and the log IC50 value).
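
As a rough illustration of the binarization and the per-bar counts, here is a small Python sketch. How the cutoff is chosen is not described here, so the cutoff, the bin width, the rule that values strictly below the cutoff count as sensitive, and the function names `binarize` and `histogram` are all assumptions.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def binarize(log_ic50: List[float], cutoff: float) -> List[str]:
    """Label each cell line; a lower IC50 means a more sensitive cell line."""
    # Assumption: values strictly below the cutoff are called sensitive.
    return ["sensitive" if value < cutoff else "resistant" for value in log_ic50]


def histogram(log_ic50: List[float], cutoff: float,
              bin_width: float = 1.0) -> Dict[Tuple[float, str], int]:
    """Count cell lines per (bin start, class) pair, one entry per bar."""
    counts: Counter = Counter()
    for value, label in zip(log_ic50, binarize(log_ic50, cutoff)):
        bin_start = math.floor(value / bin_width) * bin_width
        counts[(bin_start, label)] += 1
    return dict(counts)


# Made-up natural-log IC50 values and cutoff, for illustration only.
values = [-2.3, -1.1, -0.4, 0.2, 0.9, 1.7, 2.5]
legend = Counter(binarize(values, cutoff=0.0))  # counts shown in the legend
bars = histogram(values, cutoff=0.0)            # values shown on hover
```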

[Figure: cell line tissue type heatmap]

For each of the sensitive cell lines, the (binary) tissue type heatmap marks its tissue of origin in light red. Hovering over a light red block shows an information bubble with the name and tissue type of the cell line.

[Figure: cell line tissue type odds ratio heatmap]

Below the binary heatmap is a quantitative (green to red) heatmap indicating whether the cell lines from a given tissue type are enriched within the group of sensitive cell lines, as measured by the log2 of the odds ratio (Fisher’s exact test). Red indicates strong enrichment, while green indicates strong depletion. Hover over a bar to see how many sensitive and resistant cell lines in the full set of cell lines come from this tissue type, along with the associated percentages.

[Figure: cell line mutation status heatmap]

For all sensitive cell lines, the (binary) input heatmap indicates whether a given cell line is associated with that input feature, i.e. harbors that genetic alteration (dark red), or not (white). Hovering over a colored block shows the names of the cell line and the input feature.

[Figure: cell line mutation status odds ratio heatmap]

Below the binary heatmap is a quantitative (green to red) heatmap indicating whether the cell lines associated with a specific input feature (e.g. harboring a mutation in a given gene) are enriched within the group of sensitive cell lines, as measured by the log2 of the odds ratio (Fisher’s exact test). Red indicates strong enrichment, while green indicates strong depletion. Hover over a bar to see how many sensitive and resistant cell lines in the full set of cell lines are associated with that feature, along with the associated percentages.
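
The enrichment measure used in both quantitative heatmaps can be sketched from a 2x2 table of sensitive/resistant versus with/without the feature (or tissue type). The Python below is a minimal sketch, not the site's implementation: it uses the sample odds ratio (which estimator the site uses is not stated), returns infinite values when a cell is zero, and computes a two-sided Fisher's exact p-value from the hypergeometric distribution. The function names and the example counts are assumptions.

```python
import math


def log2_odds_ratio(sens_with: int, sens_without: int,
                    res_with: int, res_without: int) -> float:
    """log2 of the sample odds ratio; positive means enriched among sensitive lines."""
    numerator = sens_with * res_without
    denominator = sens_without * res_with
    if numerator == 0 and denominator == 0:
        return math.nan
    if denominator == 0:
        return math.inf
    if numerator == 0:
        return -math.inf
    return math.log2(numerator / denominator)


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact p-value for the table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def prob(x: int) -> float:
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return math.comb(row1, x) * math.comb(row2, col1 - x) / math.comb(total, col1)

    observed = prob(a)
    low, high = max(0, col1 - row2), min(row1, col1)
    # Sum all tables that are at most as likely as the observed one.
    p_value = sum(p for p in map(prob, range(low, high + 1))
                  if p <= observed * (1 + 1e-7))
    return min(1.0, p_value)


# Made-up counts: 8 sensitive and 1 resistant line carry the feature,
# 2 sensitive and 5 resistant lines do not.
enrichment = log2_odds_ratio(8, 2, 1, 5)
p_value = fisher_exact_two_sided(8, 2, 1, 5)
```

A positive `enrichment` would correspond to the red end of the color scale and a negative one to the green end.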

Models Overall & Highlight Plots

[Figure: model overall bar plots]

This modified trellis bar plot shows all models (x axis; best model colored in green) against performance statistics (y axis). The upper row shows the counts of the confusion matrix (shades of green for the true positives and negatives, shades of red for the false positives and negatives). The lower row shows the specificity (true negative rate, TN/(FP+TN)), precision (positive predictive value, TP/(TP+FP)), recall (sensitivity or true positive rate, TP/(TP+FN)) and cross-validation error. Hovering over a bar shows the model name and value. Clicking on a bar highlights the associated model in the right-hand frame.
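
The three rate statistics follow directly from the confusion matrix counts. The Python below is a minimal sketch with a hypothetical `Confusion` class and made-up labels; the cross-validation error depends on how the folds are set up, which is not described here, so it is left out.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Confusion:
    """Confusion matrix counts, with 'sensitive' taken as the positive class."""
    tp: int
    fp: int
    tn: int
    fn: int

    @property
    def specificity(self) -> float:
        # True negative rate: TN / (FP + TN)
        return self.tn / (self.fp + self.tn)

    @property
    def precision(self) -> float:
        # Positive predictive value: TP / (TP + FP)
        return self.tp / (self.tp + self.fp)

    @property
    def recall(self) -> float:
        # Sensitivity or true positive rate: TP / (TP + FN)
        return self.tp / (self.tp + self.fn)


def confusion(actual: Sequence[bool], predicted: Sequence[bool]) -> Confusion:
    """Count the four outcomes from binarized responses and model predictions."""
    pairs = list(zip(actual, predicted))
    return Confusion(
        tp=sum(1 for a, p in pairs if a and p),
        fp=sum(1 for a, p in pairs if not a and p),
        tn=sum(1 for a, p in pairs if not a and not p),
        fn=sum(1 for a, p in pairs if a and not p),
    )


# Made-up example: True means sensitive.
actual = [True, True, True, True, False, False, False, False, False, False]
predicted = [True, True, True, False, True, True, False, False, False, False]
stats = confusion(actual, predicted)
```

Each property divides by a count that can be zero (for example, precision when a model predicts no sensitive cell lines), so real code would need to guard those cases.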

[Figure: model highlight plots]

The model highlight plot has three sections: (1) a circuit diagram of the model, (2) the confusion matrix counts and model performance stats, and (3) a boxplot of the drug sensitivity (natural log of IC50) for each prediction group (resistant and sensitive cell lines).

[Figure: model highlight boxplot]

The boxplot shows the natural log IC50 values (y axis) of the input cell lines in each predicted group (resistant and sensitive; x axis). The black dotted line marks the cutoff value used to binarize the IC50 values. Clicking on a box expands it into a scatter plot, and double-clicking restores the original plot. Hovering over a point in the scatter plot shows the cell line name and its log IC50 value.

Model BoxPlots

[Figure: model boxplots]

This section shows, for every model, a boxplot of the natural log IC50 values (y axis) of the input cell lines in each predicted group (resistant and sensitive; x axis). The model name is shown above each boxplot (the best model in green). The black dotted line marks the cutoff value used to binarize the IC50 values. Clicking on a box expands it into a scatter plot, and double-clicking restores the original plot. Hovering over a point in the scatter plot shows the cell line name and its log IC50 value.

Datasets

GDSC Dataset

We describe here the details and specifications of the GDSC dataset.

Input Samples: tissue, cancer types & input features

The input samples used to generate the logic models with the LOBICO algorithm are cell lines from the GDSC dataset (the complete GDSC dataset contains primary tumors and cell lines). These cell lines originate from 21 tissue types. An overview of the tissue types and cancer types (TCGA labels) is given here.

You can download the complete sample landscape for the GDSC dataset here.

The main input features for the cell lines in this dataset:

  • Cancer Genes (CGs) with point mutations, small insertions or deletions (e.g. SMAD4, ATM)
  • Recurrently aberrant copy number segments (RACSs), named after the cancer gene or genes they contain or after their locus / chromosome location. Amplifications are represented as a(g1,g2,..) (e.g. a(CCND1,CTTN)) and deletions as d(g1,g2) (e.g. d(FAT1), d18q22.1).
  • Activation signatures of signaling pathways containing well-known mutational activations, copy number alterations and other signaling aberrations (e.g. TNFa-UP, TGFB-DOWN). Click here to download the pathway activity scores across cell lines.
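
The naming conventions in the list above can be told apart mechanically. The Python below is a hedged sketch of such a parser, based only on the examples given (SMAD4, a(CCND1,CTTN), d(FAT1), d18q22.1, TNFa-UP, TGFB-DOWN). The `Feature` type, the `parse_feature` function and the regular expressions are assumptions and may not cover every feature name in the dataset.

```python
import re
from typing import NamedTuple, Tuple


class Feature(NamedTuple):
    kind: str                 # "mutation", "amplification", "deletion" or "pathway"
    targets: Tuple[str, ...]  # gene names, a locus, or a pathway name
    direction: str = ""       # "UP" or "DOWN" for pathway signatures


_CNA_KIND = {"a": "amplification", "d": "deletion"}
# a(g1,g2,..) or d(g1,g2): copy number segment named by its genes.
_CNA_GENES = re.compile(r"^([ad])\(([^()]+)\)$")
# d18q22.1: copy number segment named by its locus (assumed pattern).
_CNA_LOCUS = re.compile(r"^([ad])((?:\d{1,2}|[XY])[pq][\d.]*)$")
# TNFa-UP, TGFB-DOWN: pathway activation signature.
_PATHWAY = re.compile(r"^(.+)-(UP|DOWN)$")


def parse_feature(name: str) -> Feature:
    """Classify one input feature name; anything unmatched is treated as a mutated gene."""
    name = name.strip()
    match = _CNA_GENES.match(name)
    if match:
        genes = tuple(gene.strip() for gene in match.group(2).split(","))
        return Feature(_CNA_KIND[match.group(1)], genes)
    match = _CNA_LOCUS.match(name)
    if match:
        return Feature(_CNA_KIND[match.group(1)], (match.group(2),))
    match = _PATHWAY.match(name)
    if match:
        return Feature("pathway", (match.group(1),), match.group(2))
    return Feature("mutation", (name,))


parsed = [parse_feature(n) for n in
          ["SMAD4", "a(CCND1,CTTN)", "d(FAT1)", "d18q22.1", "TNFa-UP"]]
```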

Output: models to explain drug response

LOBICO generates logic models to explain drug response based on binarized mutation data of cancer samples. A gene was called mutated when it had a point mutation or a small insertion or deletion as determined by capillary sequencing, or when it was highly amplified or homozygously deleted based on copy number arrays. LOBICO was executed for each drug separately, using pan-cancer and cancer-specific molecular datasets. This led to the inference of 1,080 logic models. A total of 265 drug compounds were screened in the GDSC dataset. A summary of all screened compounds can be downloaded from here.

Contents

  • Models
  • Plots
    • Input (cell lines): histogram, tissue & mutation heatmaps
    • Models Overall & Highlight Plots
    • Model BoxPlots
  • Datasets
    • GDSC Dataset
      • Input Samples: tissue, cancer types & input features
      • Output: models to explain drug response