The TIDE framework

Description
Command options
Usage example
The TIDE-essential framework

Description

The Task Inferred from Differential Expression (TIDE) framework is a contraint-based metabolic modeling framework that was originally published by Dougherty et al., 2021. It leverages the use of mathematical descriptions of metabolic functions (metabolic tasks) and the results of a Differential Expression Analysis to study metabolic perturbations in a case control assay.

Command options

Argument	Shortcut	Description	Default
`dea_file`		Filename for a differential expression analysis results file. It should contain at least three columns: genic (string), log-FC (numeric) and significance (numeric, e.g.: p-value, adjusted p-value, FDR). Genes must be stored as EnsemblIDs.
`--delim`	`-d`	Field delimiter for inputed file.	`\t`
`-out`	`-o`	Name (and location) to store the analysis’ results. They will be stored in a tab-sepparated file, so filenames should contain the `.tsv` or `.txt` extensions.	`tide_results.tsv`
`--gene_col`		Name of the column in the inputed file containing gene names/symbols. Genes must be stored as EnsemblIDs.	`geneID`
`--lfc_col`		Name of the column in the inputed file containing log-FC values.	`log2FoldChange`
`--pvalue_col`		Name of the column in the inputed file containing significance values. Only required if the flag `--mask_lfc_values` is `True`.	`padj`
`--alpha`	`-a`	Significance threshold to mask log-FC. Only required if the flag `--mask_lfc_values` is `True`.	`0.05`
`--n_permutations`	`-n`	Number of permutations to infer p-values for the metabolic scores. The resolution of the computed p-values will depend on this number.	`1000`
`--n_cpus`		Number of CPUs for parallel execution.	`1`
`--or_func`		Name of the function that will be used to resolve OR relationships in gene-protein-reaction (GPR) rules. Possible values are `absmax`, which will return the absolute maximum value, and `max`, which will return the maximum value.	`absmax`
`--mask_lfc_values`		Flag to indicate whether to mask log-FC values to 0 according to their significance. That is, if a log-FC value is non-significant (determined by the user), they will be masked to 0.	`False`
`--random_scores`		Flag to indicate whether to return the null distribution of random scores used to inferr significance with the results file.	`False`

Usage example

Differential expression analysis

The first thing that the TIDE framework requires is a Differential Expression Analysis (DEA) result. Usually, this kind of data is stored in a tabular format and contains at least three columns: gene names/symbols, expression change values (log-FC) and significancy (p-value).

A typical DEA result will look like the following:

                geneID geneSymbol  log2FoldChange      padj
    ENSG00000000003     TSPAN6        3.710229  0.259406
    ENSG00000000005       TNMD       -2.437056  0.485180
    ENSG00000000419       DPM1        8.749658  0.802934
    ENSG00000000457      SCYL3      -10.409959  0.051220
    ENSG00000000460      FIRRM       -0.977916  0.926198
...                ...        ...             ...       ...

Running TIDE

To run the TIDE framework using the command-line, the command run-mtea TIDE should be used with the desired arguments. A typical TIDE analysis is run using a range of 1,000 to 10,000 permutations, the absmax function to evaluate OR GPR rules, and selecting the --mask_lfc_values flag, which will mask non-significant log-FC values to 0.

Only the Human-GEM and its metabolic tasks are implemented, so the framework will only take in EnsemblIDs as valid genic nomenclature. We are working to allow for any metabolic model and metabolic tasks to be used for more customisable analyses!

run-mtea TIDE dea_file.tsv \
    -o results/tide_results.tsv \
    -n 1000 \
    --n_cpus 4 \
    --or_func absmax \
    --gene_col geneID \
    --lfc_col log2FoldChange \
    --pvalue_col padj \
    -a 0.05 \
    --mask_lfc_values

Understanding the TIDE results

Once the analysis is run, a tabular file containing the analysis results will be saved into the inputed location. The results file will contain 7 columns: a task ID, the metabolic score, the mean random score obtained during the permutation test, its associated p-value, and three more columns detailing the metabolic task description, metabolic system and subsystem.

    task_id     score  random_score  pvalue                                 task_description               metabolic_system            metabolic_subsystem
    X159  1.150921     -0.201177   0.000                           Linolenate degradation              Lipids Metabolism          Fatty Acid Metabolism
    X164  1.216932     -0.185784   0.001                         Arachidonate degradation              Lipids Metabolism          Fatty Acid Metabolism
    X160  1.026041     -0.184708   0.001                            Linoleate degradation              Lipids Metabolism          Fatty Acid Metabolism
    X107  1.228656     -0.178454   0.001         Conversion of lysine to L-2-Aminoadipate         Amino Acids Metabolism              Lysine Metabolism
    X162  0.857837     -0.182584   0.001                     gamma-Linolenate degradation              Lipids Metabolism          Fatty Acid Metabolism
..      ...       ...           ...     ...                                              ...                            ...                            ...

The results can then be used to explore the metabolic changes of a case-control sample.

The TIDE-essential framework

Under construction! Please, come back soon.