The CellFie framework


Table of contents

  1. Description
  2. Command options
  3. Usage Example
    1. Transcriptomics Data
    2. Running CellFie
    3. Understanding the CellFie results

Description

The CellFie framework is a constraint-based metabolic modeling framework originally published by Richelle et al., 2021. It combines mathematical descriptions of metabolic functions (metabolic tasks) with transcriptomics data to quantify the activity of those functions. Unlike TIDE, CellFie processes multiple samples at a time, making it suitable for large datasets and single-cell RNA sequencing.

Command options

| Argument | Shortcut | Description | Default |
|---|---|---|---|
| expr_file | | Filename of a normalized gene expression file (e.g., TPM). It should contain at least one column with gene names/symbols. Genes must be stored as Ensembl IDs. | |
| --delim | -d | Field delimiter of the input file. | \t |
| --out | -o | Directory to store the analysis’ results. The result file(s) will be stored in the specified directory in a tab-separated format (.tsv). | CellFie_results/ |
| --gene_col | | Name of the column in the input file containing gene names/symbols. Genes must be stored as Ensembl IDs. | geneID |
| --threshold_type | | Determines the thresholding approach to use. A global approach uses the same threshold for all genes, whereas a local approach uses a different threshold for each gene when computing gene activity levels. | local |
| --global_threshold_type | | Whether to use a value or a percentile of the distribution of all genes as the global threshold. | percentile |
| --global_value | | Value to use as the global threshold, interpreted according to the selected global_threshold_type. Note that percentile values must be between 0 and 1. | 0.75 |
| --local_threshold_type | | Determines the threshold type to use in a local approach. minmaxmean: the threshold for each gene is its mean expression across all conditions/samples, constrained to be greater than or equal to a lower bound and less than or equal to an upper bound. mean: the threshold for each gene is its mean expression across all conditions/samples. | minmaxmean |
| --minmaxmean_threshold_type | | Whether to use a value or a percentile of the distribution of all genes as the upper and lower bounds. | percentile |
| --upper_bound | | Upper bound value, interpreted according to the minmaxmean_threshold_type. Note that percentile values must be between 0 and 1. | 0.75 |
| --lower_bound | | Lower bound value, interpreted according to the minmaxmean_threshold_type. Note that percentile values must be between 0 and 1. | 0.25 |
| --binary_scores | | Flag to indicate whether to also return the binary metabolic score matrix as a second result file. See the original publication for more details. | False |
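
The thresholding options above can be illustrated with a minimal Python sketch. This is not MTEApy's implementation: the helper names (percentile, global_threshold, local_thresholds) are hypothetical, the percentile is computed by plain linear interpolation over every expression value, and the tool itself may preprocess the distribution differently.

```python
from statistics import mean


def percentile(values, q):
    """Return the q-th quantile (0 <= q <= 1) using linear interpolation."""
    if not 0 <= q <= 1:
        raise ValueError("percentile values must be between 0 and 1")
    ordered = sorted(values)
    if not ordered:
        raise ValueError("cannot compute a percentile of no values")
    pos = q * (len(ordered) - 1)
    low = int(pos)
    high = min(low + 1, len(ordered) - 1)
    return ordered[low] + (ordered[high] - ordered[low]) * (pos - low)


def _bound(all_values, kind, value):
    """Interpret `value` as a raw value or as a percentile of all_values."""
    if kind == "percentile":
        return percentile(all_values, value)
    if kind == "value":
        return value
    raise ValueError(f"unknown threshold kind: {kind!r}")


def global_threshold(expr, kind="percentile", value=0.75):
    """One threshold shared by all genes (global approach)."""
    all_values = [v for row in expr.values() for v in row]
    return _bound(all_values, kind, value)


def local_thresholds(expr, kind="minmaxmean", bound_kind="percentile",
                     lower_bound=0.25, upper_bound=0.75):
    """One threshold per gene (local approach): 'mean' or 'minmaxmean'."""
    if kind not in ("mean", "minmaxmean"):
        raise ValueError(f"unknown local threshold type: {kind!r}")
    all_values = [v for row in expr.values() for v in row]
    thresholds = {}
    for gene, row in expr.items():
        threshold = mean(row)  # mean expression across samples
        if kind == "minmaxmean":
            lo = _bound(all_values, bound_kind, lower_bound)
            hi = _bound(all_values, bound_kind, upper_bound)
            threshold = min(max(threshold, lo), hi)  # clip to [lo, hi]
        thresholds[gene] = threshold
    return thresholds


# Made-up toy data: gene -> expression values across two samples.
toy = {"low": [1.0, 1.0], "mid": [2.0, 6.0], "high": [9.0, 9.0]}
print(local_thresholds(toy))  # {'low': 1.25, 'mid': 4.0, 'high': 8.25}
```

In the toy data, the 0.25 and 0.75 percentiles of all values are 1.25 and 8.25, so the mean of the "low" gene is raised to the lower bound and the mean of the "high" gene is capped at the upper bound, while "mid" keeps its own mean.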

Usage Example

Transcriptomics Data

CellFie requires a normalized gene expression matrix (usually TPM values). Normally, this type of data has genes as rows and samples as columns. For the command to run, one column of the matrix must contain the gene identifiers, which must be Ensembl IDs.

A typical normalized gene expression matrix will look like the following:

               geneID         S1         S2         S3         S4
0     ENSG00000000419   6.721972   7.768211   0.111999   0.561086
1     ENSG00000001036   5.880123  10.804611   4.273897   3.703098
2     ENSG00000001084  13.568022  11.912389  21.792070   4.126645
3     ENSG00000001630   9.830659  10.973878  16.052115   3.264040
4     ENSG00000002549  10.312642  10.373970   6.246490   0.597024
...               ...        ...        ...        ...        ...
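
Before running the command, it can help to check that a file has this shape. The following is a minimal sketch using only the Python standard library; load_expression is a hypothetical helper, not part of MTEApy, and the pattern assumes unversioned human Ensembl gene IDs (ENSG followed by 11 digits).

```python
import csv
import io
import re

# Assumption: unversioned human Ensembl gene IDs, e.g. ENSG00000000419.
ENSEMBL_GENE_ID = re.compile(r"^ENSG\d{11}$")


def load_expression(handle, gene_col="geneID", delim="\t"):
    """Read an expression table into (sample names, {gene: [values]})."""
    reader = csv.DictReader(handle, delimiter=delim)
    if reader.fieldnames is None or gene_col not in reader.fieldnames:
        raise ValueError(f"column {gene_col!r} not found")
    samples = [name for name in reader.fieldnames if name != gene_col]
    matrix = {}
    for row in reader:
        gene = row[gene_col]
        if not ENSEMBL_GENE_ID.match(gene):
            raise ValueError(f"not an Ensembl gene ID: {gene!r}")
        # float() raises ValueError on non-numeric expression values.
        matrix[gene] = [float(row[sample]) for sample in samples]
    return samples, matrix


# First two rows of the example matrix, written as tab-separated text.
example = io.StringIO(
    "geneID\tS1\tS2\tS3\tS4\n"
    "ENSG00000000419\t6.721972\t7.768211\t0.111999\t0.561086\n"
    "ENSG00000001036\t5.880123\t10.804611\t4.273897\t3.703098\n"
)
samples, matrix = load_expression(example)
print(samples)
print(matrix["ENSG00000000419"])
```

For a file on disk, the same function can be called with an open file handle, for example `open("expression_file.tsv", newline="")`.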

Running CellFie

To run CellFie from the command line, use the run-mtea CellFie command with the desired arguments. A typical analysis uses the minmaxmean local thresholding strategy with percentile upper and lower bounds of 0.75 and 0.25, which is the command's default.

Only Human-GEM and its metabolic tasks are currently implemented, so the framework accepts only Ensembl IDs as gene identifiers. We are working to allow for any metabolic model and metabolic tasks to be used for more customisable analyses!

run-mtea CellFie expression_file.tsv \
    -o results/ \
    --gene_col geneID \
    --threshold_type local \
    --local_threshold_type minmaxmean \
    --minmaxmean_threshold_type percentile \
    --upper_bound 0.75 \
    --lower_bound 0.25 \
    --binary_scores
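
When the same analysis has to be launched from a script (for instance once per input file), the command can be assembled programmatically. The sketch below is only an illustration: build_cellfie_command is a hypothetical helper, it does not validate option names, and it uses only the arguments documented above.

```python
import shlex


def build_cellfie_command(expr_file, out_dir="CellFie_results/",
                          binary_scores=False, **options):
    """Build the argument list for a `run-mtea CellFie` call."""
    cmd = ["run-mtea", "CellFie", expr_file, "-o", out_dir]
    for name, value in options.items():
        cmd += [f"--{name}", str(value)]
    if binary_scores:
        cmd.append("--binary_scores")  # flag without a value
    return cmd


cmd = build_cellfie_command(
    "expression_file.tsv",
    out_dir="results/",
    binary_scores=True,
    gene_col="geneID",
    threshold_type="local",
    local_threshold_type="minmaxmean",
    minmaxmean_threshold_type="percentile",
    upper_bound=0.75,
    lower_bound=0.25,
)
print(shlex.join(cmd))
# To execute it (assuming run-mtea is installed and on the PATH):
#     import subprocess; subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.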

Understanding the CellFie results

Once the analysis has run, one or two result files are stored in the specified directory.

| Result File | Description |
|---|---|
| cellfie_scores.tsv | Main result file containing the metabolic activity score values. Columns represent the samples in the original gene expression file, and rows represent all the different metabolic tasks (stored in the task_id column). In addition to the sample columns, there are three more columns containing metabolic task metadata (description, metabolic system and subsystem). |
| cellfie_binary_scores.tsv | Secondary result file that will only be generated if the flag --binary_scores is specified. It has the same structure as the main result file, but contains the binary interpretation of the activity of a metabolic task (0 if the task is considered inactive, 1 if the task is considered active). See the original publication for more information about active and inactive metabolic tasks. |

A standard run of the CellFie framework should produce a cellfie_scores.tsv file similar to the following:

    task_id        S1        S2        S3        S4                                   task_description   metabolic_system        metabolic_subsystem
0        X1  0.076515  0.671443  0.050100  0.733470  Oxidative phosphorylation via NADH-coQ oxidore...  Energy Metabolism  Oxydative Phosphorylation 
1        X2  0.863416  0.561653  1.204112  1.253820  Oxidative phosphorylation via succinate-coenzy...  Energy Metabolism  Oxydative Phosphorylation
2        X3  1.354543  0.889970  1.586738  2.489626  Krebs cycle - oxidative decarboxylation of pyr...  Energy Metabolism                Krebs Cycle
3        X4  1.195976  1.961420  1.423547  1.644596                      Krebs cycle - NADH generation  Energy Metabolism                Krebs Cycle
4        X5  1.554831  1.785477  1.541452  1.704194        ATP regeneration (glycolysis + krebs cycle)  Energy Metabolism             Atp Generation
..      ...       ...       ...       ...       ...                                                ...                ...                        ...

For more information about metabolic tasks and their metadata, see the task_info/ folder in the MTEApy repository.
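
As a starting point for exploring cellfie_scores.tsv, the following sketch separates the sample columns from the metadata columns and reports the highest-scoring task in each sample. It assumes the column names shown in the example output above; top_task_per_sample is a hypothetical helper, and the inline data reproduces the five example rows (with task descriptions left truncated as displayed).

```python
import csv
import io

# Non-sample columns, as named in the example output.
METADATA_COLUMNS = {
    "task_id", "task_description", "metabolic_system", "metabolic_subsystem",
}


def top_task_per_sample(handle):
    """Return {sample: (task_id, score)} for the highest score per sample."""
    reader = csv.DictReader(handle, delimiter="\t")
    samples = [c for c in reader.fieldnames if c not in METADATA_COLUMNS]
    best = {}
    for row in reader:
        for sample in samples:
            score = float(row[sample])
            if sample not in best or score > best[sample][1]:
                best[sample] = (row["task_id"], score)
    return best


EXAMPLE_TSV = "\n".join("\t".join(fields) for fields in [
    ("task_id", "S1", "S2", "S3", "S4",
     "task_description", "metabolic_system", "metabolic_subsystem"),
    ("X1", "0.076515", "0.671443", "0.050100", "0.733470",
     "Oxidative phosphorylation via NADH-coQ oxidore...",
     "Energy Metabolism", "Oxydative Phosphorylation"),
    ("X2", "0.863416", "0.561653", "1.204112", "1.253820",
     "Oxidative phosphorylation via succinate-coenzy...",
     "Energy Metabolism", "Oxydative Phosphorylation"),
    ("X3", "1.354543", "0.889970", "1.586738", "2.489626",
     "Krebs cycle - oxidative decarboxylation of pyr...",
     "Energy Metabolism", "Krebs Cycle"),
    ("X4", "1.195976", "1.961420", "1.423547", "1.644596",
     "Krebs cycle - NADH generation",
     "Energy Metabolism", "Krebs Cycle"),
    ("X5", "1.554831", "1.785477", "1.541452", "1.704194",
     "ATP regeneration (glycolysis + krebs cycle)",
     "Energy Metabolism", "Atp Generation"),
]) + "\n"

for sample, (task, score) in top_task_per_sample(io.StringIO(EXAMPLE_TSV)).items():
    print(sample, task, score)
```

On these five rows, the top task is X5 for S1, X4 for S2, and X3 for both S3 and S4.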