I am trying to find a way to take a cancer drug (from CancerDR, for instance) and infer which metabolic reactions in the Human Metabolic Model (HMM) are affected by it.
Essentially, I would like to know which HMM reactions are affected by a certain cancer therapy drug, especially the ones found in the Cancer Cell Line Encyclopedia.
Example:
Searching for the reactions that are associated with the HSP90 protein, which is targeted (to some extent) by the 17-AAG agent. That way, I can trace back the relevant section of the HMM and mark it as 'affected' by that agent.
This could be a very broad or naïve question, but I am afraid I don't know enough to be able to determine that.
Even a tip or some general guidance would be greatly appreciated.
Thanks very much!
Short answer:
For drugs with targets having a direct effect on reactions in the model:
Find the gene name for the target protein -> Find the Entrez Gene ID (or other gene identifier, depending on the model) -> Search the model file or database for reactions having a gene association with the gene identifier.
AND/OR:
Find the EC number for the target protein. -> Search the model file or database for reactions associated with the EC number (likely to miss many reactions or be imprecise).
EC numbers can be found in the UniProt database, while the HGNC database at http://www.genenames.org/ can be used to look up genes by name and find their Entrez Gene ID and other resources, including links to UniProt. Many other databases can also be used.
For drugs with targets having an indirect effect on reactions in the model:
Model or estimate the indirect metabolic effect yourself, using data on known regulatory effects to create new gene-reaction associations or to modify the bounds on reactions affected by the drug.
Long answer:
To knock out or constrain a reaction based on data about a drug, you have to go from an entry in a drug database to an EC number or gene identifier consistent with the annotation in the model you are using. However, there is currently no standard annotation nomenclature for gene-protein-reaction (GPR) associations in metabolic models, and in general only direct GPR associations are encoded in model files.
Assuming that you have a metabolic model in SBML format (for example the RECON 2 model), reactions may have a gene association and/or be associated with an Enzyme Commission (EC) number [1]. However, several model reactions catalyzed by distinct isozymes (different enzymes catalyzing the same reaction) may be associated with the same EC number, and many reactions may not be annotated with an EC number at all. Therefore you might want to use the gene associations directly. Both gene associations and EC numbers are recorded in the "notes" tag for each reaction in the SBML file, and are not part of the SBML specification. (A proposal has been published for improving support for gene associations in the SBML format [2].)
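As a rough illustration of reading those "notes" entries, here is a minimal sketch using only the Python standard library. The SBML fragment is a hand-written toy, not an excerpt from RECON 2: the "GENE_ASSOCIATION:" and "EC Number:" keys, the "R_" reaction ID prefix, the namespace URI and the gene ID given for TMDS are all assumptions that you should check against your actual model file.

```python
import xml.etree.ElementTree as ET

# Namespaces assumed for an SBML Level 2 file with XHTML notes.
SBML_NS = "{http://www.sbml.org/sbml/level2}"
XHTML_NS = "{http://www.w3.org/1999/xhtml}"

# Toy fragment (assumed layout, illustrative values only).
TOY_SBML = """<sbml xmlns="http://www.sbml.org/sbml/level2" level="2" version="1">
  <model id="toy">
    <listOfReactions>
      <reaction id="R_TMDS" name="thymidylate synthase">
        <notes>
          <body xmlns="http://www.w3.org/1999/xhtml">
            <p>GENE_ASSOCIATION: (7298.1)</p>
            <p>EC Number: 2.1.1.45</p>
          </body>
        </notes>
      </reaction>
      <reaction id="R_P4502C9" name="cytochrome P450 2C9">
        <notes>
          <body xmlns="http://www.w3.org/1999/xhtml">
            <p>GENE_ASSOCIATION: (1559.1)</p>
          </body>
        </notes>
      </reaction>
    </listOfReactions>
  </model>
</sbml>"""


def read_reaction_notes(sbml_text):
    """Map each reaction ID to a dict of the 'KEY: value' lines in its notes."""
    root = ET.fromstring(sbml_text)
    notes = {}
    for reaction in root.iter(SBML_NS + "reaction"):
        fields = {}
        for p in reaction.iter(XHTML_NS + "p"):
            # Split on the first colon only; skip paragraphs without one.
            key, sep, value = (p.text or "").partition(":")
            if sep:
                fields[key.strip()] = value.strip()
        notes[reaction.get("id")] = fields
    return notes


notes = read_reaction_notes(TOY_SBML)
for reaction_id, fields in notes.items():
    print(reaction_id, fields)
```

Because the notes are free text rather than part of the SBML specification, the keys may be spelled differently from model to model, so it is worth printing a few reactions first to see what your file actually contains.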
Using EC numbers:
Using EC numbers is the simplest approach, but it is not the most precise and may not even be usable. It is simpler than using gene associations, because genes may be described using any of a number of identifiers depending on the database, while any reaction is associated with at most one EC number. Once you know the target protein, you can look it up in the UniProt database and find its EC number (if any) in the field "protein names". A simple example: the DrugBank entry for the cancer drug Fluorouracil (http://www.drugbank.ca/drugs/DB00544) links to the UniProt entry for one of its targets, Thymidylate synthase (http://www.uniprot.org/uniprot/P04818), where the EC number is given as 2.1.1.45. Searching the RECON 2 SBML file (or searching in BiGG at http://bigg.ucsd.edu/bigg/main.pl) for occurrences of this number will show that it is associated with the "TMDS" reaction.
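The EC lookup itself is just a comparison against each reaction's annotation. The sketch below assumes you have already extracted a reaction-to-EC mapping from the model; the dictionary is a toy stand-in containing only the reactions mentioned in this answer, and the function names are made up for illustration. A "-" in the query is treated as a class-level wildcard.

```python
def ec_matches(query, annotation):
    """True if an EC annotation falls under the query EC number.

    A '-' at any level of the query acts as a wildcard, so '1.14.13.-'
    matches every fully specified number in that sub-subclass.
    """
    if not annotation:
        return False  # reaction has no EC annotation at all
    q_parts = query.split(".")
    a_parts = annotation.split(".")
    if len(q_parts) != 4 or len(a_parts) != 4:
        return False
    return all(q == "-" or q == a for q, a in zip(q_parts, a_parts))


def reactions_for_ec(query, ec_by_reaction):
    """Return the sorted IDs of reactions whose EC annotation matches the query."""
    return sorted(r for r, ec in ec_by_reaction.items() if ec_matches(query, ec))


# Toy mapping: None marks a reaction without an EC annotation.
ec_by_reaction = {
    "TMDS": "2.1.1.45",
    "P4502C9": None,
    "P4502C92": None,
    "P4502C93": None,
    "P4502C94": None,
}

hits = reactions_for_ec("2.1.1.45", ec_by_reaction)
class_hits = reactions_for_ec("1.14.13.-", ec_by_reaction)
print(hits)
print(class_hits)
```

The second query returns nothing because none of the cytochrome P450 reactions in the toy mapping carries an EC annotation, which is exactly the kind of gap that makes EC-based lookup unreliable.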
The IUPHAR database (http://www.iuphar-db.org/index.jsp) may also be useful.
Using Gene associations:
Genes are conceptually at least one step removed from the reactions their products catalyze, so one might argue that using EC numbers, which actually correspond to reactions, would be better. However, it is conceivable that a drug affects only one of several enzymes which catalyze the same reaction and thus share an EC number. Furthermore, annotation of reactions with EC numbers in metabolic models may be poor. Thus you may want to use gene associations to determine which reactions are affected by a drug.
Unfortunately, due to the large number of databases, a single gene may be described using many identifiers or symbols in different databases. You thus have to identify the gene coding for the product which is the target of the drug, and obtain the correct identifier for that gene which corresponds to the nomenclature used in the model.
Assuming again that you are using the RECON 2 model: it is an update of RECON 1, which was built using the BiGG database [3]. Gene-protein-reaction (GPR) associations are therefore recorded using the gene names in BiGG. When ignoring the decimal point and any digits that follow it, BiGG gene names appear to correspond to Gene IDs in the NCBI Gene database [5]. A question on how to identify BiGG reactions was also discussed at BioStar [6]. However, you have the opposite problem: finding a BiGG gene name from another gene identifier.
Using Fluorouracil as an example again, the DrugBank entry lists "Cytochrome P450 2C9" as a targeted enzyme, with the effect being inhibition. Following the link to the UniProt entry P11712 (http://www.uniprot.org/uniprot/P11712), we find that it has the EC number 1.14.13.-. However, this EC number corresponds to a class of reactions (or rather, enzymes). Several additional EC numbers at the single-reaction level are also listed, but as these may not cover or be associated with all relevant reactions, we'll use the gene association. Going back to the DrugBank entry (or just reading on the UniProt page), we find that the gene name is CYP2C9. Looking up this name in a suitable database, for example the HGNC database of human gene names at http://www.genenames.org/, we find the entry for CYP2C9 (http://www.genenames.org/cgi-bin/gene_symbol_report?hgnc_id=2623) and that its Entrez Gene ID is 1559. Searching for this gene identifier in the BiGG database, we find that four reactions are associated with it (the actual association is to "1559.1"): P4502C9, P4502C92, P4502C93 and P4502C94. Note that none of these reactions has an associated EC number.
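If you have the GPR strings in hand, the last step of this lookup can be scripted. The sketch below assumes GPRs are written with Entrez-style IDs followed by a ".N" suffix, as described above; the GPR strings are simplified stand-ins (the real associations may involve further genes), the gene ID for TMDS and the "HYPOTHETICAL_RXN" entry are illustrative only, and the function names are made up. Matching on whole tokens rather than substrings avoids confusing gene 1559 with, say, 15590.

```python
import re


def genes_in_gpr(gpr):
    """Return the set of gene IDs in a GPR string, with the '.N' suffix dropped."""
    tokens = re.findall(r"\d+(?:\.\d+)?", gpr)
    return {token.split(".")[0] for token in tokens}


def reactions_for_gene(entrez_id, gpr_by_reaction):
    """Return the sorted IDs of reactions whose GPR mentions the given gene ID."""
    target = str(entrez_id)
    return sorted(
        r for r, gpr in gpr_by_reaction.items() if target in genes_in_gpr(gpr)
    )


# Simplified, partly hypothetical GPR strings for illustration.
gpr_by_reaction = {
    "P4502C9": "(1559.1)",
    "P4502C92": "(1559.1)",
    "P4502C93": "(1559.1)",
    "P4502C94": "(1559.1)",
    "TMDS": "(7298.1)",
    "HYPOTHETICAL_RXN": "(15590.1) or (2.1 and 3.1)",
}

affected = reactions_for_gene(1559, gpr_by_reaction)
print(affected)
```

Note that this only tells you which reactions mention the gene. Whether inhibiting it actually disables a reaction depends on the boolean logic of the GPR: an "or" with an unaffected isozyme would leave the reaction active.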
Defining your own GPR associations
Keep in mind that even in genome-scale metabolic models, a large number of reactions and processes are missing. While the term metabolism in its widest sense may be used to refer to all reactions occurring in a cell, metabolic models have so far been mostly limited to covering transport and interconversions of small metabolites. Processes involving signalling molecules and regulation of enzymes have been mostly or completely left out, along with critical processes such as DNA replication, DNA repair and apoptosis. Because of the limited scope of the models, gene-protein-reaction (GPR) relations do not generally take regulatory effects into account: they only describe direct relations, in which one or more genes encode (part of) a protein that directly catalyzes a reaction.
One could also model a very large number of indirect regulatory effects, but as current metabolic models generally do not specify these, you may have to construct them yourself. To find reactions in a metabolic model which are affected by a certain drug, you may have to "travel" several steps of interactions from the original target of the drug. You may consider using protein-protein interaction network (PIN) data to help with this. You will also have to weigh the strength and importance of each interaction, which may require a high degree of manual curation.
For example, if your drug affects HSP90, which is needed for correct folding of certain enzymes, you may consider creating a GPR stating that the gene for HSP90 (or rather, HSP90 itself) is needed for any reactions catalyzed by a specific enzyme to be active. This would be an indirect GPR, as HSP90 does not itself produce any part of the enzyme that catalyzes the reaction. As an alternative to on/off regulation through a GPR, you may consider searching for data on the ratio of correctly folded and misfolded enzymes in the presence and absence of HSP90, and apply bounds on reactions catalyzed by the affected enzyme to reflect this.
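Both options can be sketched in a few lines. The example below uses a plain dataclass instead of any particular modelling library, and everything in it is hypothetical: the reaction "CLIENT_RXN", its GPR "1234.1", the placeholder "HSP90_GENE" and the active fraction of 0.25 are made-up values that only show the mechanics.

```python
from dataclasses import dataclass


@dataclass
class Reaction:
    id: str
    gpr: str            # boolean gene rule, e.g. "1234.1"
    lower_bound: float  # minimum allowed flux
    upper_bound: float  # maximum allowed flux


def add_indirect_requirement(reaction, gene_id):
    """Option 1: make the reaction additionally depend on gene_id (on/off)."""
    if reaction.gpr:
        reaction.gpr = f"({reaction.gpr}) and {gene_id}"
    else:
        reaction.gpr = gene_id


def scale_bounds(reaction, active_fraction):
    """Option 2: shrink the flux bounds to the fraction of enzyme still functional."""
    if not 0.0 <= active_fraction <= 1.0:
        raise ValueError("active_fraction must be between 0 and 1")
    reaction.lower_bound *= active_fraction
    reaction.upper_bound *= active_fraction


# Hypothetical reaction catalyzed by an enzyme that needs HSP90 to fold.
rxn = Reaction(id="CLIENT_RXN", gpr="1234.1", lower_bound=-1000.0, upper_bound=1000.0)

add_indirect_requirement(rxn, "HSP90_GENE")  # placeholder gene identifier
scale_bounds(rxn, 0.25)                      # assumed correctly folded fraction

print(rxn)
```

In practice you would pick one of the two options per reaction. Scaling the bounds by a measured folding ratio is the gentler choice, since a GPR knockout switches the reaction off entirely.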