Metabolite and Enzyme corpus


The field of systems biology has recently begun to model and simulate metabolic networks, which requires knowing the full set of molecules involved. Genomics and proteomics technologies can supply the macromolecular parts list, but the list of metabolites is much harder to assemble. Most metabolites are known from, and reported in, the scientific literature rather than large-scale experimental surveys. It is therefore important to be able to recover them from the literature.

We are pleased to announce the availability of the NaCTeM Metabolite and Enzyme corpus.

The corpus is intended to act as a means to train text mining systems to recognise metabolites and enzymes. It consists of 296 MEDLINE abstracts that have been manually annotated by domain experts.

The following paper provides more details about the corpus and a system trained to recognise metabolites automatically:

Nobata, C., Dobson, P., Iqbal, S. A., Mendes, P., Tsujii, J., Kell, D. B. and Ananiadou, S. (2011). Mining Metabolites: Extracting the Yeast Metabolome from the Literature. Metabolomics, 7(1), 94-101.
