New biomedical corpus annotated with entities and relations


We are pleased to announce the release of a new corpus of biomedical abstracts and articles, annotated with both named entities and the relationships between them. The corpus is intended to support the curation of information about metabolites in biomedical databases, including ChEBI.

The ChEBI corpus contains 199 annotated abstracts and 100 annotated full papers. In total, the corpus provides over 15,000 named entity annotations and over 6,000 relations between entities. The entities were annotated according to the requirements of the curators of the ChEBI database. We primarily annotated mentions of metabolites, together with other entities that can carry relevant information about metabolites, such as Chemicals, Proteins, Species, Biological Activity and Spectral Data. The entities were annotated with an inter-annotator agreement of 0.80-0.89 (F1 score, strict matching). The relations provide further information about the links between metabolites and other entities. The following categories of relations have been annotated: Isolated From, Associated With, Binds With and Metabolite Of. The ChEBI corpus can be used to investigate the lexical properties of metabolites and related entities. In addition, it can be used to train machine learning algorithms to recognise the annotated entities and relations.
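
For readers unfamiliar with the agreement measure quoted above, the following is a minimal sketch of how a strict-matching F1 score between two annotators could be computed. It assumes that "strict matching" means two annotations agree only when their character offsets and entity type are identical; the announcement does not spell out the exact criteria, and the function name, the tuple layout and the example spans are hypothetical, not taken from the corpus.

```python
def strict_f1(annotator_a, annotator_b):
    """F1 agreement between two sets of (start, end, entity_type) spans.

    Assumption: a pair of annotations counts as a match only if the start
    offset, end offset and entity type are all identical (strict matching).
    Annotator A is treated as the reference, annotator B as the response.
    """
    a, b = set(annotator_a), set(annotator_b)
    if not a and not b:
        return 1.0  # nothing annotated by either annotator: trivial agreement
    matches = len(a & b)
    if matches == 0:
        return 0.0
    precision = matches / len(b)
    recall = matches / len(a)
    return 2 * precision * recall / (precision + recall)


# Hypothetical annotations of the same text by two annotators.
spans_a = {(0, 7, "Metabolite"), (12, 20, "Species"), (25, 30, "Protein")}
spans_b = {(0, 7, "Metabolite"), (12, 21, "Species"), (25, 30, "Protein")}

# The Species spans differ by one character, so they do not count as a match.
score = strict_f1(spans_a, spans_b)
print(round(score, 3))
```

Because F1 is symmetric in precision and recall, swapping which annotator is treated as the reference does not change the score.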

