Biomedical causality corpus


Biomedical corpora annotated with event-level information are an important resource for domain-specific information extraction (IE) systems. However, bio-event annotation alone cannot cater for all the needs of biologists. Causality lies at the heart of biomedical knowledge in areas such as diagnosis, pathology and systems biology. Automatic causality recognition can therefore greatly reduce human workload by suggesting possible causal connections and aiding the curation of pathway models. A biomedical text corpus annotated with such relations is hence crucial for developing and evaluating biomedical text mining systems.

We are pleased to announce the availability of the NaCTeM BioCause corpus.

The corpus is intended as a means of understanding how causality is expressed in the biomedical domain and of training text mining systems to recognise it automatically. It consists of 19 open-access full-text journal papers that have been manually annotated by domain experts.

The following paper provides more details about the corpus and characterises biomedical discourse causality:

Mihăilă, C., Ohta, T., Pyysalo, S. and Ananiadou, S. (2013). BioCause: Annotating and analysing causality in the biomedical domain. In: BMC Bioinformatics, 14(2)
