New paper on wide-coverage event extraction using multiple partially overlapping corpora
2013-06-13
We are pleased to announce the publication of a new journal article. It presents a method for training an event extraction system that learns from multiple corpora with partial semantic annotation overlap. The result is a single, corpus-independent, wide-coverage extraction system that outperforms systems trained on single corpora and exceeds previously reported results on two established event extraction tasks from the BioNLP Shared Task 2011.
Makoto Miwa, Sampo Pyysalo, Tomoko Ohta and Sophia Ananiadou (2013). Wide coverage biomedical event extraction using multiple partially overlapping corpora. BMC Bioinformatics, 14:175.
Full abstract
Background
Biomedical events are key to understanding physiological processes and disease, and wide coverage extraction is required for comprehensive automatic analysis of statements describing biomedical systems in the literature. In turn, the training and evaluation of extraction methods requires manually annotated corpora. However, as manual annotation is time-consuming and expensive, any single event-annotated corpus can only cover a limited number of semantic types. Although combined use of several such corpora could potentially allow an extraction system to achieve broad semantic coverage, there has been little research into learning from multiple corpora with partially overlapping semantic annotation scopes.
Results
We propose a method for learning from multiple corpora with partial semantic annotation overlap, and implement this method to improve our existing event extraction system, EventMine. An evaluation using seven event annotated corpora, including 65 event types in total, shows that learning from overlapping corpora can produce a single, corpus-independent, wide coverage extraction system that outperforms systems trained on single corpora and exceeds previously reported results on two established event extraction tasks from the BioNLP Shared Task 2011.
Conclusions
The proposed method allows the training of a wide-coverage, state-of-the-art event extraction system from multiple corpora with partial semantic annotation overlap. The resulting single model makes broad-coverage extraction straightforward in practice by removing the need to either select a subset of compatible corpora or semantic types, or to merge results from several models trained on different individual corpora. Multi-corpus learning also allows annotation efforts to focus on covering additional semantic types, rather than aiming for exhaustive coverage in any single annotation effort, or extending the coverage of semantic types annotated in existing corpora.