BioCause annotation
View the corpus online with the brat rapid annotation tool.
The BioCause_corpus directory contains a version of the entire ID corpus, which has been enriched with causality annotation. A more detailed description of this annotation, together with access to the annotation guidelines, is available here.
The BioCause_corpus directory contains files of two types.
- .txt - The plain text of a document, as used for the annotation.
- .ann - The annotations for the corresponding .txt file, in stand-off format.
The .ann files contain named entity, event and causality annotations formatted according to the BioNLP 2011 Shared Task (ST) style. In the case of terms, the ID occurs first and is delimited from the rest of the line with a TAB character. The primary annotation is given as a SPACE-separated triple (type, start-offset, end-offset). The start-offset is the zero-based index of the first character of the annotated span in the text (".txt" file), i.e. the number of characters in the document preceding it. The end-offset is the index of the first character after the annotated span, so the character at the end-offset position is not included in the span. For reference, the text spanned by the annotation follows, separated by a TAB character.
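The term format described above can be read with a few lines of Python. This is a minimal sketch, not part of the corpus distribution: the names `TextBound`, `parse_text_bound` and `span_matches` are hypothetical, the toy text and annotation line are invented for illustration, and the sketch assumes each term covers a single contiguous span.

```python
from typing import NamedTuple


class TextBound(NamedTuple):
    """One term line from an .ann file (hypothetical helper type)."""
    ann_id: str
    ann_type: str
    start: int  # index of the first character of the span
    end: int    # index of the first character after the span
    text: str   # covered text, included in the file for reference


def parse_text_bound(line: str) -> TextBound:
    # Layout: ID <TAB> "type start-offset end-offset" <TAB> covered text
    ann_id, primary, covered = line.rstrip("\n").split("\t", 2)
    ann_type, start, end = primary.split(" ")
    return TextBound(ann_id, ann_type, int(start), int(end), covered)


def span_matches(ann: TextBound, document_text: str) -> bool:
    # The end-offset is exclusive, which is exactly Python slice semantics.
    return document_text[ann.start:ann.end] == ann.text


# Invented toy document and annotation line, for illustration only.
toy_text = "Mlc binds. Therefore it acts."
toy_ann = parse_text_bound("T1\tTrigger 11 20\tTherefore")
print(toy_ann, span_matches(toy_ann, toy_text))
```

Because the end-offset is exclusive, the span length is simply `end - start`, and the offsets can be passed directly to a slice of the .txt content.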
In the case of events, the event ID occurs first, separated by a TAB character. The event trigger is specified as TYPE:ID and identifies the event type and its trigger through the ID. By convention, the event type is specified both in the trigger annotation and the event annotation. The event trigger is separated from the event arguments by SPACE. The event arguments are a SPACE-separated set of ROLE:ID pairs, where ROLE is one of the event- and task-specific argument roles (e.g., Effect, Cause, Theme, Site) and the ID identifies the entity or event filling that role. Note that several events can share the same trigger and that while the event trigger should be specified first, the event arguments can appear in any order.
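An event line can be split along the same delimiters. The following is a rough sketch under the format described above; `parse_event` is a hypothetical helper name, and the input is the E48 event line from the corpus example in this section. Arguments are kept as a list of (role, ID) pairs rather than a dictionary, on the assumption that a role might occur more than once in an event.

```python
def parse_event(line):
    """Split an event line into (event ID, event type, trigger ID, arguments)."""
    # The event ID is separated from the rest of the line by a TAB.
    event_id, body = line.rstrip("\n").split("\t", 1)
    # The trigger comes first; the remaining fields are ROLE:ID arguments.
    trigger, *argument_fields = body.split()
    event_type, trigger_id = trigger.split(":", 1)
    # Arguments may appear in any order, so their order is kept as read.
    arguments = [tuple(field.split(":", 1)) for field in argument_fields]
    return event_id, event_type, trigger_id, arguments


event = parse_event("E48\tTrigger:T140 Evidence:T139 Effect:T141")
print(event)
```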
An example of an annotated causal relation within the .ann file is shown below:
T139	Argument 3854 3963	Mlc is a global regulator of carbohydrate metabolism and controls several genes involved in sugar utilization
T140	Trigger 3973 3982	Therefore
T141	Argument 4008 4052	Mlc also affects the virulence of Salmonella
E48	Trigger:T140 Evidence:T139 Effect:T141
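To read the causal relation in this example, the IDs in the event line have to be looked up among the term lines. The sketch below shows one way to do that. It assumes, based only on the example, that term IDs start with "T" and event IDs with "E"; `resolve_event` is a hypothetical helper, and references to other events are left unresolved (returned as `None`).

```python
def resolve_event(ann_content, event_id):
    """Return (role, referenced ID, covered text) triples for one event."""
    spans = {}   # term ID -> covered text
    events = {}  # event ID -> list of (role, referenced ID) pairs
    for line in ann_content.splitlines():
        if not line.strip():
            continue
        ann_id, rest = line.split("\t", 1)
        if ann_id.startswith("T"):
            # rest is "type start end" <TAB> covered text
            _primary, covered = rest.split("\t", 1)
            spans[ann_id] = covered
        elif ann_id.startswith("E"):
            events[ann_id] = [tuple(f.split(":", 1)) for f in rest.split()]
    # IDs that are not terms (e.g. nested events) resolve to None here.
    return [(role, ref, spans.get(ref)) for role, ref in events[event_id]]


example = (
    "T139\tArgument 3854 3963\tMlc is a global regulator of carbohydrate "
    "metabolism and controls several genes involved in sugar utilization\n"
    "T140\tTrigger 3973 3982\tTherefore\n"
    "T141\tArgument 4008 4052\tMlc also affects the virulence of Salmonella\n"
    "E48\tTrigger:T140 Evidence:T139 Effect:T141\n"
)

for role, ref, text in resolve_event(example, "E48"):
    print(f"{role} ({ref}): {text}")
```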