Anatomy Corpora
Anatomical entities are central to much of biomedical discourse and must be considered in any attempt to fully analyse biomedical scientific text. However, while a wealth of tools and resources has been introduced in biomedical natural language processing for recognising mentions of molecular-level entities (genes, proteins, chemicals) and organism names in text, there has been little study of the recognition of mentions of anatomical entities such as tissues and organs.
To address this issue and to facilitate more detailed and comprehensive analysis of biomedical scientific text, our aim has been to establish a fine-grained, species-independent anatomical entity mention detection task.
We have developed the following manually annotated corpora to support this aim:
- Multi-Level Event Extraction (MLEE) corpus - abstracts of publications on angiogenesis, annotated with entity mentions and events across multiple levels of biological organization from the molecular to the organ system level. Over 8,000 entities with fine-grained types and over 6,000 structured events are annotated.
- AnEM corpus - a domain- and species-independent resource, annotated with anatomical entity mentions using a fine-grained classification system. The corpus consists of 500 documents (over 90,000 words) selected randomly from citation abstracts and full-text papers with the aim of making the corpus representative of the entire available biomedical scientific literature. The corpus annotation covers mentions of both healthy and pathological anatomical entities and contains over 3,000 annotated mentions.
- Extended Anatomical Entity Mention (AnatEM) corpus - 1,212 documents (approx. 250,000 words) annotated with over 13,000 mentions of anatomical entities. Each annotation is assigned one of 12 granularity-based types such as Cellular component, Tissue and Organ, defined with reference to the Common Anatomy Reference Ontology. The corpus builds in part on the AnEM and MLEE corpora.