Anatomy resources


Anatomical entities such as kidney, muscle and blood are central to much of biomedical scientific discourse, and the detection of mentions of anatomical entities is thus necessary for the automatic analysis of the structure of domain texts.

This page provides various tools and resources related to anatomical entities and their detection in text.

Anatomical Entity Mention (AnEM) corpus

To advance automatic anatomical entity mention detection, we have created the AnEM corpus, a domain- and species-independent resource manually annotated for anatomical entity mentions using a fine-grained classification system. The corpus consists of 500 documents (over 90,000 words) selected randomly from citation abstracts and full-text papers with the aim of making the corpus representative of the entire available biomedical scientific literature. The corpus annotation covers mentions of both healthy and pathological anatomical entities and contains over 3,000 annotated mentions.

Figure: example annotations

To allow the corpus to serve as a reference standard for the development and evaluation of methods for anatomical entity mention detection, we make the corpus available under the open CC-BY-SA licence and provide standard train/test splits and evaluation tools.
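As a rough illustration of what mention-level evaluation involves, the following Python sketch computes exact-match precision, recall and F-score over sets of annotated mentions. It is not the evaluation tool distributed with the corpus. The tuple layout (document id, start offset, end offset, type), the function name evaluate_mentions and the type labels in the example data are all assumptions made for illustration only.

```python
from typing import Iterable, NamedTuple, Tuple

# Assumed layout: (document id, start offset, end offset, entity type).
Mention = Tuple[str, int, int, str]


class Scores(NamedTuple):
    precision: float
    recall: float
    f_score: float


def evaluate_mentions(gold: Iterable[Mention],
                      predicted: Iterable[Mention]) -> Scores:
    """Exact-match evaluation: a predicted mention counts as correct only
    if its document, span and type all match a gold mention."""
    gold_set, pred_set = set(gold), set(predicted)
    true_positives = len(gold_set & pred_set)
    precision = true_positives / len(pred_set) if pred_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    if precision + recall == 0.0:
        return Scores(precision, recall, 0.0)
    f_score = 2 * precision * recall / (precision + recall)
    return Scores(precision, recall, f_score)


# Hypothetical example data; offsets and type labels are illustrative.
gold = [
    ("doc1", 0, 6, "Organ"),
    ("doc1", 20, 26, "Tissue"),
    ("doc2", 5, 10, "Organism_substance"),
]
predicted = [
    ("doc1", 0, 6, "Organ"),
    ("doc1", 20, 26, "Organ"),   # right span, wrong type: counted as an error
    ("doc2", 5, 10, "Organism_substance"),
    ("doc2", 30, 35, "Cell"),    # spurious mention
]
scores = evaluate_mentions(gold, predicted)
print(f"P={scores.precision:.3f} R={scores.recall:.3f} F={scores.f_score:.3f}")
```

Exact matching is the strictest criterion. A more lenient scheme, such as crediting overlapping spans, would change only the set intersection step.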

Corpus description

The AnEM corpus is presented in the following manuscript:

Annotation visualisations

The AnEM corpus annotations can be browsed online through visualisations created with the brat tool: browse AnEM data online.


The AnEM corpus data and evaluation tools as well as a set of supplementary data (feature representations, models, system outputs and evaluation results) are available for download:


1. Annotations

The annotations in the anatomy resources are copyrighted and licensed under the Creative Commons BY-SA 3.0 licence.

Briefly, under this open licence, you are free to use and build on these resources as long as you attribute them correctly and distribute works building on the resources under a similar licence (see the licence text for details).

Please attribute the resources by citing the DSSD'12 paper (see Corpus description above) in publications and linking to this page in online resources.

2. Texts

The abstracts contained in the anatomy resources are from PubMed, a database of the U.S. National Library of Medicine (NLM). For the copyright status of the abstracts, please see the NLM page on copyright information.

The full-text extracts contained in the anatomy resources are from articles in the Open Access Subset of the PubMed Central (PMC) database of the NLM. These articles are made available under a Creative Commons or similar licence. Please see the REFERENCES file in the distribution for references to the articles, and the PMC version of each article for its specific licence terms.


See also

  • Previously released Anatomy resources (the content of the linked page is due to be merged with this one; please link to this page instead)


For any queries relating to the corpus, please contact: sampo pyysalo at gmail dot com.