NaCTeM

Anatomical entity mention recognition

This is the home page for AnatomyTagger, an open-source entity mention tagger for anatomical entities, the AnatEM anatomical entity mention corpus, and related open data resources presented in

If you use any of the tools and resources available from this page, please cite this paper.

Overview

The following tools and resources are provided:

AnatEM corpus

The extended Anatomical Entity Mention corpus (AnatEM) consists of 1212 documents (approx. 250,000 words) manually annotated to identify over 13,000 mentions of anatomical entities. Each annotation is assigned one of 12 granularity-based types such as Cellular component, Tissue and Organ, defined with reference to the Common Anatomy Reference Ontology. The corpus builds in part on two previously introduced resources, AnEM and MLEE. The corpus annotations were created using the brat annotation tool.

Download

The corpus distribution contains the annotations in the standoff format used by the brat tool, the column-based CoNLL format, and the NERsuite format.
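As an illustration of the standoff format, the following minimal sketch reads the text-bound ("T") annotation lines of a brat `.ann` file, where each line holds an ID, a type with character offsets, and the annotated text, separated by tabs. The `Mention` class, the `parse_brat_textbounds` function, and the sample lines are hypothetical and made up for this example; they are not part of the corpus distribution or of AnatomyTagger.

```python
from typing import NamedTuple


class Mention(NamedTuple):
    ann_id: str                    # annotation ID, e.g. "T1"
    etype: str                     # entity type, e.g. "Organ"
    spans: list[tuple[int, int]]   # character offsets as (start, end) pairs
    text: str                      # annotated surface text


def parse_brat_textbounds(ann_text: str) -> list[Mention]:
    """Parse text-bound ("T") annotation lines from brat standoff content."""
    mentions = []
    for line in ann_text.splitlines():
        # Skip blank lines and other annotation kinds (relations, notes, ...).
        if not line.startswith("T"):
            continue
        ann_id, type_and_offsets, text = line.split("\t", 2)
        etype, offsets = type_and_offsets.split(" ", 1)
        # Discontinuous annotations separate their fragments with ";".
        spans = [tuple(int(n) for n in frag.split()) for frag in offsets.split(";")]
        mentions.append(Mention(ann_id, etype, spans, text))
    return mentions


# Hypothetical sample content for illustration only (not taken from AnatEM).
sample = "T1\tOrgan 4 9\theart\nT2\tTissue 14 31\tconnective tissue\n"
mentions = parse_brat_textbounds(sample)
for m in mentions:
    print(m.ann_id, m.etype, m.spans, m.text)
```

The offsets index into the corresponding `.txt` document, so a mention's text can be checked against the slice of the document between its start and end offsets.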

AnatomyTagger

AnatomyTagger is a Python-based entity mention tagger implemented using the NERsuite named entity recognition toolkit. The tagger is provided with various lexical and corpus resources for anatomical entity tagging and is simple to train and to apply.

Download

AnatomyTagger UIMA Components

We implemented UIMA components to execute the AnatomyTagger pipeline on the Argo platform.

Download

Live demo

An online demonstration of the Anatomical Entity Tagger is available here.

Literature-scale tagging results

We applied AnatomyTagger to automatically recognize anatomical entity mentions in the entire Open Access biomedical literature available in the Europe PMC Open Access Subset, which at the time of processing consisted of 606,389 full-text articles. The results of this tagging are available:

Download

PMC Open Access Subset documents are available under various Open Access licenses; please refer to the original documents for the specific license applicable to each document. The automatically created anatomical entity mention annotations are under the CC BY-SA license.

Lexical Resources

The following lexical resources were introduced to support anatomical entity mention tagging. They are included in both the AnatomyTagger and UIMA components packages above and can also be downloaded separately below:

These resources are made available under the CC BY-SA license.

Trained Models

The following models were trained on the AnatEM corpus annotations for the recognition of anatomical entities. The models are compatible with NERsuite and AnatomyTagger. Please see the AnatomyTagger documentation for instructions on applying these models.

References

These tools and resources were introduced in

Please cite this paper if you use any of the tools and resources available from this page.