ACELA
The ACELA (ACcElerated Annotation) tool aims to reduce the human effort required to produce a gold standard corpus of named entity (NE) annotations. Annotation is similar to active learning, in that it proceeds as an iterative, interactive process between the human annotator and a machine-learned NE tagger. This is illustrated in Figure 1.
The aim of the tool is to ensure that all NEs of a given type are annotated in a given corpus with minimum effort from the human annotator. Only those sentences that are most likely to contain NEs of the target type (according to the predictions of the tagger) are displayed for the human to annotate, which means that the annotator does not need to read through the many sentences in the corpus that do not contain relevant entities.
At each iteration of the process, the NE tagger is re-trained on all available sentences that have been annotated by the human, so that it makes increasingly accurate predictions about which sentences contain named entities. The tool also estimates the proportion of entities in the corpus that the human has annotated so far (coverage), and the annotation process stops when this figure is close to 100%.
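The loop described above can be sketched in a few lines of Python. This is only an illustrative sketch, not ACELA's implementation: the "tagger" is a toy keyword matcher, and the names `score_sentence`, `annotate`, `ask_human`, `PRIOR` and the coverage formula (entities found so far divided by entities found plus the expected number still unannotated) are all assumptions introduced for this example. The actual method is described in the paper cited below.

```python
from typing import Callable, Dict, List, Set

# Assumed chance that a sentence with no known entity still contains one.
PRIOR = 0.05


def score_sentence(sentence: str, known: Set[str]) -> float:
    """Toy stand-in for the NE tagger: confident only if a known entity appears."""
    tokens = set(sentence.lower().split())
    return 1.0 if tokens & known else PRIOR


def annotate(
    corpus: List[str],
    seeds: Set[str],
    ask_human: Callable[[str], Set[str]],
    batch_size: int = 1,
    target_coverage: float = 0.9,
) -> Dict[int, Set[str]]:
    """Show the highest-scoring sentences to the annotator until estimated coverage is reached."""
    known = {s.lower() for s in seeds}
    annotated: Dict[int, Set[str]] = {}
    while len(annotated) < len(corpus):
        remaining = [i for i in range(len(corpus)) if i not in annotated]
        # "Re-training" here just means re-scoring with the enlarged entity list.
        scores = {i: score_sentence(corpus[i], known) for i in remaining}
        found = sum(len(ents) for ents in annotated.values())
        expected_left = sum(scores.values())
        coverage = found / (found + expected_left)
        if coverage >= target_coverage:
            break
        batch = sorted(remaining, key=lambda i: scores[i], reverse=True)[:batch_size]
        for i in batch:
            entities = ask_human(corpus[i])
            annotated[i] = entities
            known |= {e.lower() for e in entities}
    return annotated


# Toy data, invented for this example only.
GOLD = {"il-2", "nf-kb"}
CORPUS = [
    "the il-2 gene is expressed in t cells",
    "we thank the reviewers for comments",
    "the nf-kb site lies upstream of the il-2 gene",
    "samples were stored overnight",
    "mutation of the nf-kb site reduced activity",
    "the weather was fine",
]


def simulated_annotator(sentence: str) -> Set[str]:
    """Plays the role of the human by looking entities up in the toy gold set."""
    return {t for t in sentence.lower().split() if t in GOLD}


result = annotate(CORPUS, seeds={"il-2"}, ask_human=simulated_annotator)
print(sorted(result))  # indices of the sentences the annotator had to read
```

In this toy run a single seed is enough for the loop to discover a second entity from an annotated sentence and use it to rank the remaining sentences, which is the behaviour the iterative re-training is meant to produce.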
Figure 2 illustrates the efficiency of the tool for annotating NEs of type DNA in the GENIA corpus. Using manual annotation alone, it can be expected that after annotating 10,000 sentences in sequential order, only about half the entities in the corpus will have been annotated. However, when the ACELA tool is used to predict the sentences most likely to contain the relevant NEs, annotating the same number of sentences achieves almost 100% coverage of all DNA entities in the corpus.
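To make the comparison concrete, the following sketch computes coverage as a function of the number of sentences annotated, for two reading orders. The entity counts are invented toy numbers, not GENIA data, and the "ranked" order uses a hypothetical perfect ranking as a stand-in for the tagger's predictions, so it shows an upper bound rather than ACELA's actual behaviour. The function name `coverage_curve` is an assumption of this example.

```python
from itertools import accumulate
from typing import List, Sequence


def coverage_curve(entity_counts: Sequence[int], order: Sequence[int]) -> List[float]:
    """Fraction of all entities annotated after reading each sentence in `order`."""
    total = sum(entity_counts)
    if total == 0:
        return [1.0 for _ in order]
    running = accumulate(entity_counts[i] for i in order)
    return [found / total for found in running]


# Toy corpus: number of target entities in each of eight sentences.
counts = [0, 2, 0, 0, 1, 0, 1, 0]

sequential = list(range(len(counts)))
# Idealised ranking: sentences with the most entities first.
ranked = sorted(sequential, key=lambda i: counts[i], reverse=True)

print(coverage_curve(counts, sequential))
print(coverage_curve(counts, ranked))
```

Reading the two curves at the same position shows the effect plotted in Figure 2: for a fixed number of annotated sentences, the ranked order reaches a higher coverage than the sequential one.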
ACELA provides a web-based user interface, part of which is illustrated in Figure 3. The process begins with the user providing a number of example entities of the target category ("seeds"), which the algorithm uses to start making predictions.
The main annotation screen of the ACELA interface is shown in Figure 4. For each sentence predicted to contain NEs, the tool makes a number of alternative suggestions about the correct NEs in the sentence. The task of the annotator is to select the correct prediction for each sentence.
Further details about the annotation framework used by ACELA are available in the following paper:
Yoshimasa Tsuruoka, Jun'ichi Tsujii and Sophia Ananiadou. 2008. Accelerating the annotation of sparse named entities by dynamic sentence selection, BMC Bioinformatics, 9(Suppl 11):S8.
If you are interested in using ACELA, please contact us.