Evaluation and comparison mechanism

U-Compare features an Evaluation and Comparison mechanism that allows users to identify optimal workflows.
Different combinations of components create different pipelines. Without such a mechanism, users would need to create, execute and evaluate each possible pipeline separately in order to identify the optimal one.


In the above example, we assume that our library of components contains 3 sentence splitters, 3 tokenisers, 3 POS taggers and 3 named entity recognisers (NERs), which gives a search space of 3 × 3 × 3 × 3 = 81 possible workflows. U-Compare can automatically evaluate and compare any number of these workflows. The following figure illustrates U-Compare’s Evaluation and Comparison mechanism for POS tagger components.
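The size of this search space can be checked with a short sketch. The Python below is illustrative only and is not part of U-Compare. The component names are hypothetical placeholders, since the text gives only the number of components of each type.

```python
from itertools import product

# Hypothetical component names: the text only states that the library
# holds three components of each type, not what they are called.
library = {
    "sentence_splitter": ["splitter_A", "splitter_B", "splitter_C"],
    "tokeniser": ["tokeniser_A", "tokeniser_B", "tokeniser_C"],
    "pos_tagger": ["tagger_A", "tagger_B", "tagger_C"],
    "ner": ["ner_A", "ner_B", "ner_C"],
}

# A workflow picks exactly one component per stage, in pipeline order.
workflows = list(product(*library.values()))

print(len(workflows))  # 3 * 3 * 3 * 3 = 81
print(workflows[0])    # the first candidate workflow
```

Each tuple in `workflows` is one candidate pipeline. Under this assumed setup, one of them would serve as the gold workflow and the others as test workflows.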
One combination of components is used to produce the gold annotations for the data, whilst the remaining possible workflows are used for testing. U-Compare also computes the following statistics:

  • G: number of gold annotations
  • T: number of test annotations
  • M: number of matched annotations between the gold and test workflows
  • F1: F1 score
  • PR: precision
  • RC: recall
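
The statistics listed above are related by the standard definitions PR = M / T, RC = M / G and F1 = 2 · PR · RC / (PR + RC), which simplifies to 2M / (G + T). The following Python is a minimal sketch of that calculation, not U-Compare's own code. The `evaluate` function, the `(start, end, tag)` annotation format and the exact-match criterion are all assumptions made for illustration; the text does not describe how U-Compare decides that two annotations match.

```python
def evaluate(gold, test):
    """Compare test annotations against gold annotations.

    Assumption: two annotations match only if they are exactly equal.
    """
    gold, test = set(gold), set(test)
    g = len(gold)          # G: number of gold annotations
    t = len(test)          # T: number of test annotations
    m = len(gold & test)   # M: annotations present in both sets
    precision = m / t if t else 0.0
    recall = m / g if g else 0.0
    f1 = 2 * m / (g + t) if (g + t) else 0.0
    return {"G": g, "T": t, "M": m, "PR": precision, "RC": recall, "F1": f1}


# Hypothetical POS annotations as (start, end, tag) spans.
gold = [(0, 3, "DT"), (4, 7, "NN"), (8, 12, "VBZ"), (13, 17, "JJ")]
test = [(0, 3, "DT"), (4, 7, "NN"), (8, 12, "NNS")]

stats = evaluate(gold, test)
print(stats)
```

In this toy input, G = 4, T = 3 and M = 2, so precision is 2/3, recall is 1/2 and F1 is 4/7 (about 0.57).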


U-Compare's Evaluation and Comparison mechanism

Users can also view the agreements and disagreements between components that they are comparing directly in the annotation viewer.

Visualisation of agreements and disagreements directly in text.

You can try U-Compare’s Evaluation and Comparison mechanism by selecting a predefined workflow from U-Compare’s menu.
