Named Entity Annotation
Annotation scheme
The annotation scheme aims to capture fine-grained details about phenotypes. It uses a detailed, hierarchically structured set of semantic labels and allows entity spans to be nested within each other, so that potential relationships between entities are captured. For example, if a treatment is mentioned within a phenotype (e.g., Steroid-induced skeletal muscle atrophy), then it is likely that the phenotype is caused or affected by the nested treatment.
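To make the idea of nested spans concrete, the following is a minimal sketch of how such annotations might be represented and how nested pairs could be detected. The `Entity` class, the `nested_pairs` helper, the character offsets, and the choice of `MedicalCondition` as the outer label are illustrative assumptions, not part of the corpus tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    """A hypothetical stand-off annotation: character offsets plus an NE label."""
    start: int  # inclusive character offset
    end: int    # exclusive character offset
    label: str


def nested_pairs(entities):
    """Return (outer, inner) pairs where inner lies within outer's span.

    Two entities covering exactly the same span are not treated as nested.
    """
    pairs = []
    for outer in entities:
        for inner in entities:
            if inner is outer:
                continue
            contained = outer.start <= inner.start and inner.end <= outer.end
            same_span = (outer.start, outer.end) == (inner.start, inner.end)
            if contained and not same_span:
                pairs.append((outer, inner))
    return pairs


text = "Steroid-induced skeletal muscle atrophy"
entities = [
    # Outer phenotype span; the label is an assumption made for this example.
    Entity(0, len(text), "MedicalCondition"),
    # Nested treatment mention.
    Entity(0, len("Steroid"), "Treatment"),
]

pairs = nested_pairs(entities)
for outer, inner in pairs:
    print(f"{inner.label} '{text[inner.start:inner.end]}' is nested in "
          f"{outer.label} '{text[outer.start:outer.end]}'")
```

Each pair found in this way is only a candidate relationship: the nesting suggests, but does not establish, that the outer phenotype is caused or affected by the inner entity.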
The hierarchy of NE labels in our scheme is shown below, color-coded according to the level in the hierarchy. For each NE annotated, annotators were instructed to assign the most specific label possible. Table 1 below provides definitions and examples of each of these categories, together with annotation counts in the final corpus.
- Problem
- MedicalCondition
- RiskFactor
- SignOrSymptom
- IndividualBehaviour
- TestResult
- Treatment
- Test
- RadiologicalTest
- MicrobiologicalTest
- PhysiologicalTest
- ConstituentConcept
- AnatomicalConcept
- Drug
- Protein
- Quality
NE Type | Description | Examples | Number of annotations |
---|---|---|---|
Problem | An overall category for any COPD indicators of concern | frequent exacerbator | 2556 |
MedicalCondition | Any disease or medical condition; includes COPD comorbidities | emphysema, pulmonary vascular disease, asthma | 5119 |
RiskFactor | A phenotype signifying a patient's increased chances of having COPD | increased levels of the C-reactive protein, alpha1 antitrypsin deficiency | 1211 |
SignOrSymptom | An observable irregularity manifested by a COPD patient | chronic cough, shortness of breath | 2065 |
IndividualBehaviour | A patient's habits leading to susceptibility of having COPD | smoking for 25 years | 194 |
TestResult | Findings based on COPD-relevant examinations | decrease in rate of lung function, FEV1 45% predicted | 685 |
Treatment | Any medication, therapy or program for treating COPD | oxygen therapy, pulmonary rehabilitation | 4337 |
Test | An overall category for any COPD-relevant examinations or measures/parameters | increased compliance of the lung, FEV1, FEV1/FVC ratio | 3576 |
RadiologicalTest | Any of the radiological tests for detecting COPD | computed tomography scanning, high resolution computed tomography | 29 |
MicrobiologicalTest | An examination of a COPD-relevant specimen | complete blood count | 11 |
PhysiologicalTest | A measurement of a COPD patient's capacity to exercise | 6-min walking distance | 17 |
ConstituentConcept | An umbrella type for elementary concepts that may form part of a phenotype description; should only be chosen if none of the subtypes below apply | breath, wheezes, air | 5 |
AnatomicalConcept | A mention pertaining to anatomical entities | lung, heart, pulmonary, hepatic, respiratory airway | 2616 |
Drug | Any drug name; will mostly overlap with Treatment | corticosteroids | 2593 |
Protein | Any protein name | alpha1 antitrypsin | 820 |
Quality | Expressions which describe any of the concepts above | chronic, obstructed, damaged, decreased rate, enhanced, decreased amount | 1153 |
Agreement
The entity annotations were undertaken by annotators with domain expertise. The quality and consistency of the annotations were verified by calculating inter-annotator agreement (IAA) on six full papers. We calculated IAA in terms of F-Score under strict matching conditions, i.e., two annotators' annotations count as a match only if they agree exactly on both the text span chosen and the semantic category assigned. The F-Score was 80.49.
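The strict matching criterion can be illustrated with a short sketch. It assumes each annotation is a `(start, end, label)` tuple and treats one annotator's set as the reference. The function name `strict_f_score` and the toy annotations are invented for illustration, and the sketch does not claim to reproduce how scores were aggregated over the six papers.

```python
def strict_f_score(annotations_a, annotations_b):
    """F-score between two annotators under strict matching.

    An annotation is a (start, end, label) tuple. Two annotations match only
    if the span offsets and the label are all identical.
    """
    set_a, set_b = set(annotations_a), set(annotations_b)
    matches = len(set_a & set_b)
    if matches == 0:
        return 0.0
    # Treat annotator A as the reference; the F-score is the same either way.
    precision = matches / len(set_b)
    recall = matches / len(set_a)
    return 2 * precision * recall / (precision + recall)


# Toy annotations (invented): the annotators agree on two entities but
# assign different labels to the span (0, 7).
annotator_a = [(0, 39, "MedicalCondition"), (0, 7, "Treatment"), (50, 63, "SignOrSymptom")]
annotator_b = [(0, 39, "MedicalCondition"), (0, 7, "Drug"), (50, 63, "SignOrSymptom")]

score = strict_f_score(annotator_a, annotator_b)
print(f"Strict F-score: {score:.4f}")
```

Because the criterion is strict, the disagreement on the label of span (0, 7) counts as a miss for both annotators even though the spans are identical. A relaxed scheme that credited overlapping spans or related labels would give a higher score.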