Entity Mention Annotation in CHF

Annotation scheme

Our annotation scheme for entity mentions aims to identify words and phrases that describe several types of concepts that are highly relevant to phenotype phenomena. Table 1 defines the types of concepts whose mentions are annotated in the PhenoCHF corpus.

Table 1. Types of concepts annotated in PhenoCHF
Concept Type | Description | Examples of mentions
Cause | Any medical problem that contributes to the occurrence of CHF | chronic renal insufficiency, hypertension
Risk Factor | A condition that increases the chance of a patient having the CHF disease | obesity, type 2 diabetes, high cholesterol
Sign & Symptom | Any observable manifestation of a disease which is experienced by a patient and reported to the physician | productive cough, nausea, vomiting
Non-traditional risk factor | Conditions associated with abnormalities in kidney functions that put the patient at higher risk of developing signs & symptoms and causes of CHF | iron deficiency, anemia
Organ | Any body part | lungs, abdomen
Chief Complaint | Mentions of CHF | CHF, congestive heart failure

Entity mention statistics

All mentions of the concept types shown in Table 1 were annotated in each document of PhenoCHF. The total counts of each type of entity annotated in each part of the corpus (i.e., narrative EHR reports and literature articles) are shown in Table 2.

Table 2. Statistics of Entity Mentions in PhenoCHF
Concept Type | No. of annotated mentions in narrative EHR reports | No. of annotated mentions in literature articles
Cause | 1320 | 1107
Risk Factor | 1335 | 408
Sign & Symptom | 2449 | 304
Non-traditional risk factor | 308 | 329
Organ | 432 | -

Distribution of entity mentions in PhenoCHF

Figure 1 provides an overview of the distribution of the mentions of phenotype-related concepts in PhenoCHF. In discharge summaries, there is a strong emphasis on describing the signs and symptoms of the disease, but these play a much less significant role in scientific articles, where the dominant topics are non-traditional risk factors and the etiology of CHF.

Figure 1. Distribution of entity mention annotations in PhenoCHF
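
The proportions behind such a distribution can be recomputed from the counts in Table 2. The following Python sketch is only an illustration and not the procedure used to produce Figure 1. The variable and function names are hypothetical, the Risk Factor and Sign & Symptom counts are read as 1335/408 and 2449/304 (an interpretation of the table), and Organ is left out of the literature counts because no value is reported for it.

```python
# Counts taken from Table 2. The Risk Factor and Sign & Symptom values are an
# assumed reading of the table (1335/408 and 2449/304).
EHR_COUNTS = {
    "Cause": 1320,
    "Risk Factor": 1335,
    "Sign & Symptom": 2449,
    "Non-traditional risk factor": 308,
    "Organ": 432,
}
# Organ is omitted here because Table 2 reports "-" for literature articles.
LITERATURE_COUNTS = {
    "Cause": 1107,
    "Risk Factor": 408,
    "Sign & Symptom": 304,
    "Non-traditional risk factor": 329,
}


def shares(counts):
    """Return each concept type's fraction of all mentions in one sub-corpus."""
    total = sum(counts.values())
    return {concept: n / total for concept, n in counts.items()}


ehr_shares = shares(EHR_COUNTS)
literature_shares = shares(LITERATURE_COUNTS)

for concept in EHR_COUNTS:
    ehr = ehr_shares[concept]
    lit = literature_shares.get(concept)
    lit_text = f"{lit:.1%}" if lit is not None else "-"
    print(f"{concept}: EHR {ehr:.1%}, literature {lit_text}")
```

Under these assumptions, Sign & Symptom mentions make up a clearly larger share of the EHR annotations than of the literature annotations, while Cause and Non-traditional risk factor mentions take a larger share in the literature.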


The entity annotations were undertaken by two medical doctors. The quality and consistency of the annotations were verified through the calculation of inter-annotator agreement (IAA). We calculated IAA in terms of F-score, and found that high levels of agreement were achieved. We calculated agreement for both exact span matches, where the start and end of the annotated text spans chosen by both annotators must match exactly, and relaxed span matches, where it is sufficient for the annotated text spans to include some common parts. The IAA statistics are shown in Table 3.

Table 3. Inter-annotator agreement rates (F-score)
Agreement Type | Narrative EHRs | Literature articles
Exact Match | 0.82 | 0.69
Relaxed Match | 0.92 | 0.77
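
The difference between exact and relaxed span matching can be illustrated with a short Python sketch. This is a minimal example under stated assumptions, not the script used to produce Table 3: spans are represented as (start, end) character offsets with an exclusive end, concept types are ignored, one annotator is treated as the reference and the other as the response, and all names (`exact_match`, `relaxed_match`, `f_score`) are hypothetical.

```python
from typing import Callable, List, Tuple

# Assumed representation: (start, end) character offsets, end exclusive.
Span = Tuple[int, int]


def exact_match(a: Span, b: Span) -> bool:
    """Spans agree only if both start and end offsets are identical."""
    return a == b


def relaxed_match(a: Span, b: Span) -> bool:
    """Spans agree if they share at least one character."""
    return a[0] < b[1] and b[0] < a[1]


def f_score(
    reference: List[Span],
    response: List[Span],
    match: Callable[[Span, Span], bool],
) -> float:
    """F-score between two annotators' span sets under a matching criterion."""
    if not reference or not response:
        return 0.0
    # Recall: fraction of reference spans matched by some response span.
    recall = sum(
        any(match(r, s) for s in response) for r in reference
    ) / len(reference)
    # Precision: fraction of response spans matched by some reference span.
    precision = sum(
        any(match(r, s) for r in reference) for s in response
    ) / len(response)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy spans for two annotators on the same document (made-up offsets).
annotator_a = [(0, 16), (22, 28), (40, 52)]
annotator_b = [(0, 16), (20, 28), (60, 70)]

exact_f = f_score(annotator_a, annotator_b, exact_match)
relaxed_f = f_score(annotator_a, annotator_b, relaxed_match)
print(f"exact: {exact_f:.2f}, relaxed: {relaxed_f:.2f}")
```

In this toy case the second pair of spans differs only in its start offset, so it counts as agreement under relaxed matching but not under exact matching. This is why relaxed scores are never lower than exact scores for the same pair of annotators.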