Pacific Life Re
Background
The role of reliable data in medicine cannot be overestimated. This applies not only to information describing general population-level phenomena covered in scientific publications, but also to health service records describing individuals. Although text mining methods have been widely applied to the former category, the latter has attracted much less attention. One of the main reasons is that these data were previously stored in a format that made them less accessible for digital processing, i.e., as paper documents, which were frequently handwritten. However, increasing adoption of digital solutions by both health service institutions and individual medical practitioners has started to change the picture. This new situation poses both challenges and opportunities for text mining methods, since there is potentially valuable knowledge contained in individual medical records. In this project, we aim to analyse medical reports using text mining techniques, with the specific goal of quantifying the risk associated with the evidence described.
Problem
This project is being undertaken by NaCTeM in cooperation with a commercial partner, Pacific Life Re. The main task is to analyse an individual's medical report and determine the level of risk associated with the conditions described. The main challenges include the following:
- Medical reports are highly structured documents, containing many elements of different types, such as simple information (e.g., date of birth, gender, height and weight), enumerations (e.g., prescribed drugs), textual descriptions (e.g., outcomes of hospital visits) and references to external documents (e.g., test results).
- Risk can be associated with entities of different types: diseases, symptoms, drugs, test results or habits.
- The influence of a certain risk factor always depends on its context in the document, e.g., temporal (since medical reports frequently cover many years of treatment) or accompanying gradable adjectives (e.g., severe).
- The final risk is usually not a simple sum of the influences of individual factors, as some of them may strongly interact with each other, and thus have a significant impact on overall risk, e.g., family history and negative results of related tests.
- External knowledge is necessary to interpret the document, as the importance of certain types of evidence (e.g., the fact that the individual has previously suffered from a particular disease) is considered to be implicitly understood by the reader, and hence is not explicitly written in the report.
- The quality of language is frequently poor: reports may contain many (potentially non-standard) abbreviations and acronyms, incomplete sentences and correspondence with patients, which can pose significant challenges for text mining methods.
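To make the points about context and interaction concrete, the following is a minimal sketch of how a risk score might combine per-factor weights with context modifiers and pairwise interaction terms. It is an illustration only and not the model used in the project: the factor names, weights, severity multipliers, recency half-life and interaction value are all hypothetical assumptions, as are the names `Mention`, `mention_score` and `report_risk`.

```python
from dataclasses import dataclass

# All names and numbers below are hypothetical, chosen only for illustration.
BASE_WEIGHTS = {"diabetes": 2.0, "family_history": 1.0, "negative_related_test": 0.0}
SEVERITY_MODIFIERS = {"mild": 0.5, "moderate": 1.0, "severe": 2.0}
# Adjustment applied when both factors of a pair occur in the same report,
# e.g. a negative test result offsetting part of a family-history risk.
INTERACTIONS = {frozenset({"family_history", "negative_related_test"}): -0.75}


@dataclass(frozen=True)
class Mention:
    """One risk factor mention together with its context in the report."""
    factor: str
    severity: str = "moderate"   # gradable adjective attached to the mention
    years_ago: float = 0.0       # temporal context
    negated: bool = False        # e.g. "no history of diabetes"


def mention_score(m: Mention, half_life_years: float = 10.0) -> float:
    """Score a single mention: base weight scaled by severity and recency."""
    if m.negated:
        return 0.0
    base = BASE_WEIGHTS.get(m.factor, 0.0)
    severity = SEVERITY_MODIFIERS.get(m.severity, 1.0)
    # Assumed exponential decay: the influence halves every half_life_years.
    recency = 0.5 ** (m.years_ago / half_life_years)
    return base * severity * recency


def report_risk(mentions: list[Mention]) -> float:
    """Sum the mention scores, then add adjustments for interacting factors."""
    total = sum(mention_score(m) for m in mentions)
    present = {m.factor for m in mentions if not m.negated}
    for pair, adjustment in INTERACTIONS.items():
        if pair <= present:  # both factors of the pair are present
            total += adjustment
    return total
```

In this sketch, a family history mentioned alongside a negative related test scores lower than the family history alone, so the total is not a simple sum of the individual influences.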
Related work
The project will build on NaCTeM's experience in the following relevant areas:
- topic analysis,
- coembeddings,
- term extraction (TerMine),
- biomedical named entity recognition and normalisation,
- event attribute recognition (negation, confidence etc.),
- active learning,
- document classification.
News
9th October 2018
The work being carried out in this project has been mentioned in an article in Cover magazine, a leading industry publication for life protection and health insurance.
Publications
Przybyła, P., Brockmeier, A. J. and Ananiadou, S. (2018). Quantifying Risk Factors in Medical Reports with a Context-Aware Linear Model. Journal of the American Medical Informatics Association, 26(6):537-546
Project team
Principal Investigator: Prof. Sophia Ananiadou
Researchers: Dr. Nhung Nguyen, Mr. Paul Thompson