Supporting Evidence-based Public Health Interventions using Text Mining


This project aims to conduct novel research in text mining and machine learning to transform the way in which evidence-based public health (EBPH) reviews are conducted. The project is a collaboration between three institutions:

  • The National Centre for Text Mining (NaCTeM), with its proven track record of developing effective text mining tools operating in a variety of domains.
  • The Machine Learning and Data Analytics (MaLDA) group at the University of Liverpool, specialising in the application of machine learning, data mining and general mathematical modelling and optimisation methodologies to complex real-world problems.
  • The National Institute for Health and Care Excellence (NICE), the world's leading centre for the development and application of the principles of evidence-based medicine to technology appraisal, clinical guidelines and public health.

Project goals

  • to develop new unsupervised text mining methods for deriving term similarities, based on distributional semantics, to produce meaningful, high-quality document and label clusters that support screening while searching in EBPH reviews.
  • to develop new seriation algorithms for ranking and visualising meaningful associations of multiple types, dynamically and iteratively.
  • to evaluate these newly developed methods in EBPH reviews, based on implementation of a pilot, to ascertain the level of transformation in EBPH reviewing.
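As a rough illustration of the first goal, the sketch below derives term similarities from distributional context: each term is represented by counts of the words that occur near it, and two terms are compared by the cosine of those count vectors. This is a minimal sketch and not the project's method; the function names, the window size and the toy corpus are all assumptions made for the example.

```python
from collections import Counter, defaultdict
from math import sqrt


def cooccurrence_vectors(docs, window=2):
    """Map each term to a Counter of the terms seen within `window` tokens of it."""
    vectors = defaultdict(Counter)
    for doc in docs:
        tokens = doc.lower().split()
        for i, term in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[term][tokens[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * v[t] for t, count in u.items() if t in v)
    norm = sqrt(sum(c * c for c in u.values())) * sqrt(sum(c * c for c in v.values()))
    return dot / norm if norm else 0.0


# Toy corpus, invented for illustration only (hypothetical data).
docs = [
    "smoking cessation reduces lung cancer risk",
    "tobacco cessation reduces lung cancer risk",
    "exercise programmes improve school attendance",
]
vectors = cooccurrence_vectors(docs)

# Terms that share contexts score high; terms with disjoint contexts score zero.
sim_related = cosine(vectors["smoking"], vectors["tobacco"])
sim_unrelated = cosine(vectors["smoking"], vectors["exercise"])
```

In this toy corpus "smoking" and "tobacco" appear in identical contexts, so their similarity is close to 1, while "smoking" and "exercise" share no context words. Similarities of this kind could then feed a clustering step, which is not shown here.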


RobotAnalyst is a new tool that builds upon state-of-the-art text mining technologies, including topic modelling and feedback-based text classification models, to minimise the human workload involved in the study identification phase.
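The idea of feedback-based screening can be sketched as a loop in which the reviewer's decisions re-rank the abstracts that remain. The code below is not RobotAnalyst's model: it swaps in a simple word-overlap score for the topic modelling and classification components, and every name in it (`score`, `screen`, the `reviewer` callback, the toy abstracts) is hypothetical.

```python
from collections import Counter


def bag_of_words(text):
    """Lower-cased word counts for one abstract."""
    return Counter(text.lower().split())


def score(doc, included, excluded):
    """Average word overlap with included abstracts minus that with excluded ones."""
    words = bag_of_words(doc)
    pos = sum(sum((words & bag_of_words(d)).values()) for d in included)
    neg = sum(sum((words & bag_of_words(d)).values()) for d in excluded)
    return pos / max(len(included), 1) - neg / max(len(excluded), 1)


def screen(unlabelled, reviewer, rounds):
    """Each round: show the top-ranked abstract, record the decision, re-rank the rest."""
    included, excluded, pool = [], [], list(unlabelled)
    for _ in range(min(rounds, len(pool))):
        pool.sort(key=lambda d: score(d, included, excluded), reverse=True)
        doc = pool.pop(0)
        (included if reviewer(doc) else excluded).append(doc)
    return included, excluded, pool


# Toy abstracts and a stand-in "reviewer", invented for illustration only.
abstracts = [
    "school based smoking prevention trial",
    "community smoking cessation intervention trial",
    "protein folding simulation methods",
    "smoking cessation intervention in pregnancy",
    "galaxy formation simulation methods",
]
included, excluded, remaining = screen(abstracts, lambda d: "smoking" in d, rounds=3)
```

With this toy data, three screening decisions are enough to surface all three relevant abstracts, leaving the two irrelevant ones unread. That is the kind of workload reduction the tool aims for, though a real system would use a trained classifier rather than raw word overlap.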


5th October 2016

Prof Ananiadou will discuss the work carried out on this project during her participation in a panel session entitled Evidence Synthesis - Current Practices and Future Possibilities to be held as part of the IEEE International Conference on Healthcare Informatics (ICHI 2016), in Chicago, IL, USA.

20th May 2016

The project is mentioned in a new article about text mining and the work of NaCTeM, published in Pharma Technology Focus, a bi-monthly magazine that brings together the latest insights and innovations from across the pharmaceutical industry.

Project information

The project is funded by the Medical Research Council for a period of 3 years, starting from 31st March 2014 (Grant No. MR/L01078X/1).

Project team

Principal Investigator: Prof. Sophia Ananiadou (NaCTeM)

Co-Investigators: Mr. John McNaught (NaCTeM), Dr. John Goulermas (MaLDA).

Researchers: Dr. Austin Brockmeier (MaLDA), Dr. Georgios Kontonatsios (NaCTeM), Dr. Tingting Mu (MaLDA), Mr. Kazuma Hashimoto (NaCTeM).

Related Publications

Kontonatsios, G., Brockmeier, A. J., Przybyla, P., McNaught, J., Mu, T., Goulermas, J. Y. and Ananiadou, S. (In Press). A semi-supervised approach using label propagation to support citation screening. Journal of Biomedical Informatics.

Mu, T., Goulermas, J. Y. and Ananiadou, S. (In Press). Visualizing local sample and global cohort neighbouring structure. IEEE Transactions on Pattern Analysis and Machine Intelligence.

Sato, M., Brockmeier, A. J., Kontonatsios, G., Mu, T., Goulermas, J. Y., Tsujii, J. and Ananiadou, S. (2017). Distributed Document and Phrase Co-embeddings for Descriptive Clustering. In Proceedings of EACL, pp. 991-1001.

Alnazzawi, N., Thompson, P. and Ananiadou, S. (2016). Mapping Phenotypic Information in Heterogeneous Textual Sources to a Domain-Specific Terminological Resource. PLOS ONE, 11(9), e0162287

Haynes, C., Kay, N., Harrison, K., McLeod, C., Shaw, B., Leng, G., Kontonatsios, G. and Ananiadou, S. (In Press). Using text mining to facilitate study identification in public health systematic reviews. In: Guidelines International Network (G-I-N) conference

Hashimoto, K., Kontonatsios, G., Miwa, M. and Ananiadou, S. (2016). Topic Detection Using Paragraph Vectors to Support Active Learning in Systematic Reviews. Journal of Biomedical Informatics, 62, 59-65

Mo, Y., Kontonatsios, G. and Ananiadou, S. (2015). Supporting Systematic Reviews Using LDA-based Document Representations. Systematic Reviews, 4, 172

Alnazzawi, N., Thompson, P., Batista-Navarro, R. and Ananiadou, S. (2015). Using text mining techniques to extract phenotypic information from the PhenoCHF corpus. BMC Medical Informatics and Decision Making 15(Suppl. 2):S3

Miwa, M. and Ananiadou, S. (2015). Adaptable, high recall, event extraction system with minimal configuration. BMC Bioinformatics 16(Suppl. 10):S7

Xu, Y., Chen, L., Wei, J., Ananiadou, S., Fan, Y., Qian, Y., Chang, E. I-C. and Tsujii, J. (2015). Bilingual term alignment from comparable corpora in English discharge summary and Chinese discharge summary. BMC Bioinformatics 16:149

O'Mara-Eves, A., Thomas, J., McNaught, J., Miwa, M. and Ananiadou, S. (2015). Using text mining for study identification in systematic reviews: A systematic review of current approaches. Systematic Reviews 4:5 (Highly Accessed)

Mu, T., Goulermas, J. Y., Korkontzelos, I. and Ananiadou, S. (2014). Descriptive Clustering via Discriminant Learning in a Coembedded Space of Multi-level Similarities. Journal of the Association for Information Science and Technology

Ananiadou, S., Thompson, P., Nawaz, R., McNaught, J. and Kell, D. B. (2014). Event Based Text Mining for Biology and Functional Genomics. Briefings in Functional Genomics

Miwa, M., Thomas, J., O'Mara-Eves, A. and Ananiadou, S. (2014). Reducing systematic review workload through certainty-based screening. Journal of Biomedical Informatics

Miwa, M., Thompson, P., Korkontzelos, I. and Ananiadou, S. (2014). Comparable Study of Event Extraction in Newswire and Biomedical Domains. In Proceedings of Coling 2014

Xu, Y., Hua, J., Ni, Z., Chen, Q., Fan, Y., Ananiadou, S., Chang, E. I-C. and Tsujii, J. (2014). Anatomical entity recognition with a hierarchical framework augmented by external resources. PLOS ONE, 9(10), e108396

Further information