SLiM: Pilot study of the utility of text mining and machine learning tools to accelerate systematic review and meta-analysis of findings of in vivo research
Background
There is now more research published than ever before. The primary bibliographic database for biomedical research, PubMed, adds around 3,500 new references every day. Our random sample of 2,000 publications in PubMed suggests that in 2013 there were 98,000 publications describing in vivo experiments, of which 21,000 were in pharmacology and 14,500 in neuroscience. No one individual can read, let alone critically appraise or use, even a small fraction of this new information, which is the product of months of investigator effort and substantial investment of research funds. This mismatch between the amount of research produced and the amount that can be effectively used is a major challenge to biomedical research.
The Cochrane Collaboration has been highly successful in synthesising meta-analyses of clinical trial data and providing outcomes in an easily assimilated, widely recognised format that is readily usable for healthcare funding decisions and day-to-day clinical practice. This approach has also influenced major improvements in research quality, especially the design, conduct and reporting of clinical trials. Whilst we wish to replicate the success of Cochrane in the pre-clinical domain, we recognise that the sheer volume and publication rate of pre-clinical data mean that methodological innovations are required that move beyond the largely manual processes currently adopted for most clinical systematic reviews. For example, in our recently completed systematic reviews of neuropathic pain, data had to be extracted from 229 clinical trials, whereas for the corresponding on-going pre-clinical systematic review, 65,156 publications were retrieved by the initial search. Of these, 33,818 had to be screened to identify approximately 6,000 publications that actually contained relevant data.
Furthermore, there are substantial concerns about the risk of bias, either due to sub-optimal experimental design, or because published work is likely to overstate observed effects. Additionally, where sample sizes are low (and sample size calculations are seldom reported), there is also a risk that important biological effects are overlooked because individual studies do not carry sufficient weight.
In brief, therefore, the challenges are as follows:
- Information of potential relevance to scientists is produced at such a volume and rate that "reading the literature" is not feasible
- The risk of bias in in vivo research is such that detailed critical appraisal is required to allow judgement of whether the conclusions drawn are justified and whether a particular experimental design is appropriate
- Publication bias means that scientists relying on selected sources (e.g., particular journals) are likely to be misled
- Conventional systematic reviews can be helpful, but are usually between one and two years out of date even on their date of publication. This problem is further intensified by the sheer volume of data implicit in a pre-clinical systematic review
Text Mining Support at NaCTeM
The main tasks of the project are:
- identifying and retrieving relevant publications,
- extracting meta-data from identified publications,
- extracting outcome data from relevant publications.
To address these tasks, NaCTeM will build upon successful previous work, including the RobotAnalyst system, developed as part of the Supporting Evidence-based Public Health Interventions using Text Mining project.
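As an illustration of the first task, the sketch below ranks unscreened publications by predicted relevance so that reviewers see the most likely includes first. It is a minimal example only, not the RobotAnalyst method: it substitutes a small naive Bayes classifier written with the Python standard library, and the function names (`tokenize`, `train`, `relevance_log_odds`, `rank_for_screening`) and the toy titles are all hypothetical.

```python
import math
from collections import Counter

def tokenize(text):
    """Lowercase, split on whitespace, and strip surrounding punctuation."""
    tokens = (w.strip(".,;:()").lower() for w in text.split())
    return [t for t in tokens if t]

def train(labelled):
    """Count word frequencies per class from (text, is_relevant) pairs.

    Assumes at least one relevant and one irrelevant example.
    """
    word_counts = {True: Counter(), False: Counter()}
    doc_counts = Counter()
    for text, is_relevant in labelled:
        word_counts[is_relevant].update(tokenize(text))
        doc_counts[is_relevant] += 1
    vocab = set(word_counts[True]) | set(word_counts[False])
    return word_counts, doc_counts, vocab

def relevance_log_odds(text, model):
    """Naive Bayes log-odds of relevance, with add-one smoothing."""
    word_counts, doc_counts, vocab = model
    totals = {c: sum(word_counts[c].values()) for c in (True, False)}
    score = math.log(doc_counts[True] / doc_counts[False])  # class prior
    for word in tokenize(text):
        if word not in vocab:
            continue  # words never seen in training carry no evidence
        p_rel = (word_counts[True][word] + 1) / (totals[True] + len(vocab))
        p_irr = (word_counts[False][word] + 1) / (totals[False] + len(vocab))
        score += math.log(p_rel / p_irr)
    return score

def rank_for_screening(unlabelled, model):
    """Order unscreened texts from most to least likely relevant."""
    return sorted(unlabelled,
                  key=lambda t: relevance_log_odds(t, model),
                  reverse=True)

# Hypothetical toy data: titles already screened by a reviewer.
labelled = [
    ("Gabapentin reduces mechanical allodynia in a rat model of nerve injury", True),
    ("Morphine effects on hyperalgesia in mice after spinal nerve ligation", True),
    ("A randomised clinical trial of pregabalin in patients with diabetic neuropathy", False),
    ("Survey of nurse attitudes to pain management in hospital wards", False),
]
unlabelled = [
    "Questionnaire survey of patients in hospital clinics",
    "Amitriptyline reduces allodynia in rat nerve injury model",
]

model = train(labelled)
ranked = rank_for_screening(unlabelled, model)
for title in ranked:
    print(f"{relevance_log_odds(title, model):+.2f}  {title}")
```

In practice the model would be retrained as reviewers label more records, so that the ranking improves during screening; a real system would also need far more training data and a richer document representation than word counts.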
References
Bahor, Z., Liao, J., Macleod, M. R., Bannach-Brown, A., McCann, S. K., Wever, K. E., Thomas, J., Ottavi, T., Howells, D. W., Rice, A., Ananiadou, S. and Sena, E. (2017). Risk of bias reporting in the recent animal focal cerebral ischaemia literature. Clinical Science, 131(20), 2525-2532.
Sato, M., Brockmeier, A. J., Kontonatsios, G., Mu, T., Goulermas, J. Y., Tsujii, J. and Ananiadou, S. (2017). Distributed Document and Phrase Co-embeddings for Descriptive Clustering. In: Proceedings of EACL, pp. 991-1001.
Haynes, C., Kay, N., Harrison, K., McLeod, C., Shaw, B., Leng, G., Kontonatsios, G. and Ananiadou, S. (2016). Using text mining to facilitate study identification in public health systematic reviews. In: Guidelines International Network (G-I-N) conference.
Hashimoto, K., Kontonatsios, G., Miwa, M. and Ananiadou, S. (2016). Topic Detection Using Paragraph Vectors to Support Active Learning in Systematic Reviews. Journal of Biomedical Informatics, 62, 59-65.
Mo, Y., Kontonatsios, G. and Ananiadou, S. (2015). Supporting Systematic Reviews Using LDA-based Document Representations. Systematic Reviews, 4, 172.
O'Mara-Eves, A., Thomas, J., McNaught, J., Miwa, M. and Ananiadou, S. (2015). Using text mining for study identification in systematic reviews: A systematic review of current approaches. Systematic Reviews, 4, 5 (Highly Accessed).
Miwa, M., Thomas, J., O'Mara-Eves, A. and Ananiadou, S. (2014). Reducing systematic review workload through certainty-based screening. Journal of Biomedical Informatics.
Project team
Principal Investigator: Prof. Malcolm Macleod, Centre for Clinical Brain Sciences, University of Edinburgh
Co-Investigators:
Prof. Sophia Ananiadou (NaCTeM)
Prof. James Thomas (UCL Institute of Education, University College London)
Prof. Andrew Rice (Department of Surgery and Cancer, Imperial College London)
Dr. Emily Sena (Centre for Clinical Brain Sciences, University of Edinburgh)
Researchers: Dr. Georgios Kontonatsios, Dr. Piotr Przybyła.
Funding
The project runs from April 2016 until March 2018, and is funded by the MRC (Grant No. MR/N015665/1). Please also see the RCUK Project page.