New article describing cited text span identification method to enable scientific summarisation
2020-05-09
We are pleased to announce the publication of a new article in Scientometrics, which presents a novel approach to the identification of cited text spans in scientific literature, using pre-trained encoders (BERT) in combination with different neural networks. Furthermore, the impact of using these cited text spans as input in BERT-based extractive summarisation methods is assessed.
The research described in the article was carried out in the context of the Lloyds HSE project, in which we collaborate with the Thomas Ashton Institute.
Zerva, C., Nghiem, M., Nguyen, N.T.H., and Ananiadou, S. (2020) Cited text span identification for scientific summarisation using pre-trained encoders. Scientometrics. https://doi.org/10.1007/s11192-020-03455-z
Abstract
We present our approach for the identification of cited text spans in scientific literature, using pre-trained encoders (BERT) in combination with different neural networks. We further experiment to assess the impact of using these cited text spans as input in BERT-based extractive summarisation methods. Inspired and motivated by the CL-SciSumm shared tasks, we explore different methods to adapt pre-trained models, which are tuned for the generic domain, to scientific literature. For the identification of cited text spans, we assess the impact of different configurations in terms of learning from augmented data and using different features and network architectures (BERT, XLNet, CNN, and BiMPM) for training. We show that identifying and fine-tuning the language models on unlabelled or augmented domain-specific data can improve the performance of cited text span identification models. For the scientific summarisation, we implement an extractive summarisation model adapted from BERT. With respect to the input sentences taken from the cited paper, we explore two different scenarios: (1) consider all the sentences (full text) of the referenced article as input and (2) consider only the text spans that have been identified as cited by other publications. We observe that in certain experiments, by using only the cited text spans, we can achieve better performance while minimising the input size needed.
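To make the two input scenarios in the abstract concrete, the following is a minimal illustrative sketch in Python. It is not the method from the article: the article uses BERT-based models to identify cited text spans, whereas this sketch substitutes a simple token-overlap (Jaccard) score as a stand-in. All function names, the `top_k` parameter, and the example sentences are hypothetical and chosen only for illustration.

```python
import re


def tokens(text):
    """Lowercase a sentence and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def overlap_score(citance, sentence):
    """Jaccard overlap between a citing sentence and a candidate sentence.

    Assumption: this is a toy stand-in for a learned sentence-pair model.
    """
    a, b = tokens(citance), tokens(sentence)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def identify_cited_spans(citances, sentences, top_k=1):
    """Return indices (in document order) of the reference-paper sentences
    ranked in the top_k for at least one citance."""
    selected = set()
    for citance in citances:
        ranked = sorted(
            range(len(sentences)),
            key=lambda i: overlap_score(citance, sentences[i]),
            reverse=True,
        )
        selected.update(ranked[:top_k])
    return sorted(selected)


def summariser_input(sentences, citances=None, top_k=1):
    """Build the input for an extractive summariser.

    Scenario 1 (citances is None): all sentences of the referenced article.
    Scenario 2: only the sentences identified as cited text spans.
    """
    if citances is None:
        return list(sentences)
    return [sentences[i] for i in identify_cited_spans(citances, sentences, top_k)]


# Hypothetical example data, invented for this sketch.
reference_sentences = [
    "We introduce a parser for biomedical text.",
    "The parser is trained on annotated abstracts.",
    "Experiments show improved accuracy on gene names.",
    "We thank the reviewers for their comments.",
]
citances = ["Their parser improved accuracy on gene names."]

full_text_input = summariser_input(reference_sentences)
cited_span_input = summariser_input(reference_sentences, citances)
```

In this toy example the second scenario passes a single sentence to the summariser instead of four, which mirrors the reduction in input size that the abstract describes; whether summary quality improves depends on the actual models and data.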