New article on chemical named entity recognition
2015-01-20
We are pleased to announce the publication of a new article on chemical named entity recognition in the Journal of Cheminformatics:
Batista-Navarro, R., Rak, R. and Ananiadou, S. (2015). Optimising chemical named entity recognition with pre-processing analytics, knowledge-rich features and heuristics. Journal of Cheminformatics, 7(Suppl 1), S6.
The paper reports on a new conditional random fields-based chemical entity recogniser, whose performance is optimised through the incorporation of a number of customisations, such as specialised pre-processing analytics, chemistry knowledge-rich features and post-processing rules. The recogniser achieves state-of-the-art performance and outperforms two popular chemical NER tools. The suite of tools that form the recogniser has been made publicly available as a configurable workflow in the interoperable text mining workbench Argo.
Full abstract
Background
The development of robust methods for chemical named entity recognition, a challenging natural language processing task, was previously hindered by the lack of publicly available, large-scale, gold standard corpora. The recent public release of a large chemical entity-annotated corpus as a resource for the CHEMDNER track of the Fourth BioCreative Challenge Evaluation (BioCreative IV) workshop greatly alleviated this problem and allowed us to develop a conditional random fields-based chemical entity recogniser. In order to optimise its performance, we introduced customisations in various aspects of our solution. These include the selection of specialised pre-processing analytics, the incorporation of chemistry knowledge-rich features in the training and application of the statistical model, and the addition of post-processing rules.
Results
Our evaluation shows that optimal performance is obtained when our customisations are integrated into the chemical entity recogniser. When its performance is compared with that of state-of-the-art methods, under comparable experimental settings, our solution achieves competitive advantage. We also show that our recogniser that uses a model trained on the CHEMDNER corpus is suitable for recognising names in a wide range of corpora, consistently outperforming two popular chemical NER tools.
Conclusion
The contributions resulting from this work are two-fold. Firstly, we present the details of a chemical entity recognition methodology that has demonstrated performance at a level competitive with, if not superior to, that of state-of-the-art methods. Secondly, the developed suite of solutions has been made publicly available as a configurable workflow in the interoperable text mining workbench Argo. This allows interested users to conveniently apply and evaluate our solutions in the context of other chemical text mining tasks.