Species disambiguation of biomedical named entities: release of software, corpus and article
2010-03-31
Text mining technologies have been shown to reduce the laborious work involved in organising the vast amount of information hidden in the literature. One challenge in text mining is linking ambiguous word forms to unambiguous biological concepts.
The DECA project has released a corpus for organism disambiguation in which every occurrence of a protein/gene entity is manually tagged with a species ID.
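To make the disambiguation task concrete, here is a minimal sketch of a naive baseline that assigns each protein/gene mention the species of the closest species word in the same sentence, then scores it against manually assigned species IDs. This is *not* the DECA software or the parser-based method described in the article. The keyword list, the `Mention` structure, the default-to-human fallback and the use of NCBI Taxonomy IDs as the species identifiers are all illustrative assumptions.

```python
from dataclasses import dataclass

# Hypothetical keyword-to-species map. The values are NCBI Taxonomy IDs,
# assumed here as the species identifier scheme for illustration only.
SPECIES_KEYWORDS = {
    "human": 9606,
    "patients": 9606,
    "mouse": 10090,
    "mice": 10090,
    "rat": 10116,
}
DEFAULT_SPECIES = 9606  # assumed fallback when no species word is present


@dataclass
class Mention:
    token_index: int   # position of the protein/gene mention in the sentence
    text: str          # surface form of the mention
    gold_species: int  # manually assigned species ID, as in an annotated corpus


def nearest_species(tokens: list[str], index: int) -> int:
    """Return the species ID of the species keyword closest to tokens[index]."""
    best_id, best_dist = DEFAULT_SPECIES, None
    for i, tok in enumerate(tokens):
        species_id = SPECIES_KEYWORDS.get(tok.lower())
        if species_id is None:
            continue
        dist = abs(i - index)
        # Strict '<' means that, on a tie, the earlier keyword wins.
        if best_dist is None or dist < best_dist:
            best_id, best_dist = species_id, dist
    return best_id


def accuracy(tokens: list[str], mentions: list[Mention]) -> float:
    """Fraction of mentions whose predicted species matches the gold tag."""
    correct = sum(
        nearest_species(tokens, m.token_index) == m.gold_species for m in mentions
    )
    return correct / len(mentions)


if __name__ == "__main__":
    tokens = "Mouse Trp53 and human TP53 differ".split()
    mentions = [Mention(1, "Trp53", 10090), Mention(4, "TP53", 9606)]
    print(accuracy(tokens, mentions))  # 1.0
```

A proximity heuristic like this breaks down quickly, for example when the species is only mentioned in an earlier sentence or when several species words compete, which is why richer linguistic evidence such as parser output is worth exploring.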
Software trained on the corpus has also been released, both as a web-based demo and as U-Compare components.
The creation of the corpus and the training of the software are described more fully in a newly released article:
Xinglong Wang, Jun'ichi Tsujii and Sophia Ananiadou (2010). Disambiguating the Species of Biomedical Named Entities Using Natural Language Parsers.
Bioinformatics 2010
Featured News
- Invited talk at BioASQ 2023
- Prof. Ananiadou appointed as Senior Area Chair for ACL 2023 and IJCNLP-AACL 2023
- New Knowledge Transfer Partnership with 10BE5
- Panellist at Digital Trust and Society Forum 2023
- Chinese Government Award for PhD student Tianlin Zhang
- Advances in Data Science and AI Conference 2023
- Keynote talk at EMBL-EBI industry club Machine Learning for Text Mining
- Talk at Open Data Science Conference (ODSC)
- BioLaySumm 2023 - Shared Task @ BioNLP 2023
- Prof. Ananiadou gives talk as distinguished speaker in the Women in AI speaker series
- Junichi Tsujii awarded Order of the Sacred Treasure, Gold Rays with Neck Ribbon
Other News & Events
- Keynote Talk at the Festival of AI
- Recent funding successes for Prof. Sophia Ananiadou
- New article on using neural architectures to aggregate sequence labels from multiple annotators
- New article on improving biomedical extractive summarisation using domain knowledge
- New article on automated detection and analysis of depression and stress in social media data