Species disambiguation of biomedical named entities: release of software, corpus and article
2010-03-31
Text mining technologies have been shown to reduce the laborious work involved in organising the vast amount of information hidden in the literature. One challenge in text mining is linking ambiguous word forms to unambiguous biological concepts.
The DECA project has released a corpus for organism disambiguation in which every occurrence of a protein/gene entity is manually tagged with a species ID.
Software trained on the corpus has also been released, both as a web-based demo and as U-Compare components.
The creation of the corpus and the training of the software are described more fully in a newly released article:
Xinglong Wang, Jun'ichi Tsujii and Sophia Ananiadou (2010). Disambiguating the Species of Biomedical Named Entities Using Natural Language Parsers. Bioinformatics, 2010.