ELRA Distribution Agreement signed for BioLexicon
2009-09-28
ELRA, together with the National Centre for Text Mining (NaCTeM, University of Manchester, UK), the European Bioinformatics Institute (EBI, Hinxton, UK), and Istituto di Linguistica Computazionale-Consiglio Nazionale Ricerche (ILC-CNR, Pisa, Italy), has signed a Language Resources distribution agreement for a large-scale English language terminological resource in the biomedical domain: BioLexicon.
Biological terminology is a frequent cause of analysis errors when processing literature in the biology domain. This is due largely to the high degree of variation in term forms, to the frequent mismatches between the labels of controlled vocabularies and ontologies on the one hand and the forms actually occurring in text on the other, and to the lack of detailed formal information on the linguistic behaviour of domain terms. For example, "retro-regulate" is a terminological verb often used in molecular biology, but it is not included in conventional dictionaries. BioLexicon is a linguistic resource for the biology domain, tailored to cope with these problems. It contains information on:
- terminological nouns, including nominalised verbs and proper names (e.g., gene names)
- terminological adjectives
- terminological adverbs
- terminological verbs
- general English words frequently used in the biology domain
Existing information on terms was integrated, augmented, complemented and linked through the processing of massive amounts of biomedical text, yielding, inter alia, over 2.2M entries and information on over 1.8M variants and over 2M synonymy relations. Moreover, extensive information is provided on how verbs and nominalised verbs in the domain behave at both the syntactic and semantic levels, thus supporting applications that aim to discover relations and events involving biological entities in text.
This comprehensive coverage of biological terms makes BioLexicon a unique linguistic resource within the domain. It is primarily intended to support text mining and information retrieval in the biomedical domain; however, its standards-based structure and rich content make it a valuable resource for many other kinds of application.
More information
For information about the BioLexicon, including references and details of how to obtain it, please see our BioLexicon Page.