NaCTeM

ELRA Distribution Agreement signed for BioLexicon

2009-09-28

ELRA, together with the National Centre for Text Mining (NaCTeM, University of Manchester, UK), the European Bioinformatics Institute (EBI, Hinxton, UK), and Istituto di Linguistica Computazionale-Consiglio Nazionale Ricerche (ILC-CNR, Pisa, Italy), has signed a Language Resources distribution agreement for a large-scale English language terminological resource in the biomedical domain: BioLexicon.

Biological terminology is a frequent cause of analysis errors when processing literature written in the biology domain. This is due largely to the high degree of variation in term forms, to the frequent mismatches between the labels of controlled vocabularies and ontologies on the one hand and the forms actually occurring in text on the other, and to the lack of detailed formal information on the linguistic behaviour of domain terms. For example, "retro-regulate" is a terminological verb often used in molecular biology, but it is not included in conventional dictionaries. BioLexicon is a linguistic resource for the biology domain, tailored to cope with these problems. It contains information on:

  • terminological nouns, including nominalised verbs and proper names (e.g., gene names)
  • terminological adjectives
  • terminological adverbs
  • terminological verbs
  • general English words frequently used in the biology domain

Existing information on terms was integrated, augmented, complemented and linked through the processing of massive amounts of biomedical text, yielding, inter alia, over 2.2M entries, together with information on over 1.8M variants and over 2M synonymy relations. Moreover, extensive information is provided on how verbs and nominalised verbs in the domain behave at both the syntactic and semantic levels, thus supporting applications that aim to discover relations and events involving biological entities in text.

This comprehensive coverage of biological terms makes BioLexicon a unique linguistic resource within the domain. It is primarily intended to support text mining and information retrieval in the biomedical domain; however, its standards-based structure and rich content make it a valuable resource for many other kinds of application.

More information

For information about the BioLexicon, including references and details of how to obtain it, please see our BioLexicon Page.