Workshop Overview

**The workshop will feature an invited talk by Dr. Makoto Miwa of the Toyota Technological Institute, Japan.**

Aims & Objectives

The specific characteristics of biomedical and clinical text motivate the need to develop specialised NLP and text mining methods, or to adapt or reengineer existing tools. Most such techniques will be reliant on access to one or more resources that provide information about domain-specific language usage, thus highlighting their central role. Accordingly, it is important to provide fora in which the current state of the art of such resources, as well as the NLP and text mining tools that make use of them, can be evaluated and discussed.

Since 2008, the biennial Workshops on Building and Evaluating Resources for Biomedical Text Mining have allowed dissemination of current developments in the field, and have provided the opportunity to discuss current problems, ideas, questions and open issues, and to understand current and potential uses of resources.

The fifth workshop in this series (BioTxtM 2016) will once again aim to bring together researchers who have developed or evaluated NLP tools and resources for biomedical or clinical applications (e.g., text mining, multilingual search, machine translation, information extraction, question-answering), as well as domain experts/health professionals who use or would benefit from the use of such resources and tools. By complementing traditional presentations with a focussed discussion session and, for the first time, a shared task session, the workshop aims to stimulate novel research efforts and collaborations between researchers and professionals with complementary knowledge and skills. We particularly welcome submissions that handle languages other than English, or which facilitate multilingual access to information.

BioTxtM 2016, which will take place on 12th December 2016 in Osaka, Japan, is a workshop of COLING 2016, the 26th International Conference on Computational Linguistics.


Over recent years, there has been an exponential growth in the amount of textual biomedical and clinical information available in digital form. In addition to the 25 million references to biomedical literature currently available in PubMed, there is a wealth of information available in clinical records, whilst the growing popularity of social media channels has resulted in the creation of various specialised groups. Extensive information is also available in languages other than English: much medical literature is written in Chinese in particular, and to a certain extent also in Japanese, Korean and Russian.

With such a deluge of information at their fingertips, domain experts and health professionals have an ever-increasing need for Natural Language Processing (NLP) tools that can help them to isolate relevant nuggets of information in a timely and efficient manner, regardless of their mother tongue. However, this presents many new challenges in analysis and search. For example, given the highly multilingual nature of available information, language barriers may result in vital information being overlooked. In addition, different information sources cover varying topics and contain differing styles of language, while alternative terminology may be used by lay persons, academics and health professionals. Abbreviations are also used extensively in narrative clinical text, often with little standardisation.

There are various existing resources, such as databases/ontologies, lexica and annotated corpora, which attempt to provide an account of biomedical knowledge and domain-specific patterns of language usage. Such resources provide crucial syntactic and semantic information to NLP tools that aim to process and make sense of biomedical and clinical text. However, the variable textual characteristics of alternative information sources (e.g., different language structures and the frequent appearance of novel terms and abbreviations), combined with the requirement to take multilingual information into account, mean that there is an urgent need to investigate new methods for creating and updating resources, or adapting them to new languages. New techniques may include combining semi-automatic methods, machine translation techniques, crowdsourcing or other collaborative efforts.