NaCTeM

Health and Safety Executive (HSE)

Lloyd's Register Foundation

Introduction

The Lloyds HSE project is part of the £10 million Discovering Safety programme funded by Lloyd's Register Foundation. Central to the programme is the development of new technologies to analyse and aggregate data from sources worldwide, with the key output being new learning that helps prevent future accidents. This ambitious programme is a collaboration between the Health and Safety Executive (HSE) and the University of Manchester, which led to the creation of the Thomas Ashton Institute. As part of the programme, we are applying state-of-the-art text mining and natural language processing (NLP) techniques to extract health and safety insights from free-text sources.

Aims and Objectives

HSE's rich and varied archive of health and safety data, accrued year on year from its workplace inspection, incident investigation and enforcement activities, along with the incident information reported to HSE by its duty holders, provides a ready-made research dataset for building an ecosystem that automatically generates insights and supports health and safety risk assessment. To achieve this aim, the objectives of this project are:

  • to identify groups of thematically related documents and to generate informative labels to characterise the content of each topic using topic analysis and descriptive clustering;
  • to customise and integrate text mining resources and tools to enrich health and safety data with semantic metadata (terms, named entities and relations) automatically extracted from free-text reports;
  • to automatically identify and assess risk factors using context classification;
  • to implement and evaluate an interactive, faceted semantic search system that supports discovery through query expansion and mapping methods;
  • to automatically summarise reports based on risks and other relevant semantic information;
  • to demonstrate the above methodologies using a specific case study.
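
Descriptive clustering needs a way to turn a group of reports into a short, readable label. A minimal sketch, assuming the reports have already been grouped (for example by a topic model or k-means, not shown): each term is scored by its relative frequency inside a cluster, weighted by how few clusters it occurs in (a class-based TF-IDF heuristic), and the top terms become the label. The function names, stopword list and toy reports are illustrative assumptions, not the project's actual method.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list for the toy example.
STOPWORDS = {"a", "an", "and", "at", "from", "in", "of", "on", "the", "to", "was", "while"}


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def cluster_labels(clusters, top_n=3):
    """Return the top_n label terms for each cluster using class-based TF-IDF."""
    # Term counts per cluster, ignoring stopwords.
    counts = {
        cid: Counter(t for doc in docs for t in tokenize(doc) if t not in STOPWORDS)
        for cid, docs in clusters.items()
    }
    n_clusters = len(clusters)
    # Number of clusters in which each term occurs at least once.
    cluster_freq = Counter(t for c in counts.values() for t in c)
    labels = {}
    for cid, c in counts.items():
        total = sum(c.values())
        # Relative frequency in the cluster times log inverse cluster frequency.
        scores = {t: (n / total) * math.log(n_clusters / cluster_freq[t]) for t, n in c.items()}
        # Highest score first; ties broken alphabetically for deterministic output.
        labels[cid] = sorted(scores, key=lambda t: (-scores[t], t))[:top_n]
    return labels


clusters = {
    "falls": ["Worker fell from ladder", "Worker fell from scaffold at height"],
    "machinery": ["Worker hand caught in unguarded machine",
                  "Operator hand trapped in machine guard"],
}
print(cluster_labels(clusters))
# {'falls': ['fell', 'height', 'ladder'], 'machinery': ['hand', 'machine', 'caught']}
```

On the toy input, "worker" occurs in both clusters and therefore scores zero, so it never becomes a label. A real system would add lemmatisation, multi-word terms and a fuller stopword list.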

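The summarisation objective can be illustrated with a simple extractive approach: score each sentence of a report by how many risk-related terms it contains and keep the highest-scoring sentences in their original order. This is a minimal sketch under stated assumptions: the risk-term set is hypothetical (in practice it might come from the semantic metadata extracted from reports), and frequency-based sentence scoring is a stand-in rather than the project's summarisation method.

```python
import re


def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def risk_summary(text, risk_terms, max_sentences=2):
    """Return up to max_sentences sentences that mention the most risk terms."""
    sentences = split_sentences(text)

    def score(sentence):
        # Count tokens that appear in the (hypothetical) risk-term set.
        return sum(1 for t in re.findall(r"[a-z]+", sentence.lower()) if t in risk_terms)

    # Stable sort: sentences with equal scores keep their document order.
    ranked = sorted(range(len(sentences)), key=lambda i: -score(sentences[i]))
    # Drop sentences with no risk terms and restore the original order.
    chosen = sorted(i for i in ranked[:max_sentences] if score(sentences[i]) > 0)
    return [sentences[i] for i in chosen]


report = ("The inspection took place on Monday. The site manager was present. "
          "Scaffold boards were missing and no edge protection was fitted. "
          "A worker was seen using a damaged ladder.")
risk_terms = {"scaffold", "missing", "edge", "protection", "damaged", "ladder"}
for sentence in risk_summary(report, risk_terms):
    print(sentence)
```

Keeping only sentences that mention at least one risk term means a report with no recognised risks yields an empty summary rather than arbitrary filler.
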
Framework

  • Information retrieval
    Going beyond keyword queries, we use a combination of machine learning and text mining to retrieve relevant information from health and safety incident or inspection reports.
  • Information extraction
    We use information extraction techniques to automatically extract specific entities mentioned in textual data, e.g., people, plants and places relating to a workplace, along with any references to key properties, processes and underlying associations. The extracted information will be integrated with traditional root cause analysis exercises, or with the routine use of operating experience classifications and taxonomies. Semantic annotations and information extraction tools will support knowledge discovery, search and other applications, e.g., diagnostic tools, optimisation and predictive analysis. To create the labelled data, we use APLenty, an annotation tool for building high-quality sequence labelling datasets through active and proactive learning.
  • Risk assessment
    Once free text has been annotated accurately, its content can be used in more quantitative, inferential statistical analyses. For example, historic inspection findings might be used to generate insights that help target the focus and scope of future inspections.
  • Semantic/cognitive applications
    Converting knowledge into understanding requires comprehension, judgement and intellect, which have historically depended on human input. Key cognitive application areas will include: a) semantic search to retrieve information from knowledge bases, b) classification to support risk assessment and recommender systems, and c) document summarisation.
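
A sequence-labelling model trained on data annotated with a tool such as APLenty typically assigns one tag per token, and those tags must then be grouped into entity mentions. The sketch below assumes the common BIO scheme (B- begins an entity, I- continues it, O is outside any entity) and uses hypothetical entity types PERSON, PLANT and PLACE, loosely based on the people, plants and places mentioned above; the tagger itself is not shown.

```python
def bio_to_spans(tokens, tags):
    """Group per-token BIO tags into (entity_type, text) spans.

    An I- tag that does not continue a span of the same type is treated as
    the start of a new span, a common lenient convention.
    """
    spans = []
    current_type, current_tokens = None, []
    for token, tag in zip(tokens, tags):
        prefix, _, entity_type = tag.partition("-")
        if prefix == "I" and entity_type == current_type:
            # I- tag of the same type extends the span in progress.
            current_tokens.append(token)
            continue
        # Any other tag ends the span in progress, if there is one.
        if current_type is not None:
            spans.append((current_type, " ".join(current_tokens)))
        if prefix in ("B", "I"):
            # B- always opens a span; a stray I- is treated like B-.
            current_type, current_tokens = entity_type, [token]
        else:
            # "O": token is outside any entity.
            current_type, current_tokens = None, []
    # Flush a span that runs to the end of the sentence.
    if current_type is not None:
        spans.append((current_type, " ".join(current_tokens)))
    return spans


tokens = ["The", "maintenance", "fitter", "was", "injured", "by", "the",
          "hydraulic", "press", "in", "the", "workshop", "."]
tags = ["O", "B-PERSON", "I-PERSON", "O", "O", "O", "O",
        "B-PLANT", "I-PLANT", "O", "O", "B-PLACE", "O"]
print(bio_to_spans(tokens, tags))
# [('PERSON', 'maintenance fitter'), ('PLANT', 'hydraulic press'), ('PLACE', 'workshop')]
```

Treating a stray I- tag as the start of a new entity keeps predictions that a model mislabels at the boundary rather than silently discarding them.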

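Going beyond keyword queries can be illustrated with query expansion: the user's terms are enlarged with related terms before documents are ranked, so a report that never uses the exact query word can still be found. A minimal sketch, assuming a hand-written expansion dictionary (hypothetical; in practice it might be derived from extracted terms or a domain thesaurus) and plain TF-IDF ranking as a stand-in for the project's combination of machine learning and text mining.

```python
import math
import re
from collections import Counter

# Hypothetical expansion dictionary mapping a query term to related terms.
EXPANSIONS = {
    "forklift": {"lift", "truck", "flt"},
    "fall": {"fell", "falling"},
}


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def expand(query_terms):
    """Add any related terms listed in EXPANSIONS to the query terms."""
    expanded = set(query_terms)
    for term in query_terms:
        expanded |= EXPANSIONS.get(term, set())
    return expanded


def search(query, documents):
    """Return indices of matching documents, best TF-IDF score first."""
    doc_counts = [Counter(tokenize(d)) for d in documents]
    n_docs = len(documents)
    terms = expand(tokenize(query))
    # Document frequency of each expanded query term.
    df = {t: sum(1 for c in doc_counts if t in c) for t in terms}
    # Term frequency times a smoothed inverse document frequency.
    scores = [
        sum(c[t] * math.log(1 + n_docs / df[t]) for t in terms if df[t] > 0)
        for c in doc_counts
    ]
    matches = [i for i in range(n_docs) if scores[i] > 0]
    # Highest score first; ties keep the original document order.
    return sorted(matches, key=lambda i: -scores[i])


reports = [
    "Lift truck overturned on a ramp",
    "Employee slipped on a wet floor",
    "Worker fell from a forklift pallet",
    "Guard missing on conveyor",
]
print(search("forklift", reports))  # [0, 2]
```

A plain keyword match on "forklift" would return only report 2; the expansion also retrieves report 0, which describes a lift truck. Because expansion can also add noise, practical systems often weight expanded terms lower than the original query terms.
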
News

We achieved first place in the Computational Linguistics Scientific Document Summarization Shared Task (CL-SciSumm 2019), held at the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval, Paris, France.

Publications

Chrysoula Zerva, Minh-Quoc Nghiem, Nhung T.H. Nguyen, and Sophia Ananiadou, 2019. NaCTeM-UoM @ CL-SciSumm 2019. In BIRNDL@SIGIR (to appear).

Project Team

Principal Investigator: Prof. Sophia Ananiadou

Co-investigators: Mr John McNaught, Dr Tingting Mu, Prof. Goran Nenadic

Researchers: Dr Emrah Inan, Dr Phong Le, Dr Minh-Quoc Nghiem
