The Intute project, co-funded by JISC (Joint Information Systems Committee) and AHRC (Arts and Humanities Research Council), is a collaboration between NaCTeM, MIMAS and the Intute Repository Search Project. Its aim is to develop an intelligent semantic search service using NaCTeM's text mining tools, allowing users to search an enhanced subset of the Intute repository: a collection of academic and technical reports under the domain headings of Bio-medical Science or Social Science.

In particular, the Intute project pursues four directions for improving the current search capability of Intute Repository Search:

  1. Enhancing the metadata using text mining technologies;
  2. Applying the technique(s) of text clustering/classification in the search system;
  3. Developing improved technique(s) for query expansion; and
  4. Incorporating personalisation into the search system.

Duration: May 1st, 2008 to April 30th, 2009
Principal Investigator: Dr. Sophia Ananiadou
Project Team (NaCTeM): Scott Piao and Brian Rea

Project Timetable
Project Flowchart
Project Documentation (Progress Reports & Presentations)

Progress of Project

1) Tools have been developed for indexing documents using both the metadata provided by UKOLN and additional metadata generated by processing full texts. In particular, the Genia POS tagger and the Termine term extractor are integrated into the indexing package to extract terms for indexing from abstracts and, where available via the metadata, from PDF full-text documents. A sample index of over 197,000 documents, including about 3,500 full texts, has been created.
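The general idea of indexing documents by supplied metadata together with terms mined from their abstracts can be sketched roughly as follows. This is only an illustration and not the project's indexing package: the stop-word splitter `extract_terms` is a hypothetical stand-in for POS tagging and term extraction, and the record fields `keywords` and `abstract` are assumed names.

```python
from collections import defaultdict
import re

# Hypothetical stop list; a real pipeline would rely on POS tags instead.
STOPWORDS = {"a", "an", "and", "the", "of", "in", "for", "to",
             "is", "are", "on", "with", "by", "from"}


def extract_terms(text):
    """Return candidate terms: maximal runs of consecutive non-stop-word tokens."""
    terms, current = [], []
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        if token in STOPWORDS:
            # A stop word closes the current candidate term, if any.
            if current:
                terms.append(" ".join(current))
                current = []
        else:
            current.append(token)
    if current:
        terms.append(" ".join(current))
    return terms


def build_index(documents):
    """Map each lower-cased term to the set of ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, doc in documents.items():
        # Combine supplied metadata keywords with terms mined from the abstract.
        mined = extract_terms(doc.get("abstract", ""))
        for term in list(doc.get("keywords", [])) + mined:
            index[term.lower()].add(doc_id)
    return index


# Made-up sample records, for illustration only.
documents = {
    "d1": {"keywords": ["Text Mining"],
           "abstract": "Term extraction for the indexing of biomedical reports"},
    "d2": {"keywords": [],
           "abstract": "Clustering of biomedical reports"},
}
index = build_index(documents)
```

Looking up `index["biomedical reports"]` then returns the ids of both sample documents, while `index["text mining"]` returns only the document that carried that keyword in its metadata.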

2) A demonstrator semantic document search package has been developed. It implements advanced document search functions such as real-time clustering of retrieved documents using the Carrot2 package, term-based searching for similar and topic-sharing documents, and complex query building. In addition, the visualisation package Aduna has been integrated to show the relationships between topics graphically.
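As a rough illustration of term-based searching for similar documents, the sketch below ranks documents by the overlap of their extracted term sets. It assumes Jaccard similarity as the measure, which is a choice made here for simplicity and not necessarily what the demonstrator uses; the function names and sample data are hypothetical.

```python
def jaccard(a, b):
    """Jaccard overlap between two term sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similar_documents(doc_id, doc_terms, top_n=5):
    """Rank the other documents by term overlap with doc_id, highest first."""
    target = doc_terms[doc_id]
    scored = [
        (other, jaccard(target, terms))
        for other, terms in doc_terms.items()
        if other != doc_id
    ]
    # Keep only documents that share at least one term with the target.
    scored = [(d, s) for d, s in scored if s > 0]
    # Sort by descending score, then by id so that ties are ordered stably.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:top_n]


# Made-up term sets, for illustration only.
doc_terms = {
    "d1": {"text mining", "term extraction", "biomedical reports"},
    "d2": {"text mining", "biomedical reports"},
    "d3": {"query expansion", "text mining"},
    "d4": {"social policy"},
}
ranked = similar_documents("d1", doc_terms)
```

Here `d2` ranks above `d3` because it shares two of the target's terms instead of one, and `d4` is dropped because it shares none.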

NaCTeM IRS Demo Site

Here is a video clip demonstrating the main functions of the NaCTeM IRS search demo site.

Click any of the screenshots below to access the demo site.

Figure 1: Simple search and cluster page:

Figure 2: Full document information page:

Figure 3: Document cluster visualisation page:

Figure 4: Complex query builder page: