What others are saying about us
"Big Data" is a hot topic in the business world these days. But there's a subset of this broad field that has yet to take a turn in the spotlight.
It's called "text mining" and you're probably going to be hearing a lot more about it over the coming months and years ... Academic types are at the forefront of this effort,
and at least one country is already trying to help its eggheads with their text mining needs. England's recently established
National Centre for Text Mining is the first publicly funded text mining clearinghouse in the world, with the stated aim of furthering academic research.
Gary Belsky, 20th March 2012
Quoted from an article in Time Business entitled Why Text Mining May Be The Next Big Thing.
PubMed is most scientists' first port of call for literature searches, and there are many fine tutorials on this site explaining how to get the most from this tool. However, many people don't know that there are also several promising text-mining tools that offer more sophisticated text-searching functions, such as semantic searches, and are quite accessible to experimental biologists. These tools analyse the free text of an article using publicly available Medline data and extract relationships between the search terms, index these relationships, and present their results almost instantly... Text-mining tools which were developed by the UK's National Centre for Text Mining (NaCTeM) in Manchester ... [MEDIE, Kleio and Facta] have convenient web interfaces, so it's easy to give them a try and see if they are useful to you.
Of course, text mining is not perfect yet - the English language is so rich and varied that an idea can be expressed in a myriad of ways, not all of which are captured by the heuristic rules of the text-mining algorithm.
But these tools are so easy and fast to use that they can be added to your literature-searching repertoire today!
Dr Richard Adams, Senior Software Developer, Synthsys, University of Edinburgh, 9th January 2012
Quoted from a BitesizeBio blog entitled How Text-Mining Tools Can Improve Your Literature Searches.
Text mining has exciting applications for medicine. Conventional sifting of information can take weeks, and
exciting new connections could potentially be missed. Medical research is also increasingly interdisciplinary,
including biology, chemistry, economics and other sciences. Being able to access information from other
fields is a tremendous benefit and can help generate new ideas. Access to NaCTeM will be a real boost for
our research teams, and a great incentive for new recruits.
Professor Phil Baker, Director, Biomedical Research Centre (BRC), Manchester
Quoted from joint press release regarding strategic partnership between the BRC and NaCTeM.
Sophia Ananiadou discussed some of NaCTeM's flagship tools like MEDIE, FACTA and KLEIO - it does
look like they're starting to take all the pain out of text mining, by doing the difficult bits for us, so we can
use the results to do actual mining.
Dr Andrew Clegg, Research Scientist, University College London
Quoted from blog post at biotext.org.uk concerning the Semantic Enrichment of the Scientific Literature 2009 (SESL 2009) workshop.
Over the last couple of years, scientists at Pfizer's research site in Sandwich have been making use of
the text mining tools and services developed by NaCTeM. One such tool, which has proven to be valuable, is
TerMine, an automatic multi-word term recognition tool that has been used at Pfizer to enrich the labour-intensive
process of building dictionaries used for text mining.
Pfizer and NaCTeM have also been collaborating on a project called DECA (Disease Extraction with
Concept Association) to extract associations between concepts in the biomedical domain such as diseases
and symptoms from collections of biomedical texts (e.g. Medline). The aim of this project is to combine the
strengths of the NaCTeM text mining tools, Kleio and FACTA, to create an efficient search for associations
between biomedical concepts. Also, a considerable amount of research is being applied to the challenge of
lexical disambiguation of the biomedical terms. Pfizer values highly the world-class quality of the linguistic
and semantic extraction skills and methodologies being developed and practised at NaCTeM, which is
located in the highly appropriate setting of the Manchester Centre for Integrative Systems Biology.
Ian Harrow, Senior Principal Scientist, Pfizer
Internal communication
NaCTeM has engaged closely with users in systems biology to understand their needs and to provide
cutting edge text mining services. Researchers in systems biology need integrated approaches to generate
hypotheses and the use of text mining technology is a must for facilitating scientific discovery given the
amount of textual data generated daily. NaCTeM has tapped into this potential with great success. One of
the most impressive outcomes of the work of NaCTeM are the systems MEDIE and FACTA. Such
semantically based tools are important for the discovery of new knowledge in biology.
Professor Douglas Kell, Research Chair in Bioanalytical Science, University of Manchester
Internal communication
Sophia Ananiadou from NaCTeM explained the work her group has done using text mining techniques on
Medline abstracts. This is the third time I've heard her talk about this, and it gets more interesting each time.
Her aim is to enrich the literature by automatically creating semantic metadata, and thereby to make
"undiscovered science" accessible. The MEDIE system is the most vivid example she showed, allowing you
to construct a query in the form "subject-verb-object". For instance, you can ask "what does p53
activate" by searching for "subject=p53, verb=activate".
Frank Norman, Manager, Library & Information Service, National Institute for Medical Research, London
Quoted from "Trading knowledge" blog on nature.com.
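The "subject-verb-object" query described above can be pictured with a small Python sketch. This is an illustration only, not MEDIE's interface or implementation: the Triple type, the search function and the hand-written TOY_TRIPLES list are all hypothetical, and a real system would extract such triples from parsed Medline sentences rather than from a fixed list.

```python
from typing import List, NamedTuple, Optional


class Triple(NamedTuple):
    """One hypothetical subject-verb-object relation taken from a sentence."""
    subject: str
    verb: str
    obj: str


# Hand-written toy data for illustration; not output from MEDIE.
TOY_TRIPLES = [
    Triple("p53", "activate", "p21"),
    Triple("p53", "activate", "BAX"),
    Triple("MDM2", "inhibit", "p53"),
]


def search(triples: List[Triple],
           subject: Optional[str] = None,
           verb: Optional[str] = None,
           obj: Optional[str] = None) -> List[Triple]:
    """Return the triples matching every slot that was specified.

    A slot left as None acts as a wildcard, so subject="p53" with
    verb="activate" asks "what does p53 activate?".
    """
    def matches(value: str, wanted: Optional[str]) -> bool:
        return wanted is None or value.lower() == wanted.lower()

    return [
        t for t in triples
        if matches(t.subject, subject)
        and matches(t.verb, verb)
        and matches(t.obj, obj)
    ]


if __name__ == "__main__":
    for hit in search(TOY_TRIPLES, subject="p53", verb="activate"):
        print(hit.obj)
```

Leaving the object slot empty is what turns the query into a question: every stored relation with the given subject and verb comes back, and the objects are the answers.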
NaCTeM offers tools with an eye to interoperability and for which workflow software is important,
for example the Unstructured Information Management Architecture, or UIMA, formerly associated with IBM
and now an open project that runs in OASIS and Apache, and protocols such as SOAP for XML-based
message exchange... Users can mix and match the tools they need.
Vivien Marx
Article in BioInform, vol. 12, no. 11, "SciWit, NaCTeM Tailor Text-Mining Tools For Varying Needs of Biomedical Research"
Anyone with experience of lists of abbreviations and acronyms will have spotted that they're seldom up to
date and often contain abbreviations and acronyms which, from a cursory internet search, seem to exist only in
lists, rather than out in the wild. So an abbreviation list that is somehow automatically generated from current
material would be extremely welcome.
AcroMine has been around for a few years but you may not have seen it before. The idea is to take all of PubMed
and look for word sequences that regularly co-occur with expressions in brackets that match.
But how well does it work across disciplines? My first attempt was a term used in nuclear magnetic resonance
spectroscopy: INEPT. AcroMine correctly identifies this as "insensitive nuclei enhanced by polarization transfer".
AcroMine offers 22 hits for "MMR"; the most common is, surprisingly, not the vaccine against measles, mumps,
and rubella but "mismatch repair". AcroMine even correctly offers "Large Hadron Collider" as an expansion for "LHC".
Editor's Webwatch, European Scientific Editing, February 2009
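The idea described above, counting word sequences that co-occur with a matching expression in brackets, can be sketched in a few lines of Python. This is a deliberately simplified illustration, not AcroMine's algorithm: the function names are hypothetical, the sample sentences are made up, and a word sequence is assumed to "match" only when its initial letters spell the acronym. That rule handles "Large Hadron Collider (LHC)" but would miss expansions such as "mismatch repair" for MMR or the full form of INEPT, which need looser matching.

```python
import re
from collections import Counter


def candidate_expansions(text):
    """Yield (acronym, expansion) pairs found in one piece of text.

    Simplifying assumption: the N words just before a bracketed N-letter
    acronym count as its expansion only if their initials spell it out.
    """
    for match in re.finditer(r"\(([A-Z]{2,10})\)", text):
        acronym = match.group(1)
        words = re.findall(r"[A-Za-z]+", text[:match.start()])
        candidate = words[-len(acronym):]
        if len(candidate) == len(acronym) and all(
            word[0].upper() == letter
            for word, letter in zip(candidate, acronym)
        ):
            yield acronym, " ".join(candidate).lower()


def count_expansions(documents):
    """Count how often each (acronym, expansion) pair occurs overall."""
    counts = Counter()
    for doc in documents:
        counts.update(candidate_expansions(doc))
    return counts


if __name__ == "__main__":
    # Made-up sentences standing in for PubMed abstracts.
    docs = [
        "The Large Hadron Collider (LHC) is at CERN.",
        "Data from the Large Hadron Collider (LHC) were analysed.",
        "Left heart catheterisation (LHC) was performed.",
    ]
    for (acronym, expansion), n in count_expansions(docs).most_common():
        print(acronym, expansion, n)
```

Ranking expansions by how often they occur is what lets a tool report the most common reading of an ambiguous abbreviation first.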
At the JISC Collections AGM on 20 November 2008, Sophia Ananiadou, Director of the National Centre for Text Mining (NaCTeM) gave an excellent presentation on what text mining is, why it matters for researchers and how it helps to facilitate new and innovative research.
In the context of information overload and the problem of keeping up with the increasing amount of new literature available, Sophia made the point that much information on the web is unstructured (she estimates about 80%) and/or not searchable (e.g. text in PDF or PowerPoint files, which cannot be found by ordinary search engines). She explained how text mining not only helps with finding relevant information, but can also make intelligent connections to scholarship from other fields and provoke questions that might not otherwise have been asked.
The Centre has tools and services to help institutions and researchers do text mining - the current areas of focus are biology and the social sciences, but the Centre has had a lot of interest from publishers and they hope to expand on the disciplines they cover.
Sarah Gentleman, Research Information Network (RIN)
Quoted from RIN Team blog