NaCTeM Software Tools
The National Centre for Text Mining bases its service systems on a number of text mining software tools.
- Part-of-speech (POS) taggers
- A part-of-speech tagger for English
- GENIA Tagger — Part-of-speech tagging for biomedical text (Web Service)
- Parsers
- Enju — A deep syntactic parser for English
- CFG Parser — A fast CFG parser for English
- GENIA Tagger — Shallow parsing for biomedical text (Web Service)
- Named entities/terms
- Named-entity Recognizer — Part of the GENIA Tagger
- NEMine — Recognizes gene/protein names in text.
- ACELA — Tool for efficient annotation of named entities
- Smart dictionary lookup — machine learning-based gene/protein name lookup
- Smart Dictionary Lookup Tool Web Service — Looks up term variations of a given gene/protein name based on an automatically trained similarity measure
- Term Normalization Tool — Normalizes terms with string rewriting rules automatically generated based on a dictionary.
- Other tools
- Sentence and paragraph breaker — An accurate sentence and paragraph detector based on heuristic rules
- Clinical Document Classification — automatic document classification demo
- Sentiment Analysis Tool — Analyses sentiment of input text.
- Cheshire 3 — A fast XML search engine, the latest incarnation of the Cheshire system developed at UC Berkeley
Featured News
- Text mining enhances Educational Evidence Portal - new article and demo site
- Medal of honour awarded to Professor Tsujii
- Improved acronym disambiguation - release of updated software service and paper
- Species disambiguation of biomedical named entities - release of software, corpus and article
- Launch of new features on UKPMC website
- New Biomedical Event Corpus (GREC) released
- ELRA Distribution Agreement signed for BioLexicon





