NaCTeM

NaCTeM Software Tools

The National Centre for Text Mining bases its service systems on a number of text mining software tools.

  • Part-of-speech (POS) taggers
  • Parsers
  • Named entities/terms
    • AnatomyTagger — An open-source entity mention tagger for anatomical entities
    • Named-entity Recognizer — Part of the GENIA Tagger
    • NEMine — Recognizes gene/protein names in text.
    • Yeast MetaboliNER — Recognizes yeast metabolite names in text.
    • ACELA — Tool for efficient annotation of named entities
    • Smart dictionary lookup — Machine learning-based gene/protein name lookup
    • Smart Dictionary Lookup Tool Web Service — Looks up term variations of a given gene/protein name based on an automatically trained similarity measure
    • Term Normalization Tool — Normalizes terms with string rewriting rules automatically generated based on a dictionary.
    • DECA — A species disambiguation system for biological named entities
    • RF-TermAlign — A bilingual dictionary extraction tool that uses a Random Forest method to learn the string similarity of terms between a source and a target language.
  • Other tools
    • APLenty — An annotation tool for creating high-quality sequence labelling datasets using active and proactive learning
    • Paladin — A document classification annotation web application which supports active/proactive learning.
    • RobotAnalyst — A tool to minimise the human workload involved in the study identification phase of systematic reviews.
    • EventMine — A machine learning-based event extraction system.
    • brat — A free, open-source, web-based tool for text annotation visualisation and editing.
    • Cafetiere — An easy-to-use text mining system for carrying out text mining on your own document collection
    • Sentence and paragraph breaker — An accurate sentence and paragraph detector based on heuristic rules
    • Clinical Document Classification — Automatic document classification demo
    • Sentiment Analysis Tool — Analyses sentiment of input text.