Batch submission to TerMine (request access)
Terms of Use
By using the TerMine service, you agree to the general Terms and Conditions of Use for the NaCTeM Website, in addition to the following Terms of Use for TerMine:
- Please let us know by email that you are using TerMine.
- Please cite the following when publishing work that uses TerMine:
Frantzi, K., Ananiadou, S. and Mima, H. (2000) Automatic recognition of multi-word terms. International Journal of Digital Libraries 3(2), pp.117-132.
- Please credit and link to the NaCTeM website (http://www.nactem.ac.uk/) in any electronic services based on the TerMine service or resulting data.
- Please contact us in advance if you plan to use the service for bulk processing.
TerMine is a freely available service from the academic domain. This means that it is necessary to limit server load and give preference to individual users. Excessive server load may result in IP addresses or institutions being blocked from using the TerMine service. A limit is enforced on how many times per day unregistered users may use this service.
When you submit a batch request, your job will enter a queue. When your job is complete, you will receive an email containing the URL where you can view the results.
Please note: if you want to analyze a PDF document, you must specify a URL. PDF uploading is not currently supported.
About the C-value and TerMine ...
Technical terms are important for knowledge mining, especially in the biomedical domain, where vast numbers of documents are available. A domain-independent method that automatically recognizes terms in documents is therefore very useful.
C-value is a domain-independent method for automatic term recognition (ATR) of candidate multiword terms that combines linguistic and statistical analyses, with the emphasis placed on the statistical part. The linguistic analysis enumerates all candidate terms in a given text by applying part-of-speech tagging, extracting word sequences of adjectives and nouns, and filtering with a stop-list. The statistical analysis assigns a termhood score to each candidate term using the following four characteristics:
- the occurrence frequency of the candidate term
- the frequency of the candidate term as part of other longer candidate terms
- the number of these longer candidate terms
- the length of the candidate term
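As an illustration of how these four characteristics can be combined, the sketch below scores candidates with the C-value formula as it is commonly stated from the paper cited above: log2 of the candidate length, multiplied by its frequency minus the average frequency of the longer candidates that contain it. This is a simplified sketch and not TerMine's implementation; the function names `contains` and `c_value` and the frequency counts are hypothetical, and candidates are assumed to be already extracted as tuples of words.

```python
import math


def contains(longer, shorter):
    """Return True if `shorter` occurs as a contiguous word sequence in `longer`."""
    m = len(shorter)
    return any(longer[i:i + m] == shorter for i in range(len(longer) - m + 1))


def c_value(freq):
    """Score multiword candidates.

    `freq` maps each candidate (a tuple of words) to its total occurrence
    count, including occurrences nested inside longer candidates.
    """
    scores = {}
    for a, f_a in freq.items():
        # Longer candidates that contain `a` (characteristics 2 and 3).
        longer = [b for b in freq if len(b) > len(a) and contains(b, a)]
        if longer:
            nested = sum(freq[b] for b in longer)
            adjusted = f_a - nested / len(longer)
        else:
            adjusted = f_a
        # Weight by candidate length (characteristic 4); single words score 0.
        scores[a] = math.log2(len(a)) * adjusted
    return scores


# Hypothetical counts, for illustration only.
freq = {
    ("cystic", "basal", "cell", "carcinoma"): 2,
    ("basal", "cell", "carcinoma"): 6,
    ("cell", "carcinoma"): 10,
}
scores = c_value(freq)
for term, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(" ".join(term), round(score, 2))
```

With these counts, "cell carcinoma" is penalized because most of its occurrences sit inside longer candidates, while "basal cell carcinoma" keeps most of its frequency and benefits from its greater length.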
We have been developing a system for terminological management called TerMine, which employs the C-value method to extract terms. The implementation is optimized for scalability and processing speed: given a set of 1.3 million MEDLINE abstracts (2 GB of text), the standalone version of TerMine extracts 9.8 million term candidates and their termhood scores in about ten minutes.