Termine Web Service (Request Access)
Termine Web Service provides a SOAP interface through which your client programs can use the candidate multiword term extraction component.
To use this service, you must add a software key to your SOAP client.
WSDL
Client sample
- Python with ZSI 2.0 [source] [annotated]
- PHP5 with SOAP extension enabled [source] [annotated]
- Java (written by Irena Spasic) [source] [annotated]
Overview of the API
Termine Web Service provides a SOAP interface that exports a function, analyze. It takes a source text as a string and returns the term-extraction result as a string. A source string can be either natural English text, e.g.,
Technical terms are important for knowledge mining, especially in the bio-medical area where vast amount of documents are available.
or text with part-of-speech tags, e.g.,
Technical	Technical	JJ
terms	term	NNS
are	be	VBP
important	important	JJ
for	for	IN
knowledge	knowledge	NN
mining	mining	NN
,	,	,
especially	especially	RB
in	in	IN
the	the	DT
bio-medical	bio-medical	JJ
area	area	NN
where	where	WRB
vast	vast	JJ
amount	amount	NN
of	of	IN
documents	document	NNS
are	be	VBP
available	available	JJ
.	.	.
EOS
It is recommended that natural English text consist of multiple lines, each of which holds one sentence. The result of term extraction may be inaccurate if a sentence spans several lines.
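As a rough illustration of this recommendation, the sketch below re-flows wrapped text so that each sentence ends up on its own line. The function name one_sentence_per_line is hypothetical, and the splitting rule (break after ".", "!" or "?" followed by whitespace) is a naive assumption that will mis-split abbreviations such as "e.g."; a proper sentence splitter may be preferable for real documents.

```python
import re


def one_sentence_per_line(text: str) -> str:
    """Re-flow text so that each sentence sits on its own line.

    Naive assumption: a sentence ends at '.', '!' or '?' followed by whitespace.
    """
    # Collapse line breaks and runs of whitespace into single spaces.
    flat = re.sub(r"\s+", " ", text).strip()
    if not flat:
        return ""
    # Split at whitespace that follows sentence-final punctuation.
    sentences = re.split(r"(?<=[.!?])\s+", flat)
    return "\n".join(sentences)


if __name__ == "__main__":
    wrapped = "Technical terms are\nimportant. They help\nknowledge mining."
    print(one_sentence_per_line(wrapped))
```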
A text with part-of-speech tags consists of multiple lines, each of which contains a word, its base form, and its part-of-speech tag, separated by TAB characters. A line containing "EOS" marks the end of a sentence. The part-of-speech tags should be compatible with those of the Penn Treebank Project.
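The tagged format described above can be produced from already-tagged tokens with a few lines of code. This is a minimal sketch: the function name to_tagged_text and the Token tuple layout are hypothetical, and whether the service expects a trailing newline after the final "EOS" is not stated here, so none is added.

```python
from typing import Iterable, List, Tuple

# (word, base form, part-of-speech tag); the tuple layout is an assumption
Token = Tuple[str, str, str]


def to_tagged_text(sentences: Iterable[Iterable[Token]]) -> str:
    """Serialize tagged sentences: one TAB-separated token per line, then EOS."""
    lines: List[str] = []
    for sentence in sentences:
        for word, base, pos in sentence:
            lines.append("\t".join((word, base, pos)))
        # A line with "EOS" closes each sentence.
        lines.append("EOS")
    return "\n".join(lines)


if __name__ == "__main__":
    sample = [[("terms", "term", "NNS"), ("are", "be", "VBP")]]
    print(to_tagged_text(sample))
```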
The function analyze takes string arguments as follows.
- src (required): a source text with or without part-of-speech tags
- key (required): the key for access to this service. This is provided to you when you apply for access.
- input_format (optional; default="plain.genia"):
  - "plain.genia": The source text presents natural English sentences (the server will process the text by using GENIA tagger)
  - "post.genia": The source text presents sentences with part-of-speech tags annotated by GENIA tagger
- output_format (optional; default="plain"):
  - "plain": The result will be returned in plain text
  - "xml": The result will be returned in XML
- stoplist (optional; default=""): A list of stop words separated by whitespace characters. If this argument is not specified, the service does not apply a stoplist.
- filter (optional; default="{JJ}*{NN}+"): A part-of-speech pattern used to extract terms
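The argument list above can be captured in a small helper that fills in the documented defaults and rejects unknown format values before any request is sent. This is only a sketch: the helper name build_analyze_args and the pos_filter parameter (renamed to avoid shadowing Python's built-in filter) are hypothetical, and how the resulting dictionary is handed to analyze depends on the SOAP library and the WSDL, neither of which is assumed here.

```python
from typing import Dict

# Values documented for the analyze function's format arguments.
INPUT_FORMATS = {"plain.genia", "post.genia"}
OUTPUT_FORMATS = {"plain", "xml"}


def build_analyze_args(
    src: str,
    key: str,
    input_format: str = "plain.genia",
    output_format: str = "plain",
    stoplist: str = "",
    pos_filter: str = "{JJ}*{NN}+",
) -> Dict[str, str]:
    """Return the string arguments for analyze, using the documented defaults."""
    if not src:
        raise ValueError("src is required")
    if not key:
        raise ValueError("key is required")
    if input_format not in INPUT_FORMATS:
        raise ValueError(f"unknown input_format: {input_format!r}")
    if output_format not in OUTPUT_FORMATS:
        raise ValueError(f"unknown output_format: {output_format!r}")
    return {
        "src": src,
        "key": key,
        "input_format": input_format,
        "output_format": output_format,
        "stoplist": stoplist,
        "filter": pos_filter,  # the service-side argument is named "filter"
    }


if __name__ == "__main__":
    # "YOUR-KEY" is a placeholder for the key issued when access is granted.
    print(build_analyze_args("Technical terms are important.", "YOUR-KEY"))
```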
The function returns the analysis result in the result variable of the SOAP response. The analysis result is a list of candidate multiword terms and their C-Value scores. The following is the result for the sample sentence.
1.000000 technical term
1.000000 knowledge mining
1.000000 bio-medical area
1.000000 vast amount
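A client will usually want the plain-text result as structured data. The sketch below assumes that the "plain" output lists one candidate per line, with the C-Value score first and the term after it, separated by whitespace; that layout is inferred from the sample result rather than from a format specification, and the function name parse_plain_result is hypothetical.

```python
from typing import List, Tuple


def parse_plain_result(result: str) -> List[Tuple[float, str]]:
    """Parse lines of the assumed form '<score> <term>' into (score, term) pairs."""
    terms: List[Tuple[float, str]] = []
    for line in result.splitlines():
        # Split once on whitespace: the score, then the (multiword) term.
        parts = line.strip().split(None, 1)
        if len(parts) != 2:
            continue  # skip blank or incomplete lines
        score_text, term = parts
        try:
            score = float(score_text)
        except ValueError:
            continue  # skip lines that do not start with a number
        terms.append((score, term))
    return terms


if __name__ == "__main__":
    sample = "1.000000 technical term\n1.000000 knowledge mining\n"
    for score, term in parse_plain_result(sample):
        print(f"{term}: {score}")
```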