Version 2.4.2 has been available since June 16, 2011.
An online demo is available!
Enju is a syntactic parser for English. With a wide-coverage probabilistic HPSG grammar [1-7] and an efficient parsing algorithm [8-11], this parser can effectively analyze syntactic/semantic structures of English sentences and provide a user with phrase structures and predicate-argument structures. Those outputs would be especially useful for high-level NLP applications, including information extraction, automatic summarization, and question answering, where the "meaning" of a sentence plays a central role.
The main features of the Enju parser are:
Other useful features are:
"-xml": The parser adds XML tags to the original text, which is useful when parse results are merged with other processing results (e.g. named entities). A stand-off format is also available (specify "-so").
"-genia"
"-brown"
"mogura -super"
"enju2ptb/convert < ENJU_XML_OUTPUT > PTB_STYLE_OUTPUT"
"-A": Parsing accuracy improves, while parsing speed gets slower.
"-N": This is an experimental function, and parsing speed gets slower.
For any inquiry, contact us.
The source package and pre-trained models of Enju are available on GitHub.
You can try Enju before downloading it via the online demo.
To parse sentences, pass a file containing one sentence per line to the standard input. For example, suppose you have a file "RAWTEXT" that contains:
He runs the company.
The company that he runs is small.
Run the following command.
> enju < RAWTEXT > RESULTS
Parsing results are written to the file "RESULTS". See "Demo and web interface" for some examples of parsing results.
Alternatively, you can use a high-speed parser with the command "mogura":
> mogura < RAWTEXT > RESULTS
These commands work in mostly the same way.
To parse text that is already tagged with Penn Treebank-style POS tags, specify "-nt":
> enju -nt < TAGGEDTEXT > RESULTS
The default output of the parser is a set of predicate-argument relations. Alternatively, you can get both the phrase structures and predicate-argument relations either in a quasi-XML format or in a stand-off format.
> enju -xml < RAWTEXT > RESULTS
> enju -so < RAWTEXT > RESULTS
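The commands above can also be driven from a script. The following is a minimal Python sketch, assuming that the "enju" (or "mogura") executable is on your PATH and that it reads one sentence per line from the standard input as described above. The function name run_parser is hypothetical and not part of Enju.

```python
import subprocess
from typing import Sequence


def run_parser(sentences: Sequence[str], command: Sequence[str] = ("enju",)) -> str:
    """Send one sentence per line to the parser's stdin and return its stdout.

    `command` is the command line to run, e.g. ("enju",), ("enju", "-xml"),
    ("enju", "-so") or ("mogura",). Only options shown on this page are
    suggested here; the executable being on PATH is an assumption.
    """
    # The parser expects exactly one sentence per line.
    text = "".join(sentence.strip() + "\n" for sentence in sentences)
    completed = subprocess.run(
        list(command),
        input=text,
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if the parser exits with an error
    )
    return completed.stdout


# Example (requires Enju to be installed):
# results = run_parser(
#     ["He runs the company.", "The company that he runs is small."],
#     command=("enju", "-xml"),
# )
```

Passing the command as a sequence rather than a shell string avoids quoting problems when sentences or paths contain special characters.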
You can also use Enju as a CGI server.
> enju -cgi PORT_NUMBER
You can then access the port PORT_NUMBER with a CGI query and receive parsing results in the XML format:
http://localhost:PORT_NUMBER/cgi-lilfes/enju?sentence=he+runs+the+company
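A query URL of this form can be built programmatically. The sketch below is a hedged Python example, assuming a server started with "enju -cgi PORT_NUMBER" on the local machine. The function names, the port 8080 in the example, and the UTF-8 decoding of the response are assumptions for illustration, not documented behavior.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def enju_query_url(sentence: str, port: int, host: str = "localhost") -> str:
    """Build the CGI query URL for one sentence, following the pattern shown above."""
    # urlencode escapes the sentence and encodes spaces as '+'.
    query = urlencode({"sentence": sentence})
    return f"http://{host}:{port}/cgi-lilfes/enju?{query}"


def fetch_parse(sentence: str, port: int, host: str = "localhost",
                timeout: float = 10.0) -> str:
    """Send the query to a running Enju CGI server and return the response text.

    Assumes the response can be decoded as UTF-8.
    """
    with urlopen(enju_query_url(sentence, port, host), timeout=timeout) as response:
        return response.read().decode("utf-8")


# Example: print the URL for a server assumed to listen on port 8080.
print(enju_query_url("he runs the company", 8080))
```

Only enju_query_url runs without a server; fetch_parse needs a running "enju -cgi" process to connect to.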
For further details on the output formats, see the manuals and the technical report.
Unlike conventional parsers using CFGs, the default output of the parser is a set of predicate-argument relations, so the user can easily acquire semantic relations among words in an input sentence without the burden of analyzing its deep-syntactic structure.
Parsing examples are shown below. Each line in the output represents a predicate-argument relation between two words. For instance, the second line in the first example indicates that there is an "ARG1 (logical subject)" relation between the predicate "run" and the argument "he". Note that the same semantic relations holding among the three words, "he", "run", and "company", are obtained from sentences written in different syntactic structures.
ROOT | ROOT | ROOT | ROOT | -1 | ROOT | ROOT | runs | run | VBZ | VB | 1 |
runs | run | VBZ | VB | 1 | verb_arg12 | ARG1 | He | he | PRP | PRP | 0 |
runs | run | VBZ | VB | 1 | verb_arg12 | ARG2 | company | company | NN | NN | 3 |
the | the | DT | DT | 2 | det_arg1 | ARG1 | company | company | NN | NN | 3 |
ROOT | ROOT | ROOT | ROOT | -1 | ROOT | ROOT | is | be | VBZ | VB | 5 |
is | be | VBZ | VB | 5 | verb_arg12 | ARG1 | company | company | NN | NN | 1 |
is | be | VBZ | VB | 5 | verb_arg12 | ARG2 | small | small | JJ | JJ | 6 |
small | small | JJ | JJ | 6 | adj_arg1 | ARG1 | company | company | NN | NN | 1 |
The | the | DT | DT | 0 | det_arg1 | ARG1 | company | company | NN | NN | 1 |
that | that | IN | IN | 2 | relative_arg1 | ARG1 | company | company | NN | NN | 1 |
runs | run | VBZ | VB | 4 | verb_arg12 | ARG1 | he | he | PRP | PRP | 3 |
runs | run | VBZ | VB | 4 | verb_arg12 | ARG2 | company | company | NN | NN | 1 |
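To use these relations in a program, each line can be split into its fields. The following Python sketch assumes the layout shown above: twelve fields per line, separated by "|" as displayed on this page (the actual output files may use a different separator). The field names are inferred from the examples and are not official names.

```python
from typing import NamedTuple


class Relation(NamedTuple):
    # Field names are inferred from the examples above (assumption).
    pred: str           # predicate surface form, e.g. "runs"
    pred_base: str      # predicate base form, e.g. "run"
    pred_pos: str       # predicate POS tag, e.g. "VBZ"
    pred_base_pos: str  # POS of the base form, e.g. "VB"
    pred_index: int     # word position of the predicate (-1 for ROOT)
    pred_type: str      # predicate type, e.g. "verb_arg12"
    label: str          # argument label, e.g. "ARG1"
    arg: str            # argument surface form
    arg_base: str       # argument base form
    arg_pos: str        # argument POS tag
    arg_base_pos: str   # POS of the argument base form
    arg_index: int      # word position of the argument


def parse_relation(line: str) -> Relation:
    """Parse one predicate-argument line into a Relation."""
    # Drop the trailing separator, then split and trim each field.
    fields = [f.strip() for f in line.strip().rstrip("|").split("|")]
    if len(fields) != 12:
        raise ValueError(f"expected 12 fields, got {len(fields)}: {line!r}")
    fields[4] = int(fields[4])
    fields[11] = int(fields[11])
    return Relation(*fields)


sample = """\
runs | run | VBZ | VB | 1 | verb_arg12 | ARG1 | He | he | PRP | PRP | 0 |
runs | run | VBZ | VB | 1 | verb_arg12 | ARG2 | company | company | NN | NN | 3 |
"""
relations = [parse_relation(l) for l in sample.splitlines() if l.strip()]
triples = [(r.pred_base, r.label, r.arg_base) for r in relations]
print(triples)
```

Reducing each line to a (predicate base form, label, argument base form) triple makes it easy to check that the two example sentences share the relations among "he", "run", and "company".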
Enju can also output both phrase structures and predicate-argument structures in a quasi-XML format. The following pages show the phrase structure and the predicate-argument structure for the sentence "It's falling like a stone, said Danny Linger, a pit trader who was standing outside the London International Financial Futures Exchange."
Note: Firefox shows a graphical view, while Internet Explorer shows a bare XML document.
The online demo is available to see how Enju works.
A UIMA web interface for Enju is also available, so you can embed Enju in UIMA workflows.