Seminar — Stephen Clark
Speaker: | Dr Stephen Clark, Oxford University Computing Laboratory |
Title: | Porting a Lexicalized-Grammar Parser to the Biomedical Domain |
Date: | 11th June at 14:00 |
Location: | Room LG.010 in the MIB building |
Abstract: | In this talk I will describe the C&C parser, a state-of-the-art, linguistically motivated statistical parser based on Combinatory Categorial Grammar, and describe some experiments on adapting the parser to the biomedical domain. The parser was originally developed using the Penn Treebank and is therefore tuned to newspaper text. The proposed porting approach takes advantage of the lexicalized nature of CCG to train the parser at a lower level of representation than full syntactic derivations. The CCG parser uses three levels of representation: a first level consisting of part-of-speech tags; a second level consisting of more fine-grained CCG lexical categories; and a third, hierarchical level consisting of CCG derivations. We find that simply retraining the POS tagger on biomedical data leads to a large improvement in performance, and that using annotated data at the intermediate lexical category level of representation improves parsing accuracy further. The parsing accuracies obtained for biomedical data are in the same range as those reported for newspaper text, and higher than those previously reported for the biomedical resource on which the parser is evaluated. The conclusion is that porting newspaper parsers to the biomedical domain, at least for parsers which use lexicalized grammars, may not be as difficult as first thought. |