Seminar — Jian Su
Speaker: | Dr Jian Su, Institute for Infocomm Research (I2R), Singapore |
Title: | 1) Coreference Resolution in Biology Literature: a Machine Learning Approach 2) An effective method of using Web based information for Relation Extraction |
Date: | 23rd April 2008 at 12:00 |
Location: | Room MLG.001 (Lecture Theatre) in the MIB Building |
Abstract: | 1) Coreference Resolution: Coreference resolution, the process of identifying different mentions of the same entity, is a very important technology in a text-mining system. Only with it can text-mining systems, such as a protein-protein interaction extraction system, capture and link information expressed with nominal mentions (e.g. "this protein") and pronoun mentions (e.g. "it") in addition to name mentions (e.g. "P50"). Compared with work on news articles, existing studies of coreference resolution in biomedical texts are quite preliminary: they focus only on specific types of anaphors such as pronouns or definite noun phrases, use heuristic methods, and run on small data sets. Therefore, there is a need for an in-depth exploration of this task in the biomedical domain. In this talk, I'll present a learning-based approach to coreference resolution in the biomedical domain. In this study, we annotated a large-scale coreference corpus, MedCo, which consists of 1,999 MEDLINE abstracts in the GENIA data set. We further proposed a detailed framework for the coreference resolution task, in which we augmented the traditional learning model by incorporating non-anaphors into training. We also explored various sources of knowledge for coreference resolution, particularly those that can deal with the complexity of biomedical texts. The evaluation on our corpus showed promising results. We achieved a high precision of 86.2% with a reasonable recall of 63.9%, obtaining an F-measure of 73.4%. The results also suggested that our augmented learning model significantly boosts precision (up to 23.7%) without much loss in recall (less than 5%), which brings a gain of 8% in F-measure. 2) Relation Extraction: In this talk, I'll address our method that incorporates paraphrase information from the Web to boost the performance of a supervised relation extraction system. 
Contextual information is extracted from the Web using a semi-supervised process and summarized by skip-bigram overlap measures over the entire extract. This captures local contextual information as well as more distant associations. A statistically significant boost in relation extraction performance is observed. Two extensions, thematic clustering and hypernym expansion, are investigated. In tandem with thematic clustering to reduce noise in the paraphrase extraction, we attempt to increase the coverage of our search for paraphrases using hypernym expansion. Evaluation of our method on the ACE 2004 corpus shows that it outperforms the baseline SVM-based supervised learning algorithm across almost all major ACE relation types, by a margin of up to 31%. This approach could also be extended to relation extraction in biology literature, such as protein-protein interaction extraction. |
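As a quick check of the figures quoted in the first abstract, the sketch below recomputes the F-measure from the reported precision and recall. It assumes the abstract means the balanced F-measure (the harmonic mean of precision and recall), which the abstract does not state explicitly; the function name `f_measure` is illustrative only.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure: harmonic mean of precision and recall.

    Assumption: the abstract reports the balanced (beta = 1) variant.
    """
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Figures quoted in the abstract, in percent.
reported_precision = 86.2
reported_recall = 63.9

print(round(f_measure(reported_precision, reported_recall), 1))  # 73.4
```

Under that assumption, the recomputed value agrees with the 73.4% stated in the abstract.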
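The second abstract mentions skip-bigram overlap measures without defining them. The sketch below shows one common way such a measure can be computed: an F-score over shared skip-bigrams, in the style of ROUGE-S. This is an assumption for illustration, not necessarily the measure used in the talk, and the names `skip_bigrams`, `skip_bigram_overlap` and `max_gap` are hypothetical.

```python
from collections import Counter
from itertools import combinations


def skip_bigrams(tokens, max_gap=None):
    """Count ordered token pairs (a, b) where a occurs before b.

    max_gap limits how many tokens may be skipped between a and b;
    None means any distance is allowed.
    """
    counts = Counter()
    for i, j in combinations(range(len(tokens)), 2):
        if max_gap is None or j - i - 1 <= max_gap:
            counts[(tokens[i], tokens[j])] += 1
    return counts


def skip_bigram_overlap(reference, candidate, max_gap=None):
    """F-score over skip-bigrams shared by two token sequences.

    Assumption: a ROUGE-S style score; the abstract does not specify
    the exact overlap measure.
    """
    ref_counts = skip_bigrams(reference, max_gap)
    cand_counts = skip_bigrams(candidate, max_gap)
    # Counter intersection keeps the minimum count of each shared pair.
    matches = sum((ref_counts & cand_counts).values())
    if matches == 0:
        return 0.0
    precision = matches / sum(cand_counts.values())
    recall = matches / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)


a = "the protein binds the receptor".split()
b = "the protein activates the receptor".split()
print(round(skip_bigram_overlap(a, b), 2))  # 0.6
```

Because pairs may span intervening words, "protein ... receptor" still matches even though the verb between them differs, which is how a measure of this kind can pick up more distant associations alongside local context.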