Seminar — Jian Su
Speaker: | Dr Jian Su, Institute for Infocomm Research (I2R), Singapore |
Title: | 1) Coreference Resolution in Biology Literature: a Machine Learning Approach 2) An effective method of using Web based information for Relation Extraction |
Date: | 23rd April 2008 at 12:00 |
Location: | Room MLG.001 (Lecture Theatre) in the MIB Building |
Abstract: | 1) Coreference Resolution: Coreference resolution, the process of identifying different mentions of the same entity, is a very important technology in a text-mining system. Only with it can text-mining systems, such as a protein-protein interaction extraction system, capture and link information expressed through nominal mentions (e.g. "this protein") and pronoun mentions (e.g. "it") in addition to name mentions (e.g. "P50"). Compared with work on news articles, existing studies of coreference resolution in biomedical texts are quite preliminary: they focus only on specific types of anaphors such as pronouns or definite noun phrases, use heuristic methods, and run on small data sets. There is therefore a need for an in-depth exploration of this task in the biomedical domain. In this talk, I'll present a learning-based approach to coreference resolution in the biomedical domain. In this study, we annotated a large-scale coreference corpus, MedCo, which consists of 1,999 MEDLINE abstracts from the GENIA data set. We further proposed a detailed framework for the coreference resolution task, in which we augmented the traditional learning model by incorporating non-anaphors into training. We also explored various sources of knowledge for coreference resolution, particularly those that can deal with the complexity of biomedical texts. The evaluation on our corpus showed promising results. We achieved a high precision of 86.2% with a reasonable recall of 63.9%, giving an F-measure of 73.4%. The results also suggested that our augmented learning model significantly boosts precision (by up to 23.7%) without much loss in recall (less than 5%), which brings a gain of 8% in F-measure.
2) Relation Extraction: In this talk, I'll present our method, which incorporates paraphrase information from the Web to boost the performance of a supervised relation extraction system. Contextual information is extracted from the Web using a semi-supervised process and summarized by skip-bigram overlap measures over the entire extract. This captures local contextual information as well as more distant associations. A statistically significant boost in relation extraction performance is observed. Two extensions, thematic clustering and hypernym expansion, are investigated. In tandem with thematic clustering, which reduces noise in the paraphrase extraction, we attempt to increase the coverage of our search for paraphrases using hypernym expansion. Evaluation of our method on the ACE 2004 corpus shows that it outperforms the baseline SVM-based supervised learning algorithm across almost all major ACE relation types, by a margin of up to 31%. This approach could also be extended to relation extraction in biology literature, such as protein-protein interaction extraction. |
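The abstract mentions skip-bigram overlap measures without giving their exact form. As a rough illustration only, the sketch below assumes the common definition of a skip-bigram (any ordered pair of tokens from the same sequence, optionally limited by a maximum gap) and scores overlap as the harmonic mean of precision and recall over matched pairs. The function names, the `max_gap` parameter, and the scoring formula are assumptions made for this example, not the speaker's actual method.

```python
from collections import Counter
from itertools import combinations


def skip_bigrams(tokens, max_gap=None):
    """Count ordered token pairs (tokens[i], tokens[j]) with i < j.

    max_gap is a hypothetical parameter: the largest number of tokens
    allowed between the two members of a pair (None means no limit).
    """
    pairs = Counter()
    for i, j in combinations(range(len(tokens)), 2):
        if max_gap is None or j - i - 1 <= max_gap:
            pairs[(tokens[i], tokens[j])] += 1
    return pairs


def skip_bigram_overlap(reference, candidate, max_gap=None):
    """F1-style overlap between the skip-bigrams of two token lists.

    Assumed scoring: matched pairs are the multiset intersection;
    recall is measured against the reference, precision against the
    candidate. Returns 0.0 when nothing matches.
    """
    ref_pairs = skip_bigrams(reference, max_gap)
    cand_pairs = skip_bigrams(candidate, max_gap)
    matched = sum((ref_pairs & cand_pairs).values())
    if matched == 0:
        return 0.0
    precision = matched / sum(cand_pairs.values())
    recall = matched / sum(ref_pairs.values())
    return 2 * precision * recall / (precision + recall)
```

Because pairs may span intervening words, two contexts such as `"the protein binds the receptor".split()` and `"the protein strongly binds the receptor".split()` still share most of their skip-bigrams even though an extra word breaks up the adjacent bigrams, which is one way to read the abstract's point about capturing both local context and more distant associations.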