Seminar - Sreeram Balakrishnan
Speaker: | Sreeram Balakrishnan (Manager, Unstructured Information Management Architecture group, IBM India Research Lab) |
Title: | Factoid Extraction from the Web |
Date: | 12:30, Thursday 26th May |
Location: | Room F10, MSS Building |
Abstract: | The World Wide Web has grown into an information mesh, with the most important facts being reported through websites. While information is plentiful, it is largely unstructured, which makes it difficult to deploy an automated information retrieval system that extracts useful factoids. We present a new method for extracting relevant factoids from unstructured Web data (hypertext). A factoid is a news item that may be of interest with respect to a particular category, such as a change in leadership; in our case, factoids are motivated by corporate or market changes that can be used for market intelligence. We associate each factoid with a snippet of natural language text. For a given category, factoid extraction is formulated as a two-class classification problem. Feature abstraction using named entity annotations is used to ameliorate the data sparsity problem. We present a method for learning a category-specific classifier from a set of pure, hand-labelled positive instances and noisy positive instances generated by smartly querying the Web. The system is evaluated on two factoid categories: corporate leadership changes, and mergers & acquisitions. The experiments yield promising empirical results. Time permitting, I would also like to discuss IBM's open-source text analytics platform, UIMA. |