CHR Dataset
The CHemical Reactions dataset (CHR) is a distantly supervised dataset dealing with binary interactions between chemicals.
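As a rough illustration of what distant supervision means here, the sketch below labels pairs of chemicals by looking them up in a knowledge base instead of relying on manual annotation. It is not the code used to build CHR. The function name `label_pairs`, the in-memory set standing in for the knowledge base, the choice to treat pairs as unordered, and the toy identifiers are all assumptions made for the example.

```python
from itertools import combinations


def label_pairs(chemicals, related_pairs):
    """Label every unordered pair of chemicals found in one abstract.

    chemicals: iterable of chemical identifiers annotated in the abstract.
    related_pairs: set of frozensets, each holding two identifiers that the
        knowledge base lists as related (a hypothetical in-memory stand-in
        for a real graph database lookup).
    Returns a dict mapping (a, b) tuples to 1 (positive) or 0 (negative).
    """
    unique = sorted(set(chemicals))  # drop repeated mentions of a chemical
    labels = {}
    for a, b in combinations(unique, 2):
        # Positive if the knowledge base relates the two chemicals,
        # negative otherwise.
        labels[(a, b)] = 1 if frozenset((a, b)) in related_pairs else 0
    return labels


# Toy data, invented for illustration only.
toy_kb = {frozenset(("chem_A", "chem_B"))}
print(label_pairs(["chem_B", "chem_A", "chem_C", "chem_A"], toy_kb))
```

With the toy data, only the pair of `chem_A` and `chem_B` is labelled positive; the two pairs involving `chem_C` are labelled negative because the toy knowledge base does not relate them.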
Description
The dataset consists of 12,094 PubMed abstracts and their titles. Chemicals were annotated using the back-end of the semantic faceted search engine Thalia. Chemical compounds were then selected from the annotated entities and aligned with the graph database Biochem4j, a freely available database that integrates several resources such as UniProt, KEGG and the NCBI Taxonomy. If two chemical entities were identified as related in Biochem4j, the pair was treated as a positive instance in the dataset; otherwise it was treated as a negative instance. In total, the corpus contains over 100,000 annotated chemicals and 30,000 reactions.
Download
The CHR dataset is available for download. Please observe the terms of the licence if you use the dataset.
Licence
The annotations in the CHR dataset were created at the National Centre for Text Mining (NaCTeM), School of Computer Science, University of Manchester, UK. They are licensed under a Creative Commons Attribution 4.0 International License.
PLEASE ATTRIBUTE NaCTeM WHEN USING THE CORPUS, AND PLEASE CITE THE FOLLOWING ARTICLE:
Sunil K Sahu, Fenia Christopoulou, Makoto Miwa and Sophia Ananiadou. 2019. Inter-sentence Relation Extraction with Document-level Graph Convolutional Neural Network. In Proceedings of ACL.
References
Sunil K Sahu, Fenia Christopoulou, Makoto Miwa and Sophia Ananiadou. 2019. Inter-sentence Relation Extraction with Document-level Graph Convolutional Neural Network. In Proceedings of ACL.
Axel J Soto, Piotr Przybyła and Sophia Ananiadou. 2018. Thalia: Semantic search engine for biomedical abstracts. Bioinformatics, 35(10): 1799-1801.
Neil Swainston, Riza Batista-Navarro, Pablo Carbonell, Paul D Dobson, Mark Dunstan, Adrian J Jervis, Maria Vinaixa, Alan R Williams, Sophia Ananiadou, Jean-Loup Faulon et al. 2017. biochem4j: Integrated and extensible biochemical knowledge through graph databases. PLOS ONE, 12(7): e0179130.