Academic Paper

Cause and Effect in Governmental Reports: Two Data Sets for Causality Detection in Swedish
Document Type
Source
Datorlabbet resultaten i staten. Proceedings of the First Workshop on Natural Language Processing for Political Sciences (PoliticalNLP), pp. 46-55
Subject
Causality detection
dataset
cross-lingual transfer
Datorlingvistik
Computational Linguistics
Language
English
Abstract
Causality detection is the task of extracting information about causal relations from text. It is an important task for different types of document analysis, including political impact assessment. We present two new data sets for causality detection in Swedish. The first data set is annotated with binary relevance judgments, indicating whether a sentence contains causality information or not. In the second data set, sentence pairs are ranked for relevance with respect to a causality query, containing a specific hypothesized cause and/or effect. Both data sets are carefully curated and mainly intended for use as test data. We describe the data sets and their annotation, including detailed annotation guidelines. In addition, we present pilot experiments on cross-lingual zero-shot and few-shot causality detection, using training data from English and German.