Corpus-based Contrastive Analysis of Discourse Connectives: Methods and Findings
In my talk, I will present empirical investigations of discourse connectives carried out within the COMTIS project, a large SNF-funded project that deals with textual coherence in statistical machine translation (SMT). Discourse connectives such as English "while" and "because", or French "puisque", "alors que", and "tant que", are frequently used to mark textual coherence. They are very often multifunctional (they can convey different discourse relations at the same time), and describing their meaning(s) is a real challenge, particularly in multilingual contexts. After a brief presentation of the COMTIS project, I will show why a corpus-based contrastive analysis is needed as a first step towards SMT implementation, in order to highlight similarities and divergences between discourse connectives in various languages. I will discuss methodological issues in using corpora for contrastive analysis, and will present the steps of the method we have developed. Using two distinct analyses as examples (the English connective "while" and the causal connectives in French and English), I will argue that contrastive analysis may not only account for the equivalence of discourse connectives between two languages, but also bring new descriptive information on each of the compared languages taken individually.