Speculation and Negation Annotation for Arabic Biomedical Texts: BioArabic Corpus

August 11, 2018

Fatima T. AL-Khawaldeh
Department of Computer Science, Al-Albayt University, Al-Mafraq, Jordan.

Abstract—Negation and speculation are two common linguistic phenomena in natural language processing whose handling requires a deeper semantic understanding of text. Both are used to determine the factuality of a statement: negation expresses the opposite of what is stated, and speculation indicates the degree of certainty. Biomedical text mining is the main natural language processing application concerned with negation and speculation, where they are used to distinguish facts from uncertain or negated information in biomedical text. To our knowledge, there is no previous research on annotating Arabic biomedical text to identify negative or speculative expressions, and there are no publicly available standard corpora of suitable size that are practical for evaluating tools for the automatic detection of negation and speculation and for scope determination. This paper presents a corpus annotated for negation and speculation in Arabic biomedical texts, which we call the BioArabic corpus. The goal of building the BioArabic corpus is to help biologists and computational linguists who develop tools for identifying negation and speculation to train and evaluate these tools, since assumptions, experimental results, and negative results are used extensively in biomedical language. We report statistics on corpus size and on the consistency of the annotations.

Keywords-Arabic NLP; negation; speculation; biomedical (medical and biological); cues; certainty.

