1) Doctor of Philology, Professor, Head of the Laboratory of Language Convergence, Higher School of Economics – Saint-Petersburg, Russia, Saint-Petersburg, akolmogorova@hse.ru
2) Junior Testing Specialist, Yandex Crowd Limited Liability Company, Russia, Moscow, va.khleb@yandex.ru
The paper examines the results of applying a methodology known as Bayesian Truth Serum (BTS) to the task of emotional markup of texts for training neural network models. Bayesian truth serum is used in experimental surveys to which the category of truth is not applicable: the results obtained in such surveys cannot be verified by comparison with a standard, since no such standard exists. Under this framework, informants are first asked to evaluate a phenomenon from their own point of view and then to predict which answer (or assessment) the largest percentage of other respondents to the same questionnaire will choose. Emotional markup of texts is likewise a task in which the true emotion is unknown, but the truthfulness of informants can be stimulated using the BTS method. We applied this methodology to evaluate 300 emotional texts retrieved from the group “Overheard” on VKontakte. 120 informants took part in the emotional markup procedure, which was carried out with tenfold coverage. The markup design was based on the Russell–Mehrabian PAD model: informants were asked to assess what emotions the author of the text was experiencing, using three eleven-point scales (Pleasure, Arousal, Dominance). When processing the results, we compared the average values of the standard deviation of the personal and predicted estimates given by the informants on each of the three scales. We then formed subcorpora of the texts with the greatest inconsistency between personal and predicted estimates and analyzed word frequencies in each of the subcorpora. In addition, using the emotional tags under which the texts were published on VK, we calculated the “weight” of eight emotions for the texts with the greatest discrepancies.
The main hypothesis was that the greatest inconsistency between personal and predicted assessments would be found in texts that describe a situation of social interaction in which the display of emotional behavior is regulated by implicit rules. The study led to the following conclusions: 1) the spread of personal estimates does not differ statistically significantly from the spread of predicted ones; 2) the texts with the greatest discrepancy between personal and predicted emotional assessment concern three main types of situations: relationships within a couple (husband – wife, boyfriend – girlfriend), mother-child relationships, and deviant behavior that poses a threat to the safety of the family and other members of society; 3) the largest number of texts with the greatest rating discrepancies are marked with emotional hashtags associated with the emotions of fear, disgust, surprise, excitement and sadness. Future work will continue the markup experiments on different samples of informants.
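The comparison of personal and predicted estimates described above can be illustrated with a minimal Python sketch. It is not the authors' processing code and does not implement BTS scoring itself; it only shows, under stated assumptions, how the per-text spread and the personal-versus-predicted discrepancy might be computed for one PAD scale. The data layout, the function names (`mean_spread`, `discrepancy`, `rank_by_discrepancy`) and all rating values are hypothetical.

```python
from statistics import mean, pstdev

# Hypothetical ratings on a single PAD scale (for example, Pleasure).
# Each text has a list of personal estimates and a list of predicted
# estimates; the numbers are invented for illustration only.
ratings = {
    "text_a": {"personal": [2, 3, 2, 4], "predicted": [2, 3, 3, 3]},
    "text_b": {"personal": [-4, 5, -3, 4], "predicted": [1, 0, 1, 2]},
}


def mean_spread(data, kind):
    """Average, over all texts, of the standard deviation of one kind
    of estimate ("personal" or "predicted")."""
    return mean(pstdev(entry[kind]) for entry in data.values())


def discrepancy(entry):
    """Absolute difference between the mean personal estimate and the
    mean predicted estimate for one text (an assumed measure)."""
    return abs(mean(entry["personal"]) - mean(entry["predicted"]))


def rank_by_discrepancy(data):
    """Text identifiers ordered from largest to smallest discrepancy."""
    return sorted(data, key=lambda t: discrepancy(data[t]), reverse=True)


personal_spread = mean_spread(ratings, "personal")
predicted_spread = mean_spread(ratings, "predicted")
ranking = rank_by_discrepancy(ratings)
```

In this sketch, the texts at the top of `ranking` would be the candidates for a subcorpus of texts with the greatest inconsistency; whether the two spreads differ significantly would still need a separate statistical test, which is not shown here.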
emotional texts; markup; Bayesian truth serum; emotion detection; emotion PAD model.
For citing: Kolmogorova A.V., Khlebnikova V.A. (2025) Will Bayesian truth serum help to increase the reliability of markup of emotional texts? (case study). Human being: Image and essence. Humanitarian aspects. Moscow. INION RAN. Vol. 2 (62). pp. 45-68. DOI: 10.31249/chel/2025.02.03