<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt>
        <title type="main" level="a">An experimental annotation task to investigate annotators’ subjectivity in a Misogyny dataset</title>
        <author>
          <persName n="1">
            <forename>Alice</forename>
            <surname>Tontodimamma</surname>
            <placeName type="affiliation">University of Chieti-Pescara G. D'Annunzio, Italy</placeName>
          </persName>
          <persName n="2" ref="https://orcid.org/0009-0000-5408-0104" type="ORCID">
            <forename>Stefano</forename>
            <surname>Anzani</surname>
            <placeName type="affiliation">University of Chieti-Pescara G. D'Annunzio, Italy</placeName>
          </persName>
          <persName n="3" ref="https://orcid.org/0000-0001-9337-7250" type="ORCID">
            <forename>Marco Antonio</forename>
            <surname>Stranisci</surname>
            <placeName type="affiliation">University of Turin, Italy</placeName>
          </persName>
          <persName n="4" ref="https://orcid.org/0000-0001-8110-6832" type="ORCID">
            <forename>Valerio</forename>
            <surname>Basile</surname>
            <placeName type="affiliation">University of Turin, Italy</placeName>
          </persName>
          <persName n="5">
            <forename>Elisa</forename>
            <surname>Ignazzi</surname>
            <placeName type="affiliation">University of Chieti-Pescara G. D'Annunzio, Italy</placeName>
          </persName>
          <persName n="6" ref="https://orcid.org/0000-0002-5441-0035" type="ORCID">
            <forename>Lara</forename>
            <surname>Fontanella</surname>
            <placeName type="affiliation">University of Chieti-Pescara G. D'Annunzio, Italy</placeName>
          </persName>
        </author>
        <respStmt>
          <resp>This is a section of <title>ASA 2022 Data-Driven Decision Making</title> (DOI: <idno type="DOI">10.36253/979-12-215-0106-3</idno>) by </resp>
          <name>Enrico di Bella, Luigi Fabbris, Corrado Lagazio</name>
        </respStmt>
      </titleStmt>
      <publicationStmt>
        <publisher>Firenze University Press</publisher>
        <pubPlace>Firenze</pubPlace>
        <date when="2023">2023</date>
        <idno type="DOI">https://doi.org/10.36253/979-12-215-0106-3.49</idno>
        <availability>
          <p>Available for academic research purposes</p>
          <p>Open Access</p>
          <p>Copyright Author(s)</p>
          <licence source="text" target="https://creativecommons.org/licenses/by/4.0/legalcode">
            <p>Content licence CC BY 4.0</p>
          </licence>
          <licence source="metadata" target="https://creativecommons.org/publicdomain/zero/1.0/legalcode">
            <p>Metadata licence CC0 1.0</p>
          </licence>
        </availability>
      </publicationStmt>
      <sourceDesc>
        <p>This is original content, published for academic research purposes</p>
      </sourceDesc>
    </fileDesc>
    <encodingDesc>
      <appInfo>
        <application version="2.2" ident="Booksflow">
          <desc>Digital edition XML powered by Booksflow</desc>
        </application>
      </appInfo>
    </encodingDesc>
    <profileDesc>
      <abstract xml:lang="en">
        <p>In recent years, hatred directed against women has spread exponentially, especially in online social media. Although this alarming phenomenon has given rise to many studies both from the viewpoint of computational linguistics and from that of machine learning, less effort has been devoted to analysing whether models for the detection of misogyny are affected by bias. An emerging topic that challenges traditional approaches for the creation of corpora is the presence of social bias in natural language processing (NLP). Many NLP tasks are subjective, in the sense that a variety of valid beliefs exist about what the correct data labels should be; some tasks, for example misogyny detection, are highly subjective, as different people have very different views about what should or should not be labelled as misogynous. An increasing number of scholars have proposed strategies for assessing the subjectivity of annotators, in order to reduce bias both in computational resources and in NLP models. In this work, we present two corpora: a corpus of messages posted on Twitter after the liberation of Silvia Romano on the 9th of May, 2020, and a corpus of comments constructed from Facebook posts that contained misogyny, developed through an experimental annotation task to explore annotators’ subjectivity. For a given comment, the annotation procedure consists of selecting one or more chunks of the text that are regarded as misogynistic and establishing whether a gender stereotype is present. Each comment is annotated by at least three annotators in order to better analyse their subjectivity. The annotation process was carried out by trainees engaged in an internship program. We propose a qualitative-quantitative analysis of the resulting corpus, which may include non-harmonised annotations.</p>
      </abstract>
      <textClass>
        <keywords>
          <list>
            <item>subjectivity</item>
            <item>misogyny</item>
            <item>disagreement</item>
            <item>social bias</item>
          </list>
        </keywords>
      </textClass>
    </profileDesc>
  </teiHeader>
  <text>
    <body>
      <p>It is available online at https://doi.org/10.36253/979-12-215-0106-3.49<ref target="https://doi.org/10.36253/979-12-215-0106-3.49" /></p>
      <div>
        <listBibl>
          <head>References</head>
          <bibl n="112233">Basile, V. (2020). It’s the end of the gold standard as we know it. On the impact of pre-aggregation on the evaluation of highly subjective tasks. In 2020 AIxIA Discussion Papers Workshop, AIxIA 2020 DP (Vol. 2776, pp. 31-40). CEUR-WS.</bibl>
          <bibl n="112234">Basile, V., Fell, M., Fornaciari, T., Hovy, D., Paun, S., Plank, B., ... &amp; Uma, A. (2021). We need to consider disagreement in evaluation. In 1st Workshop on Benchmarking: Past, Present and Future (pp. 15-21). Association for Computational Linguistics.</bibl>
          <bibl n="112235">Beigman Klebanov B., Beigman E., and Diermeier D. 2008. Analyzing disagreements. In Coling 2008: Proceedings of the workshop on Human Judgements in Computational Linguistics, pages 2–7, Manchester, UK. Coling 2008 Organizing Committee.</bibl>
          <bibl n="112236">Bowman, S. R., &amp; Dahl, G. E. (2021). What Will it Take to Fix Benchmarking in Natural Language Understanding? arXiv preprint arXiv:2104.02145.</bibl>
          <bibl n="112237">Davani, A. M., Díaz, M., &amp; Prabhakaran, V. (2022). Dealing with disagreements: Looking beyond the majority vote in subjective annotations. Transactions of the Association for Computational Linguistics, 10, 92-110.</bibl>
          <bibl n="112238">Fleiss, J. L., Cohen, J., &amp; Everitt, B. S. (1969). Large sample standard errors of kappa and weighted kappa. Psychological Bulletin, 72(5), 323.</bibl>
          <bibl n="112239">Landis JR., Koch GG. 1977. The measurement of observer agreement for categorical data. Biometrics. 1977 Mar;33(1):159-74. PMID: 843571.</bibl>
          <bibl n="112240">Lehnert, W., Cardie, C., Fisher, D., McCarthy, J., Riloff, E., &amp; Soderland, S. (1992). University of Massachusetts: MUC-4 test results and analysis. In Fourth Message Understanding Conference (MUC-4): Proceedings of a Conference Held in McLean, Virginia,</bibl>
          <bibl n="112241">Nozza, D., Volpetti, C., &amp; Fersini, E. (2019, October). Unintended bias in misogyny detection. In IEEE/WIC/ACM International Conference on Web Intelligence (pp. 149-155).</bibl>
          <bibl n="112242">Pavlopoulos, J., Sorensen, J., Laugier, L., &amp; Androutsopoulos, I. (2021, August). SemEval-2021 task 5: Toxic spans detection. In Proceedings of the 15th International Workshop on Semantic Evaluation (SemEval-2021) (pp. 59-69).</bibl>
          <bibl n="112243">Uma, A., Fornaciari, T., Dumitrache, A., Miller, T., Chamberlain, J., Plank, B., ... &amp; Poesio, M. (2021). SemEval-2021 task 12: Learning with disagreements. In Proceedings of the 15th International Workshop on Semantic Evaluation (SemEval-2021) (pp. 338-3</bibl>
          <bibl n="112244">
            <bibl>Tontodimamma A., Fontanella L., Anzani S., Basile V. (2022). An Italian lexical resource for incivility detection in online discourses. Quality &amp; Quantity.</bibl>
            <idno type="DOI">10.1007/s11135-022-01494-7</idno>
          </bibl>
        </listBibl>
      </div>
    </body>
  </text>
</TEI>