Evaluating Rhyme Annotations for Large Corpora: Metrics and Data

Main author: Baley, Julien
Format: Journal Article

id eprints-39551
recordtype eprints
institution SOAS, University of London
collection SOAS Research Online
language English
language_search English
description Methods have recently been proposed to produce automatic rhyme annotators for large rhymed corpora. These methods, such as Baley (2022b), greatly reduce the cost of annotating rhymed material, allowing historical linguists to focus on the analysis of the rhyme patterns. However, evidence for the quality of those annotations has been anecdotal, consisting of a handful of case studies of individual poems. This paper addresses the issue: first, we discuss previously proposed metrics that evaluate the quality of an annotator’s output against a ground-truth annotation (List, Hill, and Foster 2019), and we propose an alternative metric that is better suited to the task. Then, sampling from Baley’s published annotated corpus and re-annotating the sample by hand, we use it to demonstrate the lacunae in the original approach and show how to fix them. Finally, the hand-annotated sample and source code are published as additional data, so that other researchers can compare the performance of their own annotators.
format Journal Article
author Baley, Julien
author_facet Baley, Julien
authorStr Baley, Julien
author_letter Baley, Julien
title Evaluating Rhyme Annotations for Large Corpora: Metrics and Data
publisher Brill
publishDate 2023
url https://eprints.soas.ac.uk/39551/