Staff view:

Printed Text Recognition for Lexical Lists in Chinese-International Phonetic Alphabet (IPA) Glossing

Main author:	Hill, Nathan W.
Other authors:	Li, Shihua
Format:	Journal Article
Online access:	https://eprints.soas.ac.uk/40719/

id	eprints-40719
recordtype	eprints
institution	SOAS, University of London
collection	SOAS Research Online
language	English
language_search	English
description	This study presents a dataset serving as a benchmark for the recognition of printed text in lexical lists using Chinese-IPA glossing. The paper provides an overview of the baseline model, transcription model, and PyLaia engines employed in the research. Furthermore, it elucidates the specific need for digitizing the aforementioned lexical lists, outlines the methodology employed for training the baseline model for layout analysis, and describes the training process of the transcription model using the ground truth data generated on Transkribus. This comprehensive approach encompasses both the images of the lexical list content and their corresponding transcriptions as input. Additionally, the study highlights the limitations of the model and identifies avenues for future development. By making this dataset openly accessible, it can be utilized by researchers seeking to digitize lexical lists using Chinese-IPA glossing. Moreover, since the model can recognize both Chinese characters and IPA symbols, it has the potential to contribute to linguistic analysis of languages documented in Chinese-IPA glossing.
format	Journal Article
author	Hill, Nathan W.
author_facet	Hill, Nathan W. Li, Shihua
authorStr	Hill, Nathan W.
author_letter	Hill, Nathan W.
author2	Li, Shihua
author2Str	Li, Shihua
title	Printed Text Recognition for Lexical Lists in Chinese-International Phonetic Alphabet (IPA) Glossing
publisher	Ubiquity Press
publishDate	2023
url	https://eprints.soas.ac.uk/40719/