Smart lexicography for low-resource languages: lessons learned from Sanskrit and Tibetan

Main author: Lugli, Ligeia
Format: Book Chapters


id eprints-31849
recordtype eprints
institution SOAS, University of London
collection SOAS Research Online
language English
language_search English
description Traditional lexicography requires titanic efforts and enormous resources. For many languages, such resources have never been available. As a result, they have received only limited lexicographic coverage. Today, these languages can take advantage of many of the same digital tools and strategies that have simplified and expedited dictionary-making for mainstream languages. However, the resource gap remains evident even in the digital era, with basic corpus processing tasks that lie at the foundation of contemporary ‘smart lexicography’ still constituting a challenge for many under-resourced languages. Drawing on my own experience in Sanskrit and Tibetan lexicography, this paper aims to offer some guidance as to the advantages and limitations of the application of smart lexicography to under-resourced languages. In particular, this paper suggests that in order to optimize resources, it may be advisable to prioritize high-quality lexical annotation of the corpus over highly curated dictionary entries, and to let digital tools take care of the lexicographic representation of the annotated linguistic information.
author_additional Kosem, Iztok
author_additionalStr Kosem, Iztok
format Book Chapters
author Lugli, Ligeia
author_facet Lugli, Ligeia
authorStr Lugli, Ligeia
author_letter Lugli, Ligeia
title Smart lexicography for low-resource languages: lessons learned from Sanskrit and Tibetan
publisher Lexical Computing CZ
publishDate 2019
url https://eprints.soas.ac.uk/31849/