Segmenting and POS tagging Classical Tibetan using a memory-based tagger

Main author: Hill, Nathan W.
Other authors: Meelen, Marieke
Format: Journal Article


id eprints-25373
recordtype eprints
institution SOAS, University of London
collection SOAS Research Online
language English
description This paper presents a new approach to two challenging NLP tasks in Classical Tibetan: word segmentation and part-of-speech (POS) tagging. We demonstrate how both problems can be approached in the same way, by generating a memory-based tagger that assigns (1) segmentation tags and (2) POS tags to a test corpus consisting of unsegmented lines of Tibetan characters. We propose a three-stage workflow and evaluate the results of both the segmentation and the POS tagging tasks. We argue that the Memory-Based Tagger (MBT) and the proposed workflow not only provide an adequate solution to these NLP challenges but are also highly efficient tools for building larger annotated corpora of Tibetan.
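The key idea in the description is that segmentation becomes a tagging problem: each syllable receives a tag saying whether it starts a new word, after which the same kind of tagger can assign POS tags to the resulting words. The sketch below illustrates that idea under several assumptions that are not taken from the paper: syllables are written in Wylie transliteration separated by spaces (Unicode Tibetan would use the tsheg instead), the segmentation tags are a hypothetical "B"/"I" scheme, the POS labels are made up, and the classifier is a bare 1-nearest-neighbour lookup with an unweighted overlap metric. The real MBT is built on TiMBL and uses richer features and feature weighting, so this is only a toy model of memory-based tagging, not the authors' workflow.

```python
from collections import Counter
from typing import List, Sequence, Tuple

PAD = "_"  # placeholder for context positions beyond the edges of a line


def words_to_tags(words: Sequence[str]) -> Tuple[List[str], List[str]]:
    """Turn segmented words (syllables joined by spaces) into syllables plus tags.

    Hypothetical tag set: "B" = syllable begins a word, "I" = syllable continues it.
    """
    syllables: List[str] = []
    tags: List[str] = []
    for word in words:
        for i, syl in enumerate(word.split()):
            syllables.append(syl)
            tags.append("B" if i == 0 else "I")
    return syllables, tags


def tags_to_words(syllables: Sequence[str], tags: Sequence[str]) -> List[str]:
    """Inverse of words_to_tags: start a new word at every "B" tag."""
    words: List[List[str]] = []
    for syl, tag in zip(syllables, tags):
        if tag == "B" or not words:
            words.append([syl])
        else:
            words[-1].append(syl)
    return [" ".join(w) for w in words]


def window(tokens: Sequence[str], i: int, width: int) -> Tuple[str, ...]:
    """Feature vector: the focus token plus `width` tokens of context on each side."""
    padded = [PAD] * width + list(tokens) + [PAD] * width
    return tuple(padded[i : i + 2 * width + 1])


class MemoryBasedTagger:
    """Toy memory-based tagger: keep every training instance in memory, then give
    each new token the tag of its most similar stored instance (1-NN; similarity =
    number of matching feature positions; ties go to the most frequent tag)."""

    def __init__(self, width: int = 1) -> None:
        self.width = width
        self.memory: List[Tuple[Tuple[str, ...], str]] = []

    def train(self, corpus: Sequence[Tuple[Sequence[str], Sequence[str]]]) -> None:
        # Learning is just storage: one (context window, tag) pair per token.
        for tokens, tags in corpus:
            for i, tag in enumerate(tags):
                self.memory.append((window(tokens, i, self.width), tag))

    def _classify(self, feats: Tuple[str, ...]) -> str:
        best, votes = -1, Counter()
        for stored, tag in self.memory:
            score = sum(a == b for a, b in zip(feats, stored))
            if score > best:
                best, votes = score, Counter({tag: 1})
            elif score == best:
                votes[tag] += 1
        return votes.most_common(1)[0][0]

    def tag(self, tokens: Sequence[str]) -> List[str]:
        return [self._classify(window(tokens, i, self.width)) for i in range(len(tokens))]


# Toy training data in Wylie transliteration (illustrative only, not from the paper).
train_words = [
    ["bla ma", "la", "phyag 'tshal", "lo"],
    ["sangs rgyas", "la", "phyag 'tshal", "lo"],
]
pos = ["NOUN", "CASE", "VERB", "FINAL"]  # hypothetical POS labels

# Task 1: segmentation as tagging of syllables.
segmenter = MemoryBasedTagger()
segmenter.train([words_to_tags(ws) for ws in train_words])
syllables = "bla ma sangs rgyas".split()  # an unsegmented, unseen line
print(tags_to_words(syllables, segmenter.tag(syllables)))  # ['bla ma', 'sangs rgyas']

# Task 2: POS tagging with the same machinery, now over words.
pos_tagger = MemoryBasedTagger()
pos_tagger.train([(ws, pos) for ws in train_words])
print(pos_tagger.tag(["bla ma", "la"]))  # ['NOUN', 'CASE']
```

The same class handles both tasks; only the token units (syllables vs. words) and the tag sets change, which mirrors the "same way" claim in the description. Note that the segmenter correctly splits "bla ma sangs rgyas" even though that sequence never occurs in the toy training data, because each syllable's nearest stored context still determines its tag.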
publisher University of California
publishDate 2017
url https://eprints.soas.ac.uk/25373/