id |
eprints-25373
|
recordtype |
eprints
|
institution |
SOAS, University of London
|
collection |
SOAS Research Online
|
language |
English
|
language_search |
English
|
description |
This paper presents a new approach to two challenging NLP tasks in Classical Tibetan: word segmentation and Part-of-Speech (POS) tagging. We demonstrate how both these problems can be approached in the same way, by generating a memory-based tagger that assigns 1) segmentation tags and 2) POS tags to a test corpus consisting of unsegmented lines of Tibetan characters. We propose a three-stage workflow and evaluate the results of both the segmenting and the POS tagging tasks. We argue that the Memory-Based Tagger (MBT) and the proposed workflow not only provide an adequate solution to these NLP challenges but are also highly efficient tools for building larger annotated corpora of Tibetan.
|
format |
Journal Article
|
author |
Hill, Nathan W.
|
author_facet |
Hill, Nathan W.
Meelen, Marieke
|
authorStr |
Hill, Nathan W.
|
author_letter |
Hill, Nathan W.
|
author2 |
Meelen, Marieke
|
author2Str |
Meelen, Marieke
|
title |
Segmenting and POS tagging Classical Tibetan using a memory-based tagger
|
publisher |
University of California
|
publishDate |
2017
|
url |
https://eprints.soas.ac.uk/25373/
|