A complex NLP system consists of a pipeline of several fundamental subtasks, e.g. word segmentation, POS tagging, named entity recognition, and syntactic parsing. In the traditional approach, each subtask is trained separately, on the assumption that each will reach its best accuracy in isolation. Because errors made in one subtask propagate to the subtasks downstream, the upper bound of the system’s accuracy is the product of the subtasks’ accuracies. This is a bottleneck in boosting the system’s accuracy.
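To make the product bound concrete, here is a minimal sketch. The per-stage accuracies are hypothetical values chosen only for illustration, and the calculation assumes that stage errors are independent and that every stage must be correct for the final output to be correct.

```python
from math import prod

# Hypothetical per-stage accuracies (illustrative only, not measured results).
stage_accuracy = {
    "word_segmentation": 0.95,
    "pos_tagging": 0.95,
    "named_entity_recognition": 0.90,
}


def pipeline_upper_bound(accuracies):
    """Upper bound on end-to-end accuracy when all stages must be correct
    and their errors are assumed to be independent."""
    return prod(accuracies)


bound = pipeline_upper_bound(stage_accuracy.values())
print(f"Upper bound on pipeline accuracy: {bound:.3f}")
```

Even though every stage in this made-up example is at least 90% accurate on its own, the bound for the whole pipeline falls to roughly 81%.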
One way to alleviate this performance bottleneck is to share information among these related subtasks by constructing a joint model. For example, word segmentation performs better if it can use the contextual pattern of POS tags. Conversely, POS tagging performs better if word segmentation is correct. A joint model of word segmentation and POS tagging allows bidirectional information flow between these subtasks, yielding better accuracy.
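One common way to cast the two subtasks as a single problem is to label every character with a combined tag: a boundary marker (B for the first character of a word, I for the rest) joined to the word's POS tag. The sketch below shows only this encoding and its inverse; it is an assumption about how a joint model could be set up, not necessarily the formulation used in this tutorial. The function names and the tag set are hypothetical.

```python
def encode_joint_tags(words_with_pos):
    """Turn [(word, pos), ...] into per-character (char, "B-POS" / "I-POS") pairs."""
    pairs = []
    for word, pos in words_with_pos:
        for i, ch in enumerate(word):
            prefix = "B" if i == 0 else "I"
            pairs.append((ch, f"{prefix}-{pos}"))
    return pairs


def decode_joint_tags(pairs):
    """Rebuild [(word, pos), ...] from per-character joint tags."""
    words = []
    for ch, tag in pairs:
        prefix, pos = tag.split("-", 1)
        if prefix == "B" or not words:
            words.append([ch, pos])   # a B tag starts a new word
        else:
            words[-1][0] += ch        # an I tag extends the current word
    return [(word, pos) for word, pos in words]


# Placeholder sentence written without spaces, as in Thai or Myanmar text.
sentence = [("the", "DET"), ("cat", "NOUN"), ("sleeps", "VERB")]
char_tags = encode_joint_tags(sentence)
print(char_tags[:4])
print(decode_joint_tags(char_tags))
```

With this encoding, a single sequence labeller predicts word boundaries and POS tags together, so each decision can draw on evidence for the other.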
This tutorial will teach you step-by-step how to develop a joint neural model of word segmentation and POS tagging. We will guide you through the development process using easy-to-understand PyTorch. There are three sections in this tutorial:
1) Deep NLP with PyTorch
2) Joint models of word segmentation and POS tagging
On behalf of the organizing committee of iSAI-NLP 2019, we are excited to hold iSAI-NLP Challenge 2019. We invite all participants to develop a joint model of word segmentation and POS tagging for Thai and Myanmar, inspired by the methods taught in this tutorial. We will evaluate the performance of all submitted code in terms of F1 scores (the harmonic mean of precision and recall) for word segmentation and POS tagging.
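The official scoring procedure is not described here, so the following is only a sketch of how such F1 scores are commonly computed. It assumes that a predicted word counts as correct when its character span matches a gold span exactly, and that a predicted POS tag counts as correct only when both the span and the tag match. All function names are hypothetical.

```python
def to_spans(words_with_pos):
    """Convert [(word, pos), ...] to a set of (start, end) spans and a set of
    (start, end, pos) tagged spans, using character offsets."""
    spans, tagged_spans = set(), set()
    start = 0
    for word, pos in words_with_pos:
        end = start + len(word)
        spans.add((start, end))
        tagged_spans.add((start, end, pos))
        start = end
    return spans, tagged_spans


def f1_score(gold, predicted):
    """F1 = harmonic mean of precision and recall over two sets of items."""
    correct = len(gold & predicted)
    if correct == 0:
        return 0.0
    precision = correct / len(predicted)
    recall = correct / len(gold)
    return 2 * precision * recall / (precision + recall)


# Placeholder data: segmentation is fully correct, one POS tag is wrong.
gold = [("the", "DET"), ("cat", "NOUN"), ("sleeps", "VERB")]
pred = [("the", "DET"), ("cat", "VERB"), ("sleeps", "VERB")]

gold_spans, gold_tagged = to_spans(gold)
pred_spans, pred_tagged = to_spans(pred)

seg_f1 = f1_score(gold_spans, pred_spans)
pos_f1 = f1_score(gold_tagged, pred_tagged)
print(f"word segmentation F1: {seg_f1:.3f}")
print(f"POS tagging F1: {pos_f1:.3f}")
```

Under these assumptions, a segmentation error also costs POS tagging credit, because a tag attached to a wrong span can never match the gold annotation.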
Thai: 10,000 sentences annotated with word boundaries and POS tags
Download: You will receive the dataset via email after you submit the form
Myanmar: 10,000 sentences annotated with word boundaries and POS tags
Follow this link to the submission form