Enhancing Bugis Language POS Tagging Using Recurrent Neural Networks and Semi-Supervised Self-Training
Latief A.D.
International Conference on Computer, Control, Informatics and Its Applications (IC3INA)
Abstract
This research applies semi-supervised learning to enlarge the dataset for part-of-speech (POS) tagging in Bugis, a low-resource language. Data scarcity is one of the main challenges in NLP for low-resource languages. To address it, we combined a Recurrent Neural Network (RNN) model with self-training, a method in which the model iteratively labels unlabeled data with its own predictions. The RNN was trained on word embeddings generated by a FastText model built specifically for Bugis and was then used to predict POS tags on unlabeled data. To improve prediction accuracy, we used cosine similarity, which helped the model identify the most similar tags when its predictions were uncertain. We split the predictions at a confidence threshold of 0.5 into high-confidence and low-confidence sets. In evaluation, the RNN model achieved an accuracy of 97.93%. We also ran experiments applying the model to predict POS tags on unlabeled datasets of different sizes. The findings show that this approach expanded the dataset, improved the model’s accuracy, and enhanced predictive performance on previously unlabeled data.
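The confidence split and the cosine-similarity step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how cosine similarity is applied, so the per-tag prototype vectors, the function names, the placeholder tokens, and the choice to treat a confidence of exactly 0.5 as high are all assumptions made for illustration.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def split_by_confidence(predictions, threshold=0.5):
    """Split (token, tag, confidence) triples into high- and low-confidence lists.

    Assumption: a confidence equal to the threshold counts as high.
    """
    high, low = [], []
    for item in predictions:
        if item[2] >= threshold:
            high.append(item)
        else:
            low.append(item)
    return high, low


def most_similar_tag(vector, tag_prototypes):
    """Return the tag whose prototype vector is most cosine-similar to `vector`.

    Assumption: each tag is represented by one hypothetical prototype
    embedding, for example the mean embedding of words carrying that tag.
    """
    return max(
        tag_prototypes,
        key=lambda tag: cosine_similarity(vector, tag_prototypes[tag]),
    )


# Placeholder tokens and toy 2-dimensional vectors, not real Bugis data.
predictions = [("w1", "NOUN", 0.9), ("w2", "VERB", 0.3), ("w3", "ADJ", 0.5)]
high, low = split_by_confidence(predictions, threshold=0.5)

prototypes = {"NOUN": [1.0, 0.0], "VERB": [0.0, 1.0]}
fallback_tag = most_similar_tag([0.9, 0.1], prototypes)
```

In a self-training loop under these assumptions, the high-confidence triples would be added to the labeled set before retraining, while the low-confidence ones would be held back or re-tagged through the similarity lookup.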