Universitas Hasanuddin
Research output: Contribution to journal, Article, peer-review

Enhancing Bugis Language POS Tagging Using Recurrent Neural Networks and Semi-Supervised Self-Training

Latief A.D.

International Conference on Computer, Control, Informatics and Its Applications (IC3INA)

Published: 2024
Citations: 4

Abstract

This research employed a semi-supervised learning approach to increase the dataset size for the part-of-speech (POS) tagging task in Bugis, a low-resource language. One of the main challenges in performing NLP tasks for low-resource languages is the scarcity of data. To address this, we used self-training, a method in which the model iteratively labels unlabeled data with its own predictions, combined with a Recurrent Neural Network (RNN) model. The RNN model was trained on word embeddings generated by a Bugis-specific FastText model and was then applied to predict POS tags on unlabeled data. To improve prediction accuracy, we used cosine similarity, which helped the model identify the most similar tags when it was uncertain. We separated the predictions by confidence level using a threshold value of 0.5, which let us distinguish high-confidence from low-confidence predictions. The evaluation results indicate that the RNN model achieved an accuracy of 97.93%. Additionally, we conducted experiments in which the model predicted POS tags across different sizes of unlabeled data. The findings indicate that this approach effectively expanded the dataset size, improved the model's accuracy, and enhanced predictive performance on previously unlabeled data.
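
The two mechanisms named in the abstract, the 0.5 confidence split and the cosine-similarity fallback, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not say how cosine similarity was applied, so comparing a word vector against one reference vector per tag is an assumption, as is treating a confidence of exactly 0.5 as high. The function names, the `predict` callable standing in for the RNN tagger, and the placeholder tokens `w1` and `w2` are all hypothetical.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Sequence[float]
Prediction = Tuple[str, str, float]  # (token, predicted tag, confidence)


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar_tag(vector: Vector, tag_vectors: Dict[str, Vector]) -> str:
    """Assumed fallback: pick the tag whose reference vector is closest."""
    return max(tag_vectors, key=lambda tag: cosine_similarity(vector, tag_vectors[tag]))


def split_by_confidence(
    predictions: List[Prediction], threshold: float = 0.5
) -> Tuple[List[Prediction], List[Prediction]]:
    """Separate predictions into (high-confidence, low-confidence) lists."""
    # Assumption: a confidence equal to the threshold counts as high.
    high = [p for p in predictions if p[2] >= threshold]
    low = [p for p in predictions if p[2] < threshold]
    return high, low


def self_training_round(
    predict: Callable[[str], Tuple[str, float]],
    unlabeled: List[str],
    threshold: float = 0.5,
) -> Tuple[List[Prediction], List[Prediction]]:
    """One round: tag every unlabeled token, then split by confidence."""
    predictions = [(token, *predict(token)) for token in unlabeled]
    return split_by_confidence(predictions, threshold)


if __name__ == "__main__":
    # Toy stand-in for the trained tagger (hypothetical tokens and scores).
    toy_model = {"w1": ("NOUN", 0.92), "w2": ("VERB", 0.31)}
    high, low = self_training_round(lambda token: toy_model[token], ["w1", "w2"])
    print("pseudo-labeled:", high)
    print("left for review or fallback:", low)
```

In a full self-training loop, the high-confidence list would be appended to the labeled set and the tagger retrained before the next round, while the low-confidence tokens could be passed to `most_similar_tag` or kept unlabeled.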

Fingerprint

Computer science
Artificial intelligence
Natural language processing
Artificial neural network
Training (meteorology)
Machine learning
Physics
Meteorology