# The Anoa-L01 Benchmark: Prompt-Based Zero-Shot Evaluation for Sulawesi's Regional Languages Detection in LLMs
> Yuyun

Canonical URL: https://discover.unhas.ac.id/publications/the-anoa-l01-benchmark-prompt-based-zero-shot-evaluation-for-sulawesis-regional
Journal / Conference: International Conference on Computer, Control, Informatics and Its Applications (IC3INA)
Year published: 2025
DOI: https://doi.org/10.1109/IC3INA68387.2025.11325480
ISSN: 2994-5933
Citations: 0

## Authors
- Yuyun

## Abstract
In recent years, large language models (LLMs) have demonstrated impressive performance across a wide range of natural language processing tasks. However, their performance on low-resource languages remains largely underexplored. This paper proposes the Language Detection Prompting (LDP) framework, a prompt-based zero-shot strategy for identifying the language of an input text without fine-tuning for each target language. We introduce Anoa, our collective term for the regional languages spoken in South, West, and Southeast Sulawesi, Indonesia. To support this effort, we built a dataset covering 13 languages by extracting text from traditional folktale books from these regions. We evaluate seven pretrained LLMs: Gemma 7B, LLaMA 2 7B, LLaMA 3.1 8B, and Mistral 7B Instruct, as well as three Gemini variants (Gemini 1.5 Flash, Gemini 1.5 Pro, and Gemini 2.0 Flash). Two types of prompts were used: the first asks the model to identify the primary language of a given text, and the second asks it to name the language(s) of a given sentence. We evaluate the models by comparing the outputs of prompt-based inference against gold-standard (ground-truth) labels. Our experiments show that the Gemini models have superior zero-shot capability in identifying the primary language of a text. Our findings further reveal that these models not only succeed at language identification but also detect a high degree of linguistic relatedness among the identified languages.
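The evaluation loop described in the abstract (prompt a model zero-shot, then compare its answer with the gold label) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' LDP implementation: the prompt wording, the `normalize` rule, exact-match accuracy as the metric, and the `fake_model` stand-in for a real LLM call are all hypothetical, and the labels are placeholders rather than languages from the Anoa dataset.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical prompt templates; the paper's exact wording is not reproduced here.
PROMPT_PRIMARY = (
    "Identify the primary language of the following text. "
    "Answer with the language name only.\n\nText: {text}"
)
PROMPT_NAMES = (
    "Name the language(s) used in the following sentence. "
    "Answer with the language name(s) only.\n\nSentence: {text}"
)


def normalize(label: str) -> str:
    """Assumed normalization: trim whitespace and trailing periods, then lowercase."""
    return label.strip().strip(".").strip().lower()


def evaluate_zero_shot(
    samples: List[Tuple[str, str]],
    generate: Callable[[str], str],
    template: str = PROMPT_PRIMARY,
) -> Dict[str, float]:
    """Exact-match accuracy of prompt-based predictions against gold labels."""
    correct = 0
    for text, gold in samples:
        # Fill the prompt template with the input text and query the model.
        prediction = generate(template.format(text=text))
        if normalize(prediction) == normalize(gold):
            correct += 1
    total = len(samples)
    accuracy = correct / total if total else 0.0
    return {"correct": correct, "total": total, "accuracy": accuracy}


def fake_model(prompt: str) -> str:
    """Hypothetical stand-in for an LLM call; always answers 'Lang A.'."""
    return "Lang A."


if __name__ == "__main__":
    # Placeholder samples: (input text, gold label).
    samples = [("sample sentence one", "Lang A"), ("sample sentence two", "Lang B")]
    print(evaluate_zero_shot(samples, fake_model))
    # {'correct': 1, 'total': 2, 'accuracy': 0.5}
```

Swapping `fake_model` for a real client call and `template` for `PROMPT_NAMES` would exercise the second prompt type; exact string matching is the simplest scoring choice and would miss answers that name the right language with different spelling or extra words.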

## Keywords
- Computer science
- Natural language processing
- Language identification
- Artificial intelligence
- Language model
- Inference
- Natural language
- Linguistics
- Spoken language
- Written language

---
Source: Discover Unhas, RIMS Universitas Hasanuddin.
When citing, use the DOI if available, or the canonical URL above.