Universitas Hasanuddin
Research output: Contribution to journal, Article, peer-review

NLP-Based Extraction of Bird Morphological Features from Indonesian Texts

Arifin S.R.

Proceedings, International Seminar on Intelligent Technology and Its Applications (ISITIA)

Published: 2025

Abstract

This study presents a systematic computational approach for extracting, analyzing, and categorizing morphological features of birds from Indonesian-language descriptions. The methodology emphasizes empirical validation and statistical rigor, implementing comprehensive quantitative evaluation including cross-validation reliability assessment and statistical significance testing. Using the Indonesian-translated CUB-200-2011 dataset, our analysis of 117,853 descriptions across 200 bird species achieved excellent cross-validation reliability (0.848) and statistical significance (p < 0.001) across all feature categories. Findings reveal statistically validated patterns in Indonesian ornithological terminology, highlighting the prominence of beak morphology (53.50%, 95% CI: [0.532, 0.538]), high-contrast coloration (black 50.93%, white 48.65%), and significant size asymmetry patterns in Indonesian bird descriptions. Feature co-occurrence analysis with statistical validation reveals semantic relationships between anatomical features and their visual characteristics, with “perut putih” (white belly) emerging as the most common combination (9.21%). The framework consists of five main stages: dataset preparation and preprocessing, morphological feature extraction, statistical validation and reliability assessment, feature analysis and categorization, and visualization and database generation. The preprocessing pipeline standardizes Indonesian bird descriptions while preserving domain-specific terminology, and feature extraction employs context-aware pattern matching optimized for Indonesian language morphology. Statistical validation through 5-fold cross-validation and chi-square significance testing ensures the reliability and reproducibility of the methodology.
This empirically validated approach contributes essential groundwork to biodiversity informatics by providing reliable baseline measurements and linguistic insights that can inform the development of more advanced computational systems for multilingual biodiversity databases.
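
The extraction and validation steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the term lists `BODY_PARTS` and `COLORS`, the function names, and the three sample sentences are all hypothetical, and the abstract does not say which confidence interval was used. The sketch assumes a normal-approximation (Wald) interval, which happens to reproduce the reported bounds for beak morphology to three decimals.

```python
import math
import re
from collections import Counter

# Hypothetical mini-lexicons; the paper's actual term lists are not given in the abstract.
BODY_PARTS = ["paruh", "perut", "sayap", "kepala", "ekor", "dada"]
COLORS = ["putih", "hitam", "kuning", "merah", "coklat"]

# Indonesian places the modifier after the noun ("perut putih" = white belly),
# so the pattern looks for a body part immediately followed by a color term.
PAIR_RE = re.compile(
    r"\b(" + "|".join(BODY_PARTS) + r")\s+(" + "|".join(COLORS) + r")\b"
)


def extract_pairs(description):
    """Return the 'part color' pairs found in one description, in order."""
    return [f"{part} {color}" for part, color in PAIR_RE.findall(description.lower())]


def pair_frequencies(descriptions):
    """Share of descriptions that mention each pair at least once."""
    counts = Counter()
    for text in descriptions:
        counts.update(set(extract_pairs(text)))  # count each pair once per description
    total = len(descriptions)
    return {pair: count / total for pair, count in counts.items()}


def wald_interval(p, n, z=1.96):
    """Normal-approximation 95% interval for a proportion (assumed, not confirmed by the source)."""
    half_width = z * math.sqrt(p * (1 - p) / n)
    return p - half_width, p + half_width


# Invented sample sentences, for illustration only.
sample = [
    "burung ini memiliki perut putih dan sayap hitam",
    "burung kecil dengan paruh kuning dan perut putih",
    "burung dengan kepala merah dan ekor hitam",
]
freqs = pair_frequencies(sample)

# Figures taken from the abstract: 53.50% of 117,853 descriptions mention beak morphology.
low, high = wald_interval(0.5350, 117_853)
print(round(low, 3), round(high, 3))
```

With the abstract's figures, the interval rounds to [0.532, 0.538], matching the reported bounds. Counting each pair once per description is a design choice made here so that frequencies read as shares of descriptions; the source does not state how repeated mentions were handled.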

Fingerprint

Indonesian
Computer science
Natural language processing
Artificial intelligence
Feature extraction
Information retrieval
Linguistics
Philosophy