NLP-Based Extraction of Bird Morphological Features from Indonesian Texts

Arifin S.R.

doi:10.1109/ISITIA66279.2025.11137485

Universitas Hasanuddin

Research output: Contribution to journal › Article › peer-review

Proceedings International Seminar on Intelligent Technology and Its Applications Isitia

Published: 2025

Abstract

This study presents a systematic computational approach for extracting, analyzing, and categorizing morphological features of birds from Indonesian-language descriptions. The methodology emphasizes empirical validation and statistical rigor, implementing comprehensive quantitative evaluation that includes cross-validation reliability assessment and statistical significance testing. Using the Indonesian-translated CUB-200-2011 dataset, our analysis of 117,853 descriptions across 200 bird species achieved excellent cross-validation reliability (0.848) and statistical significance (p < 0.001) across all feature categories. Findings reveal statistically validated patterns in Indonesian ornithological terminology, highlighting the prominence of beak morphology (53.50%, 95% CI: [0.532, 0.538]), high-contrast coloration (black 50.93%, white 48.65%), and significant size asymmetry patterns in Indonesian bird descriptions. Feature co-occurrence analysis with statistical validation reveals semantic relationships between anatomical features and their visual characteristics, with “perut putih” (white belly) emerging as the most common combination (9.21%). The framework consists of five main stages: dataset preparation and preprocessing, morphological feature extraction, statistical validation and reliability assessment, feature analysis and categorization, and visualization and database generation. The preprocessing pipeline standardizes Indonesian bird descriptions while preserving domain-specific terminology, and feature extraction employs context-aware pattern matching optimized for Indonesian language morphology. Statistical validation through 5-fold cross-validation and chi-square significance testing supports the reliability and reproducibility of the methodology. This empirically validated approach contributes essential groundwork to biodiversity informatics by providing reliable baseline measurements and linguistic insights that can inform the development of more advanced computational systems for multilingual biodiversity databases.
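The extraction and co-occurrence stages described in the abstract could be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the vocabularies `BODY_PARTS` and `COLORS`, the single noun-then-modifier pattern, and the helper names `extract_pairs`, `cooccurrence`, and `proportion_ci` are all assumptions made for the example. The confidence interval uses a plain normal approximation, which the abstract does not confirm as the method used.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny vocabularies (Indonesian -> English gloss).
# The lexicon actually used in the paper is not given in the abstract.
BODY_PARTS = {"paruh": "beak", "perut": "belly", "sayap": "wing", "kepala": "head"}
COLORS = {"putih": "white", "hitam": "black", "kuning": "yellow", "merah": "red"}

# In Indonesian the modifier follows the noun ("perut putih" = "white belly"),
# so the pattern looks for a body-part word directly followed by a color word.
PAIR_PATTERN = re.compile(
    r"\b(" + "|".join(BODY_PARTS) + r")\s+(" + "|".join(COLORS) + r")\b"
)


def extract_pairs(description):
    """Return (body_part, color) pairs found in one description, in text order."""
    return PAIR_PATTERN.findall(description.lower())


def cooccurrence(descriptions):
    """Count, for each pair, the number of descriptions that mention it."""
    counts = Counter()
    for text in descriptions:
        # set() so a pair repeated inside one description is counted once.
        counts.update(set(extract_pairs(text)))
    return counts


def proportion_ci(successes, total, z=1.96):
    """Normal-approximation confidence interval for a proportion (assumed method)."""
    p = successes / total
    half_width = z * math.sqrt(p * (1 - p) / total)
    return p - half_width, p + half_width


if __name__ == "__main__":
    sample = [
        "Burung ini memiliki perut putih dan paruh hitam.",
        "Burung kecil dengan perut putih dan sayap hitam.",
    ]
    print(cooccurrence(sample).most_common(1))

    # Assumes the reported 53.50% is a share of all 117,853 descriptions.
    low, high = proportion_ci(round(0.5350 * 117853), 117853)
    print(round(low, 3), round(high, 3))
```

Under the stated assumption that 53.50% refers to a share of the 117,853 descriptions, the normal approximation gives an interval of about [0.532, 0.538], which is consistent with the interval reported for beak morphology. The 5-fold cross-validation and chi-square steps are not sketched here because the abstract does not say what is being split or which contingency tables are tested.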

Fingerprint

Indonesian (Sciences)

Computer science (Sciences)

Natural language processing (Sciences)

Artificial intelligence (Sciences)

Feature extraction (Sciences)

Information retrieval (Sciences)

Linguistics (Sciences)

Philosophy (Sciences)
